Perl and CGI

 
February 23rd, 2009, 05:21 PM
sheff (Registered User; Join Date: Jan 2009; Posts: 2)
Page Ripper script

Hello
I'm using an old CGI script called PageRipper. It takes part of a page from another website and places the specified data on my site. But the outbound links don't seem to work: the URL shows my domain instead of the intended target URL. I have included the script so you can see and perhaps offer some input:
---------
Code:
#!/usr/bin/perl
###############################################################################
# Date          : 25/05/2001
# Author        : ACDesigns
# Program       : PageRipper
# Version       : 1.50.0000
# Copyright     : http://www.pageripper.com/
# Variables     :
#       
#
# DO NOT MODIFY THIS UTILITY.
# CHANGE THE CHMOD AND CALL THE SCRIPT
#
# COPYRIGHT
#
#
# To use the SSI call the script by placing the following HTML on your
# webpage      <!--#exec cgi="cgi-bin/pageripper.cgi"-->   
# Be sure to CHMOD script to 755 for script to work
#
#
#
# MAKE SURE THE PERL PATH ABOVE IS CORRECT OR IT WON'T WORK
# MUST HAVE LWP::UserAgent INSTALLED
#
#
#=========By installing this script you agree to the terms and disclaimer
#=========in the Readme.txt file as well as at http://www.pageripper.com/faq.htm
#                                                                            
# PageRipper may be modified free of charge by anyone so long as this
# copyright message and the above header remain intact. pageripper.cjb.net
# does not accept any liability that might arise from the script.
###############################################################################

 require LWP::UserAgent;

# read the POSTed form data from STDIN (CONTENT_LENGTH bytes)
read(STDIN, $commandprompt, $ENV{'CONTENT_LENGTH'});

# Split the name-value pairs
 @theactualdata = split(/&/, $commandprompt);

 $newcommandprompt = '';

 foreach $actualdata (@theactualdata) {
   ($name, $value) = split(/=/, $actualdata);
   if ($name ne 'page')
      { $newcommandprompt = $newcommandprompt . $actualdata } ;

# change all the + to blank spaces
   $name  =~ tr/+/ /;
   $value =~ tr/+/ /;

# convert %XX hex escapes (e.g. %20) back to the characters they encode
   $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

# strip any HTML tags from the value
   $value =~ s/<([^>]|\n)*>//g;

# eliminate the less than and greater than signs
   $value =~ s/<//g;
   $value =~ s/>//g;
   $URL{$name} = $value;

 }

print "Content-type: text/html\n\n";

# store my url

$url            = $URL{'url'};


#===============================================================================
#  ####   VARIABLES THAT MUST BE CHANGED FOR SCRIPT TO WORK   ####
#===============================================================================

# 1 - MUST CHANGE - Change The URL below to the page you wish to begin extracting from

$url = 'http://www.variety.com';




# 2 - Starting Point to extract from (must be exactly the way it is on the page)

$start = '<h3>News:</h3>';





# 3 - Ending HTML

$finish = '<!-- end news -->';





# 4 - If links don't work change the HTML below to reflect the link source (remove the #'s)

  $sourcelink = '<a href="http://www.variety.com/';
  $newlink = '<a href="http://www.variety.com/';
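#
#   For illustration only (example values, not from the original script): if the
#   ripped fragment uses root-relative links such as <a href="/...">, the pair
#   might be set like this so those links point back at the source site:
#
#   $sourcelink = '<a href="/';
#   $newlink    = '<a href="http://www.variety.com/';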




# 5 - If images don't work change the html below to reflect the image source (remove the #'s)

# $sourceimage = 'img src=\"';
# $newimage = 'img src=\"';





#===============================================================================
#  DO NOT CHANGE ANYTHING BELOW THIS LINE.  SCRIPT WILL NOT WORK.
#===============================================================================






# prepare the start/finish markers for use as regex patterns
# (the first space becomes \s+ and the first / is escaped)
$start  =~ s/ /\\s\+/;
$start  =~ s/\//\\\//;
$finish =~ s/ /\\s\+/;
$finish =~ s/\//\\\//;

# initialise the mem vars
$t              = '';
$content        = '';
$contenttype    = '';
$body           = '';
$head           = '';
$title          = '';
$keywords       = '';
$description    = '';
$error          = 0 ;
$htmlcode       = 0 ;

# assign the browser agent
$agent ='Mozilla/9.0 (Windows 98;US) Opera 4.00 [en]'; 

# print the HTML header
print "<HTML><HEAD></HEAD><BODY>";

# has the user defined the URL
if (!$url)
 { print "Enter the URL\n"; }
else
{
# EXTRACT the page         
   &getpage($url);

# if there was an error grabbing the page, print the HTTP/1.1
# status code (RFC 2068)
   if ($error)
   {  print "Error Getting Page : $url Code : $htmlcode<br>"; }

# ERROR or NO ERROR print whatever is grabbed.
   $t = $content;

# convert newline or return to blank spaces globally
   $t =~ s/\r/\n/gi;
   $t =~ s/\n/ /gi;

# convert multiple blanks to a single blank globally
   $t =~ s/  / /gi;

# convert tab to single blank globally
   $t =~ s/\t/ /gi;
  
# get the meta tags, keywords, title, and descriptions
   &gettags($t);

# print the return results
    print "$stuff";



#   print "<tr><td bgcolor=\"#DDDDDD\">Body</td><td bgcolor=\"#CCCCCC\">$body</td></tr></table>";
}

# end HTML

#print "<div align='right'>";
#print "  <table border='0' cellpadding='0' cellspacing='0' width='100%'>";
#print "    <tr>";
#print "      <td width='100%' align='right'><font size='1'>The Webs My Stage";
#print "       </font></td>";
#print "    </tr>";
#print "  </table>";
print "</body></html>";



# GET THE META TAGS 
sub gettags {
        my $mmyurl = shift;

# store the content to $body variable
        $stuff = $mmyurl;



#----------------begin here



# extract the region between the start and finish markers
# (falls back to 'No Title' if the markers are not found)
    $stuff = ($stuff =~ s/$start(.+)$finish/ /i) ? $1 : 'No Title';



# rewrite the configured link source into the new link target,
# then collapse any doubled replacement
$stuff =~ s/$sourcelink/$newlink/ig;
$stuff =~ s/$newlink$newlink/$newlink/ig;



# rewrite image sources using the values configured in step 5 above
$stuff =~ s/$sourceimage/$newimage/ig;






#----------------end here
#===========================================================================
# DO NOT CHANGE ANYTHING BELOW.  SCRIPT WILL NOT WORK
#===========================================================================



}

#  GET THE PAGE
sub getpage {
        my $myurl = shift;
# $myurl stores the URL

# define the user agent
        my $ua = new LWP::UserAgent;

#        $cookies ="";

# define the timeout
        $ua->timeout(60);

# set the browser agent
        $ua->agent($agent);

# GET THE PAGE
        my $req = HTTP::Request->new(GET => $myurl);
        my $res = $ua->request($req);

# get the content of the page and the content type
        $content = $res->content;
        $contenttype = $res->content_type;

# if 200 success, then no error.
        if ($res->is_success) {
		$error =0;
                }

# get the HTML error code RFC 2068 and store in HTMLCODE var.
# dont forget to set the error flag.
        else {
                $htmlcode = $res->code();
                $error =1;
        }
}
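
For reference, here is a minimal sketch (not part of the script above) of one way the outbound-link problem could be approached: rewriting relative href values in the ripped fragment into absolute URLs with the URI module, which LWP already depends on, so the links point at the source site rather than the local domain. The $base and $fragment values below are placeholders for illustration only.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# page the fragment was ripped from (placeholder value)
my $base = 'http://www.variety.com/';

# example ripped HTML with a root-relative link (placeholder value)
my $fragment = '<a href="/article/news.html">Story</a>';

# make every href absolute against $base; URI->new_abs leaves
# already-absolute URLs unchanged
$fragment =~ s{href="([^"]+)"}{'href="' . URI->new_abs($1, $base) . '"'}ge;

print "$fragment\n";
# prints: <a href="http://www.variety.com/article/news.html">Story</a>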
