Alright mate, I've never used PHP or Perl to trawl through a site like that, although I'm sure it is possible. What we have done is write a Perl script that creates static HTML pages from a template text file, inserting the relevant information from the database into each page. I'm guessing you are using a database to build the dynamic pages anyway, so it is simply a case of replacing some placeholder text in a standard file and writing the result to a page name of your choosing.
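To show just the replace-and-write idea on its own, here is a minimal sketch with no database or files involved. The template text, the `fill_template` helper and the sample values are made up for illustration; the only thing taken from our real setup is the `!name!` marker style.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical template; in practice this would be read from a text file.
my $template = "<html><head><title>!title!</title></head>\n"
             . "<body>!body!</body></html>\n";

# Replace every !name! marker with the matching value.
# Markers with no matching value are left untouched.
sub fill_template {
    my ($tmpl, %values) = @_;
    $tmpl =~ s/!(\w+)!/exists $values{$1} ? $values{$1} : "!$1!"/ge;
    return $tmpl;
}

# Sample values (assumed); a real script would take these from a database row.
my $page = fill_template(
    $template,
    title => 'Example Co, Exeter',
    body  => '<p>Hello</p>',
);

print $page;
```

In a real build you would print `$page` to a file handle opened on the target page name rather than to standard output.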
Example code is below (some of the text has been replaced):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Directory on the server where the site is stored (no trailing slash).
my $siteDir = "the directory on the server your site is stored";

# HTML fragments that open and close a table row.
my $otbl  = "<TR><TD width=10></TD><TD>";
my $otblt = "<TR><TD width=10></TD><TD class=title2>";
my $ctbl  = "</TD><TD></TD></TR>\n";

my @days   = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);
my @months = qw(January February March April May June July
                August September October November December);

# English ordinal suffix for a day of the month (1st, 2nd, 11th, 22nd, ...).
sub ordinal_suffix {
    my ($n) = @_;
    return 'th' if $n % 100 >= 11 && $n % 100 <= 13;
    my @suffix = qw(th st nd rd);
    return $suffix[$n % 10] || 'th';
}

# Clickable icon that opens a CGI script in a small popup window.
sub popup_icon {
    my ($func, $cgi, $icon, $id) = @_;
    return "<SCRIPT>\nfunction $func()\n{\nvar win2 = window.open(\"/cgi-bin/$cgi?id=$id\", "
         . "\"second\", \"resizable=no,height=270,width=450\");\n}\n</SCRIPT>\n"
         . "<IMG SRC=\"/images/$icon\" onClick=\"$func()\">";
}

# Build date shown on every page, e.g. "Monday, 24th February 2003".
my ($mday, $mon, $year, $wday) = (localtime)[3, 4, 5, 6];
my $dateser = sprintf "%s, %d%s %s %d",
    $days[$wday], $mday, ordinal_suffix($mday), $months[$mon], $year + 1900;

# Read the whole template into memory.
open(my $tmpl, '<', 'text/page.txt')
    or die "Can't open standard HTML file: $!\n";
my $htmlCode = do { local $/; <$tmpl> };
close($tmpl);

my $dbh = DBI->connect("dbi:Pg:dbname=Welcome-SW", "username", "password",
                       { RaiseError => 1 })
    or die "Cannot connect to database: $DBI::errstr\n";

my $sth = $dbh->prepare("SELECT * FROM companies");
$sth->execute;
my ($id, $name, $address1, $address2, $town, $pcode, $county, $phone, $fax,
    $domain, $email, $contact, $description, $type, $level);
$sth->bind_columns(\$id, \$name, \$address1, \$address2, \$town, \$pcode,
                   \$county, \$phone, \$fax, \$domain, \$email, \$contact,
                   \$description, \$type, \$level);

# Prepared once and run per company; the placeholder replaces the
# interpolated '$id' of the original query.
my $sth2 = $dbh->prepare(
    "SELECT commauthorname, commcomment, commrating, commdate
       FROM comments WHERE compid = ? AND commApproved = '1'");

while ($sth->fetch) {
    # Treat NULL columns as empty strings.
    for ($name, $address1, $address2, $town, $pcode, $county,
         $phone, $fax, $domain, $email, $description) {
        $_ = '' unless defined $_;
    }

    my $fileName = "$siteDir/places/$id.htm";
    # The original used an unset $city here; the bound column is $town.
    my $title    = "$name, $town, $county, & site title";
    my $desc     = $title;
    my $keywords = $title;
    my $body     = '<BR>';

    # Heading row, with the photo alongside it if one exists.
    if (-r "$siteDir/images/places/$id.jpg") {
        $body .= "$otblt $name</TD><TD rowspan=\"6\"><IMG SRC=\"/images/places/$id.jpg\" "
               . "WIDTH=200 HEIGHT=140 BORDER=0 ALT=\"$name\" ALIGN=RIGHT></TD></TR>\n";
    } else {
        $body .= "$otblt $name $ctbl";
    }

    # Address block: one cell, one line per non-empty field.
    $body .= "$otbl\n";
    $body .= "$address1<BR>\n" if $address1 ne '';
    $body .= "$address2<BR>\n" if $address2 ne '';
    $body .= "$town<BR>\n"     if $town ne '';
    if ($pcode ne '') {
        $body .= "$pcode\n (<A HREF=\"http://uk.multimap.com/p/browse.cgi?pc=$pcode&scale=10000\""
               . " target=\"_blank\">MAP</A>)\n<BR>\n";
    }
    $body .= $county if $county ne '';
    # Always close the row (the original left it open when address1 was empty).
    $body .= "$ctbl $otbl $ctbl";

    $body .= "$otbl Tel: $phone $ctbl" if $phone ne '';
    $body .= "$otbl Fax: $fax $ctbl"   if $fax ne '';
    $body .= "$otbl " . popup_icon('email', 'mail.cgi', 'email.gif', $id) . " $ctbl"
        if $email ne '';
    $body .= "$otbl <A href=\"/cgi-bin/site.cgi?id=$id\" target=\"_blank\">"
           . "<IMG SRC=\"/images/website.gif\" border=\"0\"></a> $ctbl"
        if $domain ne '';
    $body .= "$otbl " . popup_icon('comm', 'comm.cgi', 'comment.gif', $id) . " $ctbl";
    $body .= "$otbl $description $ctbl" if $description ne '';

    # Approved comments for this company.
    $sth2->execute($id);
    my ($auth_name, $comment, $rating, $commdate);
    $sth2->bind_columns(\$auth_name, \$comment, \$rating, \$commdate);
    while ($sth2->fetch) {
        # Turn "2003-02-24..." into "24th February 2003"; fall back to the raw value.
        my $commdateser = $commdate;
        if (my ($cy, $cm, $cd) = $commdate =~ /^(\d{4})-(\d{2})-(\d{2})/) {
            $commdateser = sprintf "%d%s %s %d",
                $cd, ordinal_suffix($cd), $months[$cm - 1], $cy;
        }

        # Break the comment onto a new line after each ". ", ", ", "! " or "? ".
        $comment =~ s/([.,!?]) /$1\r<BR>\n/g;
        $comment = "\n$comment\n";
        $auth_name = 'Anonymous' if !defined $auth_name || $auth_name eq '';

        $body .= $otbl . '<IMG SRC="/images/' . $rating . 'stars.jpg" ALT="">' . $ctbl;
        $body .= "$otbl$comment$ctbl";
        $body .= "$otbl Submitted by: $auth_name<br>Date: $commdateser $ctbl";
        $body .= "$otbl <IMG SRC=\"/images/dot_cl.gif\" WIDTH=1 HEIGHT=4 HSPACE=0 "
               . "VSPACE=4 BORDER=0 ALT=\"\"> $ctbl";
    }
    $sth2->finish;

    # Drop the values into a copy of the template and write the page out.
    my $htmlCodeNew = $htmlCode;
    $htmlCodeNew =~ s/!title!/\r\n$title\r\n/;
    $htmlCodeNew =~ s/!desc!/\r\n$desc\r\n/;
    $htmlCodeNew =~ s/!keywords!/\r\n$keywords\r\n/;
    $htmlCodeNew =~ s/!date!/\r\n$dateser\r\n/;
    $htmlCodeNew =~ s/!body!/\r\n$body\r\n/;

    open(my $out, '>', $fileName)
        or die "Can't create/open $fileName: $!\n";
    print {$out} $htmlCodeNew;
    close($out);
}
$sth->finish;
$dbh->disconnect();

Don't know if that will help at all; this is just one of the scripts we use to build some pages.

>
> hi mate i am interested in finding example trawlers in either php or perl,
> is mod_fast_cgi a better way to do this? is there a way of trawling through
> each page creating a html version of it and also converting the links to
> other pages and the other pages too?
>
> "Mark Cubitt" <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > Are all the news stories in a database?
> >
> > If so, we are currently in development of a similar site (although not
> > news). We are using Perl scripts to build all the HTML pages (both indexes
> > and pages) overnight (4am). It takes about 8 secs to build 2500 pages,
> > and the Perl scripts could be a lot more efficient. We are using PHP to do
> > all the admin and stat areas of the site.
> >
> > Static HTML pages not only increase the speed but they also improve SEO
> > (Search Engine Optimisation) if handled correctly.
> >
> > Don't know if this will help you but maybe it is an idea.
> >
> > Also I'm sorry if this answer takes this a little off topic.
> >
> > regards
> >
> > Mark Cubitt
> >
> > > -----Original Message-----
> > > From: electroteque [mailto:[EMAIL PROTECTED]
> > > Sent: 24 February 2003 11:13
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: [PHP] creating flat versions of php pages
> > >
> > > i dont understand what you mean, anyway, this is a short term
> > > situation as the site is pretty slow and cpu intensive. the setup is a
> > > sun solaris box running apache 1.3 and php3. just had a meeting today to
> > > push through apache 2 and php 4.3 but it wont happen for a while. we need
> > > a solution in the next month. we basically need a system that will trawl
> > > the site or the latest news and flatten them so they are less taxing on
> > > the server, as our last time cost us dearly, 170,000 hits in one day!
> > > so would flattening to html solve our problem or is the server
> > > going to get hammered anyway?
> > >
> > > "Gonzo" <[EMAIL PROTECTED]> wrote in message
> > > news:[EMAIL PROTECTED]
> > > > > hi guys we have 2 very high traffic sites and we need
> > > > > to be able to make these pages load faster. we are
> > > > > coming up with a solution to make flat file versions of
> > > > > the sites in the short term before we start using
> > > > > caching technologies. is this the right way to go about
> > > > > this?
> > > >
> > > > I think anyone trying to answer this would need to know
> > > > more about your setup. Caching is not necessarily the
> > > > answer if a dynamic page is dynamic enough every time to
> > > > be different. (Please keep this on-list)
> > > >
> > > > > also what is the best way to parse a php page,
> > > > > ie foo.php?id=1, foo.php?id=2, with the contents of each
> > > > > story to a file and update all the links. maybe with
> > > > > naming conventions like foo/1.html, foo/2.html in a mod
> > > > > rewrite style
> > > >
> > > > I like using Variables from URI:
> > > >
> > > > http://nirvani.org/software/variables_from_uri/
> > > >
> > > > Gonzo
> > > >
> > >
> > > --
> > > PHP General Mailing List (http://www.php.net/)
> > > To unsubscribe, visit: http://www.php.net/unsub.php
> > >
> > > ******************************************************************************
> > > This email has been virus checked by the Eclipse Internet MAILsafe service
> > > ******************************************************************************
> > > www: http://www.eclipse.net.uk/ email: [EMAIL PROTECTED]
> > > ******************************************************************************