Kevin Zembower wrote:
>
> I'm trying to do a quick-n-dirty (well, I've been at work on it three
> hours now) analysis of Apache web logs. I'm trying to count the number
> of records from robots or spiders. For my purposes, a robot or spider is
> a request from either an unresolved IP address, or one that has "bot",
> "spider", "crawl" or "search" in its resolved domain name. I don't
> count at all requests that come from my LAN (172.16.0.0/16) or domain
> (jhuccp.org). My program so far is this:
> #!/usr/local/bin/perl -w
> my ($robotcount, $totalcount) = (0, 0);
> while (<>) {
> next if /^172\.16/;
> next if /^.*?jhuccp\.org +?/;
> $totalcount++;
> if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|.*(bot|crawl|spider|search).*?) .*$/) {
> print;
> $robotcount++;
> }
> }
> print "Robot count is $robotcount\tTotal count is $totalcount\t Ratio
> is " . $robotcount/$totalcount . "\n";
>
> This correctly picks up the numerical IP addresses, but also matches
> records like this:
> dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET
> /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379
> "http://search.t1msn.com.mx/results.asp?q=relaci%C3%B3n+sexual&origq=yahoo&FORM=IE4&v=1&cfg=SMCSP&nosp=0&thr=&submitbutton.x=39&submitbutton.y=12"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
>
> Here, the word "search" is in the referrer field.
>
> How do I tell it to search only up to the first space character? I
> think I can do it by defining a second variable that is just the part of
> the record up to the first space, and matching on that. But is there
> another way, probably using the 'minimizing' quantifiers?
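You don't need a minimizing quantifier. Use \S* in place of .* so the
pattern cannot match past the first whitespace character:

if
(/^(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\S*(?:bot|crawl|spider|search)\S*)\s/)
{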
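
For anyone who wants to try the test in isolation, here is a minimal
sketch that wraps it in a small sub and runs it over a few sample
records. The sub name is_robot and the sample hosts are assumptions made
up for illustration (the prodigy.net.mx record is a shortened version of
the one quoted above); the sketch leaves out the LAN and jhuccp.org
filtering from the original program.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical helper (not in the original program): returns 1 if the host
# field is a bare IP address or contains one of the robot keywords.
# \S* cannot cross whitespace, so only the first field is examined.
sub is_robot {
    my ($line) = @_;
    return $line =~ /^(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\S*(?:bot|crawl|spider|search)\S*)\s/
        ? 1 : 0;
}

# Sample records. The hosts and requests are invented for illustration;
# the third one is shortened from the record quoted in the question.
my @lines = (
    q{192.0.2.7 - - [30/Jun/2002:00:00:01 -0400] "GET / HTTP/1.0" 200 100},
    q{crawler1.example.com - - [30/Jun/2002:00:00:02 -0400] "GET / HTTP/1.0" 200 100},
    q{dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379 "http://search.t1msn.com.mx/results.asp"},
);

my ($robotcount, $totalcount) = (0, 0);
for my $line (@lines) {
    $totalcount++;
    $robotcount++ if is_robot($line);
}

# Guard against an empty input so the ratio never divides by zero.
my $ratio = $totalcount ? $robotcount / $totalcount : 0;
print "Robot count is $robotcount\tTotal count is $totalcount\tRatio is $ratio\n";
```

The third record has "search" only in the referrer, so it should no
longer be counted, while the bare IP address and the crawler host still
are.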
John
--
use Perl;
program
fulfillment