Re: [htdig3-dev] Summary and patch for robots.txt

Geoff Hutchison Tue, 8 Feb 2000 15:17:50 -0800
At 9:55 PM +0200 2/8/00, Valdas Andrulis wrote:
>So here is the fix (I think the code was meant to work this way; it is a
>common error with if/else):
>
>--- htlib/HtRegex.cc.old        Tue Feb  8 21:31:40 2000
>+++ htlib/HtRegex.cc    Tue Feb  8 21:32:21 2000
>@@ -39,11 +39,15 @@
>         if (str == NULL) return;
>         if (strlen(str) <= 0) return;
>         if (!case_sensitive)
>+       {
>           if (regcomp(&re, str, REG_EXTENDED|REG_ICASE) == 0)
>                 compiled = 1;
>+       }
>         else
>+       {
>           if (regcomp(&re, str, REG_EXTENDED) == 0)
>                 compiled = 1;
>+       }
>  }
>
>  void

Whoops! This is a good bug-fix. The bug has probably been causing a 
number of problems with things like exclude_urls and limit_urls_to as 
well.
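
For anyone following along, here is a minimal sketch of why the unbraced version misbehaves. It is not the actual HtRegex code: the function names compile_prepatch and compile_patched are made up for illustration, and the first one simply spells out with braces how the compiler pairs the else with the nearest if.

```cpp
#include <regex.h>
#include <cstring>

// Hypothetical stand-in for the pre-patch logic. The braces show how the
// compiler really parses the unbraced original: the `else` belongs to the
// inner `if`, not to `if (!case_sensitive)`.
bool compile_prepatch(const char *str, bool case_sensitive)
{
    if (str == nullptr || std::strlen(str) == 0) return false;
    regex_t re;
    bool compiled = false;
    if (!case_sensitive) {
        if (regcomp(&re, str, REG_EXTENDED | REG_ICASE) == 0)
            compiled = true;
        else if (regcomp(&re, str, REG_EXTENDED) == 0)
            compiled = true;   // only reached if the first regcomp failed
    }
    // case_sensitive == true falls through: nothing is ever compiled.
    if (compiled) regfree(&re);
    return compiled;
}

// Hypothetical stand-in for the patched logic, with explicit braces.
bool compile_patched(const char *str, bool case_sensitive)
{
    if (str == nullptr || std::strlen(str) == 0) return false;
    regex_t re;
    bool compiled = false;
    if (!case_sensitive) {
        if (regcomp(&re, str, REG_EXTENDED | REG_ICASE) == 0)
            compiled = true;
    } else {
        if (regcomp(&re, str, REG_EXTENDED) == 0)
            compiled = true;
    }
    if (compiled) regfree(&re);
    return compiled;
}
```

With the pre-patch shape, a case-sensitive pattern is silently never compiled, which would explain patterns that never match.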

As for the robots.txt, I think we want to stick to the first matching 
section. Any matching section overrides the *, but I think Gilles's 
code is what we want.

i.e.

User-agent: *
Disallow: *
User-agent: htdig
Disallow: /cgi-bin/
Disallow: /cat/

I think this is the typical (and expected) format. If Loic's search 
turns up some interesting examples of other formats, we may want to 
consider a more liberal parser. I think we probably want to consider 
an Allow section, but it would be a bit tricky.
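
To make the "first matching section overrides the *" rule concrete, here is a rough sketch. It is not Gilles's code or the htdig parser; the Section struct and the disallowed_for function are hypothetical, and it assumes a new section starts whenever a User-agent line follows a Disallow line and that an agent matches when the section's token is a case-insensitive substring of the robot's name.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one robots.txt section.
struct Section {
    std::vector<std::string> agents;    // lower-cased User-agent tokens
    std::vector<std::string> disallow;  // Disallow values, kept verbatim
};

static std::string trim(const std::string &s)
{
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

static std::string lower(std::string s)
{
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Returns the Disallow values that apply to `agent`: those of the first
// section naming the agent, otherwise those of the first "*" section.
std::vector<std::string> disallowed_for(const std::string &robots_txt,
                                        const std::string &agent)
{
    std::vector<Section> sections;
    bool last_was_agent = false;
    std::istringstream in(robots_txt);
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));       // drop comments
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;    // blank or malformed
        std::string field = lower(trim(line.substr(0, colon)));
        std::string value = trim(line.substr(colon + 1));
        if (field == "user-agent") {
            if (!last_was_agent) sections.push_back(Section());
            sections.back().agents.push_back(lower(value));
            last_was_agent = true;
        } else if (field == "disallow" && !sections.empty()) {
            sections.back().disallow.push_back(value);
            last_was_agent = false;
        }
    }

    const std::string name = lower(agent);
    const Section *fallback = nullptr;
    for (const Section &s : sections) {
        for (const std::string &a : s.agents) {
            if (a == "*") {
                if (!fallback) fallback = &s;        // remember first "*"
            } else if (!a.empty() && name.find(a) != std::string::npos) {
                return s.disallow;                   // first named match wins
            }
        }
    }
    return fallback ? fallback->disallow : std::vector<std::string>();
}
```

Fed the example above, a robot named "htdig" would get /cgi-bin/ and /cat/, while any other robot would fall back to the * section.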

-Geoff

P.S. I'm currently quite swamped, so I will probably not be 
responding to much discussion--I don't want to rush off a response 
and stick my foot in my mouth!

