Hi,
I found 15 robots.txt files out of 600 using the Allow directive. These
are
robots.txt.146
robots.txt.169
robots.txt.209
robots.txt.276
robots.txt.293
robots.txt.321
robots.txt.384
robots.txt.404
robots.txt.412
robots.txt.498
robots.txt.52
robots.txt.53
robots.txt.61
robots.txt.628
robots.txt.82
in http://www.senga.org/htdig/robots/.
And 2 using User-Agent with something other than *. The
first one is interesting since it suggests how Allow is
implemented by harvest (unless the author of the robots.txt is mistaken :-).
# robots.txt for www.carleton.ca

User-Agent: CULibraryHTDig
Allow: /~ssdata
Disallow: /

User-Agent: harvest
User-Agent: Harvest/1.5.19
Disallow: /rrdr
Disallow: /tlrc
Disallow: /lyris
Disallow: /cgi-bin
Disallow: /experts
Disallow: /gallery
Disallow: /bookstore
Disallow: /stats.html
Disallow: /duc/events
Disallow: /cu/directories
Disallow: /CCS/docs/matlab
Disallow: /ccs/docs/matlab
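The Allow line in the CULibraryHTDig record only makes sense under first-match
evaluation, as described in the 1996 robots.txt Internet-Draft: rules are
checked in file order and the first prefix match decides. A minimal sketch of
that reading (the function and rule representation are mine, not harvest's
actual code):

```python
def allowed(path, rules):
    """Return True if `path` may be fetched.

    `rules` is a list of (directive, prefix) pairs in file order.
    The first rule whose prefix matches the path wins; if no rule
    matches, access is allowed by default.
    """
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True

# Rules from the CULibraryHTDig record above:
carleton_rules = [("Allow", "/~ssdata"), ("Disallow", "/")]
```

Under this reading the robot may fetch anything under /~ssdata and nothing
else, which is presumably what the author intended.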
# robots.txt for ?

User-Agent: *
Disallow: /requisition
Disallow: /CGI
Disallow: /cgi-bin
Disallow: /STAT

User-Agent: Linkbot
Disallow: /requisition
Disallow: /CGI
Disallow: /cgi-bin
Disallow: /STAT

User-Agent: Roverbot
Disallow: /
--
Loic Dachary
24 av Secretan
75019 Paris
Tel: 33 1 42 45 09 16
e-mail: [EMAIL PROTECTED]
URL: http://www.senga.org/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.