New submission from larsfuse:
The standard (http://www.robotstxt.org/robotstxt.html) says:
> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)
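For comparison, here is a minimal sketch (Python 2.7's robotparser, the same module as in the report below) that feeds the explicit allow-all form quoted from the standard directly to the parser, with no HTTP fetch involved:

import robotparser

# Parse the explicit allow-all form quoted from the standard above,
# bypassing any network fetch.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# Per the standard, this form should allow everything.
print("fetch /", rp.can_fetch("*", "/"))
print("fetch /admin", rp.can_fetch("*", "/admin"))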
Here I give Python an empty robots.txt file:
$ curl http://10.223.68.186/robots.txt
$
Code:
import robotparser

robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
# Fetch and parse the (empty) robots.txt served above.
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))
Result:
$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)
So, despite the empty robots.txt, robotparser treats the whole site as blocked, while the standard says an empty file means complete access is allowed.
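To narrow down whether the problem is in the fetch step (read()) or in the parsing itself, here is a sketch that hands an empty robots.txt body straight to parse(); if this prints True, the denial above comes from read() rather than from parsing an empty file:

import robotparser

# Parse an empty robots.txt body directly, skipping read() and the HTTP fetch.
rp = robotparser.RobotFileParser()
rp.parse([])   # an empty robots.txt body, as served above

# If this prints True, the "all denied" result above comes from the
# fetch/read() step, not from the parser.
print("fetch /", rp.can_fetch("*", "/"))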
--
components: Library (Lib)
messages: 331595
nosy: larsfuse
priority: normal
severity: normal
status: open
title: robotparser reads empty robots.txt file as "all denied"
type: behavior
versions: Python 2.7
___
Python tracker
<https://bugs.python.org/issue35457>
___