[issue35457] robotparser reads empty robots.txt file as "all denied"

2018-12-11 Thread larsfuse


New submission from larsfuse:

The standard (http://www.robotstxt.org/robotstxt.html) says:

> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)
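
For comparison, here is a minimal sketch (my own, not part of the report) that feeds this explicit allow-all form straight into the stdlib parser via parse(), bypassing the HTTP fetch; on my reading of the 2.7 source this form is handled correctly:

import robotparser

rp = robotparser.RobotFileParser()
# The explicit form from the standard: an empty Disallow value means
# "allow everything".
rp.parse(["User-agent: *", "Disallow:"])
print(rp.can_fetch("*", "/"))   # prints True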

Here I give Python an empty file:
$ curl http://10.223.68.186/robots.txt
$

Code:

import robotparser

robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Result:

$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)

So robotparser treats the empty robots.txt as if the whole site were blocked, the opposite of what the standard requires.
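
A possible workaround sketch (my assumption, not a fix from this thread): fetch robots.txt yourself and hand the body to parse(), so the empty-file case never goes through read(). On my reading of the 2.7 source, parse() with no rules leaves can_fetch() at its "agent not found ==> access granted" default:

import urllib2
import robotparser

robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
body = urllib2.urlopen(robotsurl).read()
# An empty body installs no rules; can_fetch() should then fall through
# to returning True instead of reporting everything as blocked.
rp.parse(body.splitlines())
print(rp.can_fetch("*", "/"))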

--
components: Library (Lib)
messages: 331595
nosy: larsfuse
priority: normal
severity: normal
status: open
title: robotparser reads empty robots.txt file as "all denied"
type: behavior
versions: Python 2.7

___
Python tracker <https://bugs.python.org/issue35457>
___

[issue35457] robotparser reads empty robots.txt file as "all denied"

2018-12-17 Thread larsfuse


larsfuse added the comment:

> (...) refers users, for file structure, to 
> http://www.robotstxt.org/orig.html. This says nothing about the effect of an 
> empty file, so I don't see this as a bug.

That is incorrect. From that URL you can find:
> The presence of an empty "/robots.txt" file has no explicit associated 
> semantics, it will be treated as if it was not present, i.e. all robots will 
> consider themselves welcome.

So this is definitely a bug.
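
The expected behaviour can be stated as code (my sketch, feeding an empty body to parse() directly rather than through read()):

import robotparser

rp = robotparser.RobotFileParser()
rp.parse([])                        # an empty robots.txt body
# Per robotstxt.org this must act as if no robots.txt exists:
print(rp.can_fetch("*", "/"))       # expected: True
print(rp.can_fetch("*", "/admin"))  # expected: True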

--
