Well, that is interesting, but why should that happen when I am using a different user agent? I tried status = rp.can_fetch('Mozilla/5.0', "http://en.wikipedia.org/wiki/Sachin_Tendulkar"), but even that returns False. Is there something wrong with the syntax, or is there a catch that I don't understand?
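For what it's worth, my understanding (if I read the robotparser source correctly) is that the first argument to can_fetch() is only matched against the rules in the robots.txt that has already been downloaded; the download itself, done by rp.read(), still goes out with urllib's default Python-urllib user agent. If Wikipedia answers that request with a 403, robotparser marks the whole site as disallowed and can_fetch() returns False no matter what agent string you pass in. A minimal sketch of a workaround, assuming that is what is happening here (the agent string is just an example value), is to fetch robots.txt yourself with urllib2 and feed the lines to the parser:

import robotparser
import urllib2

# Example agent string -- any descriptive, non-default value will do.
AGENT = "MyWikiReader/0.1 (tutor example)"
PAGE = "http://en.wikipedia.org/wiki/Sachin_Tendulkar"

# Download robots.txt ourselves with a custom User-Agent instead of
# calling rp.read(), which uses urllib's default Python-urllib agent.
req = urllib2.Request("http://en.wikipedia.org/robots.txt",
                      headers={"User-Agent": AGENT})
lines = urllib2.urlopen(req).read().splitlines()

rp = robotparser.RobotFileParser()
rp.set_url("http://en.wikipedia.org/robots.txt")
rp.parse(lines)

# Now can_fetch() is really matching the agent string against the
# rules in robots.txt, rather than reporting a failed download.
print rp.can_fetch("*", PAGE)
print rp.can_fetch(AGENT, PAGE)

The same urllib2.Request(..., headers={"User-Agent": ...}) trick should also work for fetching the article itself with a non-default user agent.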
On Thu, Jan 22, 2009 at 10:45 PM, Andre Engels <andreeng...@gmail.com> wrote:

> On Thu, Jan 22, 2009 at 6:08 PM, amit sethi <amit.pureene...@gmail.com> wrote:
> > Hi, I need help with how I can fetch a Wikipedia article. I tried changing
> > my user agent, but it did not work. As far as my knowledge of robots.txt
> > goes, looking at en.wikipedia.org/robots.txt, it does not seem it should
> > block a user agent (*, which is what I would normally use) from accessing
> > a simple article like, say,
> > "http://en.wikipedia.org/wiki/Sachin_Tendulkar", but robotparser still
> > returns False:
> >
> >     status = rp.can_fetch("*", "http://en.wikipedia.org/wiki/Sachin_Tendulkar")
> >
> > where rp is a robot parser object. Why is that?
>
> Yes, Wikipedia is blocking the Python default user agent. This was
> done to block the main internal bot in its early days (it was
> misbehaving by getting each page twice); when it got to allowing the
> bot again, it had already changed to having its own user agent string,
> and apparently it was not deemed necessary to unblock the default
> user agent string...
>
> --
> André Engels, andreeng...@gmail.com

--
A-M-I-T S|S