On Fri, Jan 23, 2009 at 11:25 AM, Andre Engels wrote:
> On Fri, Jan 23, 2009 at 10:37 AM, amit sethi
> wrote:
>> so is there a way around that problem ??
>
> Ok, I have done some checking around, and it seems that the Wikipedia
> server is giving a return code of 403 (forbidden), but still giving
> the page - which I think is weird behaviour. [...]
"Kent Johnson" wrote
Rather than editing the existing code and making it non standard
why not subclass robotparser:
That won't work, it is urllib.URLOpener() that he is patching and
Sorry, yes I misread that post as modifying robotparser, it
should have been URLOpener.
But...
robotparse
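A sketch of what that corrected suggestion could look like (subclassing the opener rather than robotparser), assuming Python 2's urllib; the class name and User-Agent string are illustrative, not from the thread:

import urllib

class WPOpener(urllib.FancyURLopener):
    # URLopener.__init__ builds its User-Agent header from the class
    # attribute 'version', so overriding it here replaces the default
    # "Python-urllib/x.y" value that Wikipedia rejects.
    version = 'WPFetcher/0.1'

opener = WPOpener()
page = opener.open('http://en.wikipedia.org/wiki/Sachin_Tendulkar')
print page.read()[:200]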
On Fri, Jan 23, 2009 at 5:37 AM, Andre Engels wrote:
> Looking further I found that a 'cleaner' way to make the same change
> is to add to the code of URLopener (outside any method):
>
> version = ''
You can do this without modifying the standard library source, by
setting the attribute from your own code:

import urllib
urllib.URLopener.version = ''
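For context, a sketch of that monkey-patch used together with robotparser (Python 2 assumed; the thread itself sets an empty string, the named value here is only illustrative):

import urllib
import robotparser

# robotparser's internal URLopener inherits from urllib's opener classes,
# so changing the class attribute changes the User-Agent it sends when
# fetching robots.txt.
urllib.URLopener.version = 'WPFetcher/0.1'

rp = robotparser.RobotFileParser()
rp.set_url('http://en.wikipedia.org/robots.txt')
rp.read()
print rp.can_fetch('WPFetcher/0.1', 'http://en.wikipedia.org/wiki/Sachin_Tendulkar')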
On Fri, Jan 23, 2009 at 6:23 AM, Alan Gauld wrote:
> Rather than editing the existing code and making it non standard
> why not subclass robotparser:
>
> class WP_RobotParser(robotparser.RobotFileParser):
>     def __init__(self, *args, **kwargs):
>         robotparser.RobotFileParser.__init__(self, *args, **kwargs)
>         self. [...]
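If the subclassing idea were aimed at robotparser itself, one workable shape (a sketch only; the names and User-Agent value are made up, not from the thread) is to override read() so robots.txt is fetched with an explicit User-Agent and then handed to the stock parser:

import urllib2
import robotparser

class WPRobotParser(robotparser.RobotFileParser):
    def read(self):
        # Fetch robots.txt ourselves with a chosen User-Agent instead of
        # relying on urllib's default, which Wikipedia answers with a 403.
        req = urllib2.Request(self.url,
                              headers={'User-Agent': 'WPFetcher/0.1'})
        try:
            data = urllib2.urlopen(req).read()
        except urllib2.HTTPError, e:
            if e.code in (401, 403):
                self.disallow_all = True
            else:
                self.allow_all = True
            return
        self.parse(data.splitlines())

rp = WPRobotParser()
rp.set_url('http://en.wikipedia.org/robots.txt')
rp.read()
print rp.can_fetch('WPFetcher/0.1', 'http://en.wikipedia.org/wiki/Sachin_Tendulkar')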
"Andre Engels" wrote
developers of Wikimedia why this is done, but for now you can
resolve
this by editing robotparser.py in the following way:
In the __init__ of the class URLopener, add the following at the
end:
self.addheaders = [header for header in self.addheaders if header[0]
!= "Us
On Fri, Jan 23, 2009 at 12:07 PM, amit sethi wrote:
> well thanks ... it worked well ... but robotparser is in urllib; isn't
> there a module like robotparser in urllib2?
You'll have to ask someone else about that part...
--
André Engels, andreeng...@gmail.com
well thanks ... it worked well ... but robotparser is in urllib; isn't
there a module like robotparser in urllib2?
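As far as I know there is no robots.txt helper inside urllib2; in Python 2 robotparser is its own top-level module, and it can be combined with urllib2 by fetching robots.txt yourself and handing the lines to parse(). A sketch (illustrative User-Agent value):

import urllib2
import robotparser

req = urllib2.Request('http://en.wikipedia.org/robots.txt',
                      headers={'User-Agent': 'WPFetcher/0.1'})
robots_lines = urllib2.urlopen(req).read().splitlines()

rp = robotparser.RobotFileParser()
rp.parse(robots_lines)   # bypasses rp.read() and urllib's default User-Agent
print rp.can_fetch('WPFetcher/0.1', 'http://en.wikipedia.org/wiki/Sachin_Tendulkar')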
On Fri, Jan 23, 2009 at 3:55 PM, Andre Engels wrote:
> On Fri, Jan 23, 2009 at 10:37 AM, amit sethi
> wrote:
> > so is there a way around that problem ??
>
> Ok, I have done some checking around [...]
On Fri, Jan 23, 2009 at 11:25 AM, Andre Engels wrote:
> In the __init__ of the class URLopener, add the following at the end:
>
> self.addheaders = [header for header in self.addheaders if header[0]
> != "User-Agent"] + [('User-Agent', '')]
>
> (probably
>
> self.addheaders = [('User-Agent', '')]
>
> [...]
On Fri, Jan 23, 2009 at 10:37 AM, amit sethi wrote:
> so is there a way around that problem ??
Ok, I have done some checking around, and it seems that the Wikipedia
server is giving a return code of 403 (forbidden), but still giving
the page - which I think is weird behaviour. I will check with the
developers of Wikimedia why this is done, but for now you can resolve
this by editing robotparser.py in the following way: [...]
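That behaviour (a 403 status whose response still carries the page) can be seen directly with urllib2, assuming it is the default Python-urllib User-Agent that triggers the 403; a quick sketch:

import urllib2

try:
    urllib2.urlopen('http://en.wikipedia.org/wiki/Sachin_Tendulkar')
except urllib2.HTTPError, e:
    # HTTPError objects are file-like, so the body the server sent
    # along with the error status can still be read.
    print e.code        # expected to be 403 per the observation above
    print len(e.read()) # non-zero if a page body came back with it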
so is there a way around that problem ??
On Fri, Jan 23, 2009 at 2:25 PM, Andre Engels wrote:
> On Fri, Jan 23, 2009 at 9:09 AM, amit sethi
> wrote:
> > Well that is interesting but why should that happen in case I am using a
> > different User Agent because I tried doing
> > status=rp.can_fetch('Mozilla/5.0',
> > "http://en.wikipedia.org/wiki/Sachin_Tendulkar")
> > but even that returns false [...]
On Fri, Jan 23, 2009 at 9:09 AM, amit sethi wrote:
> Well that is interesting but why should that happen in case I am using a
> different User Agent because I tried doing
> status=rp.can_fetch('Mozilla/5.0',
> "http://en.wikipedia.org/wiki/Sachin_Tendulkar";)
> but even that returns false
> Is th
Well that is interesting but why should that happen in case I am using a
different User Agent because I tried doing
status = rp.can_fetch('Mozilla/5.0', "http://en.wikipedia.org/wiki/Sachin_Tendulkar")
but even that returns false.
Is there something wrong with the syntax? Is there a catch that I [...]
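The likely reason the user agent passed to can_fetch() makes no difference: rp.read() fetches robots.txt with urllib's default User-Agent, gets the 403 described later in the thread, and robotparser then marks everything as disallowed. A quick way to inspect that state (Python 2 robotparser):

import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('http://en.wikipedia.org/robots.txt')
rp.read()
# When the robots.txt request is answered with 401/403, robotparser
# sets disallow_all, and can_fetch() returns False for every user agent.
print rp.disallow_all
print rp.can_fetch('Mozilla/5.0', 'http://en.wikipedia.org/wiki/Sachin_Tendulkar')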
On Thu, Jan 22, 2009 at 6:08 PM, amit sethi wrote:
> hi, I need help as to how I can fetch a Wikipedia article. I tried changing
> my user agent but it did not work. Although as far as my knowledge of
> robots.txt goes, looking at en.wikipedia.org/robots.txt it does not seem it
> should block a [...]
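A minimal sketch of the kind of fetch being asked about, using urllib2 with an explicit User-Agent header (the header value is illustrative; the rest of the thread shows the remaining obstacle is robotparser's own request, not this fetch):

import urllib2

req = urllib2.Request('http://en.wikipedia.org/wiki/Sachin_Tendulkar',
                      headers={'User-Agent': 'WPFetcher/0.1'})
html = urllib2.urlopen(req).read()
print html[:200]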