On Wed, May 18, 2011 at 2:06 AM, Albert-Jan Roskam <fo...@yahoo.com> wrote:
> Hello,
>
> How can I walk (as in os.walk) or glob a website? I want to download all
> the pdfs from a website (using urllib.urlretrieve), extract certain figures
> (using pypdf - is this flexible enough?) and make some statistics/graphs from
> those figures (using rpy and R). I forgot what the process of 'automatically
> downloading' is called again, something that sounds like 'whacking' (??)

I think the word you're looking for is "scraping". I actually did something (roughly) similar a few years ago, to download a collection of free Russian audiobooks for my father-in-law (an avid reader who was quickly going blind). I crawled the site looking for .mp3 files, then returned a tree from which I could select files to be downloaded. It's horribly crude, in retrospect, and I'm embarrassed re-reading my code - but if you're interested I can forward it (if only as an example of what _not_ to do).
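For what it's worth, the core of that kind of scraper is just "fetch a page, pull out the links you care about". A minimal sketch of the link-extraction step using only the standard library might look like this (the URL and HTML here are made up for illustration; in practice you'd fetch the page with urllib and then pass each collected link to urlretrieve):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PDFLinkParser(HTMLParser):
    """Collect absolute URLs of .pdf links found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        # Look at every <a href="..."> and keep the ones ending in .pdf,
        # resolving relative paths against the page's own URL.
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".pdf"):
                self.pdf_links.append(urljoin(self.base_url, href))


# Hypothetical page content -- normally you'd get this from
# urllib.request.urlopen(base_url).read().decode()
html = ('<a href="report.pdf">Report</a> '
        '<a href="/docs/fig2.pdf">Figure 2</a> '
        '<a href="index.html">Home</a>')

parser = PDFLinkParser("http://example.com/page/")
parser.feed(html)
print(parser.pdf_links)
# -> ['http://example.com/page/report.pdf', 'http://example.com/docs/fig2.pdf']
```

From there, downloading is one call per link, e.g. urllib.request.urlretrieve(url, filename). A real crawler would also need to follow links to sub-pages and keep a "seen" set to avoid loops, but the pattern above is the heart of it.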
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor