Rem, what OS are you trying this on? Windows XP SP2 has a limit of
around 40 TCP connections per second...
Remarkable wrote:
> Hello all
>
> I am trying to write a reliable web-crawler. I tried to write my own
> using recursion and found I quickly hit the "too many sockets" open
> problem. So I lo
Alex Martelli wrote:
> S Borg <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I have been writing very simple Python programs that parse HTML and
>> such, mainly just to get a better feel for the language. Here is my
>> question: If I parsed an HTML page into all of the image files listed
>> on tha
Use BeautifulSoup to get all the image tags out of the HTML.
You'll need to join the urls of the images to the url of the page
(urlparse.urljoin off the top of my head). If you look at BeautifulSoup
you will see how to get the 'src' reference of each image tag.
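A minimal sketch of those two steps (collect each image tag's 'src', then
join it to the page URL). To keep it runnable without third-party packages,
it uses the standard library's HTMLParser as a stand-in for BeautifulSoup,
and urllib.parse.urljoin, which is where urlparse.urljoin lives in Python 3.
The names ImageSrcCollector and extract_image_urls and the example.com URLs
are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect the 'src' of every <img> tag, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative references become absolute; absolute ones pass through.
            self.image_urls.append(urljoin(self.page_url, src))


def extract_image_urls(html, page_url):
    """Return absolute URLs for all images referenced in the given HTML."""
    collector = ImageSrcCollector(page_url)
    collector.feed(html)
    return collector.image_urls


if __name__ == "__main__":
    # Hypothetical page content and URL, for illustration only.
    sample = '<html><body><img src="logo.png"><img src="/static/a.jpg"/></body></html>'
    for url in extract_image_urls(sample, "http://example.com/dir/page.html"):
        print(url)
```

Each resulting URL could then be fetched in turn, for example with
urllib.request.urlopen.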
All the best,
Fuzzyman
http://www.
S Borg wrote:
> Hello,
>
> I have been writing very simple Python programs that parse HTML and
> such, mainly just to get a better feel for the language. Here is my
> question: If I parsed an HTML page into all of the image files listed
> on that page, how could I request all of those images an