Re: [Tutor] Strategy to read a redirecting html page

ian douglas Tue, 31 May 2011 16:41:57 -0700


On 05/31/2011 04:34 PM, Hugo Arts wrote:

On Wed, Jun 1, 2011 at 1:00 AM, Karim <karim.liat...@free.fr> wrote:

Hello,

I am having an issue reading an HTML page that redirects to a new page.
I get the first warning/error message page and not the page it redirects to.
Should I request the same URL a second time, or should I loop forever
until the page content is the correct one (checked by parsing it)?
Do you have a better strategy, or are there modules that deal with this issue?
I am using Python 2.7.1 on Ubuntu Linux 11.04 and the modules urllib2,
urllib, etc.
The web page is secured, but I registered a password manager.

Unfortunately, urllib2 works at the HTTP level, so it can't catch
redirects that happen at the HTML level. You'll have to parse the page,
look for a <meta http-equiv="refresh"> tag, and fetch the URL from it.
That's a pretty simple parsing job, probably doable with regexes. But
you're free to use a proper HTML parser, of course.
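
A minimal sketch of the parsing step described above, assuming the
redirect really is a <meta http-equiv="refresh"> tag whose content
attribute has the form "0; url=...". The names MetaRefreshFinder and
find_meta_refresh are made up for this illustration, and the import
fallback is only there so the same code runs on Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser      # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser       # Python 2.7
    from urlparse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.target = None

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content usually looks like "5; url=http://example.com/next"
        content = attrs.get("content") or ""
        _delay, _, rest = content.partition(";")
        key, _, value = rest.partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute redirect URL found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    # Resolve relative targets against the page that contained the tag.
    return urljoin(base_url, finder.target)
```

On Python 2.7 the page text could come from urllib2.urlopen(url).read().
A return value of None means the page has no meta refresh and is
presumably the one you wanted.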

Also, given that the 301/302 redirect you get in that response could
ALSO redirect, I'd suggest looping until a counter is exhausted, so you
don't end up in an infinite loop if pages redirect to each other.
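
Something along these lines might do for the counter idea. It is only a
sketch: fetch_final_page, TooManyRedirects and the max_hops default are
names and values invented here, the regex is deliberately simple and
will miss unusual markup, and fetch stands for whatever callable you use
to download a page as text.

```python
import re

try:
    from urllib.parse import urljoin        # Python 3
except ImportError:
    from urlparse import urljoin            # Python 2.7

# Simple pattern for <meta http-equiv="refresh" content="N; url=...">.
# It assumes http-equiv comes before content inside the tag.
META_REFRESH = re.compile(
    r'''<meta[^>]*http-equiv=["']?refresh["']?[^>]*'''
    r'''content=["']?[^"'>]*?url\s*=\s*["']?([^"'>\s]+)''',
    re.IGNORECASE)


class TooManyRedirects(Exception):
    """Raised when the hop counter runs out."""


def fetch_final_page(url, fetch, max_hops=5):
    """Follow meta-refresh redirects, giving up after max_hops hops.

    fetch is any callable that takes a URL and returns the page text.
    Returns a (final_url, html) tuple.
    """
    for _ in range(max_hops + 1):
        html = fetch(url)
        match = META_REFRESH.search(html)
        if match is None:
            # No further redirect: this is the page we were after.
            return url, html
        url = urljoin(url, match.group(1))
    raise TooManyRedirects("gave up after %d redirects" % max_hops)
```

With urllib2 on Python 2.7, fetch could be as small as
lambda u: urllib2.urlopen(u).read(). On Python 3, read() returns bytes,
so the result would need decoding before the regex is applied.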


-id

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
