On 05/31/2011 04:34 PM, Hugo Arts wrote:
On Wed, Jun 1, 2011 at 1:00 AM, Karim <karim.liat...@free.fr> wrote:
Hello,
I am having an issue reading an HTML page that redirects to a new page.
I get the first warning/error message page and not the page it redirects to.
Should I request the same URL a second time, or should I loop until the
page content is the correct one (checked by parsing it)?
Do you have a better strategy, or are there modules that deal with this issue?
I am using Python 2.7.1 on Ubuntu Linux 11.04 and the modules urllib2,
urllib, etc.
The web page is secured, but I registered a password manager.
urllib2 works at the HTTP level, so it can't catch redirects that
happen at the HTML level unfortunately. You'll have to parse the page,
look for a <meta http-equiv="refresh"> tag, and fetch the URL from it.
That's a pretty simple parsing job, probably doable with regexes. But
you're free to use a proper html parser of course.
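
For illustration, here is a minimal sketch of that parsing step using the
standard library's HTML parser instead of a regex. The names
MetaRefreshFinder and find_meta_refresh are made up for this example, and it
assumes the tag's content attribute has the usual "seconds; url=target"
shape. The try/except import is only there so the same code runs on both
Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser      # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser        # Python 2.7
    from urlparse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # Assumed shape of content: "5; url=http://example.com/next"
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute refresh target found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    # The target may be relative, so resolve it against the page's own URL.
    return urljoin(base_url, finder.target)
```

You would call find_meta_refresh(page_text, page_url) on what urllib2
returned, and fetch the result if it is not None.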
Also, given that the page you are redirected to could ALSO redirect
(whether through a 301/302 response or another meta refresh), I'd suggest
looping until a counter is exhausted, so you don't end up in an infinite
loop if pages redirect to each other.
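
A rough sketch of such a bounded loop is below. The function name
follow_redirects and its parameters are invented for this example: fetch is
assumed to be any callable that returns the page text for a URL (for
instance a small wrapper around urllib2.urlopen(url).read()), and find_next
is assumed to be whatever routine you use to pull the next URL out of the
page, returning None when there is none.

```python
def follow_redirects(url, fetch, find_next, max_hops=10):
    """Follow page-level redirects from url, giving up after max_hops fetches.

    fetch(url) returns the page text; find_next(html, url) returns the
    next URL to visit, or None if the page does not redirect.
    Returns a (final_url, html) pair.
    """
    for _ in range(max_hops):
        html = fetch(url)
        next_url = find_next(html, url)
        if next_url is None:
            # No further redirect: this is the page we wanted.
            return url, html
        url = next_url
    # The counter ran out, e.g. because two pages redirect to each other.
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

The limit of 10 is an arbitrary choice for the example. Raising an
exception when the counter runs out makes a redirect loop visible instead
of silently returning the wrong page.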
-id