On Tue, Jul 7, 2009 at 1:20 PM, David Kim<davidki...@gmail.com> wrote:
> On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson<ken...@tds.net> wrote:
>>
>> curl works because it ignores the redirect to the ToS page, and the
>> site is (astoundingly) dumb enough to serve the content with the
>> redirect. You could make urllib2 behave the same way by defining a 302
>> handler that does nothing.
>
> Many thanks for the redirect pointer! I also found
> http://diveintopython.org/http_web_services/redirects.html. Is the
> handler class on this page what you mean by a handler that does
> nothing? (It looks like it exposes the error code but still follows
> the redirect).

No, all of those examples are handling the redirect. The
SmartRedirectHandler just captures additional status. I think you need
something like this:

import urllib2

class IgnoreRedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means this handler does not follow the redirect.
    def http_error_301(self, req, fp, code, msg, headers):
        return None

    def http_error_302(self, req, fp, code, msg, headers):
        return None

> I guess i'm still a little confused since, if the
> handler does nothing, won't I still go to the ToS page?

No, it is the action of the handler, responding to the redirect
request, that causes the ToS page to be fetched.

> For example, I ran the following code (found at
> http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)

That is pretty similar to the DiP code...

> I suspect I am not understanding something basic about how urllib2
> deals with this redirect issue since it seems everything I try gives
> me the same ToS page.

Maybe you don't understand how redirect works in general...

>> Generally you have to post to the same url as the form, giving the
>> same data the form does. You can inspect the source of the form to
>> figure this out. In this case the form is
>>
>> <form method="post" action="/products/consent.php">
>> <input type="hidden" value="tiwd/products/derivserv/data_table_i.php"
>> name="urltarget"/>
>> <input type="hidden" value="1" name="check_one"/>
>> <input type="hidden" value="tiwdata" name="tag"/>
>> <input type="submit" value="I Agree" name="acknowledgement"/>
>> <input type="submit" value="Decline" name="acknowledgement"/>
>> </form>
>>
>> You generally need to enable cookie support in urllib2 as well,
>> because the site will use a cookie to flag that you saw the consent
>> form. This tutorial shows how to enable cookies and submit form data:
>> http://personalpages.tds.net/~kent37/kk/00010.html
>
> I have seen the login examples where one provides values for the
> fields username and password (thanks Kent).
> Given the form above, however, it's unclear to me how one POSTs the
> form data when you aren't actually passing any parameters. Perhaps
> this is less of a Python question and more an http question (which
> unfortunately I know nothing about either).

Yes, the parameters are listed in the form. If you don't have at least
a basic understanding of HTTP and HTML you are going to have trouble
with this project...

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
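
A rough sketch of the form post discussed in this thread, for readers on
Python 3, where urllib2's classes live in urllib.request and cookielib is
http.cookiejar. The idea is that the hidden inputs plus the "I Agree" submit
button of the quoted form are the POST parameters. BASE_URL is a placeholder
(the thread never names the host), and the names CONSENT_FIELDS,
consent_form_data, make_cookie_opener and fetch_after_consent are made up for
illustration. This has not been run against the real site.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Placeholder host (assumption): the thread does not say which site serves the form.
BASE_URL = "http://example.com"

# Name/value pairs copied from the <input> elements of the quoted consent form.
# Only the submit button that is "clicked" is sent, here "I Agree".
CONSENT_FIELDS = [
    ("urltarget", "tiwd/products/derivserv/data_table_i.php"),
    ("check_one", "1"),
    ("tag", "tiwdata"),
    ("acknowledgement", "I Agree"),
]


def consent_form_data():
    """Encode the form fields as an application/x-www-form-urlencoded body."""
    return urllib.parse.urlencode(CONSENT_FIELDS).encode("ascii")


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_after_consent(opener, base_url=BASE_URL):
    """POST the consent form; passing a data argument makes urllib send a POST."""
    url = urllib.parse.urljoin(base_url, "/products/consent.php")
    with opener.open(url, consent_form_data()) as response:
        return response.read()
```

Calling fetch_after_consent(opener) with an opener from make_cookie_opener()
would send the POST and keep any cookie the site sets. Whether the response is
the data page itself or a second request to the urltarget path is needed
depends on the site.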