Thanks Kent. Perhaps I'll cool the Python jets and move on to HTTP and HTML. I was hoping it would be something I could just pick up along the way; looks like I was wrong.
dk

On Tue, Jul 7, 2009 at 1:56 PM, Kent Johnson <ken...@tds.net> wrote:
> On Tue, Jul 7, 2009 at 1:20 PM, David Kim <davidki...@gmail.com> wrote:
>> On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson <ken...@tds.net> wrote:
>>>
>>> curl works because it ignores the redirect to the ToS page, and the
>>> site is (astoundingly) dumb enough to serve the content with the
>>> redirect. You could make urllib2 behave the same way by defining a 302
>>> handler that does nothing.
>>
>> Many thanks for the redirect pointer! I also found
>> http://diveintopython.org/http_web_services/redirects.html. Is the
>> handler class on this page what you mean by a handler that does
>> nothing? (It looks like it exposes the error code but still follows
>> the redirect.)
>
> No, all of those examples are handling the redirect. The
> SmartRedirectHandler just captures additional status. I think you need
> something like this:
>
> import urllib2
>
> class IgnoreRedirectHandler(urllib2.HTTPRedirectHandler):
>     # Returning None leaves the 301/302 unhandled, so the redirect
>     # target (the ToS page) is never requested. urllib2 then raises
>     # HTTPError, whose read() still returns the redirect response body.
>     def http_error_301(self, req, fp, code, msg, headers):
>         return None
>
>     def http_error_302(self, req, fp, code, msg, headers):
>         return None
>
>> I guess I'm still a little confused since, if the
>> handler does nothing, won't I still go to the ToS page?
>
> No, it is the action of the handler, responding to the redirect
> request, that causes the ToS page to be fetched.
>
>> For example, I ran the following code (found at
>> http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)
>
> That is pretty similar to the DiP code...
>
>> I suspect I am not understanding something basic about how urllib2
>> deals with this redirect issue since it seems everything I try gives
>> me the same ToS page.
>
> Maybe you don't understand how redirects work in general...
>
>>> Generally you have to post to the same url as the form, giving the
>>> same data the form does. You can inspect the source of the form to
>>> figure this out.
>>> In this case the form is
>>>
>>> <form method="post" action="/products/consent.php">
>>> <input type="hidden" value="tiwd/products/derivserv/data_table_i.php"
>>> name="urltarget"/>
>>> <input type="hidden" value="1" name="check_one"/>
>>> <input type="hidden" value="tiwdata" name="tag"/>
>>> <input type="submit" value="I Agree" name="acknowledgement"/>
>>> <input type="submit" value="Decline" name="acknowledgement"/>
>>> </form>
>>>
>>> You generally need to enable cookie support in urllib2 as well,
>>> because the site will use a cookie to flag that you saw the consent
>>> form. This tutorial shows how to enable cookies and submit form data:
>>> http://personalpages.tds.net/~kent37/kk/00010.html
>>
>> I have seen the login examples where one provides values for the
>> fields username and password (thanks Kent). Given the form above,
>> however, it's unclear to me how one POSTs the form data when you
>> aren't actually passing any parameters. Perhaps this is less of a
>> Python question and more an http question (which unfortunately I know
>> nothing about either).
>
> Yes, the parameters are listed in the form.
>
> If you don't have at least a basic understanding of HTTP and HTML you
> are going to have trouble with this project...
>
> Kent
>

--
morenotestoself.wordpress.com
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
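Kent's "302 handler that does nothing" idea, sketched for Python 3, where urllib2 was split into `urllib.request` and `urllib.error`. This is a minimal sketch, not the thread's code: instead of overriding `http_error_302` it overrides `redirect_request` to return None, which leaves the redirect unhandled so urllib raises `HTTPError` carrying the original response. `FakeSite`, its `/data` and `/tos` paths, and the helper names are hypothetical stand-ins that mimic a site serving real content alongside a redirect to a ToS page, so the demo runs without network access.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Python 3 take on the thread's IgnoreRedirectHandler."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # No follow-up request is built, so the 3xx is left unhandled and
        # urllib's default error handler raises HTTPError for it instead.
        return None


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site discussed above: /data answers
    with a 302 to /tos but still sends the real content in the body."""

    def do_GET(self):
        if self.path == "/data":
            body = b"the real data"
            self.send_response(302)
            self.send_header("Location", "/tos")
        else:
            body = b"ToS page"
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def serve_in_background():
    """Start FakeSite on a free localhost port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_ignoring_redirect(url):
    """Return (status, body) without following any redirect."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The unfollowed 302 lands here; its body is still readable.
        return err.code, err.read()


if __name__ == "__main__":
    server = serve_in_background()
    url = "http://127.0.0.1:%d/data" % server.server_address[1]
    # Default opener: follows the 302 and ends up on the ToS page.
    with urllib.request.urlopen(url) as resp:
        print(resp.read())  # b'ToS page'
    # No-redirect opener: the body served alongside the 302 comes back.
    print(fetch_ignoring_redirect(url))  # (302, b'the real data')
    server.shutdown()
```

This also answers David's "won't I still go to the ToS page?": the ToS page is only fetched because the default redirect handler issues a second request to the `Location` URL. Because `NoRedirectHandler` subclasses `HTTPRedirectHandler`, `build_opener` uses it in place of the default one, so that second request is never made.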
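On "how one POSTs the form data when you aren't actually passing any parameters": every named `<input>` in the consent form quoted above is a parameter. When a browser submits the form it sends each hidden input's name/value pair, plus the name/value of the one submit button that was clicked (so `acknowledgement=I Agree`, never both buttons). The sketch below, again in Python 3 (`http.cookiejar` is the successor of Python 2's `cookielib`), builds that same POST and a cookie-keeping opener. The base URL is a hypothetical placeholder because the thread never names the host; only the path comes from the form's `action`. `CONSENT_FIELDS`, `build_cookie_opener`, and `consent_request` are illustrative names, and nothing here is sent over the network.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Field names and values copied from the consent form quoted above.
# Only the clicked submit button is included, so acknowledgement is
# "I Agree" and the "Decline" button is left out.
CONSENT_FIELDS = {
    "urltarget": "tiwd/products/derivserv/data_table_i.php",
    "check_one": "1",
    "tag": "tiwdata",
    "acknowledgement": "I Agree",
}


def build_cookie_opener():
    """Opener that stores cookies from responses and sends them back on
    later requests, so a 'consent seen' cookie survives between calls."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def consent_request(base_url):
    """Build the POST a browser would send when "I Agree" is clicked.

    base_url is a placeholder for the site's scheme and host, which the
    thread does not name; the form's action attribute supplies the path.
    """
    url = urllib.parse.urljoin(base_url, "/products/consent.php")
    body = urllib.parse.urlencode(CONSENT_FIELDS).encode("ascii")
    # Supplying data makes urllib send a POST, and it adds
    # Content-Type: application/x-www-form-urlencoded when opened.
    return urllib.request.Request(url, data=body)


if __name__ == "__main__":
    opener, jar = build_cookie_opener()
    req = consent_request("http://example.com")  # hypothetical host
    print(req.get_method(), req.full_url)  # POST http://example.com/products/consent.php
    print(req.data)
    # Against the real site you would call opener.open(req) and then fetch
    # the data page with the same opener; a cookie set by the consent
    # response is kept in jar and sent back on later requests to that site.
```

Reusing one opener for both the consent POST and the follow-up request is the point of the design: a fresh `urlopen` call would start with an empty cookie jar and the site would send you back to the ToS page.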