[Tutor] Retrieving Webpage Source, a Problem with 'onclick'
Hi,

I am trying to loop over all of the links on a given webpage and retrieve the source of each child page in turn. My problem is that the links are in the following form:

[begin html]
link1 link2 link3 link4
[end html]

So clicking a link appears to call the JavaScript function gS, which creates the page dynamically. I can't figure out how to get urllib/urllib2 to work here, as the URL of each of these links is http://www.thehomepage.com/#. I have also tried to get mechanize to click each link, but it doesn't send the onclick request either and just goes to http://www.thehomepage.com/#.

This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html) strongly suggests that the easiest way to do this is to use IE and COM automation (which is fine, as I am working on a Windows PC), so I have tried importing win32com.client and actually getting IE to click the link:

[begin code]
from time import sleep
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate('http://www.thehomepage.com')

# It takes a little while for the page to load, so wait until IE is idle
while ie.Busy:
    sleep(2)

# Print page title
print(ie.LocationName)

test = ie.Document.links
ie.Navigate(ie.Document.links(30))
[end code]

This should just click the 30th link on the page. As with the other methods, it takes me to http://www.thehomepage.com/# and doesn't call the JavaScript.

If somebody who has more experience in these matters could suggest a course of action, I would be grateful. I'm more than happy to use any method (urllib, mechanize, IE & COM as tried so far), just so long as it works :)

Thanks in advance,
Craig.

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
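One way to see why every link resolves to the same '#' URL is to look at the onclick attribute instead of the href. The sketch below uses only the standard library (written for a current Python 3 rather than the 2.4 of the thread) and collects the arguments passed to gS on each anchor. The sample markup and the gS arguments are assumptions, since the original HTML did not survive in the archive; the names OnclickCollector and extract_gs_calls are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real HTML was stripped from the archived post, so
# the href="#" anchors and the gS(...) arguments here are assumptions.
SAMPLE_HTML = """
<a href="#" onclick="gS(1, 'alpha'); return false;">link1</a>
<a href="#" onclick="gS(2, 'beta'); return false;">link2</a>
<a href="/plain.html">ordinary link</a>
"""

# Matches a call to gS and captures everything between the parentheses.
GS_CALL = re.compile(r"gS\((.*?)\)")


class OnclickCollector(HTMLParser):
    """Collect the argument string of every gS(...) call found on an <a> tag."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Attribute values may be None (e.g. a bare attribute), hence "or".
        onclick = dict(attrs).get("onclick") or ""
        match = GS_CALL.search(onclick)
        if match:
            self.calls.append(match.group(1))


def extract_gs_calls(html):
    """Return the raw argument strings of all gS(...) onclick handlers."""
    parser = OnclickCollector()
    parser.feed(html)
    return parser.calls


if __name__ == "__main__":
    for args in extract_gs_calls(SAMPLE_HTML):
        print(args)
```

What to do with the extracted arguments depends on what gS does on the real page. If it builds a URL or submits a form, the same request could probably be reproduced with urllib2 or mechanize, but that can only be confirmed by reading the page's own JavaScript.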
Re: [Tutor] Web browser
Hi,

As was recommended to me in a previous thread, if you're on a Windows machine with IE installed then PAMIE (http://www.pamie.sourceforge.net) can simplify using the IE COM interface and does nearly everything you need.

PAMIE doesn't support frames correctly yet, but it is very easy to hack in support as and where it is needed.

--Craig

On Sat, 11 Jun 2005, Ismael Garrido wrote:

> Hello.
>
> I've been looking around for a web browser either written in Python, or
> with Python bindings.
> What I need to do is load a web page, enter a password-protected site
> and follow certain links; it needs to support frames and follow the refresh
> meta. I'm running WinXP, Python 2.4.
>
> At first I thought about using a Python web browser, but none I could
> find had frames support. The most promising ones were quite old too. And
> one that was completely written in Python used an old version and just
> kept crashing in Python 2.4.
> Then I thought about using some browser with bindings. I've looked all
> over and found next to nothing. Mozilla and its XPCOM just seem quite
> hard, and I'm not sure if that does the job I want. I tried finding COM
> bindings in other browsers, but I couldn't understand them or make them
> work...
>
> Any suggestion would be greatly appreciated.
> Ismael
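The quoted question asks for something that can "follow the refresh meta". Independently of which browser or COM wrapper drives the session, the redirect target can be read out of the page source. This is a minimal sketch using only the standard library (written for a current Python 3 rather than the 2.4 of the thread); the names MetaRefreshFinder and find_meta_refresh are made up for illustration, and it assumes the common "seconds; URL=target" form of the content attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the first <meta http-equiv="refresh"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.target = None  # becomes (delay_seconds, raw_url) once found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # Assumed form: "5; URL=next.html" (the URL part is optional).
        content = attrs.get("content") or ""
        delay, _, rest = content.partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:].strip().strip("'\"")
        try:
            seconds = float(delay.strip() or 0)
        except ValueError:
            return  # malformed delay; ignore this tag
        self.target = (seconds, rest)


def find_meta_refresh(html, base_url):
    """Return (delay_seconds, absolute_url), or None if there is no refresh."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    seconds, url = finder.target
    # An empty URL means "reload this page", which urljoin maps to base_url.
    return seconds, urljoin(base_url, url)
```

A caller would fetch the page, pass its source and URL to find_meta_refresh, wait for the returned delay, and then request the returned URL. Frames and password handling are separate problems that this sketch does not address.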