You may be interested in Pamie: http://pamie.sourceforge.net/
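
Whichever tool ends up driving the browser, it may help to first list what each link would call. Below is a minimal sketch using only the standard library. It assumes the markup looks like the sample links in your message. The names OnclickCollector and extract_gs_calls are made up for illustration, and what gS does with the two numbers cannot be told from the HTML alone.

```python
import re
from html.parser import HTMLParser

# Matches onclick values such as "gS(1020,19);return false;"
GS_CALL = re.compile(r"gS\(\s*(\d+)\s*,\s*(\d+)\s*\)")


class OnclickCollector(HTMLParser):
    """Collect (link text, gS arguments) for every <a> whose onclick calls gS."""

    def __init__(self):
        super().__init__()
        self.calls = []        # list of (text, (arg1, arg2))
        self._pending = None   # gS arguments of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Attribute values can be None, hence the "or ''".
        onclick = dict(attrs).get("onclick") or ""
        match = GS_CALL.search(onclick)
        if match:
            self._pending = (int(match.group(1)), int(match.group(2)))
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            self.calls.append(("".join(self._text).strip(), self._pending))
            self._pending = None


def extract_gs_calls(html):
    """Return a list of (link text, (arg1, arg2)) found in the given HTML."""
    parser = OnclickCollector()
    parser.feed(html)
    parser.close()
    return parser.calls


# Two of the sample links from the original message.
sample = '''
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
'''

for text, args in extract_gs_calls(sample):
    print(text, args)
```

This only tells you which calls the page would make. Navigating to the href will always land on "#", so the onclick handler itself still has to be run by a real browser, which is the part a tool like Pamie is meant to handle.
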

Kent

Craig Booth wrote:
> Hi,
>
> I am trying to loop over all of the links in a given webpage and
> retrieve the source of each of the child pages in turn.
>
> My problem is that the links are in the following form:
>
> [begin html]
> <a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
> <a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
> <a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
> <a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
> [end html]
>
> So clicking the links appears to call the Javascript function gS to
> dynamically create pages.
>
> I can't figure out how to get urllib/urllib2 to work here as the URL of
> each of these links is http://www.thehomepage.com/#.
>
> I have tried to get mechanize to click each link, once again it doesn't
> send the onclick request and just goes to http://www.thehomepage.com/#
>
> This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html)
> strongly suggests that the easiest way to do this is to use IE and COM
> automation (which is fine as I am working on a windows PC) so I have tried
> importing win32com.client and actually getting IE to click the link:
>
> [begin code]
>
> ie = Dispatch("InternetExplorer.Application")
> ie.Visible = 1
> ie.Navigate('http://www.thehomepage.com')
>
> #it takes a little while for page to load
> if ie.Busy:
>     sleep(2)
>
> #Print page title
> print ie.LocationName
>
> test=ie.Document.links
> ie.Navigate(ie.Document.links(30))
>
> [end code]
>
> Which should just click the 30th link on the page. As with the other
> methods this takes me to http://www.thehomepage/# and doesn't call the
> Javascript.
>
> If somebody who has more experience in these matters could suggest a
> course of action I would be grateful. I'm more than happy to use any
> method (urllib, mechanize, IE & COM as tried so far) just so long as it
> works :)
>
> Thanks in advance,
> Craig.
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor