Hi, I am trying to loop over all of the links in a given webpage and retrieve the source of each of the child pages in turn.
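For ordinary links, where each href holds a real URL, the loop itself can be sketched with the standard library alone. This uses Python 3's `html.parser` and `urllib.request`, the successors of the urllib2 mentioned here. It is a minimal sketch, not a drop-in solution. The helper names `LinkCollector`, `extract_links` and `fetch_children` are hypothetical, and decoding pages as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> links in html, resolved against base_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]


def fetch_children(base_url):
    """Download base_url, then download each page it links to (plain hrefs only)."""
    with urlopen(base_url) as resp:
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        html = resp.read().decode("utf-8", errors="replace")
    pages = {}
    for url in extract_links(html, base_url):
        if url in pages:
            continue  # skip duplicates; href="#" links all resolve to the same URL
        with urlopen(url) as resp:
            pages[url] = resp.read()
    return pages
```

When this runs against a page whose anchors all carry href="#", every extracted link resolves back to the home page. That is why this approach stalls on links that only work through their onclick handler.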
My problem is that the links are in the following form:

[begin html]
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
<a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
<a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
[end html]

So clicking a link appears to call the JavaScript function gS, which creates the page dynamically. I can't figure out how to get urllib/urllib2 to work here, because the URL of each of these links is http://www.thehomepage.com/#.

I have tried getting mechanize to click each link, but once again it doesn't run the onclick handler and just goes to http://www.thehomepage.com/#.

This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html) strongly suggests that the easiest way to do this is to use IE and COM automation (which is fine, as I am working on a Windows PC), so I have tried importing win32com.client and actually getting IE to click the link:

[begin code]
from time import sleep
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate('http://www.thehomepage.com')
# it takes a little while for the page to load, so poll until IE is idle
while ie.Busy:
    sleep(2)
# print the page title
print(ie.LocationName)
links = ie.Document.links
ie.Navigate(links(30))
[end code]

This should just click the 30th link on the page. As with the other methods, it takes me to http://www.thehomepage.com/# and doesn't call the JavaScript.

If somebody with more experience in these matters could suggest a course of action, I would be grateful. I'm more than happy to use any method (urllib, mechanize, or IE & COM as tried so far), just so long as it works :)

Thanks in advance,
Craig.

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
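Each link's target lives in its onclick attribute rather than its href. One hedged alternative to driving a browser is to pull the gS arguments out of the HTML and then reproduce whatever request gS itself makes. The sketch below (Python 3) only does the first half reliably. `OnclickCollector`, `gs_calls` and `child_urls` are hypothetical names. The regular expression assumes gS always takes two integer arguments, as in the four links shown. The URL template passed to `child_urls` is a placeholder that would have to be read off gS's JavaScript source.

```python
import re
from html.parser import HTMLParser

# Matches calls such as gS(1020,19) and captures the two integer arguments.
# Assumption: gS is always called with exactly two integers.
GS_CALL = re.compile(r"gS\(\s*(\d+)\s*,\s*(\d+)\s*\)")


class OnclickCollector(HTMLParser):
    """Collect (link text, gS arguments) for <a class="ln"> tags with a gS onclick."""

    def __init__(self):
        super().__init__()
        self.calls = []       # list of (text, (first_arg, second_arg))
        self._pending = None  # gS args of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = GS_CALL.search(attrs.get("onclick") or "")
        if "ln" in classes and match:
            self._pending = (int(match.group(1)), int(match.group(2)))
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            self.calls.append(("".join(self._text).strip(), self._pending))
            self._pending = None


def gs_calls(html):
    """Return [(link_text, (a, b)), ...] for every gS(a, b) link in html."""
    parser = OnclickCollector()
    parser.feed(html)
    parser.close()
    return parser.calls


def child_urls(html, template):
    """Fill a HYPOTHETICAL URL template such as "...?x={0}&y={1}" with each link's gS args.

    The real pattern (if gS loads a URL at all) must be read from gS's source.
    """
    return [template.format(a, b) for _text, (a, b) in gs_calls(html)]
```

Fed the four links quoted above, `gs_calls` returns `[("link1", (1020, 19)), ("link2", (1020, 8)), ("link3", (1020, 14)), ("link4", (1020, 1))]`. If gS turns out to build the page entirely in the browser, or to submit a POST rather than load a URL, there is no template to fill. In that case a browser-driving approach such as the IE/COM route remains necessary.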