[Tutor] Retrieving Webpage Source, a Problem with 'onclick'

Craig Booth Sat, 21 May 2005 04:23:47 -0700

Hi,

   I am trying to loop over all of the links in a given webpage and
retrieve the source of each of the child pages in turn.


   My problem is that the links are in the following form:

[begin html]
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
<a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
<a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
[end html]

  So clicking the links appears to call the Javascript function gS to
dynamically create pages.
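
  As a starting point, here is a minimal sketch of pulling the gS arguments out of the onclick attributes with a regular expression. The HTML constant is just the four sample links above, and extract_gs_calls and LINK_RE are illustrative names, not part of any library. What gS actually does with these numbers is not known here; that would have to be read from the page's own Javascript.

```python
import re

# Sample markup copied from the links shown in this post.
HTML = '''
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
<a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
<a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
'''

# Matches onclick="gS(<int>,<int>);..." and captures both integers
# plus the link text that follows the opening tag.
LINK_RE = re.compile(r'onclick="gS\((\d+),\s*(\d+)\);[^"]*"[^>]*>([^<]*)</a>')


def extract_gs_calls(html):
    """Return a list of (first_arg, second_arg, link_text) tuples."""
    return [(int(a), int(b), text) for a, b, text in LINK_RE.findall(html)]


calls = extract_gs_calls(HTML)
for first, second, text in calls:
    print("%s -> gS(%d, %d)" % (text, first, second))
```

  If the body of gS turns out to build an ordinary URL from its two arguments, the same tuples could be used to construct each URL by hand and fetch it with urllib, but that is an assumption until the script has been inspected.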

  I can't figure out how to get urllib/urllib2 to work here as the URL of
each of these links is http://www.thehomepage.com/#.

  I have also tried to get mechanize to click each link, but once again it
doesn't run the onclick handler and just goes to http://www.thehomepage.com/#.

This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html)
strongly suggests that the easiest way to do this is to use IE and COM
automation (which is fine as I am working on a Windows PC) so I have tried
importing win32com.client and actually getting IE to click the link:

[begin code]

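from time import sleep
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate('http://www.thehomepage.com')

# It takes a little while for the page to load, so poll until IE is idle
while ie.Busy:
    sleep(2)

# Print page title
print ie.LocationName

# Try to follow the link at index 30 (this is the step that fails)
links = ie.Document.links
ie.Navigate(links(30))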

[end code]

  This should just click the 30th link on the page.  As with the other
methods, it takes me to http://www.thehomepage.com/# and doesn't call the
Javascript.

   If somebody who has more experience in these matters could suggest a
course of action I would be grateful.  I'm more than happy to use any
method (urllib, mechanize, IE & COM as tried so far) just so long as it
works :)

   Thanks in advance,
      Craig.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
