Re: [Tutor] Titles from a web page

Alan Gauld Thu, 05 May 2011 01:15:33 -0700


"louis leichtnam" <l.leicht...@gmail.com> wrote

I'm trying to write a program that looks in a webpage to find the titles of
a subsection of the page:
Can you help me out? I tried using regular expressions but I keep hitting
walls and I don't know what to do...


Regular expressions are the wrong tool for parsing HTML unless
you are searching for something very simple.

There is an HTML parser in the Python standard library (*) that you
can use if the HTML is reasonably well formed. If it's sloppy you
would be better off with something like BeautifulSoup or lxml.
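
As a rough illustration of the standard library approach, here is a minimal
sketch. It assumes Python 3, where the HTMLParser class lives in the
html.parser module, and it assumes the titles you want are in <h2> tags. The
class name HeadingCollector and the sample page are made up for the example,
so adjust the tag to whatever your real page uses.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every element with a given tag name.

    HeadingCollector is a hypothetical helper, not part of the library.
    """

    def __init__(self, tag="h2"):
        super().__init__()
        self.tag = tag            # tag whose text we want (assumed: h2)
        self.in_heading = False   # True while we are inside that tag
        self.parts = []           # text fragments of the current heading
        self.titles = []          # finished headings

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.in_heading = True
            self.parts = []

    def handle_data(self, data):
        # Text may arrive in several pieces, e.g. around nested tags.
        if self.in_heading:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self.in_heading:
            self.titles.append("".join(self.parts).strip())
            self.in_heading = False


# Made-up sample page, only for demonstration.
sample = """
<html><body>
<h1>Site name</h1>
<h2>First <em>section</em></h2><p>text</p>
<h2>Second section</h2>
</body></html>
"""

collector = HeadingCollector("h2")
collector.feed(sample)
collector.close()
print(collector.titles)
```

The parser only tells you about tags and text as it meets them, so you have
to track for yourself whether you are currently inside the tag of interest.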

If the page is written in XHTML then you could also use the
ElementTree module, which is designed for XML parsing.
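
For the XHTML case, a small sketch along these lines may help. It assumes the
page is well-formed XML in the usual XHTML namespace and, again, that the
titles are <h2> elements. The sample document and the "x" prefix are invented
for the example.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; "x" is an arbitrary local prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Made-up, well-formed sample document.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body>
<h2>First <em>section</em></h2>
<p>text</p>
<h2>Second section</h2>
</body>
</html>"""

root = ET.fromstring(sample)

# itertext() walks nested elements too, so text inside <em> is included.
titles = [
    "".join(h.itertext()).strip()
    for h in root.iterfind(".//x:h2", XHTML_NS)
]
print(titles)
```

If the document is not well-formed, ET.fromstring raises a ParseError, which
is a good hint to fall back to one of the more forgiving parsers.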

(*) In fact there are two! - htmllib and HTMLParser. The former is more
powerful but more complex. Some brief examples can be found
in my tutor here:

http://www.alan-g.me.uk/tutor/tutwebc.htm

Note, the topic is not complete, the last few sections are
placeholders only...

HTH,

Alan G.


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
