> Which method is best and most pythonic to scrape text data with
> minimal formatting?
Use the HTMLParser module.

> I want to change the above to:
>
> <p><b>Trigger:</b> Debate on budget in Feb-Mar. New moves to
> cut medical costs by better technology.</p>
>
> Since I wanted some practice in regex, I started with something like this:

Using regex is usually the wrong way to parse HTML for anything beyond
the trivial. The parser module helps deal with the complexities.

> So I'm thinking of using sgmllib.py (as in the Dive into Python
> example). Is this where I should be using libxml2.py? As you can
> tell this is my first foray into both parsing and regex so advice in
> terms of best practice would be very helpful.

There is an HTML parser which is built on the SGML one. It's rather more
specific to your task.

Alan G.
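
For illustration, here is a rough sketch of the HTMLParser approach suggested above. It is not a definitive solution. It assumes Python 3, where the module lives at html.parser (in Python 2 it was called HTMLParser), and it assumes that <p> and <b> are the only tags worth keeping. The class name MinimalFormatter, the helper simplify() and the sample input are all made up for this example, since the original markup from the question is not shown.

```python
from html import escape
from html.parser import HTMLParser


class MinimalFormatter(HTMLParser):
    """Keep text plus a small whitelist of tags; drop all other markup."""

    KEEP = {"p", "b"}  # assumption: the only tags we want to preserve

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            # Emit the bare tag; any attributes (class, style, ...) are dropped.
            self.parts.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in self.KEEP:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with entities already decoded, so re-escape it.
        self.parts.append(escape(data, quote=False))

    def result(self):
        # Collapse runs of whitespace left over from the original layout.
        return " ".join("".join(self.parts).split())


def simplify(html):
    """Return html with everything except the whitelisted tags removed."""
    parser = MinimalFormatter()
    parser.feed(html)
    parser.close()
    return parser.result()


# Hypothetical input, invented for this sketch.
sample = (
    '<p class="MsoNormal"><b><span style="font-size:10pt">Trigger:</span></b>'
    '<span style="font-size:10pt"> Debate on budget in\n'
    'Feb-Mar. New moves to cut medical costs by better technology.</span></p>'
)
print(simplify(sample))
```

The parser calls handle_starttag, handle_endtag and handle_data as it walks the input, so the filtering logic is just a whitelist check in each handler. That tends to hold up better than a regex when tags carry attributes or are nested in ways you did not anticipate.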