Re: [Tutor] extract plain english words from html

2005-10-14 Thread Andrew P
You could try: http://www.aminus.org/rbre/python/cleanhtml.py YMMV, as the kids say. But I did choose this over BeautifulSoup or Strip-o-gram to do this particular thing. I don't remember -why- I chose it, but there you go. Easy enough to test all three :) Oh, and if you just want a whole page

Re: [Tutor] extract plain english words from html

2005-10-14 Thread Kent Johnson
Marc Buehler wrote:
> hi.
>
> i have a ton of html files from which i want to
> extract the plain english words, and then write
> those words into a single text file.

If you just want the text from a single tag in the document then BeautifulSoup will work well, as Danny and Bob suggest. If you h

Re: [Tutor] extract plain english words from html

2005-10-14 Thread bob
At 03:50 PM 10/14/2005, Marc Buehler wrote:
>hi.
>
>i have a ton of html files from which i want to
>extract the plain english words, and then write
>those words into a single text file.

http://www.crummy.com/software/BeautifulSoup/ will read the html, let you step from tag to tag and extract the

Re: [Tutor] extract plain english words from html

2005-10-14 Thread Danny Yoo
On Fri, 14 Oct 2005, Marc Buehler wrote:
> i have a ton of html files from which i want to extract the plain
> english words, and then write those words into a single text file.

Hi Marc,

The BeautifulSoup parser should be able to do what you want: http://www.crummy.com/software/Beautiful

[Tutor] extract plain english words from html

2005-10-14 Thread Marc Buehler
hi.

i have a ton of html files from which i want to extract the plain english words, and then write those words into a single text file.

example:
<... all kinds html tags ...> this is text

from the above, i want to extract the string 'this is text' and write it out to a text file. note that
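
For anyone landing on this thread later: a minimal sketch of the task Marc describes, using only the standard library's html.parser rather than BeautifulSoup or cleanhtml.py as suggested in the replies. The names TextExtractor, html_to_text and write_words are made up for illustration, and skipping <script> and <style> bodies is an assumption about what counts as "plain english words".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text found between tags (hypothetical helper, not from the thread)."""

    # Assumption: the contents of these tags are code, not readable words.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


def write_words(html_paths, out_path):
    """Write the text of each HTML file as one line of a single output file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in html_paths:
            with open(path, encoding="utf-8", errors="replace") as f:
                out.write(html_to_text(f.read()) + "\n")


print(html_to_text("<html><body><p>this is <b>text</b></p></body></html>"))
# prints: this is text
```

Something like write_words(glob.glob("*.html"), "words.txt") would then cover the "ton of html files" case. Badly broken markup may still trip this up, which is where the tools recommended in the replies are worth testing.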