2007/7/12, kublai <[EMAIL PROTECTED]>:
> For a project, I need to develop a corpus of online news stories. I'm
> looking for an application that, given the url of a web page, "copies"
> the rendered text of the web page (not the source HTML text), opens a
> text editor (Notepad), and displays the copied text for the user to
> examine and save into a text file. Graphics and sidebars to be
> ignored. The examples I have come across are much too complex for me
> to customize for this simple job. Can anyone lead me to the right
> direction?
import re
import urllib2

def textonly(url):
    # Fetch the HTML at url and return it with every <...> tag stripped
    f = urllib2.urlopen(url)
    text = f.read()
    f.close()
    r = re.compile(r'<[^<>]*>')
    newtext = r.sub('', text)
    # Repeat until stable: removing one tag can expose another (e.g. '<<b>>')
    while newtext != text:
        text = newtext
        newtext = r.sub('', text)
    return text
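On Python 3 (where urllib2 became urllib.request), a minimal sketch of the whole job might look like the following. It swaps the regex for the standard library's html.parser, so the contents of <script> and <style> elements are dropped instead of ending up in the output. It does not try to separate the article text from sidebars or menus, so expect some of that to come through. The names html_to_text and show_in_editor are hypothetical helpers; the UTF-8 fallback when the server declares no charset is an assumption, and so is falling back to $EDITOR (or vi) when Notepad is not available.

```python
import os
import subprocess
import sys
import tempfile
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the text of an HTML string, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def textonly(url):
    """Fetch url and return its text content."""
    with urlopen(url) as resp:
        # Assumption: treat the page as UTF-8 if no charset is declared
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_text(html)


def show_in_editor(text):
    """Hypothetical helper: write text to a temp .txt file and open it in an editor."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    if sys.platform.startswith("win"):
        subprocess.Popen(["notepad.exe", path])  # returns without waiting
    else:
        # Assumption: no Notepad here, so use $EDITOR, defaulting to vi
        subprocess.call([os.environ.get("EDITOR", "vi"), path])
    return path
```

Usage, with a placeholder URL: show_in_editor(textonly("http://example.com/story.html")) fetches the page and opens its text in Notepad, where it can be checked and saved under a new name with Save As. The temporary file is not deleted automatically, so the editor can still open it after the function returns.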
--
Andre Engels, [EMAIL PROTECTED]
ICQ: 6260644 -- Skype: a_engels
--
http://mail.python.org/mailman/listinfo/python-list