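The .* in your pattern is greedy: it matches as much as it can, and with re.DOTALL it can run across newlines to a much later "> and </p> on the page. Use the non-greedy .*? instead, and escape the period so it matches a literal '.':
pattern = re.compile(r"""<h[1-2]><a href="/(.*?)">(.*?)\.</p>""", re.DOTALL)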
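A quick way to check the difference between the greedy and non-greedy forms is to run both against a small test string instead of the live site. This is only a sketch: the page string below is an assumption, built from the story block quoted in the original message followed by the Boxerjam fragment, and the names greedy and lazy are just for illustration.

```python
import re

# Assumed sample markup: the story block from the original message, followed
# by a later link that ends in "</a>. </p>" like the one in the bad output.
page = (
    '<h2><a href="/stories/507/5240764.html">Sid Hartman: Franchise could be moved</a></h2>'
    '<p>If Reggie Fowler and his business partners from New Jersey are approved to buy '
    'the Vikings franchise from Red McCombs, it is my opinion the franchise remains in '
    'danger of eventually being relocated.</p>'
    '<div><a href="http://www.startribune.com/stories/1559/4773140.html">'
    '<b>Boxerjam</b></a>. </p></div>'
)

# Original pattern: greedy .* and an unescaped '.' (which matches any character).
greedy = re.compile(r'<h[1-2]><a href="/(.*)">(.*).</p>', re.DOTALL)

# Revised pattern: non-greedy .*? and an escaped literal period.
lazy = re.compile(r'<h[1-2]><a href="/(.*?)">(.*?)\.</p>', re.DOTALL)

greedy_matches = greedy.findall(page)
lazy_matches = lazy.findall(page)

for headline, body in greedy_matches:
    print("greedy: " + body)

for headline, body in lazy_matches:
    print("lazy:   " + body)
```

On this sample the greedy pattern's first group swallows everything up to the last "> in the string, so the body comes out as the Boxerjam fragment, while the non-greedy pattern stops at the first "> and the first .</p> and returns the Sid Hartman text.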
Kent
Ron Nixon wrote:
Trying to scrape a newspaper site for articles using this code, which was done with help from the list:
import urllib, re
pattern = re.compile("""<h[1-2]><a href="/(.*)">(.*).</p>""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline, body in pattern.findall(page):
    print body
It should grab articles from this:
<h2><a href="/stories/507/5240764.html">Sid Hartman: Franchise could be moved</a></h2><p>If Reggie Fowler and his business partners from New Jersey are approved to buy the Vikings franchise from Red McCombs, it is my opinion the franchise remains in danger of eventually being relocated.</p>
and give me this: Sid Hartman: Franchise could be moved</a></h2><p>If Reggie Fowler and his business partners from New Jersey are approved to buy the Vikings franchise from Red McCombs, it is my opinion the franchise remains in danger of eventually being relocated.
Instead it gives me this: <b>Boxerjam</b></a>. from this: href="http://www.startribune.com/stories/1559/4773140.html"><b>Boxerjam</b></a>. </p></div>
I know the re works in other programs I've tried. Is there something different about re's in Python?
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor