Problem solved. Thanks.

--- Kent Johnson <[EMAIL PROTECTED]> wrote:

> Try it with non-greedy matches. You are matching
> everything from the first <hX><a to the last </p>
> in one match. Also I think you want to escape the .
> before </p> (you want just paragraphs that end
> in a period?)
>
> pattern = re.compile("""<h[1-2]><a href="/(.*?)">(.*?)\.</p>""", re.DOTALL)
>
> Kent
>
> Ron Nixon wrote:
> > Trying to scrape a newspaper site for articles using
> > this code, which was done with help from the list:
> >
> > import urllib, re
> > pattern = re.compile("""<h[1-2]><a href="/(.*)">(.*).</p>""", re.DOTALL)
> > page = urllib.urlopen("http://www.startribune.com").read()
> >
> > for headline, body in pattern.findall(page):
> >     print body
> >
> > It should grab articles from this:
> >
> > <h2><a href="/stories/507/5240764.html">Sid Hartman:
> > Franchise could be moved</a></h2><p>If Reggie Fowler
> > and his business partners from New Jersey are approved
> > to buy the Vikings franchise from Red McCombs, it is
> > my opinion the franchise remains in danger of
> > eventually being relocated.</p>
> >
> > and give me this: Sid Hartman: Franchise could be
> > moved</a></h2><p>If Reggie Fowler and his business
> > partners from New Jersey are approved to buy the
> > Vikings franchise from Red McCombs, it is my opinion
> > the franchise remains in danger of eventually being
> > relocated.
> >
> > Instead it gives me this: <b>Boxerjam</b></a>. from
> > this:
> >
> > href="http://www.startribune.com/stories/1559/4773140.html"><b>Boxerjam</b></a>.
> > </p></div>
> >
> > I know the re works in other programs I've tried. Is
> > there something different about re's in Python?
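
For anyone finding this thread in the archive, here is a minimal sketch of the difference between the two patterns. It is written for Python 3 and runs on a hard-coded string instead of fetching the live page. The `page` string is an assumption: it is stitched together from the two fragments quoted above, and the real page will differ.

```python
import re

# Assumed sample markup, stitched together from the two fragments quoted in
# this thread: the wanted article, then an unrelated link that also happens
# to be followed by "</p>".
page = (
    '<h2><a href="/stories/507/5240764.html">Sid Hartman: '
    'Franchise could be moved</a></h2><p>If Reggie Fowler and his business '
    'partners from New Jersey are approved to buy the Vikings franchise from '
    'Red McCombs, it is my opinion the franchise remains in danger of '
    'eventually being relocated.</p>\n'
    '<div><p><a href="http://www.startribune.com/stories/1559/4773140.html">'
    '<b>Boxerjam</b></a>.\n</p></div>'
)

# Original pattern: greedy groups and an unescaped "." before </p>.
greedy = re.compile(r'<h[1-2]><a href="/(.*)">(.*).</p>', re.DOTALL)

# Kent's suggestion: non-greedy groups and a literal "\." before </p>.
lazy = re.compile(r'<h[1-2]><a href="/(.*?)">(.*?)\.</p>', re.DOTALL)

greedy_matches = greedy.findall(page)
lazy_matches = lazy.findall(page)

# The greedy version runs the first group up to the last '">' on the page,
# so the "body" is whatever sits between that point and the last </p>.
for headline, body in greedy_matches:
    print("greedy body:", body)

# The non-greedy version stops at the first '">' and the first ".</p>".
for headline, body in lazy_matches:
    print("lazy link:", headline)
    print("lazy body:", body)
```

With re.DOTALL the unescaped "." also matches a newline, which is why the greedy pattern can reach across the whole page in a single match.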

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor