On Sat, 29 Jan 2011 15:09:14 -0500, Jesse Rosenthal <jrosenthal at jhu.edu> wrote:
> So BS is the best I could find for this job
No doubt. I once tried to scrape http://theeconomist.com. Its HTML is so broken that all the other parsers broke down. BeautifulSoup at least made it through and didn't completely fail, so I agree it is the best choice for HTML email, which is sure to be broken.

Sebastian
