I think you'd do better using the pyparsing library.
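
Whichever parser you end up with, the logic for the quoted requirements is the same: normalize the style value so that #000000 and RGB(0, 0, 0) both count as black, then collect all text inside that element no matter how the <em> and <strong> are nested. Here is a rough sketch of that logic. Note that it uses neither pyparsing nor BeautifulSoup, only the standard library's html.parser, so that it runs without installing anything. The names BLACK, VOID, BlackTextExtractor and extract_black_text are made up for illustration, and it assumes reasonably well-formed HTML with balanced tags; the #000 shorthand and named colors are not handled.

```python
import re
from html.parser import HTMLParser

# Matches "color: #000000" or "color: rgb(0, 0, 0)" (any case, any spacing),
# but not "background-color: ..." thanks to the lookbehind.
BLACK = re.compile(
    r"(?<![-\w])color\s*:\s*(?:#000000|rgb\(\s*0\s*,\s*0\s*,\s*0\s*\))",
    re.IGNORECASE,
)

# Void elements never get a closing tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BlackTextExtractor(HTMLParser):
    """Collect text nested at any depth inside an element styled black."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a black-styled element
        self.current = []   # text pieces of the region being collected
        self.found = []     # finished regions, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            # Already inside a black region: just track nesting.
            self.depth += 1
        elif BLACK.search(dict(attrs).get("style") or ""):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The black element itself just closed: store its text.
            self.found.append("".join(self.current).strip())
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_black_text(html):
    """Return a list with the text of each black-styled region in html."""
    parser = BlackTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.found
```

Calling extract_black_text() on the sample <div> from the quoted message returns a one-element list containing the quoted sentence. The depth counter is what makes the order of <em> and <strong> irrelevant: everything between the black element's opening tag and its matching closing tag is collected.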

On Friday, January 22, 2016 at 9:02:00 AM UTC-5, inhahe wrote:
> I hope this is an appropriate mailing list for BeautifulSoup questions,
> it's been a long time since I've used python-list and I don't remember if
> third-party modules are on topic. I did try posting to the BeautifulSoup
> mailing list on Google groups, but I've waited a day or two and my message
> hasn't been approved yet.
>
> Say I have the following HTML (I hope this shows up as plain text here
> rather than formatting):
>
> <div style="font-size: 20pt;"><span style="color: #000000;"><em><strong>"Is today the day?"</strong></em></span></div>
>
> And I want to extract the "Is today the day?" part. There are other places
> in the document with <em> and <strong>, but this is the only place that
> uses color #000000, so I want to extract anything that's within a color
> #000000 style, even if it's nested multiple levels deep within that.
>
> - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined
>   as #000000
> - Sometimes the <strong> is within the <em> and sometimes the <em> is
>   within the <strong>.
> - There may be other discrepancies I haven't noticed yet
>
> How can I do this in BeautifulSoup (or is this better done in lxml.html)?
> Thanks

--
https://mail.python.org/mailman/listinfo/python-list
