On Wed, Dec 5, 2012 at 7:13 PM, Ed Owens <eowens0...@gmx.com> wrote:
>
> >>> str(string)
> '[<div class="wx-timestamp">\n<div class="wx-subtitle wx-timestamp">Updated: Dec 5, 2012, 5:08pm EST</div>\n</div>]'
> >>> m = re.search('":\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', str(string))
> >>> print m
> None

You need a raw string for the boundary marker \b (i.e. the boundary between \w and \W); otherwise Python turns it into a backspace control character before the regex engine ever sees it. Also, I don't see why you have ": at the start of the expression; that sequence never occurs in the text, so the match would fail even with a raw string. This works:

    >>> s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
    >>> m = re.search(r'\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
    >>> m.group(1)
    'Dec 5, 2012, 5:08pm EST'

But wouldn't it be simpler and more reliable to use an HTML parser?
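
As a rough sketch of the parser route, assuming Python 3 and only the standard library's html.parser: the class name SubtitleParser and the decision to key on the wx-subtitle class are my own illustration, chosen from the snippet you posted, and the real page may need a different selector.

```python
from html.parser import HTMLParser


class SubtitleParser(HTMLParser):
    """Collect the text of each <div> whose class list includes 'wx-subtitle'.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while we are inside a matching <div>
        self.texts = []   # one entry per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1          # nested <div> inside the target
        elif "wx-subtitle" in classes:
            self._depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data    # data may arrive in several chunks


# Sample markup copied from the question (assumed representative of the page).
html = ('<div class="wx-timestamp">\n'
        '<div class="wx-subtitle wx-timestamp">'
        'Updated: Dec 5, 2012, 5:08pm EST</div>\n'
        '</div>')

parser = SubtitleParser()
parser.feed(html)
parser.close()

# Drop the "Updated:" label; maxsplit=1 keeps the colon in "5:08pm" intact.
timestamp = parser.texts[0].split(":", 1)[1].strip()
print(timestamp)
```

This keys on the element's class rather than on the exact layout of the date, so it should keep working if the timestamp format changes, though it will still break if the site renames the class.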