In article <[email protected]>,
Chris Rebert <[email protected]> wrote:
> regex = compile("(\d\d)/(\d\d)/(\d{4})")
I would probably write that as either
    r"(\d{2})/(\d{2})/(\d{4})"
or (somewhat less likely)
    r"(\d\d)/(\d\d)/(\d\d\d\d)"
Keeping to one consistent style makes it a little easier to read. Also,
don't forget the leading `r` to get raw strings. I've long since given
up trying to remember the exact rules of what needs to get escaped and
what doesn't. If it's a regex, I just automatically make it a raw
string.
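To make the raw-string point concrete, here is a small sketch. The sample text and the variable names are made up for illustration. It uses \b because that is one of the escapes where the difference actually bites: in a normal string Python turns it into a backspace character before the re module ever sees it.

```python
import re

plain = "\bcat\b"    # normal string: "\b" becomes a backspace character
raw = r"\bcat\b"     # raw string: the regex engine sees \b (word boundary)

# The two patterns are not even the same length.
plain_len = len(plain)   # backspace, c, a, t, backspace
raw_len = len(raw)       # \, b, c, a, t, \, b

text = "the cat sat"     # made-up sample text
plain_match = re.search(plain, text)   # looks for literal backspaces
raw_match = re.search(raw, text)       # looks for the word "cat"

# The date pattern from this thread, written as a raw string.
date_re = re.compile(r"(\d{2})/(\d{2})/(\d{4})")
date_fields = date_re.search("Posted 04/21/2011").groups()

print(plain_match, raw_match, date_fields)
```

The non-raw pattern silently fails to match, which is exactly the kind of thing I don't want to have to think about.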
Also, don't overlook the re.VERBOSE flag. It tells the regex engine to
ignore unescaped whitespace in the pattern, so you can write positively
outrageous expressions which are still quite readable. With it, you
could write this regex as:
    r" (\d{2}) / (\d{2}) / (\d{4}) "
which takes up only slightly more space, but makes it a whole lot easier
to scan by eye.
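For what it's worth, here is roughly how that looks when compiled. This is only a sketch: the sample string and the names are mine, and the comments don't assume which field is the month and which is the day. re.VERBOSE also allows # comments inside the pattern, which is where it really starts to pay off.

```python
import re

# One-line form: the spaces are ignored because of re.VERBOSE.
compact_re = re.compile(r" (\d{2}) / (\d{2}) / (\d{4}) ", re.VERBOSE)

# Multi-line form of the same pattern, with inline comments.
date_re = re.compile(
    r"""
    (\d{2})   # first two-digit field
    /
    (\d{2})   # second two-digit field
    /
    (\d{4})   # four-digit year
    """,
    re.VERBOSE,
)

sample = "Updated 04/21/2011 by admin"   # made-up sample text
fields = date_re.search(sample).groups()
compact_fields = compact_re.search(sample).groups()

print(fields)
```

One thing to watch: under re.VERBOSE a literal space in the pattern has to be escaped or put in a character class, otherwise it is ignored along with the rest of the whitespace.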
I'm still going to stand by my previous statement, however. If you're
trying to parse HTML, use an HTML parser. Using a regex like this is
perfectly fine for parsing the text content of an HTML <td> element,
but pattern matching the HTML markup itself is madness.
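A minimal sketch of that division of labor, using html.parser from the standard library. The CellTextParser class and the sample markup are made up for illustration, and a third-party parser would do the job just as well. The parser deals with the markup, and the regex only ever sees the text inside each <td>.

```python
import re
from html.parser import HTMLParser

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


class CellTextParser(HTMLParser):
    """Hypothetical helper: collect the text content of each <td>."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")   # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text belonging to the current cell


# Made-up sample markup.
sample_html = '<table><tr><td class="d">04/21/2011</td><td>n/a</td></tr></table>'

parser = CellTextParser()
parser.feed(sample_html)
parser.close()

# Apply the regex to the extracted text only, never to the tags.
dates = []
for cell in parser.cells:
    match = DATE_RE.search(cell)
    if match:
        dates.append(match.groups())

print(parser.cells, dates)
```

Attributes, odd whitespace, and tag case are the parser's problem here, so the regex stays as simple as the one above.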
--
http://mail.python.org/mailman/listinfo/python-list