Re: pattern matching

Roy Smith Wed, 23 Feb 2011 20:13:22 -0800

In article <[email protected]>,
 Chris Rebert <[email protected]> wrote:


> regex = compile("(\d\d)/(\d\d)/(\d{4})")

I would probably write that as either

r"(\d{2})/(\d{2})/(\d{4})"

or (somewhat less likely)

r"(\d\d)/(\d\d)/(\d\d\d\d)"

Keeping to one consistent style makes it a little easier to read.  Also, 
don't forget the leading `r` to get raw strings.  I've long since given 
up trying to remember the exact rules of what needs to get escaped and 
what doesn't.  If it's a regex, I just automatically make it a raw 
string.
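
To make the raw-string point concrete, here's a minimal sketch.  It uses 
\b rather than \d because \b happens to be a valid escape in an ordinary 
string literal (a backspace), so the two forms silently mean different 
things.  The sample text and variable names are just made up for 
illustration.

```python
import re

# In an ordinary string literal, "\b" is one backspace character (0x08).
plain = "\bcat\b"
# In a raw string the backslash survives, so the regex engine receives \b,
# which it interprets as a word boundary.
raw = r"\bcat\b"

text = "a cat sat"
plain_hit = re.search(plain, text)  # searches for backspace + "cat" + backspace
raw_hit = re.search(raw, text)      # searches for the whole word "cat"

print(len(plain), len(raw))
print(plain_hit, raw_hit.group())
```

The non-raw version doesn't raise an error; it just quietly fails to 
match, which is exactly the kind of thing I'd rather not have to think 
about.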

Also, don't overlook the re.VERBOSE flag.  It tells the regex engine to 
ignore whitespace in the pattern (outside of character classes), so you 
can write positively outrageous expressions which are still quite 
readable.  With it, you could write this regex as:

r" (\d{2}) / (\d{2}) / (\d{4}) "

which takes up only slightly more space, but makes it a whole lot easier 
to scan by eye.
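
As a rough sketch of how far you can take this, re.VERBOSE also allows 
comments and line breaks inside the pattern.  The name DATE_RE, the 
sample string, and the comments on each group are my own assumptions; 
the thread doesn't say which field is the month and which is the day, so 
I've left them neutral.

```python
import re

# The same three-group date pattern, spread over several lines.
# Under re.VERBOSE, whitespace and "#" comments in the pattern are ignored.
DATE_RE = re.compile(
    r"""
    (\d{2})   # first two-digit field
    /
    (\d{2})   # second two-digit field
    /
    (\d{4})   # four-digit year
    """,
    re.VERBOSE,
)

# Hypothetical input, only for demonstration.
match = DATE_RE.search("Posted on 02/23/2011 at 20:13")
fields = match.groups() if match else None
print(fields)
```

One thing to watch: since whitespace is ignored in this mode, a literal 
space in the pattern has to be escaped or put in a character class.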

I'm still going to stand by my previous statement, however.  If you're 
trying to parse HTML, use an HTML parser.  Using a regex like this is 
perfectly fine for parsing the CDATA text inside the HTML <td> element, 
but pattern matching the HTML markup itself is madness.
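
Here's a sketch of the split I have in mind: let a parser deal with the 
markup, and only run the regex over the text it hands back.  This 
assumes Python 3's standard-library html.parser module; the CellText 
class, the SAMPLE markup, and the other names are hypothetical, and a 
third-party parser would do the job equally well.

```python
import re
from html.parser import HTMLParser

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


class CellText(HTMLParser):
    """Collect the text content of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # None while we are outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None


# Hypothetical markup, only for demonstration.
SAMPLE = "<table><tr><td class='d'>02/23/2011</td><td>n/a</td></tr></table>"

parser = CellText()
parser.feed(SAMPLE)
parser.close()

# The regex only ever sees plain cell text, never tags or attributes.
dates = [m.groups() for m in map(DATE_RE.fullmatch, parser.cells) if m]
print(parser.cells, dates)
```

The parser copes with attributes, quoting, and case differences in the 
tags, so the regex can stay as simple as the one above.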
