> I have a regex that matches dates in various formats.  I've tested the regex 
> in a reliable testbed, and it seems to match what I want (dates in formats 
> like "1 Jan 2010" and "January 1, 2010" and also "January 2008").  It's just 
> that using re.findall with it is giving me weird output.  I'm using Python 
> 2.6.5 here, and I've put in line breaks for clarity's sake:
> 
> >>> import re
> 
> >>> date_regex = re.compile(r"([0-3]?[0-9])?((\s*)|(\t*))((Jan\.?u?a?r?y?)|(Feb\.?r?u?a?r?y?)|(Mar\.?c?h?)|(Apr\.?i?l?)|(May)|(Jun[e.]?)|(Jul[y.]?)|(Aug\.?u?s?t?)|(Sep[t.]?\.?e?m?b?e?r?)|(Oct\.?o?b?e?r?)|(Nov\.?e?m?b?e?r?)|(Dec\.?e?m?b?e?r?))((\s*)|(\t*))(2?0?[0-3]?[0-9]\,?)?((\s*)|(\t*))(2?0?[01][0-9])")

Note that this pattern also matches '1 Janry 2010', because every letter after
'Jan' is optional on its own. Is that intended?
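
To illustrate the point, here is a minimal sketch. The `loose` pattern is the
"Jan" fragment copied from your regex with an end anchor added for testing; the
`strict` pattern is only my guess at what you might want instead (an
assumption, not something from your post):

```python
import re

# Fragment from the original pattern: each letter after "Jan" is optional
# independently of the others. The trailing $ is added only for this test.
loose = re.compile(r"Jan\.?u?a?r?y?$")

# Hypothetical stricter variant: accepts "Jan", "Jan." or "January" only.
strict = re.compile(r"Jan(?:\.|uary)?$")

for word in ["Jan", "Jan.", "January", "Janry", "Janu"]:
    print("%-8s loose=%-5s strict=%s" % (
        word,
        loose.match(word) is not None,
        strict.match(word) is not None,
    ))
```

The same trick would apply to the other month names, if you decide the
stricter behaviour is what you are after.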


<snip>two examples</snip>

> >>> test_output = re.findall(date_regex, "The date was January 1, 2008.  But it was not January 2, 2008.")
> 
> >>> print test_output
> [('', ' ', ' ', '', 'January', 'January', '', '', '', '', '', '', '', '', '', 
> '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'), ('', ' ', ' ', '', 
> 'January', 'January', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ', 
> '', '2,', ' ', ' ', '', '2008')]
> 
> A friend says: " I think that the problem is that every time that you have a 
> parenthesis you get an output. Maybe there is a way to suppress this."
> 
> My friend's explanation speaks to the empties, but maybe not to the two 
> Januaries.  Either way, what I want is for re.findall, or some other re method 
> that perhaps I haven't properly explored, to return the matches and just the 
> matches.       
> 
> I've read the documentation, googled various permutations etc, and I can't 
> figure it out.  Any help much appreciated.

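The docs say: "If one or more groups are present in the pattern, return a list 
of groups". So your friend is right: every pair of parentheses in your pattern 
is a capturing group, and re.findall returns one entry per group.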
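
A small sketch of that behaviour, using short made-up patterns rather than
your full regex (the patterns and the test string here are just for
illustration):

```python
import re

text = "January 1, 2008 and January 2, 2008"

# No groups: findall returns the full matches as plain strings.
no_groups = re.findall(r"January \d+, \d{4}", text)

# Several groups: findall returns one tuple per match, one entry per group.
flat_groups = re.findall(r"(January) (\d+), (\d{4})", text)

# Nested groups: the outer and the inner group both capture "January";
# the alternative that took no part in the match yields ''.
nested_groups = re.findall(r"((January)|(February)) (\d+)", text)

print(no_groups)
print(flat_groups)
print(nested_groups)
```

The nested case is what produces the two Januaries and the run of empty
strings in your output.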

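In fact, your last example shows exactly this: it is a list of two tuples. 
The two list elements are your two date matches, and each tuple holds the 
individual group matches for that date. Nested groups each capture separately, 
which is why 'January' appears twice: once for the outer month group and once 
for the inner "Jan" alternative.
You could solve this by putting a group around the entire regex (so 
r"(([0-3 .... [0-9]))"; I would even use a named group) and then picking out 
the first tuple element of each list element: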
[(' January 1, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '',
'', '', '', '', '', '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'),
(' January 2, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '', '',
'', '', '', '', '', '', ' ', ' ', '', '2,', ' ', ' ', '', '2008')]
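
For what it's worth, here is a rough sketch of that approach and a few
alternatives. The pattern is a much simplified stand-in that I made up for
the example (two months only, one date format), so treat it as an
illustration rather than a replacement for your regex:

```python
import re

text = "The date was January 1, 2008.  But it was not January 2, 2008."

# Simplified stand-in for the original pattern (an assumption for this sketch).
month = r"(Jan(?:\.|uary)?|Feb(?:\.|ruary)?)"
inner = month + r"\s+(\d{1,2}),\s+(\d{4})"

# 1. Wrap everything in one outer group and keep element 0 of each tuple.
wrapped = re.findall("(" + inner + ")", text)
dates_from_tuples = [groups[0] for groups in wrapped]

# 2. Same idea with a named outer group, read back via finditer.
named = re.compile("(?P<date>" + inner + ")")
dates_from_named = [m.group("date") for m in named.finditer(text)]

# 3. No extra group needed: group(0) is always the whole match.
dates_from_group0 = [m.group(0) for m in re.finditer(inner, text)]

# 4. Make every group non-capturing so findall returns plain strings.
non_capturing = r"(?:Jan(?:\.|uary)?|Feb(?:\.|ruary)?)\s+(?:\d{1,2}),\s+(?:\d{4})"
dates_from_findall = re.findall(non_capturing, text)

print(dates_from_tuples)
print(dates_from_named)
print(dates_from_group0)
print(dates_from_findall)
```

Options 3 and 4 avoid the post-processing step: re.finditer gives you match
objects whose group(0) is the full match, and (?:...) groups are not reported
by re.findall at all.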


Hth,

  Evert

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
