> I have a regex that matches dates in various formats. I've tested the regex
> in a reliable testbed, and it seems to match what I want (dates in formats
> like "1 Jan 2010" and "January 1, 2010" and also "January 2008"). It's just
> that using re.findall with it is giving me weird output. I'm using Python
> 2.6.5 here, and I've put in line breaks for clarity's sake:
>
> >>> import re
>
> >>> date_regex = re.compile(r"([0-3]?[0-9])?((\s*)|(\t*))((Jan\.?u?a?r?y?)|(Feb\.?r?u?a?r?y?)|(Mar\.?c?h?)|(Apr\.?i?l?)|(May)|(Jun[e.]?)|(Jul[y.]?)|(Aug\.?u?s?t?)|(Sep[t.]?\.?e?m?b?e?r?)|(Oct\.?o?b?e?r?)|(Nov\.?e?m?b?e?r?)|(Dec\.?e?m?b?e?r?))((\s*)|(\t*))(2?0?[0-3]?[0-9]\,?)?((\s*)|(\t*))(2?0?[01][0-9])")

This will also match '1 Janry 2010'. Not sure if it should?

<snip>two examples</snip>

> >>> test_output = re.findall(date_regex, "The date was January 1, 2008. But it was not January 2, 2008.")
>
> >>> print test_output
> [('', ' ', ' ', '', 'January', 'January', '', '', '', '', '', '', '', '', '',
> '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'), ('', ' ', ' ', '',
> 'January', 'January', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ',
> '', '2,', ' ', ' ', '', '2008')]
>
> A friend says: "I think that the problem is that every time that you have a
> parenthesis you get an output. Maybe there is a way to suppress this."
>
> My friend's explanation speaks to the empties, but maybe not to the two
> Januaries. Either way, what I want is for re.findall, or some other re method
> that perhaps I haven't properly explored, to return the matches and just the
> matches.
>
> I've read the documentation, googled various permutations etc, and I can't
> figure it out. Any help much appreciated.

The docs say: "If one or more groups are present in the pattern, return a list
of groups". So your friend is right. In fact, your last example shows exactly
this: it shows a list of two tuples. The two list elements are your two date
matches, and each tuple holds the individual group matches for that date. The
two Januaries come from nesting: the outer group around all the month
alternatives and the inner (Jan...) group both capture 'January'.

You could solve this by grouping the entire regex (so r"(([0-3 .... [0-9]))";
I would even use a named group), and then picking out the first tuple element
of each list element:

[(' January 1, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '',
'', '', '', '', '', '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'),
(' January 2, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '',
'', '', '', '', '', '', '', ' ', ' ', '', '2,', ' ', ' ', '', '2008')]

Hth,

Evert
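
P.S. Here is a minimal sketch of a few ways to get just the matches. It is
written with Python 3 style print() (the session above is Python 2.6.5, where
the same re calls exist). The names DATE_PATTERN, wrapped and simple are mine,
and the last pattern is a deliberately simplified stand-in to show
non-capturing groups, not a replacement for your regex.

```python
import re

# The pattern from the question, unchanged (25 capturing groups),
# split across lines only for readability.
DATE_PATTERN = (
    r"([0-3]?[0-9])?((\s*)|(\t*))"
    r"((Jan\.?u?a?r?y?)|(Feb\.?r?u?a?r?y?)|(Mar\.?c?h?)|(Apr\.?i?l?)|(May)"
    r"|(Jun[e.]?)|(Jul[y.]?)|(Aug\.?u?s?t?)|(Sep[t.]?\.?e?m?b?e?r?)"
    r"|(Oct\.?o?b?e?r?)|(Nov\.?e?m?b?e?r?)|(Dec\.?e?m?b?e?r?))"
    r"((\s*)|(\t*))(2?0?[0-3]?[0-9]\,?)?((\s*)|(\t*))(2?0?[01][0-9])"
)
text = "The date was January 1, 2008. But it was not January 2, 2008."

date_regex = re.compile(DATE_PATTERN)

# Option 1: finditer() yields match objects, and group(0) is always the
# whole match, no matter how many groups the pattern contains.
# strip() drops the leading space that the pattern's \s* picks up.
full_matches = [m.group(0).strip() for m in date_regex.finditer(text)]

# Option 2: wrap the whole pattern in one (named) group.
wrapped = re.compile("(?P<date>" + DATE_PATTERN + ")")
named_matches = [m.group("date").strip() for m in wrapped.finditer(text)]
# With findall(), the wrapping group is the first element of each tuple.
first_elements = [groups[0].strip() for groups in wrapped.findall(text)]

# Option 3 (simplified, hypothetical pattern): with only non-capturing
# groups, (?:...), findall() returns the full matches directly.
simple = re.compile(r"(?:Jan|Feb)[a-z]* [0-9]{1,2}, [0-9]{4}")
simple_matches = simple.findall(text)

print(full_matches)  # ['January 1, 2008', 'January 2, 2008']
```

All four lists come out the same for this test sentence. Option 1 needs no
change to the regex at all, which is why I would probably try it first.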