On Tue, Jul 10, 2012 at 9:26 PM, Alexander Q. <alexxqu...@gmail.com> wrote:
> I'm a bit confused about extracting data using re.search or re.findall.
>
> Say I have the following code: tuples =
> re.findall(r'blahblah(\d+)yattayattayatta(\w+)moreblahblahblah(\w+)over',
> text)
>
> So I'm looking for that string in 'text', and I intend to extract the
> parts which have parentheses around them. And it works: the variable
> "tuples", which I assigned to get the return of re.findall, returns a tuple
> list, each 'element' therein being a tuple of 3 elements (which is what I
> wanted since I had 3 sets of parentheses).
>
> My question is how does Python know to return just the part in the
> parentheses and not to return the "blahblah" and the "yattayattayatta",
> etc...? The 're.search' function returns the whole thing, and if I want
> just the parentheses parts, I do tuples.group(1) or tuples.group(2) or
> tuples.group(3), depending on which set of parentheses I want. Does the
> re.findall command by default ignore anything outside of the parentheses
> and only return the parentheses as a grouping within one tuple (i.e., the
> first element in "tuples" would be, as it is, a list comprised of 3
> elements corresponding respectively to the 1st, 2nd, and 3rd parentheses)?
> Thank you for reading.
>
> -Alex
>

From the documentation for findall:

The *string* is scanned left-to-right, and matches are returned in the
order found. If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern has more than
one group.

That should clear everything up. As for *why* it behaves this way, I have
no idea. It may be legacy behavior.

HTH,
Hugo
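
To illustrate the documented behavior, here is a minimal sketch that compares re.findall, re.search and re.finditer on the same input. The sample text and the pattern are made up for illustration and are not taken from the original question.

```python
import re

# Hypothetical sample text and pattern, invented for this illustration.
text = "id=12 name=alice role=admin; id=7 name=bob role=user"
pattern = r"id=(\d+) name=(\w+) role=(\w+)"

# Three groups in the pattern: findall returns a list of 3-tuples,
# one tuple per match, holding only the captured groups.
tuples = re.findall(pattern, text)
print(tuples)

# No groups in the pattern: findall returns the whole matched strings.
whole = re.findall(r"id=\d+", text)
print(whole)

# Exactly one group: findall returns a flat list of that group's strings.
ids = re.findall(r"id=(\d+)", text)
print(ids)

# search returns a match object for the first match only; group(0) is
# the whole match, group(1)..group(3) are the captured groups.
first = re.search(pattern, text)
print(first.group(0), first.group(2))

# finditer yields one match object per match, so the whole match and
# the groups are both available for every occurrence.
matches = [(m.group(0), m.groups()) for m in re.finditer(pattern, text)]
print(matches)
```

So findall does not ignore the text outside the parentheses when matching; it only leaves it out of the result once the pattern contains groups. If the whole match is needed along with the groups, re.finditer (or one extra pair of parentheses around the entire pattern) gives access to it.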
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor