[Python-Dev] New bug, directly assigned, okay?
I just added a new bug on SF (1175396) and because I think that it is related to other bugs that were assigned to Walter Doerwald, I assigned this new bug directly to Walter too.

Is that good practice or does someone else usually assign SF bugs to people?

--Irmen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New bug, directly assigned, okay?
Irmen de Jong wrote:
> I just added a new bug on SF (1175396) and because I think that it is related to other bugs that were assigned to Walter Doerwald, I assigned this new bug directly to Walter too. Is that good practice or does someone else usually assign SF bugs to people?

I've certainly done that a few times myself - I figure that even if I get it wrong, the recipient will either pass it on to a more appropriate person, or simply revert it back to unassigned.

I usually try to put in a comment to say *why* I've assigned it the way I have, though. Picking an assignee at random should probably be discouraged, but if there is someone that makes sense, then I don't see a problem with asking them to look at it directly.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
Re: [Python-Dev] New bug, directly assigned, okay?
Nick Coghlan wrote:
> Irmen de Jong wrote:
>
>> I just added a new bug on SF (1175396) and because I think
>> that it is related to other bugs that were assigned to
>> Walter Doerwald, I assigned this new bug directly to Walter too.
>>
>> Is that good practice or does someone else usually assign SF bugs to
>> people?
>
> I've certainly done that a few times myself - I figure that even if I
> get it wrong, the recipient will either pass it on to a more appropriate
> person, or simply revert it back to unassigned.

Ah, okay.

> I usually try to put in a comment to say *why* I've assigned it the way
> I have, though. Picking an assignee at random should probably be
> discouraged, but if there is someone that makes sense, then I don't see
> a problem with asking them to look at it directly.

Yep, that's what I've done. In my bug report (about codecs.readline) I referenced the two other bugs related to it (those were assigned to Walter).

Thanks,
Irmen.
[Python-Dev] hierarchicial named groups extension to the re library
I've written an extension to the re library to provide more complete matching of hierarchical named groups in regular expressions. I've set up a SourceForge project for it:

http://pyre2.sourceforge.net/

re2 extracts a hierarchy of named-group matches from a string, rather than the flat, incomplete dictionary that the standard re module returns. (i.e. the re library only returns the ~last~ match for named groups, not a list of ~all~ the matches for the named groups. And the hierarchy of those named groups is non-existent in the flat dictionary of matches that results.)

e.g.

>>> import re
>>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat1=re.compile(regex)
>>> m=pat1.match(buf)
>>> m.groupdict()
{'verse': '10 lords a-leaping', 'number': '10', 'activity': 'lords a-leaping'}

>>> import re2
>>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
>>> x
{'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}

(See http://pyre2.sourceforge.net/ for more details.)

I am wondering what would be the best direction to take this project in.

Firstly, is it (or can it be made) useful enough to be included in the Python stdlib? (i.e. Should I bother writing a PEP for it?)

And if so, would it be best to merge its functionality into the re library, or to leave it as a separate module?

And also, are there any suggestions/criticisms of the library itself?
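For comparison with the two sessions above, here is a minimal sketch that recovers the same list of verses using only the standard re module: instead of repeating the group inside a single match with (...)*, it iterates over the repeated unit with re.finditer. The names VERSE and extract_verses are illustrative assumptions, not part of re or re2, and this is not how re2 works internally.

```python
import re

# Hypothetical helper pattern (not part of re2): the repeated unit of the
# thread's regex, without the ^(...)*$ wrapper, so finditer visits each verse.
VERSE = re.compile(r'(?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?')


def extract_verses(buf):
    """Collect every verse match into the dict shape shown for re2's extract()."""
    verses = [
        {'number': m.group('number'), 'activity': m.group('activity')}
        for m in VERSE.finditer(buf)
    ]
    return {'verse': verses}


print(extract_verses('12 drummers drumming, 11 pipers piping, 10 lords a-leaping'))
```

One difference worth noting: unlike the anchored ^(...)*$ pattern, finditer silently skips text that does not match, so this sketch does not check that the whole string is a well-formed list of verses.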
[Python-Dev] Re: hierarchicial named groups extension to the re library
[EMAIL PROTECTED] wrote:
> import re2
> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> pat2=re2.compile(regex)
> x=pat2.extract(buf)
> x
> {'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}

Is a dictionary the right container, or should another class be used? In the example, the content of the "verse" group itself is lost; only its sub-groups survive. Something like a hierarchical MatchObject could provide access to both pieces of information: the sub-groups and the group itself.

Also, should it be limited to named groups?

> I am wondering what would be the best direction to take this project in.
>
> Firstly is it, (or can it be made) useful enough to be included in the
> python stdlib? (ie. Should I bother writing a PEP for it.)
>
> And if so, would it be best to merge its functionality in with the re
> library, or to leave it as a separate module?
>
> And, also are there any suggestions/criticisms on the library itself?

I find the feature very interesting, but being used to living without it, I have difficulty evaluating its usefulness. However, it reminds me how strange I found it at first that only the last match was kept, so I think, FWIW, that from a purist point of view the functionality would make sense in the stdlib in some way or another.

Regards,
Nicolas
Re: [Python-Dev] Re: hierarchicial named groups extension to the re library
Nicolas Fleury <[EMAIL PROTECTED]> wrote:
>
> [EMAIL PROTECTED] wrote:
> > import re2
> > buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> > regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> > pat2=re2.compile(regex)
> > x=pat2.extract(buf)

If one wanted to match the API of the re module, one should use pat2.findall(buf), which would return a list of 'hierarchical match objects', though with the above, one should really return a list of 'verse' items (the way the regular expression is written).

> > x
> > {'verse': [{'number': '12', 'activity': 'drummers
> > drumming'}, {'number': '11', 'activity': 'pipers
> > piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
>
> Is a dictionary the good container or should another class be used?
> Because in the example the content of the "verse" group is lost,
> excluding its sub-groups. Something like a hierarchic MatchObject could
> provide access to both information, the sub-groups and the group itself.

Its contents are not lost; look at the overall dictionary... In any case, I think one can do better than a dictionary.

>>> x=pat2.match(buf) #or x=pat2.findall(buf)[0]
>>> x
'12 drummers drumming,'
>>> dir(x)
['verse']
>>> x.verse
'12 drummers drumming,'
>>> dir(x.verse)
['number', 'activity']
>>> x.verse.number
'12'
>>> x.verse.activity
'drummers drumming'

...would get my vote (or using obj.group(i) semantics, which I discuss below).

I notice that this is basically what the re2 module already does (having read the web page), though rather than...

>>> pat2.extract(buf).verse[1].activity
'pipers piping'

I would prefer...

>>> pat2.findall(buf)[1].verse.activity
'pipers piping'

For .verse[1] or .verse[2] to make sense, it implies that the pattern is something like...

((?P<verse>... )(?P<verse>...))

... which it isn't. I understand that the decision was probably made to make it similar to the case of...

((?P<foo>... (?P<goo>...)+))

... where multiple matches for goo would require x.foo.goo[i].

> Also, should it be limited to named groups?
Probably not. I would suggest using matchobj.group(i) semantics to match the standard re module semantics, though only allow returning items in the current level of the hierarchy. That is, one could use x.verse.group(1) and get back '12', but x.group(1) would return '12 pipers piping'

> > I am wondering what would be the best direction to take this project in.
> >
> > Firstly is it, (or can it be made) useful enough to be included in the
> > python stdlib? (ie. Should I bother writing a PEP for it.)
> >
> > And if so, would it be best to merge its functionality in with the re
> > library, or to leave it as a separate module?
> >
> > And, also are there any suggestions/criticisms on the library itself?
>
> I find the feature very interesting, but being used to live without it,
> I have difficulty evaluating its usefulness. However, it reminds me how
> much at first I found strange that only the last match was kept, so I
> think, FWIW, that on a purist point of vue the functionality would make
> sense in the stdlib in some way or another.

re2 can be used as a limited structural parser. This makes the re module useful for more things than it is currently. The question of it being in the standard library, however, I think should be made based on the criteria used previously (whatever they were).

 - Josiah
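A minimal sketch of the object proposed above, under stated assumptions: HMatch and build are hypothetical names, the tree is reconstructed from a standard re match by span containment (a heuristic, not re2's approach), and only groups outside any repetition are handled, so repeated verses come from iterating with re.finditer, in the spirit of the proposed findall(buf)[i]. Attribute access finds sub-groups by name, and group(i) only looks one level down, as suggested.

```python
import re


class HMatch:
    """Hypothetical hierarchical match node (not an API of re or re2).

    Holds a group's own matched text plus its direct sub-groups, in order.
    """

    def __init__(self, text):
        self.text = text
        self.children = []  # list of (group name or None, HMatch)

    def group(self, i=0):
        # group(0) is this node's own text; group(i) returns the i-th direct
        # sub-group only, never anything deeper in the hierarchy.
        return self.text if i == 0 else self.children[i - 1][1].text

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        for child_name, child in self.__dict__.get('children', []):
            if child_name == name:
                return child
        raise AttributeError(name)

    def __str__(self):
        return self.text


def build(m):
    """Nest the groups of a re.Match by span containment.

    Assumes non-empty groups that are not inside a repeat; an empty group on
    a sibling's boundary, or a lookaround group, may be nested incorrectly.
    """
    names = {idx: name for name, idx in m.re.groupindex.items()}
    root = HMatch(m.group(0))
    stack = [(m.span(), root)]
    for idx in range(1, m.re.groups + 1):
        start, end = m.span(idx)
        if start == -1:
            continue  # this group did not take part in the match
        # Pop until the top of the stack encloses this group's span
        # (the root is never popped).
        while len(stack) > 1 and not (stack[-1][0][0] <= start and end <= stack[-1][0][1]):
            stack.pop()
        node = HMatch(m.group(idx))
        stack[-1][1].children.append((names.get(idx), node))
        stack.append(((start, end), node))
    return root


VERSE = re.compile(r'(?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?')
x = build(VERSE.search('12 drummers drumming, 11 pipers piping, 10 lords a-leaping'))
print(x.verse.number, x.verse.group(1), x.group(1))
```

Applying build to every match from VERSE.finditer(...) gives a list whose element [1] has verse.activity equal to 'pipers piping', which is the findall-style access preferred above.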
[Python-Dev] Re: hierarchicial named groups extension to the re library
Josiah Carlson wrote:
> Nicolas Fleury <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] wrote:
> > > import re2
> > > buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> > > regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> > > pat2=re2.compile(regex)
> > > x=pat2.extract(buf)
>
> If one wanted to match the API of the re module, one should use
> pat2.findall(buf), which would return a list of 'hierarchical match
> objects', though with the above, one should really return a list of
> 'verse' items (the way the regular expression is written).

As far as I can understand, the two are orthogonal. findall is used to match the regular expression multiple times; in this example, the regular expression is still matched only once.

> > > {'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
> >
> > Is a dictionary the good container or should another class be used?
> > Because in the example the content of the "verse" group is lost,
> > excluding its sub-groups. Something like a hierarchic MatchObject could
> > provide access to both information, the sub-groups and the group itself.
>
> Its contents are not lost, look at the overall dictionary... In any case,
> I think one can do better than a dictionary.

In that specific example, I meant that the space between "10" and "lords a-leaping" was not stored in the dictionary, unless you are talking about the dictionary from re rather than re2. Your proposal fixes that by making the entire content of the parent group (verse) accessible.

> >>> x=pat2.match(buf) #or x=pat2.findall(buf)[0]
> >>> x
> '12 drummers drumming,'
> >>> dir(x)
> ['verse']
> >>> x.verse
> '12 drummers drumming,'

It is very easy to use, but I doubt it is a good idea as a return value for match (maybe a match object could have a function to return this easy-to-use object). It would mean that the names of the groups are limited by the interface of the match object returned (what would happen if a group is named "start", "end" or simply "group"?).
Another solution is to use x["verse"] instead (or to continue using a "group" method).

> > Also, should it be limited to named groups?
>
> Probably not. I would suggest using matchobj.group(i) semantics to
> match the standard re module semantics, though only allow returning
> items in the current level of the hierarchy. That is, one could use
> x.verse.group(1) and get back '12', but x.group(1) would return
> '12 pipers piping'

Totally agree that the matchobj.group interface should be matched. Should group return another match object? Or should there be another function to get match objects of groups? Something like:

x.groupobj("verse").group("number")
or
str(x["verse"]["number"])

Regards,
Nicolas
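The name-collision concern can be made concrete with Python's attribute lookup rules: __getattr__ is only consulted after normal lookup fails, so a sub-group whose name matches a real attribute or method of the match object is silently shadowed, while item access reaches every group. The Node class below is a hypothetical illustration, not an API of re or re2.

```python
import re


class Node:
    """Hypothetical match node exposing sub-groups two ways (not re's API)."""

    def __init__(self, text, children=None):
        self.text = text
        self.children = children or {}

    def group(self, name=None):
        # Mirrors the matchobj.group() idea: no argument means this node's text.
        return self.text if name is None else self.children[name].text

    def __getattr__(self, name):
        # Python calls this only after normal lookup fails, so a sub-group
        # named 'group', 'text' or 'children' is hidden behind the real attribute.
        try:
            return self.__dict__['children'][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Item access has no such collision: every group name is reachable.
        return self.children[name]


m = re.match(r'(?P<group>\w+)-(?P<text>\d+)', 'verse-12')
node = Node(m.group(0), {k: Node(v) for k, v in m.groupdict().items()})

print(node.text)           # the node's own text, not the 'text' sub-group
print(node['text'].text)   # the 'text' sub-group
print(node['group'].text)  # the 'group' sub-group; node.group is the method
```

This is why x["verse"] (or a group/groupobj method) is the safer spelling when group names are chosen by the user.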
Re: [Python-Dev] Re: hierarchicial named groups extension to the re library
Josiah Carlson wrote:
> re2 can be used as a limited structural parser. This makes the re module
> useful for more things than it is currently. The question of it being in
> the standard library, however, I think should be made based on the
> criteria used previously (whatever they were).

In general, if developers can readily agree that a functionality should be added (i.e. it is "obvious" for some reason), it is added right away. Otherwise, a PEP should be written and reviewed by the community.

In this specific case, Chris Ottrey submitted a link to his project to the SF patches tracker, asking for inclusion. I felt that there would likely be no immediate agreement, and suggested he ask on python-dev and write a PEP. If this kind of functionality would meet with immediate rejection for some reason, even writing the PEP might be pointless. If the functionality is generally considered useful, a PEP can be written, and then implemented according to the PEP procedures (i.e. collect feedback, discuss alternatives, ask for BDFL pronouncement).

I personally think that the proposed functionality should *not* live in a separate module, but somehow be integrated into SRE. Whether or not the proposed functionality is useful in the first place, I don't know. I never have nested named groups in my regular expressions.

Regards,
Martin
Re: [Python-Dev] hierarchicial named groups extension to the re library
Nicolas Fleury wrote:
>
> ottrey at py.redsoft.be wrote:
> > import re2
> > buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> > regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> > pat2=re2.compile(regex)
> > x=pat2.extract(buf)
> > x
> > {'verse': [{'number': '12', 'activity': 'drummers
> > drumming'}, {'number': '11', 'activity': 'pipers
> > piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
>
> Is a dictionary the good container or should another class be used?
> Because in the example the content of the "verse" group is lost,
> excluding its sub-groups. Something like a hierarchic MatchObject could
> provide access to both information, the sub-groups and the group itself.

Yes, very good point. Actually it ~is~ a container (that uses dict as its base class). (I probably should add the following lines to the example.)

>>> type(x)
>>> x._value
'12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> x.verse[0]._value
'12 drummers drumming'

Josiah Carlson jcarlson at uci.edu wrote:
> If one wanted to match the API of the re module, one should use
> pat2.findall(buf), which would return a list of 'hierarchical match
> objects'

Well, that would be something I'd want to discuss here, as I'm not sure I actually ~want~ to match the API of the re module.

> Also, should it be limited to named groups?

I have given that some thought as well. Internally, un-named groups are recursively given the names _group0, _group1, etc. as they are found. Then those groups are recursively matched. In the final step, the resulting _Match object is compressed and those un-named groups are discarded.

IMO, if you don't bother to name a group then you probably aren't going to be interested in it anyway, so why keep a reference to it?

e.g. If you only wanted to extract the numbers from those verses...

>>> regex='^(((?P<number>\d+) ([^,]+))(, )?)*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
>>> x
{'number': ['12', '11', '10']}

Before the compression stage, the _Match object actually looked like this:

{'_group0': {'_value': '12 drummers drumming, 11 pipers piping, 10 lords a-leaping',
             '_group0': [{'_value': '12 drummers drumming, ',
                          '_group1': ', ',
                          '_group0': {'_value': '12 drummers drumming',
                                      '_group1': 'drummers drumming',
                                      'number': '12'}},
                         {'_value': '11 pipers piping, ',
                          '_group1': ', ',
                          '_group0': {'_value': '11 pipers piping',
                                      '_group1': 'pipers piping',
                                      'number': '11'}},
                         {'_value': '10 lords a-leaping',
                          '_group0': {'_value': '10 lords a-leaping',
                                      '_group1': 'lords a-leaping',
                                      'number': '10'}}]}}

But the compression algorithm collected the named groups and brought them to the surface, to return the much nicer-looking:

{'number': ['12', '11', '10']}

NB. There are also a few other tricks up the sleeve of re2. e.g. It allows named groups to be repeated in different branches of a named group hierarchy, without the name redefinition error that the re library would complain about. e.g.

>>> pat1=re2.compile( '(?P<parents>(?P<mother>(?P<name>[\w ]+)),(?P<father>(?P<name>[\w ]+)))' )
>>> pat1.extract('Mum,Dad')
{'parents': {'father': {'name': 'Dad'}, 'mother': {'name': 'Mum'}}}

> I find the feature very interesting, but being used to live without it,
> I have difficulty evaluating its usefulness.

Yes, this is a good point too, because it ~is~ different from the re library. re2 aims to do all that searching, grouping, iterating, collecting and constructing work for you.

> However, it reminds me how much at first I found strange that only the
> last match was kept, so I think, FWIW, that on a purist point of vue the
> functionality would make sense in the stdlib in some way or another.

Actually, that "last match only" confusion was part of the motivation for writing it in the first place.

> For .verse[1] or .verse[2] to make sense, it implies that the pattern is
> something like...
> ((?P<verse>... )(?P<verse>...))
> ... which it isn't.

Good pickup! You've seen through my smoke and mirrors. ;-)

That list of verses was actually created in the compression stage. (The stage that I failed to mention in my first post.)

i.e. The regex was:

((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*

Which returns an un-named list of verse groups. Something like:

{'_group0': [{'verse': {'number': '12', 'activity': 'drummers drumming'}},
             {'verse': {'number': '11', 'activity': 'pipers piping'}},
             {'verse': {'number': '10', 'activity': 'lords a-leaping'}}]}

But the compression algorithm discarded that '_group0' key and brought the 'verse' groups to the surface, then grouped them together in one 'verse' list, i.e. to make:

{'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}

> > Also, should it be limited to named groups?
>
> Probably not. I would suggest using matchobj.group(i) semantics to
> match the standard re module semantics, though only allow returning
> items in the current level of the hierarchy. That is, one could use