[Python-Dev] New bug, directly assigned, okay?

2005-04-02 Thread Irmen de Jong
I just added a new bug on SF (1175396) and because I think
that it is related to other bugs that were assigned to
Walter Doerwald, I assigned this new bug directly to Walter too.

Is that good practice or does someone else usually assign SF bugs to people?

--Irmen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New bug, directly assigned, okay?

2005-04-02 Thread Nick Coghlan
Irmen de Jong wrote:
> I just added a new bug on SF (1175396) and because I think
> that it is related to other bugs that were assigned to
> Walter Doerwald, I assigned this new bug directly to Walter too.
>
> Is that good practice or does someone else usually assign SF bugs to people?

I've certainly done that a few times myself - I figure that even if I get it 
wrong, the recipient will either pass it on to a more appropriate person, or 
simply revert it back to unassigned.

I usually try to put in a comment to say *why* I've assigned it the way I have, 
though. Picking an assignee at random should probably be discouraged, but if 
there is someone that makes sense, then I don't see a problem with asking them 
to look at it directly.

Cheers,
Nick.
--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net


Re: [Python-Dev] New bug, directly assigned, okay?

2005-04-02 Thread Irmen de Jong
Nick Coghlan wrote:
> Irmen de Jong wrote:
> 
>> I just added a new bug on SF (1175396) and because I think
>> that it is related to other bugs that were assigned to
>> Walter Doerwald, I assigned this new bug directly to Walter too.
>>
>> Is that good practice or does someone else usually assign SF bugs to
>> people?
> 
> 
> I've certainly done that a few times myself - I figure that even if I
> get it wrong, the recipient will either pass it on to a more appropriate
> person, or simply revert it back to unassigned.

Ah, okay.

> I usually try to put in a comment to say *why* I've assigned it the way
> I have, though. Picking an assignee at random should probably be
> discouraged, but if there is someone that makes sense, then I don't see
> a problem with asking them to look at it directly.

Yep, that's what I've done.
In my bug report (about codecs.readline) I referenced the two
other bugs related to it (those were assigned to Walter).


Thanks,
Irmen.


[Python-Dev] hierarchicial named groups extension to the re library

2005-04-02 Thread ottrey

I've written an extension to the re library, to provide a more
complete matching of hierarchical named groups in regular expressions.

I've set up a sourceforge project for it:

  http://pyre2.sourceforge.net/

re2 extracts a hierarchy of named groups matches from a string,
rather than the flat, incomplete dictionary that the
standard re module returns.

(ie. the re library only returns the ~last~ match for named groups - not
a list of ~all~ the matches for the named groups.  And the hierarchy of
those named groups is nonexistent in the flat dictionary of matches
that results. )

eg.

>>> import re
>>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat1=re.compile(regex)
>>> m=pat1.match(buf)
>>> m.groupdict()
{'verse': '10 lords a-leaping', 'number': '10',
'activity': 'lords a-leaping'}

>>> import re2
>>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
>>> x
{'verse': [{'number': '12', 'activity': 'drummers drumming'},
           {'number': '11', 'activity': 'pipers piping'},
           {'number': '10', 'activity': 'lords a-leaping'}]}



(See http://pyre2.sourceforge.net/ for more details.)
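
For comparison, a similar list of verses can be built with the standard re
module alone by matching one verse at a time and collecting the results by
hand. The sketch below is only an illustration: VERSE and extract_verses are
hypothetical names, not part of re or re2, and unlike the anchored pattern
above it does not check that the whole string is well-formed.

```python
import re

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'

# One verse per match; the ", " separator stays outside the named groups.
VERSE = re.compile(r'(?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(?:, )?')


def extract_verses(text):
    """Hypothetical helper: collect every verse, not only the last one."""
    verses = []
    for m in VERSE.finditer(text):
        verses.append({'number': m.group('number'),
                       'activity': m.group('activity')})
    return {'verse': verses}


result = extract_verses(buf)
print(result)
```

The iteration and collection done by hand here is the work that re2 aims to
do automatically from the structure of the pattern.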


I am wondering what would be the best direction to take this project in.

Firstly is it, (or can it be made) useful enough to be included in the
python stdlib?  (ie. Should I bother writing a PEP for it.)

And if so, would it be best to merge its functionality in with the re
library, or to leave it as a separate module?

And, also are there any suggestions/criticisms on the library itself?


[Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-02 Thread Nicolas Fleury
[EMAIL PROTECTED] wrote:
> >>> import re2
> >>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> >>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> >>> pat2=re2.compile(regex)
> >>> x=pat2.extract(buf)
> >>> x
> {'verse': [{'number': '12', 'activity': 'drummers drumming'},
>            {'number': '11', 'activity': 'pipers piping'},
>            {'number': '10', 'activity': 'lords a-leaping'}]}

Is a dictionary the good container or should another class be used? 
Because in the example the content of the "verse" group is lost, 
excluding its sub-groups.  Something like a hierarchic MatchObject could 
provide access to both information, the sub-groups and the group itself. 
Also, should it be limited to named groups?

> I am wondering what would be the best direction to take this project in.
>
> Firstly is it, (or can it be made) useful enough to be included in the
> python stdlib?  (ie. Should I bother writing a PEP for it.)
>
> And if so, would it be best to merge its functionality in with the re
> library, or to leave it as a separate module?
>
> And, also are there any suggestions/criticisms on the library itself?

I find the feature very interesting, but being used to living without it, 
I have difficulty evaluating its usefulness.  However, it reminds me how 
much at first I found strange that only the last match was kept, so I 
think, FWIW, that on a purist point of view the functionality would make 
sense in the stdlib in some way or another.

Regards,
Nicolas


Re: [Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-02 Thread Josiah Carlson

Nicolas Fleury <[EMAIL PROTECTED]> wrote:
> 
> [EMAIL PROTECTED] wrote:
> import re2
> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> pat2=re2.compile(regex)
> x=pat2.extract(buf)

If one wanted to match the API of the re module, one should use
pat2.findall(buf), which would return a list of 'hierarchical match
objects', though with the above, one should really return a list of
'verse' items (the way the regular expression is written).

> x
> > 
> > {'verse': [{'number': '12', 'activity': 'drummers
> > drumming'}, {'number': '11', 'activity': 'pipers
> > piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
> 
> Is a dictionary the good container or should another class be used? 
> Because in the example the content of the "verse" group is lost, 
> excluding its sub-groups.  Something like a hierarchic MatchObject could 
> provide access to both information, the sub-groups and the group itself. 

Its contents are not lost, look at the overall dictionary...  In any
case, I think one can do better than a dictionary.

>>> x=pat2.match(buf) #or x=pat2.findall(buf)[0]
>>> x
'12 drummers drumming,'
>>> dir(x)
['verse']
>>> x.verse
'12 drummers drumming,'
>>> dir(x.verse)
['number', 'activity']
>>> x.verse.number
'12'
>>> x.verse.activity
'drummers drumming'

...would get my vote (or using obj.group(i) semantics I discuss below).
I notice that this is basically what the re2 module already does (having
read the web page), though rather than...
>>> pat2.extract(buf).verse[1].activity
'pipers piping'

I would prefer...

>>> pat2.findall(buf)[1].verse.activity
'pipers piping'

For .verse[1] or .verse[2] to make sense, it implies that the pattern is
something like...
((?P... )(?P...))
... which it isn't.

I understand that the decision was probably made to make it similar to
the case of...
((?P<foo>... (?P<goo>...)+))

... where multiple matches for goo would require x.foo.goo[i].


>   Also, should it be limited to named groups?

Probably not.  I would suggest using matchobj.group(i) semantics to
match the standard re module semantics, though only allow returning
items in the current level of the hierarchy.  That is, one could use
x.verse.group(1) and get back '12', but x.group(1) would return '12
pipers piping'


> > I am wondering what would be the best direction to take this project in.
> > 
> > Firstly is it, (or can it be made) useful enough to be included in the
> > python stdlib?  (ie. Should I bother writing a PEP for it.)
> > 
> > And if so, would it be best to merge its functionality in with the re
> > library, or to leave it as a separate module?
> > 
> > And, also are there any suggestions/criticisms on the library itself?
> 
> I find the feature very interesting, but being used to living without it, 
> I have difficulty evaluating its usefulness.  However, it reminds me how 
> much at first I found strange that only the last match was kept, so I 
> think, FWIW, that on a purist point of view the functionality would make 
> sense in the stdlib in some way or another.

re2 can be used as a limited structural parser.  This makes the re
module useful for more things than it is currently. The question of it
being in the standard library, however, I think should be made based on
the criteria used previously (whatever they were).

 - Josiah



[Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-02 Thread Nicolas Fleury
Josiah Carlson wrote:
> Nicolas Fleury <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] wrote:
> > > import re2
> > > buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> > > regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> > > pat2=re2.compile(regex)
> > > x=pat2.extract(buf)
>
> If one wanted to match the API of the re module, one should use
> pat2.findall(buf), which would return a list of 'hierarchical match
> objects', though with the above, one should really return a list of
> 'verse' items (the way the regular expression is written).

As far as I can understand, the two are orthogonal.  findall is used to 
match the regular expression multiple times; in that case the regular 
expression is still matched only once.
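
A small sketch with the standard re module may make the distinction concrete.
The anchored pattern matches once and keeps only the last repetition of each
group, while findall with an unanchored per-verse pattern matches three
separate times. The names anchored and per_verse are made up for this
illustration.

```python
import re

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'

# The pattern from the original post: anchored, with a repeated outer group.
anchored = re.compile(
    r'^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$')
matches = list(anchored.finditer(buf))
print(len(matches))                 # the whole string is a single match
print(matches[0].group('number'))   # only the last repetition survives

# An unanchored pattern for one verse: findall applies it repeatedly.
per_verse = re.compile(r'(?P<number>\d+) (?P<activity>[^,]+)')
pairs = per_verse.findall(buf)
print(pairs)
```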

> > > {'verse': [{'number': '12', 'activity': 'drummers drumming'},
> > >            {'number': '11', 'activity': 'pipers piping'},
> > >            {'number': '10', 'activity': 'lords a-leaping'}]}
> >
> > Is a dictionary the good container or should another class be used? 
> > Because in the example the content of the "verse" group is lost, 
> > excluding its sub-groups.  Something like a hierarchic MatchObject could 
> > provide access to both information, the sub-groups and the group itself. 
>
> Its contents are not lost, look at the overall dictionary...  In any
> case, I think one can do better than a dictionary.

In that specific example, I meant that the space between "10" and "lords 
a-leaping" was not stored in the dictionary, unless you talk about the 
dictionary from re instead of re2.  Your proposal fixes that, by making 
the entire content of the parent group (verse) accessible.

> >>> x=pat2.match(buf) #or x=pat2.findall(buf)[0]
> >>> x
> '12 drummers drumming,'
> >>> dir(x)
> ['verse']
> >>> x.verse
> '12 drummers drumming,'

It is very easy to use, but I doubt it is a good idea as a return value 
for match (maybe a match object could have a function to return this 
easy-to-use object).  It would mean that the names of the groups are 
limited by the interface of the match object returned (what would happen 
if a group is named "start", "end" or simply "group"?).

Another solution is to use x["verse"] instead (or continue to use a "group" 
method).
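
To illustrate the name-collision concern, here is a rough sketch of
subscript-based access. GroupNode is a hypothetical class written only for
this example (it is not the re2 API); because sub-groups are reached through
[] rather than attributes, groups named "start" or "end" cannot shadow
methods on the object.

```python
import re


class GroupNode:
    """Hypothetical wrapper: a group's text plus its named sub-groups."""

    def __init__(self, text, children=None):
        self._text = text
        self._children = dict(children or {})

    def __getitem__(self, name):
        # Sub-groups live in their own namespace, separate from methods.
        return self._children[name]

    def group(self):
        return self._text

    def __str__(self):
        return self._text


# "start" and "end" are legal group names in the standard re module.
m = re.match(r'(?P<start>\d+)-(?P<end>\d+)', '10-20')
node = GroupNode(m.group(0),
                 {name: GroupNode(value)
                  for name, value in m.groupdict().items()})

print(str(node['start']), str(node['end']))
print(node.group())      # the method is still reachable
print(m.start('end'))    # re's own start() method, applied to group "end"
```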

> > Also, should it be limited to named groups?
>
> Probably not.  I would suggest using matchobj.group(i) semantics to
> match the standard re module semantics, though only allow returning
> items in the current level of the hierarchy.  That is, one could use
> x.verse.group(1) and get back '12', but x.group(1) would return '12
> pipers piping'

Totally agree that matchobj.group interface should be matched.  Should 
group return another match object?  Or maybe another function to get 
match objects of groups?  Something like:
x.groupobj("verse").group("number")
or
str(x["verse"]["number"])

Regards,
Nicolas


Re: [Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-02 Thread Martin v. Löwis
Josiah Carlson wrote:
> re2 can be used as a limited structural parser.  This makes the re
> module useful for more things than it is currently. The question of it
> being in the standard library, however, I think should be made based on
> the criteria used previously (whatever they were).

In general, if developers can readily agree that a functionality should
be added (i.e. it is "obvious" for some reason), it is added right away.
Otherwise, a PEP should be written, and reviewed by the community.

In the specific case, Chris Ottrey submitted a link to his project to
the SF patches tracker, asking for inclusion. I felt that there is
likely no immediate agreement, and suggested he asks on python-dev,
and writes a PEP.

If this kind of functionality would fall on immediate rejection for
some reason, even writing the PEP might be pointless. If the
functionality is generally considered useful, a PEP can be written,
and then implemented according to the PEP procedures (i.e. collect
feedback, discuss alternatives, ask for BDFL pronouncement).

I personally think that the proposed functionality should *not* live
in a separate module, but somehow be integrated into SRE. Whether or
not the proposed functionality is useful in the first place, I don't
know. I never have nested named groups in my regular expressions.

Regards,
Martin


Re: [Python-Dev] hierarchicial named groups extension to the re library

2005-04-02 Thread ottrey

Nicolas Fleury  wrote:
>
> ottrey at py.redsoft.be wrote:
> import re2
> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
> pat2=re2.compile(regex)
> x=pat2.extract(buf)
> x
> >
> > {'verse': [{'number': '12', 'activity': 'drummers
> > drumming'}, {'number': '11', 'activity': 'pipers
> > piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
>
> Is a dictionary the good container or should another class be used?
> Because in the example the content of the "verse" group is lost,
> excluding its sub-groups.  Something like a hierarchic MatchObject could
> provide access to both information, the sub-groups and the group itself.

Yes, very good point.
Actually it ~is~ a container (that uses dict as its base class).
(I probably should add the following lines to the example.)

>>> type(x)

>>> x._value
'12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> x.verse[0]._value
'12 drummers drumming'


Josiah Carlson jcarlson at uci.edu wrote:
> If one wanted to match the API of the re module, one should use
> pat2.findall(buf), which would return a list of 'hierarchical match
> objects'

Well, that would be something I'd want to discuss here.
As I'm not sure if I actually ~want~ to match the API of the re module.

> Also, should it be limited to named groups?

I have given that some thought as well.
Internally un-named groups are recursively given the names _group0,
_group1 etc as they are found.  And then those groups are recursively
matched. And in the final step the resulting _Match object is compressed
and those un-named groups are discarded.

IMO If you don't bother to name a group then you probably aren't going
to be interested in it anyway - so why keep a reference to it?

eg.
If you only wanted to extract the numbers from those verses...

>>> regex='^(((?P<number>\d+) ([^,]+))(, )?)*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
>>> x
{'number': ['12', '11', '10']}

Before the compression stage the _Match object actually looked like this:

{'_group0': {'_value': '12 drummers drumming, 11 pipers piping, 10 lords a-leaping',
             '_group0': [{'_value': '12 drummers drumming, ',
                          '_group1': ', ',
                          '_group0': {'_value': '12 drummers drumming',
                                      '_group1': 'drummers drumming',
                                      'number': '12'}},
                         {'_value': '11 pipers piping, ',
                          '_group1': ', ',
                          '_group0': {'_value': '11 pipers piping',
                                      '_group1': 'pipers piping',
                                      'number': '11'}},
                         {'_value': '10 lords a-leaping',
                          '_group0': {'_value': '10 lords a-leaping',
                                      '_group1': 'lords a-leaping',
                                      'number': '10'}}]}}

But the compression algorithm collected the named groups and brought
them to the surface, to return the much nicer looking:

{'number': ['12', '11', '10']}
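
The compression step described above could be sketched roughly as follows.
This is a guess at the behaviour based only on the description and the two
dictionaries shown in this post, not the actual re2 code; the function name
compress and its merging rules are assumptions. In particular, this sketch
returns a bare value rather than a one-element list when a name occurs only
once, and it drops the '_value' text entirely.

```python
def compress(node):
    """Collect named groups at or below node, dropping '_groupN' wrappers."""
    surfaced = {}

    def add(name, value):
        # The first occurrence is stored as-is; repeats are gathered in a list.
        if name not in surfaced:
            surfaced[name] = value
            return
        if not isinstance(surfaced[name], list):
            surfaced[name] = [surfaced[name]]
        if isinstance(value, list):
            surfaced[name].extend(value)
        else:
            surfaced[name].append(value)

    for key, child in node.items():
        if key == '_value':
            continue  # full matched text; not kept in this simplified view
        items = child if isinstance(child, list) else [child]
        for item in items:
            if key.startswith('_group'):
                # Unnamed group: keep only the named groups nested inside it.
                if isinstance(item, dict):
                    for name, value in compress(item).items():
                        add(name, value)
            elif isinstance(item, dict):
                add(key, compress(item))
            else:
                add(key, item)
    return surfaced


before = {'_group0': {
    '_value': '12 drummers drumming, 11 pipers piping, 10 lords a-leaping',
    '_group0': [
        {'_value': '12 drummers drumming, ', '_group1': ', ',
         '_group0': {'_value': '12 drummers drumming',
                     '_group1': 'drummers drumming', 'number': '12'}},
        {'_value': '11 pipers piping, ', '_group1': ', ',
         '_group0': {'_value': '11 pipers piping',
                     '_group1': 'pipers piping', 'number': '11'}},
        {'_value': '10 lords a-leaping',
         '_group0': {'_value': '10 lords a-leaping',
                     '_group1': 'lords a-leaping', 'number': '10'}},
    ]}}

print(compress(before))
```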


NB. There are also a few other tricks up the sleeve of re2.

eg.
It allows for named groups to be repeated in different branches of a
named group hierarchy, without the name redefinition error that the re
library will complain about.

eg.
>>> pat1=re2.compile(
...   '(?P<parents>(?P<mother>(?P<name>[\w ]+)),(?P<father>(?P<name>[\w ]+)))'
... )
>>> pat1.extract('Mum,Dad')
{'parents': {'father': {'name': 'Dad'}, 'mother': {'name': 'Mum'}}}
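
For reference, a short sketch of what the standard re module does with the
same idea: compiling a pattern that reuses a group name raises re.error, so
with plain re the usual workaround is to give each branch its own name and
rebuild the hierarchy by hand. The names mother_name and father_name are
invented for this illustration.

```python
import re

# Reusing the name "name" in two branches is rejected by the standard module.
raised = False
try:
    re.compile(r'(?P<mother>(?P<name>[\w ]+)),(?P<father>(?P<name>[\w ]+))')
except re.error as exc:
    raised = True
    print('re.error:', exc)

# Workaround with plain re: unique names, hierarchy rebuilt manually.
pat = re.compile(
    r'(?P<mother>(?P<mother_name>[\w ]+)),(?P<father>(?P<father_name>[\w ]+))')
m = pat.match('Mum,Dad')
parents = {'mother': {'name': m.group('mother_name')},
           'father': {'name': m.group('father_name')}}
print({'parents': parents})
```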


> I find the feature very interesting, but being used to living without it,
> I have difficulty evaluating its usefulness.

Yes - this is a good point too, because it ~is~ different from the re
library.  re2 aims to do all that searching, grouping, iterating and
collecting and constructing work for you.

> However, it reminds me how much at first I found strange that only the
> last match was kept, so I think, FWIW, that on a purist point of view the
> functionality would make sense in the stdlib in some way or another.

Actually that "last match only" confusion was part of the motivation for
writing it in the first place.


> For .verse[1] or .verse[2] to make sense, it implies that the pattern is
> something like...
> ((?P... )(?P...))
> ... which it isn't.

Good pickup!
You've seen through my smoke and mirrors.  ;-)
That list of verses was actually created in the compression stage.
(The stage that I failed to mention in my first post.)

ie. The regex was:

((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*

Which returns an un-named list of verse groups.

Something like:

{'_group0': [{'verse': {'number': '12', 'activity': 'drummers drumming'}},
             {'verse': {'number': '11', 'activity': 'pipers piping'}},
             {'verse': {'number': '10', 'activity': 'lords a-leaping'}}]}

But the compression algorithm discarded that '_group0' key and brought
the 'verse' groups to the surface, then grouped them together in one
'verse' list.

ie. to make:

{'verse': [{'number': '12', 'activity': 'drummers drumming'},
           {'number': '11', 'activity': 'pipers piping'},
           {'number': '10', 'activity': 'lords a-leaping'}]}

> > Also, should it be limited to named groups?
>
> Probably not.  I would suggest using matchobj.group(i) semantics to
> match the standard re module semantics, though only allow returning
> items in the current level of the hierarchy.  That is, one could use
>