[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-29 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20101230.zip is a new version of the regex module. I've delayed the building of the tables for fast searching until their first use, which, hopefully, will mean that fewer will be actually built. -- Added file: http://bugs.pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: The project is now at: https://code.google.com/p/mrab-regex/ Unfortunately it doesn't have the revision history. I don't know why not. -- ___ Python tracker <http://bugs.python.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: msg124904: It would, of course, be slower on first use, but I'm surprised that it's (that much) slower afterwards. msg124905, msg124906: I have those matching now. msg124931: The sources are in TortoiseBzr, but I couldn't upload, so

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-30 Thread Matthew Barnett
Matthew Barnett added the comment: Even after much uninstalling and reinstalling (and reboots) I never got TortoiseSVN to work properly, so I switched to TortoiseHg. The sources are now at: https://code.google.com/p/mrab-regex-hg

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Matthew Barnett
Matthew Barnett added the comment: Why not? :-) -- ___ Python tracker <http://bugs.python.org/issue2636> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-31 Thread Matthew Barnett
Matthew Barnett added the comment: Just to check, does this still work with your changes of msg124959? regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX") For me it fails to match! -- ___ Python tracker <http://bu

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-03 Thread Matthew Barnett
Matthew Barnett added the comment: I've just done a bug fix. The issue is at: https://code.google.com/p/mrab-regex-hg/ BTW, Jacques, I trust that your regression tests don't test how long a regex takes to fail to match, because a bug could cause such a non-match to occur to

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-16 Thread Matthew Barnett
Matthew Barnett added the comment: That line crept in somehow. As it's been there since the 2010-12-24 release and you're the first one to have a problem with it (and you've already fixed it), it looks like a new upload isn't urgently needed (I don't have any

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-01-25 Thread Matthew Barnett
Matthew Barnett added the comment: I've reduced the size of some internal tables. -- ___ Python tracker <http://bugs.python.org/issue2636> ___ ___ Pytho

[issue11198] re sub subn backreferrence too few replacements

2011-02-11 Thread Matthew Barnett
Matthew Barnett added the comment: Argument 4 of re.subn(...) is 'count', the maximum number of replacements to perform, but you're passing in the MULTILINE flag, which happens to have the integer value 8, hence you're limiting the maximum number
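The mix-up described in this comment can be sketched with the standard `re` module. The sample text and variable names below are made up for illustration; the only facts relied on are that `re.MULTILINE` has the integer value 8 and that `count` comes before `flags` in the signature of `re.subn`.

```python
import re

# Made-up sample: ten lines, each containing a single "a".
text = "\n".join(["a"] * 10)

# The flag is an integer under the hood, which is why it is accepted as a count.
flag_value = int(re.MULTILINE)

# What the reporter effectively did: the flag's value ends up as `count`,
# so at most 8 replacements are made and MULTILINE is never applied.
_, n_limited = re.subn(r"a", "b", text, count=flag_value)

# Without MULTILINE, "^" only matches at the very start of the string.
_, n_no_flag = re.subn(r"^a", "b", text, count=flag_value)

# Passing the flag by keyword gives the intended behaviour.
_, n_correct = re.subn(r"^a", "b", text, flags=re.MULTILINE)

print(flag_value, n_limited, n_no_flag, n_correct)
```

Passing `flags=` by keyword is probably the simplest way to avoid the problem, since it cannot be mistaken for `count`.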

[issue3262] re.split doesn't split with zero-width regex

2008-07-02 Thread Matthew Barnett
New submission from Matthew Barnett <[EMAIL PROTECTED]>: re.split doesn't split a string when the regex matches zero characters. For example: re.split(r'\b', 'a b') returns ['a b'] instead of ['', 'a', &#
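For readers on a current interpreter, a quick check of the reported case is sketched below. It assumes Python 3.7 or later, where `re.split` does split on zero-width matches; on the versions discussed in this thread the call returned `['a b']` as reported.

```python
import re

# Split on word boundaries, which are zero-width matches.
parts = re.split(r"\b", "a b")
print(parts)

# Whatever the split points are, joining the pieces restores the input.
rejoined = "".join(parts)
```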

[issue3262] re.split doesn't split with zero-width regex

2008-07-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: The attached patch appears to work. -- keywords: +patch Added file: http://bugs.python.org/file10794/split_zero_width.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3262] re.split doesn't split with zero-width regex

2008-07-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I've found that this issue has been discussed before: #988761. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.

[issue3262] re.split doesn't split with zero-width regex

2008-07-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: New patch version after studying #988761 and doing more testing. Added file: http://bugs.python.org/file10797/split_zero_width_2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://

[issue3262] re.split doesn't split with zero-width regex

2008-07-02 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file10794/split_zero_width.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3262] re.split doesn't split with zero-width regex

2008-07-08 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: There appear to be 2 opinions on this issue: 1. It's a bug, a corner case that got missed. 2. It's always been like this, so it's probably a design decision, although no-one can point to where or when the decisi

[issue3511] Incorrect charset range handling with ignore case flag?

2008-08-06 Thread Matthew Barnett
New submission from Matthew Barnett <[EMAIL PROTECTED]>: While working on the regex code in sre_compile.py I came across the following code in the handling of charset ranges in _optimize_charset: for i in range(fixup(av[0]), fixup(av[1])+1): charmap[i] = 1 The function

[issue3654] Duplicated test name in regex test script

2008-08-23 Thread Matthew Barnett
New submission from Matthew Barnett <[EMAIL PROTECTED]>: The regex test script test_re.py has 2 tests called 'test_ignore_case'. -- components: Tests messages: 71813 nosy: mrabarnett severity: normal status: open title: Duplicated test name in regex test script vers

[issue516762] have a way to search backwards for re

2008-09-08 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Does this request still stand? I'm working on the re module at the moment. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.py

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-09 Thread Matthew Barnett
New submission from Matthew Barnett <[EMAIL PROTECTED]>: This is a major reworking of the re module in Python 2.5.2. Added atomic groups. Added possessive quantifiers. Lookbehinds can now be variable length. Typically x2 faster. More changes to follow. -- components: R

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-09 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11447/regex_2.5.2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-09 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Corrected the diff file. I worked from Python 2.5.2 because that's what I'm currently using. I'll work from the trunk in future. Added file: http://bugs.python.org/file11451/regex_2.5.2.diff _

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-10 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: This is different work from a different author than #2636. I've submitted what I've done so far in case my computer gets hit by a bus. :-) I still have more work to do on it, so I'm not concerned that it might not ge

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-13 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11451/regex_2.5.2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-13 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Corrected the diff file, again. :-( The atomic groups and possessive quantifiers are as described at http://www.regular-expressions.info. Added file: http://bugs.python.org/file11484/regex_2.5.

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-15 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I know what you mean about the dependencies! My current problem is that now I'm working with the current trunk, which means using Visual C++ Express 2008 instead of 2005. When debugging it's behaving like the debug inf

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-15 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Used Visual C++ Express 2005 and the PC\VS8.0 directory. Same problem. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-15 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: _sre.c is over 6000 lines, but it does contain macros. I didn't have this problem when based on Python 2.5.2 in Express 2005. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-20 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: This patch is now based on Python 2.6rc2. I've reduced the number of macros and used functions instead, provided that it didn't cost much in terms of speed. In many cases it should be faster than the current release, and at

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-20 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11530/regex_2.6rc2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-20 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Bugfix. Added file: http://bugs.python.org/file11532/regex_2.6rc2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3262] re.split doesn't split with zero-width regex

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I wonder whether it could be put into Python 3 where certain breaks in backwards compatibility are to be expected. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Fixed the matching of word boundaries when searching and matching in substrings. Added file: http://bugs.python.org/file11543/regex_2.6rc2+1.diff ___ Python tracker <[EMAIL PROTECTE

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: regex_2.6rc2+2.diff is a bugfix for capture groups in look-behinds. Added file: http://bugs.python.org/file11552/regex_2.6rc2+2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-21 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11552/regex_2.6rc2+2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Needed to correct regex_2.6rc2+2.diff. Added file: http://bugs.python.org/file11553/regex_2.6rc2+2.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: regex_2.6rc2+3.diff adds reverse searching with the re.REVERSE/re.R and "(?r)" flag. This gives results such as: >>> re.findall("(\w+)", "one two three") ['one', 'two', &#x

[issue516762] have a way to search backwards for re

2008-09-21 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Implemented as part of #3825. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue516762> ___

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-22 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: regex_2.6rc2+4.diff fixes the ordering of the capture groups for reverse searching. Added file: http://bugs.python.org/file11558/regex_2.6rc2+4.diff ___ Python tracker <[EMAIL PROTECTE

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-22 Thread Matthew Barnett
Changes by Matthew Barnett <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11558/regex_2.6rc2+4.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-22 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Correction of regex_2.6rc2+4.diff. (Aargh!) Added file: http://bugs.python.org/file11559/regex_2.6rc2+4.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue433031] SRE: x++ isn't supported

2008-09-22 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Implemented in #2636 and #3825. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-23 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Patch regex_2.6rc2+5.diff adds scoped and 'negative' flags for (?i), (?m) and (?s). The other flags remain unchanged in behaviour. See #433024, #433027 and #433028. Added file: http://bugs.python.org/file11585/reg

[issue433024] SRE: (?flag) isn't properly scoped

2008-09-23 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Implemented in #3825. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.

[issue433027] SRE: (?-flag) is not supported.

2008-09-23 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Implemented in #3825. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.

[issue433028] SRE: (?flag:...) is not supported

2008-09-23 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Implemented in #3825. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-23 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Patch regex_2.6rc2+6.diff is a bugfix. Added file: http://bugs.python.org/file11587/regex_2.6rc2+6.diff ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Comparing item 2 and item 3, I think that item 3 is the Pythonic choice and item 2 is a bad idea. Item 4: back-references in the pattern are like \1 and (?P=name), not \g<1> or \g<name>, and in the replacement string are like \g<1&

[issue1647489] zero-length match confuses re.finditer()

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: This also affects re.findall(). -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.o

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Regarding item 22: there's also #1647489 ("zero-length match confuses re.finditer()"). This had me stumped for a while, but I might have a solution. I'll see whether it'll fix item 22 too. I wasn't plan

[issue1647489] zero-length match confuses re.finditer()

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: What should: [m.groups() for m in re.finditer(r'(^z*)|(^q*)|(\w+)', 'abc')] return? Should the second group also yield a zero-width match before the third group is tried? I think it

[issue1647489] zero-length match confuses re.finditer()

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: What about r'(^z*)|(q*)|(\w+)'? I could imagine that the first group could match only at the start of the string, but if the second group doesn't have that restriction then it could match the second time, and only after

[issue1647489] zero-length match confuses re.finditer()

2008-09-24 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: FYI, I posted msg73737 after finding that the fix for the original case was really very simple, but then thought about whether it would behave as expected when there were more zero-width matches, hence the later

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Just out of interest, is there any plan to include #1160 while we're at it? ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: For reference, these are all the regex-related issues that I've found (including this one!): id : activity : title #2636: 25/09/08 : Regexp 2.7 (modifications to current re 2.2.2) #1160: 25/09/08 : Medium size reg

[issue1647489] zero-length match confuses re.finditer()

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I have to report that the fix appears to be successful: >>> print [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')] [('', None), (None, 'abc')] >>> print re.findall(r&
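The same check can be repeated on a current interpreter with the standard `re` module. This is only a sketch and assumes Python 3.7 or later, where an empty match no longer causes the following match to skip a character; it is not the patched module discussed in this thread.

```python
import re

pattern = r"(^z*)|(\w+)"

# First an empty match of group 1 at position 0, then group 2 matches the
# whole word starting from that same position.
groups = [m.groups() for m in re.finditer(pattern, "abc")]

# findall reports non-participating groups as empty strings rather than None.
found = re.findall(pattern, "abc")

print(groups)
print(found)
```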

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: #814253 is part of the fix for variable-width lookbehind. BTW, I've just tried a second time to register with Launchpad, but still no reply. :-( ___ Python tracker <[EMAIL PRO

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Tried [EMAIL PROTECTED] twice, no reply. Succeeded with [EMAIL PROTECTED] ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I've been completely unable to get Bazaar to work with Launchpad: authentication errors and bzrlib.errors.TooManyConcurrentRequests. ___ Python tracker <[EMAIL PROTECTED]> <ht

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-26 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I have it working finally! ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2636> ___ _

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-26 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I did a search on the permissions problem: https://answers.launchpad.net/bzr/+question/34332. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-27 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I haven't yet found out how to turn on compression when getting the branches, so I've only looked at lp:~pythonregexp2.7/python/issue2636+01+09-02+17+18+19+20+21+24+26. I did see that the SRE_FLAG_REVERSE flag was miss

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-29 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: issue2636-01+09-02+17_backport.diff is the backport fix. Still unable to compress the download, so that's >200MB each time! Added file: http://bugs.python.org/file11657/issue2636-01+09-02+17

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-09-30 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: The explanation of the zero-width bug is incorrect. What happens is this: The functions for finditer(), findall(), etc, perform searches and want the next one to continue from where the previous match ended. However, if the mat

[issue694374] Recursive regular expressions

2008-09-30 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I'll have a look at this. No promises, though. -- nosy: +mrabarnett ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.py

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-10-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I've found an interesting difference between Python and Perl regular expressions: In Python: \Z matches at the end of the string In Perl: \Z matches at the end of the string or before a newline at the
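The Python side of this difference can be seen with the standard `re` module. A minimal sketch follows; the sample string is made up, and the Perl behaviour is not exercised here.

```python
import re

s = "abc\n"  # made-up sample ending in a newline

# In Python, \Z matches only at the very end of the string,
# so it does not match before the trailing newline.
end_only = re.search(r"abc\Z", s)

# "$" (without MULTILINE) also matches just before a trailing newline,
# which is closer to what Perl's \Z does.
dollar = re.search(r"abc$", s)

# Including the newline explicitly lets \Z match.
with_newline = re.search(r"abc\n\Z", s)

print(end_only, dollar, with_newline)
```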

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-10-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Perl v5.10 offers the ability to have duplicate capture group numbers in branches. For example: (?|(a)|(b)) would number both of the capture groups as group 1. Something to include? ___

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-10-02 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: I've extended the group referencing. It now has: Forward group references (\2two|(one))+ \g-type group references (n is name or number) \g<n> (Python re replacement string) \g{n} (Perl) \g'n' (Per

[issue694374] Recursive regular expressions

2008-10-13 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Perl uses (?number) for calling numbered groups and (?&name) for named groups (Perl also supports (?P>name)). (?R) is equivalent to (?0). It's interesting that the documentation for both Perl and PCRE says that they support

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-03 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100204.zip is a new version of the regex module. I've added splititer and added a build for Python 3.1. -- versions: +Python 3.1 Added file: http://bugs.python.org/file16122/issue2636-2010020

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-09 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100210.zip is a new version of the regex module. The reported bugs appear to be fixed now. -- Added file: http://bugs.python.org/file16195/issue2636-20100210.zip ___ Python tracker <h

[issue1160] Medium size regexp crashes python

2010-02-09 Thread Matthew Barnett
Matthew Barnett added the comment: As stated in msg73781, this is being addressed in issue #2636. My regex module handles the test case without complaint: >>> import regex >>> r = regex.compile('|'.join('%d'%x for x in range(7000))) >>> r.match(&

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-10 Thread Matthew Barnett
Matthew Barnett added the comment: I've been aware for some time that exception messages in Python 2 can't be Unicode, but I wasn't sure which encoding to use, so I've decided to use that of sys.stdout. It appears to work OK in IDLE and at the Python prompt. issue2636

[issue2537] re.compile(r'((x|y+)*)*') should fail

2010-02-11 Thread Matthew Barnett
Matthew Barnett added the comment: The re module is addressed in issue #2636. BTW, my regex module behaves like Ruby: >>> regex.sub(r"((x|y)*)*", "(\\1, \\2)", "xyyzy", count=1) '(, y)zy' >>> regex.sub(r"((x|y+)*)*", &quo

[issue2537] re.compile(r'((x|y+)*)*') should fail

2010-02-11 Thread Matthew Barnett
Matthew Barnett added the comment: The issue started about updating the re module and adding features that other languages already possess in their regex implementations (the last time any significant work was done on it was in 2003). The hope is that the new regex implementation will

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-17 Thread Matthew Barnett
Matthew Barnett added the comment: The main text at http://pypi.python.org/pypi/regex appears to have lost its backslashes, for example: The Unicode escapes u and U are supported. instead of: The Unicode escapes \u and \U are supported

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-17 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100218.zip is a new version of the regex module. I've added '.' to the permitted characters when parsing the name of a property. The name itself is no longer reported in the error message. I've also corrected the positi

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-18 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100219.zip is a new version of the regex module. The regex module should give the same results as the re module for backwards compatibility. The ignorecase bug is now fixed. This new version releases the GIL when matching on str and bytes (str

[issue7951] Should str.format allow negative indexes when used for __getitem__ access?

2010-02-18 Thread Matthew Barnett
Matthew Barnett added the comment: On a related note, this doesn't work either: >>> "{-1}".format("x", "y", "z") Traceback (most recent call last): File "", line 1, in "{-1}".format("x", "y"
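A small sketch of how `str.format` treats negative numbers in field names, as observed on Python 3. The helper `outcome` is hypothetical and exists only to capture the exception type instead of printing a traceback.

```python
def outcome(template, *args):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return template.format(*args)
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__

items = ["x", "y", "z"]

# "-1" is not a valid positional index, so it is looked up as a keyword name.
positional = outcome("{-1}", *items)

# Inside brackets, "-1" is not all digits, so it is passed to __getitem__
# as the string "-1", which a list rejects.
bracketed = outcome("{0[-1]}", items)

# A non-negative index works in both positions.
plain = outcome("{0[2]}", items)

print(positional, bracketed, plain)
```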

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-22 Thread Matthew Barnett
Matthew Barnett added the comment: I don't know what happened there. I didn't notice that the zip file was way too small. Here's a replacement (still called issue2636-20100222.zip). Unicode script properties are already included, at least those whose definitions at htt

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-22 Thread Matthew Barnett
Matthew Barnett added the comment: OK, you've convinced me, \X is supported. :-) issue2636-20100223.zip is a new version of the regex module. -- Added file: http://bugs.python.org/file16331/issue2636-20100223.zip ___ Python tracker

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-24 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100224.zip is a new version of the regex module. It includes support for matching based on Unicode scripts as well as on Unicode blocks and properties. -- Added file: http://bugs.python.org/file16362/issue2636-20100224.zip

[issue1528154] New sequences for Unicode groups and block ranges needed

2010-02-25 Thread Matthew Barnett
Matthew Barnett added the comment: \p{name} is supported for Unicode properties, scripts and blocks in my regex module (see issue #2636). It also supports the POSIX set syntax, although I'm not sure that we really need to have 2 ways of doing it, eg \p{Alpha} and [[:

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-02-25 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100226.zip is a new version of the regex module. It now supports the branch reset (?|...|...), enabling the different branches of an alternation to reuse group numbers. -- Added file: http://bugs.python.org/file16375/issue2636-20100226

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-03-03 Thread Matthew Barnett
Matthew Barnett added the comment: \X shouldn't be allowed in a character class because it's equivalent to \P{M}\p{M}*. It's a bug, now fixed in issue2636-20100304.zip. I'm not convinced about the set intersection and difference stuff. Isn't that overdoing it a litt

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-03-22 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100323.zip is a new version of the regex module. It now includes a test script. Most of the tests come from the existing test scripts. -- Added file: http://bugs.python.org/file16626/issue2636-20100323.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-03-31 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100331.zip is a new version of the regex module. It includes speed-ups and a minor bugfix. -- Added file: http://bugs.python.org/file16709/issue2636-20100331.zip ___ Python tracker <h

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-04-12 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100413.zip is a new version of the regex module. It includes additional speed-ups. -- Added file: http://bugs.python.org/file16905/issue2636-20100413.zip ___ Python tracker <http://bugs.python.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-04-13 Thread Matthew Barnett
Matthew Barnett added the comment: Yes, it passed all the tests, although I've since found a minor bug that isn't covered/caught by them, so I'll need to add a few more tests. Anyway, do: regex.match(ur"\p{Ll}", u"a") regex.match(ur'(?u)\w'

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-04-13 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100414.zip is a new version of the regex module. I think I might have identified the cause of the problem, although I still haven't been able to reproduce it, so I can't be certain. --

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-04-13 Thread Matthew Barnett
Matthew Barnett added the comment: Oops, forgot the file! :-) -- Added file: http://bugs.python.org/file16916/issue2636-20100414.zip ___ Python tracker <http://bugs.python.org/issue2

[issue8465] Backreferences vs. escapes: a silent failure solved

2010-04-20 Thread Matthew Barnett
Matthew Barnett added the comment: Octal escapes are at most 3 octal digits, so the normal way to handle "\41" + "1" is "\0411". Some languages support variable-length hex escapes of the form "\x{1B}", so we could add that and also "\o{41}"
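The padding trick mentioned here can be checked directly. The sketch below uses only string literals and the standard `re` module; the `\g<1>` form shown at the end is the existing way to separate a group reference from a following digit in a replacement string.

```python
import re

# An octal escape takes at most 3 digits, so "\0411" is "\041" followed by "1".
padded = "\0411"
concatenated = "\41" + "1"

# The same rule applies inside a regex pattern: a leading 0 forces an octal
# escape rather than a group reference.
pattern_match = re.fullmatch(r"\0411", "!1")

# In a replacement string, \g<1> followed by "1" means group 1 then a literal
# "1", whereas "\11" would be read as a reference to group 11.
replaced = re.sub(r"(a)", r"\g<1>1", "a")

print(repr(padded), repr(concatenated), replaced)
```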

[issue3262] re.split doesn't split with zero-width regex

2010-04-26 Thread Matthew Barnett
Matthew Barnett added the comment: You could try the regex module mentioned in issue 2636. -- ___ Python tracker <http://bugs.python.org/issue3262> ___ ___ Pytho

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2008-10-17 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Further to msg74203, I can see no reason why we can't allow duplicate capture group names if the groups are on different branches and are thus mutually exclusive. For example: (?P<name>a)|(?P<name>b) Apart from this I think that dupl
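For comparison, the standard `re` module rejects duplicate group names even on mutually exclusive branches. The sketch below assumes the example pattern in the comment uses the same group name on both branches, i.e. `(?P<name>a)|(?P<name>b)`; the group names `first` and `second` in the workaround are made up.

```python
import re

# The stdlib re module refuses to compile a pattern that reuses a group name.
try:
    re.compile(r"(?P<name>a)|(?P<name>b)")
    rejected = False
except re.error:
    rejected = True

# One possible workaround: give each branch its own name and read whichever
# group actually took part in the match via lastgroup.
m = re.match(r"(?P<first>a)|(?P<second>b)", "b")
value = m.group(m.lastgroup)

print(rejected, m.lastgroup, value)
```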

[issue4328] "à" in u"foo" raises a misleading error

2008-11-15 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: The left operand is a bytestring and the right operand is a unicode string, so it makes sense that it raises an exception, although it would be clearer if it said "'in ' requires unicode string as left operand".

[issue4430] time.strptime does not allow same format directive twice

2008-11-25 Thread Matthew Barnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment: Subversion is formatting a string from a time (strftime), so a repeated placeholder is OK. You're trying to _parse_ a time from a string (strptime). If you're telling it that 2 different parts of the string are the date, w

[issue4971] Incorrect title case

2009-01-17 Thread Matthew Barnett
New submission from Matthew Barnett : I've found that the following 4 Unicode characters/codepoints don't behave as I'd expect: Dž (U+01C5), Lj (U+01C8), Nj (U+01CB), Dz (U+01F2). For example, u'\u01C5'.istitle() returns True and unicodedata.category(u'\u01C5'

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-02-03 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-features.diff is based on Python 2.6. It includes: Named Unicode characters eg \N{LATIN CAPITAL LETTER A} Unicode character properties eg \p{Lu} (uppercase letter) and \P{Lu} (not uppercase letter) Other character properties not restricted to

[issue1519638] Unmatched Group issue - workaround

2009-02-05 Thread Matthew Barnett
Matthew Barnett added the comment: This has been addressed in issue #2636. -- nosy: +mrabarnett ___ Python tracker <http://bugs.python.org/issue1519638> ___ ___

[issue1693050] \w not helpful for non-Roman scripts

2009-02-05 Thread Matthew Barnett
Matthew Barnett added the comment: In issue #2636 I'm using the following: Alpha is Ll, Lo, Lt, Lu. Digit is Nd. Word is Ll, Lo, Lt, Lu, Mc, Me, Mn, Nd, Nl, No, Pc. These are what are specified at http://www.regular-expressions.info/posixbrackets.html -- nosy: +mraba
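The category lists above can be expressed as a small lookup using the standard `unicodedata` module. This is only a sketch of the classification as stated in the comment, not the code from the regex module; the function names are made up.

```python
import unicodedata

# General categories as listed in the comment.
ALPHA = {"Ll", "Lo", "Lt", "Lu"}
DIGIT = {"Nd"}
WORD = ALPHA | DIGIT | {"Mc", "Me", "Mn", "Nl", "No", "Pc"}

def is_alpha(ch):
    """True if the character's general category is one of the Alpha categories."""
    return unicodedata.category(ch) in ALPHA

def is_digit(ch):
    """True if the character is a decimal digit (category Nd)."""
    return unicodedata.category(ch) in DIGIT

def is_word(ch):
    """True if the character counts as a word character under the list above."""
    return unicodedata.category(ch) in WORD

# A few non-Roman examples: Cyrillic letter, Arabic-Indic digit, underscore.
print(is_alpha("\u0416"), is_digit("\u0663"), is_word("_"), is_word(" "))
```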
