Matthew Barnett added the comment:
issue2636-20101230.zip is a new version of the regex module.
I've delayed the building of the tables for fast searching until their first
use, which, hopefully, will mean that fewer will be actually built.
--
Added file: http://bugs.pytho
Matthew Barnett added the comment:
The project is now at:
https://code.google.com/p/mrab-regex/
Unfortunately it doesn't have the revision history. I don't know why not.
--
___
Python tracker
<http://bugs.python.
Matthew Barnett added the comment:
msg124904: It would, of course, be slower on first use, but I'm surprised that
it's (that much) slower afterwards.
msg124905, msg124906: I have those matching now.
msg124931: The sources are in TortoiseBzr, but I couldn't upload, so
Matthew Barnett added the comment:
Even after much uninstalling and reinstalling (and reboots) I never got
TortoiseSVN to work properly, so I switched to TortoiseHg. The sources are now
at:
https://code.google.com/p/mrab-regex-hg
Matthew Barnett added the comment:
Why not? :-)
--
___
Python tracker
<http://bugs.python.org/issue2636>
Matthew Barnett added the comment:
Just to check, does this still work with your changes of msg124959?
regex.search(r'\d{4}(\s*\w)?\W*((?!\d)\w){2}', "XX")
For me it fails to match!
Matthew Barnett added the comment:
I've just done a bug fix. The issue is at:
https://code.google.com/p/mrab-regex-hg/
BTW, Jacques, I trust that your regression tests don't test how long a regex
takes to fail to match, because a bug could cause such a non-match to occur to
Matthew Barnett added the comment:
That line crept in somehow.
As it's been there since the 2010-12-24 release and you're the first one to
have a problem with it (and you've already fixed it), it looks like a new
upload isn't urgently needed (I don't have any
Matthew Barnett added the comment:
I've reduced the size of some internal tables.
Matthew Barnett added the comment:
Argument 4 of re.subn(...) is 'count', the maximum number of replacements to
perform, but you're passing in the MULTILINE flag, which happens to have the
integer value 8, hence you're limiting the maximum number
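A minimal sketch of the mix-up described above, assuming the Python 3 signature re.subn(pattern, repl, string, count=0, flags=0). The sample text and variable names are made up for illustration. Passing the flag by keyword keeps it from being read as 'count':

```python
import re

# Ten lines, each consisting of a single "x" (illustrative data).
text = "\n".join(["x"] * 10)

# Intended call: the flag goes in by keyword, so every line start is replaced.
result, n_all = re.subn(r"^x", "y", text, flags=re.MULTILINE)

# MULTILINE has the integer value 8, so passing it in the 'count' slot
# behaves like count=8 with no flags at all.
limit = int(re.MULTILINE)
_, n_limited = re.subn(r"x", "y", text, count=limit)

print(n_all, limit, n_limited)
```

With the flag in the wrong slot, only the first 8 of the 10 occurrences are replaced.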
New submission from Matthew Barnett <[EMAIL PROTECTED]>:
re.split doesn't split a string when the regex matches zero characters.
For example:
re.split(r'\b', 'a b') returns ['a b'] instead of ['', 'a',
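For comparison, a small check of the same call on a current interpreter. This assumes Python 3.7 or later, where re.split() does split on empty matches. On the versions discussed in this report it returns ['a b'].

```python
import re

# On Python 3.7+ a zero-width match such as \b is a valid split point.
parts = re.split(r"\b", "a b")
print(parts)
```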
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
The attached patch appears to work.
--
keywords: +patch
Added file: http://bugs.python.org/file10794/split_zero_width.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I've found that this issue has been discussed before: #988761.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
New patch version after studying #988761 and doing more testing.
Added file: http://bugs.python.org/file10797/split_zero_width_2.diff
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file10794/split_zero_width.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
There appear to be 2 opinions on this issue:
1. It's a bug, a corner case that got missed.
2. It's always been like this, so it's probably a design decision,
although no-one can point to where or when the decisi
New submission from Matthew Barnett <[EMAIL PROTECTED]>:
While working on the regex code in sre_compile.py I came across the
following code in the handling of charset ranges in _optimize_charset:
for i in range(fixup(av[0]), fixup(av[1])+1):
charmap[i] = 1
The function
New submission from Matthew Barnett <[EMAIL PROTECTED]>:
The regex test script test_re.py has 2 tests called 'test_ignore_case'.
--
components: Tests
messages: 71813
nosy: mrabarnett
severity: normal
status: open
title: Duplicated test name in regex test script
vers
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Does this request still stand? I'm working on the re module at the moment.
--
nosy: +mrabarnett
New submission from Matthew Barnett <[EMAIL PROTECTED]>:
This is a major reworking of the re module in Python 2.5.2.
Added atomic groups.
Added possessive quantifiers.
Lookbehinds can now be variable length.
Typically 2x faster.
More changes to follow.
--
components: R
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file11447/regex_2.5.2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Corrected the diff file. I worked from Python 2.5.2 because that's what
I'm currently using. I'll work from the trunk in future.
Added file: http://bugs.python.org/file11451/regex_2.5.2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
This is different work from a different author than #2636. I've
submitted what I've done so far in case my computer gets hit by a bus.
:-) I still have more work to do on it, so I'm not concerned that it
might not ge
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file11451/regex_2.5.2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Corrected the diff file, again. :-(
The atomic groups and possessive quantifiers are as described at
http://www.regular-expressions.info.
Added file: http://bugs.python.org/file11484/regex_2.5.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I know what you mean about the dependencies!
My current problem is that now I'm working with the current trunk, which
means using Visual C++ Express 2008 instead of 2005. When debugging it's
behaving like the debug inf
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Used Visual C++ Express 2005 and the PC\VS8.0 directory. Same problem.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
_sre.c is over 6000, but it does contain macros. I didn't have this
problem when based on Python 2.5.2 in Express 2005.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
This patch is now based on Python 2.6rc2.
I've reduced the number of macros and used functions instead, provided
that it didn't cost much in terms of speed. In many cases it should be
faster than the current release, and at
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file11530/regex_2.6rc2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Bugfix.
Added file: http://bugs.python.org/file11532/regex_2.6rc2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I wonder whether it could be put into Python 3 where certain breaks in
backwards compatibility are to be expected.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Fixed the matching of word boundaries when searching and matching in
substrings.
Added file: http://bugs.python.org/file11543/regex_2.6rc2+1.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
regex_2.6rc2+2.diff is a bugfix for capture groups in look-behinds.
Added file: http://bugs.python.org/file11552/regex_2.6rc2+2.diff
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file11552/regex_2.6rc2+2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Needed to correct regex_2.6rc2+2.diff.
Added file: http://bugs.python.org/file11553/regex_2.6rc2+2.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
regex_2.6rc2+3.diff adds reverse searching with the re.REVERSE/re.R and
"(?r)" flag.
This gives results such as:
>>> re.findall("(\w+)", "one two three")
['one', 'two',
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Implemented as part of #3825.
___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue516762>
___
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
regex_2.6rc2+4.diff fixes the ordering of the capture groups for reverse
searching.
Added file: http://bugs.python.org/file11558/regex_2.6rc2+4.diff
Changes by Matthew Barnett <[EMAIL PROTECTED]>:
Removed file: http://bugs.python.org/file11558/regex_2.6rc2+4.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Correction of regex_2.6rc2+4.diff. (Aargh!)
Added file: http://bugs.python.org/file11559/regex_2.6rc2+4.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Implemented in #2636 and #3825.
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Patch regex_2.6rc2+5.diff adds scoped and 'negative' flags for (?i),
(?m) and (?s). The other flags remain unchanged in behaviour.
See #433024, #433027 and #433028.
Added file: http://bugs.python.org/file11585/reg
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Implemented in #3825.
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Implemented in #3825.
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Implemented in #3825.
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Patch regex_2.6rc2+6.diff is a bugfix.
Added file: http://bugs.python.org/file11587/regex_2.6rc2+6.diff
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Comparing item 2 and item 3, I think that item 3 is the Pythonic choice
and item 2 is a bad idea.
Item 4: back-references in the pattern are like \1 and (?P=name), not
\g<1> or \g<name>, and in the replacement string are like \g<1>
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
This also affects re.findall().
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Regarding item 22: there's also #1647489 ("zero-length match confuses
re.finditer()").
This had me stumped for a while, but I might have a solution. I'll see
whether it'll fix item 22 too.
I wasn't plan
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
What should:
[m.groups() for m in re.finditer(r'(^z*)|(^q*)|(\w+)', 'abc')]
return? Should the second group also yield a zero-width match before the
third group is tried? I think it
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
What about r'(^z*)|(q*)|(\w+)'? I could imagine that the first group
could match only at the start of the string, but if the second group
doesn't have that restriction then it could match the second time, and
only after
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
FYI, I posted msg73737 after finding that the fix for the original case
was really very simple, but then thought about whether it would behave
as expected when there were more zero-width matches, hence the later
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Just out of interest, is there any plan to include #1160 while we're at it?
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
For reference, these are all the regex-related issues that I've found
(including this one!):
id : activity : title
#2636: 25/09/08 : Regexp 2.7 (modifications to current re 2.2.2)
#1160: 25/09/08 : Medium size reg
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I have to report that the fix appears to be successful:
>>> print [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'abc')]
>>> print re.findall(r&
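The same check written for Python 3 (print as a function), as a sketch. The values in the comments assume Python 3.7 or later, where an empty match no longer causes the following character to be skipped.

```python
import re

# Two alternatives: a zero-width match at the start, then the word itself.
two = [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
print(two)    # [('', None), (None, 'abc')]

# Three alternatives, as in the earlier question about (^q*).
three = [m.groups() for m in re.finditer(r'(^z*)|(^q*)|(\w+)', 'abc')]
print(three)  # [('', None, None), (None, None, 'abc')]
```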
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
#814253 is part of the fix for variable-width lookbehind.
BTW, I've just tried a second time to register with Launchpad, but still
no reply. :-(
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Tried [EMAIL PROTECTED] twice, no reply. Succeeded with
[EMAIL PROTECTED]
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I've been completely unable to get Bazaar to work with Launchpad:
authentication errors and bzrlib.errors.TooManyConcurrentRequests.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I have it working finally!
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I did a search on the permissions problem:
https://answers.launchpad.net/bzr/+question/34332.
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I haven't yet found out how to turn on compression when getting the
branches, so I've only looked at
lp:~pythonregexp2.7/python/issue2636+01+09-02+17+18+19+20+21+24+26. I
did see that the SRE_FLAG_REVERSE flag was miss
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
issue2636-01+09-02+17_backport.diff is the backport fix.
Still unable to compress the download, so that's >200MB each time!
Added file: http://bugs.python.org/file11657/issue2636-01+09-02+17
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
The explanation of the zero-width bug is incorrect. What happens is this:
The functions for finditer(), findall(), etc, perform searches and want
the next one to continue from where the previous match ended. However,
if the mat
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I'll have a look at this. No promises, though.
--
nosy: +mrabarnett
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I've found an interesting difference between Python and Perl regular
expressions:
In Python:
\Z matches at the end of the string
In Perl:
\Z matches at the end of the string or before a newline at the
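A small illustration of the Python side of this difference, using only the standard re module. In Python it is '$' (without MULTILINE) that also matches just before a trailing newline, while \Z matches only at the very end. The sample string is made up for illustration.

```python
import re

s = "abc\n"

# \Z matches only at the very end of the string, after the newline.
at_Z = re.search(r"abc\Z", s)

# $ also matches just before a newline that ends the string.
at_dollar = re.search(r"abc$", s)

print(at_Z, at_dollar)
```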
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Perl v5.10 offers the ability to have duplicate capture group numbers in
branches. For example:
(?|(a)|(b))
would number both of the capture groups as group 1.
Something to include?
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
I've extended the group referencing. It now has:
Forward group references
(\2two|(one))+
\g-type group references
(n is name or number)
\g<n> (Python re replacement string)
\g{n} (Perl)
\g'n' (Per
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Perl (?number) for calling numbered groups and (?&name) for named groups
(Perl also supports (?P>name)). (?R) is equivalent to (?0).
It's interesting that the documentation for both Perl and PCRE say that
they support
Matthew Barnett added the comment:
issue2636-20100204.zip is a new version of the regex module.
I've added splititer and added a build for Python 3.1.
--
versions: +Python 3.1
Added file: http://bugs.python.org/file16122/issue2636-2010020
Matthew Barnett added the comment:
issue2636-20100210.zip is a new version of the regex module.
The reported bugs appear to be fixed now.
--
Added file: http://bugs.python.org/file16195/issue2636-20100210.zip
Matthew Barnett added the comment:
As stated in msg73781, this is being addressed in issue #2636.
My regex module handles the test case without complaint:
>>> import regex
>>> r = regex.compile('|'.join('%d'%x for x in range(7000)))
>>> r.match(&
Matthew Barnett added the comment:
I've been aware for some time that exception messages in Python 2 can't be
Unicode, but I wasn't sure which encoding to use, so I've decided to use that
of sys.stdout.
It appears to work OK in IDLE and at the Python prompt.
issue2636
Matthew Barnett added the comment:
The re module is addressed in issue #2636.
BTW, my regex module behaves like Ruby:
>>> regex.sub(r"((x|y)*)*", "(\\1, \\2)", "xyyzy", count=1)
'(, y)zy'
>>> regex.sub(r"((x|y+)*)*", &quo
Matthew Barnett added the comment:
The issue started about updating the re module and adding features that other
languages already possess in their regex implementations (the last time any
significant work was done on it was in 2003).
The hope is that the new regex implementation will
Matthew Barnett added the comment:
The main text at http://pypi.python.org/pypi/regex appears to have lost its
backslashes, for example:
The Unicode escapes u and U are supported.
instead of:
The Unicode escapes \u and \U are supported
Matthew Barnett added the comment:
issue2636-20100218.zip is a new version of the regex module.
I've added '.' to the permitted characters when parsing the name of a property.
The name itself is no longer reported in the error message.
I've also corrected the positi
Matthew Barnett added the comment:
issue2636-20100219.zip is a new version of the regex module.
The regex module should give the same results as the re module for backwards
compatibility.
The ignorecase bug is now fixed.
This new version releases the GIL when matching on str and bytes (str
Matthew Barnett added the comment:
On a related note, this doesn't work either:
>>> "{-1}".format("x", "y", "z")
Traceback (most recent call last):
File "", line 1, in
"{-1}".format("x", "y"
Matthew Barnett added the comment:
I don't know what happened there. I didn't notice that the zip file was way too
small. Here's a replacement (still called issue2636-20100222.zip).
Unicode script properties are already included, at least those whose
definitions at htt
Matthew Barnett added the comment:
OK, you've convinced me, \X is supported. :-)
issue2636-20100223.zip is a new version of the regex module.
--
Added file: http://bugs.python.org/file16331/issue2636-20100223.zip
Matthew Barnett added the comment:
issue2636-20100224.zip is a new version of the regex module.
It includes support for matching based on Unicode scripts as well as on Unicode
blocks and properties.
--
Added file: http://bugs.python.org/file16362/issue2636-20100224.zip
Matthew Barnett added the comment:
\p{name} is supported for Unicode properties, scripts and blocks in my regex
module (see issue #2636).
It also supports the POSIX set syntax, although I'm not sure that we really
need to have 2 ways of doing it, eg \p{Alpha} and [[:
Matthew Barnett added the comment:
issue2636-20100226.zip is a new version of the regex module.
It now supports the branch reset (?|...|...), enabling the different branches
of an alternation to reuse group numbers.
--
Added file: http://bugs.python.org/file16375/issue2636-20100226
Matthew Barnett added the comment:
\X shouldn't be allowed in a character class because it's equivalent to
\P{M}\p{M}*. It's a bug, now fixed in issue2636-20100304.zip.
I'm not convinced about the set intersection and difference stuff. Isn't that
overdoing it a litt
Matthew Barnett added the comment:
issue2636-20100323.zip is a new version of the regex module.
It now includes a test script. Most of the tests come from the existing test
scripts.
--
Added file: http://bugs.python.org/file16626/issue2636-20100323.zip
Matthew Barnett added the comment:
issue2636-20100331.zip is a new version of the regex module.
It includes speed-ups and a minor bugfix.
--
Added file: http://bugs.python.org/file16709/issue2636-20100331.zip
Matthew Barnett added the comment:
issue2636-20100413.zip is a new version of the regex module.
It includes additional speed-ups.
--
Added file: http://bugs.python.org/file16905/issue2636-20100413.zip
Matthew Barnett added the comment:
Yes, it passed all the tests, although I've since found a minor bug that isn't
covered/caught by them, so I'll need to add a few more tests.
Anyway, do:
regex.match(ur"\p{Ll}", u"a")
regex.match(ur'(?u)\w'
Matthew Barnett added the comment:
issue2636-20100414.zip is a new version of the regex module.
I think I might have identified the cause of the problem, although I still
haven't been able to reproduce it, so I can't be certain.
--
Matthew Barnett added the comment:
Oops, forgot the file! :-)
--
Added file: http://bugs.python.org/file16916/issue2636-20100414.zip
Matthew Barnett added the comment:
Octal escapes are at most 3 octal digits, so the normal way to handle "\41" +
"1" is "\0411".
Some languages support variable-length hex escapes of the form "\x{1B}", so we
could add that and also "\o{41}"
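To illustrate the three-digit limit in Python string literals (a sketch, using just the values from the example above):

```python
# "\41" is octal 41, i.e. "!" (code point 33).
bang = "\41"

# An octal escape stops after three digits, so "\0411" is "\041" followed by "1".
joined = "\0411"

print(bang + "1" == joined, repr(joined))
```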
Matthew Barnett added the comment:
You could try the regex module mentioned in issue 2636.
--
___
Python tracker
<http://bugs.python.org/issue3262>
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Further to msg74203, I can see no reason why we can't allow duplicate
capture group names if the groups are on different branches and are thus
mutually exclusive. For example:
(?P<name>a)|(?P<name>b)
Apart from this I think that dupl
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
The left operand is a bytestring and the right operand is a unicode
string, so it makes sense that it raises an exception, although it would
be clearer if it said "'in <string>' requires unicode string as left
operand".
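A sketch of the same situation in Python 3 terms, where the two types are bytes and str. The exact wording of the message may vary between versions, so the example only captures it rather than assuming particular text:

```python
# Left operand is bytes, right operand is str: the 'in' test is rejected.
try:
    b"a" in "abc"
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```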
Matthew Barnett <[EMAIL PROTECTED]> added the comment:
Subversion is formatting a string from a time (strftime), so a repeated
placeholder is OK.
You're trying to _parse_ a time from a string (strptime). If you're
telling it that 2 different parts of the string are the date, w
New submission from Matthew Barnett :
I've found that the following 4 Unicode characters/codepoints don't
behave as I'd expect: Dž (U+01C5), Lj (U+01C8), Nj (U+01CB), Dz (U+01F2).
For example, u'\u01C5'.istitle() returns True and
unicodedata.category(u'\u01C5'
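The properties mentioned can be checked with the standard unicodedata module. A sketch in Python 3 syntax (no u'' prefix needed); 'Lt' is the Unicode general category for titlecase letters, and the variable names are only illustrative:

```python
import unicodedata

# The four digraph code points named above.
chars = ["\u01C5", "\u01C8", "\u01CB", "\u01F2"]

for ch in chars:
    # Print the code point, its general category, and the case predicates.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          ch.istitle(), ch.isupper(), ch.islower())

categories = [unicodedata.category(ch) for ch in chars]
```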
Matthew Barnett added the comment:
issue2636-features.diff is based on Python 2.6. It includes:
Named Unicode characters eg \N{LATIN CAPITAL LETTER A}
Unicode character properties eg \p{Lu} (uppercase letter) and \P{Lu}
(not uppercase letter)
Other character properties not restricted to
Matthew Barnett added the comment:
This has been addressed in issue #2636.
--
nosy: +mrabarnett
___
Python tracker
<http://bugs.python.org/issue1519638>
Matthew Barnett added the comment:
In issue #2636 I'm using the following:
Alpha is Ll, Lo, Lt, Lu.
Digit is Nd.
Word is Ll, Lo, Lt, Lu, Mc, Me, Mn, Nd, Nl, No, Pc.
These are what are specified at
http://www.regular-expressions.info/posixbrackets.html
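One way these definitions could be expressed with the standard unicodedata module. This is only a sketch: the set and function names (ALPHA, DIGIT, WORD, is_alpha, is_digit, is_word) are made up for illustration and are not taken from the regex module.

```python
import unicodedata

# General categories for each class, as listed above.
ALPHA = {"Ll", "Lo", "Lt", "Lu"}
DIGIT = {"Nd"}
WORD = ALPHA | {"Mc", "Me", "Mn", "Nd", "Nl", "No", "Pc"}

def is_alpha(ch):
    """True if ch's general category is in the Alpha set (hypothetical helper)."""
    return unicodedata.category(ch) in ALPHA

def is_digit(ch):
    """True if ch's general category is Nd (hypothetical helper)."""
    return unicodedata.category(ch) in DIGIT

def is_word(ch):
    """True if ch's general category is in the Word set (hypothetical helper)."""
    return unicodedata.category(ch) in WORD

print(is_alpha("a"), is_digit("7"), is_word("_"), is_word(" "))
```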
--
nosy: +mraba