Matthew Barnett added the comment:
The support for locales in the re module is limited to those with 1 byte per
character, and only for a few properties (those provided by the underlying C
library), so maybe it could do the following:
If the LOCALE flag is set, then read the current locale
Matthew Barnett added the comment:
When you lookup the pattern in the cache, include the current locale as part of
the key if the pattern is locale-sensitive (you can let it be None if the
pattern is not locale-sensitive).
--
___
Python tracker
Matthew Barnett added the comment:
@Serhiy: You're overlooking that the LOCALE flag could be inline, e.g.
r'(?L)\w+'.
Basically, if you've seen the pattern before, you know whether it has an inline
LOCALE flag; if you haven't seen the pattern before, you'll need
Matthew Barnett added the comment:
In the regex module, I borrowed the \g<...> escape from .sub's replacement
string to provide an alternative way to refer to a group in a pattern, and that
let me remove the limit.
Matthew Barnett added the comment:
For reference, the regex module normally considers the line ending to be '\n',
but it has a WORD flag ('(?w)') that turns on the Unicode definition of a
'word' character as
Matthew Barnett added the comment:
After some thought, I've come to the conclusion that the GCD of two integers
should be negative only if both of those integers are negative. The basic
algorithm is that you find all of the prime factors of the integers and then
return the product o
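
A rough sketch of the "negative only if both are negative" rule, treating the sign as one more factor that is shared only when both inputs have it. The name signed_gcd is made up for illustration; math.gcd itself always returns a non-negative result:

```python
import math


def signed_gcd(a, b):
    """GCD whose sign is negative only when both a and b are negative."""
    g = math.gcd(a, b)  # always non-negative
    # The factor -1 is "common" only if both integers carry it.
    return -g if a < 0 and b < 0 else g
```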
Matthew Barnett added the comment:
As it appears that there isn't general agreement on how to calculate the GCD
when negative numbers are involved, I needed to look for another way of
thinking about it.
Splitting off the sign as another factor was what I came up with.
Pragmatism beats p
Matthew Barnett added the comment:
+1 for leaving it to the user to make it negative if so desired.
Matthew Barnett added the comment:
There's an interesting bit of history here:
http://www.gossamer-threads.com/lists/python/dev/236584
Matthew Barnett added the comment:
I prefer to include the line and column numbers if it's a multi-line pattern,
not just if the line number is > 1.
BTW, it's shorter if you do this:
self.colno = pos - pattern.rfind(newline, 0, pos)
If there's no newline, .rf
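
To illustrate the one-liner above: str.rfind returns -1 when nothing is found, so the same expression gives a 1-based column on the first line too. This is only a sketch; the helper name line_and_column and the line-counting step are assumptions, not the patch's code:

```python
def line_and_column(pattern, pos, newline="\n"):
    """Return 1-based (line, column) of offset pos within pattern."""
    lineno = pattern.count(newline, 0, pos) + 1
    # rfind gives -1 if there is no earlier newline, so colno becomes pos + 1.
    colno = pos - pattern.rfind(newline, 0, pos)
    return lineno, colno
```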
Matthew Barnett added the comment:
It takes a long time due to excessive backtracking.
The regex implementation on PyPI finishes quickly because it contains some
extra logic to reduce the chances of that happening, but it could be tricky
trying to incorporate that into the existing re module
Matthew Barnett added the comment:
I don't know of any regex implementation that lets you do that.
--
type: behavior -> enhancement
Matthew Barnett added the comment:
Yes.
If it's not a valid repeat, then it's treated as a literal.
Perl does the same.
By the way, "\1" isn't a group reference; it's the same as "\x01". You should
be either doubling the backslashes (&qu
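
A small check of the point about "\1", using a made-up sample string. In an ordinary string literal "\1" is an octal escape, so the regex never sees a backslash:

```python
import re

text = "hello hello"

not_a_backref = "(hello) \1"   # "\1" is the single character "\x01"
doubled = "(hello) \\1"        # doubled backslash: the regex sees \1
raw = r"(hello) \1"            # raw string: same pattern as above

no_match = re.search(not_a_backref, text)   # None
match = re.search(raw, text)                # matches "hello hello"
```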
Matthew Barnett added the comment:
issue2636-20090726.zip is a new implementation of the re engine. It
replaces re.py, sre.py, sre_constants.py, sre_parse.py and
sre_compile.py with a new re.py and replaces sre_constants.h, sre.h and
_sre.c with _re.h and _re.c.
The internal engine no longer
Matthew Barnett added the comment:
issue2636-20090727.zip contains regex.py, _regex.h, _regex.c and also
_regex.pyd (for Python 2.6 on Windows). For Windows machines just put
regex.py and _regex.pyd into Python's Lib\site-packages folder. I've
changed the name so that it won
Matthew Barnett added the comment:
issue2636-20090729.zip contains regex.py, _regex.h, _regex.c which will
work with Python 2.5 as well as Python 2.6, and also 2 builds of
_regex.pyd (for Python 2.5 and Python 2.6 on Windows).
This version supports accessing the capture groups by subscripting
Changes by Matthew Barnett :
Removed file: http://bugs.python.org/file14592/issue2636-20090729.zip
___
Python tracker
<http://bugs.python.org/issue2636>
Matthew Barnett added the comment:
Unfortunately I found a bug in regex.py, caused when I made it
compatible with Python 2.5. :-(
issue2636-20090729.zip is now corrected.
--
Added file: http://bugs.python.org/file14594/issue2636-20090729.zip
Matthew Barnett added the comment:
I'd like to suggest that the output could/should be encoded in UTF-8.
--
nosy: +mrabarnett
Matthew Barnett added the comment:
I was thinking that if you're converting a Python 2.x script to Python
3.x using 2to3 then also encoding the new script in UTF-8 might be a
good idea.
Matthew Barnett added the comment:
issue2636-20090804.zip is a new version of the regex module.
The memory leak has been fixed.
--
Added file: http://bugs.python.org/file14642/issue2636-20090804.zip
Matthew Barnett added the comment:
In a regular expression (...) will group and capture, whereas (?:...)
will only group and not capture.
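
For example (the pattern and sample text here are made up for illustration):

```python
import re

capturing = re.match(r"(ab)+c", "ababc")        # group 1 captures "ab"
non_capturing = re.match(r"(?:ab)+c", "ababc")  # grouped, nothing captured

captured = capturing.groups()          # ('ab',)
not_captured = non_capturing.groups()  # ()
```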
--
nosy: +mrabarnett
Matthew Barnett added the comment:
issue2636-20090810.zip should fix the empty-string bug.
--
Added file: http://bugs.python.org/file14682/issue2636-20090810.zip
Matthew Barnett added the comment:
issue2636-20090810#2.zip has some further improvements and bugfixes.
--
Added file: http://bugs.python.org/file14683/issue2636-20090810#2.zip
Matthew Barnett added the comment:
issue2636-20090810#3.zip adds more Unicode character properties such as
"\p{Lowercase_Letter}", and also Unicode script ranges.
In addition, the 'findall' method now accepts an 'overlapped' argument
for finding o
Matthew Barnett added the comment:
issue2636-20090815.zip fixes the bugs found in msg91598 and msg91607.
The regex engine currently lacks some of the optimisations that the re
engine has, but I've concluded that even with them the extra work that
the engine needs to do to make it ea
Matthew Barnett added the comment:
"(?![a-z0-9])" is a negative lookahead, so "(?![a-z0-9])0" is saying
that the next character shouldn't be any of [a-z0-9], yet it should
match "0". Hence, no matches.
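
A quick demonstration with a made-up sample string; the positive lookahead is shown only for contrast:

```python
import re

sample = "a0 b0 00"

# Negative lookahead forbids [a-z0-9] next, then demands "0": never matches.
negative = re.findall(r"(?![a-z0-9])0", sample)

# Positive lookahead requires [a-z0-9] next, which "0" satisfies.
positive = re.findall(r"(?=[a-z0-9])0", sample)
```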
--
nosy: +mrabarnett
Matthew Barnett added the comment:
Instead of a new flag, a '*' could be put after the quantifier, eg:
(\d+)(?:\.(\d+)){3}*
MatchObject.group(1) would be a string and MatchObject.group(2) would be
a list of strings.
The group references could be \g<1>, \g<2:0>, \g&l
Matthew Barnett added the comment:
I'm still tinkering with my regex engine (issue #2636).
Some timings:
re.compile(r'(\s+.*)*x').search('a ' * 25)
20.23secs
regex.compile(r'(\s+.*)*x').search('a ' * 25)
0.10secs
--
Matthew Barnett added the comment:
Surely this is to be expected when working with bytestrings. You should
be working in Unicode and using UTF-8 only for input and output.
--
nosy: +mrabarnett
Matthew Barnett added the comment:
The problem with the shorthand form is that the generators use the
values that are bound to 'a' and 'p' when they are iterated, not when
they are created. You can test this by inserting:
a = "X"
just before the assert: y
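
The same effect in a self-contained form. The names here are illustrative and not taken from the original report; only the late lookup of 'a' is the point:

```python
def late_binding_demo():
    a = "A"
    # The generator body looks up 'a' when it is iterated, not when created.
    gen = (a + ch for ch in "xy")
    a = "X"  # rebinding before iteration changes what the generator sees
    return list(gen)
```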
Matthew Barnett added the comment:
issue2636-20100116.zip is a new version of the regex module.
I've given up on the breadth-wise matching - it was too difficult finding a
pattern structure that would work well for both depth-first and breadth-wise.
It probably still needs some tweak
Matthew Barnett added the comment:
"[9-A]" is equivalent to "[9:;<=>?@A]", or should be.
It'll be fixed in issue #2636.
Matthew Barnett added the comment:
issue2636-features-3.diff is based on the 2.x trunk.
Added comments.
Restricted line lengths to no more than 80 characters.
Added common POSIX character classes like [[:alpha:]].
Added further checks to reduce unnecessary backtracking.
I've decided to r
Matthew Barnett added the comment:
issue2636-features-4.diff includes:
Bugfixes
msg74203: duplicate capture group numbers
msg74904: duplicate capture group names
Added file: http://bugs.python.org/file13185/issue2636-features-4.diff
Matthew Barnett added the comment:
The definition of a word in the new re module (actually targeted at
Python 2.7) is currently a sequence of L&, N&, M& and Pc.
I suppose ideally we want the definitions of a word and an identifier to
be basically the same, except that an iden
Matthew Barnett added the comment:
The usual trick is to append "_":
xhtmlNode('div',class_='sidebar')
Could you modify the function to remove the trailing "_"?
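
Something along these lines might do it. This is only a sketch: xhtml_node is a hypothetical stand-in for the reporter's function, whose real signature and output format are not known here:

```python
def strip_trailing_underscore(attrs):
    """Map names like 'class_' back to 'class'; leave other names alone."""
    return {
        (name[:-1] if name.endswith("_") else name): value
        for name, value in attrs.items()
    }


def xhtml_node(tag, **attrs):
    # Hypothetical rendering, just enough to show the renamed attribute.
    attrs = strip_trailing_underscore(attrs)
    rendered = "".join(f' {name}="{value}"' for name, value in attrs.items())
    return f"<{tag}{rendered} />"
```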
--
nosy: +mrabarnett
Matthew Barnett added the comment:
The normal use of a keyword argument is to refer to a formal argument,
which is an identifier. Being able to wrap it up into a dict is a later
addition, and it's necessary to turn the identifier into a string
because it's not possible to use a bar
Matthew Barnett added the comment:
issue2636-features-5.diff includes:
Bugfixes
Added \G anchor (from Perl).
\G is the anchor at the start of a search, so re.search(r'\G(\w)') is
the same as re.match(r'(\w)').
re.findall normally performs a series of searches, eac
Matthew Barnett added the comment:
As part of issue #2636 group references now work in lookbehinds.
However, your example:
(?<=(...)\1)abc
will fail but:
(?<=\1(...))abc
will succeed.
Why? Well, in lookbehinds it searches backwards. In the first regex it
sees the group ref
Matthew Barnett added the comment:
issue2636-features-6.diff includes:
Bugfixes
Added group access via subscripting.
>>> m = re.search("(\D*)(?\d+)(\D*)", "abc123def")
>>> len(m)
4
>>> m[0]
'abc123def'
>>> m[1]
'abc&
Matthew Barnett added the comment:
At the moment binding occurs either right-to-left with "=", eg.
x = y
where "x" is the new name, or left-to-right, eg.
import x as y
where "y" is the new name.
If the order is to be right-to-left then using "a
Matthew Barnett added the comment:
Just for the record, I wasn't happy with "~=" either, and I have no
problem with just forgetting the whole idea.
Matthew Barnett added the comment:
An additional feature that could be borrowed, though in slightly
modified form, from Perl is case-changing controls in replacement
strings. Roughly the idea is to add these forms to the replacement string:
\g<1> provides capture group 1
Matthew Barnett added the comment:
Ah, too Perlish! :-)
Another feature request that I've decided not to consider any further is
recursive regular expressions. There are other tools available for that
kind of thing, and I don't want the re module to go the way of Perl 6's
rul
Matthew Barnett added the comment:
There are 2 reasons:
1. I've been told that my current patches contain too many differences
from the current implementation, so basically I have to go back to the
start and introduce any changes a little at a time, without knowing
whether any parti
Matthew Barnett added the comment:
Patch issue2636-patch-1.diff contains a stripped down version of my
regex engine and the other changes that are necessary to make it work.
--
Added file: http://bugs.python.org/file13449/issue2636-patch-1.diff
Matthew Barnett added the comment:
FYI, I did tidy up the class and add a 'scaniter' method when I was
working on issue #2636; it might yet see the light of day if it gets the
go ahead!
--
nosy: +mrabarnett
Matthew Barnett added the comment:
I implemented \p, \P and [:...:] for the simple categories (eg "Lu" and
"upper", but not "IsGreek") in the work I did for issue #2636.
--
nosy: +mrabarnett
Matthew Barnett added the comment:
One of the limitations is that it identifies what matched by using
capture groups, so if the expressions provided contain captures then it
gets confused! :-)
I handled that by 1) rejecting named captures and 2) changing unnamed
captures into non-captures
New submission from Matthew Barnett :
Patch idle-args.diff adds a dialog for entering command-line arguments
for a script from within IDLE itself.
--
components: IDLE
files: idle-args.diff
keywords: patch
messages: 85341
nosy: mrabarnett
severity: normal
status: open
title: Command-line
Matthew Barnett added the comment:
What do you mean "towards the end of the file"? What are the offsets of
the two lines? (I'm thinking it might be something to do with the \r\n
lying across a boundary, such as the 4GB boundary.)
--
no
Matthew Barnett added the comment:
Try issue2636-patch-2.diff.
--
Added file: http://bugs.python.org/file13707/issue2636-patch-2.diff
Matthew Barnett added the comment:
How about a 'full' form and a 'key' form generated by the function:
def codec_key(name):
    return name.lower().replace("-", "").replace("_", "")
The key form would be the key to an available code
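
To show what the key form does to a few spellings, here is the function again with a small lookup on top. The table _FULL_NAMES and the helper full_name are assumptions for illustration, not an existing codecs API:

```python
def codec_key(name):
    return name.lower().replace("-", "").replace("_", "")


# Hypothetical registry mapping key forms to 'full' forms.
_FULL_NAMES = {codec_key(full): full for full in ("UTF-8", "ISO-8859-1")}


def full_name(name):
    """Return the 'full' form for any accepted spelling, or None."""
    return _FULL_NAMES.get(codec_key(name))
```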
Matthew Barnett added the comment:
Well, there are multiple UTF encodings, so no to "utf".
Are there multiple Latin encodings? Not in Python 2.6.2 under those names.
I'd probably insist on names that are strictish(?), ie correct, give o
Matthew Barnett added the comment:
I agree that it's a bug.
A workaround is r'([xy])(?:\s{0,65534}\1)+'. A repeat of 65535 is
treated as unlimited (but no warning is given).
--
nosy: +mrabarnett
Matthew Barnett added the comment:
It includes Unicode character properties, but not the Unicode script
identification, because the Python Unicode database contains the former
but not the latter.
Although they could be added to the re module, IMHO their proper place
is in the Unicode database
Matthew Barnett added the comment:
I've just found that:
[1] + foo()
crashes, but:
[1].__add__(foo())
gives:
Traceback (most recent call last):
File "", line 1, in
[1].__add__(foo())
TypeError: can only concatenate list (not "foo")
Matthew Barnett added the comment:
Re: msg107776.
If it looks like an integer (ie, can be converted to an integer by 'int') then
it's positional, otherwise it's a key. An optimisation is to perform a quick
check upfront to see whether it st
Matthew Barnett added the comment:
That's a good question. :-)
Possibly just an optional sign followed by one or more digits.
Another possibility that occurs to me is for it to default to positional if it
looks like an integer, but allow quoting to force it to be a key:
>>>
Matthew Barnett added the comment:
Your original:
"{0[-1]}".format('fox')
is a worse gotcha than:
"{-1}".format('fox')
because you're much less likely to want to do the latter.
It's one of those things that it would be nice to
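
What the two forms currently do can be checked like this (try_format is just a helper written for this example). In both cases "-1" is not treated as an integer:

```python
def try_format(template, *args):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return template.format(*args)
    except (TypeError, KeyError, IndexError) as exc:
        return type(exc).__name__


indexed = try_format("{0[-1]}", "fox")    # "-1" becomes a string index
positional = try_format("{-1}", "fox")    # "-1" becomes a keyword name
plain = try_format("{0[2]}", "fox")       # a non-negative index works
```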
Matthew Barnett added the comment:
Issue #2636 resulted in the new regex module (also available on PyPI), so this
issue is addressed by that, but there's no patch for the re module.
Matthew Barnett added the comment:
issue2636-20100706.zip is a new version of the regex module.
I've added your examples to the unit tests. The module now passes.
Keep up the good work! :-)
--
Added file: http://bugs.python.org/file17877/issue2636-2010070
Matthew Barnett added the comment:
Should a regex compile if a group is referenced before it's defined?
Consider this:
(?:(?(2)(a)|(b)))+
Other regex implementations permit forward references to groups.
BTW, I had a look at the re module, found it too difficult, and so started on
m
Matthew Barnett added the comment:
I started with trying to modify the existing re module, but I wanted to make
too many changes, so in the end I decided to make a clean break and start on a
new implementation which was compatible with the existing re module and which
could replace the
Matthew Barnett added the comment:
The file at:
http://pypi.python.org/pypi/regex
was downloaded 75 times, if that's any help. (Now reset to 0 because of the bug
fix.)
If it's included in 3.2 then there's the question of whether it should replace
the re module an
Matthew Barnett added the comment:
As a crude guide of the speed difference, here's Python 2.6:
                     re         regex
bm_regex_compile.py  86.53secs  260.19secs
bm_regex_effbot.py   13.70secs  8.94secs
bm_regex_v8.py       15.66secs  9.09secs
Note
Matthew Barnett added the comment:
issue2636-20100709.zip is a new version of the regex module.
I've moved most of the regex module's Python code into a private module.
--
Added file: http://bugs.python.org/file17912/issue2636-20
Matthew Barnett added the comment:
Here's a patch for Python 3.1, if anyone's still interested after 5 years. :-)
--
keywords: +patch
nosy: +mrabarnett
Added file: http://bugs.python.org/file17930/from_template.diff
Matthew Barnett added the comment:
issue2636-20100719.zip is a new version of the regex module.
Just a few more tweaks for speed.
--
Added file: http://bugs.python.org/file18054/issue2636-20100719.zip
Matthew Barnett added the comment:
This has already been reported in issue #3511.
Matthew Barnett added the comment:
issue2636-20100725.zip is a new version of the regex module.
More tweaks for speed.
                     re         regex
bm_regex_compile.py  87.05secs  278.00secs
bm_regex_effbot.py   14.00secs  6.58secs
bm_regex_v8.py       16.11secs
Matthew Barnett added the comment:
No.
Wouldn't that break compatibility with 're'?
Matthew Barnett added the comment:
That's a possibility.
I must admit that I don't entirely understand it enough to implement it (the OP
said "I don't believe that the algorithm for this is a
whole lot more complicated"), and I don't have a need for it myself,
Matthew Barnett added the comment:
(1) would break existing code. It would also mean that you wouldn't have access
to the start and end positions of the matches either.
(2) would also break existing code which is expecting a list. It's like the
change that happened when some met
Matthew Barnett added the comment:
Ah, I see what you mean. I still think you're wrong, though! :-)
What the 'for' loop is doing is basically this:
it = re.finditer(r'(\w+):(\w+)', text)
try:
    while True:
        match_object = next(it)
Matthew Barnett added the comment:
Not a bug.
Python 2 had 'range', which returned a list, and 'xrange', which returned an
xrange object which was iterable:
>>> range(7)
[0, 1, 2, 3, 4, 5, 6]
>>> xrange(7)
xrange(7)
>>> list(xrange(7))
[0, 1, 2
Matthew Barnett added the comment:
So prune would default to None?
None means current behaviour (prune if sep is None else don't prune)
True means prune empty strings
False means don't prune empty strings
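
A sketch of how those three values could behave, written as a free function rather than a change to str.split. The name split_with_prune and the choice to split on single whitespace characters when sep is None and prune is False are assumptions, not part of the proposal:

```python
import re


def split_with_prune(text, sep=None, prune=None):
    """Split text; prune=None keeps the current behaviour of str.split."""
    if prune is None:
        # Current behaviour: prune only when no separator is given.
        prune = sep is None
    if sep is None:
        parts = re.split(r"\s", text)  # split on each whitespace character
    else:
        parts = text.split(sep)
    return [part for part in parts if part] if prune else parts
```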
--
nosy: +mrabarnett
Matthew Barnett added the comment:
See issue 17087: "Improve the repr for regular expression match objects".
It was decided that it might be a bad idea to show the entire matched portion
of the string because it could be very long, so it's shown truncated if
necessary.
--
Matthew Barnett added the comment:
Probably "...", although we also have to consider that the matched portion
could in fact not be truncated but just happen to end with "...", although that
would be a rare occurrence.
Matthew Barnett added the comment:
I agree with Marco that it shouldn't be too verbose. I'd like to suggest that
it says that it's compatible (i.e. has the same API), but with additional
features.
Matthew Barnett added the comment:
With the VERSION0 flag (the default behaviour), it should behave the same as
the re module, and that's not going to change.
Matthew Barnett added the comment:
Ah, well, if it hasn't changed after this many years, it never will. Expect one
or two changes to the text. :-)
Matthew Barnett added the comment:
I'm just wondering whether the problem is just due to the locale's encoding
being UTF-8. The locale support in re really only works with encodings that use
1 byte/character.
Matthew Barnett added the comment:
The report says "== encodings: locale=UTF-8, FS=utf-8".
It says that "test_locale_caching" was skipped, but also that
"test_locale_flag" failed.
Matthew Barnett added the comment:
It would be a bug if it was supported but gave the wrong result.
It has never been supported (the re module predates PEP 357), so it's a new
feature.
Matthew Barnett added the comment:
There's a move to treat invalid escape sequences as an error (see issue 27364).
The previous behaviour was to treat them as literals.
The replacement template string contains \d, which is not a valid escape
sequence (it's valid for the pattern, b
Matthew Barnett added the comment:
FTR, I can't reproduce it. This is what I get on Windows XP (32-bit):
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (In
tel)] on win32
Type "help", "copyright", "credits" or "lice
Matthew Barnett added the comment:
When I tried it, I got matches for both 'match' and 'fullmatch' (output
attached as 'output.txt') as expected.
--
Added file: http://bugs.python.org/file43984/output.txt
Matthew Barnett added the comment:
Are you using the same version on the other systems?
I've had a quick look through the bug tracker and found some fixes for
fullmatch that postdate Python 3.4.0, so I'll suggest you just update to a more
recent version of
Matthew Barnett added the comment:
"*" and the other quantifiers ("+", "?" and "{...}") operate on the preceding
_item_, not the entire preceding expression. For example, "ab*" means "a"
followed by zero or more repeats of "
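
A short check of that scoping rule, using made-up sample strings. A non-capturing group is what makes the quantifier apply to more than one character:

```python
import re

single_item = re.fullmatch(r"ab*", "abbb")      # "b" repeated: matches
not_whole = re.fullmatch(r"ab*", "abab")        # "ab" is not repeated: None
grouped = re.fullmatch(r"(?:ab)*", "abab")      # group repeated: matches
```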
Matthew Barnett added the comment:
FYI, I did eventually add it to my regex implementation. It was quite
challenging!
Matthew Barnett added the comment:
This question should've been posted to python-l...@python.org, not here.
Your functions are calling themselves, but not returning the result of the call
to their own callers.
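
The pattern looks roughly like this. The functions are invented for illustration and are not the reporter's code:

```python
def countdown_broken(n):
    if n == 0:
        return "done"
    countdown_broken(n - 1)  # result is discarded, so None is returned


def countdown_fixed(n):
    if n == 0:
        return "done"
    return countdown_fixed(n - 1)  # pass the result back up the call chain
```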
Matthew Barnett added the comment:
I've attached fnmatch_implementation.py, which is a simple pure-Python
implementation of the fnmatch function.
It's not as susceptible to catastrophic backtracking as the current re-based
one. For example:
fnmatch('a' * 50, '*a*&
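
For readers without the attachment, here is a rough sketch of one way to match wildcards without a regex. It is not the attached fnmatch_implementation.py; it is the usual two-pointer technique, it handles only '*' and '?' (no [...] classes), and the name simple_fnmatch is made up:

```python
def simple_fnmatch(name, pattern):
    """Match name against a pattern containing only '*' and '?' wildcards."""
    n = p = 0
    star_p = -1   # position of the most recent '*' in pattern
    star_n = 0    # position in name where that '*' started matching
    while n < len(name):
        if p < len(pattern) and pattern[p] == "*":
            # Remember the star; first try letting it match nothing.
            star_p, star_n = p, n
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", name[n]):
            n += 1
            p += 1
        elif star_p != -1:
            # Mismatch: let the last '*' absorb one more character.
            star_n += 1
            n = star_n
            p = star_p + 1
        else:
            return False
    # Any trailing stars can match the empty string.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```

Only the most recent '*' is ever retried, so the work is bounded by roughly len(name) * len(pattern) rather than growing exponentially.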
Matthew Barnett added the comment:
The way the re module handles ranges is to convert the two endpoints to lowercase and
then check whether the lowercase form of the character in the text is in that
range.
For example, [A-Z] is converted to the range [\x61-\x7A], and the lowercase
form of
Matthew Barnett added the comment:
In issue #3511 the range was slightly unusual, so closing it seemed a
reasonable approach, but the range in this issue is less clearly a problem. My
preference would be to fix it, if possible.
Matthew Barnett added the comment:
The regex module behaves the same as re.
The reason it isn't supported is that \0 starts an octal escape sequence.
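
This can be seen with re.sub and a made-up sample string. In a replacement template \0 yields the NUL character, while \g<0> refers to the whole match:

```python
import re

octal = re.sub(r"b+", r"\0", "abbbc")            # \0 is chr(0)
whole_match = re.sub(r"b+", r"[\g<0>]", "abbbc")  # \g<0> is the entire match
```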
Matthew Barnett added the comment:
I already use it in the regex module for named groups. I don't think it would
ever be a problem in practice because the names are invariably handled as
strings.
--
nosy: +mrabarnett
Matthew Barnett added the comment:
It's not a bug.
The documentation says """Split string by the occurrences of pattern. If
capturing parentheses are used in pattern, then the text of all groups in the
pattern are also returned as part of the resulting list."
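
A small example of the documented behaviour, with a made-up input string:

```python
import re

without_group = re.split(r"-", "a-b-c")   # separators are dropped
with_group = re.split(r"(-)", "a-b-c")    # captured separators are kept
```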
Matthew Barnett added the comment:
Here are some simpler examples of the bug:
re.compile('.*yz', re.S).findall('xyz')
re.compile('.?yz', re.S).findall('xyz')
re.compile('.+yz', re.S).findall('xyz')
Unfortunately I find it difficult to