[issue22410] Locale dependent regexps on different locales

2014-09-14 Thread Matthew Barnett
Matthew Barnett added the comment: The support for locales in the re module is limited to those with 1 byte per character, and only for a few properties (those provided by the underlying C library), so maybe it could do the following: If the LOCALE flag is set, then read the current locale

[issue22410] Locale dependent regexps on different locales

2014-09-18 Thread Matthew Barnett
Matthew Barnett added the comment: When you lookup the pattern in the cache, include the current locale as part of the key if the pattern is locale-sensitive (you can let it be None if the pattern is not locale-sensitive). -- ___ Python tracker

[issue22410] Locale dependent regexps on different locales

2014-09-18 Thread Matthew Barnett
Matthew Barnett added the comment: @Serhiy: You're overlooking that the LOCALE flag could be inline, e.g. r'(?L)\w+'. Basically, if you've seen the pattern before, you know whether it has an inline LOCALE flag; if you haven't seen the pattern before, you'll need

[issue22437] re module: number of named groups is limited to 100 max

2014-09-18 Thread Matthew Barnett
Matthew Barnett added the comment: In the regex module, I borrowed the \g<...> escape from .sub's replacement string to provide an alternative way to refer to a group in a pattern, and that let me remove the limit.

[issue22491] Support Unicode line boundaries in regular expression

2014-09-25 Thread Matthew Barnett
Matthew Barnett added the comment: For reference, the regex module normally considers the line ending to be '\n', but it has a WORD flag ('(?w)') that turns on the Unicode definition of a 'word' character as

[issue22477] GCD in Fractions

2014-09-25 Thread Matthew Barnett
Matthew Barnett added the comment: After some thought, I've come to the conclusion that the GCD of two integers should be negative only if both of those integers are negative. The basic algorithm is that you find all of the prime factors of the integers and then return the product o

[issue22477] GCD in Fractions

2014-09-25 Thread Matthew Barnett
Matthew Barnett added the comment: As it appears that there isn't general agreement on how to calculate the GCD when negative numbers are involved, I needed to look for another way of thinking about it. Splitting off the sign as another factor was what I came up with. Pragmatism beats p
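
A small sketch of the "sign as another factor" view, assuming the rule stated in the earlier comment (the result is negative only when both inputs are negative). The name signed_gcd is hypothetical; math.gcd itself always returns a non-negative value.

```python
from math import gcd


def signed_gcd(a, b):
    # Treat -1 as one more factor: it is a common factor only when
    # both numbers are negative.
    magnitude = gcd(a, b)
    return -magnitude if a < 0 and b < 0 else magnitude
```

Under this rule signed_gcd(-4, -6) is -2, while signed_gcd(-4, 6) and signed_gcd(4, 6) are both 2.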

[issue22477] GCD in Fractions

2014-09-25 Thread Matthew Barnett
Matthew Barnett added the comment: +1 for leaving it to the user to make it negative if so desired.

[issue18043] No mention of `match.regs` in `re` documentation

2014-10-04 Thread Matthew Barnett
Matthew Barnett added the comment: There's an interesting bit of history here: http://www.gossamer-threads.com/lists/python/dev/236584

[issue22578] Add additional attributes to re.error

2014-10-08 Thread Matthew Barnett
Matthew Barnett added the comment: I prefer to include the line and column numbers if it's a multi-line pattern, not just if the line number is > 1. BTW, it's shorter if you do this: self.colno = pos - pattern.rfind(newline, 0, pos) If there's no newline, .rf
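
The rfind trick can be tried in isolation. This is a sketch with a hypothetical helper name, line_and_column, and it assumes 1-based line and column numbers. When there is no newline before pos, str.rfind returns -1, so the column comes out as pos + 1 without a special case.

```python
def line_and_column(pattern, pos, newline="\n"):
    # Lines are counted from the newlines that precede pos.
    lineno = pattern.count(newline, 0, pos) + 1
    # rfind gives the index of the last newline before pos, or -1.
    colno = pos - pattern.rfind(newline, 0, pos)
    return lineno, colno
```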

[issue19994] re.match does not return or takes long time

2013-12-17 Thread Matthew Barnett
Matthew Barnett added the comment: It takes a long time due to excessive backtracking. The regex implementation on PyPI finishes quickly because it contains some extra logic to reduce the chances of that happening, but it could be tricky trying to incorporate that into the existing re module

[issue20678] re does not allow back references in {} matching operator

2014-02-18 Thread Matthew Barnett
Matthew Barnett added the comment: I don't know of any regex implementation that lets you do that. -- type: behavior -> enhancement

[issue20678] re does not allow back references in {} matching operator

2014-02-18 Thread Matthew Barnett
Matthew Barnett added the comment: Yes. If it's not a valid repeat, then it's treated as a literal. Perl does the same. By the way, "\1" isn't a group reference; it's the same as "\x01". You should be either doubling the backslashes ("
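
A short illustration of the "\1" point, using only the standard re module. The variable names are just for this example.

```python
import re

octal_escape = "\1"     # non-raw string: an octal escape, one character
raw_reference = r"\1"   # raw string: a backslash followed by "1"
doubled = "\\1"         # doubled backslash: same two characters as the raw form

same_as_x01 = octal_escape == "\x01"
raw_equals_doubled = raw_reference == doubled

# Only the raw (or doubled) form reaches re as a group reference.
backref_match = re.fullmatch(r"(a)\1", "aa")
# The non-raw form asks for "a" followed by the character "\x01".
literal_match = re.fullmatch("(a)\1", "a\x01")
```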

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-07-26 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090726.zip is a new implementation of the re engine. It replaces re.py, sre.py, sre_constants.py, sre_parse.py and sre_compile.py with a new re.py and replaces sre_constants.h, sre.h and _sre.c with _re.h and _re.c. The internal engine no longer

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-07-27 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090727.zip contains regex.py, _regex.h, _regex.c and also _regex.pyd (for Python 2.6 on Windows). For Windows machines just put regex.py and _regex.pyd into Python's Lib\site-packages folder. I've changed the name so that it won

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-07-28 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090729.zip contains regex.py, _regex.h, _regex.c which will work with Python 2.5 as well as Python 2.6, and also 2 builds of _regex.pyd (for Python 2.5 and Python 2.6 on Windows). This version supports accessing the capture groups by subscripting

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-07-29 Thread Matthew Barnett
Changes by Matthew Barnett: Removed file: http://bugs.python.org/file14592/issue2636-20090729.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-07-29 Thread Matthew Barnett
Matthew Barnett added the comment: Unfortunately I found a bug in regex.py, caused when I made it compatible with Python 2.5. :-( issue2636-20090729.zip is now corrected. -- Added file: http://bugs.python.org/file14594/issue2636-20090729.zip

[issue5093] 2to3 with a pipe on non-ASCII script

2009-07-31 Thread Matthew Barnett
Matthew Barnett added the comment: I'd like to suggest that the output could/should be encoded in UTF-8. -- nosy: +mrabarnett

[issue5093] 2to3 with a pipe on non-ASCII script

2009-07-31 Thread Matthew Barnett
Matthew Barnett added the comment: I was thinking that if you're converting a Python 2.x script to Python 3.x using 2to3 then also encoding the new script in UTF-8 might be a good idea.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-03 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090804.zip is a new version of the regex module. The memory leak has been fixed. -- Added file: http://bugs.python.org/file14642/issue2636-20090804.zip

[issue6663] re.findall does not always return a list of strings

2009-08-07 Thread Matthew Barnett
Matthew Barnett added the comment: In a regular expression (...) will group and capture, whereas (?:...) will only group and not capture. -- nosy: +mrabarnett
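
The difference shows up directly in what re.findall returns. A short example with made-up input text:

```python
import re

text = "10-20 30-40"

# Capturing groups: findall returns a tuple of the groups per match.
captured = re.findall(r"(\d+)-(\d+)", text)

# Non-capturing groups: findall returns each whole match as a string.
uncaptured = re.findall(r"(?:\d+)-(?:\d+)", text)
```

Here captured is [('10', '20'), ('30', '40')] and uncaptured is ['10-20', '30-40'].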

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-10 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090810.zip should fix the empty-string bug. -- Added file: http://bugs.python.org/file14682/issue2636-20090810.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-10 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090810#2.zip has some further improvements and bugfixes. -- Added file: http://bugs.python.org/file14683/issue2636-20090810#2.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-10 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090810#3.zip adds more Unicode character properties such as "\p{Lowercase_Letter}", and also Unicode script ranges. In addition, the 'findall' method now accepts an 'overlapped' argument for finding o

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-15 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20090815.zip fixes the bugs found in msg91598 and msg91607. The regex engine currently lacks some of the optimisations that the re engine has, but I've concluded that even with them the extra work that the engine needs to do to make it ea

[issue6797] When Beginning Expression with Lookahead Assertion I get no Matches

2009-08-29 Thread Matthew Barnett
Matthew Barnett added the comment: "(?![a-z0-9])" is a negative lookahead, so "(?![a-z0-9])0" is saying that the next character shouldn't be any of [a-z0-9], yet it should match "0". Hence, no matches. -- nosy: +mrabarnett
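
The contradiction is easy to reproduce. The sample text below is made up; the first pattern is the one from the comment.

```python
import re

text = "a0b0"

# Negative lookahead forbids [a-z0-9] next, but "0" is in that set.
never_matches = re.findall(r"(?![a-z0-9])0", text)

# A positive lookahead with the same set is satisfiable.
positive = re.findall(r"(?=[a-z0-9])0", text)

# So is a negative lookahead that does not exclude "0".
letters_only = re.findall(r"(?![a-z])0", text)
```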

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Matthew Barnett
Matthew Barnett added the comment: Instead of a new flag, a '*' could be put after the quantifier, eg: (\d+)(?:\.(\d+)){3}* MatchObject.group(1) would be a string and MatchObject.group(2) would be a list of strings. The group references could be \g<1>, \g<2:0>, \g<

[issue1662581] the re module can perform poorly: O(2**n) versus O(n**2)

2009-10-21 Thread Matthew Barnett
Matthew Barnett added the comment: I'm still tinkering with my regex engine (issue #2636). Some timings: re.compile(r'(\s+.*)*x').search('a ' * 25) 20.23secs regex.compile(r'(\s+.*)*x').search('a ' * 25) 0.10secs --

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-11-28 Thread Matthew Barnett
Matthew Barnett added the comment: Surely this is to be expected when working with bytestrings. You should be working in Unicode and using UTF-8 only for input and output. -- nosy: +mrabarnett
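
A small Python 3 illustration of why widths go wrong on encoded data: padding counts code points in a str but bytes in a UTF-8 bytestring. The sample character is arbitrary.

```python
text = "é"                       # one character
encoded = text.encode("utf-8")   # two bytes: b"\xc3\xa9"

# Width 3 on the str pads with two spaces ...
padded_text = "{:>3}".format(text)
# ... but on the bytes only one space is added, since len(encoded) == 2.
padded_bytes = encoded.rjust(3)
```

Decoding on input, formatting as str, and encoding on output keeps the widths in terms of characters.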

[issue7439] Bug or expected behavior? I cannot tell.

2009-12-05 Thread Matthew Barnett
Matthew Barnett added the comment: The problem with the shorthand form is that the generators use the values that are bound to 'a' and 'p' when they are iterated, not when they are created. You can test this by inserting: a = "X" just before the assert: y
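
The late-binding behaviour can be shown with a simpler generator than the one in the report (the names below are illustrative, not the reporter's code). Passing the value into a function gives each generator its own binding.

```python
a = 1
late = (a + x for x in range(3))   # "a" is looked up when iterated
a = 10
late_result = list(late)           # uses a == 10


def make_gen(a):
    # The parameter is a separate binding, fixed at call time.
    return (a + x for x in range(3))


a = 1
early = make_gen(a)
a = 10
early_result = list(early)         # still uses a == 1
```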

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-01-15 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100116.zip is a new version of the regex module. I've given up on the breadth-wise matching - it was too difficult finding a pattern structure that would work well for both depth-first and breadth-wise. It probably still needs some tweak

[issue3511] Incorrect charset range handling with ignore case flag?

2009-02-20 Thread Matthew Barnett
Matthew Barnett added the comment: "[9-A]" is equivalent to "[9:;<=>?@A]", or should be. It'll be fixed in issue #2636.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-02-24 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-features-3.diff is based on the 2.x trunk. Added comments. Restricted line lengths to no more than 80 characters Added common POSIX character classes like [[:alpha:]]. Added further checks to reduce unnecessary backtracking. I've decided to r

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-02-25 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-features-4.diff includes: Bugfixes msg74203: duplicate capture group numbers msg74904: duplicate capture group names Added file: http://bugs.python.org/file13185/issue2636-features-4.diff

[issue5358] Unicode control characters are not allowed as identifiers

2009-02-27 Thread Matthew Barnett
Matthew Barnett added the comment: The definition of a word in the new re module (actually targeted at Python 2.7) is currently a sequence of L&, N&, M& and Pc. I suppose ideally we want the definitions of a word and an identifier to be basically the same, except that an iden

[issue5382] Allow Python keywords as keyword arguments for functions.

2009-02-27 Thread Matthew Barnett
Matthew Barnett added the comment: The usual trick is to append "_": xhtmlNode('div',class_='sidebar') Could you modify the function to remove the trailing "_"? -- nosy: +mrabarnett
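
One way the suggested modification might look. xhtml_node here is a hypothetical stand-in for the reporter's xhtmlNode, whose real implementation is not shown in the thread, and the sketch does no attribute escaping.

```python
def xhtml_node(tag, **attrs):
    cleaned = {}
    for name, value in attrs.items():
        # Drop a single trailing underscore so class_ becomes class.
        if name.endswith("_"):
            name = name[:-1]
        cleaned[name] = value
    attr_text = "".join(' {}="{}"'.format(n, v) for n, v in cleaned.items())
    return "<{0}{1}></{0}>".format(tag, attr_text)


node = xhtml_node("div", class_="sidebar")
```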

[issue5382] Allow Python keywords as keyword arguments for functions.

2009-02-27 Thread Matthew Barnett
Matthew Barnett added the comment: The normal use of a keyword argument is to refer to a formal argument, which is an identifier. Being able to wrap it up into a dict is a later addition, and it's necessary to turn the identifier into a string because it's not possible to use a bar

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-02-28 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-features-5.diff includes: Bugfixes Added \G anchor (from Perl). \G is the anchor at the start of a search, so re.search(r'\G(\w)') is the same as re.match(r'(\w)'). re.findall normally performs a series of searches, eac

[issue814253] Grouprefs in lookbehind assertions

2009-03-05 Thread Matthew Barnett
Matthew Barnett added the comment: As part of issue #2636 group references now work in lookbehinds. However, your example: (?<=(...)\1)abc will fail but: (?<=\1(...))abc will succeed. Why? Well, in lookbehinds it searches backwards. In the first regex it sees the group ref

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-03-06 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-features-6.diff includes: Bugfixes Added group access via subscripting. >>> m = re.search("(\D*)(?\d+)(\D*)", "abc123def") >>> len(m) 4 >>> m[0] 'abc123def' >>> m[1] 'abc&

[issue1714448] if something as x:

2009-03-14 Thread Matthew Barnett
Matthew Barnett added the comment: At the moment binding occurs either right-to-left with "=", eg. x = y where "x" is the new name, or left-to-right, eg. import x as y where "y" is the new name. If the order is to be right-to-left then using "a

[issue1714448] if something as x:

2009-03-15 Thread Matthew Barnett
Matthew Barnett added the comment: Just for the record, I wasn't happy with "~=" either, and I have no problem with just forgetting the whole idea.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-03-22 Thread Matthew Barnett
Matthew Barnett added the comment: An additional feature that could be borrowed, though in slightly modified form, from Perl is case-changing controls in replacement strings. Roughly the idea is to add these forms to the replacement string: \g<1> provides capture group 1

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-03-22 Thread Matthew Barnett
Matthew Barnett added the comment: Ah, too Perlish! :-) Another feature request that I've decided not to consider any further is recursive regular expressions. There are other tools available for that kind of thing, and I don't want the re module to go the way of Perl 6's rul

[issue694374] Recursive regular expressions

2009-03-24 Thread Matthew Barnett
Matthew Barnett added the comment: There are 2 reasons: 1. I've been told that my current patches contain too many differences from the current implementation, so basically I have to go back to the start and introduce any changes a little at a time, without knowing whether any parti

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-03-28 Thread Matthew Barnett
Matthew Barnett added the comment: Patch issue2636-patch-1.diff contains a stripped down version of my regex engine and the other changes that are necessary to make it work. -- Added file: http://bugs.python.org/file13449/issue2636-patch-1.diff

[issue5337] Scanner class in re module undocumented

2009-03-29 Thread Matthew Barnett
Matthew Barnett added the comment: FYI, I did tidy up the class and add a 'scaniter' method when I was working on issue #2636; it might yet see the light of day if it gets the go ahead! -- nosy: +mrabarnett

[issue1528154] New sequences for Unicode groups and block ranges needed

2009-03-30 Thread Matthew Barnett
Matthew Barnett added the comment: I implemented \p, \P and [:...:] for the simple categories (eg "Lu" and "upper", but not "IsGreek") in the work I did for issue #2636. -- nosy: +mrabarnett

[issue5337] Scanner class in re module undocumented

2009-03-31 Thread Matthew Barnett
Matthew Barnett added the comment: One of the limitations is that it identifies what matched by using capture groups, so if the expressions provided contain captures then it gets confused! :-) I handled that by 1) rejecting named captures and 2) changing unnamed captures into non-captures

[issue5680] Command-line arguments when running in IDLE

2009-04-03 Thread Matthew Barnett
New submission from Matthew Barnett: Patch idle-args.diff adds a dialog for entering command-line arguments for a script from within IDLE itself. -- components: IDLE files: idle-args.diff keywords: patch messages: 85341 nosy: mrabarnett severity: normal status: open title: Command-line

[issue1744752] Newline skipped in "for line in file"

2009-04-08 Thread Matthew Barnett
Matthew Barnett added the comment: What do you mean "towards the end of the file"? What are the offsets of the two lines? (I'm thinking it might be something to do with the \r\n lying across a boundary, such as the 4GB boundary.) -- no

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-04-16 Thread Matthew Barnett
Matthew Barnett added the comment: Try issue2636-patch-2.diff. -- Added file: http://bugs.python.org/file13707/issue2636-patch-2.diff

[issue5902] Stricter codec names

2009-05-02 Thread Matthew Barnett
Matthew Barnett added the comment: How about a 'full' form and a 'key' form generated by the function: def codec_key(name): return name.lower().replace("-", "").replace("_", "") The key form would be the key to an available code
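
The codec_key function from the comment, laid out so it can be run, with a few sample names. The sample inputs are illustrative and the snippet does not consult Python's codec registry.

```python
def codec_key(name):
    # Lowercase, then drop hyphens and underscores.
    return name.lower().replace("-", "").replace("_", "")


keys = [codec_key(n) for n in ("UTF-8", "utf_8", "Latin-1", "ISO-8859-1")]
```

Names that differ only in case, hyphens or underscores collapse to the same key, so "UTF-8" and "utf_8" both give "utf8".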

[issue5902] Stricter codec names

2009-05-04 Thread Matthew Barnett
Matthew Barnett added the comment: Well, there are multiple UTF encodings, so no to "utf". Are there multiple Latin encodings? Not in Python 2.6.2 under those names. I'd probably insist on names that are strictish(?), ie correct, give o

[issue6156] Error compiling valid regex

2009-05-31 Thread Matthew Barnett
Matthew Barnett added the comment: I agree that it's a bug. A workaround is r'([xy])(?:\s{0,65534}\1)+'. A repeat of 65535 is treated as unlimited (but no warning is given). -- nosy: +mrabarnett

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-06-23 Thread Matthew Barnett
Matthew Barnett added the comment: It includes Unicode character properties, but not the Unicode script identification, because the Python Unicode database contains the former but not the latter. Although they could be added to the re module, IMHO their proper place is in the Unicode database

[issue8847] crash appending list and namedtuple

2010-06-05 Thread Matthew Barnett
Matthew Barnett added the comment: I've just found that: [1] + foo() crashes, but: [1].__add__(foo()) gives: Traceback (most recent call last): File "", line 1, in [1].__add__(foo()) TypeError: can only concatenate list (not "foo")

[issue7951] Should str.format allow negative indexes when used for __getitem__ access?

2010-06-14 Thread Matthew Barnett
Matthew Barnett added the comment: Re: msg107776. If it looks like an integer (ie, can be converted to an integer by 'int') then it's positional, otherwise it's a key. An optimisation is to perform a quick check upfront to see whether it st

[issue7951] Should str.format allow negative indexes when used for __getitem__ access?

2010-06-14 Thread Matthew Barnett
Matthew Barnett added the comment: That's a good question. :-) Possibly just an optional sign followed by one or more digits. Another possibility that occurs to me is for it to default to positional if it looks like an integer, but allow quoting to force it to be a key: >>>

[issue7951] Should str.format allow negative indexes when used for __getitem__ access?

2010-06-14 Thread Matthew Barnett
Matthew Barnett added the comment: Your original: "{0[-1]}".format('fox') is a worse gotcha than: "{-1}".format('fox') because you're much less likely to want to do the latter. It's one of those things that it would be nice to
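
The two cases from the comment can be compared side by side. try_format is a hypothetical helper that reports the exception type instead of raising; the behaviour shown is that of str.format in Python 3.

```python
def try_format(template, *args, **kwargs):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return template.format(*args, **kwargs)
    except (TypeError, KeyError, IndexError) as exc:
        return type(exc).__name__


positive_index = try_format("{0[1]}", "fox")        # digits: integer index
negative_index = try_format("{0[-1]}", "fox")       # "-1" is used as a str key
dict_lookup = try_format("{0[-1]}", {"-1": "x"})    # works for a dict keyed by "-1"
negative_field = try_format("{-1}", "fox")          # "-1" is taken as a keyword name
```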

[issue1519638] Unmatched Group issue - workaround

2010-06-25 Thread Matthew Barnett
Matthew Barnett added the comment: Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-05 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100706.zip is a new version of the regex module. I've added your examples to the unit tests. The module now passes. Keep up the good work! :-) -- Added file: http://bugs.python.org/file17877/issue2636-2010070

[issue9179] Lookback with group references incorrect (two issues?)

2010-07-06 Thread Matthew Barnett
Matthew Barnett added the comment: Should a regex compile if a group is referenced before it's defined? Consider this: (?:(?(2)(a)|(b))+ Other regex implementations permit forward references to groups. BTW, I had a look at the re module, found it too difficult, and so started on m

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-06 Thread Matthew Barnett
Matthew Barnett added the comment: I started with trying to modify the existing re module, but I wanted to make too many changes, so in the end I decided to make a clean break and start on a new implementation which was compatible with the existing re module and which could replace the

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-06 Thread Matthew Barnett
Matthew Barnett added the comment: The file at: http://pypi.python.org/pypi/regex was downloaded 75 times, if that's any help. (Now reset to 0 because of the bug fix.) If it's included in 3.2 then there's the question of whether it should replace the re module an

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-06 Thread Matthew Barnett
Matthew Barnett added the comment: As a crude guide of the speed difference, here's Python 2.6:

                       re          regex
bm_regex_compile.py    86.53secs   260.19secs
bm_regex_effbot.py     13.70secs   8.94secs
bm_regex_v8.py         15.66secs   9.09secs

Note

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-08 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100709.zip is a new version of the regex module. I've moved most of the regex module's Python code into a private module. -- Added file: http://bugs.python.org/file17912/issue2636-20

[issue1158231] string.Template does not allow step-by-step replacements

2010-07-10 Thread Matthew Barnett
Matthew Barnett added the comment: Here's a patch for Python 3.1, if anyone's still interested after 5 years. :-) -- keywords: +patch nosy: +mrabarnett Added file: http://bugs.python.org/file17930/from_template.diff

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-18 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100719.zip is a new version of the regex module. Just a few more tweaks for speed. -- Added file: http://bugs.python.org/file18054/issue2636-20100719.zip

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-19 Thread Matthew Barnett
Matthew Barnett added the comment: This has already been reported in issue #3511.

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-24 Thread Matthew Barnett
Matthew Barnett added the comment: issue2636-20100725.zip is a new version of the regex module. More tweaks for speed.

                       re          regex
bm_regex_compile.py    87.05secs   278.00secs
bm_regex_effbot.py     14.00secs   6.58secs
bm_regex_v8.py         16.11secs

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-26 Thread Matthew Barnett
Matthew Barnett added the comment: No. Wouldn't that break compatibility with 're'?

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-07-26 Thread Matthew Barnett
Matthew Barnett added the comment: That's a possibility. I must admit that I don't entirely understand it enough to implement it (the OP said "I don't believe that the algorithm for this is a whole lot more complicated"), and I don't have a need for it myself,

[issue9529] Converge re.findall and re.finditer

2010-08-06 Thread Matthew Barnett
Matthew Barnett added the comment: (1) would break existing code. It would also mean that you wouldn't have access to the start and end positions of the matches either. (2) would also break existing code which is expecting a list. It's like the change that happened when some met

[issue9529] Converge re.findall and re.finditer

2010-08-07 Thread Matthew Barnett
Matthew Barnett added the comment: Ah, I see what you mean. I still think you're wrong, though! :-) What the 'for' loop is doing is basically this: it = re.finditer(r'(\w+):(\w+)', text) try: while True: match_object = next(it)
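
A runnable completion of the expansion above. The snippet is cut off, so the loop body and the StopIteration handler here are assumptions about where it was heading, and the sample text is made up.

```python
import re

text = "a:1 b:2"

# Manual expansion of the for loop.
it = re.finditer(r"(\w+):(\w+)", text)
manual = []
try:
    while True:
        match_object = next(it)
        manual.append(match_object.groups())
except StopIteration:
    pass  # the iterator is exhausted; a for loop ends here silently

# The equivalent for loop.
looped = []
for match_object in re.finditer(r"(\w+):(\w+)", text):
    looped.append(match_object.groups())
```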

[issue28880] range(i, j) doesn't work

2016-12-05 Thread Matthew Barnett
Matthew Barnett added the comment: Not a bug. Python 2 had 'range', which returned a list, and 'xrange', which returned an xrange object which was iterable: >>> range(7) [0, 1, 2, 3, 4, 5, 6] >>> xrange(7) xrange(7) >>> list(xrange(7)) [0, 1, 2
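
For comparison, the Python 3 side of the same session, written as a script. In Python 3, range returns a lazy range object, much like Python 2's xrange.

```python
r = range(7)

shown = repr(r)        # a range object, not a list
as_list = list(r)      # materialise the values when a list is needed
third = r[2]           # indexing works without building a list
size = len(r)
```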

[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Matthew Barnett
Matthew Barnett added the comment: So prune would default to None? None means current behaviour (prune if sep is None else don't prune); True means prune empty strings; False means don't prune empty strings. -- nosy: +mrabarnett
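
A sketch of the three-valued prune argument as a plain function. split_with_prune is a hypothetical name, not a proposed API. Since str.split() with no separator always drops empty strings, the sketch does not try to express prune=False for that case.

```python
def split_with_prune(s, sep=None, prune=None):
    if prune is None:
        # Current behaviour: prune only when no separator is given.
        prune = sep is None
    if sep is None:
        if not prune:
            # Assumption: left undefined in this sketch.
            raise NotImplementedError("prune=False with sep=None")
        return s.split()
    parts = s.split(sep)
    return [p for p in parts if p] if prune else parts
```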

[issue29074] repr doesn't give full result for this re math result

2016-12-26 Thread Matthew Barnett
Matthew Barnett added the comment: See issue 17087: "Improve the repr for regular expression match objects". It was decided that it might be a bad idea to show the entire matched portion of the string because it could be very long, so it's shown truncated if necessary.

[issue29074] repr doesn't give full result for this re math result

2016-12-28 Thread Matthew Barnett
Matthew Barnett added the comment: Probably "...", although we also have to consider that the matched portion could in fact not be truncated but just happen to end with "...", although that would be a rare occurrence.

[issue22594] Add a link to the regex module in re documentation

2017-02-07 Thread Matthew Barnett
Matthew Barnett added the comment: I agree with Marco that it shouldn't be too verbose. I'd like to suggest that it says that it's compatible (i.e. has the same API), but with additional features.

[issue22594] Add a link to the regex module in re documentation

2017-02-07 Thread Matthew Barnett
Matthew Barnett added the comment: With the VERSION0 flag (the default behaviour), it should behave the same as the re module, and that's not going to change.

[issue22594] Add a link to the regex module in re documentation

2017-02-08 Thread Matthew Barnett
Matthew Barnett added the comment: Ah, well, if it hasn't changed after this many years, it never will. Expect one or two changes to the text. :-)

[issue29571] test_re is failing when local is set for `en_IN`

2017-02-15 Thread Matthew Barnett
Matthew Barnett added the comment: I'm just wondering whether the problem is just due to the locale's encoding being UTF-8. The locale support in re really only works with encodings that use 1 byte/character.

[issue29571] test_re is failing when local is set for `en_IN`

2017-02-15 Thread Matthew Barnett
Matthew Barnett added the comment: The report says "== encodings: locale=UTF-8, FS=utf-8". It says that "test_locale_caching" was skipped, but also that "test_locale_flag" failed.

[issue27177] re match.group should support __index__

2016-06-02 Thread Matthew Barnett
Matthew Barnett added the comment: It would be a bug if it was supported but gave the wrong result. It has never been supported (the re module predates PEP 357), so it's a new feature.

[issue27471] sre_constants.error: bad escape \d

2016-07-08 Thread Matthew Barnett
Matthew Barnett added the comment: There's a move to treat invalid escape sequences as an error (see issue 27364). The previous behaviour was to treat them as literals. The replacement template string contains \d, which is not a valid escape sequence (it's valid for the pattern, b
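
A short check of the behaviour described above, assuming Python 3.7 or later, where an unknown ASCII-letter escape in a replacement template is an error. Two ways of getting a literal backslash-d into the result are shown; the sample strings are made up.

```python
import re

try:
    re.sub(r"\d", r"\d", "a1")
    rejected = False
except re.error:
    # \d is valid in the pattern but not in the replacement template.
    rejected = True

# Double the backslash in the template ...
escaped = re.sub(r"\d", r"\\d", "a1")
# ... or use a function, whose return value is not parsed as a template.
via_function = re.sub(r"\d", lambda m: r"\d", "a1")
```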

[issue27669] Bug in re.fullmatch() specific to Windows XP

2016-08-02 Thread Matthew Barnett
Matthew Barnett added the comment: FTR, I can't reproduce it. This is what I get on Windows XP (32-bit): Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (In tel)] on win32 Type "help", "copyright", "credits" or "lice

[issue27669] Bug in re.fullmatch() specific to Windows XP

2016-08-02 Thread Matthew Barnett
Matthew Barnett added the comment: When I tried it, I got matches for both 'match' and 'fullmatch' (output attached as 'output.txt') as expected. -- Added file: http://bugs.python.org/file43984/output.txt

[issue27669] Bug in re.fullmatch() specific to Windows XP

2016-08-03 Thread Matthew Barnett
Matthew Barnett added the comment: Are you using the same version on the other systems? I've had a quick look through the bug tracker and found some fixes for fullmatch that postdate Python 3.4.0, so I'll suggest you just update to a more recent version of

[issue27800] Regular expressions with multiple repeat codes

2016-08-19 Thread Matthew Barnett
Matthew Barnett added the comment: "*" and the other quantifiers ("+", "?" and "{...}") operate on the preceding _item_, not the entire preceding expression. For example, "ab*" means "a" followed by zero or more repeats of "
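
A few checks of what a quantifier applies to, using the standard re module. The test strings are made up.

```python
import re

# "*" applies to the preceding item only: here, the "b".
item_only = re.fullmatch(r"ab*", "abbb")
not_repeated_pair = re.fullmatch(r"ab*", "abab")

# Grouping makes the whole "ab" the repeated item.
grouped = re.fullmatch(r"(?:ab)*", "abab")

# A quantifier directly after another quantifier has no item to repeat.
try:
    re.compile(r"a**")
    double_repeat_rejected = False
except re.error:
    double_repeat_rejected = True
```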

[issue694374] Recursive regular expressions

2013-02-23 Thread Matthew Barnett
Matthew Barnett added the comment: FYI, I did eventually add it to my regex implementation. It was quite challenging!

[issue17297] Issue with return in recursive functions

2013-02-25 Thread Matthew Barnett
Matthew Barnett added the comment: This question should've been posted to python-l...@python.org, not here. Your functions are calling themselves, but not returning the result of the call to their own callers.
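
The mistake is a common one, so here is a minimal illustration. These functions are made up for the example and are not the reporter's code.

```python
def countdown_broken(n):
    if n == 0:
        return "done"
    countdown_broken(n - 1)   # result is discarded, so None is returned


def countdown_fixed(n):
    if n == 0:
        return "done"
    return countdown_fixed(n - 1)   # pass the result back up to the caller
```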

[issue8402] Add a function to escape metacharacters in glob/fnmatch

2013-03-07 Thread Matthew Barnett
Matthew Barnett added the comment: I've attached fnmatch_implementation.py, which is a simple pure-Python implementation of the fnmatch function. It's not as susceptible to catastrophic backtracking as the current re-based one. For example: fnmatch('a' * 50, '*a*&
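
For readers without the attachment, this is a rough sketch of how wildcard matching can avoid a regex engine. It is not the attached fnmatch_implementation.py; simple_fnmatch is a hypothetical name, and only '*' and '?' are handled (no [...] classes). It remembers the most recent '*' and retries from there, so the work is bounded by roughly len(name) * len(pattern).

```python
def simple_fnmatch(name, pattern):
    n = p = 0
    star_p = -1   # index of the most recent "*" in pattern, if any
    star_n = 0    # position in name that this "*" currently extends to
    while n < len(name):
        if p < len(pattern) and pattern[p] == "*":
            # Record the star and first try matching it against nothing.
            star_p, star_n = p, n
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", name[n]):
            n += 1
            p += 1
        elif star_p != -1:
            # Mismatch: let the last star absorb one more character.
            star_n += 1
            n = star_n
            p = star_p + 1
        else:
            return False
    # Any pattern left over must consist of stars only.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```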

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-07 Thread Matthew Barnett
Matthew Barnett added the comment: The way the re handles ranges is to convert the two endpoints to lowercase and then check whether the lowercase form of the character in the text is in that range. For example, [A-Z] is converted to the range [\x41-\x5A], and the lowercase form of

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-11 Thread Matthew Barnett
Matthew Barnett added the comment: In issue #3511 the range was slightly unusual, so closing it seemed a reasonable approach, but the range in this issue is less clearly a problem. My preference would be to fix it, if possible.

[issue17426] \0 in re.sub substitutes to space

2013-03-15 Thread Matthew Barnett
Matthew Barnett added the comment: The regex behaves the same as re. The reason it isn't supported is that \0 starts an octal escape sequence.
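
A quick check with the standard re module. In a replacement template \0 is read as an octal escape (the NUL character), while \g<0> refers to the whole match. The sample strings are made up.

```python
import re

# \0 in the template is an octal escape, i.e. chr(0).
octal = re.sub(r"b", r"\0", "abc")

# \g<0> is the way to refer to the entire match.
whole_match = re.sub(r"b", r"[\g<0>]", "abc")
```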

[issue17447] str.identifier shouldn't accept Python keywords

2013-03-17 Thread Matthew Barnett
Matthew Barnett added the comment: I already use it in the regex module for named groups. I don't think it would ever be a problem in practice because the names are invariably handled as strings. -- nosy: +mrabarnett

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Matthew Barnett
Matthew Barnett added the comment: It's not a bug. The documentation says """Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list."
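
The documented behaviour in a small example: only text matched by capturing groups is returned along with the pieces, and separator text outside any group is dropped. The inputs are made up.

```python
import re

# No groups: the separators are discarded.
plain = re.split(r"-", "a-b-c")

# A capturing group: the captured separators are kept in the result.
with_group = re.split(r"(-)", "a-b-c")

# Only the grouped part is kept; the surrounding \s* text is dropped.
partly_grouped = re.split(r"\s*(,)\s*", "a , b")
```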

[issue17998] internal error in regular expression engine

2013-05-17 Thread Matthew Barnett
Matthew Barnett added the comment: Here are some simpler examples of the bug: re.compile('.*yz', re.S).findall('xyz') re.compile('.?yz', re.S).findall('xyz') re.compile('.+yz', re.S).findall('xyz') Unfortunately I find it difficult to
