[issue11283] incorrect pattern in the re module docs for conditional regex

2011-02-25 Thread wesley chun

wesley chun added the comment:

one more comment: it would be nice to have a regex that works with
search() (in addition to match()), because such an email address may
appear in the middle of a line, say in a From: or To: email header.

anchoring the no-pattern with '$' prevents this, since an unbracketed
address then only matches at the end of the string. so i'm not 100%
satisfied with the patch, although it does fix the regex so that it
works with match().
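
A minimal sketch of what a search()-friendly variant might look like. It is
not the pattern from the docs or from the attached patch: it assumes the '$'
can be swapped for a negative lookahead, and that a negative lookbehind is
acceptable to stop search() from picking up the tail of a half-bracketed
address. The names ADDR_RE and find_address, and the sample header lines,
are made up for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the documented or patched one).
ADDR_RE = re.compile(
    r'(?:(<)|(?<![<\w]))'    # capture '<', or require no '<'/word char just before
    r'(\w+@\w+(?:\.\w+)+)'   # the same "poor" address pattern used in the docs
    r'(?(1)>|(?![>\w]))'     # '<' captured: need '>'; else no '>'/word char after
)


def find_address(line):
    """Return the first address found anywhere in line, or None."""
    m = ADDR_RE.search(line)
    return m.group(2) if m else None


print(find_address('From: <user@example.com>'))       # bracketed, mid-line
print(find_address('To: user@example.com, someone'))  # bare, mid-line
print(find_address('<user@example.com'))              # missing '>'
print(find_address('user@example.com>'))              # stray '>'
```

The first two calls should return the address and the last two None. The
lookarounds only inspect the neighbouring character, so the address does not
have to sit at the end of the string.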

--

___
Python tracker 
<http://bugs.python.org/issue11283>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11283] incorrect pattern in the re module docs for conditional regex

2011-02-22 Thread wesley chun

New submission from wesley chun:

In the re docs, it states the following for the conditional regular expression 
syntax:

(?(id/name)yes-pattern|no-pattern)
Will try to match with yes-pattern if the group with given id or name exists, 
and with no-pattern if it doesn’t. no-pattern is optional and can be omitted. 
For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, 
which will match with '<u...@host.com>' as well as 'u...@host.com', but not 
with '<u...@host.com':

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<u...@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'u...@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<u...@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'u...@host.com>'))
True

This error has existed since this feature was added in 2.4...
http://docs.python.org/release/2.4.4/lib/re-syntax.html

... through the 3.3 docs...
http://docs.python.org/dev/py3k/library/re.html#regular-expression-syntax

The fix is to add the end-of-string anchor '$' as the no-pattern, so that all
4 cases behave as expected:


>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<u...@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'u...@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<u...@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'u...@host.com>'))
False
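
The comparison can also be run as a small script. This is only a sketch: the
two patterns are copied from this report, while the address
user@example.com and the names DOCS_PATTERN, PATCHED_PATTERN, SAMPLES and
results are placeholders made up for illustration.

```python
import re

# Pattern as currently documented, and with the proposed '$' no-pattern.
DOCS_PATTERN = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
PATCHED_PATTERN = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

# Placeholder address; the four bracket combinations discussed above.
SAMPLES = [
    '<user@example.com>',   # both brackets
    'user@example.com',     # no brackets
    '<user@example.com',    # opening bracket only
    'user@example.com>',    # closing bracket only
]


def results(pattern):
    """Return one bool per sample: does re.match() accept it?"""
    return [bool(re.match(pattern, s)) for s in SAMPLES]


print(results(DOCS_PATTERN))     # [True, True, False, True]
print(results(PATCHED_PATTERN))  # [True, True, False, False]
```

Only the last sample differs: without the '$', match() stops after the
address and never looks at the trailing '>'.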

If accepted, I propose this patch (also attached):

$ svn diff re.rst
Index: re.rst
===
--- re.rst  (revision 88499)
+++ re.rst  (working copy)
@@ -297,9 +297,9 @@
 ``(?(id/name)yes-pattern|no-pattern)``
    Will try to match with ``yes-pattern`` if the group with given *id* or *name*
    exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
-   can be omitted. For example,  ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
+   can be omitted. For example,  ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email
    matching pattern, which will match with ``'<u...@host.com>'`` as well as
-   ``'u...@host.com'``, but not with ``''`` .

--
assignee: docs@python
components: Documentation, Regular Expressions
files: re.rst
messages: 129041
nosy: docs@python, wesley.chun
priority: normal
severity: normal
status: open
title: incorrect pattern in the re module docs for conditional regex
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file20833/re.rst
