Ezio Melotti <[email protected]> added the comment:
With 3.2 the situation is more complicated because there is a strict mode and
a non-strict (tolerant) mode.
The strict mode uses:
attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')
and the tolerant mode uses:
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')
This means that the strict mode doesn't allow valid non-ASCII chars in
unquoted attribute values, and that tolerant mode is a little too permissive:
it accepts any char other than '>' and whitespace there.
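To make the difference concrete, here is a small sketch that runs both regexes quoted above against the same unquoted value. The helper name unquoted_value and the sample inputs are made up for illustration; only the two patterns come from this message.

```python
import re

# Patterns copied verbatim from the message above.
attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')


def unquoted_value(pattern, text):
    """Hypothetical helper: return the value captured by group 3, or None."""
    m = pattern.match(text)
    return m.group(3) if m else None


# Strict: the value stops at the first char outside the ASCII class,
# so the non-ASCII 'é' truncates it.
strict_value = unquoted_value(attrfind, ' title=café>')

# Tolerant: everything up to whitespace or '>' is taken as the value.
tolerant_value = unquoted_value(attrfind_tolerant, ' title=café>')

# Tolerant also accepts chars such as '<' and '"' inside the value.
permissive_value = unquoted_value(attrfind_tolerant, ' title=a<b"c>')

print(repr(strict_value), repr(tolerant_value), repr(permissive_value))
```

This only exercises the regexes in isolation; what the parser does with a truncated value in strict mode is not shown here.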
The attached patch changes the strict regex to be more permissive and leaves
the tolerant regex unchanged. The differences between the two are now so small
that the tolerant version could be removed, except that re.search is used
instead of re.match when the tolerant regex is used.
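The re.search vs. re.match distinction matters because match only tries the pattern at the given position, while search scans forward until it finds a place where the pattern fits. A minimal sketch with the tolerant regex quoted above (the input string is a made-up example, not taken from the parser's tests):

```python
import re

# Tolerant pattern copied verbatim from the message above.
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

# Hypothetical input: a stray '/' precedes the attribute.
text = '/ class=x'

# match() is anchored at position 0, where '/' cannot start an attribute name.
anchored = attrfind_tolerant.match(text)

# search() skips ahead and finds the attribute after the stray '/'.
scanned = attrfind_tolerant.search(text)

print(anchored)
print(scanned.group(1), scanned.group(3), scanned.start())
```

So even with identical patterns, the tolerant path would skip over junk that the strict path stops at.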
----------
nosy: +r.david.murray
Added file: http://bugs.python.org/file21545/issue7311-3.diff
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7311>
_______________________________________