[issue1708652] Exact matching
Tom Lynn added the comment:

I don't know whether it should stand; I'm somewhere around 0 on it myself. So I guess that means it shouldn't, since it's easier to add features than remove them. The problem is that once you're aware of the need for it you need it less.

In case other people are +1, I'll note that "exact" isn't a very nice name either, not being a verb. "exact_match" is a bit long but probably better (and better than "match_exact").

--
Python tracker <http://bugs.python.org/issue1708652>

Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1708652] Exact matching
Tom Lynn added the comment:

I'm still unsure. I think this confusion does cause bugs in real-world code. Perhaps more prominence for \A and \Z in the docs? There's already a section comparing regexps starting '^' with match under "Matching vs Searching".

The problem is basically that ^ and $ have weird semantics but are better recognised than \A and \Z. Looking over the docs again I see that the docs for $ are still misleading, in a way that's related to this issue:

    foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'.

"foo$ matches only 'foo' (out of 'foo' and 'foobar')" is the correct interpretation of that, but it's easy to read it as "foo$ means exact_match('foo')", which is the misconception I was hoping to put to rest with this (foo$ also matches the 'foo' part of 'foo\nbar', even with flags=0).
[issue1708652] Exact matching
Tom Lynn added the comment:

Actually, looking at the second part of the docs for $ (on "foo.$") makes me think the main motivating case here may be a bug in re.match::

    >>> re.match('foo$', 'foo\n\n')
    >>> re.match('foo$', 'foo\n')
    <_sre.SRE_Match object at 0x00A98678>

Shortening an input string shouldn't ever cause it to match, should it?
[issue1708652] Exact matching
Tom Lynn added the comment:

Oh dear, I'm wrong on two fronts (I wish Roundup had post editing).

a) foo$ doesn't match the 'foo' part of 'foo\nbar' as I stated above, but does match the 'foo' part of 'foo\n'.

b) Obviously shortening an input string can cause it to match. It's still weird though.
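The corrected behaviour in (a) can be checked directly. This is a minimal sketch, assuming a current Python 3 `re` module; the helper name `dollar_matches` is made up for illustration and is not part of the thread.

```python
import re

def dollar_matches(text, flags=0):
    """Return True if the pattern 'foo$' matches at the start of text."""
    return re.match(r'foo$', text, flags) is not None

# Without re.MULTILINE, '$' matches at the very end of the string
# or just before a single trailing newline.
print(dollar_matches('foo'))        # True
print(dollar_matches('foo\n'))      # True
print(dollar_matches('foo\n\n'))    # False
print(dollar_matches('foo\nbar'))   # False

# With re.MULTILINE, '$' also matches just before any newline.
print(dollar_matches('foo\nbar', re.MULTILINE))  # True

# '\Z' matches only at the very end of the string.
print(re.match(r'foo\Z', 'foo\n') is None)  # True
```

The 'foo\n' case is the one that tends to surprise validation code: the string passes the check with a newline still attached.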
[issue1708652] Exact matching
Tom Lynn added the comment:

(Sorry to comment on a closed issue, it was closed as I was writing this.)

It's not that I'm not convinced of the need, just not of the solution. I still think that there are problems here:

a) forgetting any \Z or $ terminator to .match() is easy,

b) $ is easily misunderstood (and not just by me) and I suspect commonly dangerously misused in validation routines as a result,

c) '(?:%s)\Z' % regexp is noisy, combines two less-understood features, and makes simple regexps hard to read,

d) '(?:%s)\Z' % regexp.pattern requires recompilation of the regexp.

I think another method is probably the best solution to these, but it may have too much cost (though I'm not sure what that cost would be).

Largely orthogonally, I'd like to see \Z encouraged over $ in the docs, and preferably a version of this table (probably under Matching vs Searching), corrected if I'm wrong of course:

    NON-MULTILINE:
        '^' is equivalent to '\A'
        '$' is equivalent to '(?:\Z|(?=\n\Z))'
    MULTILINE:
        '^' is equivalent to '(?:\A|(?<=\n))'
        '$' is equivalent to '(?:\Z|(?=\n))'

But the docs already try to express the above table (or its correction) in English, so you may feel it wouldn't add anything, in which case I'd still like to see any corrections for my own edification if possible.
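One way to sanity-check the table is to compare the positions at which each anchor and its proposed equivalent match. The sketch below assumes Python 3.7 or later (where empty matches are reported consistently); the sample strings and the helper names `positions` and `table_holds` are arbitrary choices for illustration, and agreement on a handful of samples is evidence rather than proof.

```python
import re

# (anchor, flags, proposed flag-free equivalent), copied from the table above.
TABLE = [
    (r'^', 0, r'\A'),
    (r'$', 0, r'(?:\Z|(?=\n\Z))'),
    (r'^', re.MULTILINE, r'(?:\A|(?<=\n))'),
    (r'$', re.MULTILINE, r'(?:\Z|(?=\n))'),
]

# Arbitrary sample strings chosen for this sketch.
SAMPLES = ['', 'foo', 'foo\n', 'foo\n\n', 'foo\nbar', '\nfoo', '\n', 'a\n\nb\n']

def positions(pattern, text, flags=0):
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

def table_holds():
    """Check each row of the table against every sample string."""
    return all(
        positions(anchor, text, flags) == positions(equivalent, text)
        for anchor, flags, equivalent in TABLE
        for text in SAMPLES
    )

print(table_holds())
```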
[issue1859] textwrap doesn't linebreak on "\n"
Tom Lynn added the comment:

I've also been attempting to look into this and came up with an almost identical patch, which is promising:

https://bitbucket.org/tlynn/issue1859/diff/textwrap.py?diff2=041c9deb90a2&diff1=f2c093077fbf

I missed the wordsep_simple_re though.

Testing it is the hard part. I've got a few examples that could become tests in that repository, but it's far from conclusive.

One corner case I found is trailing whitespace becoming a blank line:

    >>> from textwrap import TextWrapper
    >>> T = TextWrapper(replace_whitespace=False, drop_whitespace=False, width=9)
    >>> T.wrap('x'*9 + ' \nfoo')
    ['x', ' ', 'foo']

I think it's fine. drop_whitespace=True removes the blank line, and those who really want drop_whitespace=False can remove the blank lines easily.

--
Python tracker <http://bugs.python.org/issue1859>
[issue1708652] Exact matching
Tom Lynn added the comment:

Yes, that's right. The binary aspect of it was something of a red herring, I'm afraid, although I still think that (or parsing in general) is an important use case.

The prime motivation is that it's easy to either forget the '\Z' or to use '$' instead, both of which cause subtle bugs. An exact() method might help to avoid that.
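As a rough illustration of what such a method would do, here is a hypothetical module-level helper built on the '(?:%s)\Z' idiom mentioned elsewhere in the thread. The name `exact_match` is taken from the discussion, but this function is only a sketch, not an API that the `re` module provides.

```python
import re

def exact_match(pattern, string, flags=0):
    """Hypothetical helper: succeed only if pattern matches the whole string.

    Wrapping the pattern in a non-capturing group keeps top-level
    alternations such as 'a|ab' from escaping the trailing \\Z.
    """
    return re.match(r'(?:%s)\Z' % pattern, string, flags)

print(exact_match('foo', 'foo') is not None)    # True
print(exact_match('foo', 'foo\n') is not None)  # False
print(exact_match('a|ab', 'ab') is not None)    # True
```

Python 3.4 and later ship `re.fullmatch()`, which covers the same use case without rebuilding the pattern string.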
[issue1631394] sre module has misleading docs
Tom Lynn added the comment:

Thanks for fixing this. I now also note that (?<=...), (?http://bugs.python.org/file9284/undoc-patch.txt

--
Tracker <http://bugs.python.org/issue1631394>
[issue1631394] sre module has misleading docs
Tom Lynn added the comment:

Nice changes to the wording. (For the record: it's r60316 in fact.)
[issue2670] urllib2 build_opener() fails if two handlers use the same default base class
New submission from Tom Lynn:

urllib2.py:424 (Py 2.4) or urllib2.py:443 (Py 2.5) in build_opener()::

    skip = []
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.append(klass)
            elif isinstance(check, klass):
                skip.append(klass)
    for klass in skip:
        default_classes.remove(klass)

This can cause klass to be appended to skip multiple times, which then causes an exception in the final line quoted above.

skip should be a set (and append changed to add), or "if klass not in skip:" should be added before each "skip.append(klass)".

--
components: Library (Lib)
messages: 65683
nosy: tlynn
severity: normal
status: open
title: urllib2 build_opener() fails if two handlers use the same default base class
type: behavior
versions: Python 2.4, Python 2.5

Tracker <http://bugs.python.org/issue2670>
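The failure and the suggested fix can be reproduced outside urllib2. This is a minimal sketch: the classes `DefaultHandler`, `CustomA` and `CustomB` and both function names are stand-ins invented for illustration, and only the loop body mirrors the code quoted in the report.

```python
import inspect

class DefaultHandler:
    """Stand-in for one of build_opener()'s default handler classes."""

class CustomA(DefaultHandler):
    pass

class CustomB(DefaultHandler):
    pass

def remove_overridden_buggy(default_classes, handlers):
    """Same logic as the quoted loop: skip is a list, so it can hold duplicates."""
    default_classes = list(default_classes)
    skip = []
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.append(klass)
            elif isinstance(check, klass):
                skip.append(klass)
    for klass in skip:
        default_classes.remove(klass)  # ValueError on a repeated klass
    return default_classes

def remove_overridden_fixed(default_classes, handlers):
    """Suggested fix: collect the classes to skip in a set."""
    default_classes = list(default_classes)
    skip = set()
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    for klass in skip:
        default_classes.remove(klass)
    return default_classes

# Two handlers (one class, one instance) that share the same default base class.
handlers = [CustomA, CustomB()]
try:
    remove_overridden_buggy([DefaultHandler], handlers)
except ValueError as exc:
    print('buggy version raised ValueError:', exc)
print(remove_overridden_fixed([DefaultHandler], handlers))  # []
```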
[issue5021] doctest.testfile should set __name__, can't use namedtuple
New submission from Tom Lynn:

This file fails when run with doctest.testfile::

    >>> print __name__
    __builtin__
    >>> print globals()['__name__']  # fails with KeyError: __name__
    __builtin__

"__builtin__" is probably not a good value, but more importantly, this means that you can't use namedtuples in text file doctests, since namedtuple() inspects the calling frame::

    >>> from namedtuple import namedtuple
    >>> t = namedtuple('fred', 'x')  # fails

(I presume this is the same for "from collections import namedtuple", but I've not tested with 2.6+.)

A workaround is to add this line at the start of the test::

    >>> __name__ = 'test'

--
components: Library (Lib)
messages: 80322
nosy: tlynn
severity: normal
status: open
title: doctest.testfile should set __name__, can't use namedtuple
type: feature request
versions: Python 2.5

Python tracker <http://bugs.python.org/issue5021>
[issue5022] doctest should allow running tests with "python -m doctest"
New submission from Tom Lynn:

It would be good to be able to do something like::

    $ python -m doctest foo.py
    $ python -m doctest --text foo.txt bar.txt

(or preferably some command line options design which could handle both .py and .txt files).

--
components: Library (Lib)
messages: 80323
nosy: tlynn
severity: normal
status: open
title: doctest should allow running tests with "python -m doctest"
type: feature request
versions: Python 2.5

Python tracker <http://bugs.python.org/issue5022>
[issue5079] time.ctime docs refer to "time tuple" for default
New submission from Tom Lynn:

The docs for time.ctime() (quoted below) seem to have been copied from time.asctime(). They refer to a time tuple and localtime(), where they should refer to seconds and time().

Current docs::

    ctime(seconds) -> string

    Convert a time in seconds since the Epoch to a string in local time.
    This is equivalent to asctime(localtime(seconds)). When the time tuple
    is not present, current time as returned by localtime() is used.

--
messages: 80644
nosy: tlynn
severity: normal
status: open
title: time.ctime docs refer to "time tuple" for default

Python tracker <http://bugs.python.org/issue5079>
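The equivalence the docstring describes, with the default spelled the way the report suggests, can be written out as follows. This is a sketch only: `ctime_equivalent` is a made-up name, and the timestamp is an arbitrary example value.

```python
import time

def ctime_equivalent(seconds=None):
    """Spell out ctime() in terms of asctime() and localtime()."""
    if seconds is None:
        # The default is the current time in seconds, as returned by
        # time.time(), rather than a time tuple.
        seconds = time.time()
    return time.asctime(time.localtime(seconds))

stamp = 1000000000  # arbitrary example timestamp
print(ctime_equivalent(stamp) == time.ctime(stamp))
```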
[issue5079] time.ctime docs refer to "time tuple" for default
Changes by Tom Lynn:

--
components: +Library (Lib)
type: -> feature request
versions: +Python 2.5, Python 3.0
[issue1079] decode_header does not follow RFC 2047
Tom Lynn added the comment:

The only difference between the two regexps is that the email/header.py version looks for::

    (?=[ \t]|$)  # whitespace or the end of the string

at the end (with re.MULTILINE, so $ also matches before a '\n').

To expand on "There is nothing about that thing in RFC 2047", it says::

    IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
    by an RFC 822 parser.

RFC 822 says::

    atom     = 1*<any CHAR except specials, SPACE and CTLs>
    ...
    specials = "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
             / "," / ";" / ":" / "\" / <">  ;  string, to use
             / "." / "[" / "]"              ;  within a word.

So an example of mis-parsing is::

    >>> import email.header
    >>> h = '=?utf-8?q?=E2=98=BA?=(unicode white smiling face)'
    >>> email.header.decode_header(h)
    [('=?utf-8?q?=E2=98=BA?=(unicode white smiling face)', None)]

The correct result would be::

    >>> email.header.decode_header(h)
    [('\xe2\x98\xba', 'utf-8'), ('(unicode white smiling face)', None)]

which is what you get if you insert a space before the '(' in h.

--
nosy: +tlynn

Python tracker <http://bugs.python.org/issue1079>
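The effect of the trailing lookahead can be shown in isolation. The patterns below are simplified stand-ins written for this sketch, not verbatim copies of the regexps in email/header.py; the names `ENCODED_WORD`, `with_lookahead` and `without_lookahead` are hypothetical.

```python
import re

# Simplified encoded-word pattern: =?charset?encoding?encoded-text?=
ENCODED_WORD = r'=\?(?P<charset>[^?]*?)\?(?P<encoding>[qQbB])\?(?P<encoded>.*?)\?='

# Variant that also demands whitespace or end of line after the encoded word.
with_lookahead = re.compile(ENCODED_WORD + r'(?=[ \t]|$)', re.MULTILINE)
# Variant without that requirement.
without_lookahead = re.compile(ENCODED_WORD)

h = '=?utf-8?q?=E2=98=BA?=(unicode white smiling face)'

# '(' directly after the encoded word defeats the lookahead variant.
print(with_lookahead.search(h))                           # None
print(without_lookahead.search(h).group('encoded'))       # =E2=98=BA

# Inserting a space before '(' lets both variants match.
spaced = h.replace('(', ' (', 1)
print(with_lookahead.search(spaced).group('encoded'))     # =E2=98=BA
```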
[issue4958] email/header.py ecre regular expression issue
Tom Lynn added the comment:

Duplicates issue1047.

--
nosy: +tlynn

Python tracker <http://bugs.python.org/issue4958>
[issue4958] email/header.py ecre regular expression issue
Tom Lynn added the comment:

Oops, duplicates issue 1079 even.
[issue4491] email.Header.decode_header() doesn't work if encoded-word was separeted by CRLF
Tom Lynn added the comment:

Duplicates issue1079.

--
nosy: +tlynn

Python tracker <http://bugs.python.org/issue4491>
[issue15858] tarfile missing entries due to omitted uid/gid fields
New submission from Tom Lynn:

The tarfile module silently truncates the list of entries when reading a tar file if it sees an entry with a uid/gid field containing only spaces/NULs. I got such a tarball from Java Maven/plexus-archiver. I don't know whether they write such fields deliberately, but it seems reasonable to me, especially since they were providing the user/group names textually.

I'd like to see two fixes: a None/-1/0 value for the uid/gid, and not silently swallowing HeaderErrors in TarFile.next() (or at least documenting why it's being done). 0 would be consistent with the default value when writing, but None seems more honest. -1 seems hard to defend.

Only tested on silly Python versions (2.6, PyPy-1.8), sorry. It's what I've got to hand, but going by the hg trunk I think this issue also applies to recent Python versions.

--
components: Library (Lib)
messages: 169799
nosy: tlynn
priority: normal
severity: normal
status: open
title: tarfile missing entries due to omitted uid/gid fields
type: behavior
versions: 3rd party, Python 2.6

Python tracker <http://bugs.python.org/issue15858>
[issue15858] tarfile missing entries due to omitted uid/gid fields
Tom Lynn added the comment:

I think the default has to be 0 for consistency with how other empty numeric fields are handled.

In theory spaces and NULs are supposed to be equivalent terminators in numeric fields, but I've just noticed that plexus-archiver is also using leading spaces rather than leading zeros (against the spec), e.g. ' 10422\x00 '. tarfile currently supports this, which I think is good, so I think the right approach is to lstrip(' ') fields and then treat space as a terminator.
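The approach described above (strip leading spaces, stop at the first space or NUL, default to 0) might look roughly like this. It is a sketch under those stated assumptions, not tarfile's actual implementation; `parse_octal_field` is a hypothetical name, and the GNU base-256 encoding for large values is deliberately ignored.

```python
def parse_octal_field(field: bytes) -> int:
    """Hypothetical parser for an octal numeric field in a tar header."""
    # Tolerate leading spaces used as padding instead of leading zeros.
    stripped = field.lstrip(b' ')
    digits = bytearray()
    for byte in stripped:
        # Space and NUL are treated as equivalent terminators.
        if byte in b' \x00':
            break
        digits.append(byte)
    if not digits:
        # A field holding only spaces/NULs defaults to 0.
        return 0
    return int(digits.decode('ascii'), 8)

print(parse_octal_field(b' 10422\x00 ') == 0o10422)  # True
print(parse_octal_field(b'       \x00'))             # 0
print(parse_octal_field(b'0000755\x00') == 0o755)    # True
```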
[issue15858] tarfile missing entries due to omitted uid/gid fields
Tom Lynn added the comment:

See attached bad.tar.

    $ less bad.tar | cat
    drwxr-xr-x 0/0 0 2012-09-05 20:04 foo/
    -rw-rw-r-- uname/gname 0 2012-09-05 20:04 foo/a

    $ python -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
    ['foo']

    $ python -c 'import tarfile, patch; patch.patch_tarfile(); print (tarfile.open("bad.tar").getnames())'
    ['foo', 'foo/a']

I'm only allowed to attach one file via the tracker web UI, so patch.py will follow.

Creation code for bad.tar, largely for my benefit:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.codehaus.plexus.archiver.tar.TarOutputStream;
    import org.codehaus.plexus.archiver.tar.TarEntry;

    class TarTest {
        public static void main(String[] args) throws IOException {
            FileOutputStream fos = new FileOutputStream("bad.tar");
            TarOutputStream tos = new TarOutputStream(fos);

            TarEntry entry = new TarEntry("foo/");
            entry.setMode(16877); // 0o40755
            entry.setUserId(0);
            entry.setGroupId(0);
            entry.setUserName("");
            entry.setGroupName("");
            tos.putNextEntry(entry);

            TarEntry entry2 = new TarEntry("foo/a");
            entry2.setMode(33204); // 0o100664
            entry2.setUserId(-1); // XXX: dodgy
            entry2.setGroupId(-1); // XXX: dodgy
            entry2.setUserName("uname");
            entry2.setGroupName("gname");
            tos.putNextEntry(entry2);

            tos.close();
            fos.close();
        }
    }

--
Added file: http://bugs.python.org/file27129/bad.tar
[issue15858] tarfile missing entries due to omitted uid/gid fields
Tom Lynn added the comment:

patch.py attached - what I'm using as a workaround at the moment.

--
Added file: http://bugs.python.org/file27130/patch.py
[issue19560] PEP 8 operator precedence across parens
New submission from Tom Lynn:

PEP 8 currently has::

    Yes::
        ...
        c = (a+b) * (a-b)

    No::
        ...
        c = (a + b) * (a - b)

That looks wrong to me -- surely the parens are a sufficient precedence hint, and don't need further squashing inside? This will be worse with any non-trivial example. I suspect it may also lead to silly complications in code formatting tools.

This was changed by Guido as part of a reversion in issue 16239, but I wonder whether that example was intended to be included?

--
assignee: docs@python
components: Documentation
messages: 202687
nosy: docs@python, tlynn
priority: normal
severity: normal
status: open
title: PEP 8 operator precedence across parens
type: enhancement

Python tracker <http://bugs.python.org/issue19560>
[issue19560] PEP 8 operator precedence across parens
Tom Lynn added the comment:

FWIW, this pair of examples differs from the others in this section as they were both explicitly okayed in the first version of PEP 8 <http://hg.python.org/peps/rev/4c31c25bdc03?revcount=120>::

    - Use your better judgment for the insertion of spaces around
      arithmetic operators. Always be consistent about whitespace on
      either side of a binary operator. Some examples:

          i = i+1
          submitted = submitted + 1
          x = x*2 - 1
          hypot2 = x*x + y*y
          c = (a+b) * (a-b)
          c = (a + b) * (a - b)

My guess is that this is still the intention?
[issue15858] tarfile missing entries due to omitted uid/gid fields
Tom Lynn added the comment:

The secondary issue, which the patch doesn't address, is that TarFile.next() seems unpythonic; it treats any {Invalid,Empty,Truncated}HeaderError after offset 0 as EOF rather than propagating the exception. It looks deliberate, but I'm not sure why it would be done like that or if it should be changed.
[issue2746] ElementTree ProcessingInstruction uses character entities in content
Changes by Tom Lynn:

--
nosy: +tlynn

Python tracker <http://bugs.python.org/issue2746>
[issue1859] textwrap doesn't linebreak on "\n"
Tom Lynn added the comment:

This bug should be re-opened, since there is definitely a bug here. I think the patch was incorrectly rejected.

If I can expand palfrey's example:

    from textwrap import *
    T = TextWrapper(replace_whitespace=False, width=75)
    text = '''\
    a a a a a b b b b b c c c c c d d d d d e e e e e'''
    for line in T.wrap(text):
        print line

Python 2.5 textwrap turns it into:

    a a a a a b b b b b c c c c c d d d d d e e e e e

That can't be right. palfrey's patch leaves the input unchanged, which seems correct to me. I think Guido guessed wrong here: the docs for replace_whitespace say:

    If true, each whitespace character (as defined by string.whitespace) remaining after tab expansion will be replaced by a single space

The text should therefore not be reflowed in this case since replace_whitespace=False. palfrey's patch seems correct to me.

It can be made to reflow to the full width by editing palfrey's patch, but that would disagree with the docs and break code.

--
nosy: +tlynn
[issue15858] tarfile missing entries due to omitted uid/gid fields
Tom Lynn added the comment:

I think issue24514 (fixed in Py2.7.11) is a duplicate of this issue.