Re: [Python-Dev] Regexp 2.7
Would there be any interest in augmenting the test case library for the regex stuff?

When I was working on PyPy, we were using a simplified regular expression matcher to implement the tokenizer for Python. I was able to take a lot of PCRE's regex tests and port them to test our regular expression implementation (to make sure the DFAs were being optimized properly, etc.). I believe the PCRE test library was under a very liberal license, and so we may be able to do the same here. If there's interest in it, I can do the same for Python.

Jared

On 9 Mar 2009, at 16:07, Antoine Pitrou wrote:

> Facundo Batista <...@gmail.com> writes:
>
> Matthew Barnett has been doing a lot of work on the regular expressions engine (it seems he hasn't finished yet) under http://bugs.python.org/issue2636 . However, the patches are really huge and touch all of the sre internals. I wonder what the review process can be for such patches? Is there someone knowledgeable enough to be able to review them?
>
> All test cases run ok? How well covered is that library?
>
> I don't know, I haven't even tried it.
>
> Regards
> Antoine.
Re: [Python-Dev] Regexp 2.7
I'm not criticizing the current battery of tests, nor am I arguing that we replace them.

There's a comment in test_re.py that says that "these tests were carefully modeled to cover most of the code"... That is a very difficult statement to maintain and/or verify, especially if the library gets a major revision (which the patch in the original post appears to be).

PCRE has _thousands_ of detailed regular expression tests, testing everything from matching to parsing to extended regular expression syntax to encoding and locales. (It's been a while since I've looked at the details, and of course there are tests that don't apply to Python's implementation.)

So, if there's interest in investigating how much of the PCRE tests can augment the existing tests, I am offering to do so. (I already wrote a simple translation utility to parse the PCRE test format into something we could use in the PyPy test suite; I could try to do something similar for test_re, if there's interest.)

Jared

On 10 Mar 2009, at 11:32, Guido van Rossum wrote:

> Hm, what's wrong with the existing set of regex test cases? This is one of the most complete sets of test cases in our test suite.
>
> On Tue, Mar 10, 2009 at 11:08 AM, Jared Grubb wrote:
>> Would there be any interest in augmenting the test case library for the regex stuff?
>>
>> On 9 Mar 2009, at 16:07, Antoine Pitrou wrote:
>>> Facundo Batista <...@gmail.com> writes:
>>> Matthew Barnett has been doing a lot of work on the regular expressions engine (it seems he hasn't finished yet) under http://bugs.python.org/issue2636 . However, the patches are really huge and touch all of the sre internals. I wonder what the review process can be for such patches? Is there someone knowledgeable enough to be able to review them?
>>> All test cases run ok? How well covered is that library?
>>> I don't know, I haven't even tried it.
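For a sense of what such a translation utility might look like, here is a fresh sketch (not the utility mentioned above). It assumes a simplified reading of the PCRE test format, a "/pattern/modifiers" header line followed by indented subject lines; the real test files also carry expected captures, no-match markers, and locale/encoding modifiers that a full port would need to handle.

# Hypothetical sketch of a PCRE-test translator; the format handling below is
# deliberately simplified and is an assumption, not the real PCRE grammar.
import re

FLAG_MAP = {'i': re.IGNORECASE, 'm': re.MULTILINE, 's': re.DOTALL, 'x': re.VERBOSE}

def parse_blocks(lines):
    """Yield (pattern, flags, subjects) from simplified PCRE-style test lines."""
    pattern = None
    flags = 0
    subjects = []
    for line in lines:
        if line.startswith('/'):            # new "/pattern/modifiers" header
            if pattern is not None:
                yield pattern, flags, subjects
            body, _, modifiers = line.rstrip('\n')[1:].rpartition('/')
            pattern = body
            flags = 0
            for ch in modifiers:
                flags |= FLAG_MAP.get(ch, 0)
            subjects = []
        elif line.startswith('    '):        # indented subject line
            subjects.append(line.rstrip('\n')[4:])
    if pattern is not None:
        yield pattern, flags, subjects

def run(lines):
    # Feed each subject through the stdlib engine; a real port would compare
    # the match results against the expected output files as well.
    for pattern, flags, subjects in parse_blocks(lines):
        compiled = re.compile(pattern, flags)
        for subject in subjects:
            print(pattern, '->', compiled.search(subject))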
[Python-Dev] Grammar for plus and minus unary ops
I was recently reviewing some Python code for a friend who is a C++ programmer, and he had code something like this:

def foo():
    try = 0
    while try < ...:
        # ... do stuff ...
        ++try    # intended to increment try

I was a bit surprised that this was syntactically valid, and because the timeout condition only occurred in exceptional cases, the error has not yet caused any problems.

It appears that the grammar treats the above example as the unary + op applied twice:

u_expr ::= power | "-" u_expr | "+" u_expr | "~" u_expr

Playing in the interpreter, expressions like "1+5" and "1+-+-+-+-+-+-5" evaluate to 6.

I'm not an EBNF expert, but it seems that we could modify the grammar to be more restrictive so the above code would not be silently valid. E.g., make "++5", "1+++5", and "1+-+5" syntax errors, but still keep "1++5", "1+-5", "1-+5" as valid. (Although '~' throws in a kink... should '~-5' be legal? Seems so...)

Jared
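One way to see the nesting the grammar produces is to dump the AST. This is a small added illustration (assuming a Python 3 interpreter; it is not part of the original mail) that prints both the value and the parse of a few of the expressions above:

# Illustration: chained signs parse as nested unary operators.
import ast

for expr in ("++5", "1++5", "1+-+-+-+-+-+-5", "~-5"):
    tree = ast.parse(expr, mode="eval")
    print(expr, "=", eval(compile(tree, "<expr>", "eval")))
    print("   ", ast.dump(tree.body))   # shows the UnaryOp nesting explicitly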
Re: [Python-Dev] pyc files, constant folding and borderline portability issues
On 7 Apr 2009, at 11:59, Alexandru Moșoi wrote:

> Not necessarily. For example C/C++ doesn't define the order of the operations inside an expression (and AFAIK neither does Python), and therefore folding 2 * 3 is OK whether b is an integer or an arbitrary object with the mul operator overloaded. Moreover one would expect * to be associative and commutative (take a look at Python strings); if a * 2 * 3 returns a different result from a * 6 I will find it very surprising and probably reject such code.

That's not true. All ops in C/C++ have associativity that is fixed and well-defined; the star op is left-associative:

    2*3*x is (2*3)*x, which is 6*x
    x*2*3 is (x*2)*3, and this is NOT x*6

(You can show this in C++ by creating a class that has a side effect in its * operator.)

The star operator is not commutative in Python or C/C++ (otherwise what would __rmul__ do?). It's easier to see that + is not commutative: "abc"+"def" and "def"+"abc" are definitely different!

You may be confusing this with the "order is undefined" rule for the evaluation of parameter lists in C/C++. Example: foo(f(), g()) calls f and g in an undefined order.

Jared
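For the Python side, a small added illustration: a class that logs its multiplications makes the left-associativity and the __mul__/__rmul__ distinction visible, and shows why folding x*2*3 into x*6 would change observable behaviour for such a class.

# Illustrative sketch: Logged records every multiplication it takes part in.
class Logged:
    def __init__(self, name):
        self.name = name
    def __mul__(self, other):
        print("__mul__(%s, %r)" % (self.name, other))
        return Logged("(%s*%r)" % (self.name, other))
    def __rmul__(self, other):
        print("__rmul__(%s, %r)" % (self.name, other))
        return Logged("(%r*%s)" % (other, self.name))

a = Logged("a")
a * 2 * 3   # two calls: a*2 first, then (a*2)*3
a * 6       # a single call with 6, so folding changes the side effects
2 * a       # int.__mul__ gives up, so Python falls back to __rmul__
print("abc" + "def", "def" + "abc")   # abcdef defabc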
Re: [Python-Dev] Issue5434: datetime.monthdelta
On 16 Apr 2009, at 11:42, Paul Moore wrote:

> The key thing missing (I believe) from dateutil is any equivalent of monthmod.

I agree with that. It's well-defined and it makes a lot of sense. +1

But I don't think monthdelta can be made to work... what should the following be?

print(date(2008,1,30) + monthdelta(1))
print(date(2008,1,30) + monthdelta(2))
print(date(2008,1,30) + monthdelta(1) + monthdelta(1))

Jared
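To make the problem concrete, here is an added sketch of the usual "clamp to the last day of the month" rule; it is one plausible behaviour, not necessarily what issue 5434 proposes. Under it, the three expressions above give three different answers, and adding one month twice is not the same as adding two months:

# Hypothetical month addition with day-of-month clamping, for illustration only.
import calendar
from datetime import date

def add_months(d, months):
    # Move the year/month forward, then clamp the day to the target month's length.
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(add_months(date(2008, 1, 30), 1))                 # 2008-02-29 (leap year, clamped)
print(add_months(date(2008, 1, 30), 2))                 # 2008-03-30
print(add_months(add_months(date(2008, 1, 30), 1), 1))  # 2008-03-29, not the same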
Re: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote:

> Nick Coghlan writes:
>> 3. Change the shebang lines in Python standard library scripts to be version specific and update release.py to fix them all when bumping the version number in the source tree.
>
> +1
>
> I think that it's probably best to leave "python", "python2", and "python3" for the use of downstream distributors. ISTR that was what Guido concluded, in the discussion that led to Python 3 defaulting to altinstall --- it wasn't just convenient because Python 3 is a major change, but that experience has shown that deciding which Python is going to be "The python" on somebody's system just isn't a decision that Python should make.

Ok, so if I understand, the situation is:

* python points to a 2.x version
* python3 points to a 3.x version
* we need to be able to run certain Py3k scripts from the command line (since we're talking about shebangs) using Python 3 even though "python" points to 2.x

So, if I got the situation right, then do these same scripts understand that PYTHONPATH and PYTHONHOME and all the others are also probably pointing to 2.x code?

Jared
Re: [Python-Dev] PYTHON3PATH
On 13 Jan 2010, at 13:43, Nick Coghlan wrote:

> Guido van Rossum wrote:
>> On Wed, Jan 13, 2010 at 9:57 AM, sstein...@gmail.com wrote:
>>> Or, how about just removing the antiquated use of environment variables altogether from Python 3 and avoid the issue completely.
>>
>> -1. They have their use, but more in controlled situations. If you have "global" env vars that you only want to use with Python 2.x, write a wrapper for Python 3 that invokes it with -E.
>
> Perhaps a case can be made for Python 3 to assume -E by default, with a -e option to enable reading of the environment variables?
>
> That way naive users could run Python3 without tripping over existing Python2 environment variables, while other tools could readily set up a different environment before launching Python3.

I raised a question about these PYTHON3* variables once before in a discussion about shebang lines:
http://www.mailinglistarchive.com/python-dev@python.org/msg29500.html

I'm not advocating them, but just wanted to make sure to bring up the shebang use case.

Jared
Re: [Python-Dev] interesting article on regex performance
On 12 Mar 2010, at 15:22, s...@pobox.com wrote:

>     Collin> re2 is not a full replacement for Python's current regex semantics: it would only serve as an accelerator for a subset of the current regex language. Given that, it makes perfect sense that it would be optional on such minority platforms (much like the incoming JIT).
>
> Sure, but over the years Python has supported at least four different regular expression modules that I'm aware of (regex, regexp, and the current re module with different extension modules underneath it; perhaps there were others). During some of that time more than one module was distributed with Python proper. I think the desire today would be that only one regular expression module be distributed with Python (that would be my vote anyway). Getting people to move off the older libraries was difficult. If re2 can't replace sre under the covers then I think it belongs in PyPI, not the Python distribution. That said, that suggests to me that a different NFA or DFA implementation written in C would replace sre, one not written in C++.

re2 would be a supplement to re -- it is not a replacement, and Python would run fine if it's not present on some platforms.

It's like a floating-point processor: you can do all the math you need with just an integer processor. But if you have an FPU present, then it makes sense to use it for the FP operations.

Jared
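As an added sketch of that supplement-not-replacement arrangement: the "re2" module name and its API below are assumptions for illustration; any real binding may expose a different interface and different failure behaviour.

# Hedged sketch: prefer a hypothetical accelerated engine, fall back to re.
import re

try:
    import re2  # hypothetical C++-backed engine; optional on some platforms
except ImportError:
    re2 = None

def compile_pattern(pattern, flags=0):
    """Use the accelerator when it supports the pattern, else the stdlib engine."""
    if re2 is not None:
        try:
            return re2.compile(pattern, flags)
        except Exception:
            # Pattern uses features outside the accelerator's subset
            # (e.g. backreferences); fall back to re.
            pass
    return re.compile(pattern, flags)

pat = compile_pattern(r"(\w+)\s+\1")   # backreference: handled by re either way
print(pat.search("hello hello"))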
Re: [Python-Dev] Documentation idea
This is a really interesting idea. If extra memory/lookup overhead is a concern, you could enable this new feature by default when the interactive interpreter is started (where it's more likely to be invoked), and turn it off by default when running scripts/modules.

Jared

On 9 Oct 2008, at 20:37, [EMAIL PROTECTED] wrote:

> On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:
>> Background
>> ----------
>> In the itertools module docs, I included pure python equivalents for each of the C functions. Necessarily, some of those equivalents are only approximate but they seem to have greatly enhanced the docs.
>
> Why not go the other direction? Ostensibly the reason for writing a module like 'itertools' in C is purely for performance. There's nothing that I'm aware of in that module which couldn't be in Python. Similarly, cStringIO, cPickle, etc. Everywhere these diverge, it is (if not a flat-out bug) not optimal.
>
> External projects are encouraged by a wealth of documentation to solve performance problems in a similar way: implement in Python, once you've got the interface right, optimize into C. So rather than have a C implementation, which points to Python, why not have a Python implementation that points at C? 'itertools' (and similar) can actually be Python modules, and use a decorator, let's call it "C", to do this:
>
> @C("_c_itertools.count")
> class count(object):
>     """
>     This is the documentation for both the C version of itertools.count
>     and the Python version - since they should be the same, right?
>     """
>
> In Python itself, the Python module would mostly be for documentation, and therefore solve the problem that Raymond is talking about, but it could also be a handy fallback for sanity checking, testing, and use in other Python runtimes (ironpython, jython, pypy). Many third-party projects already use ad-hoc mechanisms for doing this same thing, but an officially-supported way of saying "this works, but the optimized version is over here" would be a very useful convention. In those modules which absolutely require some C stuff to work, the python module could still serve as documentation.
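An added sketch of one way the proposed decorator could be spelled; the fallback policy and the "_c_itertools" module name are assumptions taken from the quoted example, not an agreed design:

# Sketch of the proposed "C" decorator: try the named C object, else keep the
# pure-Python definition (which also carries the documentation).
import importlib

def C(dotted_name):
    """Return the C implementation named by dotted_name, else the Python one."""
    module_name, _, attr = dotted_name.rpartition(".")
    def decorator(py_obj):
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError):
            return py_obj   # no accelerator available: use the Python version
    return decorator

# Usage mirroring the quoted example ("_c_itertools" is hypothetical, so this
# falls back to the Python class):
@C("_c_itertools.count")
class count(object):
    """Documentation shared by the C and Python implementations."""
    def __init__(self, start=0):
        self.n = start
    def __iter__(self):
        return self
    def __next__(self):
        value = self.n
        self.n += 1
        return value

c = count(10)
print(next(c), next(c))   # 10 11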
Re: [Python-Dev] PEP 374 (DVCS) now in reST
Regardless of the outcome, those that want to use SVN can use SVN, and those that want to use the "chosen DVCS" can use that. In the end, which is the more "lossy" repository? It seems like if the change is transparent to everyone who is using it, then the only thing we care about is that the chosen backend will preserve all the information to make it truly transparent to everyone involved.

Jared

On 25 Jan 2009, at 10:37, Martin v. Löwis wrote:

>> There's a possible third way. I've heard (though haven't investigated) that some people are working on supporting the svn wire protocol in the bzr server. This would mean that anybody who's still comfortable with svn and feels no need to change their current habits can continue to work the way they always have. Those that want the extra benefits of a DVCS, or who do not have commit access to the code.python.org branches, would have viable alternatives.
>
> Of course, those without commit access *already* have viable alternatives, IIUC, by means of the automatic ongoing conversion of the svn repository to bzr and hg (and, IIUC, git - or perhaps you can use git-svn without the need for server-side conversion). So a conversion to a DVCS would only benefit those committers who see a benefit in using a DVCS (*) (and would put a burden on those committers who see a DVCS as a burden). It would also put a burden on contributors who are uncomfortable with using a DVCS.
>
> Regards,
> Martin
>
> (*) I'm probably missing something, but ISTM that committers can already use the DVCS - they only need to create a patch just before committing. This, of course, is somewhat more complicated than directly pushing the changes to the server, but it still gives them most of what is often reported as the advantage of a DVCS (local commits, ability to have many branches simultaneously, ability to share work-in-progress, etc). In essence, committers wanting to use a DVCS can do so today, by acting as if they were non-committers, and only using svn for actual changes to the master repository.
Re: [Python-Dev] Counting collisions for the win
On 20 Jan 2012, at 10:49, Brett Cannon wrote:

> Why can't we have our cake and eat it too?
>
> Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will complicate merging commits for those files affected by the patches, but that will most likely be isolated and not a common collision (and less of an issue once 3.3 is released later this year).
>
> I understand the desire to keep backwards-compatibility, but collision counting could cause an error on some random input that someone didn't expect to cause issues, whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data.

I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway.

Rather than throwing an exception, maybe you just reseed the random value for the hash (see the sketch below):

* this would solve the security issue that someone mentioned about being able to deduce the hash, because if attackers keep being mean it'll change anyway
* for bugfix releases, start off without randomization (seed == 0) and start to use it only when the collision count hits the threshold
* for feature releases, reseeding when you hit a certain threshold still seems like a good idea, as it will make lookups/insertions better in the long run

AFAIUI, Python already doesn't guarantee order stability when you insert something into a dictionary, as in the worst case the dictionary has to resize its hash table, and then the order is freshly jumbled again.

Just my two cents.

Jared
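An added toy sketch of the reseed-on-threshold idea. It uses a tiny separate-chaining table, nothing like CPython's real open-addressing dict, purely to illustrate the mechanism: start with seed 0, and when one chain grows past a threshold, pick a new random seed and rehash instead of raising.

# Toy illustration only; the seed mixing here is far weaker than a real
# randomized hash would need to be.
import random

class ReseedingTable:
    THRESHOLD = 8          # maximum tolerated chain length before reseeding

    def __init__(self, nbuckets=64, seed=0):
        self.nbuckets = nbuckets
        self.seed = seed   # seed == 0 mimics "no randomization until needed"
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, key):
        return (hash(key) ^ self.seed) % self.nbuckets

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        if len(bucket) > self.THRESHOLD:
            self._reseed()

    def _reseed(self):
        # New seed, then rebuild; iteration order may change, which is no worse
        # than the reordering a normal resize already causes.
        self.seed = random.getrandbits(64)
        old = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(self.nbuckets)]
        for key, value in old:
            self.buckets[self._index(key)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)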