Re: [Python-Dev] Regexp 2.7

2009-03-10 Thread Jared Grubb
Would there be any interest in augmenting the test case library for  
the regex stuff?


When I was working on PyPy, we were using a simplified regular  
expression matcher to implement the tokenizer for Python. I was able  
to take a lot of PCRE's regex tests and port them to test our regular  
expression implementation (to make sure the DFAs were being optimized  
properly, etc).


I believe the PCRE test library was under a very liberal license, and  
so we may be able to reuse it here. If there's interest, I can do the  
same porting work for Python.


Jared

On 9 Mar 2009, at 16:07, Antoine Pitrou wrote:


Facundo Batista <...@gmail.com> writes:


Matthew Barnett has been doing a lot of work on the regular expressions engine
(it seems he hasn't finished yet) under http://bugs.python.org/issue2636.
However, the patches are really huge and touch all of the sre internals.
I wonder what the review process can be for such patches? Is there someone
knowledgeable enough to be able to review them?


All test cases run ok? How well covered is that library?


I don't know, I haven't even tried it.

Regards
Antoine.




Re: [Python-Dev] Regexp 2.7

2009-03-10 Thread Jared Grubb
I'm not criticizing the current battery of tests, nor am I arguing  
that we replace them.


There's a comment in test_re.py that says that "these tests were  
carefully modeled to cover most of the code"... That is a very  
difficult statement to maintain and/or verify, especially when the  
library gets a major revision (which is what the original post's  
patch appears to be).


PCRE has _thousands_ of detailed regular expression tests, testing  
everything from matching to parsing to extended regular expression  
syntax to encoding and locales. (It's been a while since I've looked  
at the details, but of course there are tests that don't apply to  
Python's implementation.)


So, if there's interest in investigating how much of the PCRE tests  
can augment the existing tests, I am offering to do so. (I already  
wrote a simple translation utility to parse the PCRE test format into  
something we could use in the PyPy test suite; I could try to do  
something similar for test_re, if there's interest.)
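To give a flavor of what I mean, here is a rough sketch only; it assumes the
simple "/pattern/flags followed by indented subject lines" layout that the
plainest PCRE test entries use, and the real files have many more directives
that this ignores:

    import re

    def parse_pcre_tests(lines):
        # Yield (pattern, flags, subjects) tuples from a PCRE-style test file.
        pattern, flags, subjects = None, "", []
        for line in lines:
            stripped = line.strip()
            if not stripped:                      # a blank line ends an entry
                if pattern is not None:
                    yield pattern, flags, subjects
                pattern, flags, subjects = None, "", []
            elif pattern is None and stripped.startswith("/"):
                pattern, _, flags = stripped[1:].rpartition("/")
            elif pattern is not None:
                subjects.append(stripped)         # an indented subject string
        if pattern is not None:
            yield pattern, flags, subjects

    def to_re_flags(modifiers):
        # Map the PCRE modifiers we understand onto re flags; skip the rest.
        table = {"i": re.IGNORECASE, "m": re.MULTILINE,
                 "s": re.DOTALL, "x": re.VERBOSE}
        result = 0
        for ch in modifiers:
            result |= table.get(ch, 0)
        return result

Each entry could then become a test that compiles the pattern with re (using
to_re_flags) and runs it over the subject strings, skipping entries whose
syntax the re module does not accept.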


Jared

On 10 Mar 2009, at 11:32, Guido van Rossum wrote:


Hm, what's wrong with the existing set of regex test cases? This is
one of the most complete set of test cases in our test suite.

On Tue, Mar 10, 2009 at 11:08 AM, Jared Grubb wrote:
Would there be any interest in augmenting the test case library for the
regex stuff?

On 9 Mar 2009, at 16:07, Antoine Pitrou wrote:


Facundo Batista <...@gmail.com> writes:


Matthew Barnett has been doing a lot of work on the regular expressions engine
(it seems he hasn't finished yet) under http://bugs.python.org/issue2636.
However, the patches are really huge and touch all of the sre internals.
I wonder what the review process can be for such patches? Is there someone
knowledgeable enough to be able to review them?


All test cases run ok? How well covered is that library?


I don't know, I haven't even tried it.




[Python-Dev] Grammar for plus and minus unary ops

2009-03-27 Thread Jared Grubb
I was recently reviewing some Python code for a friend who is a C++  
programmer, and he had code something like this:


def foo():
  try = 0
  while try < ...

I was a bit surprised that this was syntactically valid, and because  
the timeout condition only occurred in exceptional cases, the error  
has not yet caused any problems.


It appears that the grammar treats the above example as the unary + op  
applied twice:


u_expr ::=  power | "-" u_expr | "+" u_expr | "~" u_expr

Playing in the interpreter, expressions like "1+5" and "1+-+-+-+-+-+-5"  
evaluate to 6.


I'm not an EBNF expert, but it seems that we could modify the grammar  
to be more restrictive so the above code would not be silently valid.  
E.g., make "++5" and "1+++5" and "1+-+5" syntax errors, but still keep  
"1++5", "1+-5", "1-+5" as valid. (Although '~' throws in a kink...  
should '~-5' be legal? Seems so...)


Jared


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Jared Grubb

On 7 Apr 2009, at 11:59, Alexandru Moșoi wrote:

Not necessarily. For example C/C++ doesn't define the order of the
operations inside an expression (and AFAIK neither does Python) and
therefore folding 2 * 3 is OK whether b is an integer or an arbitrary
object with the mul operator overloaded. Moreover one would expect * to be
associative and commutative (take a look at Python strings); if a * 2
* 3 returns a different result from a * 6 I will find it very
surprising and probably reject such code.


That's not true. All ops in C/C++ have associativity that is fixed and  
well-defined; the star op is left-associative:

2*3*x is (2*3)*x is 6*x
x*2*3 is (x*2)*3, and this is NOT x*6 (You can show this in C++ by  
creating a class that has a side-effect in its * operator).
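The same demonstration takes only a few lines of Python -- a quick sketch of a
class whose * operator has a visible side effect, which is exactly what makes
folding x*2*3 into x*6 observable:

    class Noisy:
        # Toy class: __mul__ records every factor it is multiplied by.
        def __init__(self, value, log=None):
            self.value = value
            self.log = [] if log is None else log
        def __mul__(self, other):
            self.log.append(other)               # the side effect
            return Noisy(self.value * other, self.log)

    x = Noisy(1)
    x * 2 * 3
    print(x.log)    # [2, 3] -- two multiplications happened
    y = Noisy(1)
    y * 6
    print(y.log)    # [6]    -- observably different from x*2*3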


The star operator is not commutative in Python or C/C++ (otherwise  
what would __rmul__ do?). It's easier to see that + is not  
commutative: "abc"+"def" and "def"+"abc" are definitely different!


You may be confusing the "order is undefined" for the evaluation of  
parameter lists in C/C++. Example: foo(f(),g()) calls f and g in an  
undefined order.


Jared


Re: [Python-Dev] Issue5434: datetime.monthdelta

2009-04-16 Thread Jared Grubb

On 16 Apr 2009, at 11:42, Paul Moore wrote:
The key thing missing (I believe) from dateutil is any equivalent of  
monthmod.



I agree with that. It's well-defined and it makes a lot of sense. +1

But I don't think monthdelta can be made to work... what should the  
following be?


print(date(2008,1,30) + monthdelta(1))
print(date(2008,1,30) + monthdelta(2))
print(date(2008,1,30) + monthdelta(1) + monthdelta(1))
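For example, with a naive monthdelta that clamps to the last valid day of the
target month (just a sketch to make the question concrete; monthdelta isn't in
the stdlib), adding one month twice doesn't agree with adding two months:

    from datetime import date
    import calendar

    def add_months(d, months):
        # Hypothetical month arithmetic: clamp to the last valid day
        # of the target month.
        y, m = divmod(d.month - 1 + months, 12)
        year, month = d.year + y, m + 1
        day = min(d.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)

    print(add_months(date(2008, 1, 30), 1))                  # 2008-02-29 (clamped; leap year)
    print(add_months(date(2008, 1, 30), 2))                  # 2008-03-30
    print(add_months(add_months(date(2008, 1, 30), 1), 1))   # 2008-03-29

Whatever rule monthdelta picks, at least one of those three results has to
surprise somebody.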

Jared



Re: [Python-Dev] #!/usr/bin/env python --> python3 where applicable

2009-04-20 Thread Jared Grubb


On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote:

Nick Coghlan writes:

3. Change the shebang lines in Python standard library scripts to be
version specific and update release.py to fix them all when bumping the
version number in the source tree.


+1

I think that it's probably best to leave "python", "python2", and
"python3" for the use of downstream distributors.  ISTR that was what
Guido concluded, in the discussion that led to Python 3 defaulting to
altinstall---it wasn't just convenient because Python 3 is a major
change, but that experience has shown that deciding which Python is
going to be "The python" on somebody's system just isn't a decision
that Python should make.


Ok, so if I understand, the situation is:
* python points to 2.x version
* python3 points to 3.x version
* need to be able to run certain 3k scripts from cmdline (since we're  
talking about shebangs) using Python3k even though "python" points to  
2.x


So, if I've got the situation right, do these same scripts account for  
the fact that PYTHONPATH and PYTHONHOME and all the others are also  
probably pointing to 2.x code?


Jared


Re: [Python-Dev] PYTHON3PATH

2010-01-13 Thread Jared Grubb
On 13 Jan 2010, at 13:43, Nick Coghlan wrote:
> Guido van Rossum wrote:
>> On Wed, Jan 13, 2010 at 9:57 AM, sstein...@gmail.com wrote:
>>> Or, how about just removing the antiquated use of environment variables
>>> altogether from Python 3 and avoid the issue completely.
>> 
>> -1. They have their use, but more in controlled situations. If you
>> have "global" env vars that you only want to use with Python 2.x,
>> write a wrapper for Python 3 that invokes it with -E.
> 
> Perhaps a case can be made for Python 3 to assume -E by default, with a
> -e option to enable reading of the environment variables?
> 
> That way naive users could run Python3 without tripping over existing
> Python2 environment variables, while other tools could readily set up a
> different environment before launching Python3.

I raised a question about these PYTHON3* variables once before in a discussion 
about shebang lines:
http://www.mailinglistarchive.com/python-dev@python.org/msg29500.html

I'm not advocating them, but just wanted to make sure to bring up the shebang 
use case.

Jared


Re: [Python-Dev] interesting article on regex performance

2010-03-12 Thread Jared Grubb
On 12 Mar 2010, at 15:22, s...@pobox.com wrote:
> 
>  Collin> re2 is not a full replacement for Python's current regex
>  Collin> semantics: it would only serve as an accelerator for a subset of
>  Collin> the current regex language. Given that, it makes perfect sense
>  Collin> that it would be optional on such minority platforms (much like
>  Collin> the incoming JIT).
> 
> Sure, but over the years Python has supported at least four different
> regular expression modules that I'm aware of (regex, regexp, and the current
> re module with different extension modules underneath it, perhaps there were
> others).  During some of that time more than one module was distributed with
> Python proper.  I think the desire today would be that only one regular
> expression module be distributed with Python (that would be my vote anyway).
> Getting people to move off the older libraries was difficult.  If re2 can't
> replace sre under the covers than I think it belongs in PyPI, not the Python
> distribution.  That said, that suggests to me that a different NFA or DFA
> implementation written in C would replace sre, one not written in C++.

re2 would be a supplement to re -- it is not a replacement, and Python would 
run fine if it's not present on some platforms. 

It's like a floating-point processor: you can do all the math you need with just 
an integer processor. But if you have an FPU present, then it makes sense to use 
it for the FP operations. 
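In code, I'd picture something along these lines (purely a sketch; "_re2" here
is a hypothetical module name and error type, not an existing binding):

    import re

    try:
        import _re2                   # hypothetical accelerated engine; may be absent
    except ImportError:
        _re2 = None

    def compile_pattern(pattern, flags=0):
        # Use the fast engine when it is available and can handle the pattern;
        # otherwise fall back to the stock re module, which always works.
        if _re2 is not None:
            try:
                return _re2.compile(pattern, flags)
            except _re2.error:        # pattern uses features outside re2's subset
                pass
        return re.compile(pattern, flags)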

Jared


Re: [Python-Dev] Documentation idea

2008-10-09 Thread Jared Grubb
This is a really interesting idea. If extra memory/lookup overhead is  
a concern, you could enable this new feature by default when the  
interactive interpreter is started (where it's more likely to be  
invoked), and turn it off by default when running scripts/modules.


Jared

On 9 Oct 2008, at 20:37, [EMAIL PROTECTED] wrote:



On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:

Background
--
In the itertools module docs, I included pure python equivalents  
for each of the C functions.  Necessarily, some of those  
equivalents are only approximate but they seem to have greatly  
enhanced the docs.


Why not go the other direction?

Ostensibly the reason for writing a module like 'itertools' in C is  
purely for performance.  There's nothing that I'm aware of in that  
module which couldn't be in Python.


Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is  
(if not a flat-out bug) not optimal.  External projects are  
encouraged by a wealth of documentation to solve performance  
problems in a similar way: implement in Python, once you've got the  
interface right, optimize into C.


So rather than have a C implementation, which points to Python, why  
not have a Python implementation that points at C?  'itertools' (and  
similar) can actually be Python modules, and use a decorator, let's  
call it "C", to do this:


  @C("_c_itertools.count")
  class count(object):
  """
  This is the documentation for both the C version of  
itertools.count

  and the Python version - since they should be the same, right?
  """

In Python itself, the Python module would mostly be for  
documentation, and therefore solve the problem that Raymond is  
talking about, but it could also be a handy fallback for sanity  
checking, testing, and use in other Python runtimes (ironpython,  
jython, pypy).  Many third-party projects already use ad-hoc  
mechanisms for doing this same thing, but an officially-supported  
way of saying "this works, but the optimized version is over here"  
would be a very useful convention.


In those modules which absolutely require some C stuff to work, the  
python module could still serve as documentation.
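A minimal sketch of how such a decorator might behave (just an illustration of
the idea; "_c_itertools" is a made-up module name, and a real version would
need to be more careful about docstrings and introspection):

    import importlib

    def C(dotted_name):
        # Use the accelerated object named by dotted_name when it is importable;
        # otherwise keep the decorated pure-Python implementation.
        module_name, _, attr = dotted_name.rpartition(".")
        def decorator(py_obj):
            try:
                return getattr(importlib.import_module(module_name), attr)
            except (ImportError, AttributeError):
                return py_obj         # no C version available: use the Python one
        return decorator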





Re: [Python-Dev] PEP 374 (DVCS) now in reST

2009-01-25 Thread Jared Grubb
Regardless of the outcome, those that want to use SVN can use SVN, and  
those that want to use "chosen DVCS" can use that. In the end, which  
is the more "lossy" repository? It seems like if the change is  
transparent to everyone who is using it, then the only thing that we  
care about is that the chosen backend will preserve all the  
information to make it truly transparent to everyone involved.


Jared

On 25 Jan 2009, at 10:37, Martin v. Löwis wrote:

There's a possible third way.  I've heard (though haven't investigated)
that some people are working on supporting the svn wire protocol in the
bzr server.  This would mean that anybody who's still comfortable with
svn and feels no need to change their current habits can continue to
work the way they always have.  Those that want the extra benefits of a
DVCS, or who do not have commit access to the code.python.org branches
would have viable alternatives.


Of course, those without commit access *already* have viable
alternatives, IIUC, by means of the automatic ongoing conversion of
the svn repository to bzr and hg (and, IIUC, git - or perhaps you
can use git-svn without the need for server-side conversion).

So a conversion to a DVCS would only benefit those committers who
see a benefit in using a DVCS (*) (and would put a burden on those
committers who see a DVCS as a burden). It would also put a burden
on contributors who are uncomfortable with using a DVCS.

Regards,
Martin

(*) I'm probably missing something, but ISTM that committers can already
use the DVCS - they only need to create a patch just before committing.
This, of course, is somewhat more complicated than directly pushing the
changes to the server, but it still gives them most of what is often
reported as the advantage of a DVCS (local commits, ability to have many
branches simultaneously, ability to share work-in-progress, etc). In
essence, committers wanting to use a DVCS can do so today, by acting
as if they were non-committers, and only using svn for actual changes
to the master repository.




Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread Jared Grubb
On 20 Jan 2012, at 10:49, Brett Cannon wrote:
> Why can't we have our cake and eat it too?
> 
> Can we do hash randomization in 3.3 and use the hash count solution for 
> bugfix releases? That way we get a basic fix into the bugfix releases that 
> won't break people's code (hopefully) but we go with a more thorough (and IMO 
> correct) solution of hash randomization starting with 3.3 and moving forward. 
> We aren't breaking compatibility in any way by doing this since it's a 
> feature release anyway where we change tactics. And it can't be that much 
> work since we seem to have patches for both solutions. At worst it will make 
> merging commits harder for those files affected by the patches, but that will 
> most likely be isolated and not a common collision (and less of an issue once 
> 3.3 is released later this year).
> 
> I understand the desire to keep backwards-compatibility, but collision 
> counting could cause an error in some random input that someone didn't expect 
> to cause issues whether they were under a DoS attack or just had some 
> unfortunate input from private data. The hash randomization, though, is only 
> weak if someone is attacked, not if they are just using Python with their own 
> private data.

I agree; it sounds really odd to throw an exception, since nothing is actually 
wrong and there's nothing the caller could do to recover anyway. 
Rather than throwing an exception, maybe you just reseed the random value for 
the hash:
 * this would solve the security issue someone mentioned about being able to 
deduce the hash, because if attackers keep being mean the seed will change anyway
 * for the bugfix releases, start off without randomization (seed==0) and start 
to use it only when the collision count hits the threshold
 * for the feature release, reseeding when you hit a certain threshold still 
seems like a good idea, as it will make lookups/insertions better in the long run

AFAIUI, Python already doesn't guarantee order stability when you insert 
something into a dictionary, since in the worst case the dictionary has to 
resize its hash table, and then the order is freshly jumbled again.
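(A quick illustration with a set, whose iteration order follows the hash table
layout the same way a dict's does here; the exact output depends on the CPython
version:)

    s = {1, 2, 32}
    print(list(s))              # e.g. [32, 1, 2] while the table is small
    for i in range(1000, 1100):
        s.add(i)                # grow the set, forcing at least one resize
    for i in range(1000, 1100):
        s.discard(i)
    print(list(s))              # same three elements, often in a different order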

Just my two cents.

Jared