Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Chris Jerdonek
On Wed, Mar 28, 2018 at 6:15 PM, Nathaniel Smith  wrote:
> On Wed, Mar 28, 2018 at 1:03 PM, Serhiy Storchaka  wrote:
>> 28.03.18 21:39, Antoine Pitrou wrote:
>>> I'd like to submit this PEP for discussion.  It is quite specialized
>>> and the main target audience of the proposed changes is
>>> users and authors of applications/libraries transferring large amounts
>>> of data (read: the scientific computing & data science ecosystems).
>>
>> Currently I'm working on porting some features from cloudpickle to the
>> stdlib. For those that can't or shouldn't be implemented in the
>> general-purpose library (like serializing local functions by serializing
>> their code objects, because it is not portable), I want to add hooks that
>> would allow implementing them in cloudpickle using an official API. This
>> would allow cloudpickle to use the C implementation of the pickler and
>> unpickler.
>
> There's obviously some tension here between pickle's use as a
> persistent storage format, and its use as a transient wire format. For
> the former, you definitely can't store code objects because there's no
> forwards- or backwards-compatibility guarantee for bytecode. But for
> the latter, transmitting bytecode is totally fine, because all you
> care about is whether it can be decoded once, right now, by some peer
> process whose python version you can control -- that's why cloudpickle
> exists.

Is it really true you'll always be able to control the Python version
on the other side? Even if they're internal services, it seems like
there could be times / reasons preventing you from upgrading the
environment of all of your services at the same rate. Or did you mean
to say "often" all you care about ...?

--Chris



>
> Would it make sense to have a special pickle version that the
> transient wire format users could opt into, that only promises
> compatibility within a given 3.X release cycle? Like version=-2 or
> version=pickle.NONPORTABLE or something?
>
> (This is orthogonal to Antoine's PEP.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/chris.jerdonek%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Nathaniel Smith
On Thu, Mar 29, 2018 at 12:56 AM, Chris Jerdonek
 wrote:
> On Wed, Mar 28, 2018 at 6:15 PM, Nathaniel Smith  wrote:
>> On Wed, Mar 28, 2018 at 1:03 PM, Serhiy Storchaka  
>> wrote:
>>> 28.03.18 21:39, Antoine Pitrou wrote:
>>>> I'd like to submit this PEP for discussion.  It is quite specialized
>>>> and the main target audience of the proposed changes is
>>>> users and authors of applications/libraries transferring large amounts
>>>> of data (read: the scientific computing & data science ecosystems).
>>>
>>> Currently I'm working on porting some features from cloudpickle to the
>>> stdlib. For those that can't or shouldn't be implemented in the
>>> general-purpose library (like serializing local functions by serializing
>>> their code objects, because it is not portable), I want to add hooks that
>>> would allow implementing them in cloudpickle using an official API. This
>>> would allow cloudpickle to use the C implementation of the pickler and
>>> unpickler.
>>
>> There's obviously some tension here between pickle's use as a
>> persistent storage format, and its use as a transient wire format. For
>> the former, you definitely can't store code objects because there's no
>> forwards- or backwards-compatibility guarantee for bytecode. But for
>> the latter, transmitting bytecode is totally fine, because all you
>> care about is whether it can be decoded once, right now, by some peer
>> process whose python version you can control -- that's why cloudpickle
>> exists.
>
> Is it really true you'll always be able to control the Python version
> on the other side? Even if they're internal services, it seems like
> there could be times / reasons preventing you from upgrading the
> environment of all of your services at the same rate. Or did you mean
> to say "often" all you care about ...?

Yeah, maybe I spoke a little sloppily -- I'm sure there are cases
where you're using pickle as a wire format between heterogeneous
interpreters, in which case you wouldn't use version=NONPORTABLE. But
projects like dask, and everyone else who uses cloudpickle/dill, are
already assuming homogeneous interpreters.

A typical way of using these kinds of systems is: you start your
script, it spins up some cloud VMs or local cluster nodes (maybe
sending them all a conda environment you made), they all chat for a
while doing your computation, and then they spin down again and your
script reports the results. So versioning and coordinated upgrades
really aren't a thing you need to worry about :-).

Another example is the multiprocessing module: it's very safe to
assume that the parent and the child are using the same interpreter
:-). There's no fundamental reason you shouldn't be able to send
bytecode between them.

Pickle's not really the ideal wire format for persistent services
anyway, given the arbitrary code execution and tricky versioning --
even if you aren't playing games with bytecode, pickle still assumes
that if two classes in two different interpreters have the same name,
then their internal implementation details are all the same. You can
make it work, but usually there are better options. It's perfect
though for multi-core and multi-machine parallelism.
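
As a rough illustration of the persistent-versus-transient split discussed
above (a minimal sketch, not code from this thread): the stdlib pickler stores
functions by reference, so it refuses a lambda, while marshal will serialize
the raw code object in a format tied to the running interpreter's version.
The helper names ship_function and load_function are hypothetical, and the
sketch assumes a function with no closure variables and no default arguments.

```python
import marshal
import pickle
import types


def ship_function(func):
    """Serialize only the code object; the format is interpreter-version specific."""
    return marshal.dumps(func.__code__)


def load_function(payload, name="restored"):
    """Rebuild a function from marshalled bytecode in the current globals."""
    code = marshal.loads(payload)
    return types.FunctionType(code, globals(), name)


square = lambda x: x * x

# pickle stores functions by qualified name, so a lambda cannot be pickled.
try:
    pickle.dumps(square)
    pickled_by_reference = True
except (pickle.PicklingError, AttributeError):
    pickled_by_reference = False

# The bytecode itself round-trips fine within the same interpreter version.
restored = load_function(ship_function(square))
```

This is essentially the trade-off cloudpickle makes: the payload is only
meaningful to a peer running a compatible interpreter.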

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Antoine Pitrou
On Thu, 29 Mar 2018 01:40:17 +
Robert Collins  wrote:
> >
> > Data sharing
> > ------------
> >
> > If you pickle and then unpickle an object in the same process, passing
> > out-of-band buffer views, then the unpickled object may be backed by the
> > same buffer as the original pickled object.
> >
> > For example, it might be reasonable to implement reduction of a Numpy array
> > as follows (crucial metadata such as shapes is omitted for simplicity)::
> >
> >     class ndarray:
> >
> >         def __reduce_ex__(self, protocol):
> >             if protocol == 5:
> >                 return numpy.frombuffer, (PickleBuffer(self), self.dtype)
> >             # Legacy code for earlier protocols omitted
> >
> > Then simply passing the PickleBuffer around from ``dumps`` to ``loads``
> > will produce a new Numpy array sharing the same underlying memory as the
> > original Numpy object (and, incidentally, keeping it alive)::  
> 
> This seems incompatible with v4 semantics. There, a loads plus dumps
> combination is approximately a deep copy. This isn't. Sometimes. Sometimes
> it is.

True.  But it's only incompatible if you pass the new
``buffer_callback`` and ``buffers`` arguments.  If you don't, then you
always get a copy.  This is something that consumers should keep in
mind.

Note there's a movement towards immutable data. For example, Dask
arrays and Arrow arrays are designed as immutable.
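
For reference, here is a minimal sketch of the copy-versus-share behaviour
described above, written against the API as it later shipped in Python 3.8
(pickle.PickleBuffer, buffer_callback, buffers); the draft under discussion in
this thread may have differed in detail. A plain bytearray stands in for the
NumPy array, and the variable names are illustrative.

```python
import pickle

payload = bytearray(b"large binary payload")

# In-band (no buffer_callback): the bytes are copied into the pickle stream,
# so loading yields an independent object, as with earlier protocols.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)
copied = pickle.loads(in_band)

# Out-of-band: the pickler hands each PickleBuffer to the callback instead of
# serializing it, and the unpickler takes the buffers back from `buffers=`.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)
shared = pickle.loads(out_of_band, buffers=buffers)

# Mutating the original is visible through the shared view, not the copy.
payload[0] = ord("L")
shared_first = memoryview(shared)[0]
copied_first = copied[0]
```

Consumers that mutate the data after a round trip are the ones who need to
care which of the two paths they are on.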

Regards

Antoine.


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Chris Angelico
On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith  wrote:
> Another example is the multiprocessing module: it's very safe to
> assume that the parent and the child are using the same interpreter
> :-). There's no fundamental reason you shouldn't be able to send
> bytecode between them.

You put a smiley on it, but is this actually guaranteed on all
platforms? On Unix-like systems, presumably it's using fork() and thus
will actually use the exact same binary, but what about on Windows,
where a new process has to be spawned? Can you say "spawn me another
of this exact binary blob", or do you have to identify it by a file
name?

It wouldn't be a problem for the nonportable mode to toss out an
exception in weird cases like this, but it _would_ be a problem if
that causes a segfault or something.

ChrisA


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Paul Moore
On 29 March 2018 at 09:49, Chris Angelico  wrote:
> On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith  wrote:
>> Another example is the multiprocessing module: it's very safe to
>> assume that the parent and the child are using the same interpreter
>> :-). There's no fundamental reason you shouldn't be able to send
>> bytecode between them.
>
> You put a smiley on it, but is this actually guaranteed on all
> platforms? On Unix-like systems, presumably it's using fork() and thus
> will actually use the exact same binary, but what about on Windows,
> where a new process has to be spawned? Can you say "spawn me another
> of this exact binary blob", or do you have to identify it by a file
> name?
>
> It wouldn't be a problem for the nonportable mode to toss out an
> exception in weird cases like this, but it _would_ be a problem if
> that causes a segfault or something.

If you're embedding, you need multiprocessing.set_executable()
(https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable),
so in that case you definitely *won't* have the same binary...

Paul


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Chris Angelico
On Thu, Mar 29, 2018 at 7:56 PM, Paul Moore  wrote:
> On 29 March 2018 at 09:49, Chris Angelico  wrote:
>> On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith  wrote:
>>> Another example is the multiprocessing module: it's very safe to
>>> assume that the parent and the child are using the same interpreter
>>> :-). There's no fundamental reason you shouldn't be able to send
>>> bytecode between them.
>>
>> You put a smiley on it, but is this actually guaranteed on all
>> platforms? On Unix-like systems, presumably it's using fork() and thus
>> will actually use the exact same binary, but what about on Windows,
>> where a new process has to be spawned? Can you say "spawn me another
>> of this exact binary blob", or do you have to identify it by a file
>> name?
>>
>> It wouldn't be a problem for the nonportable mode to toss out an
>> exception in weird cases like this, but it _would_ be a problem if
>> that causes a segfault or something.
>
> If you're embedding, you need multiprocessing.set_executable()
> (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable),
> so in that case you definitely *won't* have the same binary...

Ah, and that also showed me that forking isn't mandatory on Unix
either. So yeah, there's no assuming that they use the same binary.

I doubt it'll be a problem for pickle, though, as it'll use some form of
versioning even in NONPORTABLE mode, right?

ChrisA


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Jeff Allen
My credentials for this are that I re-worked str.format in Jython quite 
extensively, and I followed the design of f-strings a bit when they were 
introduced, but I haven't used them to write anything.


On 29/03/2018 00:48, Tim Peters wrote:
> [Tim Delaney]
>> ...
>> I also assumed (not having actually used an f-string) that all its
>> formatting arguments were evaluated before formatting.
>
> It's a string - it doesn't have "arguments" as such.  For example:
>
>     def f(a, b, n):
>         return f"{a+b:0{n}b}"  # the leading "f" makes it an f-string

Agreed "argument" is the wrong word, but so is "string". It's an 
expression returning a string, in which a, b and n are free variables. I 
think we can understand it best as a string-display 
(https://docs.python.org/3/reference/expressions.html#list-displays), or 
a sort of eval() call.


The difference Serhiy identifies emerges (I think) because in the
conventional interpretation of a format call, the arguments of format
are evaluated left to right (all of them) and then formatted in the
order in which references to these values are encountered in a tuple or
dictionary. In an f-string, expressions are evaluated as they are
encountered. A more testing example is therefore perhaps:


    '{1} {0}'.format(a(), b()) # E1

    f'{b()}{a()}'  # E2


I think I would be very surprised to find b called before a in E1 
because of the general contract on the meaning of method calls. I'm 
assuming that's what an AST-based optimisation would do? There's no 
reason in E2 to call them in any other order than b then a and the 
documentation tells me they are.


But do I expect a() to be called before the results of b() are 
formatted? In E1 I definitely expect that. In E2 I don't think I'd be 
surprised either way. Forced to guess, I would guess that b() would be 
formatted and in the output buffer before a() was called, since it gives 
the implementation fewer things to remember. Then I hope I would not 
depend on this guesswork. Strictly speaking, the documentation doesn't
say when the result is formatted in relation to the evaluation of other 
expressions, so there is permission for Serhiy's idea #2.
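
A small sketch of how one might observe the ordering discussed above; the
Tracer class and the calls log are illustrative names, not anything from the
thread. For E1 the language guarantees that both arguments are evaluated
before format runs. For E2 only the left-to-right evaluation of the
expressions is documented; whether b() is formatted before a() is called is
an implementation detail that the script lets you inspect rather than assume.

```python
calls = []


class Tracer:
    """Records when it is formatted."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        calls.append(f"format {self.name}")
        return self.name


def a():
    calls.append("call a")
    return Tracer("a")


def b():
    calls.append("call b")
    return Tracer("b")


e1 = '{1} {0}'.format(a(), b())  # E1
e1_calls = list(calls)

calls.clear()
e2 = f'{b()}{a()}'  # E2
e2_calls = list(calls)

print(e1_calls)
print(e2_calls)
```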


I think the (internal) AST change implied in Serhiy's idea #1 is the 
price one has to pay *if* one insists on optimising str.format().


str.format is just a method like any other. The reasons would have to be
very strong to give it special-case semantics. I agree that the cases 
are rare in which one would notice a difference. (Mostly I think it 
would be a surprise during debugging.) But I think users should be able 
to rely on the semantics of call. Easier optimisation doesn't seem to me 
a strong enough argument.


This leaves me at:
1: +1
2a, 2b: +0
3: -1


Jeff Allen



Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Nathaniel Smith
On Thu, Mar 29, 2018, 02:02 Chris Angelico  wrote:

> On Thu, Mar 29, 2018 at 7:56 PM, Paul Moore  wrote:
> > On 29 March 2018 at 09:49, Chris Angelico  wrote:
> >> On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith  wrote:
> >>> Another example is the multiprocessing module: it's very safe to
> >>> assume that the parent and the child are using the same interpreter
> >>> :-). There's no fundamental reason you shouldn't be able to send
> >>> bytecode between them.
> >>
> >> You put a smiley on it, but is this actually guaranteed on all
> >> platforms? On Unix-like systems, presumably it's using fork() and thus
> >> will actually use the exact same binary, but what about on Windows,
> >> where a new process has to be spawned? Can you say "spawn me another
> >> of this exact binary blob", or do you have to identify it by a file
> >> name?
> >>
> >> It wouldn't be a problem for the nonportable mode to toss out an
> >> exception in weird cases like this, but it _would_ be a problem if
> >> that causes a segfault or something.
> >
> > > If you're embedding, you need multiprocessing.set_executable()
> > > (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable),
> > > so in that case you definitely *won't* have the same binary...
>
> Ah, and that also showed me that forking isn't mandatory on Unix
> either. So yeah, there's no assuming that they use the same binary.
>

Normally it spawns children using `sys.executable`, which I think on
Windows in particular is guaranteed to be the same binary that started the
main process, because the OS locks the file while it's executing. But yeah,
I didn't think about the embedding case, and apparently there's also a
little-known set of features for using multiprocessing between arbitrary
python processes:
https://docs.python.org/3/library/multiprocessing.html#multiprocessing-listeners-clients


> I doubt it'll be a problem to pickle though as it'll use some form of
> versioning even in NONPORTABLE mode right?
>

I guess the (merged, but undocumented?) changes in
https://bugs.python.org/issue28053 should make it possible to set the
pickle version, and yeah, if we did add a NONPORTABLE mode then presumably
it would have some kind of header saying which version of python it was
created with, so version mismatches could give a sensible error message.
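
To illustrate the header idea (purely a sketch; no NONPORTABLE mode exists,
and the names dump_nonportable, load_nonportable and VersionMismatch are made
up here): one could prefix the payload with the interpreter's bytecode magic
number, importlib.util.MAGIC_NUMBER, and refuse to decode anything written by
a different version.

```python
import importlib.util
import marshal

MAGIC = importlib.util.MAGIC_NUMBER  # changes whenever the bytecode format changes


class VersionMismatch(Exception):
    """Raised when a payload was produced by an incompatible interpreter."""


def dump_nonportable(code):
    """Prefix marshalled bytecode with this interpreter's magic number."""
    return MAGIC + marshal.dumps(code)


def load_nonportable(blob):
    """Check the header before touching the version-specific payload."""
    header, body = blob[:len(MAGIC)], blob[len(MAGIC):]
    if header != MAGIC:
        raise VersionMismatch(
            f"payload magic {header!r} does not match interpreter magic {MAGIC!r}"
        )
    return marshal.loads(body)


blob = dump_nonportable(compile("6 * 7", "<wire>", "eval"))
answer = eval(load_nonportable(blob))
```

A mismatch then surfaces as an ordinary exception instead of an attempt to
run bytecode the receiving interpreter does not understand.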

-n


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Eric V. Smith

On 3/29/2018 6:17 AM, Jeff Allen wrote:
> My credentials for this are that I re-worked str.format in Jython quite
> extensively, and I followed the design of f-strings a bit when they were
> introduced, but I haven't used them to write anything.


Thanks for your work on Jython. And hop on the f-string bandwagon!

> The difference Serhiy identifies emerges (I think) because in the
> conventional interpretation of a format call, the arguments of format
> are evaluated left to right (all of them) and then formatted in the
> order in which references to these values are encountered in a tuple or
> dictionary. In an f-string, expressions are evaluated as they are
> encountered. A more testing example is therefore perhaps:
>
>     '{1} {0}'.format(a(), b()) # E1
>
>     f'{b()}{a()}'  # E2
>
> I think I would be very surprised to find b called before a in E1
> because of the general contract on the meaning of method calls. I'm
> assuming that's what an AST-based optimisation would do? There's no
> reason in E2 to call them in any other order than b then a and the
> documentation tells me they are.
>
> But do I expect a() to be called before the results of b() are
> formatted? In E1 I definitely expect that. In E2 I don't think I'd be
> surprised either way. Forced to guess, I would guess that b() would be
> formatted and in the output buffer before a() was called, since it gives
> the implementation fewer things to remember. Then I hope I would not
> depend on this guesswork. Strictly speaking, the documentation doesn't
> say when the result is formatted in relation to the evaluation of other
> expressions, so there is permission for Serhiy's idea #2.


I don't think we should restrict f-strings to having to evaluate all of 
the expressions before formatting. But, if we do restrict it, we should 
document whatever the order is in 3.6 and add tests to ensure the 
behavior doesn't change.


> I think the (internal) AST change implied in Serhiy's idea #1 is the
> price one has to pay *if* one insists on optimising str.format().
>
> str.format is just a method like any other. The reasons would have to be
> very strong to give it special-case semantics. I agree that the cases
> are rare in which one would notice a difference. (Mostly I think it
> would be a surprise during debugging.) But I think users should be able
> to rely on the semantics of call. Easier optimisation doesn't seem to me
> a strong enough argument.
>
> This leaves me at:
> 1: +1
> 2a, 2b: +0
> 3: -1


#1 seems so complex as to not be worth it, given the likely small 
overall impact of the optimization to a large program. If the speedup 
really is sufficiently important for a particular piece of code, I'd 
suggest just rewriting the code to use f-strings, and the author could 
then determine if the transformation breaks anything. Maybe write a
2to3-like tool that would identify places where str.format or %-formatting
could be replaced by f-strings? I know I'd run it on my code, if it
existed. Because the optimization can only work on code with literals, I
think manually modifying the source code is an acceptable solution if
the possible change in semantics implied by #3 is unacceptable.


Eric.


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Steven D'Aprano
On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote:

> The optimizer already changes
> semantics. Non-optimized "if a and True:" would call bool(a) twice, but
> optimized code calls it only once.

I don't understand this. Why would bool(a) be called twice, and when did 
this change? Surely calling it twice would be a bug.

I just tried the oldest Python 3 I have on this computer, 3.2, and bool 
is only called once.



-- 
Steve


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Chris Angelico
On Thu, Mar 29, 2018 at 11:28 PM, Steven D'Aprano  wrote:
> On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote:
>
>> The optimizer already changes
>> semantics. Non-optimized "if a and True:" would call bool(a) twice, but
>> optimized code calls it only once.
>
> I don't understand this. Why would bool(a) be called twice, and when did
> this change? Surely calling it twice would be a bug.
>
> I just tried the oldest Python 3 I have on this computer, 3.2, and bool
> is only called once.

Technically not bool() itself, but the equivalent. Here's some similar code:


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Chris Angelico
On Fri, Mar 30, 2018 at 1:08 AM, Chris Angelico  wrote:
> On Thu, Mar 29, 2018 at 11:28 PM, Steven D'Aprano  wrote:
>> On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote:
>>
>>> The optimizer already changes
>>> semantics. Non-optimized "if a and True:" would call bool(a) twice, but
>>> optimized code calls it only once.
>>
>> I don't understand this. Why would bool(a) be called twice, and when did
>> this change? Surely calling it twice would be a bug.
>>
>> I just tried the oldest Python 3 I have on this computer, 3.2, and bool
>> is only called once.
>
> Technically not bool() itself, but the equivalent. Here's some similar code:

Wow, I'm good. Premature send much? Nice going, Chris. Let's try that
again. Here's some similar code:

>>> def f(a):
...     if a and x:
...         print("Yep")
...
>>> class Bool:
...     def __bool__(self):
...         print("True?")
...         return True
...
>>> x = 1
>>> f(Bool())
True?
Yep

This is, however, boolifying a, then boolifying x separately. To bool
a twice, you'd need to write this instead:

>>> def f(a):
...     if a or False:
...         print("Yep")
...

In its optimized form, this still only boolifies a once. But we can
defeat the optimization:

>>> def f(a):
...     cond = a or False
...     if cond:
...         print("Yep")
...
>>> f(Bool())
True?
True?
Yep

The "or False" part implies a booleanness check on its left operand,
and the 'if' statement performs a boolean truthiness check on its
result. That means two calls to __bool__ in the unoptimized form. But
it gets optimized, on the assumption that __bool__ is a pure function.
The version assigning to a temporary variable does one check before
assigning, and then another check in the 'if'; the same thing without
the temporary skips the second check, and just goes ahead and enters
the body of the 'if'.

Technically that's a semantic change. But I doubt it'll hurt anyone.
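
The same experiment as a standalone script, for anyone who wants to count the
calls rather than read them off the console. CountingBool, direct and
via_temporary are names chosen for this sketch; the single call in the direct
form is what CPython does in practice, not something the language reference
promises.

```python
class CountingBool:
    """Truthy object that counts how often its truth value is requested."""

    def __init__(self):
        self.count = 0

    def __bool__(self):
        self.count += 1
        return True


def direct(a):
    # The compiler can jump straight on the truth value of `a`.
    if a or False:
        return True
    return False


def via_temporary(a):
    # `a or False` tests `a` once; `if cond` then tests the result again.
    cond = a or False
    if cond:
        return True
    return False


one, two = CountingBool(), CountingBool()
direct(one)
via_temporary(two)
print(one.count, two.count)
```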

ChrisA


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Antoine Pitrou
On Thu, 29 Mar 2018 11:25:13 +
Nathaniel Smith  wrote:
> 
> > I doubt it'll be a problem to pickle though as it'll use some form of
> > versioning even in NONPORTABLE mode right?
> >  
> 
> I guess the (merged, but undocumented?) changes in
> https://bugs.python.org/issue28053 should make it possible to set the
> pickle version [...]

Not only undocumented, but untested, and they actually look plain
wrong when looking at that diff.  Notice how "reduction" is imported
using `from .context import reduction` and then changed inside the
"context" module using `globals()['reduction'] = reduction`.  That
seems unlikely to produce any effect.
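
The general point about name binding can be shown without the multiprocessing
sources. This is a sketch with two throwaway module objects (context and
consumer are stand-ins, not the real modules): a `from ... import name` copies
the binding at import time, so rebinding the name later in the source
module's globals is not seen by the importer.

```python
import types

# Stand-in for the module that owns the name.
context = types.ModuleType("context")
context.reduction = "original reducer"

# Stand-in for a module that ran `from context import reduction`:
# the importer gets its own binding to the object that existed at import time.
consumer = types.ModuleType("consumer")
consumer.reduction = context.reduction

# Equivalent of `globals()['reduction'] = ...` executed inside `context`.
vars(context)["reduction"] = "replacement reducer"

seen_by_context = context.reduction
seen_by_consumer = consumer.reduction
```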

(not to mention the abstract base class that doesn't seem to define any
abstract methods or properties)

To be frank such an unfinished patch should never have been committed.
I may consider undoing it if I find some spare cycles.

Regards

Antoine.




Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Terry Reedy

On 3/28/2018 11:27 AM, Serhiy Storchaka wrote:
> The optimizer already changes
> semantics. Non-optimized "if a and True:" would call bool(a) twice, but
> optimized code calls it only once.


Perhaps Ref 3.3.1 object.__bool__ entry, after " should return False or 
True.", should say something like "Should not have side-effects, as 
redundant bool calls may be optimized away (bool(bool(ob)) should have 
the same result as bool(ob))."



--
Terry Jan Reedy



Re: [Python-Dev] Sets, Dictionaries

2018-03-29 Thread Stephen Hansen
On Wed, Mar 28, 2018, at 9:14 PM, Julia Kim wrote:
> My suggestion is to change the syntax for creating an empty set and an
> empty dictionary as follows.
> 
> an_empty_set = {}
> an_empty_dictionary = {:}
> 
> It would seem to make more sense.

The amount of code this would break is astronomical. 
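
For context, a quick sketch of the current behaviour that existing code
relies on: {} is an empty dict, and the empty set is spelled set().

```python
empty_dict = {}
empty_set = set()

# `{}` is a dict literal; braces only produce a set when they contain elements.
kinds = (type(empty_dict).__name__, type(empty_set).__name__, type({1, 2}).__name__)
print(kinds)
```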

-- 
Stephen Hansen
  m e @ i x o k a i  . i o


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Nick Coghlan
On 29 March 2018 at 04:39, Antoine Pitrou  wrote:
>
> Hi,
>
> I'd like to submit this PEP for discussion.  It is quite specialized
> and the main target audience of the proposed changes is
> users and authors of applications/libraries transferring large amounts
> of data (read: the scientific computing & data science ecosystems).
>
> https://www.python.org/dev/peps/pep-0574/
>
> The PEP text is also inlined below.

+1 from me, which you already knew :)

For folks that haven't read Eric Snow's PEP 554 about exposing
multiple interpreter support as a Python level API, Antoine's proposed
zero-copy-data-management enhancements for pickle complement that
nicely, since they allow the three initial communication primitives in
PEP 554 (passing None, bytes, memory views) to be more efficiently
expanded to handling arbitrary objects by sending first the pickle
data, then the out-of-band memory views, and finally None as an
end-of-message marker.
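
A rough sketch of the message framing described above, using queue.Queue as a
stand-in for a PEP 554 channel (the real channel API is not used here, and
send_object/recv_object are hypothetical helpers). It relies on the protocol 5
API as later released in Python 3.8.

```python
import pickle
import queue


def send_object(channel, obj):
    """Send pickle data, then each out-of-band buffer, then None."""
    buffers = []
    data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    channel.put(data)                 # 1. the pickle stream (bytes)
    for buf in buffers:
        channel.put(memoryview(buf))  # 2. out-of-band memory views
    channel.put(None)                 # 3. end-of-message marker


def recv_object(channel):
    """Read one message framed by send_object and rebuild the object."""
    data = channel.get()
    buffers = []
    while True:
        item = channel.get()
        if item is None:
            break
        buffers.append(item)
    return pickle.loads(data, buffers=buffers)


channel = queue.Queue()
blob = bytearray(b"big array contents")
send_object(channel, {"label": "demo", "blob": pickle.PickleBuffer(blob)})
message = recv_object(channel)
```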

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Nick Coghlan
On 29 March 2018 at 21:50, Eric V. Smith  wrote:
> #1 seems so complex as to not be worth it, given the likely small overall
> impact of the optimization to a large program. If the speedup really is
> sufficiently important for a particular piece of code, I'd suggest just
> rewriting the code to use f-strings, and the author could then determine if
> the transformation breaks anything. Maybe write a 2to3-like tool that would
> identify places where str.format or %-formatting could be replaced by
> f-strings? I know I'd run it on my code, if it existed. Because the
> optimization can only work on code with literals, I think manually modifying
> the source code is an acceptable solution if the possible change in
> semantics implied by #3 is unacceptable.

While more projects are starting to actively drop Python 2.x support,
there are also quite a few still straddling the two different
versions. The "rewrite to f-strings" approach requires explicitly
dropping support for everything below 3.6, whereas implicit
optimization of literal based formatting will work even for folks
preserving backwards compatibility with older versions.

As far as the semantics go, perhaps it would be possible to explicitly
create a tuple as part of the implementation to ensure that the
arguments are still evaluated in order, and everything gets calculated
exactly once? This would have the benefit that even format strings
that used numbered references could be optimised in a fairly
straightforward way.

'{}{}'.format(a, b)

would become:

_hidden_ref = (a, b)
f'{_hidden_ref[0]}{_hidden_ref[1]}'

while:

'{1}{0}'.format(a, b)

would become:

_hidden_ref = (a, b)
f'{_hidden_ref[1]}{_hidden_ref[0]}'

This would probably need to be implemented as Serhiy's option 1
(generating a distinct AST node), which in turn leads to 2a: adding
extra stack manipulation opcodes in order to more closely replicate
str.format semantics.
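
The proposed rewrite can be emulated by hand to check that it preserves both
the result and the evaluation order. This is only a sketch of the equivalence
for simple positional fields; _hidden_ref follows the naming above, while
log, a and b are illustrative.

```python
log = []


def a():
    log.append("a")
    return "A"


def b():
    log.append("b")
    return "B"


# Original call: arguments evaluated left to right, then formatted.
original = '{1}{0}'.format(a(), b())
original_order = list(log)

# Hand-written equivalent of the proposed transformation.
log.clear()
_hidden_ref = (a(), b())
rewritten = f'{_hidden_ref[1]}{_hidden_ref[0]}'
rewritten_order = list(log)
```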

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Eric V. Smith

On 3/29/2018 12:13 PM, Nick Coghlan wrote:

> On 29 March 2018 at 21:50, Eric V. Smith  wrote:
>> #1 seems so complex as to not be worth it, given the likely small overall
>> impact of the optimization to a large program. If the speedup really is
>> sufficiently important for a particular piece of code, I'd suggest just
>> rewriting the code to use f-strings, and the author could then determine if
>> the transformation breaks anything. Maybe write a 2to3-like tool that would
>> identify places where str.format or %-formatting could be replaced by
>> f-strings? I know I'd run it on my code, if it existed. Because the
>> optimization can only work on code with literals, I think manually modifying
>> the source code is an acceptable solution if the possible change in
>> semantics implied by #3 is unacceptable.


While more projects are starting to actively drop Python 2.x support,
there are also quite a few still straddling the two different
versions. The "rewrite to f-strings" approach requires explicitly
dropping support for everything below 3.6, whereas implicit
optimization of literal based formatting will work even for folks
preserving backwards compatibility with older versions.


Sure. But 3.6 will be 3 years old before this optimization is released. 
I've been seeing 3.4 support dropping off, and expect to see 3.5 follow 
suit by the time 3.8 is released. Although maybe the thought is to do 
this in a bug-fix release? If we're changing semantics at all, that 
seems like a non-starter.



> As far as the semantics go, perhaps it would be possible to explicitly
> create a tuple as part of the implementation to ensure that the
> arguments are still evaluated in order, and everything gets calculated
> exactly once? This would have the benefit that even format strings
> that used numbered references could be optimised in a fairly
> straightforward way.
>
>  '{}{}'.format(a, b)
>
> would become:
>
>  _hidden_ref = (a, b)
>  f'{_hidden_ref[0]}{_hidden_ref[1]}'
>
> while:
>
>  '{1}{0}'.format(a, b)
>
> would become:
>
>  _hidden_ref = (a, b)
>  f'{_hidden_ref[1]}{_hidden_ref[0]}'
>
> This would probably need to be implemented as Serhiy's option 1
> (generating a distinct AST node), which in turn leads to 2a: adding
> extra stack manipulation opcodes in order to more closely replicate
> str.format semantics.


I still think the complexity isn't worth it, but maybe I'm a lone voice 
on this.


Eric.



Re: [Python-Dev] Sets, Dictionaries

2018-03-29 Thread David Mertz
I agree with everything Steven says. But it's true that even as a 20-year
Python user, this is an error I make moderately often when I want an empty
set... Notwithstanding that I typed it thousands of times before sets even
existed (and still type it when I want an empty dictionary).

That said, I've sort of got in the habit of using the type initializers:

x = set()
y = dict()
z = list()

I feel like those jump out a little better visually. But I'm inconsistent
in my code.
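
For readers skimming the thread, a short sketch of the pitfall being discussed: the bare braces give a dict, so an empty set has to be spelled with the constructor. The variable names are only for illustration.

```python
empty_braces = {}      # an empty dict, not an empty set
empty_set = set()      # the only spelling for an empty set
small_set = {1, 2}     # braces do give a set once there are elements

print(type(empty_braces).__name__)
print(type(empty_set).__name__)
print(type(small_set).__name__)
```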

On Thu, Mar 29, 2018, 2:03 AM Steven D'Aprano  wrote:

> Hi Julia, and welcome!
>
> On Wed, Mar 28, 2018 at 09:14:53PM -0700, Julia Kim wrote:
>
> > My suggestion is to change the syntax for creating an empty set and an
> > empty dictionary as following.
> >
> > an_empty_set = {}
> > an_empty_dictionary = {:}
> >
> > It would seem to make more sense.
>
> Indeed it would, and if sets had existed in Python since the beginning,
> that's probably exactly what we would have done. But unfortunately they
> didn't, and {} has meant an empty dict forever.
>
> The requirement to keep backwards-compatibility is a very, very hard
> barrier to cross. I think we all acknowledge that it is sad and a little
> bit confusing that {} means a dict not a set, but it isn't sad or
> confusing enough to justify breaking millions of existing scripts and
> applications.
>
> Not to mention the confusing transition period when the community would
> be using *both* standards at the same time, which could easily last ten
> years.
>
> Given that, I think we just have to accept that having to use set() for
> the empty set instead of {} is a minor wart on the language that we're
> stuck with.
>
> If you disagree, and think that you have a concrete plan that can make
> this transition work, we'll be happy to hear it, but you'll almost
> certainly need to write a PEP before it could be accepted.
>
> https://www.python.org/dev/peps/
>
>
> Thanks,
>
> --
> Steve


Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

2018-03-29 Thread Serhiy Storchaka

28.03.18 23:19, Antoine Pitrou wrote:
> Agreed.  Do you know by which timeframe you'll know which opcodes you
> want to add?

I'm currently in the middle of the first part, trying to implement
pickling local classes with static and class methods without creating
loops. The other parts exist only as general ideas; I haven't written any
code for them yet. I'm trying to do this with the existing protocols, but
maybe some new opcodes will be needed for efficiency. We are now at an
early stage of 3.8 development, and I think we have a lot of time.


On its own this wouldn't justify bumping the pickle version, but if we are
bumping it anyway, it would be worth adding shorter versions of FRAME.
Currently it uses a 64-bit size, and 9 bytes is a large overhead for short
pickles. An 8-bit size would reduce the overhead for short pickles, and a
32-bit size would be enough for any practical use (larger data is not
wrapped in a frame).
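
A small sketch of the overhead in question, using only the standard pickle, pickletools and struct modules. It pickles a short string with protocol 4 and locates the FRAME opcode; the payload is an arbitrary example, and the sketch assumes a CPython version where such a pickle is long enough to be framed.

```python
import pickle
import pickletools
import struct

data = pickle.dumps('hello world', protocol=4)

# Find the FRAME opcode: genops yields (opcode, argument, position).
frames = [(arg, pos) for op, arg, pos in pickletools.genops(data)
          if op.name == 'FRAME']
frame_len, frame_pos = frames[0]

# FRAME is one opcode byte followed by an 8-byte little-endian length.
header_size = 1 + 8
(raw_len,) = struct.unpack('<Q', data[frame_pos + 1:frame_pos + header_size])

print('total pickle size:', len(data))
print('frame header size:', header_size)
print('framed payload size:', frame_len)
```

For a pickle this short, the 9-byte frame header is a noticeable fraction of the total size, which is the overhead an 8-bit variant would shrink.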




Re: [Python-Dev] Sets, Dictionaries

2018-03-29 Thread Eric Fahlgren
On Thu, Mar 29, 2018 at 10:11 AM, David Mertz  wrote:

> I agree with everything Steven says. But it's true that even as a 20-year
> Python user, this is an error I make moderately often when I want an empty
> set... Notwithstanding that I typed it thousands of times before sets even
> existed (and still type it when I want an empty dictionary).
>
> That said, I've sort of got in the habit of using the type initializers:
>
> x = set()
> y = dict()
> z = list()
>
> I feel like those jump out a little better visually. But I'm inconsistent
> in my code.
>

Yeah, we've been doing that for several years, too.  A hair slower in some
cases, but much more greppable...


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Serhiy Storchaka

29.03.18 13:17, Jeff Allen wrote:
>      '{1} {0}'.format(a(), b()) # E1
>
>      f'{b()}{a()}'  # E2
>
> I think I would be very surprised to find b called before a in E1
> because of the general contract on the meaning of method calls. I'm
> assuming that's what an AST-based optimisation would do? There's no
> reason in E2 to call them in any other order than b then a and the
> documentation tells me they are.


I was going to optimize only formatting with implicit references:
'{} {}', but neither '{1} {0}' nor '{0} {1}'. This guarantees in-order
computation and that every subexpression is referenced only once. My goal
is not to convert every string formatting call, only the most common and
simplest ones.


If we go further, we will need to add several new AST nodes (as for
comprehensions).
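
A rough sketch of the eligibility rule described above, written with the standard string.Formatter.parse. The function name uses_only_implicit_fields is hypothetical, and rejecting nested fields in the format spec is an extra conservative assumption of this sketch, not something stated in the thread.

```python
from string import Formatter


def uses_only_implicit_fields(template):
    """Return True if every replacement field in template is a bare '{}'.

    Numbered fields ('{0}'), named fields ('{x}') and attribute or index
    lookups are rejected, as are nested fields inside a format spec.
    """
    for _literal, field_name, format_spec, _conv in Formatter().parse(template):
        if field_name is None:
            continue  # trailing literal text, no field here
        if field_name != '':
            return False  # explicit number or name
        if format_spec and '{' in format_spec:
            return False  # nested field such as '{:{}}'
    return True


for candidate in ('{} {}', '{1} {0}', '{0} {1}', '{:>10}', '{:{}}'):
    print(repr(candidate), uses_only_implicit_fields(candidate))
```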




Re: [Python-Dev] Move ensurepip blobs to external place

2018-03-29 Thread Donald Stufft
From my POV I don’t care where they live, just document how to update them 
going forward. 

Sent from my iPhone

> On Mar 24, 2018, at 4:50 AM, Serhiy Storchaka  wrote:
> 
> Wouldn't it be better to put them into a separate repository like Tcl/Tk and 
> other external binaries for Windows, and download only the recent version?



Re: [Python-Dev] Move ensurepip blobs to external place

2018-03-29 Thread Wes Turner
AFAIU, the objectives (with no particular ranking) are:

- minimize git clone time and bandwidth
- include latest pip with every python install
- run the full test suite with CI for every PR (buildbot)
  - the full test suite requires pip
- run the test suite locally when developing a PR
- minimize PyPI bandwidth

What are the proposed solutions?

...

https://help.github.com/articles/about-storage-and-bandwidth-usage/

> All personal and organization accounts using Git LFS receive 1 GB of free
> storage and 1 GB a month of free bandwidth. If the bandwidth and storage
> quotas are not enough, you can choose to purchase an additional quota for
> Git LFS.
>
> Git LFS is available for every repository on GitHub, whether or not your
> account or organization has a paid plan.

On Thursday, March 29, 2018, Donald Stufft  wrote:

> From my POV I don’t care where they live, just document how to update them
> going forward.
>
> Sent from my iPhone
>
> > On Mar 24, 2018, at 4:50 AM, Serhiy Storchaka  wrote:
> >
> > Wouldn't it be better to put them into a separate repository like Tcl/Tk
> > and other external binaries for Windows, and download only the recent
> > version?
>


Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Steven D'Aprano
On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote:

> 2. Change the semantics of f-strings. Make them closer to the semantics of 
> str.format(): evaluate all subexpressions first, then format them. This 
> can be implemented in two ways:
> 
> 2a) Add additional instructions for stack manipulations. This will slow 
> down f-strings.
> 
> 2b) Introduce a new complex opcode that will replace FORMAT_VALUE and 
> BUILD_STRING. This will speed up f-strings.

If the aim here is to be an optimization, then I vote strongly for 2b.

That gives you *faster f-strings* that have the same order of evaluation 
as normal method calls, so that when you optimize str.format into an 
f-string, not only is the behaviour identical, but they will be even 
faster than with option 3.

Python's execution model implies that 

obj.method(expression_a, expression_b)

should fully evaluate both expressions before they are passed to the 
method. Making str.format a magical special case that violates that rule 
should be a last resort.
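
To make the difference concrete, here is a small sketch that records when each argument is evaluated and when it is formatted. Traced and make are hypothetical helpers written for this illustration; the f-string behaviour shown is that of CPython as I understand it, where each field is formatted right after its expression is evaluated.

```python
events = []


class Traced:
    """Value that records the moment it is formatted."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        events.append('format ' + self.name)
        return self.name


def make(name):
    """Stand-in for an argument expression with a visible side effect."""
    events.append('eval ' + name)
    return Traced(name)


events.clear()
format_text = '{}{}'.format(make('a'), make('b'))
format_events = list(events)

events.clear()
fstring_text = f"{make('a')}{make('b')}"
fstring_events = list(events)

# str.format: both arguments are evaluated before any formatting.
print(format_events)   # ['eval a', 'eval b', 'format a', 'format b']
# f-string: evaluation and formatting are interleaved.
print(fstring_events)  # ['eval a', 'format a', 'eval b', 'format b']
```

The resulting strings are the same; only the order of side effects differs, which is the subtle difference the thread title refers to.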

In this case, we can have our cake and eat it too: both the str.format 
to f-string optimization and keeping the normal evaluation rules. And as 
a bonus, we make f-strings even faster.

I say "we", but of course it is Serhiy doing the work, thank you.

Is there a down-side to 2b? It sounds like something you might end up 
doing at a later date regardless of what you do now.


-- 
Steve


[Python-Dev] [RELEASE] Python 3.7.0b3 is now available for testing

2018-03-29 Thread Ned Deily
On behalf of the Python development community and the Python 3.7 release
team, I'm happy to announce the availability of Python 3.7.0b3.  b3 is
the third of four planned beta releases of Python 3.7, the next major
release of Python, and marks the end of the feature development phase
for 3.7.  You can find Python 3.7.0b3 here:

https://www.python.org/downloads/release/python-370b3/

Among the major new features in Python 3.7 are:

* PEP 538, Coercing the legacy C locale to a UTF-8 based locale
* PEP 539, A New C-API for Thread-Local Storage in CPython
* PEP 540, UTF-8 mode
* PEP 552, Deterministic pyc
* PEP 553, Built-in breakpoint()
* PEP 557, Data Classes
* PEP 560, Core support for typing module and generic types
* PEP 562, Module __getattr__ and __dir__
* PEP 563, Postponed Evaluation of Annotations
* PEP 564, Time functions with nanosecond resolution
* PEP 565, Show DeprecationWarning in __main__
* PEP 567, Context Variables

Please see "What’s New In Python 3.7" for more information.
Additional documentation for these features and for other changes
will be provided during the beta phase.

https://docs.python.org/3.7/whatsnew/3.7.html

Beta releases are intended to give you the opportunity to test new
features and bug fixes and to prepare your projects to support the
new feature release. We strongly encourage you to test your projects
with 3.7 during the beta phase and report issues found to
https://bugs.python.org as soon as possible.

While the release is feature complete entering the beta phase, it is
possible that features may be modified or, in rare cases, deleted up
until the start of the release candidate phase (2018-05-21). Our goal
is to have no ABI changes after beta 3 and no code changes after rc1.
To achieve that, it will be extremely important to get as much exposure
for 3.7 as possible during the beta phase.

Attention macOS users: there is a new installer variant for
macOS 10.9+ that includes a built-in version of Tcl/Tk 8.6. This
variant is expected to become the default version when 3.7.0 releases.
Check it out! We welcome your feedback.  As of 3.7.0b3, the legacy
10.6+ installer also includes a built-in Tcl/Tk 8.6.

Please keep in mind that this is a preview release and its use is
not recommended for production environments.

The next planned release of Python 3.7 will be 3.7.0b4, currently
scheduled for 2018-04-30. More information about the release schedule
can be found here:

https://www.python.org/dev/peps/pep-0537/

--
  Ned Deily
  n...@python.org -- []



Re: [Python-Dev] Subtle difference between f-strings and str.format()

2018-03-29 Thread Nick Coghlan
On 30 March 2018 at 03:33, Eric V. Smith  wrote:
> On 3/29/2018 12:13 PM, Nick Coghlan wrote:
>> While more projects are starting to actively drop Python 2.x support,
>> there are also quite a few still straddling the two different
>> versions. The "rewrite to f-strings" approach requires explicitly
>> dropping support for everything below 3.6, whereas implicit
>> optimization of literal based formatting will work even for folks
>> preserving backwards compatibility with older versions.
>
>
> Sure. But 3.6 will be 3 years old before this optimization is released. I've
> been seeing 3.4 support dropping off, and expect to see 3.5 follow suit by
> the time 3.8 is released. Although maybe the thought is to do this in a
> bug-fix release? If we're changing semantics at all, that seems like a
> non-starter.

Definitely 3.8+ only. The nice thing about doing this implicitly at
the compiler level is that it potentially provides an automatic
performance improvement for existing libraries and applications. The
justification for the extra complexity would then come from whether or
not it actually measurably improves things, either for the benchmark
suite, or for folks' real-world applications.

Steven D'Aprano also raises a good point on that front: a
FORMAT_STRING super-opcode that sped up f-strings *and* allowed
semantics preserving constant-folding of str.format calls on string
literals could make more sense than a change that focused solely on
the implicit optimisation case.
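
For context, a quick way to see the per-field formatting opcodes and the final BUILD_STRING that such a super-opcode would replace is to disassemble a small f-string. This is only an inspection sketch using the standard dis module; the exact opcode names vary between CPython versions, so no particular listing is assumed here.

```python
import dis

# Compile a two-field f-string as an expression and list its opcodes.
code = compile("f'{a}{b}'", '<example>', 'eval')
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

for name in opnames:
    print(name)
```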

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] How can we use 48bit pointer safely?

2018-03-29 Thread INADA Naoki
Hi,

As far as I know, most amd64 and arm64 systems use only 48-bit address spaces
(except [1]).

[1] 
https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

This means there is some chance to compact some data structures.
I give two examples below.

My question is: can we use 48-bit pointers safely?
It depends on the CPU architecture and the OS memory map.
Maybe as a configure option that is available only on (amd64, arm64) *
(Linux, Windows, macOS)?


# Possible optimizations by 48bit pointer

## PyASCIIObject

[snip]
        unsigned int ready:1;
        /* Padding to ensure that PyUnicode_DATA() is always aligned to
           4 bytes (see issue #19537 on m68k). */
        unsigned int :24;
    } state;
    wchar_t *wstr;  /* wchar_t representation (null-terminated) */
} PyASCIIObject;

Currently, state is 8 bits plus 24 bits of padding.  I think we can pack
state and wstr into 64 bits.
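
The packing idea can be sketched with plain integer arithmetic. This is an illustration only, not CPython code: the function names are hypothetical, the state is treated as an opaque 8-bit value, and the sketch assumes pointers whose upper 16 bits are zero (typical for user-space addresses on the platforms mentioned, but exactly the property being asked about).

```python
POINTER_BITS = 48
POINTER_MASK = (1 << POINTER_BITS) - 1
STATE_MASK = 0xFF  # the 8 bits of state currently stored separately


def pack_state_and_pointer(state, pointer):
    """Store an 8-bit state above a 48-bit pointer in one 64-bit word."""
    if state & ~STATE_MASK:
        raise ValueError('state does not fit in 8 bits')
    if pointer & ~POINTER_MASK:
        raise ValueError('pointer does not fit in 48 bits')
    return (state << POINTER_BITS) | pointer


def unpack_state_and_pointer(word):
    """Inverse of pack_state_and_pointer."""
    return (word >> POINTER_BITS) & STATE_MASK, word & POINTER_MASK


word = pack_state_and_pointer(0b10100101, 0x7FFFDEADBEEF)
print(hex(word))
print(unpack_state_and_pointer(word))
```

The ValueError on wide pointers is where a real implementation would need a fallback, for example on systems using 5-level paging.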

## PyDictKeyEntry

typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;

There is a chance to compact it: use only 32 bits for the hash and 48 bits
each for the key and value.  A compact entry may be 16 bytes instead of 24.
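
The size arithmetic (4 + 6 + 6 = 16 bytes) can be checked with a small byte-level sketch. The helper names and the little-endian layout are assumptions made for this illustration, and the hash is treated as an unsigned 32-bit value for simplicity.

```python
import struct


def pack_entry(hash32, key_ptr, value_ptr):
    """Pack a 32-bit hash and two 48-bit pointers into 16 bytes.

    int.to_bytes raises OverflowError if a value does not fit.
    """
    return (hash32.to_bytes(4, 'little')
            + key_ptr.to_bytes(6, 'little')
            + value_ptr.to_bytes(6, 'little'))


def unpack_entry(raw):
    """Inverse of pack_entry."""
    return (int.from_bytes(raw[0:4], 'little'),
            int.from_bytes(raw[4:10], 'little'),
            int.from_bytes(raw[10:16], 'little'))


entry = pack_entry(0xCAFEBABE, 0x7F0000001000, 0x7F0000002000)
current_size = struct.calcsize('<qQQ')  # 64-bit hash plus two 64-bit pointers

print('compact entry size:', len(entry))
print('current entry size:', current_size)
print(unpack_entry(entry))
```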


Regards,
-- 
INADA Naoki  
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com