Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Wed, Mar 28, 2018 at 6:15 PM, Nathaniel Smith wrote: > On Wed, Mar 28, 2018 at 1:03 PM, Serhiy Storchaka wrote: >> 28.03.18 21:39, Antoine Pitrou wrote: >>> I'd like to submit this PEP for discussion. It is quite specialized >>> and the main target audience of the proposed changes is >>> users and authors of applications/libraries transferring large amounts >>> of data (read: the scientific computing & data science ecosystems). >> >> Currently I'm working on porting some features from cloudpickle to the >> stdlib. For those of them which can't or shouldn't be implemented in the >> general purpose library (like serializing local functions by serializing >> their code objects, because it is not portable) I want to add hooks that >> would allow implementing them in cloudpickle using an official API. This would >> allow cloudpickle to utilize the C implementation of the pickler and unpickler. > > There's obviously some tension here between pickle's use as a > persistent storage format, and its use as a transient wire format. For > the former, you definitely can't store code objects because there's no > forwards- or backwards-compatibility guarantee for bytecode. But for > the latter, transmitting bytecode is totally fine, because all you > care about is whether it can be decoded once, right now, by some peer > process whose python version you can control -- that's why cloudpickle > exists.

Is it really true you'll always be able to control the Python version on the other side? Even if they're internal services, it seems like there could be times / reasons preventing you from upgrading the environment of all of your services at the same rate. Or did you mean to say "often" all you care about ...?

--Chris

> > Would it make sense to have a special pickle version that the > transient wire format users could opt into, that only promises > compatibility within a given 3.X release cycle? Like version=-2 or > version=pickle.NONPORTABLE or something?
> > (This is orthogonal to Antoine's PEP.) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org

___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, Mar 29, 2018 at 12:56 AM, Chris Jerdonek wrote: > On Wed, Mar 28, 2018 at 6:15 PM, Nathaniel Smith wrote: >> On Wed, Mar 28, 2018 at 1:03 PM, Serhiy Storchaka >> wrote: >>> 28.03.18 21:39, Antoine Pitrou wrote: I'd like to submit this PEP for discussion. It is quite specialized and the main target audience of the proposed changes is users and authors of applications/libraries transferring large amounts of data (read: the scientific computing & data science ecosystems). >>> >>> Currently I'm working on porting some features from cloudpickle to the >>> stdlib. For those of them which can't or shouldn't be implemented in the >>> general purpose library (like serializing local functions by serializing >>> their code objects, because it is not portable) I want to add hooks that >>> would allow implementing them in cloudpickle using an official API. This would >>> allow cloudpickle to utilize the C implementation of the pickler and unpickler. >> >> There's obviously some tension here between pickle's use as a >> persistent storage format, and its use as a transient wire format. For >> the former, you definitely can't store code objects because there's no >> forwards- or backwards-compatibility guarantee for bytecode. But for >> the latter, transmitting bytecode is totally fine, because all you >> care about is whether it can be decoded once, right now, by some peer >> process whose python version you can control -- that's why cloudpickle >> exists. > > Is it really true you'll always be able to control the Python version > on the other side? Even if they're internal services, it seems like > there could be times / reasons preventing you from upgrading the > environment of all of your services at the same rate. Or did you mean > to say "often" all you care about ...?

Yeah, maybe I spoke a little sloppily -- I'm sure there are cases where you're using pickle as a wire format between heterogeneous interpreters, in which case you wouldn't use version=NONPORTABLE.
But projects like dask, and everyone else who uses cloudpickle/dill, are already assuming homogeneous interpreters. A typical way of using these kinds of systems is: you start your script, it spins up some cloud VMs or local cluster nodes (maybe sending them all a conda environment you made), they all chat for a while doing your computation, and then they spin down again and your script reports the results. So versioning and coordinated upgrades really aren't a thing you need to worry about :-).

Another example is the multiprocessing module: it's very safe to assume that the parent and the child are using the same interpreter :-). There's no fundamental reason you shouldn't be able to send bytecode between them.

Pickle's not really the ideal wire format for persistent services anyway, given the arbitrary code execution and tricky versioning -- even if you aren't playing games with bytecode, pickle still assumes that if two classes in two different interpreters have the same name, then their internal implementation details are all the same. You can make it work, but usually there are better options. It's perfect, though, for multi-core and multi-machine parallelism.

-n

-- Nathaniel J. Smith -- https://vorpus.org
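The cloudpickle-style technique Serhiy mentions (serializing a local function through its code object) can be sketched with the standard library alone. This is a minimal illustration, not cloudpickle's implementation: `dump_function` and `load_function` are hypothetical helpers, closures and default arguments are deliberately not handled, and the `marshal` format is only meant to be read by the same Python version, which is exactly the "homogeneous interpreters" assumption described above.

```python
import marshal
import pickle
import types


def make_square():
    # A local (nested) function: pickle stores functions by qualified
    # name, so it has no way to look this one up again when loading.
    def square(x):
        return x * x
    return square


def dump_function(func):
    """Hypothetical helper: serialize a function via its code object.

    The marshal format is tied to the interpreter version, so these
    bytes are only meant for a peer running the same Python.
    """
    if func.__closure__ or func.__defaults__:
        raise ValueError("closures and defaults are not handled in this sketch")
    return marshal.dumps(func.__code__)


def load_function(data, namespace=None):
    """Hypothetical helper: rebuild a function from marshalled bytecode."""
    code = marshal.loads(data)
    return types.FunctionType(code, namespace if namespace is not None else {})


square = make_square()
try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError) as exc:
    print("stdlib pickle refuses:", exc)

rebuilt = load_function(dump_function(square))
print(rebuilt(7))  # 49
```

Shipping bytecode is safe here only because both ends share the interpreter version; a persistent store would need the portability guarantees that this approach explicitly gives up.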
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, 29 Mar 2018 01:40:17 + Robert Collins wrote:
> >
> > Data sharing
> >
> > If you pickle and then unpickle an object in the same process, passing
> > out-of-band buffer views, then the unpickled object may be backed by the
> > same buffer as the original pickled object.
> >
> > For example, it might be reasonable to implement reduction of a Numpy array
> > as follows (crucial metadata such as shapes is omitted for simplicity)::
> >
> >     class ndarray:
> >
> >         def __reduce_ex__(self, protocol):
> >             if protocol == 5:
> >                 return numpy.frombuffer, (PickleBuffer(self), self.dtype)
> >             # Legacy code for earlier protocols omitted
> >
> > Then simply passing the PickleBuffer around from ``dumps`` to ``loads``
> > will produce a new Numpy array sharing the same underlying memory as the
> > original Numpy object (and, incidentally, keeping it alive)::
>
> This seems incompatible with v4 semantics. There, a loads plus dumps
> combination is approximately a deep copy. This isn't. Sometimes. Sometimes
> it is.

True. But it's only incompatible if you pass the new ``buffer_callback`` and ``buffers`` arguments. If you don't, then you always get a copy. This is something that consumers should keep in mind.

Note there's a movement towards immutable data. For example, Dask arrays and Arrow arrays are designed as immutable.

Regards

Antoine.
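PEP 574 was later accepted and shipped in Python 3.8, so Antoine's point (sharing only when ``buffer_callback`` and ``buffers`` are passed, a copy otherwise) can be tried directly. The sketch below assumes Python 3.8 or later and closely follows the `ZeroCopyByteArray` example from the `pickle` module documentation, using a `bytearray` subclass instead of a NumPy array so that it runs with the standard library alone.

```python
import pickle
from pickle import PickleBuffer


class ZeroCopyByteArray(bytearray):
    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose our memory as a PickleBuffer; with a buffer_callback it
            # travels out-of-band instead of being copied into the stream.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # PickleBuffer is not allowed with protocols <= 4: copy in-band.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            # Get a handle on the object that actually owns the memory.
            owner = m.obj
            if type(owner) is cls:
                # The original buffer came back out-of-band: reuse it.
                return owner
            return cls(owner)


original = ZeroCopyByteArray(b"abc")

# Out-of-band: buffers are collected separately and handed back to loads().
buffers = []
data = pickle.dumps(original, protocol=5, buffer_callback=buffers.append)
shared = pickle.loads(data, buffers=buffers)

# In-band (no buffer_callback): the payload is copied into the pickle.
copied = pickle.loads(pickle.dumps(original, protocol=5))

print(shared is original)  # True: same memory, no copy
print(copied is original)  # False: an independent copy
print(copied == original)  # True
```

The `dumps`/`loads` round trip behaves like a copy unless the caller opts in to out-of-band buffers, which is the compatibility boundary Antoine describes.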
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith wrote: > Another example is the multiprocessing module: it's very safe to > assume that the parent and the child are using the same interpreter > :-). There's no fundamental reason you shouldn't be able to send > bytecode between them.

You put a smiley on it, but is this actually guaranteed on all platforms? On Unix-like systems, presumably it's using fork() and thus will actually use the exact same binary, but what about on Windows, where a new process has to be spawned? Can you say "spawn me another of this exact binary blob", or do you have to identify it by a file name?

It wouldn't be a problem for the nonportable mode to toss out an exception in weird cases like this, but it _would_ be a problem if that causes a segfault or something.

ChrisA
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On 29 March 2018 at 09:49, Chris Angelico wrote: > On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith wrote: >> Another example is the multiprocessing module: it's very safe to >> assume that the parent and the child are using the same interpreter >> :-). There's no fundamental reason you shouldn't be able to send >> bytecode between them. > > You put a smiley on it, but is this actually guaranteed on all > platforms? On Unix-like systems, presumably it's using fork() and thus > will actually use the exact same binary, but what about on Windows, > where a new process has to be spawned? Can you say "spawn me another > of this exact binary blob", or do you have to identify it by a file > name? > > It wouldn't be a problem for the nonportable mode to toss out an > exception in weird cases like this, but it _would_ be a problem if > that causes a segfault or something.

If you're embedding, you need multiprocessing.set_executable() (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable), so in that case you definitely *won't* have the same binary...

Paul
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, Mar 29, 2018 at 7:56 PM, Paul Moore wrote: > On 29 March 2018 at 09:49, Chris Angelico wrote: >> On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith wrote: >>> Another example is the multiprocessing module: it's very safe to >>> assume that the parent and the child are using the same interpreter >>> :-). There's no fundamental reason you shouldn't be able to send >>> bytecode between them. >> >> You put a smiley on it, but is this actually guaranteed on all >> platforms? On Unix-like systems, presumably it's using fork() and thus >> will actually use the exact same binary, but what about on Windows, >> where a new process has to be spawned? Can you say "spawn me another >> of this exact binary blob", or do you have to identify it by a file >> name? >> >> It wouldn't be a problem for the nonportable mode to toss out an >> exception in weird cases like this, but it _would_ be a problem if >> that causes a segfault or something. > > If you're embedding, you need multiprocessing.set_executable() > (https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable), > so in that case you definitely *won't* have the same binary...

Ah, and that also showed me that forking isn't mandatory on Unix either. So yeah, there's no assuming that they use the same binary. I doubt it'll be a problem to pickle, though, as it'll use some form of versioning even in NONPORTABLE mode, right?

ChrisA
Re: [Python-Dev] Subtle difference between f-strings and str.format()
My credentials for this are that I re-worked str.format in Jython quite extensively, and I followed the design of f-strings a bit when they were introduced, but I haven't used them to write anything.

On 29/03/2018 00:48, Tim Peters wrote:
> [Tim Delaney ]
>> ... I also assumed (not having actually used an f-string) that all its formatting arguments were evaluated before formatting.
>
> It's a string - it doesn't have "arguments" as such. For example:
>
>     def f(a, b, n):
>         return f"{a+b:0{n}b}"  # the leading "f" makes it an f-string

Agreed "argument" is the wrong word, but so is "string". It's an expression returning a string, in which a, b and n are free variables. I think we can understand it best as a string-display (https://docs.python.org/3/reference/expressions.html#list-displays), or a sort of eval() call.

The difference Serhiy identifies emerges (I think) because in the conventional interpretation of a format call, the arguments of format are evaluated left-to-right (all of them) and then formatted in the order in which the format string refers to those values in the tuple or dictionary. In an f-string, expressions are evaluated as they are encountered. A more testing example is therefore perhaps:

    '{1} {0}'.format(a(), b())  # E1
    f'{b()}{a()}'               # E2

I think I would be very surprised to find b called before a in E1 because of the general contract on the meaning of method calls. I'm assuming that's what an AST-based optimisation would do? There's no reason in E2 to call them in any other order than b then a, and the documentation tells me they are. But do I expect a() to be called before the results of b() are formatted? In E1 I definitely expect that. In E2 I don't think I'd be surprised either way. Forced to guess, I would guess that b() would be formatted and in the output buffer before a() was called, since it gives the implementation fewer things to remember. Then I hope I would not depend on this guesswork.
Strictly speaking, the documentation doesn't say when the result is formatted in relation to the evaluation of other expressions, so there is permission for Serhiy's idea #2.

I think the (internal) AST change implied in Serhiy's idea #1 is the price one has to pay *if* one insists on optimising str.format(). str.format is just a method like any other. The reasons would have to be very strong to give it special-case semantics. I agree that the cases are rare in which one would notice a difference. (Mostly I think it would be a surprise during debugging.) But I think users should be able to rely on the semantics of a call. Easier optimisation doesn't seem to me a strong enough argument.

This leaves me at:

1: +1
2a, 2b: +0
3: -1

Jeff Allen
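The E1/E2 difference can be observed by logging both the calls and the `__format__` invocations. This is a sketch of behaviour on current CPython; as noted above, the documentation does not promise when each f-string field is formatted relative to the evaluation of later fields, so the interleaving shown for E2 is observed behaviour rather than a guarantee. `Traced`, `a` and `b` are illustrative names.

```python
log = []


class Traced:
    """A value that records when it gets formatted."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        log.append("format " + self.name)
        return self.name


def a():
    log.append("call a")
    return Traced("A")


def b():
    log.append("call b")
    return Traced("B")


# E1: str.format evaluates every argument before formatting any field.
log.clear()
e1 = '{1} {0}'.format(a(), b())
e1_log = list(log)

# E2: the f-string evaluates and formats each field in turn.
log.clear()
e2 = f'{b()} {a()}'
e2_log = list(log)

print(e1, e1_log)  # B A ['call a', 'call b', 'format B', 'format A']
print(e2, e2_log)  # B A ['call b', 'format B', 'call a', 'format A']
```

Both produce the same string; only the order of side effects differs, which is why the difference is mostly noticeable while debugging.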
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, Mar 29, 2018, 02:02 Chris Angelico wrote: > On Thu, Mar 29, 2018 at 7:56 PM, Paul Moore wrote: > > On 29 March 2018 at 09:49, Chris Angelico wrote: > >> On Thu, Mar 29, 2018 at 7:18 PM, Nathaniel Smith wrote: > >>> Another example is the multiprocessing module: it's very safe to > >>> assume that the parent and the child are using the same interpreter > >>> :-). There's no fundamental reason you shouldn't be able to send > >>> bytecode between them. > >> > >> You put a smiley on it, but is this actually guaranteed on all > >> platforms? On Unix-like systems, presumably it's using fork() and thus > >> will actually use the exact same binary, but what about on Windows, > >> where a new process has to be spawned? Can you say "spawn me another > >> of this exact binary blob", or do you have to identify it by a file > >> name? > >> > >> It wouldn't be a problem for the nonportable mode to toss out an > >> exception in weird cases like this, but it _would_ be a problem if > >> that causes a segfault or something. > > > > If you're embedding, you need multiprocessing.set_executable() > > ( > https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.set_executable > ), > > so in that case you definitely *won't* have the same binary... > > Ah, and that also showed me that forking isn't mandatory on Unix > either. So yeah, there's no assuming that they use the same binary. > Normally it spawns children using `sys.executable`, which I think on Windows in particular is guaranteed to be the same binary that started the main process, because the OS locks the file while it's executing. But yeah, I didn't think about the embedding case, and apparently there's also a little-known set of features for using multiprocessing between arbitrary python processes: https://docs.python.org/3/library/multiprocessing.html#multiprocessing-listeners-clients > I doubt it'll be a problem to pickle though as it'll use some form of > versioning even in NONPORTABLE mode right? 
>

I guess the (merged, but undocumented?) changes in https://bugs.python.org/issue28053 should make it possible to set the pickle version, and yeah, if we did add a NONPORTABLE mode then presumably it would have some kind of header saying which version of Python it was created with, so version mismatches could give a sensible error message.

-n
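The header Nathaniel describes can be sketched as a thin wrapper around `pickle`. Everything here is hypothetical, because no NONPORTABLE mode exists in the stdlib: the `dumps_nonportable`/`loads_nonportable` names and the header layout are made up for illustration. The tag combines `sys.implementation.name` with `importlib.util.MAGIC_NUMBER`, the magic number identifying the running interpreter's bytecode format.

```python
import importlib.util
import pickle
import sys

# Hypothetical header: implementation name plus the bytecode magic number.
_TAG = sys.implementation.name.encode("ascii") + b":" + importlib.util.MAGIC_NUMBER


def dumps_nonportable(obj):
    """Pickle obj, prefixed with a length byte and a tag for this interpreter."""
    return bytes([len(_TAG)]) + _TAG + pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def loads_nonportable(data):
    """Unpickle data, refusing payloads produced by a different interpreter."""
    if not data:
        raise ValueError("empty payload")
    size = data[0]
    tag = data[1:1 + size]
    if tag != _TAG:
        raise ValueError(
            f"payload was produced by {tag!r}, but this interpreter is {_TAG!r}"
        )
    return pickle.loads(data[1 + size:])


payload = dumps_nonportable({"answer": 42})
print(loads_nonportable(payload))  # {'answer': 42}

# Simulate a peer running a different interpreter by overwriting the tag.
forged = payload[:1] + b"x" * payload[0] + payload[1 + payload[0]:]
try:
    loads_nonportable(forged)
except ValueError as exc:
    print("rejected:", exc)
```

A mismatch becomes a clean `ValueError` before any bytecode-dependent data is touched, rather than a crash further down.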
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On 3/29/2018 6:17 AM, Jeff Allen wrote:
> My credentials for this are that I re-worked str.format in Jython quite extensively, and I followed the design of f-strings a bit when they were introduced, but I haven't used them to write anything.

Thanks for your work on Jython. And hop on the f-string bandwagon!

> The difference Serhiy identifies emerges (I think) because in the conventional interpretation of a format call, the arguments of format are evaluated left-to-right (all of them) and then formatted in the order in which the format string refers to those values in the tuple or dictionary. In an f-string, expressions are evaluated as they are encountered. A more testing example is therefore perhaps:
>
>     '{1} {0}'.format(a(), b())  # E1
>     f'{b()}{a()}'               # E2
>
> I think I would be very surprised to find b called before a in E1 because of the general contract on the meaning of method calls. I'm assuming that's what an AST-based optimisation would do? There's no reason in E2 to call them in any other order than b then a, and the documentation tells me they are. But do I expect a() to be called before the results of b() are formatted? In E1 I definitely expect that. In E2 I don't think I'd be surprised either way. Forced to guess, I would guess that b() would be formatted and in the output buffer before a() was called, since it gives the implementation fewer things to remember. Then I hope I would not depend on this guesswork. Strictly speaking, the documentation doesn't say when the result is formatted in relation to the evaluation of other expressions, so there is permission for Serhiy's idea #2.

I don't think we should restrict f-strings to having to evaluate all of the expressions before formatting. But, if we do restrict it, we should document whatever the order is in 3.6 and add tests to ensure the behavior doesn't change.

> I think the (internal) AST change implied in Serhiy's idea #1 is the price one has to pay *if* one insists on optimising str.format().
> str.format is just a method like any other. The reasons would have to be very strong to give it special-case semantics. I agree that the cases are rare in which one would notice a difference. (Mostly I think it would be a surprise during debugging.) But I think users should be able to rely on the semantics of a call. Easier optimisation doesn't seem to me a strong enough argument.
>
> This leaves me at:
>
> 1: +1
> 2a, 2b: +0
> 3: -1

#1 seems so complex as to not be worth it, given the likely small overall impact of the optimization on a large program. If the speedup really is sufficiently important for a particular piece of code, I'd suggest just rewriting the code to use f-strings, and the author could then determine whether the transformation breaks anything. Maybe write a 2to3-like tool that would identify places where str.format or %-formatting could be replaced by f-strings? I know I'd run it on my code, if it existed. Because the optimization can only work with code that uses literals, I think manually modifying the source code is an acceptable solution if the possible change in semantics implied by #3 is unacceptable.

Eric.
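A 2to3-like tool of the kind Eric suggests could start from the `ast` module. The sketch below only *finds* candidates, namely `.format` calls on a string literal (not %-formatting); it does not rewrite them, and `find_format_calls` is a hypothetical name. Deciding whether a rewrite to an f-string preserves semantics, for example argument evaluation order, is left to the author.

```python
import ast


def find_format_calls(source):
    """Return sorted (line, format string) pairs for "literal".format(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format"
            and isinstance(node.func.value, ast.Constant)
            and isinstance(node.func.value.value, str)
        ):
            hits.append((node.lineno, node.func.value.value))
    return sorted(hits)


sample = '''
greeting = "Hello, {}!".format(name)
row = "{1} {0}".format(a(), b())
other = template.format(x)  # not a literal: cannot be converted blindly
'''

for line, fmt in find_format_calls(sample):
    print(line, repr(fmt))
```

Restricting matches to literal receivers mirrors the point above: only literal format strings can be turned into f-strings mechanically.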
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote: > The optimizer already changes > semantics. Non-optimized "if a and True:" would call bool(a) twice, but > optimized code calls it only once.

I don't understand this. Why would bool(a) be called twice, and when did this change? Surely calling it twice would be a bug.

I just tried the oldest Python 3 I have on this computer, 3.2, and bool is only called once.

-- Steve
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On Thu, Mar 29, 2018 at 11:28 PM, Steven D'Aprano wrote: > On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote: > >> The optimizer already changes >> semantics. Non-optimized "if a and True:" would call bool(a) twice, but >> optimized code calls it only once. > > I don't understand this. Why would bool(a) be called twice, and when did > this change? Surely calling it twice would be a bug. > > I just tried the oldest Python 3 I have on this computer, 3.2, and bool > is only called once.

Technically not bool() itself, but the equivalent. Here's some similar code:
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On Fri, Mar 30, 2018 at 1:08 AM, Chris Angelico wrote: > On Thu, Mar 29, 2018 at 11:28 PM, Steven D'Aprano wrote: >> On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote: >> >>> The optimizer already changes >>> semantics. Non-optimized "if a and True:" would call bool(a) twice, but >>> optimized code calls it only once. >> >> I don't understand this. Why would bool(a) be called twice, and when did >> this change? Surely calling it twice would be a bug. >> >> I just tried the oldest Python 3 I have on this computer, 3.2, and bool >> is only called once. > > Technically not bool() itself, but the equivalent. Here's some similar code:

Wow, I'm good. Premature send much? Nice going, Chris. Let's try that again. Here's some similar code:

    >>> def f(a):
    ...     if a and x:
    ...         print("Yep")
    ...
    >>> class Bool:
    ...     def __bool__(self):
    ...         print("True?")
    ...         return True
    ...
    >>> x = 1
    >>> f(Bool())
    True?
    Yep

This is, however, boolifying a, then boolifying x separately. To bool a twice, you'd need to write this instead:

    >>> def f(a):
    ...     if a or False:
    ...         print("Yep")
    ...

In its optimized form, this still only boolifies a once. But we can defeat the optimization:

    >>> def f(a):
    ...     cond = a or False
    ...     if cond:
    ...         print("Yep")
    ...
    >>> f(Bool())
    True?
    True?
    Yep

The "or False" part implies a booleanness check on its left operand, and the 'if' statement performs a boolean truthiness check on its result. That means two calls to __bool__ in the unoptimized form. But it gets optimized, on the assumption that __bool__ is a pure function. The version assigning to a temporary variable does one check before assigning, and then another check in the 'if'; the same thing without the temporary skips the second check, and just goes ahead and enters the body of the 'if'. Technically that's a semantic change. But I doubt it'll hurt anyone.
ChrisA
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On Thu, 29 Mar 2018 11:25:13 + Nathaniel Smith wrote: > > > I doubt it'll be a problem to pickle though as it'll use some form of > > versioning even in NONPORTABLE mode right? > > > > I guess the (merged, but undocumented?) changes in > https://bugs.python.org/issue28053 should make it possible to set the > pickle version [...]

Not only undocumented, but untested, and they actually look plain wrong when reading that diff. Notice how "reduction" is imported using `from .context import reduction` and then changed inside the "context" module using `globals()['reduction'] = reduction`. That seems unlikely to produce any effect.

(not to mention the abstract base class that doesn't seem to define any abstract methods or properties)

To be frank, such an unfinished patch should never have been committed. I may consider undoing it if I find some spare cycles.

Regards

Antoine.
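The likely reason the `globals()['reduction'] = reduction` trick has no effect is ordinary name binding: `from module import name` copies the current binding into the importing module, so rebinding the name later inside the source module is not seen by the importer. A self-contained illustration with two throwaway in-memory modules follows; `demo_context` and `demo_consumer` are hypothetical stand-ins, not the real multiprocessing modules.

```python
import sys
import types


def make_module(name, source):
    """Create and register a throwaway module from source text."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


context = make_module("demo_context", (
    "reduction = 'default'\n"
    "def set_reduction(new):\n"
    "    globals()['reduction'] = new\n"
))

consumer = make_module("demo_consumer", (
    "from demo_context import reduction\n"
    "def current_reduction():\n"
    "    return reduction\n"
))

context.set_reduction("custom")

print(context.reduction)             # custom: the context module was updated
print(consumer.current_reduction())  # default: the importer kept its own binding

# Clean up the throwaway modules.
del sys.modules["demo_context"], sys.modules["demo_consumer"]
```

Code that wants to see later changes has to look the attribute up on the module each time (`context.reduction`) instead of importing the name.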
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On 3/28/2018 11:27 AM, Serhiy Storchaka wrote:
> The optimizer already changes semantics. Non-optimized "if a and True:" would call bool(a) twice, but optimized code calls it only once.

Perhaps the Language Reference's object.__bool__ entry (section 3.3.1), after " should return False or True.", should say something like "Should not have side-effects, as redundant bool calls may be optimized away (bool(bool(ob)) should have the same result as bool(ob))."

-- Terry Jan Reedy
Re: [Python-Dev] Sets, Dictionaries
On Wed, Mar 28, 2018, at 9:14 PM, Julia Kim wrote: > My suggestion is to change the syntax for creating an empty set and an > empty dictionary as following. > > an_empty_set = {} > an_empty_dictionary = {:} > > It would seem to make more sense.

The amount of code this would break is astronomical.

-- Stephen Hansen m e @ i x o k a i . i o
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
On 29 March 2018 at 04:39, Antoine Pitrou wrote: > > Hi, > > I'd like to submit this PEP for discussion. It is quite specialized > and the main target audience of the proposed changes is > users and authors of applications/libraries transferring large amounts > of data (read: the scientific computing & data science ecosystems). > > https://www.python.org/dev/peps/pep-0574/ > > The PEP text is also inlined below.

+1 from me, which you already knew :)

For folks that haven't read Eric Snow's PEP 554 about exposing multiple interpreter support as a Python-level API, Antoine's proposed zero-copy-data-management enhancements for pickle complement that nicely, since they allow the three initial communication primitives in PEP 554 (passing None, bytes, memory views) to be more efficiently expanded to handling arbitrary objects by sending first the pickle data, then the out-of-band memory views, and finally None as an end-of-message marker.

Cheers,
Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
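The framing Nick describes (pickle data, then the out-of-band memory views, then None) can be sketched over any channel that carries bytes, memory views and None. PEP 554's channel API is not assumed here: a `queue.Queue` stands in for the channel, `send_object`/`recv_object` are hypothetical names, and Python 3.8+ is required for `pickle.PickleBuffer`. Within one process the views share memory with the sender's buffer rather than copying it; a real cross-interpreter channel would decide for itself whether to copy.

```python
import pickle
import queue


def send_object(channel, obj):
    """Send obj as: pickle bytes, then each out-of-band buffer, then None."""
    buffers = []
    data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    channel.put(data)
    for buf in buffers:
        channel.put(buf.raw())  # a memoryview over the original memory
    channel.put(None)  # end-of-message marker


def recv_object(channel):
    """Receive one message framed by send_object and unpickle it."""
    data = channel.get()
    buffers = []
    while (item := channel.get()) is not None:
        buffers.append(item)
    return pickle.loads(data, buffers=buffers)


payload = bytearray(b"large binary payload")
channel = queue.Queue()
send_object(channel, {"name": "frame-0", "data": pickle.PickleBuffer(payload)})
message = recv_object(channel)

print(message["name"])         # frame-0
print(bytes(message["data"]))  # b'large binary payload'

# The received view shares memory with the sender's bytearray.
payload[:5] = b"LARGE"
print(bytes(message["data"][:5]))  # b'LARGE'
```

Only the small pickle stream is serialized; the large payload travels as a view, which is the efficiency gain the PEP 554 pairing is about.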
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On 29 March 2018 at 21:50, Eric V. Smith wrote: > #1 seems so complex as to not be worth it, given the likely small overall > impact of the optimization on a large program. If the speedup really is > sufficiently important for a particular piece of code, I'd suggest just > rewriting the code to use f-strings, and the author could then determine whether > the transformation breaks anything. Maybe write a 2to3-like tool that would > identify places where str.format or %-formatting could be replaced by > f-strings? I know I'd run it on my code, if it existed. Because the > optimization can only work with code that uses literals, I think manually modifying > the source code is an acceptable solution if the possible change in > semantics implied by #3 is unacceptable.

While more projects are starting to actively drop Python 2.x support, there are also quite a few still straddling the two different versions. The "rewrite to f-strings" approach requires explicitly dropping support for everything below 3.6, whereas implicit optimization of literal based formatting will work even for folks preserving backwards compatibility with older versions.

As far as the semantics go, perhaps it would be possible to explicitly create a tuple as part of the implementation to ensure that the arguments are still evaluated in order, and everything gets calculated exactly once? This would have the benefit that even format strings that used numbered references could be optimised in a fairly straightforward way.

    '{}{}'.format(a, b)

would become:

    _hidden_ref = (a, b)
    f'{_hidden_ref[0]}{_hidden_ref[1]}'

while:

    '{1}{0}'.format(a, b)

would become:

    _hidden_ref = (a, b)
    f'{_hidden_ref[1]}{_hidden_ref[0]}'

This would probably need to be implemented as Serhiy's option 1 (generating a distinct AST node), which in turn leads to 2a: adding extra stack manipulation opcodes in order to more closely replicate str.format semantics.

Cheers,
Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
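The tuple-first rewrite can be checked by hand: evaluating the arguments into a tuple before the f-string runs reproduces str.format's order (all calls first, then formatting in field order). This is a sketch written as ordinary source code, whereas the actual proposal was a compiler transformation; `Traced`, `a`, `b`, `trace` and `rewritten` are illustrative names, and the arguments are function calls so that their evaluation shows up in the log.

```python
log = []


class Traced:
    """A value that records when it gets formatted."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        log.append("format " + self.name)
        return self.name


def a():
    log.append("call a")
    return Traced("A")


def b():
    log.append("call b")
    return Traced("B")


def trace(thunk):
    """Run thunk() and return (result, events logged while it ran)."""
    log.clear()
    result = thunk()
    return result, list(log)


def rewritten():
    # What '{1}{0}'.format(a(), b()) would become under the tuple scheme.
    _hidden_ref = (a(), b())
    return f'{_hidden_ref[1]}{_hidden_ref[0]}'


original = trace(lambda: '{1}{0}'.format(a(), b()))
print(original)           # ('BA', ['call a', 'call b', 'format B', 'format A'])
print(trace(rewritten))   # ('BA', ['call a', 'call b', 'format B', 'format A'])
```

Because every argument is evaluated exactly once, in order, before any field is formatted, the observable side effects match the method call.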
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On 3/29/2018 12:13 PM, Nick Coghlan wrote:
> On 29 March 2018 at 21:50, Eric V. Smith wrote:
>> #1 seems so complex as to not be worth it, given the likely small overall impact of the optimization on a large program. If the speedup really is sufficiently important for a particular piece of code, I'd suggest just rewriting the code to use f-strings, and the author could then determine whether the transformation breaks anything. Maybe write a 2to3-like tool that would identify places where str.format or %-formatting could be replaced by f-strings? I know I'd run it on my code, if it existed. Because the optimization can only work with code that uses literals, I think manually modifying the source code is an acceptable solution if the possible change in semantics implied by #3 is unacceptable.
>
> While more projects are starting to actively drop Python 2.x support, there are also quite a few still straddling the two different versions. The "rewrite to f-strings" approach requires explicitly dropping support for everything below 3.6, whereas implicit optimization of literal based formatting will work even for folks preserving backwards compatibility with older versions.

Sure. But 3.6 will be 3 years old before this optimization is released. I've been seeing 3.4 support dropping off, and expect to see 3.5 follow suit by the time 3.8 is released. Although maybe the thought is to do this in a bug-fix release? If we're changing semantics at all, that seems like a non-starter.

> As far as the semantics go, perhaps it would be possible to explicitly create a tuple as part of the implementation to ensure that the arguments are still evaluated in order, and everything gets calculated exactly once? This would have the benefit that even format strings that used numbered references could be optimised in a fairly straightforward way.
'{}{}'.format(a, b) would become: _hidden_ref = (a, b) f'{_hidden_ref[0]}{_hidden_ref[1]}' while: '{1}{0}'.format(a, b) would become: _hidden_ref = (a, b) f'{_hidden_ref[1]}{_hidden_ref[0]}' This would probably need to be implemented as Serhiy's option 1 (generating a distinct AST node), which in turn leads to 2a: adding extra stack manipulation opcodes in order to more closely replicate str.format semantics. I still think the complexity isn't worth it, but maybe I'm a lone voice on this. Eric. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
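Nick's hidden-tuple rewrite can be checked at the source level. The sketch below is an illustration only, not the compiler transformation under discussion: the functions `a` and `b` and the helper names are hypothetical, and each call is logged so the evaluation order is visible. It contrasts the original `str.format` call, a naive f-string rewrite that swaps the calls, and the tuple-based rewrite.

```python
calls = []

def a():
    calls.append("a")   # record that a() was evaluated
    return "A"

def b():
    calls.append("b")   # record that b() was evaluated
    return "B"

def original():
    # str.format evaluates its arguments left to right: a() then b()
    return "{1}{0}".format(a(), b())

def naive_rewrite():
    # Moving the expressions into the f-string also swaps the call order
    return f"{b()}{a()}"

def hidden_tuple_rewrite():
    # Evaluate every argument once, in source order, then index into the tuple
    _hidden_ref = (a(), b())
    return f"{_hidden_ref[1]}{_hidden_ref[0]}"

def run(fn):
    """Return (result, order in which a() and b() were called)."""
    calls.clear()
    result = fn()
    return result, list(calls)

for fn in (original, naive_rewrite, hidden_tuple_rewrite):
    print(fn.__name__, run(fn))
```

All three variants produce "BA", but only the naive rewrite calls b() before a(); the tuple version keeps the str.format order at the cost of one extra tuple.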
Re: [Python-Dev] Sets, Dictionaries
I agree with everything Steven says. But it's true that even as a 20-year Python user, this is an error I make moderately often when I want an empty set... Notwithstanding that I typed it thousands of times before sets even existed (and still type it when I want an empty dictionary).

That said, I've sort of got in the habit of using the type initializers:

    x = set()
    y = dict()
    z = list()

I feel like those jump out a little better visually. But I'm inconsistent in my code.

On Thu, Mar 29, 2018, 2:03 AM Steven D'Aprano wrote:

> Hi Julia, and welcome!
>
> On Wed, Mar 28, 2018 at 09:14:53PM -0700, Julia Kim wrote:
>
> > My suggestion is to change the syntax for creating an empty set and an
> > empty dictionary as following.
> >
> > an_empty_set = {}
> > an_empty_dictionary = {:}
> >
> > It would seem to make more sense.
>
> Indeed it would, and if sets had existed in Python since the beginning, that's probably exactly what we would have done. But unfortunately they didn't, and {} has meant an empty dict forever.
>
> The requirement to keep backwards-compatibility is a very, very hard barrier to cross. I think we all acknowledge that it is sad and a little bit confusing that {} means a dict not a set, but it isn't sad or confusing enough to justify breaking millions of existing scripts and applications.
>
> Not to mention the confusing transition period when the community would be using *both* standards at the same time, which could easily last ten years.
>
> Given that, I think we just have to accept that having to use set() for the empty set instead of {} is a minor wart on the language that we're stuck with.
>
> If you disagree, and think that you have a concrete plan that can make this transition work, we'll be happy to hear it, but you'll almost certainly need to write a PEP before it could be accepted.
>
> https://www.python.org/dev/peps/
>
> Thanks,
>
> --
> Steve
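To make the current behaviour concrete, here is a minimal sketch showing that `{}` builds a dict, that `set()` is the usual spelling of an empty set, and that the proposed `{:}` spelling does not compile in today's Python. The helper name `proposed_syntax_compiles` is hypothetical.

```python
empty_braces = {}      # an empty dict, for historical reasons
empty_set = set()      # the usual spelling of an empty set
one_element = {1}      # a non-empty brace display without colons is a set
empty_dict = dict()    # the type-initializer style mentioned above

def proposed_syntax_compiles(source="{:}"):
    """Return True if the given expression compiles in the running interpreter."""
    try:
        compile(source, "<proposal>", "eval")
    except SyntaxError:
        return False
    return True

print(type(empty_braces).__name__, type(empty_set).__name__, type(one_element).__name__)
print(proposed_syntax_compiles())
```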
Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data
28.03.18 23:19, Antoine Pitrou wrote:
> Agreed. Do you know by which timeframe you'll know which opcodes you want to add?

I'm currently in the middle of the first part, trying to implement pickling of local classes with static and class methods without creating loops. The other parts exist only as general ideas; I haven't written code for them yet. I'm trying to do this with the existing protocols, but some new opcodes may be needed for efficiency. We are now at the early stage of 3.8 development, and I think we have a lot of time.

This alone wouldn't justify bumping the pickle version, but if we are bumping it anyway, it would be worth adding shorter versions of FRAME. Currently it uses a 64-bit size, and 9 bytes is a large overhead for short pickles. An 8-bit size would reduce the overhead for short pickles, and a 32-bit size would be enough for any practical use (larger data is not wrapped in a frame).
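The 9-byte figure can be checked with the standard `pickletools` module. This sketch assumes protocol 4 and a small list as input; the helper name `first_frame` is hypothetical. It locates the FRAME opcode in a short pickle and measures how many bytes the opcode plus its length field occupy.

```python
import pickle
import pickletools

def first_frame(obj, protocol=4):
    """Return (total pickle size, declared frame length, frame header size)."""
    data = pickle.dumps(obj, protocol=protocol)
    ops = list(pickletools.genops(data))   # (opcode, arg, position) triples
    for i, (opcode, arg, pos) in enumerate(ops):
        if opcode.name == "FRAME":
            # The next opcode starts right after FRAME and its 8-byte length.
            header_size = ops[i + 1][2] - pos
            return len(data), arg, header_size
    return len(data), None, 0

total, frame_len, header = first_frame([1, 2, 3])
print(f"{total} bytes in total, {header} of them are FRAME overhead")
```

The 2-byte PROTO header sits outside the frame, so the declared frame length is the total minus 2 minus the 9-byte FRAME header. A hypothetical 8-bit length variant would shrink that header to 2 bytes; no such opcode exists in protocol 4.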
Re: [Python-Dev] Sets, Dictionaries
On Thu, Mar 29, 2018 at 10:11 AM, David Mertz wrote:
> I agree with everything Steven says. But it's true that even as a 20-year Python user, this is an error I make moderately often when I want an empty set... Notwithstanding that I typed it thousands of times before sets even existed (and still type it when I want an empty dictionary).
>
> That said, I've sort of got in the habit of using the type initializers:
>
>     x = set()
>     y = dict()
>     z = list()
>
> I feel like those jump out a little better visually. But I'm inconsistent in my code.

Yeah, we've been doing that for several years, too. A hair slower in some cases, but much more greppable...
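One plausible reason the type initializers are "a hair slower" is visible with the `dis` module: `{}` compiles to a single `BUILD_MAP` instruction, whereas `dict()` needs a name lookup (the name could be rebound) followed by a call. Exact opcode names vary between CPython versions, so this sketch only looks for `BUILD_MAP` and for a reference to the name `dict`; the helper name `instructions` is hypothetical.

```python
import dis

def instructions(source):
    """Return (opname, argval) pairs for a single expression compiled in eval mode."""
    code = compile(source, "<expr>", "eval")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

literal_ops = instructions("{}")
call_ops = instructions("dict()")

print("{}     ->", [name for name, _ in literal_ops])
print("dict() ->", [name for name, _ in call_ops])
```

Whether the difference matters in practice depends on the code; `timeit` is the tool for measuring it.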
Re: [Python-Dev] Subtle difference between f-strings and str.format()
29.03.18 13:17, Jeff Allen wrote:
>     '{1} {0}'.format(a(), b()) # E1
>     f'{b()}{a()}'              # E2
>
> I think I would be very surprised to find b called before a in E1 because of the general contract on the meaning of method calls. I'm assuming that's what an AST-based optimisation would do? There's no reason in E2 to call them in any other order than b then a and the documentation tells me they are.

I was going to optimize only formatting with implicit references: '{} {}', but neither '{1} {0}' nor '{0} {1}'. This guarantees in-order computation and that every subexpression is referenced only once. My goal is not to convert every string formatting, but only the most common and simplest ones. If we go further, we will need to add several new AST nodes (like for comprehensions).
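Serhiy's restriction to implicit references can be expressed as a simple check on the format string. A minimal sketch using the standard `string.Formatter().parse()`, which yields `(literal_text, field_name, format_spec, conversion)` tuples; `field_name` is the empty string for an auto-numbered `{}` field and `None` for a literal-only chunk. The helper name is hypothetical, and nested fields inside a format spec (as in `'{:{}}'`) are not inspected.

```python
import string

def uses_only_implicit_fields(fmt):
    """True if every top-level replacement field in fmt is auto-numbered, e.g. '{} {}'."""
    names = [
        field_name
        for _literal, field_name, _spec, _conversion in string.Formatter().parse(fmt)
        if field_name is not None   # None marks a chunk with no replacement field
    ]
    return all(name == "" for name in names)

for fmt in ("{} {}", "{1} {0}", "{0} {1}", "{:>4}{!r}", "{name}"):
    print(repr(fmt), uses_only_implicit_fields(fmt))
```

Format specs and conversions such as `{:>4}` or `{!r}` still count as implicit references, since only the field name is checked.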
Re: [Python-Dev] Move ensurepip blobs to external place
From my POV I don’t care where they live, just document how to update them going forward.

Sent from my iPhone

> On Mar 24, 2018, at 4:50 AM, Serhiy Storchaka wrote:
>
> Wouldn't be better to put them into a separate repository like Tcl/Tk and other external binaries for Windows, and download only the recent version?
Re: [Python-Dev] Move ensurepip blobs to external place
AFAIU, the objectives (with no particular ranking) are:

- minimize git clone time and bandwidth
- include latest pip with every python install
- run the full test suite with CI for every PR (buildbot)
- the full test suite requires pip
- run the test suite locally when developing a PR
- minimize PyPI bandwidth

What are the proposed solutions?

...

https://help.github.com/articles/about-storage-and-bandwidth-usage/

> All personal and organization accounts using Git LFS receive 1 GB of free storage and 1 GB a month of free bandwidth. If the bandwidth and storage quotas are not enough, you can choose to purchase an additional quota for Git LFS.
>
> Git LFS is available for every repository on GitHub, whether or not your account or organization has a paid plan.

On Thursday, March 29, 2018, Donald Stufft wrote:
> From my POV I don’t care where they live, just document how to update them going forward.
>
> Sent from my iPhone
>
> > On Mar 24, 2018, at 4:50 AM, Serhiy Storchaka wrote:
> >
> > Wouldn't be better to put them into a separate repository like Tcl/Tk and other external binaries for Windows, and download only the recent version?
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On Wed, Mar 28, 2018 at 06:27:19PM +0300, Serhiy Storchaka wrote:

> 2. Change the semantic of f-strings. Make it closer to the semantic of
> str.format(): evaluate all subexpressions first, then format them. This
> can be implemented in two ways:
>
> 2a) Add additional instructions for stack manipulations. This will slow
> down f-strings.
>
> 2b) Introduce a new complex opcode that will replace FORMAT_VALUE and
> BUILD_STRING. This will speed up f-strings.

If the aim here is to be an optimization, then I vote strongly for 2b. That gives you *faster f-strings* that have the same order of evaluation as normal method calls, so that when you optimize str.format into an f-string, not only is the behaviour identical, but they will be even faster than with option 3.

Python's execution model implies that

    obj.method(expression_a, expression_b)

should fully evaluate both expressions before they are passed to the method. Making str.format a magical special case that violates that rule should be a last resort.

In this case, we can have our cake and eat it too: both the str.format to f-string optimization and keeping the normal evaluation rules. And as a bonus, we make f-strings even faster.

I say "we", but of course it is Serhiy doing the work, thank you.

Is there a down-side to 2b? It sounds like something you might end up doing at a later date regardless of what you do now.

--
Steve
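The difference behind option 2 can be observed directly. This sketch (the class and helper names are hypothetical) logs when each expression is evaluated and when its `__format__` method runs. On current CPython, an f-string evaluates and formats each field in turn, while `str.format` evaluates all of its arguments before formatting any of them, which is the order option 2 would give f-strings.

```python
log = []

class Traced:
    """A value that records when it is formatted."""
    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        log.append("format " + self.name)
        return self.name

def make(name):
    """Create a Traced value, recording when the expression is evaluated."""
    log.append("eval " + name)
    return Traced(name)

def trace(build):
    """Run build() and return (result, evaluation/formatting log)."""
    log.clear()
    result = build()
    return result, list(log)

fstring_trace = trace(lambda: f"{make('a')}{make('b')}")
format_trace = trace(lambda: "{}{}".format(make("a"), make("b")))

print(fstring_trace)
print(format_trace)
```

Both produce "ab"; only the interleaving of "eval" and "format" entries differs, which is exactly what a str.format to f-string optimization would have to preserve or knowingly change.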
[Python-Dev] [RELEASE] Python 3.7.0b3 is now available for testing
On behalf of the Python development community and the Python 3.7 release team, I'm happy to announce the availability of Python 3.7.0b3. b3 is the third of four planned beta releases of Python 3.7, the next major release of Python, and marks the end of the feature development phase for 3.7. You can find Python 3.7.0b3 here:

https://www.python.org/downloads/release/python-370b3/

Among the major new features in Python 3.7 are:

* PEP 538, Coercing the legacy C locale to a UTF-8 based locale
* PEP 539, A New C-API for Thread-Local Storage in CPython
* PEP 540, UTF-8 mode
* PEP 552, Deterministic pyc
* PEP 553, Built-in breakpoint()
* PEP 557, Data Classes
* PEP 560, Core support for typing module and generic types
* PEP 562, Module __getattr__ and __dir__
* PEP 563, Postponed Evaluation of Annotations
* PEP 564, Time functions with nanosecond resolution
* PEP 565, Show DeprecationWarning in __main__
* PEP 567, Context Variables

Please see "What’s New In Python 3.7" for more information. Additional documentation for these features and for other changes will be provided during the beta phase.

https://docs.python.org/3.7/whatsnew/3.7.html

Beta releases are intended to give you the opportunity to test new features and bug fixes and to prepare your projects to support the new feature release. We strongly encourage you to test your projects with 3.7 during the beta phase and report issues found to https://bugs.python.org as soon as possible.

While the release is feature complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (2018-05-21). Our goal is to have no ABI changes after beta 3 and no code changes after rc1. To achieve that, it will be extremely important to get as much exposure for 3.7 as possible during the beta phase.

Attention macOS users: there is a new installer variant for macOS 10.9+ that includes a built-in version of Tcl/Tk 8.6. This variant is expected to become the default version when 3.7.0 releases. Check it out! We welcome your feedback. As of 3.7.0b3, the legacy 10.6+ installer also includes a built-in Tcl/Tk 8.6.

Please keep in mind that this is a preview release and its use is not recommended for production environments.

The next planned release of Python 3.7 will be 3.7.0b4, currently scheduled for 2018-04-30. More information about the release schedule can be found here:

https://www.python.org/dev/peps/pep-0537/

--
Ned Deily
n...@python.org
Re: [Python-Dev] Subtle difference between f-strings and str.format()
On 30 March 2018 at 03:33, Eric V. Smith wrote:
> On 3/29/2018 12:13 PM, Nick Coghlan wrote:
>> While more projects are starting to actively drop Python 2.x support,
>> there are also quite a few still straddling the two different
>> versions. The "rewrite to f-strings" approach requires explicitly
>> dropping support for everything below 3.6, whereas implicit
>> optimization of literal based formatting will work even for folks
>> preserving backwards compatibility with older versions.
>
> Sure. But 3.6 will be 3 years old before this optimization is released. I've
> been seeing 3.4 support dropping off, and expect to see 3.5 follow suit by
> the time 3.8 is released. Although maybe the thought is to do this in a
> bug-fix release? If we're changing semantics at all, that seems like a
> non-starter.

Definitely 3.8+ only. The nice thing about doing this implicitly at the compiler level is that it potentially provides an automatic performance improvement for existing libraries and applications. The justification for the extra complexity would then come from whether or not it actually measurably improves things, either for the benchmark suite or for folks' real-world applications.

Steven D'Aprano also raises a good point on that front: a FORMAT_STRING super-opcode that sped up f-strings *and* allowed semantics-preserving constant-folding of str.format calls on string literals could make more sense than a change that focused solely on the implicit optimisation case.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] How can we use 48bit pointer safely?
Hi,

As far as I know, most amd64 and arm64 systems use only 48-bit address spaces (with exceptions; see [1]).

[1] https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

This means there is some opportunity to compact some data structures. I point out two examples below.

My question is: can we use 48-bit pointers safely? It depends on the CPU architecture and the OS memory map. Maybe as a configure option that is available only on (amd64, arm64) * (Linux, Windows, macOS)?

# Possible optimizations with 48-bit pointers

## PyASCIIObject

    [snip]
            unsigned int ready:1;
            /* Padding to ensure that PyUnicode_DATA() is always aligned to
               4 bytes (see issue #19537 on m68k). */
            unsigned int :24;
        } state;
        wchar_t *wstr;              /* wchar_t representation (null-terminated) */
    } PyASCIIObject;

Currently, state is 8 bits of flags plus 24 bits of padding. I think we can pack state and wstr into 64 bits.

## PyDictKeyEntry

    typedef struct {
        /* Cached hash code of me_key. */
        Py_hash_t me_hash;
        PyObject *me_key;
        PyObject *me_value; /* This field is only meaningful for combined tables */
    } PyDictKeyEntry;

There is a chance to compact it: use only 32 bits for the hash and 48 bits * 2 for the key and value. CompactEntry may then be 16 bytes instead of 24 bytes.

Regards,
--
INADA Naoki
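The arithmetic behind both ideas can be sketched with Python integers standing in for machine words. This is an illustration only, not CPython code: it assumes user-space addresses whose top 16 bits are zero (true for 4-level paging on x86-64, but not guaranteed under the 5-level paging described in [1]), and it simply truncates the hash to its low 32 bits, which a real dict would have to account for when probing. All names are hypothetical.

```python
import struct

PTR_BITS = 48
PTR_MASK = (1 << PTR_BITS) - 1
HASH_MASK = (1 << 32) - 1

def _check_ptr(ptr):
    # Assumption: the address fits in 48 bits (upper 16 bits are zero).
    if ptr & ~PTR_MASK:
        raise ValueError(f"address {ptr:#x} does not fit in {PTR_BITS} bits")

def pack_dict_entry(h, key_ptr, value_ptr):
    """Pack a truncated 32-bit hash plus two 48-bit addresses into 16 bytes."""
    _check_ptr(key_ptr)
    _check_ptr(value_ptr)
    word = (h & HASH_MASK) | (key_ptr << 32) | (value_ptr << (32 + PTR_BITS))
    return word.to_bytes(16, "little")   # 32 + 48 + 48 = 128 bits

def unpack_dict_entry(raw):
    """Inverse of pack_dict_entry: return (hash32, key_ptr, value_ptr)."""
    word = int.from_bytes(raw, "little")
    return (word & HASH_MASK,
            (word >> 32) & PTR_MASK,
            (word >> (32 + PTR_BITS)) & PTR_MASK)

def pack_state_and_wstr(state, wstr_ptr):
    """Place an 8-bit state field directly above a 48-bit pointer (56 of 64 bits)."""
    if state & ~0xFF:
        raise ValueError("state must fit in 8 bits")
    _check_ptr(wstr_ptr)
    return (state << PTR_BITS) | wstr_ptr

# Today's entry: me_hash, me_key, me_value as three 8-byte fields on a 64-bit build.
CURRENT_ENTRY_SIZE = struct.calcsize("<qQQ")
packed = pack_dict_entry(hash("spam"), 0x7F12_3456_7890, 0x5555_0000_1230)
print(CURRENT_ENTRY_SIZE, "->", len(packed), "bytes")
```

Recovering a usable pointer on a real system would also require checking the platform's canonical-address rules, which is exactly the safety question raised above.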