date:20191216

Chris Angelico wrote:
> ANY object can be passed to str() in order to get some sort of valid
> printable form. The awkwardness comes from the fact that str()
> performs double duty - it's both "give me a printable form of this
> object" and "decode these bytes into text".

While it does make sense for str() to be able to give a printable form
for any object, I suppose that I just don't consider something like this:
"b'\\xc3\\xa1'" to be overly useful, at least for any practical purposes.
Can anyone think of a situation where you would want a string
representation of a bytes object instead of decoding it?

If not, I think it would be more useful for it to either:

1) Raise a TypeError, on the assumption that the user wanted to decode the
bytes but forgot to specify an encoding
2) Implicitly decode the bytes object as UTF-8, on the assumption that the
user meant str(bytes_obj, encoding='utf-8')

Personally, I'm more in favor of (1) since it's much more explicit and
obvious, but I think (2) would at least be more useful than the current
behavior.
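
For reference, a minimal sketch of the behaviours being compared, as CPython 3 behaves at the time of this thread. The variable names are illustrative only and are not part of any proposal:

```python
data = b'\xc3\xa1'  # the UTF-8 encoding of U+00E1 (a with acute accent)

# One-argument str() falls back to the repr() of the bytes object.
as_repr = str(data)

# Supplying an encoding switches str() to "decode" semantics.
decoded_via_str = str(data, encoding='utf-8')

# The explicit method form gives the same text.
decoded_via_method = data.decode('utf-8')

# bytes() refuses a str argument when no encoding is given.
try:
    bytes('\u00e1')
except TypeError as exc:
    bytes_error = str(exc)
else:
    bytes_error = None

print(as_repr)                                # b'\xc3\xa1'
print(decoded_via_str == decoded_via_method)  # True
print(bytes_error)
```

Option (1) above would make the first call behave like the bytes() call; option (2) would make it behave like the second call.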

On Sun, Dec 15, 2019 at 8:13 PM Chris Angelico  wrote:

> On Mon, Dec 16, 2019 at 12:00 PM Kyle Stanley  wrote:
> > On a related note though, I'm not a fan of this behavior:
> > >>> str(b'\xc3\xa1')
> > "b'\\xc3\\xa1'"
> >
> > Passing a bytes object to str() without specifying an encoding seems
> > like a mistake, I honestly don't see how this ("b'\\xc3\\xa1'") would even
> > be useful in any capacity. I would expect this to instead raise a
> > TypeError, similar to passing a string to bytes() without specifying an
> > encoding:
> > >>> bytes('á')
> > ...
> > TypeError: string argument without an encoding
> >
> > I'd much prefer to see something like this:
> > >>> str(b'\xc3\xa1')
> > ...
> > TypeError: bytes argument without an encoding
> >
> > Is there some use case for returning "b'\\xc3\\xa1'" from this operation
> > that I'm not seeing? To me, it seems equally, if not more confusing and
> > pointless than returning an empty string from str(errors='strict') or some
> > other combination of *errors* and *encoding* kwargs without passing an
> > object.
> >
>
> ANY object can be passed to str() in order to get some sort of valid
> printable form. The awkwardness comes from the fact that str()
> performs double duty - it's both "give me a printable form of this
> object" and "decode these bytes into text". With an actual bytes
> object, I always prefer b.decode(...) to str(b, encoding=...). But the
> one-arg form of str() needs to be able to represent a bytes object in
> some way, just as it can represent an int, a Fraction, or a list.
>
> ChrisA
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/ZP7SXIDQOQVKUF66NVZPS3O4FN3A6DWA/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/MBP2ZHCGV44IUP4LQEXO2UFEVJX6QNGO/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Glenn Linderman

On 12/16/2019 12:05 AM, Kyle Stanley wrote:

> Chris Angelico wrote:
> > ANY object can be passed to str() in order to get some sort of valid
> > printable form. The awkwardness comes from the fact that str()
> > performs double duty - it's both "give me a printable form of this
> > object" and "decode these bytes into text".
>
> While it does make sense for str() to be able to give some form of
> printable form for any object, I suppose that I just don't consider
> something like this: "b'\\xc3\\xa1'" to be overly useful, at least for
> any practical purposes. Can anyone think of a situation where you
> would want a string representation of a bytes object instead of
> decoding it?

Binary data
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/BLYF3OAG6LLVBRJ4J6A6QSGC7KFKBX7I/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Eric V. Smith

On 12/16/2019 3:05 AM, Kyle Stanley wrote:

> Chris Angelico wrote:
> > ANY object can be passed to str() in order to get some sort of valid
> > printable form. The awkwardness comes from the fact that str()
> > performs double duty - it's both "give me a printable form of this
> > object" and "decode these bytes into text".
>
> While it does make sense for str() to be able to give some form of
> printable form for any object, I suppose that I just don't consider
> something like this: "b'\\xc3\\xa1'" to be overly useful, at least for
> any practical purposes. Can anyone think of a situation where you
> would want a string representation of a bytes object instead of
> decoding it?

Debugging. I sometimes do things like: print('\n'.join(str(thing) for 
thing in lst)), or various variations on this. This is especially useful 
when maybe something in the list is a bytes object where I was expecting 
a string.

I'm not saying it's the best practice, but calling str() on an object is
currently a guaranteed way of making a string out of it, and I don't
think we can change it.
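
A small sketch of the debugging idiom described above. The list contents are hypothetical; the point is only that the stray bytes object shows up with its b'' prefix instead of raising:

```python
# Hypothetical mixed list: one element is bytes where text was expected.
lst = ['alpha', b'beta', 42]

report = '\n'.join(str(thing) for thing in lst)
print(report)
# alpha
# b'beta'
# 42
```

If one-argument str() raised TypeError for bytes, the join above would fail on the second element instead of revealing it.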

Eric

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5B46FTPOWDWNWDD7UL2HCDGSVPCSUUR3/

[Python-Dev] Re: Should set objects maintain insertion order too?

16.12.19 04:48, Larry Hastings wrote:
> As of 3.7, dict objects are guaranteed to maintain insertion order. But
> set objects make no such guarantee, and AFAIK in practice they don't
> maintain insertion order either. Should they?
>
> I do have a use case for this. In one project I maintain a "ready" list
> of jobs; I need to iterate over it, but I also want fast lookup because
> I sometimes remove jobs when they're subsequently marked "not ready".
> The existing set object would work fine here, except that it doesn't
> maintain insertion order. That means multiple runs of the program with
> the same inputs may result in running jobs in different orders, and this
> instability makes debugging more difficult. I've therefore switched
> from a set to a dict with all keys mapped to None, which provides all
> the set-like functionality I need.
>
> ISTM that all the reasons why dicts should maintain insertion order also
> apply to sets, and so I think we should probably do this. Your thoughts?

The current dict implementation is called a "compact dict
implementation" because it saves memory in common cases. That was the
original reason for writing it. At the same time there was a need for
ordered dicts for kwargs and class namespaces. We discovered that a
slightly modified compact dict implementation preserves order, and that
its possible drawback (a performance penalty) is too small to matter, if
it exists at all.

But an ordered set implementation would not be more compact than the
current set implementation (because for dicts an item is 3 words, but for
sets it is 2 words). Also, the current set implementation has some
optimizations that the dict implementation never had, which would be lost
in an ordered set implementation.

Take into account that sets are used far less than dicts, and that you can
use a dict if you need an ordered set-like object.
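
As a rough illustration of the "dict as ordered set" workaround mentioned in this thread, here is a minimal sketch. The job names are made up; only the dict operations themselves are standard:

```python
# Hypothetical job names; the dict keys act as an insertion-ordered set.
ready = dict.fromkeys(['compile', 'link', 'test', 'package'])

# Fast membership test and removal, as with a set.
if 'test' in ready:
    del ready['test']

# Re-adding an existing key keeps its position; a new key goes to the end.
ready['compile'] = None
ready['deploy'] = None

order = list(ready)
print(order)  # ['compile', 'link', 'package', 'deploy']
```

Iteration order is the same on every run with the same inputs, which is the property wanted for debugging.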

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4BDCSPE4FKPNU6SSMH6A7PX5CGO7EF4I/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()


16.12.19 02:55, Kyle Stanley wrote:

> I'd much prefer to see something like this:
> >>> str(b'\xc3\xa1')
> ...
> TypeError: bytes argument without an encoding
>
> Is there some use case for returning "b'\\xc3\\xa1'" from this operation
> that I'm not seeing? To me, it seems equally, if not more confusing and
> pointless than returning an empty string from str(errors='strict') or
> some other combination of *errors* and *encoding* kwargs without passing
> an object.


It is no more confusing than returning "". By
default str() returns the same as repr(), unless the object
defines a different string representation.


You can get an error here if you run Python with -bb. This is a 
temporary option to catch common errors of porting from Python 2.
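
One way to observe the -bb behaviour without restarting the current interpreter is to launch a child process. This is only a sketch: the use of subprocess and the variable names are illustrative choices, not something prescribed in the thread:

```python
import subprocess
import sys

# Run a child interpreter with -bb so that the BytesWarning issued by
# str() on a bytes object is raised as an exception.
code = "str(b'\\xc3\\xa1')"
result = subprocess.run(
    [sys.executable, '-bb', '-c', code],
    capture_output=True,
    text=True,
)

print(result.returncode != 0)            # True: the child exited with an error
print('BytesWarning' in result.stderr)   # True: the traceback names the warning
```

With a single -b the same call only emits a warning and the child exits normally.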

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RRG4Q7BQWLIYNYNKJGE4BASFWTQ3P7PK/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()


15.12.19 16:30, David Mertz wrote:

> I bet someone in the world has written code like:
>
> foo = str(**dynamic_args())
>
> And therefore, disabling "silly" combinations of arguments will break
> their code occasionally.


Do you have real world examples?
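
For concreteness, a sketch of the kind of call being discussed. The helper below is hypothetical (it stands in for whatever builds the keyword arguments at run time), and the result shown is the behaviour at the time of this thread:

```python
def dynamic_args():
    """Hypothetical helper that assembles keyword arguments at run time."""
    return {'encoding': 'utf-8', 'errors': 'strict'}

# No object is passed, so encoding and errors are ignored and the
# result is an empty string.
foo = str(**dynamic_args())
print(repr(foo))  # ''
```

Under proposal 1 this call would raise an error instead of returning an empty string.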
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4FD2MKAKGSI74EZGFSRCGGOROOQHZVFZ/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Eric V. Smith wrote:
> Debugging. I sometimes do things like: print('\n'.join(str(thing) for
> thing in lst)), or various variations on this. This is especially useful
> when maybe something in the list is a bytes object where I was expecting a
> string.
>
> I'm not saying it's the best practice, but calling str() on an object is
> currently a guaranteed way of making a string out of it, and I
> don't think we can change it.

I could see that being useful actually. Regardless of "best practices",
it's reasonably common to indiscriminately convert a large sequence of
objects into strings for basic inspection purposes. There may be better
means of debugging, but I wouldn't want to prevent that option entirely for
bytes objects.

But, I suspect that backwards compatibility might be too much of a concern
here for the change to be worthwhile either way. Adding the TypeError, or
even a gradual deprecation, would more than likely lead to a decent amount of
code breakage and maintenance; and changing it to implicitly perform a
UTF-8 decoding would very likely cause some confusion and debugging
difficulties for those who frequently inspect via string conversion.

Thanks for the insight.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IRF6PY33LQAQXGBGBEHVWZFOAUQV7J6D/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Serhiy Storchaka wrote:
> It is no more confusing than returning "". By
> default str() returns the same as repr(), unless the object
> defines a different string representation.

Yeah, I suppose not. But that does raise the question of why bytes objects
were made to have a specific form of string representation in the first
place, instead of the generic object address repr. I suspect that it might
be for historical or arbitrary reasons.

But, that's likely an entirely different topic. I'll leave it at that so I
don't derail the main topic.

Serhiy Storchaka wrote:
> You can get an error here if you run Python with -bb. This is a
> temporary option to catch common errors of porting from Python 2.

Huh, interesting.



Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Y4EPTAME4QRLLBGQBSH6YYADBYGPMLMV/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

16.12.19 10:34, Eric V. Smith wrote:

> On 12/16/2019 3:05 AM, Kyle Stanley wrote:
>
> > Chris Angelico wrote:
> > > ANY object can be passed to str() in order to get some sort of valid
> > > printable form. The awkwardness comes from the fact that str()
> > > performs double duty - it's both "give me a printable form of this
> > > object" and "decode these bytes into text".
> >
> > While it does make sense for str() to be able to give some form of
> > printable form for any object, I suppose that I just don't consider
> > something like this: "b'\\xc3\\xa1'" to be overly useful, at least for
> > any practical purposes. Can anyone think of a situation where you
> > would want a string representation of a bytes object instead of
> > decoding it?
>
> Debugging. I sometimes do things like: print('\n'.join(str(thing) for
> thing in lst)), or various variations on this. This is especially useful
> when maybe something in the list is a bytes object where I was expecting
> a string.

I usually create a list:

print([a, b, c])

This guarantees that repr() is used instead of str(). It also makes the
debug output easier to distinguish from normal output.

I use %r or !r when including an arbitrary object in logging or error
messages. It is safer for several reasons.
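
The repr-based alternatives mentioned above can be sketched as follows. The three values are placeholders chosen to mix str, bytes and int:

```python
a, b, c = 'text', b'text', 3

# Displaying a list uses repr() for each element, so str and bytes
# are visibly different in the output.
listing = str([a, b, c])

# %r and !r request repr() explicitly inside a message.
percent_style = 'got %r' % (b,)
fstring_style = f'got {b!r}'

print(listing)        # ['text', b'text', 3]
print(percent_style)  # got b'text'
print(fstring_style)  # got b'text'
```

None of these forms call str() on the bytes object, so they are unaffected by the -b/-bb options.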

But I agree that making str() fail for bytes could break a lot of
existing code.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/TNIMOXKSV47XWXXDRBHU2NCKNPPXIZYI/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Inada Naoki

On Sun, Dec 15, 2019 at 11:07 PM Serhiy Storchaka  wrote:
>
> I propose several changes:
>
> 1. Forbid calling str() without an object if encoding or errors is
> specified. It is very unlikely that this would break real code, so I
> propose to make it an error without a deprecation period.
>
> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only. Originally this feature was an implementation artifact:
> before 3.6, the parameters of a function implemented in C had to be
> either all positional-only (if it used PyArg_ParseTuple) or all keyword
> (if it used PyArg_ParseTupleAndKeywords). So str(), bytes() and
> bytearray() accepted the first parameter by keyword. We already made
> similar changes for int(), float(), etc.: int(x=42) no longer works.
>
> It is unlikely that str(object=object) is used in real code, so we can
> skip a deprecation period for this change too.
>

+1 for 1 and 2.

> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, make str() more similar to
> bytes() and bytearray(), and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object; otherwise we convert the object to a string using __str__.

-0.

We can omit `encoding="utf-8"` in bytes.decode() because the default
encoding is always UTF-8.

>>> x = "おはよう".encode()
>>> x.decode(errors="strict")
'おはよう'

So allowing `bytes(o, errors="replace")` instead of making encoding
mandatory also makes sense to me.
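
To make the asymmetry concrete, here is a small sketch of how errors= without encoding= is treated by bytes.decode(), str() and bytes() at the time of this thread. The variable names are illustrative:

```python
text = 'おはよう'
x = text.encode()                      # UTF-8 by default

# Both decoding spellings accept errors= alone and assume UTF-8.
via_method = x.decode(errors='strict')
via_str = str(x, errors='strict')

# bytes() does not: errors= without encoding= is rejected for a str argument.
try:
    bytes(text, errors='replace')
except TypeError as exc:
    bytes_message = str(exc)
else:
    bytes_message = None

print(via_method == via_str == text)   # True
print(bytes_message is not None)       # True
```

Proposal 3 would remove the asymmetry by tightening str(); the alternative suggested here would remove it by relaxing bytes().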

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/MDOU2IZ5YTCRS7VMR6DPHSQGSKGKDBFZ/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Inada Naoki

On Mon, Dec 16, 2019 at 6:25 PM Inada Naoki  wrote:
>
> +1 for 1 and 2.
>

If we find that it breaks some software, we can step back to the regular
deprecation workflow.
Python 3.9 is still far from beta.  That's why I'm +1 on these proposals.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HWNLBBHSVB5NRQC6ESQQNCQQ2EYUMW27/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Inada Naoki wrote:
> If we find it broke some software, we can step back to regular
> deprecation workflow.
> Python 3.9 is still far from beta yet.  That's why I'm +1 on these
> proposals.

IMO, since this would be changing a builtin function, we should at least
use a version+2 deprecation cycle (in this case, removal in 3.11)
regardless of reported breakages.

Especially if there's no _substantial_ security, efficiency, or performance
reason for immediately preventing str() from being called without an object
(while specifying *encoding* and/or *errors*) or for making *object* a
positional only argument.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YOOL6KM6JQWPSJ5O65IXWERIIDVPD3RU/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Kyle Stanley wrote:
> or making *object* a positional only argument.

Typo: I meant "positional only parameter", not "argument".

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/KAQZGR37QU6BS6UVSW4H7F4MDOYFY5ZG/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Petr Viktorin

On 2019-12-16 06:00, Larry Hastings wrote:
> [...]
>
> These are all performance concerns. As I mentioned previously in this
> thread, in my opinion we should figure out what semantics we want for
> the object, then implement those, and only after that should we worry
> about performance. I think we should decide the question "should set
> objects maintain insertion order?" literally without any consideration
> about performance implications.

Then this thread is missing arguments saying *why* ordered dicts are
actually better, semantics-wise.

Originally, making dicts ordered was all about performance (or rather
memory efficiency, which falls in the same bucket.) It wasn't added
because it's better semantics-wise.

Here's one (very simplified and maybe biased) view of the history of dicts:

* 2.x: Dicts are unordered, please don't rely on the order.
* 3.3: Dict iteration order is now randomized. We told you not to rely
on it!
* 3.6: We now use an optimized implementation of dict that saves memory!
As a side effect it makes dicts ordered, but that's an implementation
detail, please don't rely on it.
* 3.7: Oops, people are now relying on dicts being ordered. Insisting on
people not relying on it is battling windmills. Also, it's actually
useful sometimes, and alternate implementations can support it pretty
easily. Let's make it a language feature! (Later it turns out
MicroPython can't support it easily. Tough luck.)
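
As a quick reminder of what the 3.7 guarantee covers, here is a minimal sketch. The keys are arbitrary; the behaviour shown is the documented insertion-order rule for dict:

```python
# Illustrative keys only; any hashable keys behave the same way.
d = {}
for key in ('foo', 'bar', 'baz'):
    d[key] = len(d)

d['foo'] = 99      # updating an existing key keeps its position
del d['bar']
d['bar'] = 1       # a deleted and re-added key moves to the end

keys = list(d)
print(keys)  # ['foo', 'baz', 'bar']
```

Sets make no comparable promise, which is the gap this thread is debating.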

By itself, "we already made dicts do it" is not a great argument in the
set *semantics* debate.
Of course, it may turn out ordering was a great idea semantics-wise as
well, but if I'm reading correctly, so far this thread has one piece of
anecdotal evidence for that.

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RFGF4IJ3RHW5QCQGY5P6IUWE336D4OU5/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Guido van Rossum

On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka 
wrote:

> Currently str() takes up to 3 arguments. All are optional and
> positional-or-keyword. All combinations are valid:
>
> str()
> str(object=object)
> str(object=buffer, encoding=encoding)
> str(object=buffer, errors=errors)
> str(object=buffer, encoding=encoding, errors=errors)
> str(encoding=encoding)
> str(errors=errors)
> str(encoding=encoding, errors=errors)
>
> The last three are especially surprising. If you do not specify an
> object, str() ignores values of encoding and errors and returns an empty
> string.
>
> bytes() and bytearray() are more limited. Valid combinations are:
>
> bytes()
> bytes(source=object)
> bytes(source=string, encoding=encoding)
> bytes(source=string, encoding=encoding, errors=errors)
>
> I propose several changes:
>
> 1. Forbid calling str() without an object if encoding or errors is
> specified. It is very unlikely that this would break real code, so I
> propose to make it an error without a deprecation period.
>

What problem are you trying to solve with this proposal? I am only -0 on
this, but I am wondering why bother with the churn.


> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only. Originally this feature was an implementation artifact:
> before 3.6, the parameters of a function implemented in C had to be
> either all positional-only (if it used PyArg_ParseTuple) or all keyword
> (if it used PyArg_ParseTupleAndKeywords). So str(), bytes() and
> bytearray() accepted the first parameter by keyword. We already made
> similar changes for int(), float(), etc.: int(x=42) no longer works.
>

I am +1 on this. Your reasoning is spot on. (Note that str() must work --
all builtin types can be called without arguments and will return a "zero"
element of the right type.)
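
The two points above can be checked with a short sketch. The selection of builtin types is illustrative, not exhaustive:

```python
# Calling a builtin type with no arguments yields its "zero" value,
# and every one of these zero values is falsy.
zero_values = [
    bool(), int(), float(), complex(), str(), bytes(), bytearray(),
    tuple(), list(), dict(), set(), frozenset(),
]
all_falsy = not any(zero_values)

# int() already takes its first parameter positionally only.
try:
    int(x=42)
except TypeError:
    int_keyword_rejected = True
else:
    int_keyword_rejected = False

print(all_falsy)             # True
print(int_keyword_rejected)  # True
```

Proposal 2 would make str(), bytes() and bytearray() reject the keyword form in the same way while leaving the zero-argument calls intact.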


> Unlikely str(object=object) is used in a real code, so we can skip a
> deprecation period for this change too.
>

Likely.


> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, make str() more similar to
> bytes() and bytearray(), and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object; otherwise we convert the object to a string using __str__.
>

 I'm -0 on this. It seems that the presence of either errors= or encoding=
causes str() to switch to "decode bytes" semantics, and a default decoding
of UTF-8. That default makes sense: UTF-8 is our default source encoding,
and we are trending to use it as the default in other places. I doubt that
such calls would confuse anyone.
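
A minimal sketch of the "decode bytes" switch described above, assuming nothing beyond the builtin str(). The sample byte strings are illustrative:

```python
data = b'\xc3\xa1'          # UTF-8 bytes for U+00E1

# Either keyword alone is enough to select decoding, with UTF-8 as default.
with_errors_only = str(data, errors='strict')
with_encoding_only = str(data, encoding='utf-8')

# b'\xff' is not valid UTF-8, so errors='replace' substitutes U+FFFD.
replaced = str(b'\xff', errors='replace')

print(with_errors_only == with_encoding_only == '\u00e1')  # True
print(replaced == '\ufffd')                                # True
```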

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RL3NJFWO6VBJYVQHUZJMIWQU5JATP725/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Guido van Rossum

I don't know if this works the same for sets, but for dicts, this is the
semantics that everyone has wanted for a long time. It makes doctests (and
similar things) easier to write, reduces a source of nondeterministic
failure, and removes a wart that everyone had to learn:

>>> {"foo": 1, "bar": 2, "baz": 3}
{'baz': 3, 'foo': 1, 'bar': 2}
>>>

It's essentially the same reason as why Python specifies that expressions
are (in most cases) evaluated from left to right. Users don't realize that
f()+g() might call g() before f(), and write code that assumes f() is
called first -- the language should not disappoint them, optimization
opportunities be damned.
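
The left-to-right guarantee can be observed directly. In this sketch f and g are stand-ins that record the order in which they are called:

```python
calls = []

def f():
    calls.append('f')
    return 1

def g():
    calls.append('g')
    return 2

# The operands of + are evaluated left to right, so f runs before g.
total = f() + g()
print(calls)  # ['f', 'g']
print(total)  # 3
```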



-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/N7ECHMFKSQ7E56CQYGCYMVAJWEZOMVB4/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Steve Dower

On 16Dec2019 0417, Petr Viktorin wrote:
> Originally, making dicts ordered was all about performance (or rather
> memory efficiency, which falls in the same bucket.) It wasn't added
> because it's better semantics-wise.
>
> Here's one (very simplified and maybe biased) view of the history of dicts:

For the record, we missed out on a very memory efficient "frozendict"
implementation because it can't maintain insertion order - Yury is
currently proposing it as FrozenMap in PEP 603.
https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/2318

Codifying semantics isn't always the kind of future-proof we necessarily
want to have :)

Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Y7YP7SKSQOGOCXNXV27ZGSQDUVZRPSPH/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread David Mertz

On Sun, Dec 15, 2019 at 11:28 PM Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

> * The corresponding mathematical concept is unordered and it would be
> weird to impose such an order.
>

I'm with Raymond in not wanting sets to maintain insertion (or any) order.
Even though I don't doubt that Larry--and no doubt other folks, from time
to time--have a use for an "ordered set," I feel like it is bad practice to
encourage that way of thinking about sets and using them.

Admittedly, I was only lukewarm about making an insertion-order guarantee
for dictionaries too. But for sets I feel more strongly opposed. Although
it seems unlikely now, if some improved implementation of sets had the
accidental side effects of making them ordered, I would still not want that
to become a semantic guarantee.

That said, having OrderedSet in collections module would be fine by me. It
might have different performance characteristics, but so what? It would be
a different class that folks could use or not, depending on how they felt
about its behavior and performance profile.
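
As a rough illustration of what a separate class could look like, here is a minimal sketch that assumes a plain dict as the backing store. The OrderedSet name and its layout are hypothetical; no such class exists in the collections module, and a real implementation would likely make different performance trade-offs.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Hypothetical insertion-ordered set backed by a dict (not a stdlib class)."""

    def __init__(self, iterable=()):
        # dict keys keep insertion order (a language guarantee since 3.7).
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._items)!r})"


s = OrderedSet("abca")
s.discard("a")
s.add("a")          # "a" now comes last
order = list(s)
```

Equality comes from the MutableSet mixin and ignores order, so an instance still compares equal to a built-in set holding the same elements.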

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/G5VFFODDT5N2HNWCTAKUEDDXJJVX7VDJ/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread David Mertz

On Mon, Dec 16, 2019 at 4:06 AM Serhiy Storchaka 
wrote:

> 15.12.19 16:30, David Mertz wrote:
> > I bet someone in the world has written code like:
> >
> > foo = str(**dynamic_args())
> >
> > And therefore, disabling "silly" combinations of arguments will break
> > their code occasionally.
>
> Do you have real world examples?
>

I do not! It wasn't me who wrote it :-).

I was really replying to the claim that there was definitely no code in the
world the proposed change would break.  I think that claim is almost surely
false.  But maybe it's little enough code that it's worth it (but I think
a deprecation period is still needed).
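
To make the concern concrete, here is a small sketch of the kind of code that could break. The body of dynamic_args() is invented for illustration, and the behaviour shown is what CPython did at the time of this thread.

```python
def dynamic_args(include_object):
    """Hypothetical helper that builds keyword arguments for str()."""
    kwargs = {"encoding": "utf-8", "errors": "strict"}
    if include_object:
        kwargs["object"] = b"caf\xc3\xa9"
    return kwargs


# With an object, str() decodes the bytes.
with_object = str(**dynamic_args(True))

# Without an object, encoding and errors are silently ignored and the
# result is an empty string; the proposal would turn this into an error.
without_object = str(**dynamic_args(False))
```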


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IN35EIUMMRGYUQGSNFO43GBPEBVP6I2V/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Barry Warsaw

I can’t think of a time when I’ve really needed sets to maintain insertion 
order, but thinking about it from a user’s perspective, it’s a natural leap 
from ordered dicts to ordered sets.  At least I don’t immediately think about 
sets as their mathematical equivalent, but as sets were originally proposed: 
efficient collections of value-less keys.  Before we had sets, it was common to 
use dicts with values of None to represent the same concept.
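
A short sketch of that older idiom, using made-up sample data: a dict whose values are all None behaves like a set of keys, and on Python 3.7+ it also remembers insertion order.

```python
words = ["spam", "eggs", "spam", "ham", "eggs"]

# Keys act as the set members; every value is None.
seen = dict.fromkeys(words)

# Duplicates are dropped and first-seen order is kept.
unique_in_order = list(seen)
```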

I’m fine with a decision not to change the ordering semantics of sets, but we 
should update the Language Reference to describe the language guarantees for 
both sets and dicts.  The Library Reference does document the ordering 
semantics for both dicts and sets, but I wouldn’t say that information is easy 
to find.  Maybe we can make the latter more clear too.

Cheers,
-Barry

> On Dec 16, 2019, at 09:57, David Mertz  wrote:
> 
> On Sun, Dec 15, 2019 at 11:28 PM Raymond Hettinger 
>  wrote:
> * The corresponding mathematical concept is unordered and it would be weird 
> to impose such an order.
> 
> I'm with Raymond in not wanting sets to maintain insertion (or any) order.  
> Even though I don't doubt that Larry--and no doubt other folks, from time to 
> time--have a use for an "ordered set," I feel like it is bad practice to 
> encourage that way of thinking about sets and using them.
> 
> Admittedly, I was only lukewarm about making an insertion-order guarantee for 
> dictionaries too.  But for sets I feel more strongly opposed.  Although it 
> seems unlikely now, if some improved implementation of sets had the 
> accidental side effects of making them ordered, I would still not want that 
> to become a semantic guarantee.
> 
> That said, having OrderedSet in collections module would be fine by me.  It 
> might have different performance characteristics, but so what? It would be a 
> different class that folks could use or not, depending on how they felt about 
> its behavior and performance profile.



___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4SEYLWDBWC3Z2FCZNFE5PW5XTVSU52OV/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Adding a toml module to the standard lib?

2019-12-16 Thread Brett Cannon

Specifically the work to reach 1.0 is tracked in 
https://github.com/toml-lang/toml/projects/1 and I too think we should wait 
until 1.0 comes out for the exact reasons Victor laid out.

I will also mention that https://github.com/pradyunsg who is a core dev of pip 
and very active PyPA member is one of the maintainers of TOML. So I'm sure he 
will let us know when things reach 1.0 and also wouldn't mind some help 
reaching 1.0 as well. :)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OZSHTQY5BRX6PVX4DSK2N4EVRDGX5HBR/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Guido]
> ...
> the language should not disappoint them, optimization opportunities be damned.

I would like to distinguish between two kinds of "optimization
opportunities": theoretical ones that may or may not be exploited
some day, and those that CPython has _already_ exploited.

That is, we don't have a blank slate here. As Raymond said, the set
implementation already diverged from the dict implementation in
fundamental ways for "go fast" reasons. How much are people willing
to see set operations slow down to get this new feature?

For me, "none" ;-) Really. I have no particular use for "ordered"
sets, but have set-slinging code that benefits _today_ from the "go
fast" work Raymond did for them.

Analogy: it was always obvious that list.sort() is "better" stable
than not, but I ended up crafting a non-stable samplesort variant that
ran faster than any stable sort anyone tried to write. For years. So
we stuck with that, to avoid slowing down sorting across releases.

The stable "timsort" finally managed to be _as_ fast as the older
samplesort in almost all cases, and was very much faster in many
real-world cases. "Goes faster" was the thing that really sold it,
and so much so that its increased memory use didn't count for much
against it in comparison.
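
For readers unsure what "stable" buys you, a tiny example with invented data: records that compare equal under the sort key keep their original relative order.

```python
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

# Sort by the count only; ties keep their input order because the sort is stable.
by_count = sorted(records, key=lambda record: record[1])
```

Here "apple" stays ahead of "kiwi" and "pear" stays ahead of "fig", exactly as they appeared in the input.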

Kinda similarly, "ordered dicts" were (as has already been pointed
out) originally introduced as a memory optimization ("compact" dicts),
where the "ordered" part fell out more-or-less for free. The same
approach wouldn't save memory for sets (or so I'm told), so the
original motivation for compact dicts doesn't carry over.

So I'm +1 on ordered sets if and only if there's an implementation
that's at least as fast as what we already have. If not now, then,
like ordered dicts evolved, offer a slower OrderedSet type in the
`collections` module for those who really want it, and wait for magic
;-)

BTW, what should

{1, 2} | {3, 4, 5, 6, 7}

return as ordered sets? Beats me. The imbalance between the
operands' cardinalities can be arbitrarily large, and "first make a
copy of the larger one, then loop over the smaller one" is the obvious
way to implement union to minimize the number of lookups needed. The
speed penalty for, e.g., considering "the left" operand's elements'
insertion orders to precede all "the right" operand's elements'
insertion orders can be arbitrarily large.

The concept of "insertion order" just doesn't make well-defined sense
to me for any operation that produces a set from multiple input sets,
unless it means no more than "whatever order happens to be used
internally to add elements to the result set". Dicts don't have
anything like that, although dict.update comes close, but in that case
the result is defined by mutating the dict via a one-at-a-time loop
over the argument dict.
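
The dict.update case can be checked directly. In this small sketch with made-up keys, an existing key keeps its original position and new keys are appended in the order the argument dict yields them.

```python
d = {"a": 1, "b": 2}

# Equivalent to looping over the argument and assigning one key at a time.
d.update({"c": 3, "a": 10, "d": 4})

key_order = list(d)
```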
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YXSGUPZYTO7TKOVVU32276M54TMITVVQ/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] GitHub Actions enabled (was: Travis CI for backports not working)

2019-12-16 Thread Steve Dower


On 13Dec2019 0959, Brett Cannon wrote:
> Steve Dower wrote:
>> If people are generally happy to move PR builds/checks to GitHub
>> Actions, I'm happy to merge https://github.com/zooba/cpython/pull/7
>> into
>> our active branches (with probably Brett's help) and disable Azure
>> Pipelines?
>
> I'm personally up for trying this out on master, making sure everything runs
> fine, and then push down into the other active branches.


This is now running on master (and likely 3.8 and 3.7, at a guess) - you 
can see it on my PR at https://github.com/python/cpython/pull/17628 
(adding badges and making a tweak to when the builds run)


The checks are not required yet - that requires admin powers.

Please just shout out, either here or at 
https://bugs.python.org/issue39041 if you see anything not working.


(Apart from post-merge coverage checks. We know they're not working - 
see https://github.com/python/cpython/runs/351136928 - and if you'd like 
to help fix it you're more than welcome to jump in!)


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R6VOXOL2HARK6ZHD7OWE4UP7PWTT5A4N/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()


16.12.19 18:35, Guido van Rossum wrote:
> On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka wrote:
>
>> 1. Forbids calling str() without object if encoding or errors are
>> specified. It is very unlikely that this can break a real code, so I
>> propose to make it an error without a deprecation period.
>
> What problem are you trying to solve with this proposal? I am only -0 on
> this, but I am wondering why bother with the churn.


Initially I wanted to check the documentation and the docstrings of 
str() and fix it if needed. It was inspired by the Discourse topic [1]. 
I have found that, contrary to the OP's claim, the documentation is 
correct, but the docstring is not.


The documentation is correct (because Chris Jerdonek accurately 
documented the actual behavior in 2012 [2]), but ambiguous.


str(object='')
str(object=b'', encoding='utf-8', errors='strict')

0- and 1-argument calls match both signatures. Also it implies that 
str(encoding='ascii') and str(errors='ignore') are valid, and this is 
true! And more, str(encoding='spam') and str(errors='ham') are valid 
too, because the values of encoding and errors are ignored. I cannot 
imagine a use case for this. It looks like an implementation artifact.
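
These corner cases are easy to reproduce. The sketch below reflects CPython's behaviour as described in this thread; the encoding and errors values are deliberately bogus to show that they are never looked at when no object is given.

```python
raw = b"\xc3\xa1"

# One argument: str() returns the repr-like form of the bytes object.
as_repr = str(raw)

# With an encoding: str() decodes instead.
decoded = str(raw, encoding="utf-8")

# No object at all: encoding and errors are accepted but ignored.
empty_from_encoding = str(encoding="spam")
empty_from_errors = str(errors="ham")
```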


The docstring was left unfixed.

str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

It uses different names for the first parameter (which would not matter if 
it were positional-only), requires bytes_or_buffer for decoding, and 
requires encoding if errors is passed.


So my goal is to remove glitches that are not used in real code in 
any case, and to make the behavior closer to the initial intention.  If 
all three of my propositions were applied, the signatures would look like:


str(object='', /) -> str
str(bytes_or_buffer, /, encoding, errors='strict') -> str

Almost the same as for bytes:

bytes(object=b'', /) -> bytes
bytes(string, /, encoding, errors='strict') -> bytes

[1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
[2] https://bugs.python.org/issue13538



>> 3. Make encoding required if errors is specified in str(). This will
>> reduce the number of possible combinations, make str() more similar to
>> bytes() and bytearray() and simplify the mental model: if encoding is
>> specified, then we decode, and the first argument must be a bytes-like
>> object, otherwise we convert an object to a string using __str__.
>
> I'm -0 on this. It seems that the presence of either errors= or
> encoding= causes str() to switch to "decode bytes" semantics, and a
> default decoding of UTF-8. That default makes sense: UTF-8 is our
> default source encoding, and we are trending to use it as the default in
> other places. I doubt that such calls would confuse anyone.


This proposition is the one about which I am not sure. On the one hand, the 
bytes() constructor requires encoding for decoding. On the other hand, it is 
optional in str.encode() and bytes.decode(). But str.encode() and 
bytes.decode() have only one function, so you can omit both encoding and 
errors without ambiguity.


If we allow str(bytes_or_buffer, errors=errors), should we not also 
allow bytes(string, errors=errors)?
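
The asymmetry can be seen side by side. This sketch shows CPython's behaviour as of this thread and uses a one-character sample string.

```python
text = "\u00e1"  # the character 'á'

# bytes() refuses to guess an encoding for a str argument.
bytes_error = None
try:
    bytes(text)
except TypeError as exc:
    bytes_error = exc

# The methods have a single job, so both default to UTF-8.
encoded = text.encode()
decoded = encoded.decode()

# str() with only errors= switches to decoding and also assumes UTF-8.
via_errors_only = str(encoded, errors="strict")
```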

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OXPP7HTFU32VXE3LMSICPB57V5KHM4PW/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: GitHub Actions enabled (was: Travis CI for backports not working)

2019-12-16 Thread Gregory P. Smith

On Mon, Dec 16, 2019 at 11:11 AM Steve Dower  wrote:

> On 13Dec2019 0959, Brett Cannon wrote:
> > Steve Dower wrote:
> >> If people are generally happy to move PR builds/checks to GitHub
> >> Actions, I'm happy to merge https://github.com/zooba/cpython/pull/7
> >> into
> >> our active branches (with probably Brett's help) and disable Azure
> >> Pipelines?
> >
> > I'm personally up for trying this out on master, making sure everything
> runs fine, and then push down into the other active branches.
>
> This is now running on master (and likely 3.8 and 3.7, at a guess) - you
> can see it on my PR at https://github.com/python/cpython/pull/17628
> (adding badges and making a tweak to when the builds run)
>
> The checks are not required yet - that requires admin powers.
>
> Please just shout out, either here or at
> https://bugs.python.org/issue39041 if you see anything not working.
>

neat, thanks Steve!


>
> (Apart from post-merge coverage checks. We know they're not working -
> see https://github.com/python/cpython/runs/351136928 - and if you'd like
> to help fix it you're more than welcome to jump in!)
>
> Cheers,
> Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QH5EA45GGDJVVZA44TZDP3OPJECP6AKX/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Victor Stinner

That looks quite interesting. It looks like compact dict optimization
applied to set. I had the same idea :-)

If it reduces the memory footprint, keeps insertion order and has low
performance overhead, it would be an interesting idea!

Victor

Le lun. 16 déc. 2019 à 07:56, Inada Naoki  a écrit :
>
> On Mon, Dec 16, 2019 at 1:33 PM Guido van Rossum  wrote:
> >
> > Actually, for dicts the implementation came first.
> >
>
> I had tried to implement the Ordered Set.  Here is the implementation.
> https://github.com/methane/cpython/pull/23
>
> Regards,
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/SGDD47GTMS7OGIEZTLLXEYHABL5OS4EN/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/C4IQW5OHLTGWJ7I6EAZ6S6XYQPONGVAV/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread David Cuthbert via Python-Dev

On Mon 12/16/19, 9:59 AM, "David Mertz" <me...@gnosis.cx> wrote:
> Admittedly, I was only lukewarm about making an insertion-order guarantee for
> dictionaries too. But for sets I feel more strongly opposed. Although it
> seems unlikely now, if some improved implementation of sets had the accidental
> side effects of making them ordered, I would still not want that to become a
> semantic guarantee.

Eek… No accidental side effects whenever possible, please. People come to rely
upon them (like that chemistry paper example[*]), and changing the assumptions
results in a lot of breakage down the line. Changing the length of AWS
identifiers (e.g. instances from i-1234abcd to i-0123456789abcdef) was a huge
deal; even though the identifier length was never guaranteed, plenty of folks
had created database schemata with VARCHAR(10) for instance ids, for example.

Break assumptions from day 1. If some platforms happen to return sorted
results, sort the other platforms or toss in a sorted(key=lambda el:
random.random()) call on the sorting platform. If you’re creating custom
identifiers, allocate twice as many bits as you think you’ll need to store them.

Yes, this is bad user code, but I’m all for breaking bad user code in obvious
ways sooner rather than subtle ways later, especially in a language like Python.
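
One way to read "break assumptions from day 1" is sketched below. The helper name is hypothetical, and random.shuffle is swapped in for the sorted-with-random-key trick mentioned above; the point is only that callers never see a dependable order.

```python
import random


def list_items_unordered(items, rng=random):
    """Return a copy of items in deliberately shuffled order (hypothetical helper)."""
    shuffled = list(items)   # copy so the caller's data is not mutated
    rng.shuffle(shuffled)    # works with the random module or a random.Random instance
    return shuffled


original = list(range(20))
result = list_items_unordered(original, random.Random(0))
```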

[*]
https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/TKVGVH5GHOL3YH7W55MEU2PHASSPY74M/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread David Mertz

On Mon, Dec 16, 2019, 7:35 PM David Cuthbert  wrote:

> On Mon 12/16/19, 9:59 AM, "David Mertz"  wrote:
>
> If some improved implementation of sets had the accidental side effects of
> making them ordered, I would still not want that to become a semantic
> guarantee.
>
> Eek… No accidental side effects whenever possible, please. People come to
> rely upon them (like that chemistry paper example[*]), and changing the
> assumptions results in a lot of breakage down the line.
>
I'm not sure what point you are making really. Any particular
implementation will have some behaviors that are not guaranteed by the
language spec. I wouldn't want to PROHIBIT a set implementation that
preserved insertion order. Nor, for example, would I want to prohibit one
that stored string elements in alphabetical order, if that somehow had an
independent performance advantage. But that's very different from wanting
to REQUIRE sets to iterate in alphabetical order. If they did that, it
wouldn't be *wrong*, but it also shouldn't be something we rely on.

> [*]
> https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/

This is interesting. But it's a programming error that Python, or any
programming language, cannot protect against. glob.glob() does not promise
to list matching files in any specific order. If the authors wrote code
whose results vary based on the particular order files are processed in,
that's a bug. It's their responsibility to order the files appropriately.

Obviously, glob COULD alphabetize its results. But that order is generally
different from ordering by creation time. Which is, in turn, different from
ordering by modification time or file size. I don't want to prohibit glob
from doing any of these if the filesystem happens to make such easier (this
is really a filesystem question more than an OS question). But I also don't
want to make the order part of the semantics of the function... Nor do
extra work to "randomize" the order to avoid some pattern that may happen
to exist on some platform.
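
For code whose results depend on processing order, the fix on the caller's side is a one-liner. A minimal sketch, with a hypothetical helper name and throwaway files created only for the demonstration:

```python
import glob
import os
import tempfile


def matching_files(pattern):
    """Return matching paths in a defined (lexicographic) order."""
    # glob.glob() makes no ordering promise, so impose one explicitly.
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = matching_files(os.path.join(tmp, "*.txt"))
    names = [os.path.basename(path) for path in found]
```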
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PHXPPDOMX4UHSGCAEO5APGC5KCUPEANM/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?



On 12/16/19 10:59 AM, Tim Peters wrote:
> BTW, what should
>
>     {1, 2} | {3, 4, 5, 6, 7}
>
> return as ordered sets?  Beats me.


The obvious answer is {1, 2, 3, 4, 5, 6, 7}.  Which is the result I got 
in Python 3.8 just now ;-)  But that's just a side-effect of how hashing 
numbers works, the implementation, etc.  It's rarely stable like this, 
and nearly any other input would have resulted in the scrambling we all 
(currently) expect to see.


    >>> {"apples", "peaches", "pumpkin pie"} | {"who's", "not",
    ...  "ready", "holler", "I"}
    {'pumpkin pie', 'peaches', 'I', "who's", 'holler', 'ready',
     'apples', 'not'}


//arry/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6ZNPKM5HFK76DZC2QF3SHS6RQWYZKZ6X/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Tim]
> BTW, what should
>
> {1, 2} | {3, 4, 5, 6, 7}
>
> return as ordered sets?  Beats me.

[Larry]
> The obvious answer is {1, 2, 3, 4, 5, 6, 7}.

Why?  An obvious implementation that doesn't ignore performance entirely is:

    def union(smaller, larger):
        # Make sure we copy the bigger operand and loop over the smaller one.
        if len(larger) < len(smaller):
            smaller, larger = larger, smaller
        result = larger.copy()
        for x in smaller:
            result.add(x)
        return result

In the example, that would first copy {3, 4, 5, 6, 7}, and then add 1
and 2 (in that order) to it, giving {3, 4, 5, 6, 7, 1, 2} as the
result.

If it's desired that "insertion order" be consistent across runs,
platforms, and releases, then what "insertion order" _means_ needs to
be rigorously defined & specified for all set operations.  This was
comparatively trivial for dicts, because there are, e.g., no
commutative binary operators defined on dicts.

If you want to insist that `a | b` first list all the elements of a,
and then all the elements of b that weren't already in a, regardless
of cost, then you create another kind of unintuitive surprise:  in
general the result of "a | b" will display differently than the result
of "b | a" (although the results will compare equal), and despite that
the _user_ didn't "insert" anything.
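
The asymmetry is easy to see with dicts standing in for ordered sets. This is only a sketch of the "left operand first" rule; ordered_union is a hypothetical helper, not a proposed API.

```python
def ordered_union(a, b):
    """Hypothetical 'left operand first' union of two ordered key collections."""
    result = dict.fromkeys(a)          # all of a, in a's order
    result.update(dict.fromkeys(b))    # then whatever b adds, in b's order
    return result


left_first = ordered_union([1, 2], [3, 4, 5, 6, 7])
right_first = ordered_union([3, 4, 5, 6, 7], [1, 2])

# Same members, different display order.
same_members = left_first.keys() == right_first.keys()
```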
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OU37ZU46BCHI6HLA7E3NEWCDOLQOHRNF/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Raymond]
> ...
> * The ordering we have for dicts uses a hash table that indexes into a
> sequence. That works reasonably well for typical dict operations but is
> unsuitable for set operations where some common use cases make interspersed
> additions and deletions (that is why the LRU cache still uses a cheaply
> updated doubly-linked list rather than deleting and reinserting dict entries).

I'm going to take a stab at fleshing that out a bit:  ideally, an
ordered dict stores hash+key+value records in a contiguous array with
no "holes".  That's ("no holes") where the memory savings comes from.
The "holes" are in the hash table, which only hold indices _into_ the
contiguous array (and the smallest width of C integer indices
sufficient to get to every slot in the array).

"The problem" is that deletions leave _two_ kinds of holes:  one in
the hash table, and another in the contiguous vector.  The latter
holes cannot be filled with subsequent new hash+key+value records
because that would break insertion order.

So in an app that mixes additions and deletions, the ordered dict
needs to be massively rearranged at times to squash out the holes left
behind by deletions, effectively moving all the holes to "the end",
where they can again be used to reflect insertion order.

Unordered dicts never had to be rearranged unless the total size
changed "a lot", and that remains true of the set implementation.  But
in apps mixing adds and deletes, ordered dicts can need massive
internal rearrangement at times even if the total size never changes
by more than 1.

Algorithms doing a lot of mixing of adds and deletes seem a lot more
common for sets than for dicts, and so the ordered dict's basic
implementation _approach_ is a lot less suitable for sets.  Or, at
least, that's my best attempt to flesh out Raymond's telegraphic
thinking there.

Note:  the holes left by deletions _wouldn't_ be "a problem" _except_
for maintaining insertion order.  If we were only after the memory
savings, then on deletion "the last" record in the contiguous array
could be moved into the hole at once, leaving the array hole-free
again.  But that also changes the order.  IIRC, that's what the
original "compact dict" _did_ do.
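
The two deletion strategies can be sketched with a toy model. Everything here is an assumption made for illustration: the class name is invented, and a plain dict mapping key to position stands in for the real hash table of small integer indices.

```python
_HOLE = object()  # marks a deleted slot in the dense array


class CompactKeys:
    """Toy model of the compact layout: dense entries array plus an index."""

    def __init__(self, keep_order):
        self.keep_order = keep_order
        self.entries = []   # dense array of keys, possibly containing holes
        self.index = {}     # key -> position in self.entries

    def add(self, key):
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append(key)

    def discard(self, key):
        pos = self.index.pop(key, None)
        if pos is None:
            return
        if self.keep_order:
            # Order-preserving: leave a hole; only compact() reclaims it.
            self.entries[pos] = _HOLE
        else:
            # Memory-only variant: move the last entry into the hole.
            last = self.entries.pop()
            if pos < len(self.entries):
                self.entries[pos] = last
                self.index[last] = pos

    def compact(self):
        """Squash holes out of the array, keeping the surviving order."""
        self.entries = [k for k in self.entries if k is not _HOLE]
        self.index = {k: i for i, k in enumerate(self.entries)}

    def __iter__(self):
        return (k for k in self.entries if k is not _HOLE)


ordered = CompactKeys(keep_order=True)
swapping = CompactKeys(keep_order=False)
for table in (ordered, swapping):
    for key in "abcd":
        table.add(key)
    table.discard("b")
    table.add("e")
```

The order-preserving table ends up with five slots for four keys until compact() runs, while the swapping table stays dense but now iterates as a, d, c, e.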
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QQXSSJWHKTUNHSMSHVM7XLMDBMUV7BDX/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Petr Viktorin ]
> ...
> Originally, making dicts ordered was all about performance (or rather
> memory efficiency, which falls in the same bucket.) It wasn't added
> because it's better semantics-wise.

As I tried to flesh out a bit in a recent message, the original
"compact dict" idea got all the memory savings, but did _not_ maintain
insertion order.  Maintaining insertion order too complicated
deletions (see recent message), but was deliberately done because
people really did want "ordered dicts".

> Here's one (very simplified and maybe biased) view of the history of dicts:
>
> * 2.x: Dicts are unordered, please don't rely on the order.
> * 3.3: Dict iteration order is now randomized. We told you not to rely
> on it!
> * 3.6: We now use an optimized implementation of dict that saves memory!
> As a side effect it makes dicts ordered, but that's an implementation
> detail, please don't rely on it.
> * 3.7: Oops, people are now relying on dicts being ordered. Insisting on
> people not relying on it is battling windmills. Also, it's actually
> useful sometimes, and alternate implementations can support it pretty
> easily. Let's make it a language feature! (Later it turns out
> MicroPython can't support it easily. Tough luck.)

A very nice summary!  My only quibble is as above:  the "compact dict"
implementation doesn't maintain insertion order naturally, _unless_
there are no deletions (which is, e.g., true of dicts constructed to
pass keyword arguments).  The code got hairier to maintain insertion
order in the presence of mixing insertions and deletions.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TRVOCOLQGSOM2OLLAI3UPRCTFIKIWWH6/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Guido van Rossum

On Mon, Dec 16, 2019 at 12:04 PM Serhiy Storchaka wrote:

> 16.12.19 18:35, Guido van Rossum wrote:
> > On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka wrote:
> >
> > 1. Forbid calling str() without object if encoding or errors are
> > specified. It is very unlikely that this can break any real code, so I
> > propose to make it an error without a deprecation period.
> >
> >
> > What problem are you trying to solve with this proposal? I am only -0 on
> > this, but I am wondering why bother with the churn.
>
> Initially I wanted to check the documentation and the docstrings of
> str() and fix them if needed. It was inspired by the Discourse topic [1].
> I have found that, contrary to the OP's claim, the documentation is
> correct, but the docstring is not.
>

So let's fix the docstring.

> The documentation is correct (because Chris Jerdonek accurately
> documented the actual behavior in 2012 [2]), but ambiguous.
>
>  str(object='')
>  str(object=b'', encoding='utf-8', errors='strict')
>

Honestly this notation leaves a lot unsaid. Apparently the first form
allows `object` to have any type, while the second only allows it to be
bytes (or bytearray, or memoryview, or presumably anything that supports
the buffer protocol?). And it appears unnecessary to specify a default in
the first case -- then the 0-args form would only match the second pattern.
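
A short sketch of the two call forms as described above; the variable names are only for illustration, and the output comments assume Python is run without the -b/-bb flags.

```python
data = b'\xc3\xa1'  # UTF-8 encoding of U+00E1

# First form: one argument of any type, converted via __str__ (or __repr__).
print(str(123))   # 123
print(str(data))  # b'\xc3\xa1'  (the repr of the bytes object, nothing is decoded)

# Second form: passing encoding and/or errors switches to decoding, and the
# first argument must then support the buffer protocol.
decoded = [
    str(data, 'utf-8'),
    str(data, encoding='utf-8'),
    str(data, errors='strict'),       # encoding defaults to UTF-8
    str(bytearray(data), 'utf-8'),
    str(memoryview(data), 'utf-8'),
]
print(decoded)

# A non-bytes-like object is rejected once decoding is requested.
try:
    str(123, 'utf-8')
except TypeError as exc:
    print('TypeError:', exc)
```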


> 0- and 1-argument calls match both signatures. Also it implies that
> str(encoding='ascii') and str(errors='ignore') are valid, and this is
> true!


And the docs spell this out clearly enough that I don't see any reason to
change it. This is a function that is *so* common that *any* tweak we make
to it will break someone's code.


> And more, str(encoding='spam') and str(errors='ham') are valid
> too, because the values of encoding and errors are ignored. I cannot
> imagine a use case for this. It looks like an implementation artifact.
>

But again one that we can't change.

At least for errors='ham', this seems to be the case for all
encoding/decoding functions -- the error handler is looked up lazily, and
an empty input string doesn't need it. b''.decode(errors="ham") acts the
same way.

In fact, it's the same for b''.decode(encoding='spam'). So str() is not
special here, and I recommend keeping it that way.
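
A small probe of the lazy lookup described above. The helper `outcome` is a hypothetical name introduced here. On the CPython builds discussed in this thread the empty-input calls are reported to return an empty string; stricter builds or modes may validate the names eagerly, so the helper reports either result rather than assuming one.

```python
def outcome(func):
    """Return the repr of func()'s result, or a description of a LookupError."""
    try:
        return repr(func())
    except LookupError as exc:
        return f'LookupError: {exc}'


# Empty input: nothing needs decoding, so bogus names may never be looked up.
print(outcome(lambda: str(encoding='spam')))
print(outcome(lambda: str(errors='ham')))
print(outcome(lambda: b''.decode(encoding='spam')))
print(outcome(lambda: b''.decode(errors='ham')))

# Non-empty input: the codec is needed, so an unknown encoding fails.
print(outcome(lambda: b'abc'.decode(encoding='spam')))

# Invalid UTF-8: the error handler is needed, so an unknown handler fails.
print(outcome(lambda: b'\xff'.decode(errors='ham')))
```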


> The docstring is left not fixed.
>
>  str(object='') -> str
>  str(bytes_or_buffer[, encoding[, errors]]) -> str
>
> It uses different names for the first parameter (which would not matter if
> it were positional-only), it requires bytes_or_buffer for decoding, and
> it requires encoding if errors is passed.
>
> So my goal is to remove glitches which are not used in real code in
> any case, and to make the behavior closer to the initial intention.  If
> all three of my propositions are applied, the signatures would look like:
>
>  str(object='', /) -> str
>  str(bytes_or_buffer, /, encoding, errors='strict') -> str
>
> Almost the same as for bytes:
>
>  bytes(object=b'', /) -> bytes
>  bytes(string, /, encoding, errors='strict') -> bytes
>

bytes() and str() just aren't each other's opposite -- bytes() really only
takes str input, but str() takes any input. So there's always going to be a
discrepancy. I now think the current behavior should not change.
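
The asymmetry in how the two constructors treat a missing encoding can be seen in a few calls. This is only an illustration of current behavior; `attempt` is a hypothetical helper name.

```python
def attempt(func):
    """Return func()'s result, or a string describing the TypeError it raised."""
    try:
        return func()
    except TypeError as exc:
        return f'TypeError: {exc}'


text = '\xe1'  # the single character U+00E1

# bytes() refuses a str argument unless an encoding is given, and
# errors= alone is not enough.
print(attempt(lambda: bytes(text, 'utf-8')))          # b'\xc3\xa1'
print(attempt(lambda: bytes(text)))                   # a TypeError
print(attempt(lambda: bytes(text, errors='strict')))  # a TypeError

# str() accepts errors= alone and decodes as UTF-8.
print(attempt(lambda: str(b'\xc3\xa1', errors='strict')) == text)  # True
```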


> [1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
> [2] https://bugs.python.org/issue13538
>
>
> > 3. Make encoding required if errors is specified in str(). This will
> > reduce the number of possible combinations, make str() more similar
> > to bytes() and bytearray(), and simplify the mental model: if encoding
> > is specified, then we decode, and the first argument must be a
> > bytes-like object, otherwise we convert an object to a string using
> > __str__.
> >
> >
> >   I'm -0 on this. It seems that the presence of either errors= or
> > encoding= causes str() to switch to "decode bytes" semantics, and a
> > default decoding of UTF-8. That default makes sense: UTF-8 is our
> > default source encoding, and we are trending to use it as the default in
> > other places. I doubt that such calls would confuse anyone.
>
> This proposition is the one about which I am not sure. On the one hand,
> the bytes() constructor requires the encoding argument when encoding a
> string. On the other hand, it is optional in str.encode() and
> bytes.decode(). But str.encode() and bytes.decode() have only one
> function, so you can omit both encoding and errors without ambiguity.
>
> If we allow str(bytes_or_buffer, errors=errors), should we not also
> allow bytes(string, errors=errors)?
>

Not necessarily. There's an old saying in PEP 8 about foolish consistency...

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*


[Python-Dev] Re: Should set objects maintain insertion order too?

On 12/16/19 6:30 PM, Tim Peters wrote:

> If it's desired that "insertion order" be consistent across runs,
> platforms, and releases, then what "insertion order" _means_ needs to
> be rigorously defined & specified for all set operations.  This was
> comparatively trivial for dicts, because there are, e.g., no
> commutative binary operators defined on dicts.

My intuition is that figuring out sensible semantics here is maybe not 
trivial, but hardly impossible.  If this proposal goes anywhere I'd be 
willing to contribute to figuring it out.

> If you want to insist that `a | b` first list all the elements of a,
> and then all the elements of b that weren't already in a, regardless
> of cost, then you create another kind of unintuitive surprise:  in
> general the result of "a | b" will display differently than the result
> of "b | a" (although the results will compare equal), and despite that
> the _user_ didn't "insert" anything.

Call me weird--and I won't disagree--but I find nothing unintuitive 
about that.  After all, that's already the world we live in: there are 
any number of sets that compare as equal but display differently.  In 
current Python:

>>> a = {'a', 'b', 'c'}
>>> d = {'d', 'e', 'f'}
>>> a | d
   {'f', 'e', 'd', 'a', 'b', 'c'}
>>> d | a
   {'f', 'b', 'd', 'a', 'e', 'c'}
>>> a | d == d | a
   True

This is also true for dicts, in current Python, which of course do 
maintain insertion order.  Dicts don't have the | operator, so I 
approximate the operation by duplicating the dict (which AFAIK preserves 
insertion order) and using update.

>>> aa = {'a': 1, 'b': 1, 'c': 1}
>>> dd = {'d': 1, 'e': 1, 'f': 1}
>>> x = dict(aa)
>>> x.update(dd)
>>> y = dict(dd)
>>> y.update(aa)
>>> x
   {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1}
>>> y
   {'d': 1, 'e': 1, 'f': 1, 'a': 1, 'b': 1, 'c': 1}
>>> x == y
   True

Since dicts already behave in exactly that way, I don't think it would 
be too surprising if sets behaved that way too.  In fact, I think it's a 
little surprising that they behave differently, which I suppose was my 
whole thesis from the beginning.

Cheers,

//arry/

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/76LFOPFMT6EIQXAHBIQMI2EZRQAONTZ7/

[Python-Dev] Re: Should set objects maintain insertion order too?



On 12/16/19 7:43 PM, Tim Peters wrote:

> [Petr Viktorin ]
>> Here's one (very simplified and maybe biased) view of the history of dicts:
>>
>> * 2.x: Dicts are unordered, please don't rely on the order.
>> * 3.3: Dict iteration order is now randomized. We told you not to rely
>> on it!
>> * 3.6: We now use an optimized implementation of dict that saves memory!
>> As a side effect it makes dicts ordered, but that's an implementation
>> detail, please don't rely on it.
>> * 3.7: Oops, people are now relying on dicts being ordered. Insisting on
>> people not relying on it is battling windmills. Also, it's actually
>> useful sometimes, and alternate implementations can support it pretty
>> easily. Let's make it a language feature! (Later it turns out
>> MicroPython can't support it easily. Tough luck.)
>
> A very nice summary!  My only quibble is as above:  the "compact dict"
> implementation doesn't maintain insertion order naturally, _unless_
> there are no deletions (which is, e.g., true of dicts constructed to
> pass keyword arguments).  The code got hairier to maintain insertion
> order in the presence of mixing insertions and deletions.



Didn't some paths also get slightly slower as a result of 
maintaining insertion order when mixing insertions and deletions? My 
recollection is that that was part of the debate--not only "are we going 
to regret inflicting these semantics on posterity, and on other 
implementations?", but also "are these semantics worth the 
admittedly-small performance hit, in Python's most important and 
most-used data structure?".


Also, I don't recall anything about us resigning ourselves to explicitly 
maintain ordering on dicts because people were relying on it, "battling 
windmills", etc.  Dict ordering had never been guaranteed, a lack of 
guarantee Python had always taken particularly seriously.  Rather, we 
decided it was a desirable feature, and one worth pursuing even at the 
cost of a small loss of performance.  One prominent Python core 
developer** wanted this feature for years, and I recall them saying 
something like:


   Guido says, "When a programmer iterates over a dictionary and they
   see the keys shift around when the dictionary changes, they learn
   something!"  To that I say--"Yes!  They learn that Python is
   unreliable and random!"


//arry/

** I won't give their name here because I fear I'm misquoting everybody 
involved.  Apologies in advance if that's the case!


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WRL7DAFCXPMZNU5J5GRG6HOPTCQJYKDV/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Tim]
>> If it's desired that "insertion order" be consistent across runs,
>> platforms, and releases, then what "insertion order" _means_ needs to
>> be rigorously defined & specified for all set operations.  This was
>> comparatively trivial for dicts, because there are, e.g., no
>> commutative binary operators defined on dicts.

[Larry]
> My intuition is that figuring out sensible semantics here is maybe not
> trivial, but hardly impossible.  If this proposal goes anywhere I'd be
> willing to contribute to figuring it out.

No, it _is_ easy.  It's just tedious, adding piles of words to the
docs, and every constraint also constrains possible implementations.
You snipped my example of implementing union, which you should really
think about instead ;-)


>> If you want to insist that `a | b` first list all the elements of a,
>> and then all the elements of b that weren't already in a, regardless
>> of cost, then you create another kind of unintuitive surprise:  in
>> general the result of "a | b" will display differently than the result
>> of "b | a" (although the results will compare equal), and despite that
>> the _user_ didn't "insert" anything.

> Call me weird--and I won't disagree--but I find nothing unintuitive about
> that.  After all, that's already the world we live in: there are any number
> of sets that compare as equal but display differently.  In current Python:
>
> >>> a = {'a', 'b', 'c'}
> >>> d = {'d', 'e', 'f'}
> >>> a | d
> {'f', 'e', 'd', 'a', 'b', 'c'}
> >>> d | a
> {'f', 'b', 'd', 'a', 'e', 'c'}

Yup, it happens.  But under the sample union implementation I gave, it
would never happen when the sets had different cardinalities (the
different sizes are used to force a "standard" order then).  For
mathematical sets, | is commutative (it makes no difference to the
result if the arguments are swapped - but can make a _big_ difference
to implementation performance unless the implementation is free to
pick the better order).
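
The sample union implementation is not reproduced in this message, so the following is only a guess at its spirit, not the original code. It assumes ordered sets are modelled as dicts with `None` values, and `ordered_union` is a hypothetical name: the larger operand is always copied first, whichever side of the operator it appeared on.

```python
def ordered_union(a, b):
    """Union of two dict-backed ordered sets, copying the larger operand first."""
    big, small = (a, b) if len(a) >= len(b) else (b, a)
    result = dict(big)                    # cheap bulk copy of the larger operand
    result.update(dict.fromkeys(small))   # existing keys keep their position
    return list(result)


a = dict.fromkeys('abc')
d = dict.fromkeys('defg')
print(ordered_union(a, d))  # ['d', 'e', 'f', 'g', 'a', 'b', 'c']
print(ordered_union(d, a))  # ['d', 'e', 'f', 'g', 'a', 'b', 'c']

# With equal sizes there is no "larger" operand, so argument order shows through.
e = dict.fromkeys('xyz')
print(ordered_union(a, e))  # ['a', 'b', 'c', 'x', 'y', 'z']
print(ordered_union(e, a))  # ['x', 'y', 'z', 'a', 'b', 'c']
```

Only the smaller operand is iterated element by element, which is the performance freedom the paragraph above refers to.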

> ...
> This is also true for dicts, in current Python, which of course do maintain
> insertion order.  Dicts don't have the | operator, so I approximate the
> operation by duplicating the dict (which AFAIK preserves insertion order)

Ya, it does, but I don't believe that's documented (it should be).

> and using update.

Too different to be interesting here - update() isn't commutative.
For sets, union, intersection, and symmetric difference are
commutative.
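
A quick check of that distinction, using made-up values: the three set operations give equal results with the arguments swapped, while update() can give a different dict, not just a different display order, when keys overlap.

```python
a = {1, 2, 3}
b = {3, 4}

# Each of these compares equal regardless of argument order.
for op in (set.union, set.intersection, set.symmetric_difference):
    print(op.__name__, op(a, b) == op(b, a))  # True for all three

# update() is not commutative: the last writer wins for shared keys.
x = {'k': 'from x'}
y = {'k': 'from y'}
xy = dict(x)
xy.update(y)
yx = dict(y)
yx.update(x)
print(xy, yx, xy == yx)  # {'k': 'from y'} {'k': 'from x'} False
```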

> ...
> Since dicts already behave in exactly that way, I don't think it would be too
> surprising if sets behaved that way too.  In fact, I think it's a little
> surprising that they behave differently, which I suppose was my whole thesis
> from the beginning.

I appreciate that dicts and sets behave differently in visible ways
now.  It just doesn't bother me enough that I'm cool with slowing set
operations to "fix that".
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/L2VGSHROPMW4BV6ORYMFKQ4ZJ5AXTZLE/

[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Terry Reedy




> The docstring is left not fixed.
>
>      str(object='') -> str
>      str(bytes_or_buffer[, encoding[, errors]]) -> str


I noticed this too; the doc and docstring should be made to agree with 
each other and the code.


While exploring the actual behavior, I discovered that while the 
presence of encoding triggers decoding of bytes, the encoding is not 
needed, and hence not checked, when the bytes object is empty.  Hence an 
invalid encoding is OK in this edge case.


>>> b''.decode('0')
''
>>> str(b'','0')
''
>>> str(b'')
"b''"

Should this be at least tested if not documented?  (So that other 
implementations know to check the bytes value before the encoding value?)



--
Terry Jan Reedy

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SGM6JGK3DEAEV2QMU6T6BLB656YRTEAQ/

[Python-Dev] Re: Should set objects maintain insertion order too?

[Larry]
> Didn't some paths also get slightly slower as a result of maintaining
> insertion order when mixing insertions and deletions?

I paid no attention at the time.  But in going from "compact dict" to
"ordered dict", deletion all by itself got marginally cheaper.  The
downside was the need to rearrange the whole dict when too many
"holes" built up.  "Compact (but unordered) dict" doesn't need that.

> My recollection is that that was part of the debate--not only "are we going to
> regret inflicting these semantics on posterity, and on other implementations?",
> but also "are these semantics worth the admittedly-small performance hit, in
> Python's most important and most-used data structure?".

There's a layer of indirection in compact dicts - lookups are
distributed across two arrays.  In non-compact unordered dicts,
everything is in a single array.  Cache effects may or may not make a
measurable difference then, depending on all sorts of things.

> Also, I don't recall anything about us resigning ourselves to explicitly
> maintain ordering on dicts because people were relying on it, "battling
> windmills", etc.  Dict ordering had never been guaranteed, a lack
> of guarantee Python had always taken particularly seriously.  Rather, we
> decided it was a desirable feature, and one worth pursuing even at the
> cost of a small loss of performance.

I'm not convinced there was a loss of performance.  The delay between
the implementation supplying ordered dicts, and the language
guaranteeing it, was, I suspect, more a matter of getting extensive
real-world experience about whether the possible new need to massively
rearrange dict internals to remove "holes" would bite someone too
savagely to live with.  But, again, I wasn't paying attention at the
time.
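
A toy model of the "holes" trade-off, under loud assumptions: this is not CPython's code, `OrderedDictModel` is a hypothetical name, a plain dict stands in for the hash-index table, and the compaction threshold (more than half the slots are holes) is invented for illustration.

```python
class OrderedDictModel:
    """Toy ordered dict: deletion leaves a hole; too many holes trigger a rebuild."""

    def __init__(self):
        self.entries = []  # [key, value] pairs in insertion order, or None for a hole
        self.index = {}    # stand-in for the sparse hash table of slot numbers
        self.holes = 0

    def __setitem__(self, key, value):
        slot = self.index.get(key)
        if slot is None:
            self.index[key] = len(self.entries)
            self.entries.append([key, value])
        else:
            self.entries[slot][1] = value

    def __delitem__(self, key):
        slot = self.index.pop(key)  # raises KeyError for a missing key
        self.entries[slot] = None   # cheap: nothing else moves
        self.holes += 1
        if self.holes > len(self.entries) // 2:  # invented threshold
            self._compact()

    def _compact(self):
        # The expensive part: rewrite every surviving entry and its slot number.
        self.entries = [e for e in self.entries if e is not None]
        self.index = {e[0]: i for i, e in enumerate(self.entries)}
        self.holes = 0

    def keys(self):
        return [e[0] for e in self.entries if e is not None]


d = OrderedDictModel()
for key in "abcdef":
    d[key] = True
for key in "ace":
    del d[key]
print(d.keys(), len(d.entries))  # ['b', 'd', 'f'] 6
del d["b"]                       # fourth hole out of six slots: compaction runs
print(d.keys(), len(d.entries))  # ['d', 'f'] 2
```

Each deletion is cheap on its own, but a workload that keeps mixing adds and deletes pays for the rebuild repeatedly.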

> One prominent Python core developer** wanted this feature for years, and I
> recall them saying something like:
>
> Guido says, "When a programmer iterates over a dictionary and they see the
> keys shift around when the dictionary changes, they learn something!"  To
> that I say--"Yes!  They learn that Python is unreliable and random!"

I never wanted ordered dicts, but never objected to them either.  All
in all, given that _I_ haven't seen a performance degradation, I like
that they're more predictable, and love the memory savings.

But as Raymond said (& I elaborated on), there's reason to believe
that the implementation of ordered dicts is less suited to sets, where
high rates of mixing adds and deletes is more common (thus triggering
high rates of massive internal dict rearranging).  Which is why I said
much earlier that I'd be +1 on ordered sets only when there's an
implementation that's as fast as what we have now.

Backing up:

> Python is the language where speed, correctness, and readability trump
> performance every time.

Speed trumping performance didn't make any sense ;-)

So assuming you didn't intend to type "speed", I think you must have,
say, Scheme or Haskell in mind there.  "Practicality beats purity" is
never seen on forums for those languages.  Implementation speed & pain
have played huge roles in many Python decisions.  As someone who has
wrestled with removing the GIL, you should know that better than
anyone ;-)
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QK3KERIPYD3Q3XNKZJBQBQ6NUUKT63WN/

[Python-Dev] Re: Should set objects maintain insertion order too?