[Python-Dev] Re: PEP 467 feedback from the Steering Council

Gregory P. Smith Sun, 22 Aug 2021 14:47:44 -0700

On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <python...@gmail.com>
wrote:

> On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettin...@gmail.com> wrote:
>
>> The history of bytes/bytearray is a dual-purpose view.  It can be used in
>> a string-like way to emulate Python 2 string handling (hence all the usual
>> string methods and a repr that displays in a string-like fashion).  It can
>> also be used as an array of numbers, 0 to 255 (hence the list methods and
>> having an iterator of ints).  ISTM that the authors of this PEP reject or
>> want to discourage the latter use cases.
>>
>
> I didn't read it that way, but if so, please no, I'd rather see the former
> use cases discouraged. ISTM that the Py2 string handling is still needed
> for working with mixed binary / text data -- but that should be a pretty
> specialized use case. spelling the way to create a byte, byte() sure makes
> more sense in any other context.
>
>
>> ... anything where a C programmer would use an array of unsigned chars).
>>
>
> or any programmer would use an array of unsigned 8-bit integers :-) numpy
> spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`.
> My point is that for anyone not an "old time" C programmer, or even a
> Python2 programmer, the "character is an unsigned 8 bit int" concept is
> alien and confusing, not a helpful mnemonic.
>
>
>> For example, creating a single byte with bytes([0x1f]) isn't pleasant,
>> obvious, or fast.
>>
>
> no, though bytes([31]) isn't horrible ;-)   (despite coding for over four
> decades, I'm still not comfortable with hex notation)
>
> I say it's not horrible, because bytes is a Sequence of bytes (or integer
> values between 0 and 255), initializing it with an iterable seems pretty
> reasonable, that's how we initialize most (all?) other sequences after all.
> And compatible with array.array and numpy arrays.
>

I consider the bytes([31]) notation to be horrible API design because a
simple, easy-to-make typo (omitting the [], or using () and forgetting the
tupleizing comma) turns it into a different valid call with an entirely
different meaning: bytes([31]) vs bytes((31)) vs bytes(31).
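
To make the hazard concrete, here is a small sketch of how those spellings
behave on current Python 3.  The variable names are mine, purely for
illustration.

```python
# Spellings that differ only by a bracket or a comma.
one_byte = bytes([31])      # iterable of ints -> a single byte with value 31
no_comma = bytes((31))      # (31) is just the int 31, so this is bytes(31)
zero_fill = bytes(31)       # int argument -> 31 zero bytes
with_comma = bytes((31,))   # a real one-element tuple -> a single byte again

print(one_byte)                 # b'\x1f'
print(len(no_comma))            # 31
print(no_comma == zero_fill)    # True
print(with_comma == one_byte)   # True
```

All four calls are valid, so nothing flags the typo until the wrong data
shows up somewhere downstream.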

It's also ugly to anyone who thinks about what bytecode is generated and
executed in order to do it.  An entire new list object with a single
element referring to a tiny int is created and destroyed just to create a
b'\037' object?  An optimizer pass to fix that up at the bytecode level
isn't easy, as it can only be done when it can prove that `bytes` has not
been reassigned to something other than the builtin.  That is near
impossible in a lot of code.  bytes.fromint(31) isn't much better in the
bytecode regard, but at least a temporary list is not being created.
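
A rough sketch of why the optimizer can't just fold this away, assuming
CPython and the stdlib `dis` module.  `make_byte` and the `ns` namespace are
made-up names for the demo, and the exact opcodes differ between CPython
versions, so the sketch only inspects the name lookup.

```python
import dis

def make_byte():
    return bytes([31])

# Opcode names CPython emits for the function; the list varies by version.
opnames = [ins.opname for ins in dis.get_instructions(make_byte)]

# `bytes` is looked up by name when the call runs; it is not baked in.
global_loads = [
    ins.argval
    for ins in dis.get_instructions(make_byte)
    if ins.opname == "LOAD_GLOBAL"
]
print(opnames)
print(global_loads)   # includes 'bytes'

# The same source text means something else when `bytes` is rebound in the
# function's globals, which is what blocks a compile-time rewrite.
ns = {"bytes": lambda arg: "not the builtin"}
exec("def f():\n    return bytes([31])", ns)
shadowed = ns["f"]()
print(shadowed)       # not the builtin
```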

As much as I think that bytes(size: int) was a bad idea to have as an API
(bytearray(size: int) is fine and useful, as it is mutable), that ship has
sailed and getting rid of it would break some odd code.  It doesn't have
much use, so adding fromsize(size: int) methods doesn't sound very
compelling, as it just adds yet another way to do the same thing.  We
should just live with that specific wart.
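
A quick illustration of the asymmetry, using only existing builtins (the
variable names are arbitrary): a zero-filled bytes object can't be filled
in afterwards, while a zero-filled bytearray is a usable preallocated
buffer.

```python
frozen = bytes(4)            # four zero bytes, immutable
try:
    frozen[0] = 0x1F         # item assignment is not supported on bytes
    frozen_is_mutable = True
except TypeError:
    frozen_is_mutable = False

buf = bytearray(4)           # four zero bytes, mutable
buf[0] = 0x1F                # fill in place
buf[1:3] = b"\xff\xfe"

print(frozen_is_mutable)     # False
print(buf)                   # bytearray(b'\x1f\xff\xfe\x00')
```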

`bchr` as a builtin... I'm with the others on saying no to any new builtin
that isn't expected to see frequent use.  bchr won't see frequent use.

`bytes.fromint` seems fine.  Others are proposing `bytes.byte` for that.  I
don't *like* to argue over names (the last stage of anything) but I do need
to point out how that sounds when read aloud.  It falls victim to API
stuttering.  "bytes dot byte" or "bytes byte" doesn't convey much to a
reader in English, as the difference is a subtle "s".  "bytes dot from int"
or "bytes from int" is quite clear.  (Avoiding stuttering in API design was
popularized by golang; it's a good thing to strive for in any language.)
It's times like this that I wish Python had chosen consistent camelCase,
CapWords, or snake_case in all API names, as conjoinedwords aren't great.
But they are sadly consistent with our past sins.
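
For anyone who wants to see what the call would look like, here is a
stand-in written as a plain function.  It is only my sketch of the behaviour
under discussion: `bytes.fromint` does not exist in current Python, and the
range check and error message here are assumptions.

```python
def fromint(value: int) -> bytes:
    """Hypothetical stand-in for the proposed bytes.fromint()."""
    if not 0 <= value <= 255:
        raise ValueError("byte must be in range(0, 256)")
    # int.to_bytes is an existing way to get there without a temporary list.
    return value.to_bytes(1, "big")

print(fromint(31))                   # b'\x1f'
print(fromint(31) == bytes([31]))    # True
```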

One thing never mentioned in the PEP: if you expect the primary use of
fromint (aka the bchr builtin that isn't going to happen) to be calls on
constant values, why are we adding name lookups and function calls to
this?  Why not address the elephant in the room and allow decimal values
to be written as an escape sequence within bytes literals?

For example, b'\d31' to say "decimal byte 31".  Proposal: Only values 0-255
with no leading zero should be accepted when parsing such an escape.  (Do
not bother adding the same feature for codepoints in unicode strs; leave
that to later if someone shows actual demand.)  This can't address the
bytearray need, but that's been true of bytearray for ages: a common way
to create one is via a copy from a transient bytes object.
bytearray(b'\d31') isn't much different than bytearray.fromint(31), with
one less name lookup.
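
A sketch of the acceptance rule, written as a plain function so it can be
run today.  The name `decode_decimal_escape` and the error messages are
hypothetical, and how many digits a real tokenizer would consume after \d
is a question this does not try to answer.

```python
import re

# "0", or up to three digits with no leading zero (range is checked below).
_DECIMAL_BYTE = re.compile(r"0|[1-9][0-9]{0,2}")

def decode_decimal_escape(digits: str) -> bytes:
    """Return the byte for the digits after a hypothetical \\d escape."""
    if not _DECIMAL_BYTE.fullmatch(digits):
        raise ValueError(f"invalid decimal byte escape: {digits!r}")
    value = int(digits)
    if value > 255:
        raise ValueError(f"decimal byte out of range: {digits!r}")
    return bytes([value])

# The existing hex and octal escapes already spell the same byte.
print(decode_decimal_escape("31") == b"\x1f" == b"\037")   # True
print(bytearray(decode_decimal_escape("31")))              # bytearray(b'\x1f')
```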

Why not add a \d escape?  Introducing a new escape is fraught with peril,
as existing \d's within b'' literals in code could change meaning: a
backwards compatibility fail.  But it is one that is easy to check for
with a DeprecationWarning for a few releases...  The new literal parsing
could be enabled per-file with a __future__ import.
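
To show what would change meaning, this sketch evaluates the literal on
current Python 3 and records whatever warning the compiler raises.  The
behaviour shown is what I see on the 3.x releases I know of; the warning
category has varied between releases, so the sketch only prints it.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Today \d is not a recognized escape, so the backslash is kept as-is.
    value = eval(r"b'\d31'")

print(value)         # b'\\d31'
print(list(value))   # [92, 100, 51, 49]
print([w.category.__name__ for w in caught])
```

So existing code containing b'\d31' gets four bytes today and would get one
byte under the new parsing, which is the breakage a warning period or a
__future__ import would have to cover.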

-gps

> -CHB
>
>
> --
> Christopher Barker, PhD (Chris)
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/RM4JHK4GIKYYWV7J5F6IQJ66KUIXWMMF/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DGJWM3VMNMDBUTGYG72H5WLKDWBYFSUV/
Code of Conduct: http://python.org/psf/codeofconduct/
