Re: [Python-Dev] PEP 460 reboot

Terry Reedy Mon, 13 Jan 2014 18:27:41 -0800

On 1/13/2014 4:32 PM, Guido van Rossum wrote:

> I will doggedly keep posting to this thread rather than creating more threads.

Please permit me to doggedly keep pointing you toward the possible solution I posted on the tracker last October.

> But formatb() feels absurd to me. PEP 460 has neither a precise
> specification or any actual examples, so I can't tell whether the

Two days ago, I reposted byteformat() here on pydev with a precise text specification added to the code, and with an expanded test example. I have just added another example based on your question below.

> intention is that the format string can *only* contain {...} sequences
> or whether it can also contain "regular" characters. Translating to
> formatb(), my question comes down to the legality of the following
> example:
>
>    b'Hello, {}'.formatb(name)  # Where name is some bytes object
>
> If this is allowed, it reintroduces the ASCII bias (since the
> substring 'Hello' is clearly ASCII).

Since byteformat() uses re to find {<format-spec>} replacement fields, it only has such ASCII bias as re has, which I believe is not much, if any. As far as re and byteformat are concerned, everything outside of the {...} fields is uninterpreted bytes. As far as bytes.join is concerned, both joiner and joined are uninterpreted bytes.


>>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
b'\x00\x01\x02abcdef'

re.split produces [b'\x00', b'', b'\x02', b'', b'def']; the empty items at odd positions are the captured (blank) format specs. The only ASCII bias is the one already present in the representation of bytes, and the fact that Python code must have an ASCII-compatible encoding.
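A minimal sketch of the splitting step, assuming a hypothetical field pattern FIELD_RE (the regex in the posted byteformat() may differ). Because the pattern has one capturing group, re.split returns literal template chunks at even indexes and the captured format specs at odd indexes; bytes outside the braces, ASCII or not, are passed through without interpretation.

```python
import re

# Hypothetical field pattern: "{", an optional ":", a captured spec, "}".
# The capturing group makes re.split keep the specs in its output.
FIELD_RE = re.compile(rb'\{:?([^}]*)\}')

template = b'\x00{}\x02{}def'
parts = FIELD_RE.split(template)
print(parts)  # [b'\x00', b'', b'\x02', b'', b'def']

literals = parts[0::2]  # uninterpreted template bytes
specs = parts[1::2]     # captured format specs (blank here)

# Non-ASCII bytes outside the fields are never interpreted.
print(FIELD_RE.split(b'\xff{:5d}\xfe'))  # [b'\xff', b'5d', b'\xfe']
```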


The advantage of
byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
over directly writing
b''.join([b'\x00', b'\x01', b'\x02', b'abc', b'def'])
is that one does not have to manually split the presumably constant template into chunks and interleave them with the presumably variable chunks.

Here is the example that I used for testing, including non-blank format specs.

>>> bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
>>> objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
>>> byteformat(bformat, objects)
b'bytes: abc; bytearray: def; unicode: ghi; int:   123; float:   12.30; end'

The additional advantage here is the automatic encoding of formatted strings to bytes. As posted, byteformat() uses the str.encode defaults (encoding='utf-8', errors='strict'). But as I said in the post, these could become parameters to the function that are passed on to str.encode.

The design reuses re.split, bytes.join, format, and the format specification. By reusing the format-spec as is, the only new thing to learn is that blank specs correspond to bytes instead of strings. This is easier to design, implement, and learn than a format-spec limited to disallow some things (after much bike-shedding over what to eliminate ;-).
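For readers who have not seen the tracker post, here is a minimal sketch of how re.split, format, str.encode and bytes.join could fit together. It is not the posted code: the FIELD_RE pattern, decoding specs as ASCII, the bytes-like check for blank fields, and the encoding/errors keyword parameters are all assumptions. Blank specs insert bytes-like arguments unchanged; any other spec is applied with format() and the resulting str is encoded.

```python
import re

# Hypothetical field pattern: "{", optional ":", captured spec, "}".
# Only auto-numbered fields are handled in this sketch.
FIELD_RE = re.compile(rb'\{:?([^}]*)\}')

def byteformat(template, args, encoding='utf-8', errors='strict'):
    """Sketch: fill {} / {:spec} fields in a bytes template from args."""
    parts = FIELD_RE.split(template)   # literals at even, specs at odd indexes
    if len(parts) // 2 != len(args):
        raise ValueError(f'{len(parts) // 2} fields but {len(args)} arguments')
    chunks = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            chunks.append(part)        # literal template bytes, untouched
            continue
        arg = args[i // 2]
        spec = part.decode('ascii')    # assumed: specs are ASCII text
        if not spec:
            # Blank spec: the argument must already be bytes-like.
            if not isinstance(arg, (bytes, bytearray, memoryview)):
                raise TypeError(f'blank field needs bytes, got {type(arg).__name__}')
            chunks.append(bytes(arg))
        else:
            # Non-blank spec: reuse the standard format-spec, then encode.
            chunks.append(format(arg, spec).encode(encoding, errors))
    return b''.join(chunks)
```

With this sketch, byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc')) returns b'\x00\x01\x02abcdef'. Raising TypeError for a non-bytes argument in a blank field avoids bytes(5) silently producing five zero bytes.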


I would appreciate your comment on this proposal.

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev