Re: [Python-Dev] PEP 460 reboot

Mark Shannon Mon, 13 Jan 2014 01:51:09 -0800


On 13/01/14 09:19, Glenn Linderman wrote:

On 1/13/2014 12:46 AM, Mark Shannon wrote:

On 13/01/14 03:47, Guido van Rossum wrote:

On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman <et...@stoneleaf.us> wrote:

On 01/12/2014 06:16 PM, Ethan Furman wrote:



If you do:

--> b'%s' % 'some text'



Ignore what I previously said.  With no encoding the result would be:

b"'some text'"

So an encoding should definitely be specified.


Yes, but the encoding is no business of %s or %. As far as the
formatting operation cares, if the argument is bytes they will be
copied literally, and if the argument is a str (or anything else) it
will call ascii() on it.
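
A minimal sketch of the rule described above, written as a plain Python function rather than as the real formatting code. The name format_percent_s is hypothetical and exists only for illustration; it models a single argument and nothing else:

```python
def format_percent_s(arg):
    """Model the handling of one b'%s' argument as described above."""
    if isinstance(arg, bytes):
        # bytes are copied literally
        return arg
    # a str (or anything else) goes through ascii(); the result is
    # guaranteed to be pure ASCII, so encoding it cannot fail
    return ascii(arg).encode("ascii")
```

Under this model, format_percent_s('some text') gives b"'some text'", quotes included, which is the result shown earlier in the thread.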


It seems to me that what people want from '%s' is:
convert to a str and then encode as ASCII for non-bytes,
or copy directly for bytes.


Maybe. But it only takes a small tweak to the parameter to get what they 
want... a tweak that works in both Python 2.7 and Python 
3.whatever-version-gets-this.

Instead of

b"%s" % foo

they must use

b"%s"  % foo.encode( explicitEncoding )

which is what they should have been doing in Python 2.7 all along, and if they 
were, they need make no change.

Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default 
conversion rules? Already in ASCII? No harm.
Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"), if you 
prefer.
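
A short sketch of that tweak, assuming an interpreter where bytes support the % operator (Python 2.7, or a Python 3 release that adds it). The variable names are placeholders, and "ascii" stands in for whatever encoding the caller actually intends:

```python
foo = "some text"
explicit_encoding = "ascii"  # placeholder: use the encoding you actually mean

# the caller encodes explicitly, so %s only ever sees bytes
encoded = b"%s" % foo.encode(explicit_encoding)

# for a literal, a b prefix does the same job as .encode("ascii")
literal = b"%s" % b"some text"
```

Both expressions produce b'some text', with no quotes or escapes added.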

So why not replace '%s' with '%a' for the ASCII case and
with '%b' for directly inserting bytes?


Because %a and %b don't exist in Python 2.7?


I thought this was about 3.5, not 2.7 ;)
'%s' can't work in 3.5, as we must differentiate between
strings which need to be encoded and bytes which don't.
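
For illustration only, and assuming an interpreter in which bytes formatting accepts both codes (CPython 3.5 and later do), the split would look roughly like this. The sample values are made up:

```python
text = "caf\u00e9"   # a str containing one non-ASCII character
raw = b"\x00\xff"    # arbitrary bytes, not valid ASCII text

# %a: take the ascii() representation of the str, escapes and quotes included
as_ascii = b"%a" % text

# %b: insert the bytes unchanged
as_bytes = b"%b" % raw
```

Here as_ascii holds the quoted, escaped form of the string and as_bytes is identical to raw, so the format code alone says which behaviour was requested.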

That way, the encoding is explicit.


The encoding is already explicit.  If it is bytes encoded from str, that
transformation had an explicit encoding.  If it is "%s" % str(...), then there
is no encoding, but rather a transformation into an ASCII representation of the
Unicode code points, using escape sequences.  Which isn't likely to be what they
want, but see the parameter tweak above.

I think it is vital that the encoding is explicit in all cases where
bytes <-> str conversion occurs.


Since it is explicit, you have no concerns in this area.


Regarding the concern about implicit use of ASCII by certain bytes methods and
proposed interpolations, I'm curious how many standard encodings exist that do
not have an ASCII subset. I can enumerate a starting list, but if there are
others in actual use, I'm unaware of them.

EBCDIC
UTF-16 BE & LE
UTF-32 BE & LE

Wikipedia: The vast majority of code pages in current use are supersets of ASCII 
<http://en.wikipedia.org/wiki/ASCII>, a 7-bit code representing 128 control 
codes and printable characters.
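
One rough way to test a codec against the list above is to encode all 128 ASCII characters and see whether the bytes come out unchanged. This is only a sketch: is_ascii_superset is a hypothetical helper, the codec names are the ones Python's standard library uses (cp037 is one EBCDIC code page), and only the encoding direction is checked:

```python
ASCII_TEXT = "".join(chr(i) for i in range(128))
ASCII_BYTES = bytes(range(128))


def is_ascii_superset(codec_name):
    """Return True if the codec encodes every ASCII character as itself."""
    try:
        return ASCII_TEXT.encode(codec_name) == ASCII_BYTES
    except UnicodeError:
        # the codec cannot represent some ASCII character at all
        return False


candidates = ["utf-8", "latin-1", "cp1252", "cp037",
              "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
results = {name: is_ascii_superset(name) for name in candidates}
```

With these candidates, utf-8, latin-1 and cp1252 pass, while cp037 and the UTF-16 and UTF-32 variants do not, which agrees with the starting list.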


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org
