> -----Original Message-----
> From: Victor Stinner [mailto:victor.stin...@gmail.com]
> Sent: 9 January 2014 13:51
> To: Kristján Valur Jónsson
> Cc: Antoine Pitrou; python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
> 
> 2014/1/9 Kristján Valur Jónsson <krist...@ccpgames.com>:
> > This definition is funny, because according to Wikipedia, it is a
> > "superset" of 8859-1 (latin1)
> 
> Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned in
> (IANA's) ISO-8859-1.
> 
> Python implements the latter, ISO-8859-1.
> 
> Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from
> the IANA's ISO-8859-1".
> 

Thanks.  That's entirely non-confusing :)
"ISO-8859-1 is the IANA preferred name for this standard when supplemented
with the C0 and C1 control codes from ISO/IEC 6429."
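
For anyone who wants to check this from the interpreter, here is a minimal
sketch using only the standard library (the variable names are just for
illustration). It decodes all 256 byte values with Python's "latin-1" codec
and shows that 0x80..0x9F come back as C1 control characters, not errors:

```python
import unicodedata

# Python's "latin-1" codec (also spelled "latin1" or "iso-8859-1") decodes
# every one of the 256 byte values, mapping byte N to code point U+00NN.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# The 0x80..0x9F slots come out as C1 control characters (category "Cc"),
# i.e. the IANA ISO-8859-1 behaviour, not unassigned gaps.
c1 = text[0x80:0xA0]
print({unicodedata.category(ch) for ch in c1})  # {'Cc'}

# Because the mapping is one-to-one, encoding gives the original bytes back.
assert text.encode("latin-1") == all_bytes
```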

So anyway, yes, Python's "latin1" encoding does cover the entire 256-value
byte range.  But on Windows we use cp1252 instead, which does not: it leaves
a few byte values undefined and instead assigns useful, common Windows
characters to many of the control-character slots.
Hence the need for "surrogateescape" to be able to round-trip every byte.
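
To make the cp1252 difference concrete, a rough sketch (standard library only;
assumes Python 3.1 or later, where the "surrogateescape" handler from PEP 383
is available). It first finds which byte values cp1252 cannot decode, then
shows that "surrogateescape" carries them through a decode/encode round trip
unchanged:

```python
# cp1252 leaves a handful of byte values with no assigned character;
# decoding any of them strictly raises UnicodeDecodeError.
undefined = []
for b in range(256):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)
print([hex(b) for b in undefined])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']

# With errors="surrogateescape", each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising, and encoding with the same handler
# turns it back into the original byte.
raw = bytes(range(256))
text = raw.decode("cp1252", errors="surrogateescape")
assert text[0x81] == "\udc81"
assert text.encode("cp1252", errors="surrogateescape") == raw
```

The same errors="surrogateescape" argument can be passed to open(), so a
mostly-ASCII file with stray high bytes can be read, processed as text, and
written back byte-for-byte.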

Again, this is non-obvious: coming from my experience with cp1252, I had
no way of guessing that the "subset", i.e. latin1, would indeed cover the
whole range.  So I have learned two things since my initial foray into
parsing ASCII files with Python 3: surrogateescape, and that "latin1 in
Python == IANA's ISO-8859-1, which does indeed define the whole 8-bit range".

K
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev