[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2014-03-12 Thread Julian Mehnle
Changes by Julian Mehnle : -- nosy: +jmehnle ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Ezio Melotti
Changes by Ezio Melotti: stage: -> committed/rejected

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Ezio Melotti wrote:
> I think this can be closed as wontfix.
Agreed. I've already closed the ticket.

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Marc-Andre Lemburg
Changes by Marc-Andre Lemburg: resolution: -> wont fix; status: pending -> closed

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Ezio Melotti
Ezio Melotti added the comment: This is already fixed in Python 3. However, I think that for backward compatibility reasons it can't be fixed in Python 2, where it is possible to encode and decode every code point to/from UTF-8. See also http://bugs.python.org/issue8271#msg102209 I think this

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Mike Lewis
Mike Lewis added the comment: Sorry, meant to add this part to the quote from the RFC: "This leads to different results for character numbers above 0xFFFF; the CESU-8 encoding of those characters is NOT valid UTF-8."
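
The RFC 3629 remark about CESU-8 can be made concrete with a small Python 3 sketch. Everything named here is an assumption for illustration: `cesu8_encode` is a hypothetical helper (not a standard-library codec), and U+1F600 is just an arbitrary character above 0xFFFF. The helper encodes each UTF-16 code unit as its own UTF-8-style sequence, which is how CESU-8 is defined, and the result is then compared with real UTF-8.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 by writing each
    UTF-16 code unit as a separate UTF-8-style byte sequence."""
    utf16 = text.encode("utf-16-be", "surrogatepass")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # 'surrogatepass' lets Python 3 emit the 3-byte form of a surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the strict Python 3 UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "\U0001F600"           # arbitrary example character above 0xFFFF
utf8 = sample.encode("utf-8")   # 4-byte UTF-8 form
cesu8 = cesu8_encode(sample)    # 6 bytes: two encoded surrogates
```

For characters at or below 0xFFFF that are not surrogates, the two encodings coincide, which is why the difference only shows up for supplementary characters.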

[issue9133] Invalid UTF8 Byte sequence not raising exception/being substituted

2010-06-30 Thread Mike Lewis
New submission from Mike Lewis: When I do codecs.encode(codecs.decode('\xed\xbc\xad', 'utf8'), 'utf8') it's not throwing an exception. '\xed\xbc\xad' is an invalid UTF8 byte sequence. It maps to the value U+DF2D, which is a "surrogate pair" it seems. http://tools.ietf.org/html/rfc3629#section-
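
As a minimal sketch of the report above, assuming Python 3 (where, per the discussion in this thread, the behaviour is already fixed), the following decodes the three bytes by plain bit arithmetic to confirm they map to U+DF2D, then checks how the strict decoder and the `surrogatepass` error handler treat them. The helper names `decode_three_byte` and `strict_decode_fails` are hypothetical and exist only for this illustration.

```python
def decode_three_byte(seq: bytes) -> int:
    """Decode a 3-byte UTF-8-style sequence by bit arithmetic.
    Deliberately performs no validity checks."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def strict_decode_fails(data: bytes) -> bool:
    """Return True if Python 3's strict UTF-8 decoder rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


raw = b"\xed\xbc\xad"
code_point = decode_three_byte(raw)             # value the bytes map to
is_surrogate = 0xD800 <= code_point <= 0xDFFF   # surrogate code point range

# Opting in to the Python 2 style leniency requires an explicit error handler.
lenient = raw.decode("utf-8", "surrogatepass")
```

Note that U+DF2D on its own is a single surrogate code point rather than a pair; a pair would consist of one code point from U+D800..U+DBFF followed by one from U+DC00..U+DFFF.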