[issue36789] Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes

2019-05-17 Thread Cheryl Sabella
Change by Cheryl Sabella: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

2019-05-17 Thread miss-islington
Change by miss-islington: pull_requests: +13294

2019-05-08 Thread mbiggs
mbiggs added the comment: Ah, I sent a pull request but didn't realize that redshiftzero already had. Their PR looks good to me. Thanks for fixing this!

2019-05-08 Thread mbiggs
Change by mbiggs: pull_requests: +13102

2019-05-06 Thread redshiftzero
Change by redshiftzero: keywords: +patch; pull_requests: +13026; stage: needs patch -> patch review

2019-05-05 Thread Ezio Melotti
Change by Ezio Melotti: nosy: +ezio.melotti; type: -> enhancement

2019-05-05 Thread Josh Rosenberg
Josh Rosenberg added the comment: Minor bikeshed: If updating the documentation, refer to U+0000 as "the null character" or "NUL", not "NULL". Using "NULL" allows for confusion with NULL pointers; "the null character" (the name used in the Unicode standard) or "NUL" (the official three lette

2019-05-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I agree that the documentation should be updated. Would you mind creating a pull request, mbiggs? There are UTF-8 variants which guarantee that the encoded text has no zero bytes (see Modified UTF-8), but Python only provides the standard UTF-8 and UTF-8 wit
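
To illustrate the Modified UTF-8 remark above, here is a minimal sketch of just the NUL handling: U+0000 is written as the two-byte sequence C0 80 instead of a single zero byte. The helper name `encode_nul_free` is hypothetical and not part of Python. The sketch does not implement the rest of Modified UTF-8 (which also treats supplementary characters differently), so take it as an illustration rather than a codec.

```python
def encode_nul_free(text: str) -> bytes:
    """Hypothetical helper: standard UTF-8, except U+0000 becomes C0 80.

    In standard UTF-8 a zero byte can only come from U+0000, so a plain
    byte replacement is enough for this one aspect of Modified UTF-8.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


encoded = encode_nul_free("a\x00b")
print(encoded)  # no zero byte in the output

# Python's standard UTF-8 decoder treats C0 80 as invalid (overlong) input.
try:
    encoded.decode("utf-8")
    strict_decoder_accepts = True
except UnicodeDecodeError:
    strict_decoder_accepts = False
print(strict_decoder_accepts)
```

The decode failure at the end is why such output would need its own decoder; the built-in "utf-8" codec will not round-trip it.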

2019-05-04 Thread mbiggs
mbiggs added the comment: So a correct statement would be "A UTF-8 string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the NULL character (U+0000)." I think it's important to correct this because the part about processing UTF-8 with C functi
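
The concern about C string functions can be sketched from Python with `ctypes`: a buffer read as a NUL-terminated C string stops at the first zero byte, even though the UTF-8 data continues past it. The sample string is made up for illustration.

```python
import ctypes

# UTF-8 encoding of a string that contains the null character (U+0000).
encoded = "abc\x00def".encode("utf-8")

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(encoded)

c_view = buf.value   # read as a C string: stops at the first zero byte
full_view = buf.raw  # every byte in the buffer, including the terminator

print(c_view)
print(full_view)
```

Anything that treats the data as a C string sees only the part before the embedded NUL, which is the failure mode the corrected wording is meant to flag.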

2019-05-03 Thread Andrew Svetlov
Andrew Svetlov added the comment: This is right for 99.99% of cases: UTF-8 doesn't use zero bytes to encode any character except an explicit zero. UTF-16, for example, encodes 'a' as b'\xff\xfea\x00'. -- nosy: +asvetlov
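
A short sketch of the comparison in this comment, counting zero bytes in each encoding. The helper name `zero_byte_count` is made up for the example. It uses "utf-16-le" rather than "utf-16" so the output has a fixed byte order and no BOM; the b'\xff\xfea\x00' quoted above is what the "utf-16" codec produces on a little-endian machine.

```python
def zero_byte_count(data: bytes) -> int:
    """Count the zero bytes in an encoded byte string (illustrative helper)."""
    return data.count(0)


for text in ["a", "a\x00b"]:
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # fixed byte order, no BOM
    print(repr(text), utf8, zero_byte_count(utf8), utf16, zero_byte_count(utf16))
```

UTF-8 yields a zero byte only for the explicit U+0000, while UTF-16 pads every ASCII character with one.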

2019-05-03 Thread mbiggs
New submission from mbiggs: In the Unicode HOWTO (http://docs.python.org/3.3/howto/unicode.html), it says the following: "UTF-8 has several convenient properties: (...) 2. A Unicode string is turned into a sequence of bytes containing no embedded zero bytes. This avoids byte-ordering issues, a