[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Serhiy Storchaka
03.11.21 15:14, Stephen J. Turnbull wrote:
> So the only time that wouldn't be true is if escape sequences are allowed to
> represent characters. I believe unicode_escape is the only codec that does.
Also raw_unicode_escape and utf_7. And maybe punycode or idna, I am not sure.
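
The three codecs named with certainty here can be checked directly. A minimal sketch, assuming the sample byte strings below (they are illustrative and do not come from the thread): each decodes to an apostrophe even though byte value 39 (0x27) never appears in the input. punycode and idna are left out because the message itself is unsure about them.

```python
# Illustrative inputs (not from the thread): none contains the byte 0x27.
samples = {
    "unicode_escape": rb"\x27",        # backslash, x, 2, 7
    "raw_unicode_escape": rb"\u0027",  # backslash, u, 0, 0, 2, 7
    "utf_7": b"+ACc-",                 # modified base64 of UTF-16 0x0027
}

decoded = {}
for codec, raw in samples.items():
    assert 0x27 not in raw  # no literal apostrophe byte in the input
    decoded[codec] = raw.decode(codec)

for codec, text in decoded.items():
    print(f"{codec:20} {samples[codec]!r:12} -> {text!r}")
```

A byte-level search for an apostrophe would therefore miss all three inputs, which is the case the quoted text sets aside.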

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Stephen J. Turnbull
Chris Angelico writes: > Ah, okay, so much for that, then. What about the weaker sense: > Characters below 128 are always and only represented by those byte > values? So if you find byte value 39, it might not actually be an > apostrophe, but if you're looking for an apostrophe, you know for s

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Petr Viktorin
On 03. 11. 21 12:33, Serhiy Storchaka wrote: 03.11.21 12:36, Petr Viktorin wrote: On 03. 11. 21 2:58, Kyle Stanley wrote: I'd suggest both: briefer, easier to read write up for average user in docs, more details/semantics in informational PEP. Thanks for working on this, Petr! Well, this is

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Petr Viktorin
On 03. 11. 21 12:37, Chris Angelico wrote: On Wed, Nov 3, 2021 at 10:22 PM Steven D'Aprano wrote: On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote: TBH, I'm not entirely sure how valid it is to talk about *security* considerations when we're dealing with Python source code and

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Chris Angelico
On Wed, Nov 3, 2021 at 10:22 PM Steven D'Aprano wrote: > > On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote: > > > TBH, I'm not entirely sure how valid it is to talk about *security* > > considerations when we're dealing with Python source code and variable > > confusions, but that's

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Serhiy Storchaka
03.11.21 12:36, Petr Viktorin wrote: > On 03. 11. 21 2:58, Kyle Stanley wrote: >> I'd suggest both: briefer, easier to read write up for average user in >> docs, more details/semantics in informational PEP. Thanks for working >> on this, Petr! > > Well, this is the brief write-up :) > Maybe it woul

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Steven D'Aprano
On Wed, Nov 03, 2021 at 11:11:00AM +0100, Marc-Andre Lemburg wrote: > Coming back to the thread topic, many of the Unicode security > considerations don't apply to non-Unicode encodings, since those > usually don't support e.g. changing the bidi direction within a > stream of text or other interes

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Steven D'Aprano
On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote: > TBH, I'm not entirely sure how valid it is to talk about *security* > considerations when we're dealing with Python source code and variable > confusions, but that's a term that is well understood. It's not like Unicode is the only

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Petr Viktorin
On 03. 11. 21 2:58, Kyle Stanley wrote: I'd suggest both: briefer, easier to read write up for average user in docs, more details/semantics in informational PEP. Thanks for working on this, Petr! Well, this is the brief write-up :) Maybe it would work better if the info was integrated into th

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Paul Moore
On Wed, 3 Nov 2021 at 10:11, Marc-Andre Lemburg wrote: > I don't think limiting the source code encoding is the right approach > to making code more secure. Instead, tooling has to be used to detect > potentially malicious code points in code. +1 Discussing "making code more secure" without bein
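
As a rough illustration of the kind of tooling the quoted text refers to, here is a minimal sketch that reports bidirectional control characters and NUL characters in a piece of source text. The function name `find_suspicious` and the sample input are hypothetical, and the check is far from a complete linter: it does not look at homoglyphs or encoding declarations.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def find_suspicious(source: str):
    """Yield (line, column, description) for bidi controls and NUL characters.

    Hypothetical helper for illustration; lines and columns are 1-based.
    """
    # Split on "\n" only, so line numbers match what most editors show.
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\0":
                yield lineno, col, "NUL"
            elif unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
                name = unicodedata.name(ch, "unnamed")
                yield lineno, col, f"U+{ord(ch):04X} {name}"


# Illustrative input: a right-to-left override on line 2, a NUL on line 3.
sample = "a = 1\nb = '\u202e'\nc\0"
hits = list(find_suspicious(sample))
for hit in hits:
    print(hit)
```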

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Marc-Andre Lemburg
On 03.11.2021 01:21, Chris Angelico wrote: > On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano wrote: >> >> On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote: >>> On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: Let me know if it's clear in the newest version, with this note: >

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Chris Angelico
On Wed, Nov 3, 2021 at 8:01 PM Stephen J. Turnbull wrote: > > Chris Angelico writes: > > > But I was surprised to find that Python would let you use > > unicode_escape for source code. > > I'm not surprised. Today it's probably not necessary, but I've > exchanged a lot of code (not Python, thou

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Serhiy Storchaka
03.11.21 11:01, Stephen J. Turnbull wrote: > And of > course UTF-16 is incompatible in that sense, although I don't know if > anybody actually saves Python code in UTF-16. CPython does not currently support UTF-16 for source files.

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-03 Thread Stephen J. Turnbull
Chris Angelico writes: > But I was surprised to find that Python would let you use > unicode_escape for source code. I'm not surprised. Today it's probably not necessary, but I've exchanged a lot of code (not Python, though) with folks whose editors were limited to 8 bit codes or even just ASC

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 5:12 PM Stephen J. Turnbull wrote: > > Chris Angelico writes: > > > Huh. Is that level of generality actually still needed? Can Python > > deprecate all but a small handful of encodings? > > I think that's pointless. With few exceptions (GB18030, Big5 has a > couple of co

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Stephen J. Turnbull
Chris Angelico writes: > Huh. Is that level of generality actually still needed? Can Python > deprecate all but a small handful of encodings? I think that's pointless. With few exceptions (GB18030, Big5 has a couple of code point pairs that encode the same very rare characters, ISO 2022 extens

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Stephen J. Turnbull
Serhiy Storchaka writes: > This is excellent! > > 01.11.21 14:17, Petr Viktorin wrote: > >> CPython treats the control character NUL (``\0``) as end of input, > >> but many editors simply skip it, possibly showing code that Python > >> will not > >> run as a regular part of a file. > > It

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Jim J. Jewett
Chris Angelico wrote: > I'm not sure how a linter would stop > someone from publishing code on PyPI that causes confusion by its > character encoding, for instance. If it becomes important, the cheeseshop backend can run various validations (including a linter) on submissions, and include those r

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Kyle Stanley
I'd suggest both: briefer, easier to read write up for average user in docs, more details/semantics in informational PEP. Thanks for working on this, Petr! On Tue, Nov 2, 2021 at 2:07 PM David Mertz, Ph.D. wrote: > This is an amazing document, Petr. Really great work! > > I think I agree with Ma

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano wrote: > > On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote: > > On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > > > Let me know if it's clear in the newest version, with this note: > > > > > > > Here, ``encoding: unicode_escape`

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Steven D'Aprano
On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote: > On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > > Let me know if it's clear in the newest version, with this note: > > > > > Here, ``encoding: unicode_escape`` in the initial comment is an encoding > > > declaration. The ``uni

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Terry Reedy
On 11/2/2021 1:02 PM, Marc-Andre Lemburg wrote: On 01.11.2021 13:17, Petr Viktorin wrote: PEP: Title: Unicode Security Considerations for Python Author: Petr Viktorin Status: Active Type: Informational Content-Type: text/x-rst Created: 01-Nov-2021 Post-History: Thanks for writing this up

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 5:07 AM David Mertz, Ph.D. wrote: > > This is an amazing document, Petr. Really great work! > > I think I agree with Marc-André that putting it in the actual Python > documentation would give it more visibility than in a PEP. > There are quite a few other PEPs that have si

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread David Mertz, Ph.D.
This is an amazing document, Petr. Really great work! I think I agree with Marc-André that putting it in the actual Python documentation would give it more visibility than in a PEP. On Tue, Nov 2, 2021, 1:06 PM Marc-Andre Lemburg wrote: > On 01.11.2021 13:17, Petr Viktorin wrote: > >> PEP:

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Marc-Andre Lemburg
On 01.11.2021 13:17, Petr Viktorin wrote: >> PEP: >> Title: Unicode Security Considerations for Python >> Author: Petr Viktorin >> Status: Active >> Type: Informational >> Content-Type: text/x-rst >> Created: 01-Nov-2021 >> Post-History: Thanks for writing this up. I'm not sure whether a PEP

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Chris Angelico
On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote: > Let me know if it's clear in the newest version, with this note: > > > Here, ``encoding: unicode_escape`` in the initial comment is an encoding > > declaration. The ``unicode_escape`` encoding instructs Python to treat > > ``\u0027`` as a singl
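
The effect described in the quoted note can be reproduced by decoding source bytes by hand. This is a sketch under stated assumptions: the one-line source below is made up for illustration, and it calls `bytes.decode` directly instead of relying on an encoding declaration, so it shows only what the codec does to the text.

```python
# Hypothetical source bytes. The two \u0027 sequences are six ordinary ASCII
# characters each (backslash, u, 0, 0, 2, 7), not apostrophe bytes.
raw_source = rb"s = 'a\u0027 + \u0027b'"

# After decoding, each \u0027 has become a real single quote, so what looked
# like one string literal is now a concatenation of two.
decoded_source = raw_source.decode("unicode_escape")
print(decoded_source)

namespace = {}
exec(decoded_source, namespace)
print(namespace["s"])
```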

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-02 Thread Petr Viktorin
On 01. 11. 21 13:17, Petr Viktorin wrote: Hello, Today, an attack called "Trojan source" was revealed, where a malicious contributor can use Unicode features (left-to-right text and homoglyphs) to code that, when shown in an editor, will look different from how a computer language parser will

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Steven D'Aprano
On Mon, Nov 01, 2021 at 11:41:06AM -0700, Toshio Kuratomi wrote: > Unicode specifies the mapping of glyphs to code points. Then a second > mapping from code points to sequences of bytes is what is actually > recorded by the computer. The second mapping is what programmers > using Python will com
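
The second mapping in the quoted text, from code points to bytes, is easy to see in Python. A small sketch, with the character and the three encodings chosen here purely as examples: one code point, three different byte sequences.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, chosen as an example

# First mapping: the character corresponds to a single code point.
code_point = ord(ch)

# Second mapping: the same code point becomes different bytes per encoding.
encodings = {name: ch.encode(name) for name in ("utf-8", "utf-16-le", "latin-1")}

print(f"U+{code_point:04X}")
for name, data in encodings.items():
    print(f"{name:10} {data!r}")
```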

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Terry Reedy
On 11/1/2021 8:17 AM, Petr Viktorin wrote: Nevertheless, I did do a bit of research about similar gotchas in Python, and I'd like to publish a summary as an informational PEP, pasted below. Very helpful. Bidirectional Text -- Some scripts, such as Hebrew or Arabic, are writ

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Jim J. Jewett
"The East Asian symbol for *ten* looks like a plus sign, so ``十= 10`` is a complete Python statement." Normally, an identifier must begin with a letter, and numbers can only be used in the second and subsequent positions. (XID_CONTINUE instead of XID_START) The fact that some characters with

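The quoted claim about the symbol for ten can be checked with the standard library. A minimal sketch (the variable names are ours): the character carries a numeric value of 10, yet its general category is a letter category, so it is accepted as the first character of an identifier.

```python
import unicodedata

ten = "\u5341"  # the CJK ideograph for ten quoted above

print(unicodedata.category(ten))  # general category of the character
print(unicodedata.numeric(ten))   # its numeric value
print(ten.isidentifier())         # whether it is a valid identifier by itself

# Run the quoted statement: it binds the name to the integer 10.
namespace = {}
exec(f"{ten}= 10", namespace)
print(namespace[ten])
```
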
[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Toshio Kuratomi
This is an excellent enumeration of some of the concerns! One minor comment about the introductory material: On Mon, Nov 1, 2021 at 5:21 AM Petr Viktorin wrote: > > > > Introduction > > > > > > Python code is written in `Unicode`_ – a system for encoding and > > handling all kinds

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Serhiy Storchaka
This is excellent! 01.11.21 14:17, Petr Viktorin wrote: >> CPython treats the control character NUL (``\0``) as end of input, >> but many editors simply skip it, possibly showing code that Python >> will not >> run as a regular part of a file. It is an implementation detail and we will get rid of

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Steven D'Aprano
Thanks for writing this Petr! A few comments below. On Mon, Nov 01, 2021 at 01:17:02PM +0100, Petr Viktorin wrote: > >ASCII-only Considerations > >- > > > >ASCII is a subset of Unicode > > > >While issues with the ASCII character set are generally well understood, > >the'