03.11.21 15:14, Stephen J. Turnbull wrote:
> So the only
> time that wouldn't be true is if escape sequences are allowed to
> represent characters. I believe unicode_escape is the only codec
> that does.
Also raw_unicode_escape and utf_7. And maybe punycode or idna, I am not
sure.
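Here is a minimal sketch of what "escape sequences representing characters" means in practice. Each of the three codecs named above turns input that contains no 0x27 byte into an apostrophe. The sample byte strings are illustrative choices and do not come from the thread. Punycode and idna are left out because their behaviour here was not settled.

```python
# Byte strings that decode to a single apostrophe (U+0027) even though
# none of them contains the byte 0x27 itself.
SAMPLES = {
    "unicode_escape": b"\\u0027",      # backslash-u escape sequence
    "raw_unicode_escape": b"\\u0027",  # the "raw" variant still honours \uXXXX
    "utf_7": b"+ACc-",                 # U+0027 inside a base64 shift sequence
}

for codec, data in SAMPLES.items():
    text = data.decode(codec)
    has_0x27 = 0x27 in data
    print(f"{codec:20} {data!r:14} -> {text!r} (byte 0x27 present: {has_0x27})")
```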
Chris Angelico writes:
> Ah, okay, so much for that, then. What about the weaker sense:
> Characters below 128 are always and only represented by those byte
> values? So if you find byte value 39, it might not actually be an
> apostrophe, but if you're looking for an apostrophe, you know for s
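The "weaker sense" can be checked one codec at a time. The sketch below uses a hypothetical helper, `is_ascii_compatible`, which tests whether every code point below 128 encodes to, and decodes from, the identical single byte. The Shift_JIS lines illustrate the other half of the question. A byte in the ASCII range can still occur as the trailing byte of a multibyte character (the well-known 0x5C problem), so finding the byte does not prove the character is there.

```python
def is_ascii_compatible(codec):
    """True if every code point 0..127 round-trips through `codec` as the same single byte."""
    for i in range(128):
        try:
            if chr(i).encode(codec) != bytes([i]) or bytes([i]).decode(codec) != chr(i):
                return False
        except UnicodeError:
            return False
    return True


for codec in ("ascii", "latin-1", "cp1252", "utf-8", "utf-16", "utf-7"):
    print(codec, is_ascii_compatible(codec))

# In Shift_JIS the character 表 is encoded as 0x95 0x5C; 0x5C is the ASCII backslash.
encoded = "表".encode("shift_jis")
print(encoded, 0x5C in encoded)
```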
On 03. 11. 21 12:33, Serhiy Storchaka wrote:
03.11.21 12:36, Petr Viktorin wrote:
On 03. 11. 21 2:58, Kyle Stanley wrote:
I'd suggest both: briefer, easier to read write up for average user in
docs, more details/semantics in informational PEP. Thanks for working
on this, Petr!
Well, this is
On 03. 11. 21 12:37, Chris Angelico wrote:
On Wed, Nov 3, 2021 at 10:22 PM Steven D'Aprano wrote:
On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote:
TBH, I'm not entirely sure how valid it is to talk about *security*
considerations when we're dealing with Python source code and
On Wed, Nov 3, 2021 at 10:22 PM Steven D'Aprano wrote:
>
> On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote:
>
> > TBH, I'm not entirely sure how valid it is to talk about *security*
> > considerations when we're dealing with Python source code and variable
> > confusions, but that's
03.11.21 12:36, Petr Viktorin wrote:
> On 03. 11. 21 2:58, Kyle Stanley wrote:
>> I'd suggest both: briefer, easier to read write up for average user in
>> docs, more details/semantics in informational PEP. Thanks for working
>> on this, Petr!
>
> Well, this is the brief write-up :)
> Maybe it woul
On Wed, Nov 03, 2021 at 11:11:00AM +0100, Marc-Andre Lemburg wrote:
> Coming back to the thread topic, many of the Unicode security
> considerations don't apply to non-Unicode encodings, since those
> usually don't support e.g. changing the bidi direction within a
> stream of text or other interes
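The following sketch assumes Python's built-in code pages and illustrates the claim. The explicit bidi override U+202E cannot be represented in ASCII, Latin-1 or CP1252, so a file saved in one of those encodings cannot carry it. The word "usually" matters, though. The Hebrew code page cp1255 does include the LEFT-TO-RIGHT and RIGHT-TO-LEFT MARK characters. `can_encode` is a hypothetical helper.

```python
def can_encode(text, codec):
    """Return True if `text` is representable in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

for codec in ("ascii", "latin-1", "cp1252", "cp1255", "utf-8"):
    print(codec, can_encode(RLO, codec), can_encode(RLM, codec))
```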
On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote:
> TBH, I'm not entirely sure how valid it is to talk about *security*
> considerations when we're dealing with Python source code and variable
> confusions, but that's a term that is well understood.
It's not like Unicode is the only
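On variable confusions specifically, CPython normalizes identifiers to NFKC while parsing (PEP 3131). That folds some lookalikes together but not others. The sketch below runs `exec` on throwaway namespaces. The helper `defined_names` is hypothetical.

```python
def defined_names(source):
    """Execute `source` in a fresh namespace and return the names it defined."""
    namespace = {}
    exec(source, namespace)
    namespace.pop("__builtins__", None)
    return sorted(namespace)


# FULLWIDTH LATIN SMALL LETTER X is NFKC-normalized to a plain "x"...
print(defined_names("\uff58 = 1"))            # ['x']
# ...but CYRILLIC SMALL LETTER A stays distinct from LATIN SMALL LETTER A.
print(defined_names("\u0430 = 1") == ["a"])   # False
```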
On 03. 11. 21 2:58, Kyle Stanley wrote:
I'd suggest both: briefer, easier to read write up for average user in
docs, more details/semantics in informational PEP. Thanks for working on
this, Petr!
Well, this is the brief write-up :)
Maybe it would work better if the info was integrated into th
On Wed, 3 Nov 2021 at 10:11, Marc-Andre Lemburg wrote:
> I don't think limiting the source code encoding is the right approach
> to making code more secure. Instead, tooling has to be used to detect
> potentially malicious code points in code.
+1
Discussing "making code more secure" without bein
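Here is a minimal sketch of such tooling. It assumes the goal is only to flag the explicit bidirectional embedding, override and isolate controls, which are the characters used in the "Trojan source" examples. It relies on `unicodedata.bidirectional`. The helper name `find_bidi_controls` is hypothetical, and this is not a complete defence; homoglyphs, for instance, are not covered.

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls
# (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control in `source`.

    Lines are split with str.splitlines(), which also breaks on a few
    separators (e.g. U+2028) that Python's tokenizer does not treat as newlines.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line):
            if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
                hits.append((lineno, col, unicodedata.name(ch, "UNKNOWN")))
    return hits


sample = "x = 1\ns = 'abc\u202edef'\n"
print(find_bidi_controls(sample))  # [(2, 8, 'RIGHT-TO-LEFT OVERRIDE')]
```

The LEFT-TO-RIGHT and RIGHT-TO-LEFT MARK characters have bidi classes L and R, so they pass through this filter. A stricter tool might list them explicitly.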
On 03.11.2021 01:21, Chris Angelico wrote:
> On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano wrote:
>>
>> On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote:
>>> On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote:
Let me know if it's clear in the newest version, with this note:
>
On Wed, Nov 3, 2021 at 8:01 PM Stephen J. Turnbull
wrote:
>
> Chris Angelico writes:
>
> > But I was surprised to find that Python would let you use
> > unicode_escape for source code.
>
> I'm not surprised. Today it's probably not necessary, but I've
> exchanged a lot of code (not Python, thou
03.11.21 11:01, Stephen J. Turnbull wrote:
> And of
> course UTF-16 is incompatible in that sense, although I don't know if
> anybody actually saves Python code in UTF-16.
CPython does not currently support UTF-16 for source files.
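This short illustration shows why UTF-16 breaks the byte-level assumptions discussed in this thread. Every ASCII character gains a zero byte. Depending on byte order, byte 0x27 can also belong to an unrelated character such as U+2700.

```python
apostrophe_le = "'".encode("utf-16-le")
scissors_be = "\u2700".encode("utf-16-be")
scissors_le = "\u2700".encode("utf-16-le")

print(apostrophe_le)                 # b"'\x00"
print(scissors_be)                   # b"'\x00": same bytes, different character
print(scissors_le)                   # b"\x00'"
print(apostrophe_le == scissors_be)  # True
```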
___
Python-Dev mailin
Chris Angelico writes:
> But I was surprised to find that Python would let you use
> unicode_escape for source code.
I'm not surprised. Today it's probably not necessary, but I've
exchanged a lot of code (not Python, though) with folks whose editors
were limited to 8 bit codes or even just ASC
On Wed, Nov 3, 2021 at 5:12 PM Stephen J. Turnbull
wrote:
>
> Chris Angelico writes:
>
> > Huh. Is that level of generality actually still needed? Can Python
> > deprecate all but a small handful of encodings?
>
> I think that's pointless. With few exceptions (GB18030, Big5 has a
> couple of co
Chris Angelico writes:
> Huh. Is that level of generality actually still needed? Can Python
> deprecate all but a small handful of encodings?
I think that's pointless. With few exceptions (GB18030, Big5 has a
couple of code point pairs that encode the same very rare characters,
ISO 2022 extens
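Regardless of which encodings stay supported, a checker can catch one symptom alluded to here, where different byte sequences stand for the same text. The check is to decode, re-encode, and compare. This sketch assumes the codec's own encoder defines the "canonical" form. `is_canonical` is a hypothetical name. UTF-7 is used because its alternative spellings are easy to show. Python's strict UTF-8 decoder already rejects overlong forms, so for UTF-8 the check should pass on any valid input.

```python
def is_canonical(data, codec):
    """True if re-encoding the decoded text with `codec` gives back exactly `data`.

    Raises UnicodeDecodeError if `data` is not valid in `codec` at all.
    """
    return data.decode(codec).encode(codec) == data


print(is_canonical(b"'a'", "utf-7"))      # True
print(is_canonical(b"+ACc-a'", "utf-7"))  # False: same text, first quote spelled in base64
```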
Serhiy Storchaka writes:
> This is excellent!
>
> 01.11.21 14:17, Petr Viktorin wrote:
> >> CPython treats the control character NUL (``\0``) as end of input,
> >> but many editors simply skip it, possibly showing code that Python
> >> will not
> >> run as a regular part of a file.
>
> It
Chris Angelico wrote:
> I'm not sure how a linter would stop
> someone from publishing code on PyPI that causes confusion by its
> character encoding, for instance.
If it becomes important, the cheeseshop backend can run various validations
(including a linter) on submissions, and include those r
I'd suggest both: briefer, easier to read write up for average user in
docs, more details/semantics in informational PEP. Thanks for working on
this, Petr!
On Tue, Nov 2, 2021 at 2:07 PM David Mertz, Ph.D.
wrote:
> This is an amazing document, Petr. Really great work!
>
> I think I agree with Ma
On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano wrote:
>
> On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote:
> > On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote:
> > > Let me know if it's clear in the newest version, with this note:
> > >
> > > > Here, ``encoding: unicode_escape`
On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote:
> On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote:
> > Let me know if it's clear in the newest version, with this note:
> >
> > > Here, ``encoding: unicode_escape`` in the initial comment is an encoding
> > > declaration. The ``uni
On 11/2/2021 1:02 PM, Marc-Andre Lemburg wrote:
On 01.11.2021 13:17, Petr Viktorin wrote:
PEP:
Title: Unicode Security Considerations for Python
Author: Petr Viktorin
Status: Active
Type: Informational
Content-Type: text/x-rst
Created: 01-Nov-2021
Post-History:
Thanks for writing this up
On Wed, Nov 3, 2021 at 5:07 AM David Mertz, Ph.D. wrote:
>
> This is an amazing document, Petr. Really great work!
>
> I think I agree with Marc-André that putting it in the actual Python
> documentation would give it more visibility than in a PEP.
>
There are quite a few other PEPs that have si
This is an amazing document, Petr. Really great work!
I think I agree with Marc-André that putting it in the actual Python
documentation would give it more visibility than in a PEP.
On Tue, Nov 2, 2021, 1:06 PM Marc-Andre Lemburg wrote:
> On 01.11.2021 13:17, Petr Viktorin wrote:
> >> PEP:
On 01.11.2021 13:17, Petr Viktorin wrote:
>> PEP:
>> Title: Unicode Security Considerations for Python
>> Author: Petr Viktorin
>> Status: Active
>> Type: Informational
>> Content-Type: text/x-rst
>> Created: 01-Nov-2021
>> Post-History:
Thanks for writing this up. I'm not sure whether a PEP
On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin wrote:
> Let me know if it's clear in the newest version, with this note:
>
> > Here, ``encoding: unicode_escape`` in the initial comment is an encoding
> > declaration. The ``unicode_escape`` encoding instructs Python to treat
> > ``\u0027`` as a singl
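The effect described in the note can be reproduced without writing a file. Decode the same bytes once as UTF-8 and once with `unicode_escape`, then compare what `ast.parse` sees. This sketch calls the codec directly rather than relying on the tokenizer's handling of the encoding declaration. The sample line is made up for illustration, and `assigned_names` is a hypothetical helper.

```python
import ast


def assigned_names(text):
    """Names bound by top-level simple assignments in `text`."""
    tree = ast.parse(text)
    return [
        target.id
        for node in tree.body
        if isinstance(node, ast.Assign)
        for target in node.targets
        if isinstance(target, ast.Name)
    ]


raw = rb"greeting = 'hi\u0027; evil = 1 #'"

as_utf8 = raw.decode("utf-8")             # \u0027 stays an escape inside one string literal
as_escape = raw.decode("unicode_escape")  # \u0027 becomes a real quote before parsing

print(assigned_names(as_utf8))    # ['greeting']
print(assigned_names(as_escape))  # ['greeting', 'evil']
```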
On 01. 11. 21 13:17, Petr Viktorin wrote:
Hello,
Today, an attack called "Trojan source" was revealed, where a malicious
contributor can use Unicode features (left-to-right text and homoglyphs)
to write code that, when shown in an editor, will look different from how a
computer language parser will
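The following is a rough sketch of one homoglyph check a linter could run: flag identifiers whose letters come from more than one script. It uses the first word of each character's Unicode name as a stand-in for its script. That is only a heuristic; the Unicode Script property and the UTS #39 confusables data are the proper tools. It will also not catch an identifier written entirely in lookalike Cyrillic. `mixed_script_names` is a hypothetical helper.

```python
import io
import tokenize
import unicodedata


def scripts_in(name):
    """Approximate the scripts used by `name` via the first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in name if ch.isalpha()}


def mixed_script_names(source):
    """Return (line, identifier) for NAME tokens that mix letters from several scripts."""
    flagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and len(scripts_in(tok.string)) > 1:
            flagged.append((tok.start[0], tok.string))
    return flagged


source = "p\u0430ssword = 1\npassword = 2\n"  # first name contains CYRILLIC SMALL LETTER A
print(mixed_script_names(source))
```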
On Mon, Nov 01, 2021 at 11:41:06AM -0700, Toshio Kuratomi wrote:
> Unicode specifies the mapping of glyphs to code points. Then a second
> mapping from code points to sequences of bytes is what is actually
> recorded by the computer. The second mapping is what programmers
> using Python will com
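Both mappings can be observed separately in Python. What a reader sees as é may be one code point or two. Each code point sequence then becomes different bytes depending on the codec. This is a short sketch; the choice of é is arbitrary.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Code points: text that typically renders identically, as different sequences.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Bytes: the same code points map to different bytes per encoding.
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(composed.encode("latin-1"))  # b'\xe9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
```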
On 11/1/2021 8:17 AM, Petr Viktorin wrote:
Nevertheless, I did do a bit of research about similar gotchas in
Python, and I'd like to publish a summary as an informational PEP,
pasted below.
Very helpful.
Bidirectional Text
------------------
Some scripts, such as Hebrew or Arabic, are writ
"The East Asian symbol for *ten* looks like a plus sign, so ``十= 10`` is a
complete Python statement."
Normally, an identifier must begin with a letter, and numbers can only be used
in the second and subsequent positions. (XID_CONTINUE instead of XID_START)
The fact that some characters with
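This point can be checked with `unicodedata`. 十 (U+5341) is classified as a letter (category Lo), so it may start an identifier, even though Unicode also assigns it the numeric value 10. A sketch:

```python
import unicodedata

ten = "\u5341"  # CJK ideograph for "ten"

print(unicodedata.category(ten))  # Lo (Letter, other)
print(unicodedata.numeric(ten))   # 10.0
print(ten.isidentifier())         # True: letters may start identifiers
print(ten.isdecimal())            # False: int() will not accept it
print("1x".isidentifier())        # False: an ASCII digit cannot start one

namespace = {}
exec(f"{ten}= 10", namespace)     # the statement quoted from the PEP text
print(namespace[ten])             # 10
```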
This is an excellent enumeration of some of the concerns!
One minor comment about the introductory material:
On Mon, Nov 1, 2021 at 5:21 AM Petr Viktorin wrote:
> >
> > Introduction
> >
> >
> > Python code is written in `Unicode`_ – a system for encoding and
> > handling all kinds
This is excellent!
01.11.21 14:17, Petr Viktorin wrote:
>> CPython treats the control character NUL (``\0``) as end of input,
>> but many editors simply skip it, possibly showing code that Python
>> will not
>> run as a regular part of a file.
It is an implementation detail and we will get rid of
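Whatever CPython ends up doing with NUL, a pre-commit or upload check can simply refuse source files that contain it. This minimal sketch assumes the file is in an ASCII-compatible encoding such as UTF-8; in UTF-16, every ASCII character carries a zero byte. It reports C0 control bytes other than tab, line feed, form feed and carriage return, plus DEL. `find_control_bytes` is a hypothetical helper.

```python
# C0 controls that have no business in ordinary source text, plus DEL (0x7F).
# Tab, line feed, form feed and carriage return are allowed.
SUSPICIOUS_BYTES = {b for b in range(0x20) if b not in (0x09, 0x0A, 0x0C, 0x0D)} | {0x7F}


def find_control_bytes(data):
    """Return (offset, byte value) for each suspicious control byte in `data`."""
    return [(offset, value) for offset, value in enumerate(data) if value in SUSPICIOUS_BYTES]


print(find_control_bytes(b"x = 1\0\nprint(x)\n"))  # [(5, 0)]
```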
Thanks for writing this Petr!
A few comments below.
On Mon, Nov 01, 2021 at 01:17:02PM +0100, Petr Viktorin wrote:
> >ASCII-only Considerations
> >-------------------------
> >
> >ASCII is a subset of Unicode
> >
> >While issues with the ASCII character set are generally well understood,
> >the'