date:20211115

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Stephen J. Turnbull

Christopher Barker writes:

> Would a proposal to switch the normalization to NFC only have any hope of
> being accepted?

Hope, yes. Counting you, it's been proposed twice. :-) I don't know
whether it would get through. We know this won't affect the stdlib,
since that's restricted to ASCII. I suppose we could trawl PyPI and
GitHub for "compatibles" (the Unicode term for "K" normalizations).

> For example, in writing math we often use different scripts to mean
> different things (e.g. TeX's Blackboard Bold). So if I were to use
> some of the Unicode Mathematical Alphanumeric Symbols, I wouldn't
> want them to get normalized.

Independent of the question of the normalization of Python
identifiers, I think using those characters this way is a bad idea.
In fact, I think adding these symbols to Unicode was a bad idea; they
should be handled at a higher level in the linguistic stack (by
semantic markup).

You're confusing two things here. In Unicode, a script is a
collection of characters used for a specific language, typically a set
of Unicode blocks of characters (more or less; there are a lot of Han
ideographs that are recognizable as such to Japanese but are not part
of the repertoire of the Japanese script). That is, these characters
are *different* from others that look like them.

Blackboard Bold is more what we would usually call a "font": the
(math) italic "x" and the (math) bold italic "x" are the same "x", but
one denotes a scalar and the other a vector in many math books. A
roman "R" probably denotes the statistical application, an italic "R"
the reaction function in a game theory model, and a Blackboard Bold "R"
the set of real numbers. But these are all the same character.

It's a bad idea to rely on different (Unicode) scripts that use the
same glyphs for different characters to look different from each
other, unless you "own" the fonts to be used. As far as I know
there's no way for a Python program to specify the font to be used to
display itself though. :-)

It's also a UX problem. At a slightly higher layer in the stack, I'm
used to using Japanese input methods to input sigma and pi which
produce characters in the Greek block, and at least the upper case
forms that denote sum and product have separate characters in the math
operators block. I understand why people who literally write
mathematics in Greek might want those not normalized, but I sure am
going to keep using "Greek sigma", not "math sigma"! The probability
that I'm going to have a Greek uppercase sigma in my papers is nil,
the probability of a summation symbol near unity. But the summation
symbol is not easily available, I have to scroll through all the
preceding Unicode blocks to find Mathematical Operators. So I am
perfectly happy with uppercase Greek sigma for that role (as is
XeTeX!!)

And the thing is, of course those Greek letters really are Greek
letters: they were chosen because pi is the homophone of p which is
the first letter of "product", and sigma is the homophone of s which
is the first letter of "sum". Å for Ångström is similar, it's the
initial letter of a Swedish name.
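
As a side note for readers, the distinction drawn above between the Greek
letters and the look-alike symbols can be inspected with Python's
unicodedata module. The following is only a sketch: the `pairs` list and
the `folds_to` helper are illustrative names that do not appear anywhere in
this thread, and the answers depend on the Unicode data shipped with the
interpreter.

```python
import unicodedata

# Each pair: (symbol-block character, the letter it resembles).
pairs = [
    ("\u2211", "\u03a3"),  # N-ARY SUMMATION vs GREEK CAPITAL LETTER SIGMA
    ("\u220f", "\u03a0"),  # N-ARY PRODUCT vs GREEK CAPITAL LETTER PI
    ("\u212b", "\u00c5"),  # ANGSTROM SIGN vs LATIN CAPITAL LETTER A WITH RING ABOVE
]


def folds_to(symbol: str, letter: str, form: str) -> bool:
    """Return True if normalizing `symbol` with `form` yields `letter`."""
    return unicodedata.normalize(form, symbol) == letter


for symbol, letter in pairs:
    results = {form: folds_to(symbol, letter, form) for form in ("NFC", "NFKC")}
    print(unicodedata.name(symbol), results)
```

With current Unicode data the summation and product signs should be left
alone by both forms, while the Ångström sign is folded to the letter even
under NFC.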

Sure, we could fix the input methods (and search methods!! -- people
are going to input the character they know that corresponds to the
glyph *they* see, not the bit pattern the *CPU* sees). But that's as
bad as trying to fix mail clients. Not worth the effort because I'm
pretty sure you're gonna fail -- it's one of those "you'll have to pry
this crappy software that annoys admins around the world from my cold
dead fingers" issues, which is why their devs refuse to fix them.

Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5GHPVNJLLOKBYPE7FSU5766XYP6IJPEK/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Abdur-Rahmaan Janhangeer

Well,

Yet another issue is adding vulnerabilities in plain sight.

Human code reviewers will see this:

if user.admin == "something":

Static analysers will see

if user.admin == "something":

but will not flag it, as it's up to the user to verify the logic of things,

and as such software authors can plant backdoors in plain sight.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IS2AWOSUNMHUXN6M4WPWT5QUTQFNNBZI/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Kyle Stanley

On Sat, Nov 13, 2021 at 5:04 PM  wrote:

>
>
> def 𝚑𝓮𝖑𝒍𝑜():
>     try:
>         𝔥e𝗅𝕝𝚘︴ = "Hello"
>         𝕨𝔬r𝓵ᵈ﹎ = "World"
>         ᵖ𝖗𝐢𝘯𝓽(f"{𝗵ｅ𝓵𝔩º_}, {𝖜ₒ𝒓lⅆ︴}!")
>     except 𝓣𝕪ᵖｅ𝖤𝗿ᵣ𝖔𝚛 as ⅇ𝗑c:
>         𝒑rℹₙₜ("failed: {}".𝕗𝗼ʳᵐªｔ(ᵉ𝐱𝓬))
>
> if _︴ⁿ𝓪𝑚𝕖__ == "__main__":
>     𝒉eℓˡ𝗈()
>
> # snippet from unittest/util.py
> _𝓟Ⅼ𝖠𝙲𝗘ℋ𝒪Lᴰ𝑬𝕽﹏𝕷𝔼𝗡 = 12
> def _𝔰ʰ𝓸ʳ𝕥𝙚𝑛(𝔰, p𝑟𝔢ﬁ𝖝𝕝𝚎𝑛, ｓᵤ𝑓𝗳𝗂𝑥𝗹ₑ𝚗):
>     ˢ𝗸ｉ𝗽 = 𝐥ｅ𝘯(𝖘) - ｐr𝚎𝖋𝐢x𝗅ᵉ𝓷 - 𝒔𝙪ﬀｉ𝘅𝗹𝙚ₙ
>     if sｋi𝘱 > _𝐏𝗟𝖠𝘊𝙴H𝕺Ｌ𝕯𝙀𝘙﹏L𝔈𝒩:
>         𝘴 = '%s[%d chars]%s' % (𝙨[:𝘱𝐫𝕖𝑓𝕚ｘℓ𝒆𝕟], ₛ𝚔𝒊p, 𝓼[𝓁𝒆𝖓(𝚜) - 𝙨𝚞𝒇ﬁx𝙡ᵉ𝘯:])
>     return ₛ
>

0_o color me impressed, I did not think that would be legal syntax. Would
be interesting to include in a textbook, if for nothing else other than to
academically demonstrate that it is possible, as I suspect many are not
aware.
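
For anyone who wants to see why the quoted code is legal, here is a small
sketch of the folding involved. It assumes that the compiler applies NFKC to
identifiers (as discussed elsewhere in this thread); the names `fancy`,
`nfkc` and `namespace` are made up for the illustration.

```python
import unicodedata

# "hello" spelled with MATHEMATICAL BOLD SMALL letters, written as escapes
# so the example does not depend on the reader's font.
fancy = "\U0001d421\U0001d41e\U0001d425\U0001d425\U0001d428"


def nfkc(name: str) -> str:
    """Apply the normalization form used for identifiers."""
    return unicodedata.normalize("NFKC", name)


# Compile a tiny module whose function name uses the fancy spelling.
namespace = {}
exec(f"def {fancy}():\n    return 'Hello, World!'\n", namespace)

print(nfkc(fancy))              # the folded spelling
print("hello" in namespace)     # the function is stored under the folded name
print(fancy in namespace)       # the fancy spelling itself is not a key
```

The function ends up stored under the plain ASCII name, which is why mixed
spellings of the same identifier all refer to one object.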

-- 
--Kyle R. Stanley, Python Core Developer
*Pronouns: they/them*
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4ZA2KK46JHENKSPB52RMXBAQT7CP3Q6A/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Petr Viktorin

On 15. 11. 21 9:25, Stephen J. Turnbull wrote:

Christopher Barker writes:

> Would a proposal to switch the normalization to NFC only have any hope of
> being accepted?

I don't think PyPI/GitHub are good resources to trawl.

Non-ASCII identifiers were added for the benefit of people who use
non-English languages. But the projects on both PyPI and GitHub are
overwhelmingly written in English -- especially if you look at the more
popular projects.
It would be interesting to reach out to the target audience here... but
they're not on this list, either. Do we actually know anyone using this?

I do teach beginners in a non-English language, but tell them that they
need to learn English if they want to do any serious programming. Any
code that's to be shared more widely than a country effectively has to
be in English. It seems to me that at the level where you worry about
supply chain attacks and you're doing code audits, something like
CPython's policy (ASCII only except proper names and Unicode-related
tests) is a good idea.
Or not? I don't know anyone who actually uses non-ASCII identifiers for
a serious project.
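
A policy like the ASCII-only one described above could be checked
mechanically. The following is a minimal sketch using the standard tokenize
module; `non_ascii_names` and `sample` are hypothetical names, and this is
not how CPython itself enforces its policy.

```python
import io
import tokenize


def non_ascii_names(source: str):
    """Yield (line, column, name) for every NAME token that is not pure ASCII."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], tok.start[1], tok.string


sample = "π = 3.14159\nradius = 2\narea = π * radius ** 2\n"
for line, col, name in non_ascii_names(sample):
    print(f"line {line}, column {col}: non-ASCII identifier {name!r}")
```

A real audit tool would also need an allow-list for proper names and
Unicode-related tests, which this sketch does not attempt.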

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/AVCLMBIXWPNIIKRFMGTS5SETUCGAONLK/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Steven D'Aprano

On Mon, Nov 15, 2021 at 12:33:54PM +0400, Abdur-Rahmaan Janhangeer wrote:

> Yet another issue is adding vulnerabilities in plain sight.
> Human code reviewers will see this:
> 
> if user.admin == "something":
> 
> Static analysers will see
> 
> if user.admin == "something":

Okay, you have a string literal with hidden characters. Assuming that 
your editor actually renders them as invisible characters, rather than 
"something???" or "something□□□" or "something���" or equivalent.

Now what happens? where do you go from there to a vulnerability or 
backdoor? I think it might be a bit obvious that there is something 
funny going on if I see:

    if (user.admin == "root" and check_password_securely()
            or user.admin == "root"
            # Second string has hidden characters, do not remove it.
            ):
        elevate_privileges()

even without the comment :-)

In another thread, Serhiy already suggested we ban invisible control 
characters (other than whitespace) in comments and strings.

https://mail.python.org/archives/list/python-dev@python.org/message/DN24FK3A2DSO4HBGEDGJXERSAUYK6VK6/

I think that is a good idea.

But beyond the C0 and C1 control characters, we should be conservative 
about banning "hidden characters" without a *concrete* threat. For 
example, variation selectors are "hidden", but they change the visual 
look of emoji and other characters. Even if you think that being able to 
set the skin tone of your emoji or choose different national flags using 
variation selectors is pure frippery, they are also necessary for 
Mongolian and some CJK ideographs.

http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors

I'm not sure about bidirectional controls; I have to leave that to 
people with more experience in bidirectional text than I do. I think 
that many editors in common use don't support bidirectional text, or at 
least the ones I use don't seem to support it fully or correctly. But 
for what little it is worth, my feeling is that people who use RTL or 
bidirectional strings and have editors that support them will be annoyed 
if we ban them from strings for the comfort of people who may never in 
their life come across a string containing such bidirectional text.

But, if there is a concrete threat beyond "it looks weird", that is 
another issue.

> but will not flag it as it's up to the user to verify the logic of 
> things

There is no reason why linters and code checkers shouldn't check for 
invisible characters, Unicode confusables or mixed script identifiers 
and flag them. The interpreter shouldn't concern itself with such purely 
stylistic issues unless there is a concrete threat that can only be 
handled by the interpreter itself.
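
To make the linter suggestion concrete, here is a rough sketch of such a
check. It flags bidirectional controls and other control/format characters
(general categories Cc and Cf) but deliberately leaves variation selectors
alone, since those are combining marks (category Mn). The names
`BIDI_CONTROLS` and `suspicious_chars` are assumptions made for this example
and do not come from any existing linter.

```python
import unicodedata

# Explicit bidirectional controls: LRM, RLM, ALM, the embeddings/overrides
# (U+202A..U+202E) and the isolates (U+2066..U+2069).
BIDI_CONTROLS = {0x200E, 0x200F, 0x061C, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}

# Ordinary whitespace controls that should not be reported.
ALLOWED = "\t\r\f"


def suspicious_chars(source: str):
    """Yield (line, column, description) for bidi controls and other Cc/Cf characters."""
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            if ord(ch) in BIDI_CONTROLS:
                kind = "bidi control"
            elif unicodedata.category(ch) in ("Cc", "Cf") and ch not in ALLOWED:
                kind = "invisible control/format character"
            else:
                continue
            yield lineno, col, f"{kind} U+{ord(ch):04X}"


sample = 'ok = "a\u202eb"\nflag = "\U0001f3f3\ufe0f"\n'
for hit in suspicious_chars(sample):
    print(hit)
```

Note that this simple rule would also flag the zero-width joiner used in
emoji sequences, so a practical checker would need a more careful policy.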

-- 
Steve
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/KSIBL3KMONIETBKXSBPPMA27MACWIH33/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Abdur-Rahmaan Janhangeer

Greetings,

> Now what happens? where do you go from there to a vulnerability or
> backdoor? I think it might be a bit obvious that there is something
> funny going on if I see:
>
>     if (user.admin == "root" and check_password_securely()
>             or user.admin == "root"
>             # Second string has hidden characters, do not remove it.
>             ):
>         elevate_privileges()

Well, it's not so obvious. See the paper by Ross Anderson and Nicholas
Boucher:
https://trojansource.codes/trojan-source.pdf

See Appendix H for Python, with implementations at:

https://github.com/nickboucher/trojan-source/tree/main/Python

These rely precisely on bidirectional control chars and/or replacing
characters with look-alikes.

> There is no reason why linters and code checkers shouldn't check for
> invisible characters, Unicode confusables or mixed script identifiers
> and flag them. The interpreter shouldn't concern itself with such purely
> stylistic issues unless there is a concrete threat that can only be
> handled by the interpreter itself.

I mean current linters. But it would be good to check those, for sure.
As a programmer, I don't want a language which bans unicode stuffs.
If there's something that should be fixed, it's the Unicode standard, maybe
by defining a sane mode where weird unicode stuffs are not allowed. It could
also be done from the language side, in the event that it's not being
considered in the standard itself.

I don't see it as a language fault nor as a client fault, as they are
following the Unicode docs. But the response was mixed: some languages
decided to patch it on their side, some linters implemented detection for
it, and some editors flag it and render it as the exploit intended.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/I43MI4QYEERGEKX6YX6NCHCZTUAFWY4X/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Steven D'Aprano

On Sun, Nov 14, 2021 at 10:12:39PM -0800, Christopher Barker wrote:

> I am, however, surprised and disappointed by the NFKC normalization.
> 
> For example, in writing math we often use different scripts to mean 
> different things (e.g. TeX's Blackboard Bold). So if I were to use 
> some of the Unicode Mathematical Alphanumeric Symbols, I wouldn't want 
> them to get normalized.

Hmmm... would you really want these to all be different identifiers?

𝕭 𝓑 𝑩 𝐁 B

You're assuming the reader of the code has the right typeface to view 
them (rather than as mere boxes), and that their eyesight is good enough 
to distinguish the variations even if their editor applies bold or 
italic as part of syntax highlighting. That's very bold of you :-)

In any case, the question of NFKC versus NFC was certainly considered, 
but unfortunately PEP 3131 doesn't document why NFKC was chosen.

https://www.python.org/dev/peps/pep-3131/

Before we change the normalisation rules, it would probably be a good 
idea to trawl through the archives of the mailing list and work out why 
NFKC was chosen in the first place, or contact Martin von Löwis and see 
if he remembers.

> Then there's the question of when this normalization happens (and when it
> doesn't). If one is doing any kind of metaprogramming, even just using
> getattr() and setattr(), things could get very confusing:

For ordinary identifiers, they are normalised at some point during 
compilation or interpretation. It probably doesn't matter exactly when.

Strings should *not* be normalised when using subscripting on a dict, 
not even on globals():

https://bugs.python.org/issue42680

I'm not sure about setattr and getattr. I think that they should be 
normalised. But apparently they aren't:

>>> from types import SimpleNamespace
>>> obj = SimpleNamespace(B=1)
>>> setattr(obj, '𝕭', 2)
>>> obj
namespace(B=1, 𝕭=2)
>>> obj.B
1
>>> obj.𝕭
1

See also here:

https://bugs.python.org/issue35105
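
Until that question is settled, code that builds attribute names
dynamically can normalize them itself. This is only a sketch of one possible
workaround; `nfkc`, `getattr_normalized` and `setattr_normalized` are
hypothetical helpers, not part of the standard library.

```python
import unicodedata
from types import SimpleNamespace


def nfkc(name: str) -> str:
    """Apply the same normalization form the compiler uses for identifiers."""
    return unicodedata.normalize("NFKC", name)


def getattr_normalized(obj, name, *default):
    """Like getattr(), but normalizes the attribute name first."""
    return getattr(obj, nfkc(name), *default)


def setattr_normalized(obj, name, value):
    """Like setattr(), but normalizes the attribute name first."""
    setattr(obj, nfkc(name), value)


obj = SimpleNamespace(B=1)
# MATHEMATICAL BOLD FRAKTUR CAPITAL B, written as an escape.
setattr_normalized(obj, "\U0001d56d", 2)
print(obj)
print(getattr_normalized(obj, "\U0001d56d"))
```

With these helpers the dynamic path and the source-code path agree: both
spellings address the single attribute `B`.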

-- 
Steve
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/7XZJPFED3YJSJ73YSPWCQPN6NLTNEMBI/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Chris Angelico

On Mon, Nov 15, 2021 at 10:22 PM Abdur-Rahmaan Janhangeer
 wrote:
>
> Greetings,
>
>
> > Now what happens? where do you go from there to a vulnerability or
> backdoor? I think it might be a bit obvious that there is something
> funny going on if I see:
>
>     if (user.admin == "root" and check_password_securely()
>             or user.admin == "root"
>             # Second string has hidden characters, do not remove it.
>             ):
>         elevate_privileges()
>
>
> Well, it's not so obvious. From Ross Anderson and Nicholas Boucher
> src: https://trojansource.codes/trojan-source.pdf
>
> See appendix H. for Python.
>
> with implementations:
>
> https://github.com/nickboucher/trojan-source/tree/main/Python
>
> Rely precisely on bidirectional control chars and/or replacing look alikes

The point of those kinds of attacks is that syntax highlighters and
related code review tools would misinterpret them. So I pulled them
all up in both GitHub's view and the editor I personally use (SciTE,
albeit a fairly old version now). GitHub specifically flags it as a
possible exploit in a couple of cases, but also syntax highlights the
return keyword appropriately. SciTE doesn't give any sort of warnings,
but again, correctly highlights the code - early-return shows "return"
as a keyword, invisible-function shows the name "is_" as the function
name and the rest not, homoglyph-function shows a quite
distinct-looking letter that definitely isn't an H.

The problems here are not Python's, they are code reviewers', and that
means they're really attacks against the code review tools. It's no
different from using the variable m in one place and rn in another,
and hoping that code review uses a proportionally-spaced font that
makes those look similar. So to count as a viable attack, there needs
to be at least one tool that misparses these; so far, I haven't found
one, but if I do, wouldn't it be more appropriate to raise the bug
report against the tool?

> > There is no reason why linters and code checkers shouldn't check for
> invisible characters, Unicode confusables or mixed script identifiers
> and flag them. The interpreter shouldn't concern itself with such purely
> stylistic issues unless there is a concrete threat that can only be
> handled by the interpreter itself.
>
>
> I mean current linters. But it will be good to check those for sure.
> As a programmer, i don't want a language which bans unicode stuffs.
> If there's something that should be fixed, it's the unicode standard, maybe
> defining a sane mode where weird unicode stuffs are not allowed. Can also
> be from language side in the event where it's not being considered in the 
> standard
> itself.

Uhhm. "weird unicode stuffs"? Please clarify.

> I don't see it as a language fault nor as a client fault as they are 
> considering
> the unicode docs but the response was mixed with some languages decided to 
> patch it
> from their side, some linters implementing detection for it as well as some 
> editors flagging
> it and rendering it as the exploit intended.

I see it as an editor issue (or code review tool, as the case may be).
You'd be hard-pressed to get something past code review if it looks to
everyone else like you slipped a "return" statement at the end of a
docstring.

So far, I've seen fewer problems from "weird unicode stuffs" than from
the quoted-printable encoding, and that's an attack that involves
nothing but ASCII text. It's also an attack that far more code review
tools seem to be vulnerable to.

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/OUPC6LGFXIILBTNEC4FYTERBX7VKQHDX/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Marc-Andre Lemburg

On 15.11.2021 12:36, Steven D'Aprano wrote:
> On Sun, Nov 14, 2021 at 10:12:39PM -0800, Christopher Barker wrote:
> 
>> I am, however, surprised and disappointed by the NFKC normalization.
>>
>> For example, in writing math we often use different scripts to mean 
>> different things (e.g. TeX's Blackboard Bold). So if I were to use 
>> some of the Unicode Mathematical Alphanumeric Symbols, I wouldn't want 
>> them to get normalized.
> 
> Hmmm... would you really want these to all be different identifiers?
> 
> 𝕭 𝓑 𝑩 𝐁 B
> 
> You're assuming the reader of the code has the right typeface to view 
> them (rather than as mere boxes), and that their eyesight is good enough 
> to distinguish the variations even if their editor applies bold or 
> italic as part of syntax highlighting. That's very bold of you :-)
> 
> In any case, the question of NFKC versus NFC was certainly considered, 
> but unfortunately PEP 3131 doesn't document why NFKC was chosen.
> 
> https://www.python.org/dev/peps/pep-3131/
> 
> Before we change the normalisation rules, it would probably be a good 
> idea to trawl through the archives of the mailing list and work out why 
> NFKC was chosen in the first place, or contact Martin von Löwis and see 
> if he remembers.

This was raised in the discussion, but never conclusively answered:

https://mail.python.org/pipermail/python-3000/2007-May/007995.html

NFKC is the standard normalization form when you want to remove any
typography-related variants/hints from the text before comparing
strings. See http://www.unicode.org/reports/tr15/

I guess that's why Martin chose this form, since the point
was to maintain readability, even if different variants of a
character are used in the source code. A "B" in the source code
should be interpreted as an ASCII B, even when written
as 𝕭 𝓑 𝑩 or 𝐁.

This simplifies writing code and does away with many of the
security issues you could otherwise run into (where e.g. the
absence of an identifier causes the application flow to
be different).
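
The difference between the two forms for exactly these characters can be
checked with the unicodedata module. This is a small illustrative sketch;
the `variants` list and the `normalized_forms` helper are names invented for
the example.

```python
import unicodedata

# The four variants named above, spelled with \N escapes so the example
# does not depend on the reader's font.
variants = [
    "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL B}",
    "\N{MATHEMATICAL BOLD SCRIPT CAPITAL B}",
    "\N{MATHEMATICAL BOLD ITALIC CAPITAL B}",
    "\N{MATHEMATICAL BOLD CAPITAL B}",
]


def normalized_forms(ch: str) -> dict:
    """Map each normalization form name to the result of applying it to ch."""
    return {form: unicodedata.normalize(form, ch) for form in ("NFC", "NFKC")}


for ch in variants:
    forms = normalized_forms(ch)
    print(unicodedata.name(ch), "| NFC unchanged:", forms["NFC"] == ch, "| NFKC:", forms["NFKC"])
```

NFC keeps each styled letter as a distinct character, while NFKC folds all
of them to the ASCII letter, which is the behaviour a switch to NFC would
change.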

>> Then there's the question of when this normalization happens (and when it
>> doesn't).

It happens in the parser when reading a non-ASCII identifier
(see Parser/pegen.c), so only applies to source code, not attributes
you dynamically add to e.g. class or module namespaces.

-- 
Marc-Andre Lemburg
eGenix.com

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SNN2WZ3MOH5IACSZVHGS6DKTNMKO5JBV/

[Python-Dev] Re: The Steering Council elections.

2021-11-15 Thread Thomas Wouters

Just a reminder that the nomination period for the next SC ends *today*
(AoE), so if you're intending to nominate (yourself or someone else),
please get those posts in. (No need for a long post before the deadline, it
can be expanded later.) We currently have the 4 incumbents, and nobody
else, so please consider who you would like to see on the SC.

On Wed, Nov 10, 2021 at 1:40 PM Thomas Wouters  wrote:

> (Not sending this out as a SC member, just as myself.)
>
> Because we're half-way through the nomination period, I want to remind
> people about the Steering Council elections, and the fact that *you do
> not have to be a Core Developer to be nominated*, just to nominate
> someone (including yourself). If you know someone who you think would be a
> good person to have on the Steering Council, Core Developer or not, talk to
> them. Offer to nominate them, if you're a Core Developer, or talk to a Core
> Developer to do it.
>
> For more information, see PEP 13 (Python Language Governance),
> PEP 8103 (2022 Term steering council election), and Ee's announcement
> on how to nominate.
>
> We still have about a week left, and I know there's several people who are
> planning to nominate, so I'm not worried about the number of nominations...
> But I would personally like a good, sizable pool of candidates, because
> it's much better for the health and longevity of the SC model. (I will also
> be nominating myself, I just haven't gotten around to making the actual
> post yet.)
>
> Electably-y'rs,
> --
> Thomas Wouters 
>
> Hi! I'm an email virus! Think twice before sending your email to help me
> spread!
>

-- 
Thomas Wouters 

Hi! I'm an email virus! Think twice before sending your email to help me
spread!
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5YFKDO7X7FEECB65NOYRKFLBE4QADW3F/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Stephen J. Turnbull

Abdur-Rahmaan Janhangeer writes:

 > As a programmer, i don't want a language which bans unicode stuffs.

But that's what Unicode says should be done (see below).

 > If there's something that should be fixed, it's the unicode standard,

Unicode is not going to get "fixed".  Most features are important for
some natural language or other.  One could argue that (for example)
math symbols that are adopted directly from some character repertoire
should not have been -- I did so elsewhere, although not terribly
seriously.

 > maybe defining a sane mode where weird unicode stuffs are not
 > allowed.

Unicode denies responsibility for that by permitting arbitrary
subsetting.  It does have a couple of (very broad) subsets predefined,
ie, the normalization formats.

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/63FDIQQNJKCH7C3NMEN3ECRHTA7JHJ2W/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Terry Reedy

On 11/15/2021 5:45 AM, Steven D'Aprano wrote:

> In another thread, Serhiy already suggested we ban invisible control
> characters (other than whitespace) in comments and strings.

He said in string *literals*. One would put them in strings by using
visible escape sequences.

>>> '\033' is '\x1b' is '\u001b'
True

> https://mail.python.org/archives/list/python-dev@python.org/message/DN24FK3A2DSO4HBGEDGJXERSAUYK6VK6/
>
> I think that is a good idea.

If one is outputting terminal control sequences, making the escape char
visible is a good idea anyway. It would be easier if '\e' worked. (But
see below.)

> But beyond the C0 and C1 control characters, we should be conservative
> about banning "hidden characters" without a *concrete* threat. For
> example, variation selectors are "hidden", but they change the visual
> look of emoji and other characters.

I can imagine that a complete emoji point and click input method might
have one select the emoji and the variation and output the pair
together. An option to output the selection character as the
appropriate python-specific '\u' is unlikely, and even if there
were, who would know what it meant? Users would want the selected
variation visible if the editor supported such.

If terminal escape sequences were also selected by point and click, my
comment above would change.
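
As an illustration of the visible escape sequences mentioned above, here is
a short sketch that builds a terminal control sequence without any raw
control character in the source, and shows how to make hidden characters in
an existing string visible. The names `ESC`, `BOLD`, `RESET` and
`show_hidden` are chosen for this example only.

```python
# ESC written three equivalent, visible ways (hex, octal, Unicode escape).
ESC = "\x1b"
assert ESC == "\033" == "\u001b"

BOLD = ESC + "[1m"    # SGR "bold" sequence
RESET = ESC + "[0m"   # SGR "reset" sequence


def show_hidden(text: str) -> str:
    """Return text with control and non-ASCII characters as backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


message = f"{BOLD}warning{RESET}"
print(show_hidden(message))
```

The source file stays pure printable ASCII, yet the resulting string
contains the real escape character at run time.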

--
Terry Jan Reedy

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4IMXVQFZI3VDHA4D2YZD4KTBU7GSEFPW/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Abdur-Rahmaan Janhangeer

> GitHub specifically flags it as a
> possible exploit in a couple of cases, but also syntax highlights the
> return keyword appropriately.

My guess is that GitHub patched it afterwards, as the paper does list
GitHub as vulnerable.

> Uhhm. "weird unicode stuffs"? Please clarify.

Wriggly texts, just because they appear different.

Well, it's tool-based, but maybe compiler checks, i.e. checks from
the language side, are something that should be insisted upon too, to
make up for inconsistent checks across editors.

The reason I was saying it's related to encodings is that when languages
are impacted en masse, maybe it hints at the need for a revision of the
Unicode standard, or at the very least warnings. Steven, above, was hinting
towards the vulnerability even before I posted the paper, so maybe those in
charge of the Unicode standard should study and predict angles of attack.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/UEF6RYKZLVOG2PPSGAMOLDEP6LPEG6UZ/

[Python-Dev] Re: The Steering Council elections.

2021-11-15 Thread Kyle Stanley

On Mon, Nov 15, 2021 at 10:49 AM Thomas Wouters  wrote:

>
> Just a reminder that the nomination period for the next SC ends *today*
> (AoE), so if you're intending to nominate (yourself or someone else),
> please get those posts in. (No need for a long post before the deadline, it
> can be expanded later.) We currently have the 4 incumbents, and nobody
> else, so please consider who you would like to see on the SC.
>

Thanks for the reminder, Thomas. Although I have nobody to nominate, nor
do I consider myself to have the capacity and experience to self-nominate
for the SC, it is great to see that the incumbents want a wide pool of
applicants for the longevity of Python, rather than being on the SC
indefinitely.

Best Regards,
-- 
--Kyle R. Stanley, Python Core Developer
*Pronouns: they/them*
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/PHBQHDHQJMJPT26EPJMLANIPWQ6W2GFM/

[Python-Dev] [RELEASE] Python 3.9.9 hotfix release is now available

2021-11-15 Thread Łukasz Langa

Get it here: https://www.python.org/downloads/release/python-399/ 

Python 3.9.9 is the eighth maintenance release of the legacy 3.9 series. Python 
3.10 is now the latest feature release series of Python 3. Get the latest 
release of 3.10.x here.

3.9.9 was released out of schedule as a hotfix for an argparse regression in 
Python 3.9.8 which caused complex command-line tools to fail recognizing 
sub-commands properly. Details in BPO-45235. There are only three other 
bugfixes in this release compared to 3.9.8. See the changelog for details on 
what changed.

Upgrading to 3.9.9 is highly recommended if you’re running Python 3.9.8.

The next Python 3.9 maintenance release will be 3.9.10, currently scheduled for 
2022-01-03.

 
We apologize for the inconvenience

…and still hope you’ll enjoy the new release!

Your friendly release team,
Ned Deily @nad 
Steve Dower @steve.dower 
Łukasz Langa @ambv 


Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/DCLXETTTA3XHT22D5E2XL324A2LS2XGY/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Steven D'Aprano

On Mon, Nov 15, 2021 at 12:28:01PM -0500, Terry Reedy wrote:
> On 11/15/2021 5:45 AM, Steven D'Aprano wrote:
> 
> >In another thread, Serhiy already suggested we ban invisible control
> >characters (other than whitespace) in comments and strings.
> 
> He said in string *literals*.  One would put them in strings by using 
> visible escape sequences.

Thanks Terry for the clarification, of course I didn't mean to imply 
that we should ban control characters in strings completely. Only actual 
control characters embedded in string literals in the source, just as we 
already currently ban them outside of comments and strings.
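
To illustrate the distinction Terry draws, here is a minimal sketch of how a
raw invisible character inside a string literal could be rewritten as a
visible escape sequence without changing the string's value. The helper name
escape_invisible is hypothetical, and treating Unicode categories Cc and Cf
as "invisible" is an assumption made for this example, not a rule proposed in
the thread. It is only meant for ordinary (non-raw) string literals.

```python
import ast
import unicodedata

# Whitespace controls that are already acceptable in source code.
ALLOWED = {"\t", "\n", "\r", "\f"}


def escape_invisible(literal_text):
    """Rewrite control/format characters as \\uXXXX or \\UXXXXXXXX escapes.

    Assumption: "invisible" means Unicode category Cc or Cf.
    """
    out = []
    for ch in literal_text:
        if ch not in ALLOWED and unicodedata.category(ch) in ("Cc", "Cf"):
            code = ord(ch)
            out.append(f"\\u{code:04x}" if code <= 0xFFFF else f"\\U{code:08x}")
        else:
            out.append(ch)
    return "".join(out)


# A literal containing a raw ZERO WIDTH SPACE, and its escaped form.
raw_literal = '"a\u200bb"'
escaped_literal = escape_invisible(raw_literal)
```

Both forms evaluate to the same three-character string; only the escaped form
shows the reader that the character is there.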


-- 
Steve
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XCPSQYKOX4YXDIAACDLL3I5OYWFGFLD7/

[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-15 Thread Brett Cannon

On Sun, Nov 14, 2021 at 3:01 PM Victor Stinner  wrote:

> On Sun, Nov 14, 2021 at 6:34 PM Eric V. Smith  wrote:
> > On second thought, I guess the existing policy already does this. Maybe
> > we should make it more than 2 versions for deprecations? I've written
> > libraries where I support 4 or 5 released versions. Although maybe I
> > should just trim that back.
>
> If I understood correctly, the problem is more for how long is the new
> way available?
>

I think Eric was suggesting more along the lines of PEP 387 saying that
deprecations should last as long as there is a supported version of Python
that *lacks* the deprecation. So for something that's deprecated in 3.10,
we wouldn't remove it until 3.10 is the oldest Python version we support.
That would be October 2025 when Python 3.9 reaches EOL and Python 3.13
comes out as at that point you could safely rely on the non-deprecated
solution across all supported Python versions (or if you want a full year
of overlap, October 2026 and Python 3.14).

I think the key point with that approach is if you wanted to maximize your
support across supported versions, this would mean there wouldn't be
transition code except when the SC approves of a shorter deprecation. So a
project would simply rely on the deprecated approach until they started
work towards Python 3.13, at which point they drop support for the
deprecated approach and cleanly switch over to the new approach as all
versions of Python at that point will support the new approach as well.
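
For concreteness, this is a sketch of the kind of transition code such a
policy would make unnecessary, using the collections.abc example from
Victor's message quoted below. The Registry class is purely illustrative and
not taken from any project.

```python
# Transition code: prefer the new location, fall back to the old one.
try:
    from collections.abc import MutableMapping  # Python 3.3 and newer
except ImportError:
    from collections import MutableMapping  # only reached on Python < 3.3


class Registry(MutableMapping):
    """Tiny illustrative mapping built on the imported base class."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Once every supported Python version has the new location, the try/except
collapses to the single first import.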

-Brett


>
> For example, if the new way is introduced in Python 3.6, the old way
> is deprecated in Python 3.8, can we remove the old way in Python 3.10?
> It means that the new way is available in 4 versions (3.6, 3.7, 3.8,
> 3.9), before the old way is removed. It means that it's possible to
> have a single code base (no test on the Python version and no feature
> test) for Python 3.6 and newer.
>
> More concrete examples:
>
> * the "U" open() flag was deprecated since Python 3.0, removed in
> Python 3.11: the flag was ignored since Python 3.0, code without "U"
> works on Python 3.0 and newer
>
> * collections.abc.MutableMapping exists since Python 3.3:
> collections.MutableMapping was deprecated in Python 3.3, removed in
> Python 3.10. Using collections.abc.MutableMapping works on Python 3.3
> and newer.
>
> * unittest: failIf() alias, deprecated since Python 2.7, was removed
> in Python 3.11: assertFalse() always worked.
>
> For these 3 changes, it's possible to keep support up to Python 3.3.
> Up to Python 3.0 if you add "try/except ImportError" for
> collections.abc.
>
> IMO it would help to have a six-like module to write code for the
> latest Python version, and keep support for old Python versions. For
> example, have hacks to be able to use collections.abc.MutableMapping
> on Python 3.2 and older (extreme example, who still cares about Python
> older than 3.5 in 2021?).
>
> I wrote something like that for the C API, provide *new* C API
> functions to *old* Python versions:
> https://github.com/pythoncapi/pythoncapi_compat
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/WD6NLGVI5AXB3POKQHOUKZ5WUR2HBLV2/
>
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QPWE4RNBYHN4XCDYNJ2APEQQWQJPEQNO/

[Python-Dev] Re: Remove asyncore, asynchat and smtpd modules

2021-11-15 Thread Brett Cannon

On Fri, Nov 12, 2021 at 4:16 AM Victor Stinner  wrote:

> > > It was decided to start deprecating the asyncore, asynchat and smtpd
> > > modules in Python 3.6 released in 2016, 5 years ago. Python 3.10 emits
> > > DeprecationWarning.
> >
> > Wait, only Python 3.10?
> > According to the policy, the warning should be there for *at least* two
> > releases. (That's a minimum, for removing entire modules it might make
> > sense to give people even more time.)
>
> PEP 387 says "Similarly a feature cannot be removed without notice
> between any two consecutive releases."
>
> It is the case here. The 3 modules are marked as deprecated for 4
> releases in the documentation: Python 3.6, 3.7, 3.9 and 3.10. Example:
> https://docs.python.org/3.6/library/asyncore.html
>

But have they been raising exceptions for two releases?
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/6VQPRSUR3XAOKQBGEIT6XA2XST6JDDFT/

[Python-Dev] Re: Remove asyncore, asynchat and smtpd modules

2021-11-15 Thread Victor Stinner

On Tue, Nov 16, 2021 at 1:15 AM Brett Cannon  wrote:
> But have they been raising exceptions for two releases?

As I wrote previously, the DeprecationWarning has only been emitted
at runtime since Python 3.10.

Since my PR got 5 approvals, I just merged it:
https://github.com/python/cpython/pull/29521

The asyncore, asynchat and smtpd modules are now removed in Python
3.11. You should now use asyncio and aiosmtpd instead.

Note: the binhex module has also been removed in Python 3.11. It
emitted a DeprecationWarning in Python 3.9 and 3.10.
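
As a reminder of what "emitted at runtime" means in practice, here is a
minimal sketch of a function that raises a DeprecationWarning and a helper
that collects such warnings. The names old_api and call_and_collect are
hypothetical and stand in for a deprecated module's entry point; this is not
how the removed modules were implemented.

```python
import warnings


def old_api():
    """Hypothetical deprecated function; warns every time it is called."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def call_and_collect(func):
    """Call func and return (result, list of DeprecationWarning records)."""
    with warnings.catch_warnings(record=True) as caught:
        # Show every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        result = func()
    deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
    return result, deprecations
```

Running a test suite with python -W error::DeprecationWarning is another way
to surface these warnings before a removal lands.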

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LVFBHVNV3INGKVVRONVRIA3Q6JIYXMZM/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Steven D'Aprano

On Mon, Nov 15, 2021 at 03:20:26PM +0400, Abdur-Rahmaan Janhangeer wrote:

> Well, it's not so obvious. From Ross Anderson and Nicholas Boucher
> src: https://trojansource.codes/trojan-source.pdf

Thanks for the link. But it discusses a whole range of Unicode attacks,
and the specific attack you mentioned (Invisible Character Attacks) is
described in section D page 7 as "unlikely to work in practice".

As they say, compilers and interpreters in general already display
errors, or at least a warning, for invisible characters in code.

In addition, there is the difficulty that it's not enough just to use
invisible characters to call a different function, you have to smuggle
in the hostile function that you actually want to call.

It does seem that the Trojan-Source attack listed in the paper is new,
but others (such as the homoglyph attacks that get most people's
attention) are neither new nor especially easy to actually exploit.
Unicode has been warning about it for many years. We discussed it in PEP
3131. This is not new, and not easy to exploit.

Perhaps that's why there are no, or very few, actual exploits of this in
the wild. Homoglyph attacks against user-names and URLs, absolutely, but
homoglyph attacks against source code are a different story.

Yes, you can cunningly have two classes like Α and A and the Python
interpreter will treat them as distinct, but you still have to smuggle
in your hostile code in Α (greek Alpha) without anyone noticing, and you
have to avoid anyone asking why you have two classes with the same name.
And that's the hard part.

We don't need Unicode for homoglyph attacks. func0 and funcO may look
identical, or nearly identical, but you still have to smuggle in your
hostile code into funcO without anyone noticing, and that's why there
are so few real-world homoglyph attacks.
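
A small sketch of how the lookalike names above differ to the interpreter,
and how a reviewer's tool might notice them. Using the first word of each
character's Unicode name as a script hint is a crude assumption made for this
example (the standard library has no script property), and the helper names
are hypothetical.

```python
import unicodedata

latin_a = "A"             # U+0041
greek_alpha = "\u0391"    # U+0391
cyrillic_a = "\u0410"     # U+0410


def scripts_in(identifier):
    """Return the leading word of each letter's Unicode name (crude script hint)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def looks_mixed(identifier):
    """True if the letters in identifier appear to come from more than one script."""
    return len(scripts_in(identifier)) > 1
```

Note that this catches Α versus A but says nothing about func0 versus funcO,
which are both plain ASCII.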

Whereas the Trojan Source attacks using BIDI controls do seem to be
genuinely exploitable.

--
Steve
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/FSHGS4AOAGTWKSWAADZWH5L2GGBWHHXE/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Steven D'Aprano

On Mon, Nov 15, 2021 at 10:43:12PM +1100, Chris Angelico wrote:

> The problems here are not Python's, they are code reviewers', and that
> means they're really attacks against the code review tools.

I think that's a bit strong. Boucher and Anderson's paper describes
multiple kinds of vulnerabilities. At a fairly quick glance, the BIDI
attack does seem to be a novel attack, and probably exploitable.

But unfortunately it is the Unicode confusables or homoglyph attack
that seems to be getting most of the attention, and that's not
new, it is as old as ASCII, and not so easily exploitable. Being able to
have А (Cyrillic) Α (Greek alpha) and A (Latin) in the same code base
makes for a nice way to write obfuscated code, but it's *obviously*
obfuscated and not so easy to smuggle in hostile code.

Whereas the BIDI attacks do (apparently) make it easy to smuggle in
code: using invisible BIDI control codes, you can introduce source code
where the way the editor renders the code, and the way the coder reads
it, is different from the way the interpreter or compiler runs it.

That is, I think, new and exploitable: something that looks like a
comment is actually code that the interpreter runs, and something that
looks like code is actually a string or comment which is not executed,
but editors may syntax-colour it as if it were code.

Obviously we can mitigate this by improving the editors (at the
very least, all editors should have a Show Invisible Characters option).
Linters and code checks should also flag problematic code containing
BIDI codes, or attacks against docstrings.
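
As a rough idea of what such a linter check could look like, the sketch below
reports every explicit bidirectional formatting character in a piece of
source text. The function name find_bidi_controls is hypothetical, and
limiting the check to the embedding, override and isolate classes is an
assumption for this example rather than a complete policy.

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(source):
    """Return (line, column, bidi_class) for each BIDI control in source.

    Lines are 1-based and columns 0-based.
    """
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class in BIDI_CONTROL_CLASSES:
                hits.append((lineno, col, bidi_class))
    return hits
```

A check like this flags the characters wherever they occur, including inside
comments and string literals, which is where the attack hides them.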

Beyond that, it is not clear to me what, if anything, we should do in
response to this new class of Trojan Source attacks, beyond documenting
it.

--
Steve
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SXF2BG47UZTI7QM7GB3XCTGEV576UZOE/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-15 Thread Chris Angelico

On Tue, Nov 16, 2021 at 12:13 PM Steven D'Aprano  wrote:
>
> On Mon, Nov 15, 2021 at 10:43:12PM +1100, Chris Angelico wrote:
>
> > The problems here are not Python's, they are code reviewers', and that
> > means they're really attacks against the code review tools.
>
> I think that's a bit strong. Boucher and Anderson's paper describes
> multiple kinds of vulnerabilities. At a fairly quick glance, the BIDI
> attack does seem to be a novel attack, and probably exploitable.

The BIDI attacks basically amount to making this:

def func():
    """This is a docstring"""; return

look like this:

def func():
    """This is a docstring; return"""

If you see something that looks like the second, but the word "return"
is syntax-highlighted as a keyword instead of part of the string, the
attack has failed. (Or if you ignore that, then your code review is
flawed, and you're letting malicious code in.) The attack depends for
its success on some human approving some piece of code that doesn't do
what they think it does, and that means it has to look like what it
doesn't do - which is an attack against what the code looks like,
since what it does is very well defined.
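
One way to see what the interpreter itself makes of each version, independent
of how an editor draws it, is to ask the standard tokenizer. This is only a
sketch: the helper name name_tokens is hypothetical, and the sample strings
are the two snippets above without any BIDI characters.

```python
import io
import tokenize


def name_tokens(source):
    """Return the NAME tokens (keywords and identifiers) the tokenizer sees."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.NAME
    ]


return_is_code = 'def func():\n    """This is a docstring"""; return\n'
return_is_text = 'def func():\n    """This is a docstring; return"""\n'
```

In the first snippet `return` comes back as a NAME token; in the second it
does not, because it is part of the string. A highlighter that agrees with
the tokenizer would colour the two differently.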

> Whereas the BIDI attacks do (apparently) make it easy to smuggle in
> code: using invisible BIDI control codes, you can introduce source code
> where the way the editor renders the code, and the way the coder reads
> it, is different from the way the interpreter or compiler runs it.

Right: the way the editor renders the code, that's the essential part.
That's why I consider this an attack against some editor (or set of
editors). When you find an editor that is vulnerable to this, file a
bug report against that editor.

The way the coder reads it will be heavily based upon the way the
editor colours it.

> That is, I think, new and exploitable: something that looks like a
> comment is actually code that the interpreter runs, and something that
> looks like code is actually a string or comment which is not executed,
> but editors may syntax-colour it as if it were code.

Right. Exactly my point: editors may syntax-colour it incorrectly.

That's why I consider this not an attack on the language, but on the
editor. As long as the editor parses it the exact same way that the
interpreter does, there isn't a problem.

ChrisA
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/3X6K5YYBRATECDRTN57XNT3QNP2J6ZBG/
