[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-13 Thread Stephen J. Turnbull
Victor Stinner writes:

 > In Python, usually, there is a better alternative.

As in life.

 > Do you have to repeat "You should check for DeprecationWarning in
 > your code" in every "What's New in Python X.Y?" document?

That probably doesn't hurt, but I doubt it does much good for anybody
except the bewildered first-day college intern who wonders why nobody
in the devops team follows best practices.  They'll learn to ignore
those lines pretty quick, too, I'm sure. ;-)
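For reference, here is a minimal sketch of what "checking for DeprecationWarning" can look like in practice: use the standard `warnings` module to turn the warning into an exception for the duration of a call. `old_api` is a hypothetical stand-in for any deprecated function. The command-line equivalent for a whole test run is `python -W error::DeprecationWarning`.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for illustration."""
    warnings.warn("old_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42


def check_for_deprecations(func):
    """Call func with DeprecationWarning promoted to an exception.

    Returns None if no deprecation was triggered, else the warning text.
    """
    with warnings.catch_warnings():
        # Only inside this block: any DeprecationWarning raises instead of printing.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning as exc:
            return str(exc)
    return None


print(check_for_deprecations(old_api))        # old_api() is deprecated; use new_api()
print(check_for_deprecations(lambda: 1 + 1))  # None
```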

What I think would make a difference is a six-like tool for making
"easy changes" like substituting aliases and maybe marking other stuff
that requires human brains to make the right changes.
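Here is a rough sketch of the "easy changes" half of such a tool, using only the standard `tokenize` module. It renames attribute aliases from a table and reports names that need human judgment. Both tables are illustrative assumptions rather than an official list, though `assertEquals` and `inspect.getargspec` were real deprecated names at the time. Only matching bare names after a dot is a deliberate simplification; a real fixer would need to know what object the attribute belongs to.

```python
import io
import tokenize

# Illustrative tables (assumptions, not an official list).
ALIASES = {"assertEquals": "assertEqual", "assertNotEquals": "assertNotEqual"}
NEEDS_REVIEW = {"getargspec"}  # the right replacement depends on how it is used


def fix_aliases(source):
    """Return (new_source, review) for a hypothetical alias fixer.

    Attribute names in ALIASES are renamed mechanically; attribute names
    in NEEDS_REVIEW are left alone and reported as (line, name) pairs.
    Only NAME tokens directly after a "." are considered.
    """
    out, review = [], []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        after_dot = (prev is not None and prev.type == tokenize.OP
                     and prev.string == ".")
        if tok.type == tokenize.NAME and after_dot:
            if tok.string in ALIASES:
                tok = tok._replace(string=ALIASES[tok.string])
            elif tok.string in NEEDS_REVIEW:
                review.append((tok.start[0], tok.string))
        out.append(tok)
        prev = tok
    # untokenize computes spacing from the original token positions,
    # so the surrounding layout is preserved.
    return tokenize.untokenize(out), review


SAMPLE = "self.assertEquals(total, 3)\nspec = inspect.getargspec(func)\n"
new_source, review = fix_aliases(SAMPLE)
print(new_source, end="")
print(review)  # [(2, 'getargspec')]
```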

I'm not volunteering to do this, I don't even know that it's actually
feasible.  But I think that unless we're willing to bite that bullet,
it's going to be difficult to make much progress over the current
situation.  Deprecated code does normally more or less work, and often
it never gets close to dangerous behavior.  On the flip side, it often
can cause dangerous behavior, and you won't know if it does until you
do a thorough audit of your use case, which isn't going to happen
because it would take as much effort as replacing the deprecated
code.

I think we all see both sides, even if our own individual experience
leads us to want to change the current balance.  Unfortunately, as we
see here, some folks want more removals, some people want fewer (or
none).


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JVR33XD7EIUIIMKOHIGFZQIXSWBZKS6D/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Do we need to remove everything that's deprecated?

2021-11-13 Thread Christopher Barker
On Sat, Nov 13, 2021 at 12:01 AM Stephen J. Turnbull wrote:

> What I think would make a difference is a six-like tool for making
> "easy changes" like substituting aliases and maybe marking other stuff
> that requires human brains to make the right changes.


I think a “2to3”-like or “futurize”-like tool is a better idea, but yes.

The real challenge with the 2-3 transition was that many of us needed to
keep the same code base running on both 2 and 3. But do we need to support
running the same code on 3.5 to 3.10? I don’t think so. If you can’t
upgrade Python to a supported version, you probably shouldn’t upgrade your
code or libraries.

Which is a thought: maybe the policy should be that we remove things when
the new way is usable in all supported versions of Python. So as of today
(if I’m correct), anything that was needed only to support 3.5 can be dropped.

> I'm not volunteering to do this, I don't even know that it's actually
> feasible.


It’s clearly feasible: if the transition from 2 to 3 could be done, this is
easy :-)

Not that I’m volunteering either.

But maybe the folks who find updating deprecated features onerous might
want to do it (or already have; I haven’t looked).

> Deprecated code does normally more or less work, and often
> it never gets close to dangerous behavior.  On the flip side, it often
> can cause dangerous behavior,


I’m confused: did you mean “sometimes cause dangerous behavior”? That’s
pretty rare, isn’t it?

-CHB


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UPOHK3SKHZZHNBRCCHNZQ24AT5L252NW/


[Python-Dev] Re: Having Sorted Containers in stdlib?

2021-11-13 Thread Tim Peters
I started writing up a SortedDict use case I have, but it's very
elaborate and I expect it would just end with endless pointless
argument about other approaches I _could_ take. But I already know all
those ;-)

So let's look at something conceptually "dead easy" instead: priority
queues. They're a basic building block in algorithms from many fields.

In the distributed Python, `heapq` is the only thing that comes close
to being reasonably efficient and scalable for that.

However:

- It only supports min-heaps. It's not that max-heaps aren't also
useful (indeed, _under the covers_ CPython also implements max-heaps
for its own internal use). It's more that the original API simply
doesn't generalize cleanly.

- So there's a body of word-of-mouth "tricks" for making a min-heap
"act like" a max-heap. For integer or float priorities, it suffices to
use the negation of "the real" priority. For other kinds of values, it
gets trickier. In the end, you can wrap each item in a class that
implements `x.__lt__(y)` by doing `y.value < x.value`. But few people
appear to be aware of that, while nobody wants to endure the bother
;-)

- It's often the case that "the priority" needs to be computed from an
item's value. `heapq` has no direct support for that. Instead there's
another body of word-of-mouth tricks akin to the old
decorate-sort-undecorate approach sorting used. Rather than store the
values in the heap, you store `(priority, value)` tuples. But then
various unpleasant things can happen if two priorities are tied: in
that case tuple comparison falls back to comparing the values. But,
e.g., value comparison may be very expensive, or the values may not
support __lt__ at all. So instead you store `(priority,
unique_integer, value)` triples. And hope that you don't really need a
max-heap, lest it get even more obscure.
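The word-of-mouth `heapq` tricks described above can be written out as a minimal sketch. `Reversed` and the sample data are made-up names for illustration. The counter-based tie-breaker is the same idea the `heapq` documentation's priority-queue notes suggest.

```python
import heapq
import itertools

# Trick 1: max-heap for numeric priorities by negating them.
scores = [3, 1, 4, 1, 5]
negated = [-s for s in scores]
heapq.heapify(negated)
largest = -heapq.heappop(negated)            # 5


# Trick 2: max-heap for any ordered values, via a wrapper whose __lt__
# is reversed (x < y means y.value < x.value). "Reversed" is a made-up name.
class Reversed:
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return other.value < self.value


words = [Reversed(w) for w in ["pear", "apple", "zebra"]]
heapq.heapify(words)
last_word = heapq.heappop(words).value       # "zebra"

# Why (priority, value) pairs are fragile: on a tie, tuple comparison
# falls back to the values, and dicts do not support "<".
pairs = []
heapq.heappush(pairs, (1, {"name": "a"}))
try:
    heapq.heappush(pairs, (1, {"name": "b"}))
    tie_failed = False
except TypeError:
    tie_failed = True                        # True

# Trick 3: (priority, unique_integer, value) triples; the counter breaks
# every tie, so the values themselves are never compared.
counter = itertools.count()
tasks = []
for task in [{"name": "b"}, {"name": "a"}, {"name": "c"}]:
    priority = len(task["name"])             # all three priorities tie
    heapq.heappush(tasks, (priority, next(counter), task))
first = heapq.heappop(tasks)[2]["name"]      # "b": insertion order wins ties
```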

Using SortedContainers, none of those are issues. It's all
straightforward and obvious. If you have a list L always sorted by
priority, extracting from a "min queue" just requires L.pop(0), or
from a "max queue" L.pop(-1) (or just L.pop(), since -1 is the default
index, as for the Python list.pop()). No tricks are needed. Fine too if
sometimes you want to extract the min, while at other times the max.
Or, e.g., peek at the 5 highest-priority values and pick the one that
best fits resources available at the time.

In cases of computed priorities, the SortedKeyList constructor allows
specifying a function to be used to compute the keys used to determine
the list's ordering. Again straightforward and obvious.

from sortedcontainers import SortedKeyList
L = SortedKeyList([(i, j) for i in range(1, 5)
                          for j in range(1, 5)],
                  key=lambda t: t[0]/t[1])

>>> [t for t in L]
[(1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 4), (1, 1),
 (2, 2), (3, 3), (4, 4), (4, 3), (3, 2), (2, 1), (4, 2),
 (3, 1), (4, 1)]

That said, in most of my code I end up _removing_ uses of
SortedContainers, in favor of faster ways of getting the job done. The
package isn't the last resort for me, it's the first. I can write code
in straightforward ways that scale well from the start. If I need more
speed later, fine, I can puzzle out faster ways to get the task done.
If you're proud that you _start_ by plotting ways to minimize the
number of sorts you do, you're coding in C++, not Python ;-)

So I suggest people are staring at the wrong end when asking for use
cases that can't be done without the package. Those are necessarily
non-trivial. Having a sorted collection is "more than good enough" for
many tasks. As above, e.g., there's no reason to imagine that Grant
Jenks had priority queues in mind at all, yet the flavors of sorted
lists he implemented are very easy to use for that task.
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XXAXVV3KJ4ORQ7K6ELSS4R7S5725ASRE/


[Python-Dev] Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-13 Thread ptmcg
I’ve not been following the thread, but Steve Holden forwarded me the email
from Petr Viktorin so that I might share some of the info I found while
recently diving into this topic.

As part of our work on the next edition of “Python in a Nutshell” with Steve,
Alex Martelli, and Anna Ravenscroft, Alex suggested that I add a cautionary
section on homoglyphs, specifically citing “A” (LATIN CAPITAL LETTER A) and
“Α” (GREEK CAPITAL LETTER ALPHA) as an example problem pair. I wanted to look
a little further at the use of characters beyond standard 7-bit ASCII in
identifiers, and in doing so I ran into some of these same issues with Unicode
NFKC normalization. The first discovery was that “ªº” normalizes to the same
identifier as “ao”. This was quite a shock to me, since I had assumed that
allowing Unicode in identifiers would preserve the uniqueness of the different
code points. Even ligatures can be used, and they overlap with their
multi-character ASCII forms. So we have added a second note in the upcoming
edition on the risks of using these “homonorms” (a word I just made up for the
occasion).
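The behavior described above can be shown with a minimal sketch using only the standard `unicodedata` module and `exec`. PEP 3131 specifies that identifiers are NFKC-normalized. As a result, “ªº” and the “ﬁ” ligature collapse into ASCII names, while the Greek/Latin homoglyph pair stays two distinct names. The sample names are chosen for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return the NFKC form of text, the normalization PEP 3131 applies to identifiers."""
    return unicodedata.normalize("NFKC", text)


print(nfkc("\u00aa\u00ba"))        # ao    (ordinal indicators ª and º)
print(nfkc("\ufb01le"))            # file  (U+FB01 LATIN SMALL LIGATURE FI)
print(nfkc("\u0391") == "A")       # False (GREEK CAPITAL LETTER ALPHA is kept)

# The same folding happens when source code is compiled: the first two
# assignments bind the ASCII names "ao" and "file", while Greek Α and
# Latin A remain two separate variables.
namespace = {}
exec("\u00aa\u00ba = 1\n\ufb01le = 2\n\u0391 = 3\nA = 4\n", namespace)
print(namespace["ao"], namespace["file"], namespace["\u0391"], namespace["A"])  # 1 2 3 4
```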

To explore the extreme case, I wrote a pyparsing transformer that converts
the identifiers in a body of Python source to a mix of Unicode “fonts”, so
that the result is equivalent to the original source after NFKC
normalization. Here are hello.py and a snippet from unittest/util.py:

def 𝚑𝓮𝖑𝒍𝑜():
    try:
        𝔥e𝗅𝕝𝚘︴ = "Hello"
        𝕨𝔬r𝓵ᵈ﹎ = "World"
        ᵖ𝖗𝐢𝘯𝓽(f"{𝗵e𝓵𝔩º_}, {𝖜ₒ𝒓lⅆ︴}!")
    except 𝓣𝕪ᵖe𝖤𝗿ᵣ𝖔𝚛 as ⅇ𝗑c:
        𝒑rℹₙₜ("failed: {}".𝕗𝗼ʳᵐªt(ᵉ𝐱𝓬))

if _︴ⁿ𝓪𝑚𝕖__ == "__main__":
    𝒉eℓˡ𝗈()


# snippet from unittest/util.py

_𝓟Ⅼ𝖠𝙲𝗘ℋ𝒪Lᴰ𝑬𝕽﹏𝕷𝔼𝗡 = 12

def _𝔰ʰ𝓸ʳ𝕥𝙚𝑛(𝔰, p𝑟𝔢fi𝖝𝕝𝚎𝑛, sᵤ𝑓𝗳𝗂𝑥𝗹ₑ𝚗):
    ˢ𝗸i𝗽 = 𝐥e𝘯(𝖘) - pr𝚎𝖋𝐢x𝗅ᵉ𝓷 - 𝒔𝙪ffi𝘅𝗹𝙚ₙ
    if ski𝘱 > _𝐏𝗟𝖠𝘊𝙴H𝕺L𝕯𝙀𝘙﹏L𝔈𝒩:
        𝘴 = '%s[%d chars]%s' % (𝙨[:𝘱𝐫𝕖𝑓𝕚xℓ𝒆𝕟], ₛ𝚔𝒊p, 𝓼[𝓁𝒆𝖓(𝚜) - 𝙨𝚞𝒇fix𝙡ᵉ𝘯:])
    return ₛ

You should be able to paste these into your local UTF-8-aware editor or IDE
and execute them as-is.
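Here is a sketch of the reverse direction. This is not Paul's transformer, which is described as still private. Because CPython NFKC-normalizes identifiers while parsing, round-tripping the util.py snippet through `ast.parse` and `ast.unparse` (Python 3.9+) prints the names the interpreter actually binds. The helper name `as_interpreter_sees_it` is hypothetical, and comments and original formatting are lost in the round trip.

```python
import ast

# The util.py snippet, copied verbatim (identifiers in mixed "fonts").
OBFUSCATED = """\
_𝓟Ⅼ𝖠𝙲𝗘ℋ𝒪Lᴰ𝑬𝕽﹏𝕷𝔼𝗡 = 12

def _𝔰ʰ𝓸ʳ𝕥𝙚𝑛(𝔰, p𝑟𝔢fi𝖝𝕝𝚎𝑛, sᵤ𝑓𝗳𝗂𝑥𝗹ₑ𝚗):
    ˢ𝗸i𝗽 = 𝐥e𝘯(𝖘) - pr𝚎𝖋𝐢x𝗅ᵉ𝓷 - 𝒔𝙪ffi𝘅𝗹𝙚ₙ
    if ski𝘱 > _𝐏𝗟𝖠𝘊𝙴H𝕺L𝕯𝙀𝘙﹏L𝔈𝒩:
        𝘴 = '%s[%d chars]%s' % (𝙨[:𝘱𝐫𝕖𝑓𝕚xℓ𝒆𝕟], ₛ𝚔𝒊p, 𝓼[𝓁𝒆𝖓(𝚜) - 𝙨𝚞𝒇fix𝙡ᵉ𝘯:])
    return ₛ
"""


def as_interpreter_sees_it(source: str) -> str:
    """Round-trip source through the AST (ast.unparse needs Python 3.9+).

    The parser NFKC-normalizes identifiers while building the AST, so the
    unparsed text shows the names that are actually bound.
    """
    return ast.unparse(ast.parse(source))


plain = as_interpreter_sees_it(OBFUSCATED)
print(plain)

# Executing the obfuscated text binds the normalized names.
namespace = {}
exec(OBFUSCATED, namespace)
print(namespace["_shorten"]("abcdefghijklmnopqrw", 2, 2))  # ab[15 chars]rw
```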

(If this doesn’t come through, you can also see it as a GitHub gist titled
“Hello, World rendered in a variety of Unicode characters” (github.com). I
have a second gist containing the transformer, but it is still private at
the moment.)

Some other discoveries:

“·” (U+00B7 MIDDLE DOT, code point 183; not an ASCII character) is a valid
identifier body character, making “_···” a valid Python identifier. This
could be another attack point: “s·join(‘x’)” could easily be misread as
“s.join(‘x’)”, but would actually be a call to a potentially malicious
function named “s·join”.

“_” seems to be a special case. Only the ASCII “_” character is valid as a
leading identifier character; the Unicode characters that normalize to “_”
(any of the characters in “︳︴﹍﹎﹏_”) can only be used as identifier body
characters. “︳” especially could be misread as “|” followed by a space, when
it actually normalizes to “_”.
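Both observations can be checked with a small sketch using `str.isidentifier()` and `unicodedata`. The explanation in the comments is standard Unicode/Python behavior rather than something stated in the message. The low-line variants are connector punctuation (category Pc), which identifier rules allow only after the first character, and Python's grammar adds the ASCII “_” to the allowed start characters explicitly. The six code points listed are an assumption about which characters the message meant.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"   # U+00B7 MIDDLE DOT, code point 183 (Latin-1, not ASCII)
# Low-line variants that NFKC folds to "_" (assumed to be the ones meant above).
LOW_LINES = "\ufe33\ufe34\ufe4d\ufe4e\ufe4f\uff3f"

print(("s" + MIDDLE_DOT + "join").isidentifier())   # True
print(("_" + MIDDLE_DOT * 3).isidentifier())        # True

for ch in LOW_LINES:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),                    # Pc: connector punctuation
        unicodedata.normalize("NFKC", ch) == "_",    # True: folds to "_"
        (ch + "x").isidentifier(),                   # False: cannot start a name
        ("x" + ch).isidentifier(),                   # True: can continue one
    )

# "s·join('x')" calls a function named s·join, not the str method join.
namespace = {}
exec("s = 'ab'\ns\u00b7join = lambda arg: 'not str.join'\nr = s\u00b7join('x')\n",
     namespace)
print(namespace["r"])   # not str.join
```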

Potential beneficial uses:

I am considering taking my transformer code and experimenting with an
orthogonal approach to syntax highlighting, using Unicode character groups
instead of colors: module names in characters from one group, builtins from
another, program variables from another, and maybe distinguishing local from
global variables. Colorizing has always been the obvious syntax-highlighting
feature, but it is an accessibility issue for those who have difficulty
distinguishing colors. Unlike the “ransom note” code above, code highlighted
this way might even be quite pleasing to the eye.

-- Paul McGuire

___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GBLXJ2ZTIMLBD2MJQ4VDNUKFFTPPIIMO/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-13 Thread Stestagg
This is my favourite version of the issue:

е = lambda е, e: е if е > e else e
print(е(2, 1), е(1, 2)) # python 3 outputs: 2 2

https://twitter.com/stestagg/status/685239650064162820?s=21
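Here is a minimal sketch of how a reviewer or linter might surface this. It rebuilds the example with explicit escapes so the two letters are unambiguous, then lists every non-ASCII identifier with its Unicode character names using the standard `tokenize` and `unicodedata` modules. The function name `non_ascii_names` is hypothetical. Flagging all non-ASCII names is a blunt policy chosen for illustration, not a real confusables check.

```python
import io
import tokenize
import unicodedata

# The example above, rebuilt with escapes: U+0435 CYRILLIC SMALL LETTER IE
# versus the ASCII letter "e".
IE = "\u0435"
SOURCE = (
    f"{IE} = lambda {IE}, e: {IE} if {IE} > e else e\n"
    f"result = ({IE}(2, 1), {IE}(1, 2))\n"
)


def non_ascii_names(source):
    """Return {identifier: [Unicode character names]} for non-ASCII NAME tokens."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found[tok.string] = [unicodedata.name(ch) for ch in tok.string]
    return found


print(non_ascii_names(SOURCE))   # {'е': ['CYRILLIC SMALL LETTER IE']}

namespace = {}
exec(SOURCE, namespace)
print(namespace["result"])       # (2, 2)
```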

Steve

___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/664TQW7KCLKQIMELI4VSP6LRDUWBOVRJ/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-13 Thread Terry Reedy

On 11/13/2021 4:35 PM, pt...@austin.rr.com wrote:
> I’ve not been following the thread, but Steve Holden forwarded me the
>
> To explore the extreme case, I wrote a pyparsing transformer to convert
> identifiers in a body of Python source to mixed font, equivalent to the
> original source after NFKC normalization. Here are hello.py, and a
> snippet from unittest/utils.py:
>
> def 𝚑𝓮𝖑𝒍𝑜():
>     try:
>         𝔥e𝗅𝕝𝚘︴ = "Hello"
>         𝕨𝔬r𝓵ᵈ﹎ = "World"
>         ᵖ𝖗𝐢𝘯𝓽(f"{𝗵e𝓵𝔩º_}, {𝖜ₒ𝒓lⅆ︴}!")
>     except 𝓣𝕪ᵖe𝖤𝗿ᵣ𝖔𝚛 as ⅇ𝗑c:
>         𝒑rℹₙₜ("failed: {}".𝕗𝗼ʳᵐªt(ᵉ𝐱𝓬))
>
> if _︴ⁿ𝓪𝑚𝕖__ == "__main__":
>     𝒉eℓˡ𝗈()
>
> # snippet from unittest/util.py
>
> _𝓟Ⅼ𝖠𝙲𝗘ℋ𝒪Lᴰ𝑬𝕽﹏𝕷𝔼𝗡 = 12
>
> def _𝔰ʰ𝓸ʳ𝕥𝙚𝑛(𝔰, p𝑟𝔢fi𝖝𝕝𝚎𝑛, sᵤ𝑓𝗳𝗂𝑥𝗹ₑ𝚗):
>     ˢ𝗸i𝗽 = 𝐥e𝘯(𝖘) - pr𝚎𝖋𝐢x𝗅ᵉ𝓷 - 𝒔𝙪ffi𝘅𝗹𝙚ₙ
>     if ski𝘱 > _𝐏𝗟𝖠𝘊𝙴H𝕺L𝕯𝙀𝘙﹏L𝔈𝒩:
>         𝘴 = '%s[%d chars]%s' % (𝙨[:𝘱𝐫𝕖𝑓𝕚xℓ𝒆𝕟], ₛ𝚔𝒊p, 𝓼[𝓁𝒆𝖓(𝚜) - 𝙨𝚞𝒇fix𝙡ᵉ𝘯:])
>     return ₛ
>
> You should able to paste these into your local UTF-8-aware editor or IDE
> and execute them as-is.


Wow.  After pasting the util.py snippet into current IDLE, which on my 
Windows machine* displays the complete text:


>>> dir()
['_PLACEHOLDER_LEN', '__annotations__', '__builtins__', '__doc__', 
'__loader__', '__name__', '__package__', '__spec__', '_shorten']

>>> _shorten('abc', 1, 1)
'abc'
>>> _shorten('abcdefghijklmnopqrw', 2, 2)
'ab[15 chars]rw'

* Does not at all work in CommandPrompt, even after supposedly changing 
to a utf-8 codepage with 'chcp 65000'.


--
Terry Jan Reedy
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NSGBCZQ2R6G2HGPAID4ZI35YCRMF7ERC/