Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Andrew Barnert via Python-Dev
On Feb 9, 2016, at 20:17, Stephen J. Turnbull wrote:

>> It really requires going through all the OS calls and either (a) making 
>> them consistently decode bytes to str using the declared FS encoding 
>> (currently 'mbcs', but I see no reason we can't make it 'utf_8'),
> 
> If it were that easy, it would have been done two decades ago.  I'm no
> fan of Windows[1], but it's obvious that Microsoft has devoted
> enormous amounts of brainpower to the problem of encoding
> rationalization since the early 90s.  I don't think they would have
> missed this idea.

Microsoft spent a lot of time and effort on the idea that UTF-16 (or, 
originally, UCS-2) everywhere was the answer. Never call the A functions (or 
the msvcrt functions that emulate the C and POSIX stdlib), and there's never a 
problem. What if you read filenames out of a text file? No problem; text files 
are UTF-16-BOM. Over a socket? All network protocols are also UTF-16. What if 
you have to read a file written in Unix? Come on, nobody's ever created a 
useful file without Windows. What about Windows 3.1? Uh... that's a problem. 
Also, what happens when Unicode goes over 64k characters? And so on. So their 
grand project failed.

That doesn't mean the problem can't be solved. Apple solved their equivalent 
problem, albeit by sacrificing backward compatibility in a way Microsoft can't 
get away with. I haven't seen a MacRoman or Shift-JIS filename since they broke 
the last holdout (the low-level AppleEvent interface) in 10.7--and most of the 
apps I was using back then don't run on 10.10 without an update. So Python 2 
works great on Macs, whether you use bytes or unicode. But that doesn't help us 
on Windows, where you can't use bytes, or Linux, where you can't use Unicode 
(without surrogate escape or some other mechanism that Python 2 doesn't have).
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Stephen J. Turnbull
Executive summary:

Code pages and POSIX locales aren't solutions, they're the Original Sin.

Steve Dower writes:
 > On 09Feb2016 2017, Stephen J. Turnbull wrote:

 > >   > The problem here is the protocol that Python uses to return
 > >   > bytes paths, and that protocol is inconsistent between APIs
 > >   > and information is lost.
 > >
 > > No, the problem is that the necessary information simply isn't always
 > > available.
 > 
 > But if we return bytes paths and the user passes them back in unchanged, 
 > that should be irrelevant.

Yes.  That's pretty much exactly the semantics of using the latin-1
codec.  UTF-8 can't do that without surrogateescape, which Python 2 lacks.
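A minimal Python 3 sketch of the two lossless bytes-to-str round trips mentioned above. Python 2 has no surrogateescape handler, so this only runs on 3.x; the sample bytes and helper names are illustrative, not from the thread:

```python
def roundtrip_latin1(raw: bytes) -> bytes:
    # latin-1 maps each byte 0x00-0xFF to U+0000-U+00FF one-to-one,
    # so decoding never fails and encoding restores the original bytes.
    return raw.decode("latin-1").encode("latin-1")


def roundtrip_utf8_surrogateescape(raw: bytes) -> bytes:
    # Bytes that are not valid UTF-8 become lone surrogates U+DC80-U+DCFF
    # on decode, and are turned back into the same bytes on encode.
    return raw.decode("utf-8", "surrogateescape").encode("utf-8", "surrogateescape")


raw = b"caf\xe9 \xff"  # not valid UTF-8: 0xE9 is not followed by continuation bytes
assert roundtrip_latin1(raw) == raw
assert roundtrip_utf8_surrogateescape(raw) == raw
assert raw.decode("utf-8", "surrogateescape") == "caf\udce9 \udcff"
```

Both are lossless, but latin-1 turns any name that really is UTF-8 into mojibake ("é" comes back as "Ã©"), which is why surrogateescape is the better fit for UTF-8 systems.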

 > The earlier issue was that that doesn't work (e.g. a bytes path
 > from os.scandir couldn't be passed back into open()).

My purely-from-the-user-side take is that that's just a bug in
os.scandir that should be fixed, and that even though the complexity
that occasions such bugs is an undesirable aspect of Python (v2)
programming, it's not a bug because it *can't* be fixed -- you have to
fix the world, not Python.  Or switch to Python 3.

I don't know enough to have an opinion on whether "fixing" os.scandir
could cause other problems.

 > I meant with Python's calls into the API. Anywhere Python does the 
 > conversion from bytes to LPCWSTR (the UTF-16 type) there's a chance 
 > it'll be wrong.

Indeed.  That's why converting the bytes is often the wrong thing to
do *period*.  The reasons that Python might be wrong apply to every
agent that might decide the conversion -- except the user; the user is
never wrong about these things.
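To make the point concrete, here is a small Python 3 sketch (the byte string and the choice of code pages are illustrative assumptions) showing that the same filename bytes decode to different text depending on which code page the converting agent assumes:

```python
raw = b"\x83n"  # "ハ" as written by a program running under code page 932 (Japanese)

candidates = {}
for codepage in ("cp932", "cp1252", "utf-8"):
    try:
        candidates[codepage] = raw.decode(codepage)
    except UnicodeDecodeError:
        candidates[codepage] = None  # not even decodable under this assumption

# cp932 recovers the intended katakana, cp1252 silently yields "ƒn",
# and strict UTF-8 rejects 0x83 (a bare continuation byte).
print(candidates)
```

Nothing in the bytes themselves says which of these answers is right; only the person who wrote them knows.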

 > Microsoft's solution here is the user's active code page, much like 
 > *nix's solution as I understand it, except that where *nix will convert 
 > _to_ the encoding as a normalized form, Windows will convert _from_ the 
 > encoding to its UTF-16 "normalized" form.

Not quite accurate.  Unix by original design doesn't *have* a
normalized form.[1] Bytez-iz-bytez-R-Us, that's Unix.  Recently
everybody (except for a few nationalist lunatics and the unteachables
in some legislatures) has learned that some form of Unicode is the way
to go internally.  But that's "best practice", not POSIX requirement,
and tons of software continues to operate[2] based on the assumption
that users are monolingual with a canonical one-byte encoding, so it
doesn't matter as long as *no conversion is ever done*, and the input
methods and fonts are consistent with each other.  Code pages just try
to *enforce* that constraint (and as I already mentioned, that pissed
me off so much in 1990 that I'm still a Windows refusenik today).

 > Back-compat concerns have prevented any significant changes being
 > made here, otherwise there wouldn't be a 'bytes' interface at
 > all.

It's not just back-compat, it's absolutely necessary in a code-page-
based world because you just can't be sure what encoding your content
is in until the user tells you the crap you've spewed on her screen
might be Klingon, but it's not any of the 7 human languages she knows.
"Toto!  I don't think we're in Kansas any more"  The fact is that
code-page-based content continues to be produced in significant
quantities, despite the universal availability and absolute
superiority (except for workstation reconfiguration costs) of Unicode.


Footnotes: 
[1]  The POSIX locale selects encodings for console input and output.
File I/O is just bytes, both the content and the file name.  The code
page also defines the file name encoding as I understand it.

[2]  I would hope that nobody is *writing* software like that any
more, but I live in Japan.  That hope is years in the future for me.



Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Paul Moore
On 10 February 2016 at 08:00, Stephen J. Turnbull wrote:
> > The earlier issue was that that doesn't work (e.g. a bytes path
> > from os.scandir couldn't be passed back into open()).
>
> My purely-from-the-user-side take is that that's just a bug in
> os.scandir that should be fixed, and that even though the complexity
> that occasions such bugs is an undesirable aspect of Python (v2)
> programming, it's not a bug because it *can't* be fixed -- you have to
> fix the world, not Python.  Or switch to Python 3.
>
> I don't know enough to have an opinion on whether "fixing" os.scandir
> could cause other problems.

The original os.scandir issue was encountered on Python 3. And I do
agree with Victor that the correct answer was to point out to the user
that they should be using unicode/surrogateescape. What I disagree
with is mandating that (by removing the bytes interface) on anything
other than all platforms at once, because that doesn't remove the
problem (of coders using the wrong approach on Python 3) it just makes
the code such users write non-portable.

Whether removing the bytes interface is feasible, given that there's
then no way that works across Python 2 and 3 of writing code that
manipulates the sort of bytes-that-use-multiple-encodings data that
you mention, is a separate issue.

Paul


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Victor Stinner
2016-02-10 9:30 GMT+01:00 Paul Moore:
> Whether removing the bytes interface is feasible, given that there's
> then no way that works across Python 2 and 3 of writing code that
> manipulates the sort of bytes-that-use-multiple-encodings data that
> you mention, is a separate issue.

It's annoying that 8 years after the release of Python 3.0, Python 3
is still stuck by Python 2 :-(

Victor


Re: [Python-Dev] Experiences with Creating PEP 484 Stub Files

2016-02-10 Thread Phil Thompson

> On 9 Feb 2016, at 11:48 pm, Guido van Rossum wrote:
> 
> [Phil]
>>>> I found the documentation confusing regarding Optional. Intuitively it 
>>>> seems to be the way to specify arguments with default values. However it 
>>>> is explained in terms of (for example) Union[str, None] and I (intuitively 
>>>> but incorrectly) read that as meaning "a str or None" as opposed to "a str 
>>>> or nothing".
> [me]
>>> But it *does* mean 'str or None'. The *type* of an argument doesn't
>>> have any bearing on whether it may be omitted from the argument list
>>> by the caller -- these are orthogonal concepts (though sadly the word
>>> optional might apply to both). It's possible (though unusual) to have
>>> an optional argument that must be a str when given; it's also possible
>>> to have a mandatory argument that may be a str or None.
> [Phil]
>> In the case of Python wrappers around a C++ library then *every* optional 
>> argument will have to have a specific type when given.
> 
> IIUC you're saying that every argument that may be omitted must still
> have a definite type other than None. Right? In that case just don't
> use Optional[]. If a signature has the form
> 
> def foo(a: str = 'xyz') -> str: ...
> 
> then this means that a may be omitted or it may be a str -- you
> cannot call foo(a=None).
> 
> You can even (in a stub file) write this as:
> 
> def foo(a: str = ...) -> str: ...
> 
> (literal '...' i.e. ellipsis) if you don't want to commit to a
> specific default value (it makes no difference to mypy).
> 
>> So you are saying that a mandatory argument that may be a str or None would 
>> be specified as Union[str, None]?
> 
> Or as Optional[str], which means the same.
> 
>> But the docs say that that is the underlying implementation of Option[str] - 
>> which (to me) means an optional argument that should be a string when given.
> 
> (Assuming you meant Option*al*.) There seems to be an utter confusion
> of the two uses of the term "optional" here. An "optional argument"
> (outside PEP 484) is one that has a default value. The "Optional[T]"
> notation in PEP 484 means "Union[T, None]". They mean different
> things.
> 
>>> Can you help improve the wording in the docs (preferably by filing an 
>>> issue)?
>> 
>> When I eventually understand what it means...

I understand now. The documentation, as it stands, is correct and consistent 
but (to me) the meaning of Optional is completely counter-intuitive. What you 
suggest with str = ... is exactly what I need. Adding a section to the docs 
describing that should clear up the confusion.
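A small runnable sketch of the distinction Guido draws (the function names are hypothetical): a parameter with a default can be omitted, while Optional[T] only widens the accepted type to include None.

```python
from typing import Optional, Union


def greet(name: str = "world") -> str:
    # "Optional argument": callers may omit it, but when given it must be a str.
    return "hello " + name


def describe(label: Optional[str]) -> str:
    # Optional[str] type: the argument is mandatory (no default),
    # but an explicit None is an accepted value.
    return "<none>" if label is None else label


# Optional[str] is shorthand for Union[str, None].
assert Optional[str] == Union[str, None]
```

A type checker such as mypy would flag greet(None) but accept describe(None).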

Thanks,
Phil


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Paul Moore
On 10 February 2016 at 08:45, Victor Stinner wrote:
> 2016-02-10 9:30 GMT+01:00 Paul Moore:
>> Whether removing the bytes interface is feasible, given that there's
>> then no way that works across Python 2 and 3 of writing code that
>> manipulates the sort of bytes-that-use-multiple-encodings data that
>> you mention, is a separate issue.
>
> It's annoying that 8 years after the release of Python 3.0, Python 3
> is still stuck by Python 2 :-(

Agreed. Of course personally, I'm in favour of going Python 3/Unicode
everywhere, it's the Unix guys with their legacy distros and Python
installations and bytes-based filesystems that get in the way of that
:-) And I don't think we're brave enough to force *Unix* users to use
the right type for filenames :-)

Paul


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Andrew Barnert via Python-Dev
On Wednesday, February 10, 2016 12:47 AM, Victor Stinner wrote:

> > 2016-02-10 9:30 GMT+01:00 Paul Moore:
>>  Whether removing the bytes interface is feasible, given that there's
>>  then no way that works across Python 2 and 3 of writing code that
>>  manipulates the sort of bytes-that-use-multiple-encodings data that
>>  you mention, is a separate issue.

Well, there's a surrogate-escape backport on PyPI (I think there's a standalone 
one, and one in python-future), so you _could_ do everything the same as in 3.x.

Depending on what you're doing, you may also need to use the io module instead 
of file (which may just mean "from io import open", but could mean more work), 
wrap the stdio streams explicitly, manually decode argv, etc. But someone could 
write a six-like module (or add it to six) that does all of that. It may be a 
little slower and more memory-intensive in 2.7 than in 3.x, but for most apps, 
that doesn't matter. The big problem would be third-party libraries (and stdlib 
modules like csv) that want to use bytes in 2.x; convincing them all to support 
full-on-unicode in 2.x might be more trouble than it's worth. Still, if I were 
feeling the pain of maintaining lots of linux-bytes-Windows-unicode-2.7 code, 
I'd try it and see how far I get.

> It's annoying that 8 years after the release of Python 3.0, Python 3
> is still stuck by Python 2 :-(

I understand the frustration, but... time already goes too fast at my age; 
don't skip me ahead almost a whole year to December 2016. :)

Also, unless you're the one guy who actually abandoned 2.6 for 3.0, it's 
probably more useful to count from 2.7, 3.2, or the no-2.8 declaration, which 
are all about 5 years ago.


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Steven D'Aprano
On Wed, Feb 10, 2016 at 12:41:08PM +1100, Chris Angelico wrote:
> On Wed, Feb 10, 2016 at 12:37 PM, Steve Dower wrote:
> > I really don't like the idea of not being able to use bytes in cross
> > platform code. Unless it's become feasible to use Unicode for lossless
> > filenames on Linux - last I heard it wasn't.
> 
> It has, but only in Python 3 - anyone who needs to support 2.7 and
> arbitrary bytes in filenames can't use Unicode strings.

Are you sure? Unless I'm confused, which I may be, I don't think you 
can specify file names with arbitrary bytes in Python 3.


Writing, and reading, filenames including odd bytes works in Python 2.7:

[steve@ando ~]$ python -c 'open("/tmp/abc\xD8\x01", "w").write("Hello World\n")'
[steve@ando ~]$ ls /tmp/abc*
/tmp/abc??
[steve@ando ~]$ python -c 'print open("/tmp/abc\xD8\x01", "r").read()'
Hello World

[steve@ando ~]$


And I can read the file using bytes in Python 3:

[steve@ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
Hello World

[steve@ando ~]$


But Unicode fails:

[steve@ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abcØ\x01'



What Unicode string does one need to give in order to open file 
b"/tmp/abc\xD8\x01"? I think one would need to find a valid unicode 
string which, when encoded to UTF-8, gives the byte sequence \xD8\x01, 
but since that's half of a surrogate pair it is an illegal UTF-8 byte 
sequence. So I don't think it can be done.

Am I mistaken?




-- 
Steve


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Victor Stinner
2016-02-10 11:18 GMT+01:00 Steven D'Aprano:
> [steve@ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
> Hello World
>
> [steve@ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abcØ\x01'
>
> What Unicode string does one need to give in order to open file
> b"/tmp/abc\xD8\x01"?

Use os.fsdecode(b"/tmp/abc\xD8\x01") to get the filename as a Unicode
string; it will work.

Removing 'b' in front of byte strings is not enough to convert an
arbitrary byte string to Unicode :-D Encodings are more complex than
that... See http://unicodebook.readthedocs.org/
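A minimal sketch of why the str literal fails and os.fsdecode() works, assuming a POSIX system whose filesystem encoding is UTF-8 (where os.fsdecode() and os.fsencode() use the 'surrogateescape' error handler). The explicit codec calls stand in for those functions so the result does not depend on the local configuration:

```python
raw = b"/tmp/abc\xD8\x01"

# Dropping the b prefix gives a *different* name: "\xD8" in a str literal is
# U+00D8 (Ø), which UTF-8 encodes as the two bytes C3 98.
literal = "/tmp/abc\xD8\x01"
assert literal.encode("utf-8") == b"/tmp/abc\xc3\x98\x01"

# What os.fsdecode(raw) returns under the stated assumption: 0xD8 is not
# followed by a continuation byte, so it is escaped as the lone surrogate U+DCD8.
name = raw.decode("utf-8", "surrogateescape")
assert name == "/tmp/abc\udcd8\x01"

# Encoding back (what os.fsencode does) restores the exact original bytes.
assert name.encode("utf-8", "surrogateescape") == raw
```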

The problem on Python 2 is that the UTF-8 encoders encode surrogate
characters, which is wrong. You cannot use an error handler to choose
how to handle these surrogate characters.
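For comparison, Python 3's UTF-8 codec refuses lone surrogates by default and makes the caller pick an error handler explicitly. A small sketch (the helper name is illustrative):

```python
def encode_lone_surrogate(s: str, errors: str = "strict"):
    """Encode s as UTF-8 with the given error handler; None if refused."""
    try:
        return s.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


lone = "\udcff"  # a lone low surrogate, e.g. produced by surrogateescape from b"\xff"
assert encode_lone_surrogate(lone) is None                              # strict: refused
assert encode_lone_surrogate(lone, "surrogateescape") == b"\xff"        # original byte back
assert encode_lone_surrogate(lone, "surrogatepass") == b"\xed\xb3\xbf"  # 3-byte form, as Python 2 emits
```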

On Python 3, you have a wide choice of built-in error handlers, and you
can even write your own error handlers. Example with Python 3.6, using
the "namereplace" error handler (new in Python 3.5):

>>> import os
>>> def format_filename(filename, encoding='ascii', errors='backslashreplace'):
...     return filename.encode(encoding, errors).decode(encoding)
...

>>> print(format_filename(os.fsdecode(b'abc\xff')))
abc\udcff

>>> print(format_filename(os.fsdecode(b'abc\xff'), errors='replace'))
abc?

>>> print(format_filename(os.fsdecode(b'abc\xff'), errors='ignore'))
abc

>>> print(format_filename(os.fsdecode(b'abc\xff') + "é", errors='namereplace'))
abc\udcff\N{LATIN SMALL LETTER E WITH ACUTE}

My locale encoding is UTF-8.

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Serhiy Storchaka

On 08.02.16 16:32, Victor Stinner wrote:

> On Python 2, it wasn't possible to use Unicode for filenames, many
> functions fail badly with Unicode, especially when you mix bytes and
> Unicode.

Not even all os functions support Unicode.
See http://bugs.python.org/issue18695.




Re: [Python-Dev] Experiences with Creating PEP 484 Stub Files

2016-02-10 Thread Nick Coghlan
On 10 February 2016 at 06:54, Guido van Rossum wrote:
> [Just adding to Andrew's response]
>
> On Tue, Feb 9, 2016 at 9:58 AM, Andrew Barnert via Python-Dev wrote:
>> On Feb 9, 2016, at 03:44, Phil Thompson wrote:
>>>
>>> There are a number of things I'd like to express but cannot find a way to 
>>> do so...
>>>
>>> - objects that implement the buffer protocol
>>
>> That seems like it should be filed as a bug with the typing repo. Presumably 
>> this is just an empty type that registers bytes, bytearray, and memoryview, 
>> and third-party classes have to register with it manually?
>
> Hm, there's no way to talk about these in regular Python code either,
> is there? I think that issue should be resolved first. Probably by
> adding something to collections.abc. And then we can add the
> corresponding name to typing.py. This will take time though (have to
> wait for 3.6) so I'd recommend 'Any' for now (and filing those bugs).

Somewhat related, there's actually no way to export PEP 3118 buffers
directly from a type implemented in Python:
http://bugs.python.org/issue13797

Cython and PyPy each have their own approach to handling that, but
there's no language-level cross-interpreter convention.

A type (e.g. BytesLike, given the change we made to relevant error
messages) could still be added to collections.abc without addressing
that problem; it would just need to be empty and used only for
explicit registration without any structural typing support.
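A sketch of the registration-only approach described above, using the hypothetical name BytesLike (it is not an existing collections.abc class): an ABC with no abstract methods, populated purely by explicit register() calls.

```python
from abc import ABC


class BytesLike(ABC):
    """Hypothetical marker ABC for objects exporting the buffer protocol.

    It defines no abstract methods and no __subclasshook__, so isinstance()
    checks succeed only for classes that were explicitly registered.
    """


# Built-in buffer exporters are registered up front ...
for _cls in (bytes, bytearray, memoryview):
    BytesLike.register(_cls)


# ... and a third-party type must opt in the same way (register() returns
# the class, so it also works as a decorator).
@BytesLike.register
class ThirdPartyBuffer:
    pass
```

Because there is no structural check, an unregistered buffer exporter such as array.array is not recognised until someone registers it.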

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Stephen J. Turnbull
Victor Stinner writes:

 > It's annoying that 8 years after the release of Python 3.0, Python 3
 > is still stuck by Python 2 :-(

I prefer to think of it as the irritant that reminds me that I am very
much alive, and so is Python, vibrantly so.



Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Stephen J. Turnbull
Andrew Barnert via Python-Dev writes:

 > That doesn't mean the problem can't be solved. Apple solved their
 > equivalent problem, albeit by sacrificing backward compatibility in
 > a way Microsoft can't get away with. I haven't seen a MacRoman or
 > Shift-JIS filename since they broke the last holdout

If you lived where I do, you'd still be seeing both, because you
wouldn't be able to escape archival files on CD and removable media
(typically written on Windows boxen).  They still work, sort of ==
same as always, and as far as I know, that's because Apple has *not*
sacrificed backward compatibility: under the hood, Darwin is still a
POSIX kernel which thinks of file names and everything else outside of
memory as bytestreams.

One place they *fail very badly* is Shift JIS filenames in zipfiles,
which nothing provided by Apple can deal with safely, and InfoZip
breaks too (at least in MacPorts).  Yes, I know that is specifically
disallowed.  Feel free to tell 1__ Japanese Windows users.
Thank heaven for Python there!  A three-line hack and I'm free!
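The three-line hack itself isn't shown. One common approach (an assumption, not necessarily the code meant here) relies on Python 3's zipfile decoding member names that lack the UTF-8 flag (bit 0x800) as cp437, so re-encoding them to cp437 recovers the raw Shift JIS bytes:

```python
import zipfile


def member_names(zip_file, legacy_encoding="shift_jis"):
    """Return member names, re-decoding legacy (non-UTF-8) entries."""
    names = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:
                # UTF-8 flag set: zipfile already decoded the name correctly.
                names.append(info.filename)
            else:
                # No flag: zipfile decoded the raw bytes as cp437, which maps all
                # 256 byte values, so encoding back to cp437 recovers them exactly.
                names.append(info.filename.encode("cp437").decode(legacy_encoding))
    return names
```

On Python 3.11 and later, passing metadata_encoding="shift_jis" to zipfile.ZipFile should achieve the same without the manual round trip.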

 > So Python 2 works great on Macs, whether you use bytes or
 > unicode. But that doesn't help us on Windows, where you can't use
 > bytes, or Linux, where you can't use Unicode (without surrogate
 > escape or some other mechanism that Python 2 doesn't have).

You contradict yourself! ;-)



Re: [Python-Dev] Experiences with Creating PEP 484 Stub Files

2016-02-10 Thread Guido van Rossum
On Wed, Feb 10, 2016 at 1:11 AM, Phil Thompson wrote:
> I understand now. The documentation, as it stands, is correct and consistent 
> but (to me) the meaning of Optional is completely counter-intuitive. What you 
> suggest with str = ... is exactly what I need. Adding a section to the docs 
> describing that should clear up the confusion.

I tried to add some clarity to the docs with this paragraph:

   Note that this is not the same concept as an optional argument,
   which is one that has a default.  An optional argument with a
   default needn't use the ``Optional`` qualifier on its type
   annotation (although it is inferred if the default is ``None``).
   A mandatory argument may still have an ``Optional`` type if an
   explicit value of ``None`` is allowed.

Should be live on docs.python.org with the next push (I don't recall
the delay, at most a day IIRC).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Experiences with Creating PEP 484 Stub Files

2016-02-10 Thread Phil Thompson
On 10 Feb 2016, at 5:52 pm, Guido van Rossum wrote:
> 
> On Wed, Feb 10, 2016 at 1:11 AM, Phil Thompson wrote:
>> I understand now. The documentation, as it stands, is correct and consistent 
>> but (to me) the meaning of Optional is completely counter-intuitive. What 
>> you suggest with str = ... is exactly what I need. Adding a section to the 
>> docs describing that should clear up the confusion.
> 
> I tried to add some clarity to the docs with this paragraph:
> 
>   Note that this is not the same concept as an optional argument,
>   which is one that has a default.  An optional argument with a
>   default needn't use the ``Optional`` qualifier on its type
>   annotation (although it is inferred if the default is ``None``).
>   A mandatory argument may still have an ``Optional`` type if an
>   explicit value of ``None`` is allowed.
> 
> Should be live on docs.python.org with the next push (I don't recall
> the delay, at most a day IIRC).

That should do it, thanks. A followup question...

Is...

def foo(bar: str = Optional[str])

...valid? In other words, bar can be omitted, but if specified must be a str or 
None?

Thanks,
Phil


Re: [Python-Dev] Experiences with Creating PEP 484 Stub Files

2016-02-10 Thread Guido van Rossum
On Wed, Feb 10, 2016 at 10:01 AM, Phil Thompson wrote:
> On 10 Feb 2016, at 5:52 pm, Guido van Rossum wrote:
[...]
> That should do it, thanks. A followup question...
>
> Is...
>
> def foo(bar: str = Optional[str])
>
> ...valid? In other words, bar can be omitted, but if specified must be a str 
> or None?

The syntax you gave makes no sense (the default value shouldn't be a
type) but to do what your words describe you can do

def foo(bar: Optional[str] = ...): ...

That's literally what you would put in the stub file (the ... are
literal ellipses).

In a .py file you'd have to specify a concrete default value. If your
concrete default is neither str nor None you'd have to use cast(str,
default_value), e.g.

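from typing import Optional, cast

_NO_VALUE = object()  # marker

def foo(bar: Optional[str] = cast(str, _NO_VALUE)):
    if bar is _NO_VALUE:
        return 'omitted'  # called as foo()
    return repr(bar)      # foo(None) or foo('')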

Now the implementation can distinguish between foo(), foo(None) and foo('').

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Andrew Barnert via Python-Dev
On Wednesday, February 10, 2016 6:50 AM, Stephen J. Turnbull wrote:
> Andrew Barnert via Python-Dev writes:
> 
>>  That doesn't mean the problem can't be solved. Apple solved their
>>  equivalent problem, albeit by sacrificing backward compatibility in
>>  a way Microsoft can't get away with. I haven't seen a MacRoman or
>>  Shift-JIS filename since they broke the last holdout
> 
> If you lived where I do, you'd still be seeing both, because you
> wouldn't be able to escape archival files on CD and removable media
> (typically written on Windows boxen). They still work, sort of ==
> same as always, and as far as I know, that's because Apple has *not*
> sacrificed backward compatibility: under the hood, Darwin is still a
> POSIX kernel which thinks of file names and everything else outside of
> memory as bytestreams.


Sure, but the Darwin kernel can't read CDs; that's up to the CD filesystem 
driver.


Anyway, Windows CDs can't cause this problem. Windows CDs use the Joliet 
filesystem,[^1] which stores everything in UCS2.[^2] When you call CreateFileA 
or fopen or _open with bytes, Windows decodes those bytes and stores them as 
UCS2. The filesystem drivers on POSIX platforms have to encode that UCS2 to 
_something_ (POSIX APIs make it very hard for you to deal with filename strings
like "A\0B\0C\0.\0T\0X\0T\0\0\0"...). The linux driver uses a mount option to decide
how to encode; the OS X driver always uses UTF-8. And every valid UCS2 string 
can be encoded as UTF-8, so you can use unicode everywhere, even in Python 2.
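A quick illustration of the point about POSIX APIs, using the ABC.TXT name from the text (little-endian, as in that example): the UTF-16 form is full of NUL bytes, which a NUL-terminated POSIX path cannot carry, while the UTF-8 re-encoding a driver might produce contains none.

```python
name = "ABC.TXT"

utf16 = name.encode("utf-16-le")  # little-endian, no BOM
utf8 = name.encode("utf-8")

# Every ASCII character gains a zero high byte in UTF-16-LE ...
assert utf16 == b"A\0B\0C\0.\0T\0X\0T\0"
# ... so a C string API would stop at the very first NUL and see only "A".
assert utf16.split(b"\0", 1)[0] == b"A"

# UTF-8 has no NUL bytes for non-NUL characters and decodes back losslessly.
assert b"\0" not in utf8
assert utf8.decode("utf-8") == name
```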

Of course you can have mojibake problems, but that's a different issue,[^3] and 
no worse with unicode than with bytes.[^4]
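The round trip described in footnotes 3 and 4 can be checked directly. A small sketch, assuming the bytes were Shift-JIS and were misread as cp437 (the IBM PC code page whose characters match the mojibake shown; Python's codec names are shift_jis and cp437):

```python
intended = "ハローワールド"

# The Shift-JIS bytes the program sent, misread as cp437 text on the way to UCS2.
stored = intended.encode("shift_jis").decode("cp437")
assert stored == "ânâìü[âÅü[âïâh"

# Footnote 3 (str API): undo the misreading.
assert stored.encode("cp437").decode("shift_jis") == intended

# Footnote 4 (bytes API on a Mac): the driver hands you UTF-8 bytes of the mojibake.
as_bytes = stored.encode("utf-8")
assert as_bytes.decode("utf-8").encode("cp437").decode("shift_jis") == intended
```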

The same thing is true with NTFS external drives, VFAT USB drives, etc. 
Generally, it's not Windows media on *nix systems that break Python 2 
unicode; it's native *nix filesystems where users mix locales.

> One place they *fail very badly* is Shift JIS filenames in zipfiles,
> which nothing provided by Apple can deal with safely, and InfoZip
> breaks too (at least in MacPorts). Yes, I know that is specifically
> disallowed. Feel free to tell 1__ Japanese Windows users.

The good news is, as far as I can tell, it's not disallowed anymore.[^5] So we 
just have to tell them that they shouldn't have been doing it in the past. :)

Anyway, zipfiles are data files as far as the OS is concerned; the fact that 
they contain filenames is no more relevant to the kernel (or filesystem driver 
or userland) than the fact that "List of PDFs to Read This Weekend.txt" 
contains filenames.

PS, everything Apple provides is already using Info-ZIP.


>>  So Python 2 works great on Macs, whether you use bytes or
>>  unicode. But that doesn't help us on Windows, where you can't use
>>  bytes, or Linux, where you can't use Unicode (without surrogate
>>  escape or some other mechanism that Python 2 doesn't have).
> 
> You contradict yourself! ;-)

Yes, as I later realized, sometimes, you _can_ (or at least ought to be able 
to--I haven't actually tried) use Python 2 with unicode everywhere to write 
cross-platform software that actually works on linux, by using backports of 
surrogate-escape and pathlib, and the io module instead of the file type, as 
long as you only need stdlib and third-party modules that support unicode 
filenames. If that does work for at least some apps, then I'm perfectly happy 
to have been wrong earlier. And if catching myself before someone else did 
makes me a flip-flopper, well, I'm not running for president. :P


  [^1]: Except when Vista and 7 mistakenly think your CD is a DVD and use UDF 
instead of ISO9660--but in that case, the encoding is stored in the filesystem 
header, so it's also not a problem.

  [^2]: Actually, despite Microsoft's spec, later versions of Windows store 
UTF-16, even if there are surrogate pairs, or BMP-but-post-UCS2 code points. 
But that doesn't matter here; the linux, Mac, etc. drivers all assume UTF-16, 
which works either way.

  [^3]: Say you write a program that assumes it will only be run on Shift-JIS 
systems, and you use CreateFileA to create a file named "ハローワールド". The actual 
bytes you're sending are cp437 for "ânâìü[âÅü[âïâh", so the file on the CD is
named, in Unicode, "ânâìü[âÅü[âïâh". So of course the Mac driver encodes that 
to UTF-8 b"ânâìü[âÅü[âïâh". You won't have any problems opening what you 
readdir, or what you copy from a UTF-8 terminal or a UTF-16 Cocoa app like 
Finder, etc. But of course you will have trouble getting your user to recognize 
that name as meaningful, unless you can figure out or guess or prompt the user 
to guess that it needs to be passed through 
s.encode('cp437').decode('shift-jis'). 

  [^4]: Your locale is always UTF-8 on Mac. So the only significant difference 
is that if you're using bytes, you need 
b.decode('utf-8').encode('cp437').decode('shift-jis') to fix the problem.

  [^5]: Zipfiles using the Unicode extension can store a UTF-8 transcoding 
along with

[Python-Dev] why we have both re.match and re.string?

2016-02-10 Thread Luca Sangiacomo

Hi,
I hope the question is not too silly, but I would like to understand 
the advantages of having both re.match() and re.search(). Wouldn't it be 
clearer to have just one function with one additional parameter, like 
this:


re.search(regexp, text, from_beginning=True|False) ?
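For reference, the behavioral difference being asked about, in a small sketch with made-up sample text: re.match() only tries the pattern at position 0, while re.search() scans forward; a leading ".*" lets match() succeed but makes the match start at 0.

```python
import re

text = "say hello"

anchored = re.match(r"hello", text)    # tried only at position 0
found = re.search(r"hello", text)      # scans forward through the string
greedy = re.match(r".*hello", text)    # ".*" lets match() reach "hello"

print(anchored)                        # None
print(found.group(), found.start())    # hello 4
print(greedy.group(), greedy.start())  # say hello 0
```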

This way we would stop people from writing ".*" in front of the regexp 
used with re.match(), as the documentation mentions.


Thanks.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Georg Brandl
This came up in python-ideas, and has met mostly positive comments,
although the exact syntax rules are up for discussion.

cheers,
Georg



PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used in
integral and floating-point number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.

Examples::

# grouping decimal numbers by thousands
amount = 10_000_000.0

# grouping hexadecimal addresses by words
addr = 0xDEAD_BEEF

# grouping bits into bytes in a binary literal
flags = 0b_0011__0100_1110


Specification
=============

The current proposal is to allow underscores anywhere in numeric literals, with
these exceptions:

* Leading underscores cannot be allowed, since they already introduce
  identifiers.
* Trailing underscores are not allowed, because they look confusing and don't
  contribute much to readability.
* The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up,
  because they are fixed strings and not logically part of the number.
* No underscore allowed after a sign in an exponent (``1e-_5``), because
  underscores also cannot be used after the sign in front of a number
  (``-1e5``).
* No underscore allowed after a decimal point, because this leads to ambiguity
  with attribute access (the lexer cannot know that there is no number literal
  in ``foo._5``).

There appears to be no reason to restrict the use of underscores otherwise.

The production list for integer literals would therefore look like this::

   integer: decimalinteger | octinteger | hexinteger | bininteger
   decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
   nonzerodigit: "1"..."9"
   decimalrest: (digit | "_")* digit
   digit: "0"..."9"
   octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
   hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
   bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
   octdigit: "0"..."7"
   hexdigit: digit | "a"..."f" | "A"..."F"
   bindigit: "0" | "1"
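One way to sanity-check the draft integer productions is to hand-transcribe them into a regular expression. This is a sketch that models the draft grammar only; the regex and the helper name ``is_draft_integer`` are illustrative, not taken from the PEP or its patch, and the lexer Python 3.6 finally shipped is stricter.

```python
import re

# Hand transcription of the draft integer productions (not released Python).
DRAFT_INTEGER = re.compile(
    r"""
      [1-9] (?: [0-9_]* [0-9] )?           # nonzerodigit [decimalrest]
    | 0 (?: [0_]* 0 )?                     # "0" [("0" | "_")* "0"]
    | 0 [oO] [0-7_]* [0-7]                 # octinteger
    | 0 [xX] [0-9a-fA-F_]* [0-9a-fA-F]     # hexinteger
    | 0 [bB] [01_]* [01]                   # bininteger
    """,
    re.VERBOSE,
)

def is_draft_integer(s):
    """True if s is a complete integer literal under the draft grammar."""
    return DRAFT_INTEGER.fullmatch(s) is not None

for s in ["0xDEAD_BEEF", "0b_0011__0100_1110", "1___2_345__6", "123_", "_123", "0x_"]:
    print(s, is_draft_integer(s))
# 0xDEAD_BEEF True
# 0b_0011__0100_1110 True
# 1___2_345__6 True
# 123_ False
# _123 False
# 0x_ False
```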

For floating-point literals::

   floatnumber: pointfloat | exponentfloat
   pointfloat: [intpart] fraction | intpart "."
   exponentfloat: (intpart | pointfloat) exponent
   intpart: digit (digit | "_")*
   fraction: "." intpart
   exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
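The float productions can be transcribed the same way. Again this is a hypothetical sketch of the draft text, not of any released lexer; it shows which of the forms debated in the replies (``123e__+456``, ``123_.000_456``) the draft admits.

```python
import re

# Hand transcription of the draft float productions (not released Python).
INTPART = r"[0-9][0-9_]*"                          # digit (digit | "_")*
FRACTION = r"\." + INTPART                         # "." intpart
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENT = r"[eE]_*[+-]?[0-9](?:[0-9_]*[0-9])?"    # e "_"* [sign] digit [decimalrest]
EXPONENTFLOAT = rf"(?:{POINTFLOAT}|{INTPART}){EXPONENT}"
DRAFT_FLOAT = re.compile(rf"{EXPONENTFLOAT}|{POINTFLOAT}")

def is_draft_float(s):
    """True if s is a complete float literal under the draft grammar."""
    return DRAFT_FLOAT.fullmatch(s) is not None

for s in ["10_000_000.0", "123e__+456", "1e-_5", "1._5", "123_.000_456"]:
    print(s, is_draft_float(s))
# 10_000_000.0 True
# 123e__+456 True
# 1e-_5 False
# 1._5 False
# 123_.000_456 True
```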


Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be
limited.  Common rules are (see the "other languages" section):

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
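To compare the two restrictive alternatives, they can be written as regexes over a plain run of decimal digits. This is an illustrative sketch: prefixes, decimal points and exponents are left out for brevity, and the names are hypothetical.

```python
import re

# Illustrative regexes for the two restrictive rules, decimal digits only.
ONE_UNDERSCORE = re.compile(r"[0-9]+(?:_[0-9]+)*")     # "1_000", not "1__000"
MANY_UNDERSCORES = re.compile(r"[0-9]+(?:_+[0-9]+)*")  # also allows "1__000"

def allowed(rule, s):
    return rule.fullmatch(s) is not None

for s in ["10_000_000", "3__141_592_654", "123_", "_123"]:
    print(s, allowed(ONE_UNDERSCORE, s), allowed(MANY_UNDERSCORES, s))
# 10_000_000 True True
# 3__141_592_654 False True
# 123_ False False
# _123 False False
```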

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping.  Although
strings are a precedent for combining adjoining literals, the behavior can lead
to unexpected effects which are not possible with underscores.  Also, no other
language is known to use this rule, except for languages that generally
disregard any whitespace.

C++14 introduces apostrophes for grouping, which is not considered due to the
conflict with Python's string literals. [1]_


Behavior in Other Languages
===========================

Those languages that do allow underscore grouping implement a large variety of
rules for allowed placement of underscores.  This is a listing placing the known
rules into three major groups.  In cases where the language spec contradicts the
actual behavior, the actual behavior is listed.

**Group 1: liberal (like this PEP)**

* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


Implementation
==============

A preliminary patch that implements the specification given above has been
posted to the issue tracker. [11]_


References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5]
https://developer.apple.com/library/ios/documentation/Swift/Concep

Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Brett Cannon
On Wed, 10 Feb 2016 at 14:21 Georg Brandl  wrote:

> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.
>
> cheers,
> Georg
>
> [snip]
>
> Examples::
>
> # grouping decimal numbers by thousands
> amount = 10_000_000.0
>
> # grouping hexadecimal addresses by words
> addr = 0xDEAD_BEEF
>
> # grouping bits into bytes in a binary literal
> flags = 0b_0011__0100_1110
>

I assume all of these examples are possible in either the liberal or
restrictive approaches?



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Glenn Linderman

On 2/10/2016 2:20 PM, Georg Brandl wrote:

> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.


> Examples::
>
>  # grouping decimal numbers by thousands
>  amount = 10_000_000.0
>
>  # grouping hexadecimal addresses by words
>  addr = 0xDEAD_BEEF
>
>  # grouping bits into bytes in a binary literal
>  flags = 0b_0011__0100_1110


+1

You don't mention potential restrictions that decimal numbers should 
permit them only every three places, or hex ones only every 2 or 4, and 
your binary example mentions grouping into bytes, but actually groups 
into nybbles.


But such restrictions would be annoying: if it is useful to the coder to 
use them, that is fine. But different situations may find other 
placements more useful... particularly in binary, as it might want to 
match widths of various bitfields.


Adding that as a rejected consideration, with justifications, would be 
helpful.


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Paul Moore
On 10 February 2016 at 22:20, Georg Brandl  wrote:
> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.

+1 on the PEP. Is there any value in allowing underscores in strings
passed to the Decimal constructor as well? The same sorts of
justifications would seem to apply. It's perfectly arguable that the
change for Decimal would be so rarely used as to not be worth it,
though, so I don't mind either way in practice.

Paul


Re: [Python-Dev] why we have both re.match and re.string?

2016-02-10 Thread Michel Desmoulin

Hi,

On 10/02/2016 22:59, Luca Sangiacomo wrote:

> Hi,
> I hope the question is not too silly, but why I would like to 
> understand the advantages of having both re.match() and re.search(). 
> Wouldn't be more clear to have just one function with one additional 
> parameters like this:
>
>
> re.search(regexp, text, from_beginning=True|False) ?


Actually you can just do

re.search("^" + regexp, text)

But with match you express the intent to match the text with something, 
while with search, you express that you look for something in the text. 
Maybe that was the idea?
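Two caveats with the "^" approach, as a small sketch with made-up sample strings: prepending "^" matches re.match() only without re.MULTILINE, and plain string concatenation anchors just the first branch of an alternation. "\A" or a "(?:...)" wrapper avoids both surprises.

```python
import re

text = "first line\nhello there"

plain = re.search(r"^hello", text)                    # "^" = start of string
multi = re.search(r"^hello", text, re.MULTILINE)      # "^" = start of any line
match_multi = re.match(r"hello", text, re.MULTILINE)  # match() stays at index 0
anchor_a = re.search(r"\Ahello", text, re.MULTILINE)  # "\A" is always index 0

print(plain, multi.start(), match_multi, anchor_a)    # None 11 None None

pattern = "cat|dog"
loose = re.search("^" + pattern, "hotdog")            # parsed as ^cat | dog
tight = re.search("^(?:" + pattern + ")", "hotdog")   # anchors both branches
print(loose.group(), tight)                           # dog None
```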




> In this way we prevent, as written in the documentation, people 
> writing ".*" in front of the regexp used with re.match()
>
>
> Thanks.




Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Victor Stinner
It looks like the implementation https://bugs.python.org/issue26331
only changes the Python parser.

What about other functions converting strings to numbers at runtime
like int(str) and float(str)? Paul also asked for Decimal(str).
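For reference, the accepted version of PEP 515 did extend string conversions in Python 3.6: int(), float() and decimal.Decimal() accept the same underscore grouping as literals. A small check, which needs Python 3.6 or later:

```python
from decimal import Decimal

# Python 3.6+: the string constructors accept literal-style grouping.
i = int("1_000")
h = int("ff_ff", 16)
f = float("1_000.5")
d = Decimal("1_000.5")
print(i, h, f, d)    # 1000 65535 1000.5 1000.5

# Doubled underscores are still rejected, as they are in literals.
try:
    int("1__000")
except ValueError:
    print("int('1__000') raises ValueError")
```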

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread MRAB

On 2016-02-10 22:35, Brett Cannon wrote:

[snip]


> > Examples::
> >
> >  # grouping decimal numbers by thousands
> >  amount = 10_000_000.0
> >
> >  # grouping hexadecimal addresses by words
> >  addr = 0xDEAD_BEEF
> >
> >  # grouping bits into bytes in a binary literal
> >  flags = 0b_0011__0100_1110
>
>
> I assume all of these examples are possible in either the liberal or
> restrictive approaches?


[snip]
Strictly speaking, "0b_0011__0100_1110" wouldn't be valid if an 
underscore was allowed only between digits because the "b" isn't a digit.


Similarly, "0x_FF_FF" wouldn't be valid, but "0xFF_FF" would.
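For comparison, the rule that eventually shipped in Python 3.6 allows single underscores between digits plus one directly after the base prefix, so "0x_FF_FF" is accepted there. A quick check with compile() (the helper name accepts() is illustrative; needs Python 3.6+):

```python
def accepts(src):
    """Return True if src compiles as a Python expression."""
    try:
        compile(src, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

for src in ["0xFF_FF", "0x_FF_FF", "0b_0011_0100_1110",
            "0b_0011__0100_1110", "1__000", "1_000_"]:
    print(src, accepts(src))
# 0xFF_FF True
# 0x_FF_FF True
# 0b_0011_0100_1110 True
# 0b_0011__0100_1110 False
# 1__000 False
# 1_000_ False
```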


Re: [Python-Dev] Windows: Remove support of bytes filenames in theos module?

2016-02-10 Thread eryk sun
On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev
 wrote:
>   [^3]: Say you write a program that assumes it will only be run on Shift-JIS
> systems, and you use CreateFileA to create a file named "ハローワールド". The
> actual bytes you're sending are cp436 for "ânâìü[âÅü[âïâh", so the file on
> the CD is named, in Unicode, "ânâìü[âÅü[âïâh".

Unless the system default was changed or the program called
SetFileApisToOEM, CreateFileA would decode using the ANSI codepage
1252, not the OEM codepage 437 (not 436), i.e.
"ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh". Otherwise the example is right. But the
transcoding strategy won't work in general. For example, if the tables
are turned such that the ANSI codepage is 932 and the program passes a
bytes name from codepage 1252, the user on the other end won't be able
to transcode without error if the original bytes contained invalid
DBCS sequences that were mapped to the default character, U+30FB. This
transcodes as the meaningless string "\x81E". The user can replace
that string with "--" and enjoy a nice game of hang man.
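Python's own cp1252 codec leaves bytes such as 0x81 and 0x8D undefined and raises on them, so reproducing the string quoted above in Python needs a small helper that passes those bytes through as the matching C1 control characters, which is what that quoted output shows. The helper name is hypothetical.

```python
def decode_like_windows_1252(raw):
    """Decode bytes as cp1252, passing undefined bytes through as U+0080..U+009F."""
    out = []
    for byte in raw:
        try:
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            out.append(chr(byte))   # e.g. 0x8D -> "\x8d"
    return "".join(out)

raw = "ハローワールド".encode("shift_jis")
print(repr(decode_like_windows_1252(raw)))  # 'ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh'

# The cp932 default character U+30FB is the two bytes 0x81 0x45, i.e. b"\x81E".
print("\u30fb".encode("cp932"))             # b'\x81E'
```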


Re: [Python-Dev] why we have both re.match and re.string?

2016-02-10 Thread Steven D'Aprano
On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:
> Hi,
> I hope the question is not too silly, but why I would like to understand 
> the advantages of having both re.match() and re.search(). Wouldn't be 
> more clear to have just one function with one additional parameters like 
> this:
> 
> re.search(regexp, text, from_beginning=True|False) ?

I guess the most important reason now is backwards compatibility. The 
oldest Python I have installed here is version 1.5, and it has the brand 
new "re" module (intended as a replacement for the old "regex" module). 
Both have search() and match() top-level functions. So my guess is that 
you would have to track down the author of the original "regex" module.

But a more general answer is the principle, "Functions shouldn't take 
constant bool arguments". It is an API design principle which (if I 
remember correctly) Guido has stated a number of times. Functions should 
not take a boolean argument which (1) exists only to select between two 
different modes and (2) is nearly always given as a constant.

Do you ever find yourself writing code like this?

if some_calculation():
    result = re.match(regex, string)
else:
    result = re.search(regex, string)


If you do, that would be a hint that perhaps match() and search() should 
be combined so you can write:

result = re.search(regex, string, some_calculation())


But I expect that you almost never do. I would expect that if we 
combined the two functions into one, we would nearly always call them 
with a constant bool:

# I always forget whether True means match from the start or not, 
# and which is the default...
result = re.search(regex, string, False)

which suggests that search() is actually two different functions, and 
should be split into two, just as we have now.

It's a general principle, not a law of nature, so you may find 
exceptions in the standard library. But if I were designing the re 
module from scratch, I would either keep the two distinct functions, or 
just provide search() and let users use ^ to anchor the search to the 
beginning.
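To make the principle concrete, here is the combined function from the original question written as a hypothetical wrapper (find() is not part of the re module). Realistic calls pass a literal True or False, so the flag merely selects one of two existing functions.

```python
import re

def find(pattern, text, from_beginning=False):
    """Hypothetical combined API: dispatch on a mode flag."""
    if from_beginning:
        return re.match(pattern, text)
    return re.search(pattern, text)

# At the call site the bare boolean says little about which mode is meant.
a = find(r"\d+", "abc 123", True)    # behaves like re.match  -> None
b = find(r"\d+", "abc 123", False)   # behaves like re.search -> "123"
print(a, b.group())                  # None 123
```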


> In this way we prevent, as written in the documentation, people writing 
> ".*" in front of the regexp used with re.match()

I only see one example that does that:

https://docs.python.org/3/library/re.html#checking-for-a-pair

Perhaps it should be changed.


-- 
Steve


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Steven D'Aprano
On Wed, Feb 10, 2016 at 10:53:09PM +, Paul Moore wrote:
> On 10 February 2016 at 22:20, Georg Brandl  wrote:
> > This came up in python-ideas, and has met mostly positive comments,
> > although the exact syntax rules are up for discussion.
> 
> +1 on the PEP. Is there any value in allowing underscores in strings
> passed to the Decimal constructor as well? The same sorts of
> justifications would seem to apply. It's perfectly arguable that the
> change for Decimal would be so rarely used as to not be worth it,
> though, so I don't mind either way in practice.

Let's delay making any change to string conversions for now, and that 
includes Decimal. We can also do this:

Decimal("123_456_789.0_12345_67890".replace("_", ""))


for those who absolutely must include underscores in their numeric 
strings. The big win is for numeric literals, not numeric string 
conversions.
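A runnable form of the workaround above (it needs the decimal import). Note that str.replace() strips every underscore, so it also accepts placements the proposed literal rules reject, such as a trailing one.

```python
from decimal import Decimal

s = "123_456_789.0_12345_67890"
d = Decimal(s.replace("_", ""))            # works on any Python 3 version
print(d)                                   # 123456789.01234567890

# replace() does no validation: a trailing underscore slips through.
print(Decimal("123_".replace("_", "")))    # 123
```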



-- 
Steve


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Andrew Barnert via Python-Dev
On Feb 10, 2016, at 14:20, Georg Brandl  wrote:

First, general questions: should the PEP mention the Decimal constructor? What 
about int and float (I'd assume int(s) continues to work as always, while 
int(s, 0) gets the new behavior, but if that isn't obviously true, it may be 
worth saying explicitly).

> * Trailing underscores are not allowed, because they look confusing and don't
>  contribute much to readability.

Why is "123_456_" so ugly that we have to catch it, when "1___2_345__6" is 
just fine, or "123e__+456"? More to the point, if we really need an extra rule, 
and more complicated BNF, to outlaw this case, I don't think we want a liberal 
design at all.

Also, notice that Swift, Rust, and D all show examples with trailing 
underscores in their references, and they don't look particularly out of place 
with the other examples.

> There appears to be no reason to restrict the use of underscores otherwise.

What other restrictions are there? I think the only place you've left that's 
not between digits is between the e and the sign. A dead-simple rule like 
Swift's seems better than five separate rules that I have to learn and remember 
that make lexing more complicated and that ultimately amount to the 
conservative rule plus one other place I can put underscores where I'd never 
want to.

> **Group 1: liberal (like this PEP)**
> 
> * D [2]_
> * Perl 5 (although docs say it's more restricted) [3]_
> * Rust [4]_
> * Swift (although textual description says "between digits") [5]_

I don't think any of these are liberal like this PEP.

For example, Swift's actual grammar rule allows underscores anywhere but 
leading in the "digits" part of int literals and all three potential digit 
parts of float literals. That's the whole rule. It's more conservative than 
this PEP in not allowing them outside of digit parts (like between E and +), 
more liberal in allowing them to be trailing, but I'm pretty sure the reason 
behind the design wasn't specifically about how liberal or conservative they 
wanted to be, but about being as simple as possible. Rust's rule seems to be 
equivalent to Swift's, except that they forgot to define exponents anywhere. I 
don't think either of them was trying to be more liberal or more conservative; 
rather, they were both trying to be as simple as possible.

D does go out of its way to be as liberal as possible, e.g., allowing things 
like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit part, 
which can't have leading underscores), but it's also more conservative than 
this spec in not allowing underscores between e and the sign.

I think Perl is the only language that allows them anywhere but in the digits 
part.



Re: [Python-Dev] Windows: Remove support of bytes filenames in theos module?

2016-02-10 Thread Andrew Barnert via Python-Dev
On Feb 10, 2016, at 15:11, eryk sun  wrote:
> 
> On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev
>  wrote:
>>  [^3]: Say you write a program that assumes it will only be run on Shift-JIS
>> systems, and you use CreateFileA to create a file named "ハローワールド". The
>> actual bytes you're sending are cp436 for "ânâìü[âÅü[âïâh", so the file on
>> the CD is named, in Unicode, "ânâìü[âÅü[âïâh".
> 
> Unless the system default was changed or the program called
> SetFileApisToOEM, CreateFileA would decode using the ANSI codepage
> 1252, not the OEM codepage 437 (not 436), i.e.
> "ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh". Otherwise the example is right. But the
> transcoding strategy won't work in general. For example, if the tables
> are turned such that the ANSI codepage is 932 and the program passes a
> bytes name from codepage 1252, the user on the other end won't be able
> to transcode without error if the original bytes contained invalid
> DBCS sequences that were mapped to the default character, U+30FB. This
> transcodes as the meaningless string "\x81E". The user can replace
> that string with "--" and enjoy a nice game of hang man.

Of course there's no way to recover the actual intended filenames if that 
information was thrown out instead of being stored, but that's no different 
from the situation where the user mashed the keyboard instead of typing what 
they intended.

The point remains: the Mac strategy (which is also the linux strategy for 
filesystems that are inherently UTF-16) always generates valid UTF-8, and 
doesn't try to magically cure mojibake but doesn't get in the way of the user 
manually curing it. When the Unicode encoding is lossy, of course the user 
can't cure that, but UTF-8 isn't making it any harder.



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Steven D'Aprano
On Wed, Feb 10, 2016 at 11:20:38PM +0100, Georg Brandl wrote:
> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.

Nicely done. But I would change the restrictions to a simpler version. 
Instead of five rules to learn:


> The current proposal is to allow underscores anywhere in numeric literals, 
> with
> these exceptions:
> 
> * Leading underscores cannot be allowed, since they already introduce
>   identifiers.
> * Trailing underscores are not allowed, because they look confusing and don't
>   contribute much to readability.
> * The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up,
>   because they are fixed strings and not logically part of the number.
> * No underscore allowed after a sign in an exponent (``1e-_5``), because
>   underscores can also not be used after the signs in front of the number
>   (``-1e5``).
> * No underscore allowed after a decimal point, because this leads to ambiguity
>   with attribute access (the lexer cannot know that there is no number literal
>   in ``foo._5``).


change to a single rule "one or more underscores may appear between two 
(hex)digits, but otherwise nowhere else". That's much simpler to 
understand than a series of restrictions as given above.

That would be your second restrictive rule:

"Multiple consecutive underscore allowed, but only between digits."

That forbids leading and trailing underscores, underscores inside or 
immediately after the leading number base (since x, o and b aren't 
digits), and immediately before or after the sign, decimal point or e|E 
exponent symbol.


> There appears to be no reason to restrict the use of underscores otherwise.

I don't like underscores immediately before the . or e|E in floats 
either: 123_.000_456

The dot is already visually distinctive enough, as is the e|E, and 
placing an underscore immediately before them doesn't aid in grouping 
the digits.



> Instead of the liberal rule specified above, the use of underscores could be
> limited.  Common rules are (see the "other languages" section):
> 
> * Only one consecutive underscore allowed, and only between digits.
> * Multiple consecutive underscore allowed, but only between digits.

I don't think there is any need to restrict it to only a single 
underscore. There are uses for more than one:

Fraction(3__141_592_654, 1_000_000_000)

hints that the 3 is somewhat special (for obvious reasons).



-- 
Steve


[Python-Dev] Time for a change of random number generator?

2016-02-10 Thread Greg Ewing

The Mersenne Twister is no longer regarded as quite state of the art
because it can get into states that produce long sequences that are
not very random.

There is a variation on MT called WELL that has better properties
in this regard. Does anyone think it would be a good idea to replace
MT with WELL as Python's default rng?

https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
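On the practical side, the random module already documents a hook for swapping the core generator: subclass random.Random and override random(), seed(), getstate() and setstate(), optionally supplying getrandbits(). A pure-Python sketch of WELL512 plugged into that hook follows. The recurrence is transcribed from Chris Lomont's public-domain C version of WELL512 and has not been checked against reference output vectors; seeding through the stdlib generator is an assumption made for brevity. It is far slower than the C Mersenne Twister and only shows the shape of such a change.

```python
import random

class WELL512(random.Random):
    """Sketch: WELL512 (Lomont's variant) behind random.Random's API."""

    MASK32 = 0xFFFFFFFF

    def seed(self, a=None, version=2):
        # Assumption: borrow the stdlib generator only to fill the 16-word state.
        seeder = random.Random(a)
        self._state = [seeder.getrandbits(32) for _ in range(16)]
        self._index = 0
        self.gauss_next = None

    def _next32(self):
        # One step of the WELL512 recurrence; all words are kept to 32 bits.
        m, s, i = self.MASK32, self._state, self._index
        a = s[i]
        c = s[(i + 13) & 15]
        b = (a ^ c ^ (a << 16) ^ (c << 15)) & m
        c = s[(i + 9) & 15]
        c ^= c >> 11
        a = s[i] = b ^ c
        d = a ^ ((a << 5) & 0xDA442D24)
        i = (i + 15) & 15
        a = s[i]
        s[i] = (a ^ b ^ d ^ (a << 2) ^ (b << 18) ^ (c << 28)) & m
        self._index = i
        return s[i]

    def random(self):
        # 53-bit float in [0, 1) from two 32-bit words, as CPython does for MT.
        hi, lo = self._next32() >> 5, self._next32() >> 6
        return (hi * 67108864.0 + lo) * (1.0 / 9007199254740992.0)

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        words = (k + 31) // 32
        x = 0
        for _ in range(words):
            x = (x << 32) | self._next32()
        return x >> (words * 32 - k)

    def getstate(self):
        return (tuple(self._state), self._index, self.gauss_next)

    def setstate(self, state):
        words, self._index, self.gauss_next = state
        self._state = list(words)

gen = WELL512(2016)
print(gen.random(), gen.randrange(100), gen.choice("abc"))
```

Because the subclass supplies getrandbits(), the inherited helpers such as randrange(), choice() and shuffle() draw from the WELL state rather than from Mersenne Twister.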

--
Greg



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Ethan Furman

On 02/10/2016 04:04 PM, Steven D'Aprano wrote:


> change to a single rule "one or more underscores may appear between
> two (hex)digits, but otherwise nowhere else". That's much simpler to
> understand than a series of restrictions as given above.

I like the simpler rule, but I would also allow for an underscore 
between the base and the first digit:


0x_1ef9_ab22

is easier (at least, for me ;) to parse than

0x1ef9_ab22

However, since Georg is doing the work, I'm not going to argue too hard.

--
~Ethan~


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Steven D'Aprano
On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote:
> On Feb 10, 2016, at 14:20, Georg Brandl  wrote:
> 
> First, general questions: should the PEP mention the Decimal constructor? 
> What about int and float (I'd assume int(s) continues to work as always, 
> while int(s, 0) gets the new behavior, but if that isn't obviously true, it 
> may be worth saying explicitly).
> 
> > * Trailing underscores are not allowed, because they look confusing and 
> > don't
> >  contribute much to readability.
> 
> Why is "123_456_" so ugly that we have to catch it, when 
> "1___2_345__6" is just fine, 

It's not just fine, it's ugly as sin, but it shouldn't be up to the 
parser to decide a style issue.

Just as we allow people to write ugly tuples:

t = ( 1,  2,3  ,4,   5,  )

so we should allow people to write ugly ints rather than try to enforce 
good taste in the parser. There are uses for allowing multiple 
underscores, and odd groupings, so rather than a blanket ban, we trust 
that people won't do stupid things.


> or "123e__+456"? 

That I would prohibit. I think that the decimal point and exponent sign 
provide sufficient visual distinctiveness that putting underscores 
around them doesn't gain you anything. In some cases it looks like 
you might have missed a group of digits:

1.234_e-89

hints that perhaps there ought to be more digits after the 4.

I'd be okay with a rule "no underscores in the exponent at all", but I 
don't particularly see the need for it since that's pretty much covered 
by the style guide saying "don't use underscores unnecessarily". For 
floats, exponents have a practical limitation of three digits, so 
there's not much need for grouping them.

+1 on allowing underscores between digits
+0 on prohibiting underscores in the exponent



> More to the point, 
> if we really need an extra rule, and more complicated BNF, to outlaw 
> this case, I don't think we want a liberal design at all.

I think "underscores can occur between any two digits" is pretty 
liberal, since it allows multiple underscores, and allows grouping in 
any size group (including mixed sizes, and stupid sizes like 1).

To me, the opposite of a liberal rule is something like "underscores may 
only occur between groups of three digits".
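The two rules contrasted above can be written as regular expressions over a plain decimal digit string. This is a minimal sketch that assumes no base prefix, fraction, or exponent. The names `BETWEEN_DIGITS`, `GROUPS_OF_THREE` and `check` are illustrative only.

```python
import re

# "Underscores can occur between any two digits": any number of
# underscores, groups of any size, but never leading or trailing.
BETWEEN_DIGITS = re.compile(r"[0-9]+(?:_+[0-9]+)*")

# "Underscores may only occur between groups of three digits": either no
# underscores at all, or a 1-3 digit head followed by "_ddd" groups.
GROUPS_OF_THREE = re.compile(r"[0-9]+|[0-9]{1,3}(?:_[0-9]{3})+")


def check(pattern, literal):
    """Return True if the whole literal matches the given rule."""
    return pattern.fullmatch(literal) is not None


for lit in ["1_000_000", "1___2_345__6", "12_34", "123_", "1234"]:
    print(lit, check(BETWEEN_DIGITS, lit), check(GROUPS_OF_THREE, lit))
```

Both rules reject "123_". Only the first accepts odd groupings such as "1___2_345__6".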


> Also, notice that Swift, Rust, and D all show examples with trailing 
> underscores in their references, and they don't look particularly out 
> of place with the other examples.

That's a matter of opinion.



-- 
Steve


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Martin Panter
I have occasionally wondered about this missing feature.

On 10 February 2016 at 22:20, Georg Brandl  wrote:
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax so that underscores can be used in
> integral and floating-point number literals.

This should extend complex or imaginary literals like 10_000j for consistency.

> Specification
> =============
>
> * Trailing underscores are not allowed, because they look confusing and don't
>   contribute much to readability.
> * No underscore allowed after a sign in an exponent (``1e-_5``), because
>   underscores can also not be used after the signs in front of the number
>   (``-1e5``).
> [. . .]
>
> The production list for integer literals would therefore look like this::
>
>     integer: decimalinteger | octinteger | hexinteger | bininteger
>     decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
>     nonzerodigit: "1"..."9"
>     decimalrest: (digit | "_")* digit
>     digit: "0"..."9"
>     octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
>     hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
>     bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
>     octdigit: "0"..."7"
>     hexdigit: digit | "a"..."f" | "A"..."F"
>     bindigit: "0" | "1"
>
> For floating-point literals::
>
>     floatnumber: pointfloat | exponentfloat
>     pointfloat: [intpart] fraction | intpart "."
>     exponentfloat: (intpart | pointfloat) exponent
>     intpart: digit (digit | "_")*

This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your
bullet point above suggests at least some of these are not desired.

>     fraction: "." intpart
>     exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]

This allows underscores in the exponent (1e-5_0), contradicting the
other bullet point.
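Martin's readings of the draft float productions can be checked mechanically. Below is a hypothetical regex transcription of the quoted grammar, one pattern per production. It is an illustrative sketch and not CPython's tokenizer. The helper name `draft_accepts_float` is invented for this example.

```python
import re

# One regex per production of the draft float grammar quoted above.
INTPART = r"[0-9][0-9_]*"          # intpart: digit (digit | "_")*
DECIMALREST = r"[0-9_]*[0-9]"      # decimalrest: (digit | "_")* digit
FRACTION = r"\." + INTPART         # fraction: "." intpart
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENT = rf"[eE]_*[+-]?[0-9](?:{DECIMALREST})?"
EXPONENTFLOAT = rf"(?:{INTPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = re.compile(rf"(?:{POINTFLOAT}|{EXPONENTFLOAT})")


def draft_accepts_float(literal):
    """Return True if the draft grammar derives the whole literal."""
    return FLOATNUMBER.fullmatch(literal) is not None


for lit in ["1_.2", "1.2_", "1.2_e-5", "1e-5_0", "123e__+456", "1e-_5"]:
    print(lit, draft_accepts_float(lit))
```

Every example Martin lists is derivable, and so is "123e__+456". Only "1e-_5" is rejected, which matches the second bullet point.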


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Andrew Barnert via Python-Dev
On Feb 10, 2016, at 16:21, Steven D'Aprano  wrote:
> 
>> On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote:
>> On Feb 10, 2016, at 14:20, Georg Brandl  wrote:
>> 
>> First, general questions: should the PEP mention the Decimal constructor? 
>> What about int and float (I'd assume int(s) continues to work as always, 
>> while int(s, 0) gets the new behavior, but if that isn't obviously true, it 
>> may be worth saying explicitly).
>> 
>>> * Trailing underscores are not allowed, because they look confusing and don't
>>>   contribute much to readability.
>> 
>> Why is "123_456_" so ugly that we have to catch it, when 
>> "1___2_345__6" is just fine,
> 
> It's not just fine, it's ugly as sin, but it shouldn't be a matter for 
> the parser to decide a style-issue.

Exactly. So why should it be any more of a matter for the parser to decide that 
"123_456_" is illegal? Leave that in the style guide, and keep the parser, and 
the reference documentation, as simple as possible.

>> or "123e__+456"?
> 
> That I would prohibit.

The PEP allows that. The simpler rule used by Swift and Rust prohibits it.

>> More to the point, 
>> if we really need an extra rule, and more complicated BNF, to outlaw 
>> this case, I don't think we want a liberal design at all.
> 
> I think "underscores can occur between any two digits" is pretty 
> liberal, since it allows multiple underscores, and allows grouping in 
> any size group (including mixed sizes, and stupid sizes like 1).

The PEP calls that a type-2 conservative proposal, and uses "liberal" to mean 
that underscores can appear in places that aren't between digits. I don't think 
we want that liberalism, especially if it requires 5 rules instead of 1 to get 
it right.

Again, Swift and Rust only allow underscores in the digit part of integers and
the up-to-three digit parts of floats, and the only rule they impose is no
leading underscore. (In some cases they lead to ambiguity, in others they
don't, but it's easier to just always ban them.) I don't see anything wrong
with that rule. The fact that it doesn't allow "1.2e_+3" seems fine. The fact 
that it doesn't prevent "123_" seems fine also. It's not about being as liberal 
as possible, or as restrictive as possible, because those edge cases just don't 
matter, so being as simple as possible seems like an obvious win.
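The simple rule described above is easy to express as a pattern. Each of the up to three digit parts (integer, fraction, exponent) must start with a digit and may then mix digits and underscores freely. This is a rough sketch of the rule as described in this message, not of Swift's or Rust's actual grammar. It ignores base prefixes and ".5"-style literals, and the names are hypothetical.

```python
import re

# Each digit part: one digit, then any mix of digits and underscores.
# The only restriction is "no leading underscore" within a digit part.
DIGITS = r"[0-9][0-9_]*"
SIMPLE_FLOAT = re.compile(rf"{DIGITS}(?:\.{DIGITS})?(?:[eE][+-]?{DIGITS})?")


def simple_rule_accepts(literal):
    """Return True if the literal fits the simplified digit-part rule."""
    return SIMPLE_FLOAT.fullmatch(literal) is not None


for lit in ["123_", "1_23__4", "1.2e_+3", "123e__+456", "_123"]:
    print(lit, simple_rule_accepts(lit))
```

Under this one rule "123_" and "1_23__4" pass, while "1.2e_+3", "123e__+456" and "_123" fail.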

>> Also, notice that Swift, Rust, and D all show examples with trailing 
>> underscores in their references, and they don't look particularly out 
>> of place with the other examples.
> 
> That's a matter of opinion.

Sure, but it's apparently the opinion of the people who designed and/or 
documented this feature in three out of the four languages I looked at (aka 
every language but Perl), not mine.

And honestly, are you really claiming that in your opinion, "123_456_" is worse 
than all of their other examples, like "1_23__4"?

They're both presented as something the syntax allows, and neither one looks 
like something I'd ever want to write, much less promote in a style guide or 
something, but neither one screams out as something that's so heinous we need 
to complicate the language to ensure it raises a SyntaxError. Yes, that's my 
opinion, but do you really have a different opinion about any part of that?


Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Georg Brandl
On 02/11/2016 02:16 AM, Martin Panter wrote:
> I have occasionally wondered about this missing feature.
> 
> On 10 February 2016 at 22:20, Georg Brandl  wrote:
>> Abstract and Rationale
>> ======================
>>
>> This PEP proposes to extend Python's syntax so that underscores can be used in
>> integral and floating-point number literals.
> 
> This should extend complex or imaginary literals like 10_000j for consistency.

Yes, that was always the case, but I guess it should be explicit.

>> Specification
>> =============
>>
>> * Trailing underscores are not allowed, because they look confusing and don't
>>   contribute much to readability.
>> * No underscore allowed after a sign in an exponent (``1e-_5``), because
>>   underscores can also not be used after the signs in front of the number
>>   (``-1e5``).
>> [. . .]
>>
>> The production list for integer literals would therefore look like this::
>>
>>     integer: decimalinteger | octinteger | hexinteger | bininteger
>>     decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
>>     nonzerodigit: "1"..."9"
>>     decimalrest: (digit | "_")* digit
>>     digit: "0"..."9"
>>     octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
>>     hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
>>     bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
>>     octdigit: "0"..."7"
>>     hexdigit: digit | "a"..."f" | "A"..."F"
>>     bindigit: "0" | "1"
>>
>> For floating-point literals::
>>
>>     floatnumber: pointfloat | exponentfloat
>>     pointfloat: [intpart] fraction | intpart "."
>>     exponentfloat: (intpart | pointfloat) exponent
>>     intpart: digit (digit | "_")*
> 
> This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your
> bullet point above suggests at least some of these are not desired.

The middle one isn't, indeed.  I updated the grammar accordingly.

>>     fraction: "." intpart
>>     exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
> 
> This allows underscores in the exponent (1e-5_0), contradicting the
> other bullet point.

I clarified the bullet points.  An "immediately" was missing.

Thanks for the feedback!
Georg



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Georg Brandl
On 02/11/2016 12:45 AM, Andrew Barnert via Python-Dev wrote:
> On Feb 10, 2016, at 14:20, Georg Brandl  wrote:
> 
> First, general questions: should the PEP mention the Decimal constructor?
> What about int and float (I'd assume int(s) continues to work as always,
> while int(s, 0) gets the new behavior, but if that isn't obviously true, it
> may be worth saying explicitly).
> 
>> * Trailing underscores are not allowed, because they look confusing and
>> don't contribute much to readability.
> 
> Why is "123_456_" so ugly that we have to catch it, when "1___2_345__6"
> is just fine, or "123e__+456"? More to the point, if we really need an extra
> rule, and more complicated BNF, to outlaw this case, I don't think we want a
> liberal design at all.
> 
> Also, notice that Swift, Rust, and D all show examples with trailing
> underscores in their references, and they don't look particularly out of
> place with the other examples.

That's a point.  I'll look into the implementation.

>> There appears to be no reason to restrict the use of underscores
>> otherwise.
> 
> What other restrictions are there? I think the only place you've left that's
> not between digits is between the e and the sign.

There are other places left:

* between 0x and the digits
* between the digits and "j"
* before and after the decimal point

> A dead-simple rule like
> Swift's seems better than five separate rules that I have to learn and
> remember that make lexing more complicated and that ultimately amount to the
> conservative rule plus one other place I can put underscores where I'd never
> want to.

Not quite, see above.
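The first item in Georg's list (between 0x and the digits) follows directly from the draft integer productions quoted earlier in the thread. Below is a hypothetical regex transcription of those productions. It is an illustrative sketch, not CPython's tokenizer, and `draft_accepts_int` is an invented name.

```python
import re

# One alternative per branch of the draft integer productions.
DRAFT_INTEGER = re.compile(
    r"[1-9](?:[0-9_]*[0-9])?"          # nonzerodigit [decimalrest]
    r"|0(?:[0_]*0)?"                    # "0" [("0" | "_")* "0"]
    r"|0[oO][0-7_]*[0-7]"               # octinteger
    r"|0[xX][0-9a-fA-F_]*[0-9a-fA-F]"   # hexinteger
    r"|0[bB][01_]*[01]"                 # bininteger
)


def draft_accepts_int(literal):
    """Return True if the draft integer grammar derives the whole literal."""
    return DRAFT_INTEGER.fullmatch(literal) is not None


for lit in ["0x_1", "0x_1_", "1___2_345__6", "123_", "0_1"]:
    print(lit, draft_accepts_int(lit))
```

"0x_1" is accepted because the draft grammar allows an underscore right after the base prefix. Because every production must end in a digit, trailing forms such as "0x_1_" and "123_" are rejected.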

>> **Group 1: liberal (like this PEP)**
>> 
>> * D [2]_
>> * Perl 5 (although docs say it's more restricted) [3]_
>> * Rust [4]_
>> * Swift (although textual description says "between digits") [5]_
> 
> I don't think any of these are liberal like this PEP.
>
> For example, Swift's actual grammar rule allows underscores anywhere but
> leading in the "digits" part of int literals and all three potential digit
> parts of float literals. That's the whole rule. It's more conservative than
> this PEP in not allowing them outside of digit parts (like between E and +),
> more liberal in allowing them to be trailing, but I'm pretty sure the reason
> behind the design wasn't specifically about how liberal or conservative they
> wanted to be, but about being as simple as possible. Rust's rule seems to be
> equivalent to Swift's, except that they forgot to define exponents anywhere.
> I don't think either of them was trying to be more liberal or more
> conservative; rather, they were both trying to be as simple as possible.

I actually modelled this PEP closely on Rust.  Rust has the same restrictions as
this PEP, except that trailing underscores are allowed, "1.0e_+5" is not
allowed (the PEP allows it), and "1.0e+_5" is allowed (the PEP does not).

I don't think you can argue that it's simpler.

(If the PEP and our lexical reference were as loosely worded as Rust's, one
could probably say it's "simple", too.)

Also, neither Swift nor Rust has the baggage of allowing ".5"-style
literals, which makes the grammar simpler in Swift's case.
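The exponent difference described above can be written as two patterns. The first transcribes the draft PEP's `exponent` production. The second encodes the Rust behaviour only as this message describes it (underscores allowed after the sign and trailing, but not before the sign). It is a sketch, not a transcription of the Rust reference.

```python
import re

# Draft PEP: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
PEP_EXPONENT = re.compile(r"[eE]_*[+-]?[0-9](?:[0-9_]*[0-9])?")

# Rust-like, as described here: optional sign, then a digit part that
# needs at least one digit and may contain underscores anywhere in it.
RUST_LIKE_EXPONENT = re.compile(r"[eE][+-]?[0-9_]*[0-9][0-9_]*")

for exp in ["e_+5", "e+_5", "e+5_"]:
    print(exp,
          PEP_EXPONENT.fullmatch(exp) is not None,
          RUST_LIKE_EXPONENT.fullmatch(exp) is not None)
```

The two patterns disagree on all three inputs. The draft grammar accepts only "e_+5", while the Rust-like pattern accepts "e+_5" and the trailing "e+5_".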

> D does go out of its way to be as liberal as possible, e.g., allowing things
> like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit
> part, which can't have leading underscores), but it's also more conservative
> than this spec in not allowing underscores between e and the sign.
> 
> I think Perl is the only language that allows them anywhere but in the digits
> part.

Thanks for the feedback!
Georg



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-10 Thread Georg Brandl
On 02/10/2016 11:35 PM, Brett Cannon wrote:

>> Examples::
>> 
>>     # grouping decimal numbers by thousands
>>     amount = 10_000_000.0
>> 
>>     # grouping hexadecimal addresses by words
>>     addr = 0xDEAD_BEEF
>> 
>>     # grouping bits into bytes in a binary literal
>>     flags = 0b_0011__0100_1110
>> 
> 
> I assume all of these examples are possible in either the liberal or
> restrictive approaches?

The last one isn't possible under the restrictive approach -- its first underscore isn't between digits.

>> 
>> Implementation
>> ==============
>> 
>> A preliminary patch that implements the specification given above has been
>> posted to the issue tracker. [11]_
>> 
> 
> Is the implementation made easier or harder if we went with the Group 2 or 3
> approaches? Are there any reasonable examples that the Group 1 approach allows
> that Group 3 doesn't that people have used in other languages?

Group 3 is probably a little more work than group 2, since you have to make sure
only one consecutive underscore is present.  I don't see a point to that.
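The Group 2 / Group 3 distinction can be applied to Brett's examples. The definitions here are paraphrased from this thread: Group 2 allows underscores only between digits, with runs allowed, and Group 3 allows at most one underscore between digits. This hypothetical sketch is not the PEP's normative text. It handles only hex, binary and decimal literals with an optional fraction.

```python
import re

# Digit-part templates; {d} is replaced by the digit class of the base.
GROUP2 = r"{d}+(?:_+{d}+)*"   # only between digits, runs allowed
GROUP3 = r"{d}+(?:_{d}+)*"    # at most one underscore between digits


def accepts(literal, group):
    """Return True if the literal fits the given group's digit-part rule."""
    dec = group.format(d="[0-9]")
    hexd = group.format(d="[0-9a-fA-F]")
    binary = group.format(d="[01]")
    pattern = rf"0[xX]{hexd}|0[bB]{binary}|{dec}(?:\.{dec})?"
    return re.fullmatch(pattern, literal) is not None


for lit in ["10_000_000.0", "0xDEAD_BEEF", "0b_0011__0100_1110",
            "0b0011__0100_1110"]:
    print(lit, accepts(lit, GROUP2), accepts(lit, GROUP3))
```

In this regex form, the two groups differ by a single quantifier (`_+` versus `_`). A hand-written tokenizer would need an explicit check for consecutive underscores.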

> I'm +1 on the idea, but which approach I prefer is going to be partially
> dependent on the difficulty of implementing (else I say Group 3 to make it
> easier to explain the rules).

Based on the feedback so far, I have an easier rule in mind that I will base
the next PEP revision on.  It's basically

"One or more underscores allowed anywhere after a digit or a base specifier."

This preserves my preferred non-restrictive cases (0b__, 1.5_j) and
disallows more controversial versions like "1.5e_+_2".
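Here is one possible reading of the proposed rule, applied to the cases just mentioned. This is a hypothetical sketch: it checks only where underscores are placed and assumes the rest of the literal is otherwise valid. It also treats letters as digits only in hex literals. The function name is invented.

```python
import re


def underscores_placed_ok(literal):
    """Check the proposed rule: every run of underscores must directly
    follow a digit or a base specifier (0x, 0o, 0b).

    Only underscore placement is checked; the rest of the literal is
    assumed to be valid.
    """
    has_base = re.match(r"0[xXoObB]", literal) is not None
    # Letters count as digits only in hex literals.
    digit = "[0-9a-fA-F]" if literal[:2] in ("0x", "0X") else "[0-9]"
    for run in re.finditer(r"_+", literal):
        start = run.start()
        if has_base and start == 2:
            continue  # directly after the base specifier
        previous = literal[start - 1] if start > 0 else ""
        if re.fullmatch(digit, previous) is None:
            return False
    return True


for lit in ["0b__1", "1.5_j", "1.5e_+_2", "123_", "_1"]:
    print(lit, underscores_placed_ok(lit))
```

Under this reading "0b__1", "1.5_j" and even a trailing "123_" are allowed, while "1.5e_+_2" and a leading "_1" are not.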

cheers,
Georg




