[Python-Dev] Fwd: Accepting PEP 440: Version Identification and Dependency Specification

2014-08-26 Thread Nick Coghlan
Antoine pointed out that it would still be a good idea to forward
packaging PEP acceptance announcements to python-dev, even when the
actual acceptance happens on distutils-sig.

That makes sense to me, so here's last week's notice of the acceptance
of PEP 440, the implementation-independent versioning standard derived
from pkg_resources, PEP 386, and ideas from both Linux distributions
and other open source language communities.

Regards,
Nick.

-- Forwarded message --
From: Nick Coghlan 
Date: 22 August 2014 22:34
Subject: Accepting PEP 440: Version Identification and Dependency Specification
To: DistUtils mailing list 


I just pushed Donald's final round of edits in response to the
feedback on the last PEP 440 thread, and as such I'm happy to announce
that I am accepting PEP 440 as the recommended approach to identifying
versions and specifying dependencies when distributing Python
software.

The PEP is available in the usual place at
http://www.python.org/dev/peps/pep-0440/

It's been a long road to get to an implementation-independent
versioning standard that has a feasible migration path from the
current pkg_resources defined de facto standard, and I'd like to thank
a few folks:

* Donald Stufft for his extensive work on PEP 440 itself, especially
the proof of concept integration into pip
* Vinay Sajip for his efforts in validating earlier versions of the PEP
* Tarek Ziadé for starting us down the road to an
implementation-independent versioning standard with the initial creation
of PEP 386 back in June 2009, more than five years ago!

Regards,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Martin v. Löwis
Am 24.08.14 03:11, schrieb Greg Ewing:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
> 
> According to Appendix F here:
> 
> http://www.w3.org/TR/xml/#sec-guessing
> 
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.

That's not what this section says. Instead, it says that
you need to auto-detect UCS-4, UTF-16, UTF-8 from the BOM,
or guess them or EBCDIC from the encoding of '
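As a rough illustration of the BOM half of the auto-detection Martin describes, here is a minimal sketch. The function name `sniff_bom` is hypothetical, Python's "utf-32" codecs stand in for UCS-4, and the no-BOM guessing step (including EBCDIC) is deliberately left out.

```python
import codecs

# Longer BOMs come first: the UTF-32-LE BOM begins with the same two
# bytes as the UTF-16-LE BOM, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    """Return the codec name implied by a leading BOM, or None.

    Hypothetical helper: it only inspects the BOM and does not attempt
    the declaration-based guessing needed when no BOM is present.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A parser would still have to read the XML declaration afterwards to learn the exact encoding; this sketch only narrows the encoding family.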


Re: [Python-Dev] Bytes path related questions for Guido

2014-08-26 Thread MRAB

On 2014-08-26 03:11, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
>   > "purge_surrogate_escapes" was the other term that occurred to me.
>
> "purge" suggests removal, not replacement.  That may be useful too.
>
> neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')
>
How about:

replace_surrogate_escapes(s, replacement='\uFFFD')

If you want them removed, just pass an empty string as the replacement.

> maybe?  (Of course the remove argument is feature creep, so I'm only
> about +0.5 myself.  And the name is long, but I can't think of any
> better synonyms for "make safe" in English right now).
>
>   > Either way, my use case is to filter them out when I *don't* want to
>   > pass them along to other software, but would prefer the Unicode
>   > replacement character to the ASCII question mark created by using the
>   > "replace" filter when encoding.
>
> I think it would be preferable to be unicodely correct here by
> default, since this is a str -> str function.
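For readers following along, a minimal sketch of what the proposed function could look like. This is only an illustration of the idea discussed above, not an existing standard library API; it assumes that "surrogate escapes" means the lone surrogates U+DC80..U+DCFF produced by the "surrogateescape" error handler.

```python
import re

# The "surrogateescape" error handler maps each undecodable byte
# 0x80..0xFF to a lone surrogate in the range U+DC80..U+DCFF.
_SURROGATE_ESCAPES = re.compile("[\udc80-\udcff]")


def replace_surrogate_escapes(s, replacement="\ufffd"):
    """Return s with every surrogate escape replaced (str -> str).

    Hypothetical implementation of the name proposed in the thread.
    Passing replacement="" removes the escapes instead.
    """
    # A function is used as the substitution so that the replacement
    # text is taken literally (no backslash template processing).
    return _SURROGATE_ESCAPES.sub(lambda match: replacement, s)


# Usage: a byte string that is not valid UTF-8, decoded the way the
# os module decodes file names on POSIX.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
safe = replace_surrogate_escapes(name)
stripped = replace_surrogate_escapes(name, replacement="")
```

Defaulting to U+FFFD rather than "?" keeps the result a valid Unicode string while still marking where data was lost.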





Re: [Python-Dev] Bytes path support

2014-08-26 Thread R. David Murray
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan  wrote:
> As some examples of where bilingual computing breaks down:
> 
> * My NFS client and server may have different locale settings
> * My FTP client and server may have different locale settings
> * My SSH client and server may have different locale settings
> * I save a file locally and send it to someone with a different locale setting
> * I attempt to access a Windows share from a Linux client (or vice-versa)
> * I clone my POSIX hosted git or Mercurial repository on a Windows client
> * I have to connect my Linux client to a Windows Active Directory
> domain (or vice-versa)
> * I have to interoperate between native code and JVM code
> 
> The entire computing industry is currently struggling with this
> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
> encoding/code pages) -> multilingual (Unicode) transition. It's been
> going on for decades, and it's still going to be quite some time
> before we're done.
> 
> The POSIX world is slowly clawing its way towards a multilingual model
> that actually works: UTF-8
> Windows (including the CLR) and the JVM adopted a different
> multilingual model, but still one that actually works: UTF-16-LE

This kind of puts the "length" of the python2->python3 transition
period in perspective, doesn't it?

--David
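The breakdowns in Nick's list share one mechanism: bytes written under one encoding are read back under another. A minimal sketch, assuming a file name written by one party and read by another (the helper `roundtrip` and the sample name are made up for illustration):

```python
def roundtrip(name, writer_encoding, reader_encoding):
    """Encode name as the writer would, then decode as the reader would.

    Hypothetical helper; returns None when the reader cannot decode
    the bytes at all.
    """
    raw = name.encode(writer_encoding)
    try:
        return raw.decode(reader_encoding)
    except UnicodeDecodeError:
        return None


name = "caf\u00e9.txt"  # "café.txt"

# Both sides agree on the encoding: the name survives.
same = roundtrip(name, "utf-8", "utf-8")

# UTF-8 writer, Latin-1 reader: decoding succeeds but yields mojibake.
mojibake = roundtrip(name, "utf-8", "latin-1")

# Latin-1 writer, UTF-8 reader: the bytes are not valid UTF-8.
failed = roundtrip(name, "latin-1", "utf-8")
```

The silent mojibake case is arguably the worse of the two failures, since nothing signals that the text was damaged.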


[Python-Dev] Windows Unicode console support [Was: Bytes path support]

2014-08-26 Thread Paul Moore
On 24 August 2014 04:27, Nick Coghlan  wrote:
> One of those areas is the fact that we still use the old 8-bit APIs to
> interact with the Windows console. Those are just as broken in a
> multilingual world as the other Windows 8-bit APIs, so Drekin came up
> with a project to expose the Windows console as a UTF-16-LE stream
> that uses the 16-bit APIs instead:
> https://pypi.python.org/pypi/win_unicode_console
>
> I personally hope we'll be able to get the issues Drekin references
> there resolved for Python 3.5 - if other folks hope for the same
> thing, then one of the best ways to help that happen is to try out the
> win_unicode_console module and provide feedback on what does and
> doesn't work.

This looks very cool, and I plan on giving it a try. But I don't see
any issues mentioned there (unless you mean the fact that it's not
possible to hook into Python's interactive interpreter directly, but I
don't see how that could be fixed in an external module). There are no
open issues on the project's GitHub tracker.

I'd love to see this go into 3.5, so any more specific suggestions as
to what would be needed to move it forwards would be great.

Paul


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Terry Reedy

On 8/26/2014 9:11 AM, R. David Murray wrote:
> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan  wrote:
>> As some examples of where bilingual computing breaks down:
>>
>> * My NFS client and server may have different locale settings
>> * My FTP client and server may have different locale settings
>> * My SSH client and server may have different locale settings
>> * I save a file locally and send it to someone with a different locale setting
>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>> * I have to connect my Linux client to a Windows Active Directory
>> domain (or vice-versa)
>> * I have to interoperate between native code and JVM code
>>
>> The entire computing industry is currently struggling with this
>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>> going on for decades, and it's still going to be quite some time
>> before we're done.
>>
>> The POSIX world is slowly clawing its way towards a multilingual model
>> that actually works: UTF-8
>> Windows (including the CLR) and the JVM adopted a different
>> multilingual model, but still one that actually works: UTF-16-LE

Nick, I think the first half of your post is one of the clearest
expositions yet of 'why Python 3' (in particular, the str to unicode
change).  It is worthy of wider distribution and without much change, it
would be a great blog post.

> This kind of puts the "length" of the python2->python3 transition
> period in perspective, doesn't it?


--
Terry Jan Reedy



Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, "Terry Reedy"  wrote:
>
> On 8/26/2014 9:11 AM, R. David Murray wrote:
>>
>> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan  wrote:
>>>
>>> As some examples of where bilingual computing breaks down:
>>>
>>> * My NFS client and server may have different locale settings
>>> * My FTP client and server may have different locale settings
>>> * My SSH client and server may have different locale settings
>>> * I save a file locally and send it to someone with a different locale setting
>>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>>> * I have to connect my Linux client to a Windows Active Directory
>>> domain (or vice-versa)
>>> * I have to interoperate between native code and JVM code
>>>
>>> The entire computing industry is currently struggling with this
>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>> going on for decades, and it's still going to be quite some time
>>> before we're done.
>>>
>>> The POSIX world is slowly clawing its way towards a multilingual model
>>> that actually works: UTF-8
>>> Windows (including the CLR) and the JVM adopted a different
>>> multilingual model, but still one that actually works: UTF-16-LE
>
>
> Nick, I think the first half of your post is one of the clearest
> expositions yet of 'why Python 3' (in particular, the str to unicode
> change).  It is worthy of wider distribution and without much change, it
> would be a great blog post.

Indeed, I had the same idea - I had been assuming users already understood
this context, which is almost certainly an invalid assumption.

The blog post version is already mostly written, but I ran out of weekend.
Will hopefully finish it up and post it some time in the next few days :)

>> This kind of puts the "length" of the python2->python3 transition
>> period in perspective, doesn't it?

I realised in writing the post that ASCII is over 50 years old at this
point, while Unicode as an official standard is more than 20. By the time
this is done, we'll likely be talking 30+ years for Unicode to displace the
confusing mess that is code pages and locale encodings :)

Cheers,
Nick.

>
>
> --
> Terry Jan Reedy
>
>


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nikolaus Rath
Nick Coghlan  writes:
>>>> As some examples of where bilingual computing breaks down:
>>>>
>>>> * My NFS client and server may have different locale settings
>>>> * My FTP client and server may have different locale settings
>>>> * My SSH client and server may have different locale settings
>>>> * I save a file locally and send it to someone with a different locale setting
>>>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>>>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>>>> * I have to connect my Linux client to a Windows Active Directory
>>>> domain (or vice-versa)
>>>> * I have to interoperate between native code and JVM code
>>>>
>>>> The entire computing industry is currently struggling with this
>>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>>> going on for decades, and it's still going to be quite some time
>>>> before we're done.
>>>>
>>>> The POSIX world is slowly clawing its way towards a multilingual model
>>>> that actually works: UTF-8
>>>> Windows (including the CLR) and the JVM adopted a different
>>>> multilingual model, but still one that actually works: UTF-16-LE
>>
>>
>> Nick, I think the first half of your post is one of the clearest
>> expositions yet of 'why Python 3' (in particular, the str to unicode
>> change).  It is worthy of wider distribution and without much change, it
>> would be a great blog post.
>
> Indeed, I had the same idea - I had been assuming users already understood
> this context, which is almost certainly an invalid assumption.
>
> The blog post version is already mostly written, but I ran out of weekend.
> Will hopefully finish it up and post it some time in the next few days
> :)

In that case, maybe it'd be nice to also explain why you use the term
"bilingual" for codepage based encoding. At least to me, a
codepage/locale is pretty monolingual, or alternatively covering a whole
region (e.g. Western Europe). I figure with bilingual you mean ASCII +
something, but that's mostly a guess from my side.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Stephen J. Turnbull
Nikolaus Rath writes:

 > In that case, maybe it'd be nice to also explain why you use the
 > term "bilingual" for codepage based encoding.

Modern computing systems are written in languages which are invariably
based on syntax expressed using ASCII, and provide by default
functionality for expressing dates etc suitable for rendering American
English.  Thus ASCII (ie, American English) is always an available
language.  Code pages provide facilities for rendering one or more
languages sharing a common coded character set, but are
unsuitable for rendering most of the rest of the world's dozens of
language groups (grouping languages by common character set).

Multilingual has come to mean "able to express (almost) any set of
languages in a single text" (see, for example, Emacs's "HELLO" file),
not just "more than two".  So code pages are closer in spirit to
"bilingual" (two of many) than to "multilingual" (all of many).

It's messy, analogical terminology.  But then, natural language is
messy and analogical.
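The "bilingual" versus "multilingual" distinction can be made concrete with a small sketch. The helper `can_encode` and the sample strings are invented for illustration, and cp1251 and cp1253 are just two example code pages (Cyrillic and Greek): each handles ASCII plus its own language group, while UTF-8 handles all of them in a single text.

```python
def can_encode(text, encoding):
    """Return True if text is representable in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


english = "Hello"
russian = "Привет"
greek = "Γειά"
mixed = " ".join([english, russian, greek])

# Each code page covers ASCII plus one language group ("bilingual").
cyrillic_page = [can_encode(t, "cp1251") for t in (english, russian, greek)]
greek_page = [can_encode(t, "cp1253") for t in (english, russian, greek)]

# UTF-8 covers every sample, including the mixed text ("multilingual").
unicode_all = [can_encode(t, "utf-8") for t in (english, russian, greek, mixed)]
```

Under these assumptions, `cyrillic_page` comes out as `[True, True, False]` and `greek_page` as `[True, False, True]`, while every entry of `unicode_all` is `True`.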

