[Python-Dev] Unicode locale values in 2.7

2009-12-03 Thread Eric Smith

While researching http://bugs.python.org/issue7327, I've come to the
conclusion that trunk handles locales incorrectly with regard to Unicode.
Fixing this would be the first step toward resolving this issue with 
float and Decimal locale-aware formatting.


The issue concerns the locale "cs_CZ.UTF-8", and the "thousands_sep"
value (among others). The C struct lconv (in Linux) contains '\xc2\xa0'
for thousands_sep. In py3k this is handled by calling mbstowcs (which is
locale-aware) and then PyUnicode_FromWideChar, so the value is converted
to u"\xa0" (non-breaking space).
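
To illustrate the conversion described above, here is a minimal sketch in Python 3 syntax. It is not the C code used by py3k's locale module; it only assumes that the locale's encoding is UTF-8, and the variable names are made up for the example.

```python
# Raw bytes as stored in the C struct lconv for cs_CZ.UTF-8 (value from the text above).
raw_thousands_sep = b"\xc2\xa0"

# Decoding with the locale's encoding (assumed to be UTF-8 here) stands in
# for the mbstowcs + PyUnicode_FromWideChar step done at the C level.
decoded = raw_thousands_sep.decode("utf-8")

print(len(raw_thousands_sep))  # number of bytes
print(len(decoded))            # number of characters
print(decoded == "\u00a0")     # compares against NO-BREAK SPACE
```

The two bytes collapse into a single character, which is why the byte-string and unicode views of the same separator disagree about its length.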

But in trunk, the value is just used as-is. So when formatting a decimal,
for example, '\xc2\xa0' is just inserted into the result, such as:

>>> format(Decimal('1000'), 'n')
'1\xc2\xa0000'

This doesn't make much sense, and causes an error when converting it to
unicode:

>>> format(Decimal('1000'), u'n')
Traceback (most recent call last):
  File "", line 1, in 
  File "/root/python/trunk/Lib/decimal.py", line 3609, in __format__
    return _format_number(self._sign, intpart, fracpart, exp, spec)
  File "/root/python/trunk/Lib/decimal.py", line 5704, in _format_number
    return _format_align(sign, intpart+fracpart, spec)
  File "/root/python/trunk/Lib/decimal.py", line 5595, in _format_align
    result = unicode(result)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1:
ordinal not in range(128)

I believe that the correct solution is to do what py3k does in locale,
which is to convert the struct lconv values to unicode. But since this
would be a disruptive change if universally applied, I'd like to propose
that we only convert to unicode if the values won't fit into a str.

So the algorithm would be something like:
1. call mbstowcs
2. if every value in the result is in the range [32, 126], return a str
3. otherwise, return a unicode
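
The three steps above can be sketched roughly as follows. This is only an illustration in Python 3 syntax, where `bytes` plays the role of the 2.x `str` and `str` plays the role of 2.x `unicode`. The helper name `convert_lconv_value` and its parameters are hypothetical, and a plain `decode` call stands in for mbstowcs.

```python
def convert_lconv_value(raw, encoding):
    """Return a byte string if every character is printable ASCII, else text.

    Hypothetical helper: `raw` is the byte string taken from struct lconv and
    `encoding` is the locale's encoding. Neither name comes from CPython.
    """
    # Step 1: decode the multibyte value (stand-in for mbstowcs).
    text = raw.decode(encoding)
    # Step 2: if every code point is in [32, 126], keep a byte string.
    if all(32 <= ord(ch) <= 126 for ch in text):
        return text.encode("ascii")
    # Step 3: otherwise return the decoded (unicode) value.
    return text


print(type(convert_lconv_value(b",", "utf-8")).__name__)
print(type(convert_lconv_value(b"\xc2\xa0", "utf-8")).__name__)
```

Under this sketch a comma separator stays a byte string, while the Czech no-break space comes back as text, so the return type depends on the locale's data.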

This would mean that for most locales, the current behavior in trunk
wouldn't change: the locale.localeconv() values would continue to be
str. Only for those locales where the values wouldn't fit into a str
would unicode be returned.

Does this seem like an acceptable change?

Eric.

PS: Thanks to Mark Dickinson and others on irc and on the issue for
helping in formulating this.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode locale values in 2.7

2009-12-03 Thread Antoine Pitrou
Eric Smith writes:
> 
> But in trunk, the value is just used as-is. So when formatting a decimal,
> for example, '\xc2\xa0' is just inserted into the result, such as:
> >>> format(Decimal('1000'), 'n')
> '1\xc2\xa0000'
> This doesn't make much sense,

Why doesn't it make sense? It's normal UTF-8.
The same thing happens when the monetary sign is non-ASCII, see
Lib/test/test_locale.py for an example.

> I believe that the correct solution is to do what py3k does in locale,
> which is to convert the struct lconv values to unicode. But since this
> would be a disruptive change if universally applied, I'd like to propose
> that we only convert to unicode if the values won't fit into a str.

This would still be disruptive, because some programs may rely on these values
being bytestrings in the current locale encoding.

I'd say don't try to fix this, and encourage people to use py3k if they really
want safe unicode+locale. Proper unicode behaviour is one of py3k's main
features after all.

Regards

Antoine.




Re: [Python-Dev] Unicode locale values in 2.7

2009-12-03 Thread Mark Dickinson
On Thu, Dec 3, 2009 at 11:33 AM, Antoine Pitrou wrote:
> Eric Smith writes:
>>
>> But in trunk, the value is just used as-is. So when formatting a decimal,
>> for example, '\xc2\xa0' is just inserted into the result, such as:
>> >>> format(Decimal('1000'), 'n')
>> '1\xc2\xa0000'
>> This doesn't make much sense,
>
> Why doesn't it make sense? It's normal UTF-8.
> The same thing happens when the monetary sign is non-ASCII, see
> Lib/test/test_locale.py for an example.

Well, one problem is that it messes up character counts.  Suppose
you're aware that the thousands separator might be a single multibyte
character, and you want to produce a unicode result that's zero-padded
to a width of 6.  There's currently no sensible way of doing this that
I can see:

format(Decimal('1000'), '06n').decode('utf-8') gives a string of length 5

format(Decimal('1000'), u'06n') fails with a UnicodeDecodeError.
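
The length mismatch can be reproduced without a Czech locale. The sketch below, in Python 3 syntax, hard-codes the byte string from the earlier example and assumes that the padding width is counted in bytes, which is what the length-5 result suggests; `rjust` is used here only as a stand-in for the zero-padding step.

```python
# Byte-string result for Decimal('1000') with a '\xc2\xa0' separator
# (emulated as a literal; no locale is actually set here).
formatted = b"1\xc2\xa0000"

# Assumption: padding is computed on bytes. The value already has 6 bytes,
# so asking for a width of 6 adds no leading zero.
padded = formatted.rjust(6, b"0")

as_text = padded.decode("utf-8")
print(len(padded))   # length in bytes
print(len(as_text))  # length in characters
```

The byte string satisfies the requested width, but once decoded it is one character short.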

Mark


Re: [Python-Dev] Unicode locale values in 2.7

2009-12-03 Thread Antoine Pitrou

> Well, one problem is that it messes up character counts.

Well, I know it does. That's why py3k is inherently better than 2.x's
bytestrings-by-default behaviour. There's a reason we don't try to
backport py3k's unicode goodness to 2.x: it would be terribly messy
to do so while retaining some measure of backwards compatibility.

(By the way, I would mention that relying on locale to get number
formatting right regardless of the actual user is optimistic, borderline
foolish ;-))

cheers

Antoine.




Re: [Python-Dev] Unicode locale values in 2.7

2009-12-03 Thread Martin v. Löwis
> But in trunk, the value is just used as-is. So when formatting a decimal,
> for example, '\xc2\xa0' is just inserted into the result, such as:
> >>> format(Decimal('1000'), 'n')
> '1\xc2\xa0000'
> This doesn't make much sense

I agree with Antoine: it makes sense, and is the correct answer, given
the locale definition.

Now, I think that the locale definition is flawed - it's *not* a
property of the Czech language or culture that the "no-break space"
character is the thousands-separator. If anything other than the regular
space should be the thousands separator, it should be "thin space", and
it should be used in all locales on a system that currently use space.
Having it just in the Czech locale is a misconfiguration, IMO.

But if we accept the system's locale definition, then the above is
certainly the right answer.

> and causes an error when converting it to
> unicode:
> >>> format(Decimal('1000'), u'n')

You'll need to decode in the locale's encoding, then it would
work. Unfortunately, that is difficult to achieve.
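
A minimal sketch of such a decode, in Python 3 syntax, might look like the following. The helper `decode_locale_bytes` is hypothetical, and it assumes that `locale.getpreferredencoding()` reports the encoding actually used for the lconv values, which is part of what makes this hard to get right in general.

```python
import locale


def decode_locale_bytes(raw, encoding=None):
    """Decode `raw` with the given encoding, or the locale's preferred one.

    Hypothetical helper for illustration only.
    """
    if encoding is None:
        # Assumption: the preferred encoding matches the encoding of the
        # formatted value; this is not guaranteed on every system.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# The encoding is passed explicitly so the example does not depend on the
# locale of the machine running it.
text = decode_locale_bytes(b"1\xc2\xa0000", "utf-8")
print(len(text))
```

Passing the encoding explicitly sidesteps the lookup, but a caller who only has the formatted bytes still has to know which locale produced them.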

> I believe that the correct solution is to do what py3k does in locale,
> which is to convert the struct lconv values to unicode. But since this
> would be a disruptive change if universally applied, I'd like to propose
> that we only convert to unicode if the values won't fit into a str.

I think Guido is on record for objecting to that kind of API strongly.

> So the algorithm would be something like:
> 1. call mbstowcs
> 2. if every value in the result is in the range [32, 126], return a str
> 3. otherwise, return a unicode

Not sure what API you are describing here - the algorithm for doing
what?

> This would mean that for most locales, the current behavior in trunk
> wouldn't change: the locale.localeconv() values would continue to be
> str. Only for those locales where the values wouldn't fit into a str
> would unicode be returned.
> 
> Does this seem like an acceptable change?

Definitely not. This will be just for 2.7, and I see no point in
producing such an incompatibility. Applications may already perform
the conversion themselves, and that would break under such a change.

Regards,
Martin



Re: [Python-Dev] wpython is back

2009-12-03 Thread Cesare Di Mauro
2009/11/27 Christian Heimes 

> Cesare Di Mauro wrote:
> >
> > You'll find some at page 28 here:
> > http://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
> >
> > Mart made more interesting ones with Unladen benchmarks.
>
> The PDF document sounded interesting and I was tempted to test WPython.
> Unfortunately it doesn't compile on my box:
>
> $ make
> gcc -pthread -c -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes  -I. -IInclude -I./Include   -DPy_BUILD_CORE -o Python/ast.o Python/ast.c
> Python/ast.c:30: warning: ‘enum _expr_const’ declared inside parameter list
> Python/ast.c:30: warning: its scope is only this definition or declaration, which is probably not what you want
> Python/ast.c:335: warning: ‘enum _expr_const’ declared inside parameter list
> Python/ast.c:335: error: parameter 2 (‘constant’) has incomplete type
> Python/ast.c: In function ‘Const’:
> Python/ast.c:341: error: ‘Const_kind’ undeclared (first use in this function)
> Python/ast.c:341: error: (Each undeclared identifier is reported only once
> Python/ast.c:341: error: for each function it appears in.)
> Python/ast.c:342: error: ‘union ’ has no member named ‘Const’
> Python/ast.c:343: error: ‘union ’ has no member named ‘Const’
> Python/ast.c: In function ‘set_context’:
> Python/ast.c:457: error: ‘Const_kind’ undeclared (first use in this function)
> Python/ast.c: At top level:
> Python/ast.c:591: warning: ‘enum _expr_const’ declared inside parameter list
> Python/ast.c:590: error: conflicting types for ‘seq_for_testlist’
> Python/ast.c:29: note: previous declaration of ‘seq_for_testlist’ was here
> [...]
>
> $ gcc --version
> gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1
> $ uname -a
> Linux hamiller 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 GNU/Linux
>
>
I have created a new project at Google Code:
http://code.google.com/p/wpython2/ using Mercurial for the repository.

The master (Python 2.6.4) code is located in the default repository:
https://wpython2.googlecode.com/hg/

The wpython (version 1.0) clone is in:
https://wpython10.wpython2.googlecode.com/hg/

Sources are available in:
http://code.google.com/p/wpython2/downloads/list

wpython 1.0 is an almost complete replacement for Python 2.6.4 (except for
Doc/library/dis.rst, which I'll update later, when I stop adding or changing
opcodes).

I have changed the ASDL grammar (in Parser/Python.asdl) so that there's no
need to overwrite Include/Python-ast.h, and I've added full support for
constants to the AST code (I left Num_kind and Str_kind untouched right now,
but I plan to remove them in the next release, since Const_kind is able to
hold any kind of constant object).

Now you shouldn't have problems compiling it.

Cesare


[Python-Dev] Troubled by changes to PyPI usage agreement

2009-12-03 Thread Ben Finney
Howdy all,

In the Subversion repository for PyPI, this revision appeared:

=
$ bzr info .
Repository checkout (format: 2a)
Location:
  repository checkout root: .
        checkout of branch: https://svn.python.org/packages/trunk/pypi
         shared repository: /home/bignose/Projects/python/pypi

$ bzr log --revision 655

revno: 655
svn revno: 690 (on /trunk/pypi)
committer: martin.von.loewis
timestamp: Sun 2009-11-29 17:15:12 +
message:
  Update usage agreement according to Van Lindberg's instructions.
=

The revision modifies various files for the web interface, changing the
wording of an agreement and requiring an “I agree” checkbox to be
checked.

=
$ bzr diff --change 655 | diffstat
 templates/openid_return.pt |   28 +++-
 templates/register.pt      |   32 +++-
 webui.py                   |    6 ++
 3 files changed, 48 insertions(+), 18 deletions(-)
=

The new wording is one that I can't agree to:

=
[…]
+ Content is restricted to Python packages and related information only.
+ Any content uploaded to PyPI is provided on a non-confidential basis.
+ The PSF is free to use or disseminate any content that I upload on an
+   unrestricted basis for any purpose. In particular, the PSF and all other
+   users of the web site are granted an irrevocable, worldwide, royalty-free,
+   nonexclusive license to reproduce, distribute, transmit, display, perform,
+   and publish the content, including in digital form.
+ I represent and warrant that I have complied with all government
+   regulations […]
=

The content that I submit to PyPI is licensed under specific license
terms. That certainly does *not* allow the PSF to “use or disseminate
any content that I upload on an unrestricted basis for any purpose”,
etc.; it allows only those acts permitted by the license terms granted
in the work.

I have already registered an account at PyPI, and never agreed to this
wording. (The previous wording was much less broad, and unobjectionable.)
I would not have noticed it changing if I had not been investigating the
PyPI website source code.

Will the PSF claim I am bound by it anyway? What about future changes?

Why was this wording chosen? How does the PSF propose to reconcile this
with copyright holders' chosen license terms for their work?

-- 
 \   “Timid men prefer the calm of despotism to the boisterous sea |
  `\of liberty.” —Thomas Jefferson |
_o__)  |
Ben Finney



Re: [Python-Dev] Troubled by changes to PyPI usage agreement

2009-12-03 Thread Benjamin Peterson
2009/12/3 Ben Finney :
> Howdy all,

Hi Ben,
Could I ask why you cc'ed this to python-dev, too? I thought that in the
last string of PyPI-related emails, we agreed the correct place for this
was the catalog-sig.



-- 
Regards,
Benjamin


Re: [Python-Dev] Troubled by changes to PyPI usage agreement

2009-12-03 Thread Ben Finney
On 03-Dec-2009, Benjamin Peterson wrote:
> Hi Ben,
> Could I ask why you cc'ed this to python-dev, too? I thought that in the
> last string of PyPI-related emails, we agreed the correct place for this
> was the catalog-sig.

I did consider that. But it seems this change is being asserted by the
PSF. At the least, it seems to need clarification by Python insiders
who may not be reading the ‘catalog-sig’ forum.

Sorry for not making the reason for the cross-post clearer.

-- 
 \  “I moved into an all-electric house. I forgot and left the |
  `\   porch light on all day. When I got home the front door wouldn't |
_o__)open.” —Steven Wright |
Ben Finney 

