[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-04 Thread Eric Smith
Eric Smith added the comment: See the discussion on python-dev, in particular Martin's comment at http://mail.python.org/pipermail/python-dev/2009-December/094412.html The solutions to this seem too complex for 2.x. It is not a problem in 3.x. -- resolution: -> wont fix status: open -

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-03 Thread Eric Smith
Eric Smith added the comment: I've raised the issue with unicode and locale on python-dev: http://mail.python.org/pipermail/python-dev/2009-December/094408.html Pending the outcome of that decision, I'll move forward on this issue.

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-03 Thread Mark Dickinson
Mark Dickinson added the comment: Reassigning to Eric. -- assignee: mark.dickinson -> eric.smith

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-02 Thread Stefan Krah
Stefan Krah added the comment: Googling "multi-byte thousands separator" gives better results. From those results, it is clear to me that decimal_point and thousands_sep are strings that may be interpreted as multi-byte characters. The Czech separator appears to be a no-break space multi-byte ch

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-02 Thread Eric Smith
Eric Smith added the comment: In trunk, Modules/_localemodule.c also treats these as "string of char", so at least we're consistent. In py3k, mbstowcs is used and the result passed to PyUnicode_FromWideChar. I'm not sure how you'd address this in locale in trunk, or if we want to do something

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-02 Thread Eric Smith
Eric Smith added the comment: I don't see any documentation that a struct lconv should be interpreted as UTF-8. In fact Googling "struct lconv utf-8" gives this bug report as the first hit. lconv.thousands_sep is char*. It's never been clear to me if this means "pointer to a single char", or "p

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-02 Thread Mark Dickinson
Mark Dickinson added the comment: So when the format string has type 'str' (as in Stefan's original example) rather than type 'unicode', I'd say Python is doing the right thing already: everything in sight, including the separators coming from localeconv(), has type 'str', so trying to inter

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-02 Thread Stefan Krah
Stefan Krah added the comment: In python3.2, the output of decimal looks good. With float, the separator is printed as two spaces on my Unicode terminal (export LC_ALL=cs_CZ.UTF-8). So decimal (3.2) interprets the separator string as a single UTF-8 char and the final output is a UTF-8 string. I

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-01 Thread Eric Smith
Eric Smith added the comment: I can duplicate this on Linux. The difference is the values in the locale for the separators, specifically, locale.localeconv()['thousands_sep']. >>> locale.localeconv()['thousands_sep'] '\xc2\xa0' The question is: since a struct lconv contains char*s, how to inte
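A minimal sketch of the value quoted above, written for Python 3, where a Python 2 `str` corresponds to `bytes`. It assumes the two bytes are meant to be read as UTF-8, which is the open question in this comment; the variable names are illustrative only.

```python
import unicodedata

# The separator as reported by locale.localeconv()['thousands_sep'] in Python 2.
raw_sep = b"\xc2\xa0"

# Assumption: the bytes are UTF-8 encoded text.
sep = raw_sep.decode("utf-8")

print(len(raw_sep))           # number of bytes
print(len(sep))               # number of characters after decoding
print(unicodedata.name(sep))  # Unicode name of the decoded character
```

Read as bytes the separator has length 2; read as UTF-8 text it is the single character U+00A0.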

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-01 Thread R. David Murray
R. David Murray added the comment: Interesting. My regular locale is LC_CTYPE=en_US.UTF-8, and here is what I get: Python 2.7a0 (trunk:76501, Nov 24 2009, 13:59:01) [GCC 4.4.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import local >>> import locale

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-01 Thread Eric Smith
Eric Smith added the comment: In 2.7, I get: $ ./python.exe Python 2.7a0 (trunk:76501, Nov 24 2009, 14:57:21) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import locale >>> locale.setlocale(locale.LC_NUMERIC, "cs_CZ.U

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-01 Thread R. David Murray
R. David Murray added the comment: In python3: >>> locale.setlocale(locale.LC_NUMERIC, "cs_CZ.UTF-8") 'cs_CZ.UTF-8' >>> s = format(Decimal("-1.5"), ' 019.18n') >>> len(s) 20 >>> print(s) -0 000 000 000 001,5 Python3 uses unicode for strings. Python2 uses bytes. To format unicode in python2,

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-11-30 Thread Stefan Krah
Stefan Krah added the comment: What do you mean by "working with bytestrings"? The UTF-8 separators or decimal points come directly from struct lconv (man localeconv). The logical way to reach a minimum width of 19 is to have 19 UTF-8 characters, which can subsequently be converted to other formats

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-11-28 Thread Matthew Barnett
Matthew Barnett added the comment: Surely this is to be expected when working with bytestrings. You should be working in Unicode and using UTF-8 only for input and output. -- nosy: +mrabarnett

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-11-28 Thread Mark Dickinson
Changes by Mark Dickinson: -- assignee: -> mark.dickinson

[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-11-15 Thread Stefan Krah
New submission from Stefan Krah: This issue affects the format functions of float and decimal. When calculating the padding necessary to reach the minimum width, UTF-8 separators and decimal points are calculated by their byte lengths. This can lead to printed representations that are too short
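The effect described in this submission can be illustrated with a rough sketch in Python 3. It is not the actual float or decimal formatting code: `pad_by_bytes` and `pad_by_chars` are hypothetical helpers that only show how measuring width in UTF-8 bytes yields less padding than measuring it in characters.

```python
NBSP = "\u00a0"  # no-break space; two bytes (0xC2 0xA0) when encoded as UTF-8


def pad_by_bytes(text, width):
    """Left-pad with spaces, measuring the width in UTF-8 bytes."""
    missing = max(0, width - len(text.encode("utf-8")))
    return " " * missing + text


def pad_by_chars(text, width):
    """Left-pad with spaces, measuring the width in characters."""
    missing = max(0, width - len(text))
    return " " * missing + text


# Illustrative number grouped with the two-byte separator.
number = "1" + NBSP + "234" + NBSP + "567"

short = pad_by_bytes(number, 12)
full = pad_by_chars(number, 12)
print(len(short), len(full))
```

Each separator counts twice in the byte-based version, so the padded result is two characters short of the requested width of 12.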