[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2012-01-05 Thread Benjamin Peterson
Benjamin Peterson added the comment: I'm just going to close this and say "use 3.3". -- nosy: +benjamin.peterson resolution: -> out of date status: open -> closed

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2011-09-29 Thread Ezio Melotti
Ezio Melotti added the comment: It can still be fixed on 2.7/3.2 though. -- versions: +Python 2.7

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2011-09-29 Thread STINNER Victor
STINNER Victor added the comment: This issue has been fixed in Python 3.3 thanks to PEP 393. -- nosy: +haypo
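A minimal sketch of the behaviour this comment describes, assuming Python 3.3 or later, where PEP 393 makes every str element a full code point. The variable names are illustrative; the fill character is the one from the original report.

```python
# Assumes Python 3.3+ (PEP 393); a narrow build of 2.7/3.2 would behave differently.
fill = '\U00100140'  # the non-BMP fill character from the original report

# One code point is one str element, so the fill is "exactly one character long".
assert len(fill) == 1

centered = 'xyz'.center(20, fill)
left = 'xyz'.ljust(20, fill)
right = 'xyz'.rjust(20, fill)

# str.__format__ takes the same character as the fill in a format spec.
formatted = format('', fill + '^4')

print(len(centered), len(left), len(right), len(formatted))
```

On a narrow build the same fill character was stored as a surrogate pair of length 2, which is what triggered the TypeError in the report.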

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-27 Thread Terry J. Reedy
Terry J. Reedy added the comment: After reading the additional messages here and on a similar issue Alexander opened after this, I see the point of wanting to make the difference between the two types of builds as transparent as sensibly possible. From that viewpoint, rejection of composed c

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-27 Thread Ezio Melotti
Ezio Melotti added the comment: I agree that s.center(n, char).encode('utf-8') should be the same on both builds -- even if their len() will be different -- for the following reasons: 1) the string will eventually be encoded, and if the result is the same on both builds, it will look

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-26 Thread Eric Smith
Eric Smith added the comment: I think these macros would be a reasonable approach. I think str.center, etc. should support non-BMP chars, because to not do so can raise an exception. Supporting composed graphemes seems like another problem altogether. And while we could fix that, it's clearly

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Nov 26, 2010 at 6:37 PM, Terry J. Reedy wrote: > > Terry J. Reedy added the comment: > > As a practical matter, I think that for at least the next decade, people are > at least as likely to > want to fill with a composed, multi-BMP-codepoint 'ch

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-26 Thread Terry J. Reedy
Terry J. Reedy added the comment: As a practical matter, I think that for at least the next decade, people are at least as likely to want to fill with a composed, multi-BMP-codepoint 'char' (grapheme) as with a non-BMP char. So to me, failure with the latter is no worse than failure with the

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: issue9200 already proposes a similar change to str.is* methods.

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Ezio Melotti
Changes by Ezio Melotti: -- nosy: +amaury.forgeotdarc

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Here is another proof of concept patch for the isalpha issue that introduces a higher level abstraction macro - Py_UNICODE_NEXT. It should be possible to reuse this macro in all isxyz methods and other places where surrogates are currently processed.
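The patch itself is not reproduced in this archive. As a rough Python analogue of what a "next code point" abstraction over UTF-16 code units might do, here is a sketch; next_code_point and utf16_units are hypothetical helper names, not the actual Py_UNICODE_NEXT macro.

```python
def next_code_point(units, i):
    """Return (code point, next index) for the UTF-16 code units starting at i.

    Hypothetical helper: combines a high/low surrogate pair into one code
    point and passes any other code unit through unchanged.
    """
    unit = units[i]
    if 0xD800 <= unit <= 0xDBFF and i + 1 < len(units):
        low = units[i + 1]
        if 0xDC00 <= low <= 0xDFFF:
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            return code_point, i + 2
    return unit, i + 1


def utf16_units(s):
    """Split a str into UTF-16 code units, as a narrow build stored it."""
    data = s.encode('utf-16-le')
    return [int.from_bytes(data[k:k + 2], 'little') for k in range(0, len(data), 2)]


units = utf16_units('a\U00010140')  # one BMP char followed by one non-BMP char
first, i = next_code_point(units, 0)
second, j = next_code_point(units, i)
print(hex(first), hex(second), j)
```

A method such as isalpha could then test each returned code point rather than each code unit, which is the kind of reuse the comment has in mind.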

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Ezio Melotti
Ezio Melotti added the comment: I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert normal chars and surrogate pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA. The attached p

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Here is another str method not ready for non-BMP chars:
    >>> u = '\U00010140'
    >>> u.translate({ord(u):ord('A')})
    '𐅀' (expected 'A')
    >>> u = 'B'
    >>> u.translate({ord(u):ord('A')})
    'A'
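For comparison, a short sketch of the same translate call, assuming Python 3.3 or later, where ord() of a non-BMP character returns its full code point and the table lookup succeeds. The names table and result are illustrative.

```python
# Assumes Python 3.3+; on a narrow build the table key would not match a surrogate pair.
u = '\U00010140'
table = {ord(u): ord('A')}  # map the non-BMP code point to 'A'

result = u.translate(table)
print(result)
```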

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Changes by Alexander Belopolsky: -- Removed message: http://bugs.python.org/msg122313

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Nov 24, 2010 at 3:37 PM, Marc-Andre Lemburg wrote: ..
> I don't think we should change that for the formatting methods.
That's a reasonable position. What about
    >>> unicodedata.category('\N{OLD ITALIC LETTER A}')
    'Lo'
    >>> '\N{OLD ITALIC LETTER

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Nov 24, 2010 at 3:37 PM, Marc-Andre Lemburg wrote: .. > I don't think we should change that for the formatting methods. That's a reasonable position. What about 'Lo' >>> '\N{OLD ITALIC LETTER A}'.isalpha() False the str.isalpha() method is und

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote:
> New submission from Alexander Belopolsky:
>
> >>> 'xyz'.center(20, '\U00100140')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: The fill character must be exactly one character long
>
> str.ljust

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Ezio Melotti
Changes by Ezio Melotti: -- nosy: +ezio.melotti

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Nov 24, 2010 at 10:33 AM, Antoine Pitrou wrote: .. > The question is, what should it do with such an input? I think the rule for such functions should be that if input.encode('utf-8') is the same on wide and narrow builds, then the output.encode(

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Eric Smith
Eric Smith added the comment: str.__format__ and friends (int, float, complex) also have this same problem. For example, when they're computing the "fill" character:
    >>> format('', 'x^')
    ''
    >>> format('', '\U00100140^')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Inv

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Antoine Pitrou
Antoine Pitrou added the comment: The question is, what should it do with such an input? Pretend it's a single char (but other chars in the source string won't get the same treatment)? Treat it as a two-char string (but then center() and friends should logically be extended to accept strings

[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2010-11-24 Thread Alexander Belopolsky
New submission from Alexander Belopolsky:
    >>> 'xyz'.center(20, '\U00100140')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: The fill character must be exactly one character long
str.ljust and str.rjust are similarly affected. -- components: Interpreter Core messag