[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

Alexander Belopolsky Wed, 24 Nov 2010 11:09:32 -0800

Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:

On Wed, Nov 24, 2010 at 10:33 AM, Antoine Pitrou <rep...@bugs.python.org> wrote:
..
> The question is, what should it do with such an input?

I think the rule for such functions should be that if
input.encode('utf-8') is the same on wide and narrow builds, then the
output.encode('utf-8') should be the same.

> Pretend it's a single char (but other chars in the source string won't get 
> the same treatment)?

Yes, *and* surrogate pairs in the source string should count for one
char as well.
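
To make the proposed rule concrete, here is a rough sketch that emulates
narrow-build storage with lists of UTF-16 code units and pads by code
point, so a surrogate pair counts as one char in both the fill and the
source string.  The helper names (to_units, from_units, code_point_len,
center_units) are made up for illustration and are not CPython APIs; the
left/right split is assumed to follow what str.center() does on a wide
build.

```python
def to_units(s):
    """Return the UTF-16 code units of s, as a narrow build would store them."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_units(units):
    """Rebuild a str from a list of well-formed UTF-16 code units."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def code_point_len(units):
    """Count code points, treating each well-formed surrogate pair as one char."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


def center_units(units, width, fill):
    """Hypothetical center() over code units that measures in code points."""
    if code_point_len(fill) != 1:
        raise TypeError("fill must be exactly one code point")
    pad = width - code_point_len(units)
    if pad <= 0:
        return list(units)
    # Assumed to mirror the left/right split used by str.center().
    left = pad // 2 + (pad & width & 1)
    return list(fill) * left + list(units) + list(fill) * (pad - left)


s = "a\U0001D11Eb"       # 3 code points, 4 UTF-16 code units
fill = "\U0001F600"      # non-BMP fill: 1 code point, 2 code units
out = from_units(center_units(to_units(s), 6, to_units(fill)))
```

With this approach the UTF-8 encoding of out should match that of
s.center(6, fill) computed on code points, which is the property asked
for above.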

> Treat it as a two-char string (but then center() and friends should logically
> be extended to accept strings of arbitrary lengths)?

No.  For better or worse, on wide builds these methods effectively
operate on code points.  They don't interpret multi-code-point
graphemes or take grapheme width into account:

--------------------
123
--------------------

Application code has to ascertain that it is dealing with fixed-width
characters in the target font before using these methods for
text alignment.
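
A small illustration of the point about graphemes, assuming a base
letter followed by a combining accent as the test input.  The
approx_columns helper is hypothetical and only skips combining marks;
it is a crude stand-in for real grapheme segmentation or font metrics,
not a width algorithm.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: one grapheme, two code points.
s = "e\u0301"

# center() pads to 6 code points, not 6 display columns.
padded = s.center(6, "-")


def approx_columns(text):
    """Rough column count: ignore combining marks (hypothetical helper)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(padded))             # number of code points in the padded string
print(approx_columns(padded))  # approximate number of columns when rendered
```

Here the padded string has the requested length in code points but
should occupy one column fewer when rendered, so the caller has to
check the input before relying on center() for alignment.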

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10521>
_______________________________________