Terry J. Reedy added the comment:
Printing the unquoted escape representation rather than a replacement char is a
bit strange and not what I expect from the Python docs. I could see it as a
bug. In any case, on Windows, it is the Python REPL that raises, but only for
sys.stdout:
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
In IDLE on Windows, by contrast, the surrogate is displayed as a box with
diagonal lines ([X] compressed into one char) in both cases. When copied and
pasted into Firefox, the pasted surrogate shows as a square box containing
mini D 8 0 0 chars.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�
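The stdout/stderr difference in the REPL transcript is consistent with the two streams using different codec error handlers. The docs say sys.stderr uses 'backslashreplace'. I am assuming sys.stdout is 'strict' in that console session. A minimal sketch of how each handler treats a lone surrogate (the helper name try_encode is made up for illustration):

```python
# Sketch: how UTF-8 encoding treats a lone surrogate under different
# error handlers. try_encode is a hypothetical helper, not a stdlib function.
LONE_SURROGATE = '\ud800'

def try_encode(text, errors):
    """Return the encoded bytes, or the exception class name on failure."""
    try:
        return text.encode('utf-8', errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

HANDLERS = ('strict', 'backslashreplace', 'replace', 'surrogatepass')
results = {handler: try_encode(LONE_SURROGATE, handler) for handler in HANDLERS}

for handler, outcome in results.items():
    print(f'{handler:>16}: {outcome!r}')
```

Checking sys.stdout.errors and sys.stderr.errors in the session in question would confirm whether this is what happens there.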
I consider it the proper thing for tk to put the undisplayable codepoint,
rather than a replacement character, into the editor buffer (however tcl
encodes it), so that IDLE can retrieve it without loss of information. IDLE
can then potentially identify the character to the user.
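As a rough illustration of what "identify the character to the user" could look like, here is a sketch that labels a single character by code point. describe_char is a hypothetical helper, not anything that exists in IDLE, and the label wording is my own:

```python
import unicodedata

def describe_char(ch):
    """Hypothetical helper: return a readable label for one character."""
    code = f'U+{ord(ch):04X}'
    category = unicodedata.category(ch)
    if category == 'Cs':
        # Surrogate code points have no Unicode name of their own.
        return f'{code} (lone surrogate)'
    name = unicodedata.name(ch, None)
    if name:
        return f'{code} {name}'
    return f'{code} (unnamed, category {category})'

print(describe_char('\ud800'))
print(describe_char('a'))
```

This only works if the code point survives the round trip through the text widget, which is why keeping it in the buffer matters.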
===
An oddity though. With
>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')
the box is an empty square, not crossed. But when I copy-paste 'a�b' into the
font sample (Serhiy, making this editable was a great idea), it is crossed for
every font I tried, even for Courier, which is what is being used in text t.
----------
stage: -> needs patch
_______________________________________
Python tracker
<https://bugs.python.org/issue22742>
_______________________________________