On Sat, Mar 12, 2016 at 12:46 AM, boB Stepp <robertvst...@gmail.com> wrote:
> I did with the non-printing control character, but not with '\u25ba' !
> So I had to go through some contortions after some research to get my
> Win7 cmd.exe and PowerShell to display the desired prompt using

The console is hosted by another process named conhost.exe. When
Python is running in the foreground, the cmd shell is just waiting in
the background until Python exits.

> '\u25ba' as the character with utf-8 encoding.  My new
> pythonstartup.py file (Which PYTHONSTARTUP now points to) follows:
>
> #!/usr/bin/env python3
>
> import os
> import sys
>
> os.system('chcp 65001')  # cmd.exe and PowerShell require the code page to be changed.
> sys.ps1 = '\u25ba '  # I remembered to add the additional space.

chcp.com calls SetConsoleCP (to change the input codepage) and
SetConsoleOutputCP. You can call these functions via ctypes if you
need to separately modify the input or output codepages. For example:

    >>> import ctypes, os, sys
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> sys.ps1 = '\u25ba '
    Γû║ kernel32.SetConsoleOutputCP(65001)
    1
    ►

UTF-8 in the console is generally buggy, but this example works since
the sys.ps1 prompt is written without buffering (discussed below).
However, Python still thinks the console is using the initial
codepage. To print non-ASCII characters, you'll either have to change
the codepage before starting Python or rebind sys.stdout and
sys.stderr.

    ► print(''.join(chr(x) for x in range(240, 256)))
    ����������������

Let's try to fix this:

    ► fd = os.dup(1)
    ► sys.stdout = open(fd, 'w', encoding='utf-8')
    ► print(''.join(chr(x) for x in range(240, 256)))
    ðñòóôõö÷øùúûüýþÿ
    ùúûüýþÿ
    �þÿ
    ►

The buggy output above is from Windows 7. Codepage 65001 was only ever
meant for encoding text to and from files and sockets, via WinAPI
WideCharToMultiByte and MultiByteToWideChar. Using it in the console
is buggy because the console's ANSI API hard codes a lot of
assumptions that fall apart with UTF-8. For example, if you try to
paste non-ASCII characters into the console using 65001 as the input
codepage, Python will quit as if you had entered Ctrl+Z.

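For reference, the ctypes calls from the session above could be wrapped
in a couple of small helpers along the following lines. This is only a
sketch: the helper names get_console_codepages and
set_console_output_codepage are made up for illustration, and the
platform guard is my own addition so the module can be imported on
other systems. The kernel32 functions themselves (GetConsoleCP,
GetConsoleOutputCP, SetConsoleOutputCP) are the real console API.

```python
import ctypes
import sys


def get_console_codepages():
    """Return (input_cp, output_cp) for the attached console.

    Returns None when not running on Windows. Either value is 0 if the
    process has no console attached.
    """
    if sys.platform != 'win32':
        return None
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


def set_console_output_codepage(codepage):
    """Change only the output codepage and return the previous one.

    The input codepage is left alone, which chcp.com cannot do.
    """
    if sys.platform != 'win32':
        raise OSError('console codepages exist only on Windows')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    previous = kernel32.GetConsoleOutputCP()
    if not kernel32.SetConsoleOutputCP(codepage):
        # The call returns 0 on failure; report the saved last error.
        raise ctypes.WinError(ctypes.get_last_error())
    return previous
```

Returning the previous codepage makes it easy to restore the console
to its original state before Python exits, which matters because the
setting belongs to the console and outlives the Python process.
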
Also, in Windows 7 when you print non-ASCII characters you'll get a
trail of garbage written to the end of the print in proportion to the
number of non-ASCII characters, especially with character codes that
take more than 2 UTF-8 bytes. In addition, with Python 2, the CRT's
FILE buffering can split a UTF-8 sequence across two writes. The split
UTF-8 sequence gets printed as 2 to 4 replacement characters. I've
discussed these problems in more detail in the following issue:

http://bugs.python.org/issue26345

Windows 10 fixes the problem with printing extra characters, but it
still has the problem with reading non-ASCII input as UTF-8 and still
can't handle a buffered writer that splits a UTF-8 sequence across two
writes. Maybe Windows 20 will finally get this right.

For the time being, programs that use Unicode in the console should
use the wide-character (UTF-16) API. Python doesn't support this out
of the box, since it's not designed to handle UTF-16 in the raw I/O
layer. The win-unicode-console package adds this support.

https://pypi.python.org/pypi/win_unicode_console
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
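
P.S. The split-write problem mentioned in this message can be
reproduced without a console. The sketch below assumes, purely as a
model, that the consumer decodes each write on its own, and compares
that with an incremental decoder that keeps state between writes. The
function names and the sample string are made up for illustration; the
exact number of replacement characters depends on where the split
falls.

```python
import codecs

# 'ð' and 'ñ' take two UTF-8 bytes each; '\u25ba' takes three.
text = '\xf0\xf1\u25ba'
data = text.encode('utf-8')  # 7 bytes in total


def decode_chunks_independently(data, split_at):
    """Decode two writes separately, as a stateless consumer would."""
    chunks = (data[:split_at], data[split_at:])
    return ''.join(chunk.decode('utf-8', errors='replace')
                   for chunk in chunks)


def decode_chunks_incrementally(data, split_at):
    """Decode two writes with a decoder that remembers partial bytes."""
    decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')
    out = decoder.decode(data[:split_at])
    out += decoder.decode(data[split_at:], final=True)
    return out


if __name__ == '__main__':
    # Split after the first byte of the three-byte sequence.
    # ascii() keeps the output printable in any console codepage.
    print(ascii(decode_chunks_independently(data, 5)))
    print(ascii(decode_chunks_incrementally(data, 5)))
```

With the split at byte 5, the stateless version turns the single
'\u25ba' into three U+FFFD replacement characters, while the
incremental decoder returns the original text.
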