[Python-Dev] subprocess.Popen and win32
Hi,

I just joined python-dev because I found the need to add some code to paper over python3's subprocess API, and I'm wondering whether I'm missing something.

On python2 and python3, the (only?) way to get utf-8 arguments to subprocess was to ensure that all unicode strings are encoded into bytes before subprocess sees them. This has worked for a long time (currently compatible across python2 and 3).

On python3, this still works for normal platforms, but on windows we can't pass a list of byte strings. We have to pass a list of unicode strings.

This means that the application code ends up needing to do this:

https://github.com/git-cola/git-cola/commit/1109aeb4354c49931d9b0435d2b7cfdc2d5d6966

basically,

    import subprocess
    import sys

    # decode() and encode() are git-cola's own bytes <-> unicode helpers,
    # not stdlib functions.
    def start_command(cmd):
        if sys.platform == 'win32':
            # Python on windows always goes through list2cmdline() internally inside
            # of subprocess.py so we must provide unicode strings here otherwise
            # Python3 breaks when bytes are provided.
            cmd = [decode(c) for c in cmd]
        else:
            cmd = [encode(c) for c in cmd]
        return subprocess.Popen(cmd)

That seems broken to me, so I wonder if this is a bug in the way python3 is handling Popen with list-of-bytestring on win32?

I'm not a windows user, but I was able to install python3 under wine and the same traceback happens without the paper bag fix. This is what the traceback looks like; it dies in list2cmdline (which I am not calling directly, Popen does it under the covers):

      File "E:\Program Files (E)\git-cola\share\git-cola\lib\cola\core.py", line 109, in start_command
        universal_newlines=universal_newlines)
      File "C:\Python32\lib\subprocess.py", line 744, in __init__
        restore_signals, start_new_session)
      File "C:\Python32\lib\subprocess.py", line 936, in _execute_child
        args = list2cmdline(args)
      File "C:\Python32\lib\subprocess.py", line 564, in list2cmdline
        needquote = (" " in arg) or ("\t" in arg) or not arg
    TypeError: Type str doesn't support the buffer API

This is an issue for folks that use python to write cross-platform code.
The unix code paths expect list-of-bytes, but win32 only expects list-of-unicode, which pushes the burden onto the application programmer. It's my opinion that the win32 code path on python3 is the odd man out. If it allowed list-of-bytes like python2/win32 and python2+3/unix, then this wouldn't be an issue.

Is this an actual problem, or is it something that should be handled by application-level code as I've done?

Thanks,
--
David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
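To make the traceback above concrete: on Windows, Popen flattens the argument list into a single command-line string with subprocess.list2cmdline() before handing it to the process-creation API, and that string-building step is where the Python 3.2 code choked on bytes. The sketch below calls the helper directly with text arguments to show what it produces. Note that list2cmdline() is an internal, undocumented helper of subprocess, so treat this as an illustration of CPython's behavior rather than a stable API; it is defined on every platform, so the snippet also runs on unix. The argument list is a hypothetical example.

```python
import subprocess

# Hypothetical argument list, similar in spirit to what git-cola passes to Popen.
args = ['git', 'commit', '-m', 'hello world', '']

# On Windows, Popen(args) joins the list into one command-line string like this:
# arguments containing spaces (and empty arguments) get wrapped in double quotes.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

Because the result is a single string destined for a unicode-based Windows API, the joining step naturally works on text, which is why the win32 path wanted unicode arguments.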
Re: [Python-Dev] subprocess.Popen and win32
On Sun, Apr 20, 2014 at 2:42 AM, Antoine Pitrou wrote:
> On Sat, 19 Apr 2014 19:02:42 -0700
> David Aguilar wrote:
>>
>> On python3, this still works for normal platforms, but on windows we
>> can't pass a list of byte strings. We have to pass a list of unicode
>> strings.
>
> Windows native APIs are unicode-based. It is actually necessary to pass
> *unicode* strings, not byte strings, if you want your code to be
> correct in the face of non-ASCII characters.
>
> Under other platforms, when unicode strings are passed, Python will
> encode them using the platform's detected encoding. So, unless your
> platform is somehow misconfigured, passing unicode strings will also
> work correctly there.
>
> (note this is under Python 3)

Curious: what if I don't want the default encoding? On UNIX, I can control what encoding is used because I can encode unicode strings into bytes myself, so the programmer is in full control. I was mainly surprised that this is valid code on unix but not on windows, which seems like a portability concern.

If I use unicode strings, that means I'm beholden to the default encoding. I do agree that utf-8 (python3) is the only sane encoding (for filesystems, etc.), which is why it's just a curious question, and for my use case the default encoding on python3 (utf-8) is good enough.

The projects I work on (including at work, where there is a huge python2 code base) are python2+3 compatible by necessity, so it seems like the best solution would be to check the python version, rather than the platform, before deciding whether to encode or decode inputs before calling into subprocess. That works for me :-)

Thanks for the explanation.

ciao,
--
David
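The version-based approach David settles on could look roughly like the sketch below. The names to_subprocess_args() and start_command() and the utf-8 default are hypothetical choices for illustration, not git-cola's actual code. On Python 3 it hands subprocess text (which Windows needs, and which subprocess encodes itself on unix, per Antoine's reply); on Python 2 it hands subprocess bytes encoded with an explicit codec.

```python
import subprocess
import sys

# Text type on both major versions: unicode on Python 2, str on Python 3.
TEXT_TYPE = type(u'')


def to_subprocess_args(cmd, encoding='utf-8'):
    """Hypothetical helper: normalize an argument list for subprocess.

    Python 3: decode any bytes arguments to text.
    Python 2: encode any unicode arguments to bytes with an explicit codec.
    Arguments that already have the right type are passed through unchanged.
    """
    if sys.version_info[0] >= 3:
        return [a.decode(encoding) if isinstance(a, bytes) else a for a in cmd]
    return [a.encode(encoding) if isinstance(a, TEXT_TYPE) else a for a in cmd]


def start_command(cmd, **kwargs):
    """Hypothetical Popen wrapper that branches on Python version, not platform."""
    return subprocess.Popen(to_subprocess_args(cmd), **kwargs)
```

On Python 3, a mixed list such as [b'git', 'log'] comes back as all text. Code that really needs a non-default encoding on unix can still pass pre-encoded bytes to Popen there; it is only the win32 path that requires text.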