[Python-Dev] subprocess.Popen and win32

2014-04-19 Thread David Aguilar
Hi,

I just joined python-dev because I found the need to add some code to
paper over python3's subprocess API, and I'm wondering whether I'm
missing something.

On python2 and python3, the (only?) way to get utf-8 arguments to
subprocess was to ensure that all unicode strings are encoded into
bytes before subprocess sees them. This has worked for a long time
(currently compatible across python2 and 3).

On python3, this still works for normal platforms, but on windows we
can't pass a list of byte strings. We have to pass a list of unicode
strings.

This means that the application code ends up needing to do this:
https://github.com/git-cola/git-cola/commit/1109aeb4354c49931d9b0435d2b7cfdc2d5d6966

basically,

    import subprocess
    import sys

    def start_command(cmd):
        # encode() and decode() are helpers defined elsewhere (not shown)
        if sys.platform == 'win32':
            # Python on windows always goes through list2cmdline() internally
            # inside of subprocess.py so we must provide unicode strings here
            # otherwise Python3 breaks when bytes are provided.
            cmd = [decode(c) for c in cmd]
        else:
            cmd = [encode(c) for c in cmd]
        return subprocess.Popen(cmd)

That seems broken to me, so I wonder if this is a bug in the way
python3 is handling Popen with list-of-bytestring on win32?

I'm not a windows user, but I was able to install python3 under wine
and the same traceback happens without the paper bag fix. This is what
the traceback looks like; it dies in list2cmdline (which I am not
calling directly, Popen does it under the covers):

  File "E:\Program Files (E)\git-cola\share\git-cola\lib\cola\core.py", line 109, in start_command
    universal_newlines=universal_newlines)
  File "C:\Python32\lib\subprocess.py", line 744, in __init__
    restore_signals, start_new_session)
  File "C:\Python32\lib\subprocess.py", line 936, in _execute_child
    args = list2cmdline(args)
  File "C:\Python32\lib\subprocess.py", line 564, in list2cmdline
    needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: Type str doesn't support the buffer API
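
The failing line can be reproduced without subprocess at all. The following is a minimal sketch, assuming Python 3; the helper names contains_space and fails_on_bytes are made up for illustration and are not part of subprocess.py. It only mirrors the `" " in arg` membership test shown in the traceback. Newer Python 3 releases word the TypeError message differently, but the exception type is the same.

```python
def contains_space(arg):
    # Same kind of test as the list2cmdline line in the traceback:
    # a str needle is searched for inside arg.
    return " " in arg


def fails_on_bytes(arg):
    # Returns True when the membership test raises TypeError,
    # which is what happens on Python 3 when arg is bytes.
    try:
        contains_space(arg)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(fails_on_bytes("git"))    # str argument: the test works
    print(fails_on_bytes(b"git"))   # bytes argument: TypeError is raised
```

So a list of byte strings cannot survive that code path, regardless of what the bytes contain.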

This is an issue for folks that use python to write cross-platform
code. The unix code paths expect list-of-bytes, but win32 only expects
list-of-unicode, which pushes the burden onto the application
programmer.

It's my opinion that the win32 code path on python3 is the odd man
out. If it allowed list-of-bytes like python2/win32 and python2+3/unix
then this wouldn't be an issue.

Is this an actual problem, or is it something that should be handled
by application-level code as I've done?

Thanks,
-- 
David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] subprocess.Popen and win32

2014-04-20 Thread David Aguilar
On Sun, Apr 20, 2014 at 2:42 AM, Antoine Pitrou wrote:
> On Sat, 19 Apr 2014 19:02:42 -0700
> David Aguilar wrote:
>>
>> On python3, this still works for normal platforms, but on windows we
>> can't pass a list of byte strings. We have to pass a list of unicode
>> strings.
>
> Windows native APIs are unicode-based. It is actually necessary to pass
> *unicode* strings, not byte strings, if you want your code to be
> correct in the face of non-ASCII characters.
>
> Under other platforms, when unicode strings are passed, Python will
> encode them using the platform's detected encoding. So, unless your
> platform is somehow misconfigured, passing unicode strings will also
> work correctly there.
>
> (note this is under Python 3)
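
For reference, the encoding step described in the quoted text can be approximated with os.fsencode(), which encodes a str with the filesystem encoding that Python detected at startup. This is only a sketch of the idea, not the code path inside subprocess itself, and encode_like_posix is a hypothetical name used for illustration.

```python
import os
import sys


def encode_like_posix(arg):
    # os.fsencode() encodes str using the detected filesystem encoding
    # and returns bytes unchanged.
    return os.fsencode(arg)


if __name__ == "__main__":
    # Shows which encoding was detected on this machine.
    print(sys.getfilesystemencoding())
    print(encode_like_posix("status"))
```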

Curious... what if I don't want the default encoding? On UNIX, I can
control what encoding is used because I can encode unicode strings
into bytes, so the programmer is in full control. I was mainly
surprised that this is valid code on unix but not on windows, which
seems like a portability concern.

If I use unicode strings that means I'm beholden to the default
encoding. I do agree that utf-8 (python3) is the only sane encoding
(for filesystems, etc) which is why it's just a curious question, and
for my use case, the default encoding on python3 (utf-8) is good
enough.

The projects I work on (including at work, where there is a huge
python2 code base) are python2+3 compatible by necessity, so it seems
like the best solution would be to check the python version, rather
than the platform, before deciding whether or not to encode or decode
inputs before calling into subprocess. That works for me :-)
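
A minimal sketch of what that version check could look like. The names PY3, text_type and prepare_args are hypothetical, and utf-8 is an assumed default for illustration; this is not the code from the git-cola commit linked earlier in the thread.

```python
import sys

PY3 = sys.version_info[0] >= 3
text_type = type(u'')  # str on python3, unicode on python2


def prepare_args(cmd, encoding='utf-8'):
    # python3: hand subprocess text on every platform.
    if PY3:
        return [c.decode(encoding) if isinstance(c, bytes) else c
                for c in cmd]
    # python2: hand subprocess bytes, as before.
    return [c.encode(encoding) if isinstance(c, text_type) else c
            for c in cmd]


if __name__ == "__main__":
    print(prepare_args([b'git', u'status']))
```

The result would then be passed to subprocess.Popen() as usual, with no sys.platform check needed.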

Thanks for the explanation.

ciao,
-- 
David