On Tuesday 19 October 2010 16:12:56, Barry Warsaw wrote:
> Going forward, is there adequate documentation, guidelines, and safeguards
> for future coders so that they Do The Right Thing with new code? Perhaps
> a short How To in the standard documentation would be helpful, with links
> to it from any old/bad API calls?
Hmm, as usual, I suggest decoding all inputs to Unicode as early as possible, and encoding back to bytes (or another native format) at the last moment.

For filenames, this means that PyUnicode_FSDecoder() is better than PyUnicode_FSConverter(), because it gives a Unicode object (instead of a byte string), so the function will support unencodable characters.

Use PyUnicode_EncodeFSDefault() / PyUnicode_DecodeFSDefault() and os.fsencode() / os.fsdecode() to encode/decode filenames instead of your own functions, to support PEP 383 (undecodable bytes <=> surrogate characters). Also be careful to support undecodable bytes (on OSes other than Windows): e.g., try a filename containing a non-ASCII character with the C locale (ASCII locale encoding). Even with a UTF-8 filesystem encoding, this problem may occur on a system that is not correctly configured (e.g., a USB key with a FAT filesystem using the "wrong" encoding).

If you would like to avoid all encoding issues with filenames on UNIX/BSD, use bytes: os.environb, os.listdir(b'.'), os.getcwdb(), etc.

Be careful with the UTF-8 codec: its default mode (strict error handler) refuses to encode surrogate characters. E.g., print(filename) may raise a UnicodeEncodeError. Use repr(filename) to escape surrogate characters.

--

I plan to fix the Python documentation: specify the encoding used to decode all byte string arguments of the C API. I already wrote a draft patch: issue #9738. This lack of documentation was a big problem for me, because I had to follow the function calls to find the encoding.

-- 
Victor Stinner
http://www.haypocalc.com/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
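The filename advice above can be illustrated with a short Python 3 sketch. It is only an illustration under stated assumptions: the sample bytes `b"caf\xe9.txt"` are made up (a Latin-1 "é" that is not valid UTF-8), and the `os.fsencode()` / `os.fsdecode()` round trip is only checked on non-Windows systems, where the filesystem encoding uses the PEP 383 "surrogateescape" error handler.

```python
import os

# Hypothetical raw filename bytes that are NOT valid UTF-8: b"\xe9" is "é"
# in Latin-1, as might appear on a USB key written with another encoding.
raw_name = b"caf\xe9.txt"

# PEP 383: the "surrogateescape" error handler maps each undecodable byte
# 0xXX to the lone surrogate U+DCXX instead of raising UnicodeDecodeError.
text_name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(text_name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
assert text_name.encode("utf-8", "surrogateescape") == raw_name

# The default strict handler refuses lone surrogates, which is why a naive
# encode (or print() to a strict UTF-8 stream) can raise UnicodeEncodeError.
try:
    text_name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 refused:", exc.reason)

# repr() escapes the surrogate, so the name is always safe to display.
print(repr(text_name))

# Assumption: POSIX-only check. os.fsdecode()/os.fsencode() use the
# filesystem encoding with surrogateescape, giving the same round trip.
if os.name != "nt":
    assert os.fsencode(os.fsdecode(raw_name)) == raw_name

# Bytes APIs avoid decoding altogether: passing bytes returns bytes.
cwd_bytes = os.getcwdb()
entries = os.listdir(b".")
print(type(cwd_bytes).__name__, all(isinstance(e, bytes) for e in entries))
```

Decoding at the boundary with the filesystem helpers keeps the rest of the program working on `str`, while the bytes variants (`os.getcwdb()`, `os.listdir(b".")`) are the escape hatch when a program never needs to interpret the names at all.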