Hi,

I patched Python 3.2 to support modules with non-ASCII paths (*). It works well on all operating systems, but the task is not completely done:
(a) Python 3 doesn't support non-ASCII module names
(b) Python 3 doesn't support unencodable characters in the module path

I would like to know if we need to support that. Terry J. Reedy wrote (issue #10828): "I think bugs in core syntax should have high priority. I appreciate your work toward fixing it."

I wrote a patch (issue #3080) fixing both points. If you agree that both issues should be fixed, I will fix them in Python 3.3.

(a) is issue #10828, reported recently (January 2011): "import gui_jämföra" doesn't work with a locale encoding other than UTF-8 (so it doesn't work on Windows).

(b) is specific to Windows: FAT32 and NTFS filesystems store filenames in Unicode, but Python encodes paths to the ANSI code page (which covers only a very small subset of Unicode). If a character cannot be encoded to the code page, you cannot load the module. E.g. add a Japanese character to a directory name on a Windows system using the cp1252 (English) code page. I don't think that (b) has been reported by a user yet; it's more of a theoretical problem.

My patch is huge, but it simplifies the code. We no longer need to regularly convert from/to UTF-8. And for the functions using PyUnicodeObject objects (and not a Py_UNICODE* buffer): PyUnicodeObject stores the string length (it avoids calls to strlen()) and PyUnicode_FromFormat() doesn't need a buffer size (no risk of buffer overflow). I suppose that it makes Python faster, but I didn't measure it.

(*) Python 3.2 doesn't support non-ASCII in the module *name*, only in the path (sys.path).

Victor Stinner
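
PS: to make (a) and (b) concrete, here is a minimal sketch of the encoding step alone. It does not touch the import machinery. The helper name encodable() and the Japanese sample string are only assumptions for illustration; the module name is the one from issue #10828.

```python
# Minimal sketch: can a name be encoded with a given codec?
# encodable() is a hypothetical helper, not part of Python or of the patch.

def encodable(name, encoding):
    """Return True if every character of name can be encoded with encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# "gui_jämföra" (from issue #10828) and a Japanese sample string,
# written with escapes so the source file itself stays ASCII.
names = ["gui_j\xe4mf\xf6ra", "\u65e5\u672c\u8a9e"]

for name in names:
    for encoding in ("ascii", "cp1252", "utf-8"):
        # ascii() keeps the output printable on any console encoding.
        print(ascii(name), encoding, encodable(name, encoding))
```

The first name can be encoded to cp1252 but not to ASCII, and the Japanese string can be encoded to neither of them, only to UTF-8. That second case is the kind of character that is lost when a path is encoded to the ANSI code page.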