On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt <[EMAIL PROTECTED]> wrote:
> Seriously, what would you suggest to someone that
> wants to handle paths in a portable way? Using the Unicode variants of
> functions is fubar, because encoding/decoding is not universally possible.
> Using the byte variant is equally fubar, because e.g. on MS Windows it is not
> supported, except through a very lossy roundtrip through the locale's
> codepage, limiting your functionality.

Write a lightweight abstraction layer that uses Unicode when possible
and bytes otherwise. You'd need to write a few functions for the path
handling code you need, with a platform check or two sprinkled in.

Writing such an abstraction for the purpose of one specific
application is usually simple enough. However, writing a similar
abstraction that serves all apps and all use cases is hard. I hope
that eventually someone will come up with one though -- the failure of
earlier path object proposals notwithstanding.

> I actually think it is about time to give up on trying to think about a path
> as a string. Ditto for data received from os.environ or sys.argv. There are
> only very few things that are universal to them and a reliable encoding is
> none of them. Then, once you have let that idea go, meditate a bit over the
> Zen.

This sounds too pessimistic to me. I expect that in five years it will
be universally accepted that these variables must be encoded in a
standard encoding. People are never going to give up thinking about
filenames etc. as strings, because that's what they are conceptually.
The problem is purely one of encoding, and that's where Unix/Linux are
behind the curve, since (so far) they haven't taken the plunge and
picked a universal standard encoding, the way Windows and Mac OS X
have done.

> What I propose is that paths must be treated as OS-specific, with the only
> common reliable operations being joining them, concatenating them and
> splitting them into segments divided by the (again, OS-specific) separator.
> Other operations, like e.g. appending a string or converting it to a string
> in order to display it can fail. And if they fail, they should fail noisily.

That's bad though, since filenames are being displayed all the time
(e.g. in error messages).

> In 99% of all cases, using the default encoding will work and do what people
> expect, which is why I would make this conversion automatic.
> In all other cases, it will at least not fail silently (which would lead to
> garbage and data loss) and allow more sophisticated applications to handle it.

I think the "always fail noisily" approach isn't the best approach.
E.g. if I am globbing for *.py, and there's an undecodable .txt file
in a directory, its presence shouldn't cause the glob to fail.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
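
For illustration only, here is a minimal sketch of the kind of application-specific layer suggested in the reply above: names are kept as str when they decode cleanly and as bytes otherwise, and a glob over them never fails because of an undecodable entry. It is not code from the thread. The helper names (decode_name, list_names, glob_names) and the UTF-8 default are assumptions made for the example, and it is written against current Python 3 rather than the 2008-era interpreter under discussion.

```python
import fnmatch
import os


def decode_name(raw, encoding="utf-8"):
    """Return str if `raw` decodes cleanly, otherwise the original bytes.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


def list_names(directory, encoding="utf-8"):
    """List a directory as str where possible and bytes otherwise."""
    # os.listdir() returns bytes names when it is given a bytes path.
    raw_names = os.listdir(os.fsencode(directory))
    return [decode_name(raw, encoding) for raw in raw_names]


def glob_names(names, pattern, encoding="utf-8"):
    """Return the names matching `pattern` (a str), in input order.

    Undecodable (bytes) names are matched at the byte level instead of
    raising, so one odd file cannot break a glob for something else.
    """
    byte_pattern = pattern.encode(encoding)
    matches = []
    for name in names:
        if isinstance(name, bytes):
            if fnmatch.fnmatchcase(name, byte_pattern):
                matches.append(name)
        elif fnmatch.fnmatchcase(name, pattern):
            matches.append(name)
    return matches
```

With names such as ["a.py", b"caf\xe9.txt"], glob_names(names, "*.py") simply ignores the undecodable .txt entry. A caller that needs to display a bytes name would still have to choose a lossy rendering, which is the part that is hard to generalize across applications.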