On Wed, Apr 06, 2016 at 10:02:30AM +1000, Chris Angelico wrote:

> My personal view on the text/bytes debate is that a path is
> fundamentally a human concept, and consists therefore of text. The
> fact that some file systems store (at the low level) bytes and some
> store (I think) UTF-16 code units should be immaterial; path
> components exist for people. We can smuggle unrecognized bytes around,
> but ultimately, those bytes came from characters at some point - we
> just don't know the encoding. So a Path object has no relationship
> with bytes, only with str.

That might usually be true in practice, but it is incorrect in
principle. Paths in POSIX systems like Linux are fundamentally
byte-strings with only two restrictions: the bytes \0 (NUL) and \x2f
(the slash) are forbidden inside a path component.

The fact that paths in Linux mostly happen to look like English words
(often heavily abbreviated) is a historical accident. The file system
itself supported paths containing (say) \xff even back in the days when
text was pure US-ASCII and bytes over \x7f had no textual meaning, and
these days paths still support sequences of bytes that have no human
meaning in any encoding.

I don't know if this makes the tiniest lick of difference for pathlib.
I would be perfectly content if we stuck with the design decision that
pathlib can only represent paths representable as Unicode strings, and
left weird POSIX filenames to the legacy byte-string interface.

-- 
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
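
For anyone who wants to poke at the byte-string point above, here is a minimal sketch. The filename and the helper is_valid_posix_component are made up for illustration, and the UTF-8 codec is an assumption standing in for whatever the real filesystem encoding is. It shows a name that POSIX accepts but that is not valid UTF-8, and how the surrogateescape error handler lets such bytes be smuggled through a str and recovered unchanged. It uses PurePosixPath, so nothing touches the disk.

```python
from pathlib import PurePosixPath

# Hypothetical filename: legal on POSIX (no NUL, no slash), but \xff can
# never appear in valid UTF-8, so it has no textual meaning there.
raw_name = b"report-\xff.txt"


def is_valid_posix_component(name: bytes) -> bool:
    """Illustrative check of the two restrictions: no NUL and no slash."""
    return b"\x00" not in name and b"/" not in name


# Strict decoding fails on the \xff byte.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate code point
# (here \xff becomes U+DCFF), so the result is a str that pathlib accepts.
smuggled = raw_name.decode("utf-8", "surrogateescape")
path = PurePosixPath("/tmp") / smuggled

# Encoding with the same handler recovers the original bytes exactly.
round_tripped = path.name.encode("utf-8", "surrogateescape")
```

The resulting str is not meaningful text, which is the point: it round-trips, but only as an opaque carrier for the original bytes.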