On 2010-08-12 09:59:30 +0200, Csaba Raduly wrote:
> On Wed, Aug 11, 2010 at 4:49 PM, Michael Pruemm wrote:
> > Vincent Lefevre wrote:
> (snip)
> >> Under these conditions, the only possibility is
> >> to encode the filenames in UTF-8 anyway. So, why not enforcing
> >> that?
> >>
> >
> > But don't forget that different platforms may use different UTF-8 encodings
> > for the same filename.
>
> Huh? There's only one UTF-8 encoding for each Unicode code point. Are
> you thinking of code pages?

Michael means that there are several ways to represent the "same" string
(from a semantic point of view), as different sequences of code points.
Unicode defines two canonical normalized representations: NFC (composed)
and NFD (decomposed). For example, "é" is the single code point U+00E9
in NFC, but "e" followed by the combining acute accent U+0301 in NFD,
and the two have different UTF-8 byte sequences.

While Linux does not try to normalize filenames (they are just viewed
as a sequence of bytes[*]), Mac OS X (at least with HFS+) requires that
filenames be valid UTF-8 strings (even in non-UTF-8 locales) and
normalizes them to NFD before storing them on disk.

[*] The locale doesn't matter, and top-bit-set bytes are allowed and
can be handled even in ASCII-based locales.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
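
To illustrate the NFC/NFD point, here is a minimal Python sketch using the
standard unicodedata module. The filename is only an example I made up for
the illustration, and the script does not touch any file system; it just
shows that the two normalized forms of the same name encode to different
UTF-8 byte sequences, which is what a byte-oriented file system would see.

```python
import unicodedata

# Hypothetical filename, chosen only for illustration.
name = "Lef\u00e8vre.txt"

# The two canonical normalization forms of the same string.
nfc = unicodedata.normalize("NFC", name)  # e-grave as one code point, U+00E8
nfd = unicodedata.normalize("NFD", name)  # "e" followed by combining grave, U+0300

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)               # b'Lef\xc3\xa8vre.txt'
print(nfd_bytes)               # b'Lefe\xcc\x80vre.txt'
print(nfc_bytes == nfd_bytes)  # False: different bytes for the "same" name

# Normalizing both sides to one form makes them compare equal again.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

A program that compares filenames coming from different platforms would
therefore probably want to normalize both sides to the same form before
comparing them.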