On 2010-08-12 09:59:30 +0200, Csaba Raduly wrote:
> On Wed, Aug 11, 2010 at 4:49 PM, Michael Pruemm  wrote:
> > Vincent Lefevre wrote:
> (snip)
> >> Under these conditions, the only possibility is
> >> to encode the filenames in UTF-8 anyway. So, why not enforcing
> >> that?
> >>
> >
> > But don't forget that different platforms may use different UTF-8 encodings
> > for the same filename.
> 
> Huh? There's only one UTF-8 encoding for each Unicode code point. Are
> you thinking of code pages?

Michael means that there are several ways to represent the "same"
string (from a semantic point of view). Unicode defines two canonical
normalization forms, NFC (precomposed) and NFD (decomposed), which can
yield different UTF-8 byte sequences for the same text. While Linux
does not try to normalize filenames (they are just viewed as a
sequence of bytes[*]), Mac OS X (at least with HFS+) requires that
filenames be valid UTF-8 strings (even in non-UTF-8 locales) and
normalizes them to NFD before storing them on disk.

[*] The locale doesn't matter, and top-bit-set bytes are allowed and
can be handled even in ASCII-based locales.
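
To make the NFC/NFD difference concrete, here is a minimal sketch in
Python using the standard unicodedata module. The filename
"Lefèvre.txt" and the helper name utf8_forms are only illustrative
assumptions, not something taken from a real filesystem; the sketch
shows only the two canonical forms and does not model what HFS+ does
exactly.

```python
import unicodedata

def utf8_forms(name):
    """Return the UTF-8 bytes of `name` in NFC and in NFD (hypothetical helper)."""
    nfc = unicodedata.normalize("NFC", name)  # precomposed: U+00E8
    nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' + U+0300
    return nfc.encode("utf-8"), nfd.encode("utf-8")

# Illustrative filename containing one accented character.
nfc_bytes, nfd_bytes = utf8_forms("Lef\u00e8vre.txt")

print(nfc_bytes)               # b'Lef\xc3\xa8vre.txt'
print(nfd_bytes)               # b'Lefe\xcc\x80vre.txt'
print(nfc_bytes == nfd_bytes)  # False
```

Both byte strings are valid UTF-8 and display identically, but a
filesystem that compares names byte by byte treats them as two
different filenames.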

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
