Marcin 'Qrczak' Kowalczyk wrote:

> >> The various UTF encodings do not have this particular problem; if a UTF
> >> string is valid, then it is a unique representation of a unicode string.
> >> However, decoding is still a partial function and can fail.
> >
> > And while it is partly true, it is qualified by the problems relative to
> > canonicalization (an "é" in Unicode can both be represented as "é" or as
> > two chars (an e and an accent) and they should (ideally) compare equal).
> 
> In what sense "equal"? They are supposed to be equivalent as far
> as the semantics of the text is concerned, but representations are
> clearly different and most programs distinguish them. In particular
> they are different filenames on both Unix and Windows. AFAIK MacOS
> normalizes filenames, but using a slightly different algorithm than
> Unicode (perhaps just an older version).
> 
> IMHO it makes no sense to pretend that they are exactly the same when
> strings consist of code points or lower level units (and I don't
> believe another choice for the default string type would be practical).

Well, at least you and I agree on that.

Once you start down the "semantic equivalence" route, you will quickly
run into issues like "ß" == "ss", and it only gets worse from there
on.
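
To make the distinction concrete, here is a rough sketch. It is in Python
only because its standard library ships the Unicode tables. The helper names
(codepoints_equal, canonically_equal, caseless_equal) are made up for this
illustration. It assumes that comparing NFC forms is an acceptable stand-in
for "canonical equivalence" and that full case folding is one possible
reading of "semantic equivalence"; other choices are just as defensible.

```python
import unicodedata


def codepoints_equal(a: str, b: str) -> bool:
    # Plain string equality: compares the two sequences of code points.
    return a == b


def canonically_equal(a: str, b: str) -> bool:
    # Assumption: canonical equivalence is checked by comparing NFC forms.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equal(a: str, b: str) -> bool:
    # Assumption: full case folding as one of many looser equivalences.
    return a.casefold() == b.casefold()


precomposed = "\u00e9"  # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(codepoints_equal(precomposed, decomposed))   # False
    print(canonically_equal(precomposed, decomposed))  # True
    print(canonically_equal("\u00df", "ss"))           # False
    print(caseless_equal("\u00df", "ss"))              # True
```

The first comparison is what a string type made of code points gives you.
Normalization handles the "é" case but not "ß" versus "ss", which only
compare equal once you have decided that case folding is the equivalence
you want.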

-- 
Glynn Clements <[EMAIL PROTECTED]>
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe