> > (A minor point: I think your definition D10, rather than D76, is closest to
> > what GHC implements as Char, since you can for example evaluate (length
> > "\xD800") with no complaints

Yikes - I thought earlier versions of GHC wouldn't evaluate "\xD800". So you are right - GHC seems to be D10, but yes, I do believe it would be best if Haskell (and GHC) defined Char in terms of D76.

So to summarise, your proposal is to:

> I want to make sure that all agree on the "stance" the code should take:

1. The system infers, as best it can, the encoding used for file paths. This encoding might be wrong, though on modern systems, if it is inferred as a Unicode encoding, it is almost certainly right. Nonetheless, there is no guarantee that file paths are valid encodings.

2. The system presents to user code file paths that were valid encodings as valid Strings, and user code can present such Strings back with perfect round-trip fidelity.

3. The system presents to user code file paths that are not valid encodings as valid Strings, by mapping the invalid encodings onto the private use area U+F700 to U+F7FF. These will of course be indistinguishable from valid file paths that contained such characters (only possible if the encoding is a Unicode encoding), and thus are not round-trippable.

4. If user code presents file paths as Strings that do not encode into the inferred encoding, an exception is thrown. This includes when the inferred encoding cannot encode the private use area*.

*When the inferred encoding is a Unicode encoding (UTF-*), the private use characters will be encoded normally (and thus differently if they were generated due to an original illegally encoded file path).

The crux of the issue is the handling in #4. If we believe our inferred encoding is generally right, and that invalid encodings are rare to nonexistent (and perhaps indicative of bigger problems on the whole) - then the stance as stated above is the way to go.

> Lastly, I'm curious how the proposed code infers the encoding from
> the locale.

> This code already exists in GHC. The behaviour at the moment is platform
> dependent and as follows:

Thanks for those details! It looks good to me.

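
To make points 1-4 above concrete, here is a rough sketch, assuming the inferred encoding is UTF-8. The names escapeByte, decodePath, encodeChar and encodePath are hypothetical and are not GHC's actual code; Either stands in for the exception mentioned in point 4.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: map one undecodable byte into U+F700..U+F7FF (point 3).
escapeByte :: Word8 -> Char
escapeByte b = chr (0xF700 + fromIntegral b)

-- Decode UTF-8 bytes. Valid sequences become ordinary Chars (point 2);
-- any byte that does not start a valid sequence is escaped (point 3).
decodePath :: [Word8] -> String
decodePath [] = []
decodePath (b:bs)
  | b < 0x80               = chr (fromIntegral b) : decodePath bs
  | b >= 0xC2 && b <= 0xDF = multi 1 0x80    (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b <= 0xEF = multi 2 0x800   (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (fromIntegral b .&. 0x07)
  | otherwise              = escapeByte b : decodePath bs
  where
    -- n continuation bytes expected; minCp rejects overlong forms.
    multi n minCp lead =
      let (conts, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead conts
          ok = length conts == n
            && all (\c -> c .&. 0xC0 == 0x80) conts
            && cp >= minCp && cp <= 0x10FFFF
            && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if ok then chr cp : decodePath rest
               else escapeByte b : decodePath bs

-- Encode one scalar value as UTF-8. Private use characters are encoded
-- normally, as in the footnote to point 4.
encodeChar :: Char -> [Word8]
encodeChar c
  | cp < 0x80    = [fromIntegral cp]
  | cp < 0x800   = [0xC0 .|. hi 6, cont 0]
  | cp < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise    = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    cp     = ord c
    hi s   = fromIntegral (cp `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)

-- Point 4: a String that cannot be encoded is rejected. In this sketch
-- only surrogate code points fail, and Left models the exception.
encodePath :: String -> Either String [Word8]
encodePath = fmap concat . mapM enc
  where
    enc c
      | ord c >= 0xD800 && ord c <= 0xDFFF = Left ("cannot encode " ++ show c)
      | otherwise                          = Right (encodeChar c)
```

With these definitions, the bytes 0x61 0xFF 0x62 decode to 'a', U+F7FF, 'b', and encoding that String yields 0x61 0xEF 0x9F 0xBF 0x62 rather than the original bytes, which is the loss of round-tripping described in point 3. A String containing '\xD800' is rejected, which would only matter if Char stays D10 rather than D76.
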
- Mark
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc