On 18/04/2011 21:46, Mark Lentczner wrote:
    (A minor point: I think your definition D10, rather than D76,
    is closest to what GHC implements as Char, since you can, for
    example, evaluate (length "\xD800") with no complaints.)

Yikes - I thought earlier versions of GHC wouldn't evaluate "\xD800".
So you are right - GHC's Char does seem to match D10 - but yes, I do
believe it would be best if Haskell (and GHC) defined Char in terms of
D76.
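
For reference, here is what a quick check in GHCi shows today (the
length result matches your example above; '\1114111' is 0x10FFFF):

    ghci> length "\xD800"
    1
    ghci> maxBound :: Char
    '\1114111'

A lone surrogate such as U+D800 is a code point (D10) but not a scalar
value (D76), so Char as implemented covers the full D10 codespace.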

    So to summarise, your proposal is to:

I want to make sure that we all agree on the "stance" the code should take:

   1. The system infers, as best it can, the encoding used for file
      paths. This encoding might be wrong, though on modern systems, if
      it is inferred as a Unicode encoding, it is almost certainly
      right. Nonetheless, there is no guarantee that file paths are
      valid encodings.
   2. The system presents to user code file paths that were valid
      encodings as valid Strings, and user code can present such Strings
      back with perfect round-trip fidelity.
   3. The system presents to user code file paths that are not valid
      encodings as valid Strings, by mapping the invalid encodings onto
      the private use area U+F700 to U+F7FF. These will of course be
      indistinguishable from valid file paths that contained such
      characters (only possible if the encoding is a Unicode encoding),
      and thus are not round-trippable.
   4. If user code presents file paths as Strings that do not encode
      into the inferred encoding, an exception is thrown. This includes
      the case where the inferred encoding cannot encode the private
      use area. When the inferred encoding is a Unicode encoding
      (UTF-*), the private use characters will be encoded normally (and
      thus differently if they were generated from an originally
      illegally encoded file path). A sketch of the escaping scheme
      follows this list.
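
To make points 3 and 4 concrete, here is a minimal sketch of the
escaping in Haskell. The names (escapeByte, unescapeChar) are made up
for illustration; GHC's real machinery lives in its IO encoding layer
and will differ:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Point 3: a byte that fails to decode under the inferred encoding
    -- is mapped into the private use area U+F700..U+F7FF (256 code
    -- points, one per possible byte value).
    escapeByte :: Word8 -> Char
    escapeByte b = chr (0xF700 + fromIntegral b)

    -- Point 4, non-Unicode case: recover the original byte from an
    -- escape character; Nothing means the character is not an escape
    -- and must be encoded by the inferred encoding itself.
    unescapeChar :: Char -> Maybe Word8
    unescapeChar c
      | o >= 0xF700 && o <= 0xF7FF = Just (fromIntegral (o - 0xF700))
      | otherwise                  = Nothing
      where o = ord c

Under a Unicode encoding, of course, unescapeChar is never applied -
the escape characters just encode normally - and that is exactly what
makes the scheme lossy there.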

So that means filenames that are not legal in the current encoding
won't round-trip? But wasn't that the problem that Max was originally
trying to solve?

Cheers,
        Simon



The crux of the issue is the handling in #4. If we believe our inferred
encoding is generally right, and that invalid encodings are rare to
non-existent (and perhaps indicative of bigger problems on the whole),
then the behaviour stated above is the way to go.
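
Concretely, the exception in #4 would look something like the following
sketch. Again the names are hypothetical - encodeChar stands in for the
inferred encoding's per-character encoder, and a real implementation
would throw a proper IOError rather than call error:

    import Data.Word (Word8)

    -- Encode a whole path, aborting on the first character that the
    -- inferred encoding cannot represent.
    encodePath :: (Char -> Maybe [Word8]) -> String -> [Word8]
    encodePath encodeChar = concatMap encodeOne
      where
        encodeOne c = case encodeChar c of
          Just bytes -> bytes
          Nothing    -> error ("path not encodable: " ++ show c)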

     > Lastly, I'm curious how the proposed code infers the encoding
     > from the locale.
    This code already exists in GHC. The behaviour at the moment is
    platform dependent and as follows: [...]

Thanks for those details! It looks good to me.

- Mark



_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc

