Re: [PATCH] Better encoding/decoding for GHC

Max Bolingbroke Wed, 18 May 2011 15:14:32 -0700

On 18 May 2011 22:54, Mark Lentczner <mark.lentcz...@gmail.com> wrote:
> The range is U+EF80 through U+EFFF, called "Reserved for encoding hacks".


OK, I've applied another patch so we match this. Here's hoping that
has finally put this issue to rest :-)

> On a related note, if we want to be able to round trip file names that
> contain proper UTF-8 encoded characters from this range, we can: Treat the
> byte sequences 0xEE 0xBE 0x80 through 0xEE 0xBF 0xBF as if they were
> encoding errors, and replace such bytes with the encoding hack characters
> for each octet.
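For concreteness, here is a small Haskell sketch of the arithmetic behind this proposal. It is not GHC's code. The names utf8Encode3, escapeByte, isEscapeChar and decodeReserved are hypothetical, and the sketch assumes that the escape for an undecodable byte b in 0x80..0xFF is chr (0xEF00 + b), which fills exactly U+EF80..U+EFFF. It checks that the reserved range encodes to the byte sequences quoted above. It then models the proposal as a decoder step that escapes each of the three bytes instead of producing one private-use character.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- UTF-8 encoding of a code point in the three-byte range U+0800..U+FFFF
-- (surrogates are not special-cased; illustration only).
utf8Encode3 :: Char -> [Word8]
utf8Encode3 c =
  [ fromIntegral (0xE0 .|. (n `shiftR` 12))
  , fromIntegral (0x80 .|. ((n `shiftR` 6) .&. 0x3F))
  , fromIntegral (0x80 .|. (n .&. 0x3F))
  ]
  where
    n = ord c

-- Hypothetical escape mapping: byte 0x80..0xFF -> U+EF80..U+EFFF.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xEF00 + fromIntegral b)

-- Is this character in the "Reserved for encoding hacks" range?
isEscapeChar :: Char -> Bool
isEscapeChar c = c >= '\xEF80' && c <= '\xEFFF'

-- The proposal: treat a well-formed UTF-8 encoding of U+EF80..U+EFFF
-- (0xEE 0xBE 0x80 .. 0xEE 0xBF 0xBF) as if it were an encoding error,
-- escaping each of its three bytes individually. Returns the escapes
-- and the remaining input, or Nothing if the input does not start with
-- such a sequence.
decodeReserved :: [Word8] -> Maybe ([Char], [Word8])
decodeReserved (0xEE : b2 : b3 : rest)
  | b2 `elem` [0xBE, 0xBF], b3 >= 0x80, b3 <= 0xBF =
      Just (map escapeByte [0xEE, b2, b3], rest)
decodeReserved _ = Nothing
```

For example, decodeReserved [0xEE, 0xBE, 0x80] yields the three escape characters U+EFEE, U+EFBE and U+EF80, with no bytes left over.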

This would require us to:
  1. Unconditionally decode these byte sequences using the escape
mechanism, even if using a non-roundtripping encoding. This is because
the chars that result might be fed back into a roundtripping encoding,
where they would otherwise get confused with escapes representing some
other bytes.
  2. Unconditionally treat these particular characters as escapes when
encoding (turning each back into the byte it represents), even if using
a non-roundtripping encoding -- necessary because of 1.
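The confusion that point 1 guards against can be sketched the same way. Again this is a toy model, not GHC's implementation. Here encodeRoundtrip stands in for a roundtripping encoder and decodeUtf8Char3 for an ordinary (non-roundtripping) UTF-8 decoder of a single three-byte sequence. Both names are hypothetical, and the escape offset 0xEF00 is an assumption. A file name containing the bytes 0xEE 0xBE 0x80 decodes normally to U+EF80. Re-encoding that character with the roundtripping encoder produces the single byte 0x80, exactly as if the input had been a stray undecodable 0x80.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Characters in the range reserved for escapes (U+EF80..U+EFFF).
isEscapeChar :: Char -> Bool
isEscapeChar c = c >= '\xEF80' && c <= '\xEFFF'

-- Standard UTF-8 encoding of one code point (surrogates not special-cased).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80 = [fromIntegral n]
  | n < 0x800 = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi k = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- Toy non-roundtripping decoder for one three-byte UTF-8 sequence
-- (no validation): 0xEE 0xBE 0x80 becomes the ordinary character U+EF80.
decodeUtf8Char3 :: Word8 -> Word8 -> Word8 -> Char
decodeUtf8Char3 b1 b2 b3 =
  chr (  (fromIntegral (b1 .&. 0x0F) `shiftL` 12)
     .|. (fromIntegral (b2 .&. 0x3F) `shiftL` 6)
     .|.  fromIntegral (b3 .&. 0x3F))

-- Toy roundtripping encoder: an escape character turns back into the raw
-- byte it stands for (assumed offset 0xEF00); anything else becomes UTF-8.
encodeRoundtrip :: String -> [Word8]
encodeRoundtrip = concatMap enc
  where
    enc c
      | isEscapeChar c = [fromIntegral (ord c - 0xEF00)]
      | otherwise = utf8Encode c
```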

Both of these are a little annoying. Perhaps more seriously, it would
play badly with e.g. reading in UTF-8 and writing out UTF-16, because
your UTF-16 would have bits of UTF-8 representing these private-use
chars embedded within it.
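A similar toy model illustrates the UTF-16 problem. Here encodeUtf16LE is a hypothetical little-endian UTF-16 encoder restricted to the BMP. Its Bool argument says whether escape characters are turned back into raw bytes, which point 2 would require unconditionally. escapeByte again assumes the 0xEF00 offset. Under the proposal, the UTF-8 bytes 0xEE 0xBE 0x80 arrive as three escapes and come out as those same three raw bytes in the middle of the UTF-16 stream. Without the proposal, U+EF80 is written as the ordinary UTF-16LE code unit 0x80 0xEF.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Characters in the range reserved for escapes (U+EF80..U+EFFF).
isEscapeChar :: Char -> Bool
isEscapeChar c = c >= '\xEF80' && c <= '\xEFFF'

-- Assumed escape mapping: byte b (0x80..0xFF) -> U+EF00 + b.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xEF00 + fromIntegral b)

-- Toy UTF-16LE encoder for BMP characters only. When 'unescape' is True,
-- escape characters are written back as the single raw byte they stand for.
encodeUtf16LE :: Bool -> String -> [Word8]
encodeUtf16LE unescape = concatMap enc
  where
    enc c
      | unescape && isEscapeChar c = [fromIntegral (n - 0xEF00)]
      | n < 0x10000 = [fromIntegral (n .&. 0xFF), fromIntegral (n `shiftR` 8)]
      | otherwise = error "encodeUtf16LE: this sketch only handles the BMP"
      where
        n = ord c
```

With the proposal, encoding 'A' followed by the three escapes gives five bytes. The output is then an odd length, so every UTF-16 code unit written after it would also be misaligned.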

I'm not that tempted to go down this particular rabbit hole today :-)

Cheers,
Max

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
