On 18 May 2011 22:54, Mark Lentczner <mark.lentcz...@gmail.com> wrote:
> The range is U+EF80 through U+EFFF, called "Reserved for encoding hacks".

OK, I've applied another patch so we match this. Here's hoping that has
finally put this issue to rest :-)

> On a related note, If we want to be able to round trip file names that
> contain proper UTF-8 encoded characters from this range, we can: Treat the
> byte sequences 0xEE 0xBE 0x80 through 0xEE 0xBF 0xBF as if they were
> encoding errors, and replace such bytes with the encoding hack characters
> for each octet.

This would require us to:

1. Unconditionally decode these byte sequences using the escape mechanism,
   even if using a non-roundtripping encoding. This is because the chars
   that result might be fed back into a roundtripping encoding, where they
   would otherwise get confused with escapes representing some other bytes.

2. Unconditionally decode these particular characters from escapes, even if
   using a non-roundtripping decoding -- necessary because of 1.

Both of these are a little annoying. Perhaps more seriously, it would play
badly with e.g. reading in UTF-8 and writing out UTF-16, because your UTF-16
output would have bits of UTF-8 representing these private-use chars
embedded within it.

I'm not that tempted to go down this particular rabbit hole today :-)

Cheers,
Max
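
For anyone who wants to poke at the trade-off discussed above, here is a
rough Python sketch. It is an illustration only, not the code in the patch.
It assumes that an undecodable byte b (0x80..0xFF) is escaped to the
character U+EF00 + b, which is merely one mapping consistent with the
U+EF80..U+EFFF range; the names decode_roundtrip, encode_roundtrip and the
escape_hack_range flag are made up for this example.

```python
# Assumed mapping (not taken from the patch): byte b <-> U+EF00 + b.
ESCAPE_BASE = 0xEF00


def is_escape(ch):
    """True if ch lies in the U+EF80..U+EFFF 'encoding hacks' range."""
    return 0xEF80 <= ord(ch) <= 0xEFFF


def escape_byte(b):
    """Represent a raw byte (always >= 0x80 here) as an escape character."""
    return chr(ESCAPE_BASE + b)


def decode_roundtrip(data, escape_hack_range=False):
    """Decode UTF-8, escaping undecodable bytes one octet at a time.

    With escape_hack_range=True, well-formed UTF-8 encodings of
    U+EF80..U+EFFF are treated like errors too and escaped per octet
    (the proposal quoted above).
    """
    out = []
    i = 0
    while i < len(data):
        ch, n = None, 1
        # Find the shortest prefix at position i that is one valid character.
        for width in (1, 2, 3, 4):
            try:
                ch = data[i:i + width].decode("utf-8")
                n = width
                break
            except UnicodeDecodeError:
                pass
        if ch is None:
            out.append(escape_byte(data[i]))
            i += 1
        elif escape_hack_range and is_escape(ch):
            out.extend(escape_byte(b) for b in data[i:i + n])
            i += n
        else:
            out.append(ch)
            i += n
    return "".join(out)


def encode_roundtrip(text):
    """Encode to UTF-8, turning escape characters back into raw bytes."""
    out = bytearray()
    for ch in text:
        if is_escape(ch):
            out.append(ord(ch) - ESCAPE_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


raw = b"ab\xffcd"                    # 0xFF is never valid in UTF-8
hack = "\uef80".encode("utf-8")      # the bytes 0xEE 0xBE 0x80

# Invalid bytes survive a decode/encode round trip.
assert encode_roundtrip(decode_roundtrip(raw)) == raw

# Without the extra rule, a genuine U+EF80 is mistaken for an escaped 0x80.
assert encode_roundtrip(decode_roundtrip(hack)) == b"\x80"

# With the extra rule the original bytes come back ...
escaped = decode_roundtrip(hack, escape_hack_range=True)
assert encode_roundtrip(escaped) == hack

# ... but a plain UTF-16 encoder now emits three code units, not one.
assert len(escaped.encode("utf-16-le")) == 6
```

The last assertion is the UTF-16 problem in miniature: the decoded string
holds three private-use characters standing in for the UTF-8 octets, so an
encoder that knows nothing about the escapes writes those out instead of
the single character U+EF80.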