On 15 May 2011 18:08, Mark Lentczner <mark.lentcz...@gmail.com> wrote:
> other hand, Haskell software generally does presume valid Unicode, and the
> broken surrogates will break things, for example the Text package. PUA
> characters will work with all Haskell software.

This is a key point: I wonder whether you have in mind a particular bit of code using the "text" package that would fail if we used lone surrogates as escapes.

I definitely sympathise with your arguments, but there is also a practical argument against using the private-use area. Currently, I rely on the fact that iconv will signal an error if I try to encode a lone surrogate, so that I can spot it and encode it as the appropriate raw byte instead. However, if we escape using the private-use area, iconv will encode the escapes without error, which doesn't give me the chance to replace them with the correct raw bytes.

There are two ways around this that I can see:

1. Use the private-use characters for the Char data type, but ensure that all of GHC's internal buffers used for text encoding (which are basically Char arrays) represent all roundtripping private-use characters as lone surrogates instead. Encoding these will still be an error, so we regain control over iconv. We will need to be careful to turn these lone surrogates back into private-use characters when extracting a String from a Char array, and conversely to map private-use characters to lone surrogates when building a Char array from a String. This option still uses lone surrogates, but users of the String data type will never see them.

2. In the iconv encoder, before calling iconv proper, do a pre-pass over the Char array that replaces all private-use characters with lone surrogates, then replace them with private-use characters again before returning from the encoder.

Option 1 is bad because it imposes a new, complex global invariant, but it will probably have minimal performance impact. Option 2 is more localised but will kill performance.

Need to think about this one...

Max

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
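For illustration, the buffer mapping of option 1 might be sketched roughly as below, using only base. Everything here is hypothetical. It assumes an escape layout that is not GHC's actual one: a raw byte b in 0x80..0xFF is stored in Char buffers as the lone surrogate U+DC00+b (the same idea as Python's PEP 383 "surrogateescape"), and it appears in user-visible Strings as the private-use character U+EF00+b. `toyIconv` is a stand-in for iconv, a hand-written UTF-8 encoder that, like iconv, refuses lone surrogates. It works one Char at a time rather than on whole buffers. The names `escapeByte`, `toBufferChar`, `fromBufferChar` and `encodeString` are likewise invented for this sketch.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- HYPOTHETICAL escape layout, chosen for illustration only:
--   inside Char buffers:     raw byte b (0x80..0xFF) <-> lone surrogate U+DC00 + b
--   in user-visible Strings: raw byte b              <-> private-use char U+EF00 + b
surrogateBase, privateUseBase :: Int
surrogateBase  = 0xDC00
privateUseBase = 0xEF00

-- Is the character one of the 128 escapes based at the given offset?
inEscapeRange :: Int -> Char -> Bool
inEscapeRange base c = n >= base + 0x80 && n <= base + 0xFF
  where n = ord c

-- What a decoder would hand to the user for an undecodable byte under option 1.
escapeByte :: Word8 -> Char
escapeByte w = chr (privateUseBase + fromIntegral w)

-- String -> Char buffer: private-use escapes become lone surrogates.
toBufferChar :: Char -> Char
toBufferChar c
  | inEscapeRange privateUseBase c = chr (ord c - privateUseBase + surrogateBase)
  | otherwise                      = c

-- Char buffer -> String: lone-surrogate escapes become private-use characters.
fromBufferChar :: Char -> Char
fromBufferChar c
  | inEscapeRange surrogateBase c = chr (ord c - surrogateBase + privateUseBase)
  | otherwise                     = c

-- Stand-in for iconv: a UTF-8 encoder that, like iconv, refuses surrogates.
toyIconv :: Char -> Maybe [Word8]
toyIconv c
  | n >= 0xD800 && n <= 0xDFFF = Nothing
  | n < 0x80    = Just [w8 n]
  | n < 0x800   = Just (map w8 [0xC0 .|. shiftR n 6, 0x80 .|. (n .&. 0x3F)])
  | n < 0x10000 = Just (map w8 [ 0xE0 .|. shiftR n 12
                               , 0x80 .|. (shiftR n 6 .&. 0x3F)
                               , 0x80 .|. (n .&. 0x3F) ])
  | otherwise   = Just (map w8 [ 0xF0 .|. shiftR n 18
                               , 0x80 .|. (shiftR n 12 .&. 0x3F)
                               , 0x80 .|. (shiftR n 6 .&. 0x3F)
                               , 0x80 .|. (n .&. 0x3F) ])
  where
    n = ord c
    w8 :: Int -> Word8
    w8 = fromIntegral

-- Encode a String: convert it to "buffer" form, let the toy iconv try each
-- Char, and when it refuses an escape surrogate, emit the raw byte instead.
encodeString :: String -> [Word8]
encodeString = concatMap (encodeOne . toBufferChar)
  where
    encodeOne c = case toyIconv c of
      Just bytes -> bytes
      Nothing
        | inEscapeRange surrogateBase c -> [fromIntegral (ord c - surrogateBase)]
        | otherwise -> error ("unencodable character: " ++ show c)
```

Because the toy encoder rejects the surrogate, `encodeString [escapeByte 0xFF]` recovers the raw byte 0xFF, while an ordinary character such as 'é' is encoded normally. The sketch also shows the costs the message anticipates. First, a genuine private-use character in U+EF80..U+EFFF becomes indistinguishable from an escape. Second, the invariant only holds if every path between Strings and Char buffers goes through `toBufferChar`/`fromBufferChar`.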