On Tue, Apr 12, 2011 at 5:05 AM, Max Bolingbroke <batterseapo...@hotmail.com> wrote:

>  a) When decoding a byte sequence to a String (which in GHC is
> typically a sequence of 16-bit values representing a UTF-16 encoded
> Unicode string), any bytes in the input which are undecodable are
> represented in the String as a unicode codepoint in the range 0x00 to
> 0x7F (for bytes < 128) or 0xDC80 to 0xDD00 (for other bytes). This is
> a so-called surrogate escape mechanism because these codepoints are in
> the UTF-16 reserved region of surrogate codepoints.
>

I should chime in here and note that when converting a String to Text, the
text package maps all Char values in the UTF-16 surrogate range (0xD800 to
0xDFFF) to 0xfffd to preserve safety (Haskell arguably shouldn't be allowing
those as valid Char values in the first place).
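
To make the interaction concrete, here is a minimal sketch using only base.
The names escapeByte, isSurrogate and replaceSurrogates are hypothetical
helpers written for this illustration; they are not part of GHC or of the
text package. The first models the escape scheme quoted above, the last
models the replacement that text performs, under the assumption that every
surrogate code point is treated the same way.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical model of the surrogate escape described in the proposal:
-- bytes below 128 map to the same code point, other undecodable bytes
-- map into the low-surrogate range starting at 0xDC80.
escapeByte :: Word8 -> Char
escapeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- True for any code point in the UTF-16 surrogate range.
isSurrogate :: Char -> Bool
isSurrogate c = c >= '\xD800' && c <= '\xDFFF'

-- Hypothetical model of the replacement text applies: every surrogate
-- becomes 0xfffd, so the escaped byte can no longer be recovered.
replaceSurrogates :: String -> String
replaceSurrogates = map (\c -> if isSurrogate c then '\xFFFD' else c)
```

In this model an escaped byte such as 0xFF does not survive a round trip:
replaceSurrogates [escapeByte 0xFF] is "\xFFFD". With the real package I
would expect T.unpack (T.pack "\xDC80") to give the same result.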
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
