On Tue, Apr 12, 2011 at 5:05 AM, Max Bolingbroke <batterseapo...@hotmail.com> wrote:
> a) When decoding a byte sequence to a String (which in GHC is
> typically a sequence of 16-bit values representing a UTF-16 encoded
> Unicode string), any bytes in the input which are undecodable are
> represented in the String as a unicode codepoint in the range 0x00 to
> 0x7F (for bytes < 128) or 0xDC80 to 0xDD00 (for other bytes). This is
> a so-called surrogate escape mechanism because these codepoints are in
> the UTF-16 reserved region of surrogate codepoints.
>

I should chime in here and note that when encoding a String, the text
package maps all Char values in the UTF-16 reserved region to 0xfffd to
preserve safety (Haskell arguably shouldn't be allowing those as valid
Char values in the first place).
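
To make the interaction concrete, here is a minimal sketch using only base. It is not the text package's source: escapeByte, isSurrogate and safe are hypothetical names for illustration. escapeByte assumes the mapping in the quoted proposal is byte value plus 0xDC00, and safe assumes "the UTF-16 reserved region" means U+D800 to U+DFFF.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Hypothetical helper: the surrogate escape from the quoted proposal.
-- Assumption: an undecodable byte b >= 128 becomes code point 0xDC00 + b;
-- bytes below 128 map to the code point with the same value.
escapeByte :: Word8 -> Char
escapeByte b
  | b < 128   = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- | True for code points in the UTF-16 surrogate range U+D800..U+DFFF.
isSurrogate :: Char -> Bool
isSurrogate c = ord c >= 0xD800 && ord c <= 0xDFFF

-- | Hypothetical helper: replace any surrogate code point with U+FFFD,
-- which is the behaviour described above for the text package.
safe :: Char -> Char
safe c
  | isSurrogate c = '\xFFFD'
  | otherwise     = c
```

Under those assumptions, safe (escapeByte 0xFF) is U+FFFD rather than U+DCFF, so a String carrying escaped bytes would not keep them once its Chars are mapped this way.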
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc