On 2005-01-30, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> Glynn Clements <[EMAIL PROTECTED]> writes:
>
>> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
>> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
>> or they may not (because G1 is assumed to contain JISX0208 initially).
>
> I think such encodings are never used as default encodings of a Unix
> locale.
>
>>> The various UTF encodings do not have this particular problem; if a UTF
>>> string is valid, then it is a unique representation of a unicode string.
>
> BOM is a problem. Unfortunately Unicode mandates that FEFF at the
> start of a UTF-8 text stream is a mark which doesn't belong to the
> text.

Right

> It provides variants of UTF-16/32 with and without a BOM, but
> UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
> encoding.

I think you mean "UTF-8 only has the variant without a BOM". Otherwise
I'd like to see a citation in the standard for this, because that's not
the reading I get from <http://www.unicode.org/faq/utf_bom.html>.
Instead, it seems that whether the BOM is included or not is a function
of the protocol, and that the UTF-8 streams themselves do not include
the BOM.

-- 
Aaron Denney
-><-
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
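
To make the "function of the protocol" reading concrete, here is a minimal sketch in Python. It is only an illustration and not something from the thread: `strip_bom` is a hypothetical helper name, and the `utf-8` and `utf-8-sig` codec names are Python's standard library labels, not terms from the Unicode standard. The assumption is that the layer above the decoder decides whether a leading EF BB BF signature is present, and removes it before the bytes are treated as UTF-8 text.

```python
import codecs

# U+FEFF encoded in UTF-8 is the three bytes EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Hypothetical protocol-layer helper: drop one leading UTF-8
    signature if present, and return the bytes unchanged otherwise."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_with_protocol(data: bytes) -> str:
    """Assumed protocol rule: a leading BOM is a signature, not text."""
    return strip_bom(data).decode("utf-8")


plain = "hi".encode("utf-8")       # no signature
signed = codecs.BOM_UTF8 + plain   # same text with a signature prepended

# Python's plain "utf-8" codec treats the BOM as an ordinary character,
# so the signature only disappears if the protocol layer removes it.
print(repr(signed.decode("utf-8")))
print(repr(decode_with_protocol(signed)))
print(repr(decode_with_protocol(plain)))
```

Python's own `utf-8-sig` codec applies the same rule on decoding, which is one way to see the BOM handling as a separate, optional layer on top of the UTF-8 byte encoding itself.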
