Nicolas De Rico wrote:
> The file hi-utf16.c, created with Notepad and saved in "unicode",
> contains a BOM which is, in essence, a small header at the beginning of
> the file that indicates the encoding.
It's not a header that indicates the encoding. It's a header that indicates the byte order of the 16-bit values that follow when the encoding is already known to be UTF-16. When the encoding is known to be UTF-16LE or UTF-16BE, there shouldn't be any "BOM" present at the start of a C file, since a "BOM" in the correct byte order is actually the Unicode "zero-width non-breaking space" character, which isn't valid as the first character in a C file. Similarly, there shouldn't be a BOM at the start of a UTF-8 C file, especially since UTF-8 encoded files don't have a byte order.

The presence of what looks like a UTF-16 BOM header can be used as part of a heuristic to guess the encoding of a file, but I don't think it's a good idea for GCC to be guessing the encoding of files.

> Of course, stdio.h is stored in UTF-8 on the system so trying to convert
> it from UTF-16 will fail right away.

It would probably be more accurate to describe "stdio.h" as an ASCII file.

Ross Ridge
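To make the BOM heuristic discussed above concrete, here is a minimal C sketch that checks whether a buffer begins with U+FEFF encoded as UTF-8 (EF BB BF), UTF-16LE (FF FE) or UTF-16BE (FE FF). The names `sniff_bom` and `enum bom_kind` are hypothetical illustrations, not anything in GCC. A match only says what the leading bytes look like; it does not establish the file's actual encoding, and a UTF-32LE file starting with FF FE 00 00 would also be reported as UTF-16LE here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a buffer's leading bytes. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Report which encoding of U+FEFF, if any, the buffer starts with.
   This is only a guess based on byte patterns: UTF-32 is not
   considered, so FF FE 00 00 is reported as UTF-16LE. */
enum bom_kind sniff_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;     /* U+FEFF encoded as UTF-8 */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;  /* U+FEFF as a little-endian 16-bit unit */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;  /* U+FEFF as a big-endian 16-bit unit */
    return BOM_NONE;         /* no BOM-like prefix */
}
```

For example, a Notepad "unicode" file beginning with `#include` starts with the bytes FF FE 23 00 and would be reported as `BOM_UTF16LE`, while an ASCII "stdio.h" starts with an ordinary character and yields `BOM_NONE`.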