Nicolas De Rico wrote:
> The file hi-utf16.c, created with Notepad and saved in "unicode",
>contains a BOM which is, in essence, a small header at the beginning of
>the file that indicates the encoding.

It's not a header that indicates the encoding.  It's a header that
indicates the byte order of the 16-bit values that follow when the
encoding is already known to be UTF-16.  When the encoding is known
to be UTF-16LE or UTF-16BE there shouldn't be any "BOM" present at the
start of a C file, since a "BOM" in the correct byte order is actually
the Unicode "zero-width non-breaking space" character (U+FEFF), which
isn't valid as the first character in a C file.  Similarly, there
shouldn't be a BOM at the start of a UTF-8 C file, especially since
UTF-8 encoded files don't have a byte order.

The presence of what looks to be a UTF-16 BOM header can be used as part
of a heuristic to guess the encoding of a file, but I don't think it's a
good idea for GCC to be guessing the encoding of files.
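
For illustration only, here is a rough sketch of the BOM part of such a
heuristic.  It is not anything GCC does, and the names (bom_guess,
guess_from_bom) are made up for this example.  It only looks at the
first few bytes, so it really is just a guess: a file without a BOM
tells it nothing, and FF FE could equally be the start of a UTF-32LE
BOM.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
enum bom_guess { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Guess an encoding from the leading bytes of a buffer.
   U+FEFF is stored as FF FE in little-endian UTF-16, as FE FF in
   big-endian UTF-16, and as EF BB BF in UTF-8. */
enum bom_guess guess_from_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;   /* may also be the start of a UTF-32LE BOM */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;          /* no BOM: the encoding is still unknown */
}
```

A file saved by Notepad as "unicode" would be expected to start with
FF FE and so come back as BOM_UTF16LE, while a plain ASCII header
such as stdio.h would come back as BOM_NONE.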

>Of course, stdio.h is stored in UTF-8 on the system so trying to convert
>it from UTF-16 will fail right away.

It would probably be more accurate to describe "stdio.h" as an ASCII file.
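
To make the distinction concrete: a file is plain ASCII when every byte
is in the range 0x00 to 0x7F, and any such file is also valid UTF-8 with
the same meaning.  A minimal check, with a hypothetical function name,
might look like this:

```c
#include <stddef.h>

/* Return 1 if every byte in the buffer is 7-bit ASCII, 0 otherwise.
   An empty buffer counts as ASCII. */
int is_pure_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return 0;   /* high bit set: not ASCII */
    }
    return 1;
}
```

A UTF-16 file with a BOM fails this check on its very first byte, which
is why trying to read an ASCII header as UTF-16 goes wrong immediately.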

                                        Ross Ridge
