Ross Ridge wrote:
Nicolas De Rico wrote:
The file hi-utf16.c, created with Notepad and saved as "Unicode",
contains a BOM, which is, in essence, a small header at the beginning of
the file that indicates the encoding.
It's not a header that indicates the encoding. It's a header that
indicates the byte order of the 16-bit values that follow when the
encoding is already known to be UTF-16. When the encoding is known
to be UTF-16LE or UTF-16BE, there shouldn't be any "BOM" present at the
start of a C file, since a "BOM" in the correct byte order is actually
the Unicode "zero-width no-break space" character, which isn't valid
as the first character in a C file. Similarly, there shouldn't be a
BOM at the start of a UTF-8 C file, especially since UTF-8-encoded
files don't have a byte order.
The presence of what looks to be a UTF-16 BOM can be used as part
of a heuristic to guess the encoding of a file, but I don't think it's a
good idea for GCC to guess the encoding of files.
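As an aside, that BOM-sniffing heuristic is easy to sketch. The following is a minimal illustration of the idea, not anything GCC actually does; `guess_encoding` is a made-up helper that looks only at a file's first three bytes.

```shell
# Sketch of BOM sniffing: guess_encoding is a hypothetical helper that
# inspects only the first three bytes of the named file.
guess_encoding() {
    first=$(od -An -tx1 -N3 "$1" | tr -d ' ')
    case "$first" in
        efbbbf*) echo "UTF-8 (with BOM)" ;;  # ef bb bf
        fffe*)   echo "UTF-16LE" ;;          # ff fe
        feff*)   echo "UTF-16BE" ;;          # fe ff
        *)       echo "unknown" ;;           # no BOM: ASCII, plain UTF-8, ...
    esac
}

# Example: the two-character file "hi" as Notepad saves it in "Unicode",
# i.e. a UTF-16LE BOM (ff fe) followed by UTF-16LE code units.
printf '\377\376h\000i\000' > hi-utf16.c
guess_encoding hi-utf16.c    # prints: UTF-16LE
```

Note the ambiguity the heuristic has to live with: a file with no BOM could be ASCII, UTF-8, Latin-1, or even BOM-less UTF-16, which is exactly why guessing is risky.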
Of course, stdio.h is stored in UTF-8 on the system, so trying to convert
it from UTF-16 will fail right away.
It would probably be more accurate to describe "stdio.h" as an ASCII file.
It's true that stdio.h is ASCII. I wasn't thinking properly; files
saved in UTF-8 or Latin-1 (for example) should compile correctly if the
proper setting is used.
But how can someone use gcc to compile a file created with Visual C++ and
saved as Unicode?
Microsoft puts a BOM at the start of UTF-16 files. It even does so for
UTF-8 files that are saved with Notepad (this can be confirmed using
'od -x'). This allows their programs to detect the encoding
automatically. Note that vim seems to be able to detect the encoding
using the BOM.