Hi Marc,

> I was hoping that compilers not supporting enough of C11
> would have some other way to translate from the source file encoding
> to UTF-8, which could be exploited by Gnulib.
No, that's not the case. These not-so-new compilers don't perform
character set conversion; you have to provide the numeric value of each
byte yourself (either as escapes, or by enumerating the bytes of the
string one by one).

> > Your best bet is
> > 1) For UTF-8 encoded strings, ensure that your source code is UTF-8
> > encoded, or use escapes, like in gnulib/tests/uniwidth/test-u8-width.c.
>
> Using escapes for non-ASCII characters, it will work whenever the
> execution character set of the compiler is compatible with ASCII,
> right?

The only system where the execution character set is not compatible with
ASCII is z/OS. Daniel Richard G. is our expert regarding this platform.
My understanding is that
  - there are some facilities in the compiler, but we cannot make use of
    them in gnulib,
  - there are some facilities in the run-time library, and Daniel knows
    how to make use of them with gnulib,
  - overall it's case-by-case coding; there's no simple magic wand for it.

> > > for pre-C2x systems would be nice so that ASCII("c") expands into the
> > > ASCII code point of the character `c'.
> >
> > What's the point of this one? Why not just write 'c'?
>
> I was thinking of a system whose execution character set is not
> compatible with ASCII.

You can have a statically allocated translation table from EBCDIC to
ASCII and write a macro that expands to ebcdic_to_ascii['c']. But that
will not be a constant expression. So, e.g., you cannot use it in a
'switch' statement. And you cannot build a getopt option string from it
either. And so on.

Bruno