Re: Unicode string literals

Bruno Haible Thu, 30 Apr 2020 03:06:31 -0700

Hi Marc,

Marc Nieper-Wißkirchen wrote:
> On a system that supports at least C11, I can create a UTF-8 encoded
> literal string through:
> 
> (uint8_t const *) u8"..."
> 
> Could Gnulib abstract this into a macro so that substitutes for
> systems that do not have u8 string literals can be provided?
> 
> On a C11 system, we would have
> 
> #define UTF8(s) ((uint8_t const *) u8 ## s)
> 
> and similar definitions for UTF16 and UTF32.


Unfortunately, we cannot provide such macros. The reason is that the
translation from the source file's encoding to UTF-8/UTF-16/UTF-32 must
be done by the compiler; a macro cannot do it. Without that compiler
support, there is no way to make a declaration like this one work:
  static uint8_t my_string[] = u8"Wißkirchen";
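
To illustrate the point, here is a minimal sketch. UTF8_FALLBACK is a
hypothetical pre-C11 substitute (not something Gnulib provides), and the
two arrays spell out, as an assumption, the bytes a compiler would emit
for "ß" under a UTF-8 and under an ISO-8859-1 execution character set.
The macro can only cast; it cannot re-encode.

```c
#include <stdint.h>

/* Hypothetical pre-C11 fallback: all a macro can do is cast the bytes
   that the compiler already produced.  */
#define UTF8_FALLBACK(s) ((const uint8_t *) (s))

/* Returns 1 if p starts with the UTF-8 encoding of U+00DF, else 0.  */
static int
starts_with_utf8_sharp_s (const uint8_t *p)
{
  return p[0] == 0xC3 && p[1] == 0x9F;
}

/* Bytes of "ß" as emitted for a UTF-8 execution character set ...  */
static const char as_utf8[] = "\xC3\x9F";
/* ... and as emitted for an ISO-8859-1 execution character set.  */
static const char as_latin1[] = "\xDF";
```

Passing as_utf8 through the macro yields valid UTF-8, passing as_latin1
does not, and the macro has no way to tell the two cases apart.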

Your best bet is:
  1) For UTF-8 encoded strings, ensure that your source code is UTF-8
     encoded, or use escapes, as in gnulib/tests/uniwidth/test-u8-width.c.
  2) For UTF-16 encoded strings, which you'll need only on Windows,
     write L"Wißkirchen". Or use hex codes, as in
     gnulib/tests/uniwidth/test-u16-width.c.
  3) Don't use UTF-32 encoded strings. Or use hex codes, as in
     gnulib/tests/uniwidth/test-u32-width.c.
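
The escape and hex-code variants could look roughly like this. It is a
sketch, not copied from the test files named above; the array names and
the UNIT_COUNT helper are made up for illustration, and the ASCII
letters assume an ASCII-compatible execution character set.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8: U+00DF is the two bytes 0xC3 0x9F.  Splitting the literal
   after the escape is a defensive habit: a hex digit directly after
   the escape would otherwise be absorbed into it.  */
static const uint8_t name_u8[] = "Wi\xC3\x9F" "kirchen";

/* UTF-16: one 16-bit code unit per character here, since every
   character lies in the Basic Multilingual Plane.  */
static const uint16_t name_u16[] =
  { 'W', 'i', 0x00DF, 'k', 'i', 'r', 'c', 'h', 'e', 'n', 0 };

/* UTF-32: one 32-bit unit per code point.  */
static const uint32_t name_u32[] =
  { 'W', 'i', 0x000000DF, 'k', 'i', 'r', 'c', 'h', 'e', 'n', 0 };

/* Number of units in an array, including the terminating zero.  */
#define UNIT_COUNT(a) (sizeof (a) / sizeof (a)[0])
```

None of these depend on the source file's encoding or on compiler
support for u8, u, or U literals.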

> Similarly, something like
> 
> #define ASCII(s) (u8 ## s [0])
> 
> for pre-C2x systems would be nice so that ASCII("c") expands into the
> ASCII code point of the character `c'.

What's the point of this one? Why not just write 'c'?

Bruno
