https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49973

--- Comment #14 from joseph at codesourcery dot com ---
On Sun, 15 Sep 2019, lhyatt at gmail dot com wrote:

> I feel like the most portable solution is just to use directly the necessary
> code (from glibc or gnulib or from scratch or wherever) to handle the
> wcwidth() functionality, and tweak it for this purpose. It's in essence
> just a binary

Both of those use tables generated in some way from Unicode data (stored in 
the locale in the glibc case).

A standalone generator implementing UAX#11 rules for character width 
should be straightforward (we'd need to check in the generator sources as 
well as the generated table).
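The shape of such a generated table can be sketched as follows. This is only an illustration, not the generator's actual output: `struct width_range`, `cp_width`, and the three sample ranges are hypothetical, and a real table produced from the Unicode data files would have hundreds of entries. The ranges shown (combining diacritical marks as zero-width, Hangul Jamo leading consonants and CJK Unified Ideographs as wide) do match UAX#11 / wcwidth() conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical excerpt of a generated width table: sorted,
   non-overlapping codepoint ranges with a display width.  */
struct width_range { unsigned int lo, hi; int width; };

static const struct width_range table[] = {
  { 0x0300, 0x036F, 0 },  /* combining diacritical marks: zero width */
  { 0x1100, 0x115F, 2 },  /* Hangul Jamo leading consonants: wide */
  { 0x4E00, 0x9FFF, 2 },  /* CJK Unified Ideographs: wide */
};

/* Binary search; codepoints not in the table default to width 1.  */
static int
cp_width (unsigned int cp)
{
  size_t lo = 0, hi = sizeof table / sizeof table[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (cp < table[mid].lo)
	hi = mid;
      else if (cp > table[mid].hi)
	lo = mid + 1;
      else
	return table[mid].width;
    }
  return 1;
}
```

A column position is then just the running sum of cp_width() over the decoded codepoints of the line.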

> search in a table. Basically I would convert the source line from the input
> charset to UTF-8 the same way the file is read on original input (using the
> facilities in libcpp/charset.c), and then I would just need a variant of

Yes, sources need to be processed consistently (converted from the input 
charset to UTF-8), and then of course converted from UTF-8 to the locale 
character set for the final output on the terminal (with some form of 
graceful degradation if the source character set has characters that can't 
be represented in the locale character set - extended-identifier 
diagnostics use UCNs in that case, but I don't know if that's best in 
general).
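The UCN fallback could look roughly like this. A minimal sketch only: `print_for_locale` is a made-up helper, and it assumes a plain-ASCII locale rather than doing a real iconv()-based representability check; the \uXXXX / \UXXXXXXXX split follows the usual C UCN convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a codepoint for terminal output,
   falling back to a UCN escape when it cannot be represented in
   the (assumed ASCII) locale character set, the way
   extended-identifier diagnostics do.  */
static void
print_for_locale (unsigned int cp, char *out, size_t outsz)
{
  if (cp < 0x80)
    snprintf (out, outsz, "%c", (int) cp);       /* representable */
  else if (cp <= 0xFFFF)
    snprintf (out, outsz, "\\u%04X", cp);        /* BMP: short UCN */
  else
    snprintf (out, outsz, "\\U%08X", cp);        /* beyond BMP */
}
```

A real implementation would instead ask the iconv-style conversion layer whether the character round-trips, and only escape on failure.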

If a source line in the default -finput-charset=utf-8 case contains 
non-UTF-8 bytes in strings or comments, we can't safely display them on 
the terminal, so my inclination in such a case would be to treat such 
bytes as width-1 and output them as '?'.
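That width-1 '?' replacement could be sketched as below. Both functions are hypothetical names; the validator checks only lead/continuation byte structure, so a production version would also reject overlong forms and surrogate-range sequences.

```c
#include <assert.h>
#include <string.h>

/* Return the length of the well-formed UTF-8 sequence starting at p
   (n bytes available), or 0 if the byte does not start one.  */
static size_t
utf8_seq_len (const unsigned char *p, size_t n)
{
  unsigned char c = p[0];
  size_t len;
  if (c < 0x80)
    return 1;
  else if ((c & 0xE0) == 0xC0)
    len = 2;
  else if ((c & 0xF0) == 0xE0)
    len = 3;
  else if ((c & 0xF8) == 0xF0)
    len = 4;
  else
    return 0;		/* stray continuation byte or invalid lead */
  if (len > n)
    return 0;		/* sequence truncated at end of line */
  for (size_t i = 1; i < len; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

/* Replace each byte that is not part of a well-formed UTF-8 sequence
   with '?', so the line is safe to send to the terminal and each
   replaced byte counts as width 1.  */
static void
sanitize_line (char *s)
{
  unsigned char *p = (unsigned char *) s;
  size_t n = strlen (s);
  while (n > 0)
    {
      size_t len = utf8_seq_len (p, n);
      if (len == 0)
	{
	  *p = '?';
	  len = 1;
	}
      p += len;
      n -= len;
    }
}
```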
