https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49973
--- Comment #14 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Sun, 15 Sep 2019, lhyatt at gmail dot com wrote:

> I feel like the most portable solution is just to use directly the
> necessary code (from glibc or gnulib or from scratch or wherever) to
> handle the wcwidth() functionality, and tweak it for this purpose.
> It's in essence just a binary search in a table.

Both of those use data generated in some way from Unicode data (stored in
the locale in the glibc case).  A standalone generator implementing UAX#11
rules for character width should be straightforward (we'd need to check in
the generator sources as well as the generated table).

> Basically I would convert the source line from the input charset to
> UTF-8 the same way the file is read on original input (using the
> facilities in libcpp/charset.c), and then I would just need a variant of

Yes, sources need to be processed consistently (converted from the input
charset to UTF-8), and then of course converted from UTF-8 to the locale
character set for the final output on the terminal (with some form of
graceful degradation if the source character set has characters that can't
be represented in the locale character set - extended identifier
diagnostics use UCNs in that case, but I don't know if that's best in
general).

If a source line in the default -finput-charset=utf-8 case contains
non-UTF-8 bytes in strings or comments, we can't safely display them on
the terminal, so my inclination in such a case would be to treat such
bytes as width-1 and output them as '?'.
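
To make the table-lookup idea concrete, here is a minimal sketch (not GCC
code) of what a generated UAX#11 width table plus binary search might look
like, standing in for wcwidth().  The entries and the name codepoint_width
are illustrative placeholders; a real generator run over the Unicode data
files would emit the full table.

#include <stddef.h>
#include <stdint.h>

struct width_range
{
  uint32_t lo, hi;   /* inclusive codepoint range */
  int width;         /* 0, 1, or 2 display columns */
};

/* Placeholder entries; the checked-in generator would emit the real,
   complete table from EastAsianWidth.txt and friends.  */
static const struct width_range width_table[] = {
  { 0x0300, 0x036F, 0 },  /* combining diacritical marks */
  { 0x1100, 0x115F, 2 },  /* Hangul Jamo leading consonants */
  { 0x4E00, 0x9FFF, 2 },  /* CJK unified ideographs */
};

int
codepoint_width (uint32_t c)
{
  size_t lo = 0, hi = sizeof width_table / sizeof width_table[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < width_table[mid].lo)
        hi = mid;
      else if (c > width_table[mid].hi)
        lo = mid + 1;
      else
        return width_table[mid].width;
    }
  return 1;  /* default: narrow */
}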
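
For the output side (UTF-8 to the locale character set with graceful
degradation), a hedged sketch using POSIX iconv() - which libcpp/charset.c
can also use underneath - might look like this.  The function name
print_in_locale_charset is made up, "//TRANSLIT" is a glibc extension,
and it assumes setlocale (LC_ALL, "") has already run so that
nl_langinfo (CODESET) names the terminal's charset.

#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>

void
print_in_locale_charset (const char *utf8, size_t len)
{
  char to[64];
  snprintf (to, sizeof to, "%s//TRANSLIT", nl_langinfo (CODESET));
  iconv_t cd = iconv_open (to, "UTF-8");
  if (cd == (iconv_t) -1)
    {
      fwrite (utf8, 1, len, stdout);  /* give up; emit raw bytes */
      return;
    }

  char buf[1024];
  char *in = (char *) utf8, *out = buf;
  size_t inleft = len, outleft = sizeof buf;
  while (inleft > 0)
    {
      if (iconv (cd, &in, &inleft, &out, &outleft) != (size_t) -1)
        break;
      if (errno == E2BIG)
        break;  /* output buffer full; a real version would flush and loop */
      /* EILSEQ/EINVAL: unrepresentable or malformed input.  Emit one '?'
         and skip the whole UTF-8 sequence.  */
      unsigned char b = (unsigned char) *in;
      size_t adv = b < 0xC0 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
      if (adv > inleft)
        adv = inleft;
      in += adv;
      inleft -= adv;
      if (outleft > 0)
        {
          *out++ = '?';
          outleft--;
        }
    }
  fwrite (buf, 1, (size_t) (out - buf), stdout);
  iconv_close (cd);
}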
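
And for the last point - malformed bytes in the -finput-charset=utf-8
case - a sketch of treating each bad byte as width 1 and substituting
'?'.  decode_utf8 and sanitize_and_measure are made-up names,
overlong/surrogate rejection is omitted for brevity, and codepoint_width
is the lookup from the first sketch.

#include <stddef.h>
#include <stdint.h>

extern int codepoint_width (uint32_t c);  /* from the first sketch */

/* Decode one UTF-8 sequence from the n bytes at p.  On success store
   the codepoint and return the sequence length; on a malformed or
   truncated sequence return 0.  */
static size_t
decode_utf8 (const unsigned char *p, size_t n, uint32_t *cp)
{
  if (p[0] < 0x80)
    {
      *cp = p[0];
      return 1;
    }
  size_t len = p[0] >= 0xF0 ? 4 : p[0] >= 0xE0 ? 3 : p[0] >= 0xC0 ? 2 : 0;
  if (len == 0 || len > n)
    return 0;
  uint32_t c = p[0] & (0x7F >> len);
  for (size_t i = 1; i < len; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (p[i] & 0x3F);
    }
  *cp = c;
  return len;
}

/* Replace malformed bytes with '?' in place and return the line's
   display width in columns.  */
size_t
sanitize_and_measure (unsigned char *line, size_t n)
{
  size_t width = 0;
  for (size_t i = 0; i < n; )
    {
      uint32_t cp;
      size_t len = decode_utf8 (line + i, n - i, &cp);
      if (len == 0)
        {
          line[i] = '?';  /* bad byte: show as '?', count as width 1 */
          width += 1;
          i += 1;
        }
      else
        {
          width += codepoint_width (cp);
          i += len;
        }
    }
  return width;
}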