https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49973
--- Comment #16 from Lewis Hyatt <lhyatt at gmail dot com> ---
Thank you both for the feedback so far.

Regarding the use of wcwidth(), one thing I noticed is that glibc gives quite different results than gnulib does; for instance, emojis get width 2 from the former rather than 1 (which seems better, from what I can tell). It seems that glibc has undergone a fair amount of tweaking to match what applications expect, so what it provides does not come directly from parsing the Unicode specs, although that's probably the bulk of it. I wonder whether this is a sign that it might be better to just make use of glibc and not add a third implementation to the mix? In any case, the underlying source of wcwidth() could easily be swapped out as a drop-in replacement, so I guess that can also be decided later.

The use of mbrtowc() is the bigger problem, since it converts from the user's locale, whereas it needs to convert from whatever -finput-charset asked for (or else UTF-8).

I have a more or less fully-baked patch at this point that fixes up all the diagnostics I am aware of (changes mostly in diagnostic.c and diagnostic-show-locus.c) to be multi-byte aware. That includes column numbers, carets, annotations, notes, fixit hints, etc. The patch still ignores the input-charset issue and uses mbrtowc(), so that is the last thing for me to add before I think it is worth sharing.

I was wondering if I could get some advice as to where to start here, please? It seems that location_get_source_line() in input.c basically needs to return the lines converted to UTF-8, since all parsing has been working with the lines in this form, and all the byte offsets that were used to populate rich_locations, etc., are relative to the converted data too. I am not sure, though, what the correct way is for location_get_source_line() to know the value of the -finput-charset option. Typically this is inspected from a cpp_reader object, but none is available in the context where this runs, as far as I understand. It seems that in order to make use of the existing conversion machinery in libcpp/charset.c, I need to have a cpp_reader instance available too.

Appreciate any suggestions here. Thanks!

-Lewis
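
To make the byte-offset versus display-column distinction above concrete, here is a minimal sketch in C. It is not the patch and not GCC code: utf8_decode_one(), toy_width() and display_col_for_byte_col() are hypothetical names, and toy_width() is a deliberately crude stand-in for a real wcwidth() table. It assumes the source line has already been converted to UTF-8, and it decodes the bytes directly, so unlike mbrtowc() it does not depend on the user's locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from S (at most LEN bytes) into *CP.
   Returns the number of bytes consumed, or 0 if the input is
   invalid or truncated.  Hypothetical helper, not a GCC function.  */
static size_t
utf8_decode_one (const unsigned char *s, size_t len, uint32_t *cp)
{
  /* Smallest code point that legitimately needs N bytes.  */
  static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
  else
    return 0;

  if (len < n)
    return 0;
  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates and out-of-range values.  */
  if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return n;
}

/* Toy width function: an ASSUMPTION for illustration only.  A real
   implementation would use wcwidth() or a generated Unicode table.  */
static int
toy_width (uint32_t cp)
{
  if (cp >= 0x0300 && cp <= 0x036F)     /* combining diacritical marks */
    return 0;
  if (cp >= 0x1F300 && cp <= 0x1FAFF)   /* rough emoji range */
    return 2;
  return 1;
}

/* Map a 1-based byte column in a UTF-8 LINE of LEN bytes to a 1-based
   display column.  Invalid bytes are counted as one column each.  */
static int
display_col_for_byte_col (const char *line, size_t len, size_t byte_col)
{
  size_t pos = 0, target;
  int col = 1;

  assert (line != NULL);
  target = byte_col > 0 ? byte_col - 1 : 0;
  if (target > len)
    target = len;
  while (pos < target)
    {
      uint32_t cp;
      size_t n = utf8_decode_one ((const unsigned char *) line + pos,
                                  len - pos, &cp);
      if (n == 0)
        {
          n = 1;
          col += 1;
        }
      else
        col += toy_width (cp);
      pos += n;
    }
  return col;
}
```

With these assumptions, an 'x' that follows a 4-byte emoji sits at byte column 5 but display column 3, which is the kind of discrepancy that makes carets and fixit hints misalign when columns are counted in bytes.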