On Tue, Apr 14, 2026 at 5:22 PM Damian McGuckin <[email protected]> wrote:
[...]
> >> I recently was reading C++ code written in UTF-8 and I found it difficult.
> >> Very tiring on the eyes because it was more than reading, it was thinking
> >> about the mathematics behind the code.
> >
> > Not sure what this means either. If your viewer was making you
> > convert individual bytes to UTF-8 characters in your head, your viewer
> > is configured badly. If that's not what you meant, can you clarify?
>
> It was more a general comment. Nothing to do with groff per se. The C++
> code was that of an elementary mathematical function and temporary
> variables related to (say) the square of a variable x were written as x
> with a UTF-8 superscript of 2. I was amazed at how difficult the code was
> to read. Maybe I need to start using my glasses more often.
>
> Thanks anyway - Damian

Damian, it sounds like your issue was not the way the file was encoded
(UTF-8 vs. Latin-1 / ISO 8859-1), but the use of the superscript 2
character (x²). That character is U+00B2, and is representable in both
UTF-8 and Latin-1.

There's something to be said, I suppose, for sticking to the ASCII
subset of both UTF-8 and Latin-1 when practical -- but it's not always
practical. And in fact a valid ASCII file is also a valid UTF-8 file
and a valid Latin-1 file.

As for the larger issue, my personal bias is that UTF-8 should now be
considered THE default representation for text. I still advocate that
any human-readable files in the groff source tree should be UTF-8.
Things like *.tmac files should of course be left alone until and
unless groff itself can handle UTF-8 natively. (Many of the affected
files just have a few accented characters, typically people's names in
comments.)

One example: all the text files in the coreutils source tree are
either ASCII or UTF-8.

Having said that, I'm perfectly willing to drop the whole thing, or
perhaps to revisit it when/if groff acquires native UTF-8 support.

Another thing to consider: When I converted everything to UTF-8 and
built from source, I got a whole bunch of diagnostic messages, but it
still built successfully, and I got another bunch of diagnostic
messages when I ran it (I used the rm.1 man page as input). Perhaps
the errors should have caused the build to fail? Just a thought, which
you can ignore if you like.

-- 
Keith Thompson
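
For anyone who wants to check the encoding claims above, here is a
minimal Python sketch. It is an illustration only, not anything from
the groff tree; the helper name is_valid and the sample C++ line are
made up for this example. It shows that U+00B2 is one byte in Latin-1
and two bytes in UTF-8, that pure ASCII text decodes cleanly under all
three encodings, and that the Latin-1 bytes for "x²" are not valid
UTF-8 (which is why the declared encoding of a file still matters).

```python
# Illustration only: byte-level view of U+00B2 (SUPERSCRIPT TWO)
# in Latin-1 and UTF-8, plus a check that ASCII is a subset of both.

SUPERSCRIPT_TWO = "\u00b2"

# Latin-1 maps U+00B2 directly to the single byte 0xB2.
latin1_bytes = SUPERSCRIPT_TWO.encode("latin-1")

# UTF-8 needs two bytes for the same code point: 0xC2 0xB2.
utf8_bytes = SUPERSCRIPT_TWO.encode("utf-8")


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`.

    Hypothetical helper for this example. Note that Python's latin-1
    codec accepts every possible byte value, so it never fails.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A made-up line of C++ that sticks to the ASCII subset.
ascii_text = b"double x2 = x * x;\n"

# The same idea written with the superscript, stored as Latin-1.
latin1_text = "double x\u00b2 = x * x;\n".encode("latin-1")

if __name__ == "__main__":
    print("Latin-1:", latin1_bytes.hex())
    print("UTF-8:  ", utf8_bytes.hex())
    for enc in ("ascii", "utf-8", "latin-1"):
        print(f"ASCII sample valid as {enc}:", is_valid(ascii_text, enc))
    print("Latin-1 sample valid as utf-8:", is_valid(latin1_text, "utf-8"))
```

The last check fails because a lone 0xB2 byte is a UTF-8 continuation
byte with nothing in front of it, so a UTF-8 decoder rejects it.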
