Re: Latin-1 files in groff

Keith Thompson Tue, 14 Apr 2026 18:08:24 -0700

On Tue, Apr 14, 2026 at 5:22 PM Damian McGuckin <[email protected]> wrote:
[...]
> >> I recently was reading C++ code written in UTF-8 and I found it difficult.
> >> Very tiring on the eyes because it was more than reading, it was thinking
> >> about the mathematics behind the code.
> >
> > Not sure what this means either.  If your viewer was making you
> > convert individual bytes to UTF-8 characters in your head, your viewer
> > is configured badly.  If that's not what you meant, can you clarify?
>
> It was more a general comment. Nothing to do with groff per se. The C++
> code was that of an elementary mathematical function and temporary
> variables related to (say) the square of a variable x were written as x
> with a UTF-8 superscript of 2. I was amazed at how difficult the code was
> to read. Maybe I need to start using my glasses more often.
>
> Thanks anyway - Damian


Damian, it sounds like your issue was not the way the file was
encoded (UTF-8 vs. Latin-1 / ISO 8859-1), but the use of the
superscript 2 character (x²). That character is U+00B2, and is
representable in both UTF-8 and Latin-1.
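
To make that concrete, here is a small Python sketch (the variable
names are mine, purely for illustration) showing the bytes U+00B2
becomes under each encoding. It is one byte in Latin-1 and two
bytes in UTF-8:

```python
# Illustration only: how U+00B2 (SUPERSCRIPT TWO) is encoded.
SUPERSCRIPT_TWO = "\u00b2"

latin1_bytes = SUPERSCRIPT_TWO.encode("latin-1")  # single byte 0xB2
utf8_bytes = SUPERSCRIPT_TWO.encode("utf-8")      # two bytes 0xC2 0xB2

print(latin1_bytes.hex(), utf8_bytes.hex())  # prints "b2 c2b2"
```

Either way a correctly configured viewer shows the same glyph, so
the readability problem is the same in both encodings.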

There's something to be said, I suppose, for sticking to the ASCII
subset of both UTF-8 and Latin-1 when practical -- but it's not
always practical. And in fact a valid ASCII file is also a valid
UTF-8 file and a valid Latin-1 file.
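
A rough Python sketch of that last point, assuming nothing beyond
the standard codecs (the function name is hypothetical): pure ASCII
bytes decode to the same text under all three encodings, while a
Latin-1 accented character does not qualify.

```python
def decodes_identically(data: bytes) -> bool:
    """Return True if data is ASCII and therefore decodes to the same
    text as ASCII, UTF-8, and Latin-1."""
    try:
        as_ascii = data.decode("ascii")
    except UnicodeDecodeError:
        # A byte >= 0x80 is present, so the file is not ASCII.
        return False
    return as_ascii == data.decode("utf-8") == data.decode("latin-1")


print(decodes_identically(b".TH RM 1\n"))  # prints "True"
print(decodes_identically(b"caf\xe9"))     # prints "False" (0xE9 is Latin-1 e-acute)
```

The converse does not hold: the byte 0xE9 alone is valid Latin-1
but is not valid UTF-8.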

As for the larger issue, my personal bias is that UTF-8 should
now be considered THE default representation for text. I still
advocate that any human-readable files in the groff source tree
should be UTF-8. Things like *.tmac files should of course be left
alone until and unless groff itself can handle UTF-8 natively.
(Many of the affected files just have a few accented characters,
typically people's names in comments.)

One example: all the text files in the coreutils source tree
are either ASCII or UTF-8.
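
For anyone who wants to check a tree the same way, here is a
minimal Python sketch. It is not the method I used, and the names
classify and survey are hypothetical. Note that it cannot positively
identify Latin-1, since every byte sequence is valid Latin-1, so
anything that is not ASCII or UTF-8 is just reported as "other"
(which also catches binary files).

```python
from collections import Counter
from pathlib import Path


def classify(data: bytes) -> str:
    """Label a file's contents as 'ascii', 'utf-8', or 'other'."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"


def survey(root: str) -> Counter:
    """Count the labels for every regular file under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[classify(path.read_bytes())] += 1
    return counts
```

Running survey(".") at the top of a source tree gives a quick count
of how many files fall into each category.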

Having said that, I'm perfectly willing to drop the whole thing,
or perhaps to revisit it when/if groff acquires native UTF-8 support.

Another thing to consider: When I converted everything to UTF-8 and
built from source, I got a whole bunch of diagnostic messages,
but the build still succeeded, and I got another bunch
of diagnostic messages when I ran the result (I used the rm.1 man
page as input). Perhaps the errors should have caused the build
to fail? Just a thought, which you can ignore if you like.

-- Keith Thompson
