On Mon, Apr 13, 2026 at 12:32 AM Damian McGuckin <[email protected]> wrote:
> Moving away from Latin-1 will likely cause serious problems for people who
> have lots of old files.

I'm not sure what this means.  This thread is about plain-text files
in the groff tree, which seems to have little bearing on anyone's old
files.

If you're talking about what future groff will accept as input, it
will always be able to handle Latin-1; you just might need to give it
a flag telling it to run "preconv".

> Also, of those files in the distribution which are UTF-8, what percentage
> of the characters in those files need to be UTF-8.

The question is sort of meaningless: if a 1000-line text file has just
1 character that can't be represented in Latin-1, the whole file has
to be in UTF-8.  The encoding applies to the entire file, so
percentages of characters within the file don't have much relevance.

Some UTF-8 files (e.g. contrib/mm/COPYRIGHT, src/roff/nroff/nroff.sh)
do contain only characters that can be encoded in Latin-1.  I'm not
sure what the process was for deciding which files to make Latin-1 and
which UTF-8 if both were options.  It does seem fairly arbitrary
across the source tree, probably reflecting individual contributors'
whims rather than a systematic approach.
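
For anyone who wants to check which files fall into which bucket, here
is a rough Python sketch.  The function name and the four labels are
made up for illustration; nothing like this exists in the groff tree as
far as I know.  It only looks at whether the bytes decode as UTF-8 and
whether the resulting text would also fit in Latin-1.

```python
def classify(data: bytes) -> str:
    """Sort a file's bytes into one of four (hypothetical) buckets."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid as UTF-8; a Latin-1 file with accented letters lands here.
        return "not-utf8"
    if data.isascii():
        # Pure ASCII is byte-for-byte identical in Latin-1 and UTF-8.
        return "ascii"
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character lies outside Latin-1, so the whole
        # file has to be UTF-8, however few such characters there are.
        return "needs-utf8"
    # Valid UTF-8, but every character would also fit in Latin-1.
    return "latin1-representable"
```

Feeding it the contents of a file opened in binary mode (e.g.
open(path, "rb").read()) would say whether that file had a choice of
encoding in the first place.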

> I recently was reading C++ code written in UTF-8 and I found it difficult.
> Very tiring on the eyes because it was more than reading, it was thinking
> about the mathematics behind the code.

Not sure what this means either.  If your viewer was making you
convert individual bytes to UTF-8 characters in your head, your viewer
was configured badly.  If that's not what you meant, can you clarify?
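
To illustrate what I mean by a badly configured viewer, here is a small
Python sketch of UTF-8 bytes being read as Latin-1.  The sample string
is my own invention, not anything from the code you were reading.

```python
# Sample text: "naive" with i-diaeresis (U+00EF), then a less-or-equal sign.
source = "na\u00efve \u2264"

raw = source.encode("utf-8")      # the bytes as stored on disk
correct = raw.decode("utf-8")     # what a properly configured viewer shows
misread = raw.decode("latin-1")   # what a viewer assuming Latin-1 shows

# Latin-1 maps every byte to exactly one character, so each multi-byte
# UTF-8 sequence comes out as several unrelated characters.
print(repr(correct))
print(repr(misread))
```

With the right setting the text comes back unchanged; with the wrong
one the reader sees one junk character per byte and has to reassemble
them mentally.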
