First, let me say that this is not a big deal.  I initially thought
that this would be a simple fix (convert all Latin-1 files to UTF-8
and make sure everything still works).  It turns out to be more
complicated than that.

I understand (now) that groff prefers Latin-1 for its input files and
does not handle UTF-8 well.  (That may be an oversimplification.)
I do not propose to change that.  (I'd like to see it change, but
since I'm unwilling to do that work, I don't have much standing to
advocate it.)

The issue for me is that some plain-text files are more difficult to
read *in my environment* because they use Latin-1 rather than UTF-8
encoding.  Most tools on my system (Ubuntu 24.04) are configured
to use UTF-8 by default.  I've configured other tools to do the same.

I've found that most non-ASCII text files these days are encoded
in UTF-8.  Other legacy encodings are relatively rare.  The groff
source files are somewhat unusual in using Latin-1 rather than UTF-8.

Just one example: Line 1056 of the NEWS file (as of commit 0743fce49
on the master branch) is:

   for the man, ms, me, mm, and mom packages.  Thanks to Eloi Montañés.

When I view the NEWS file with vim, it correctly detects the
encoding and displays the name as intended.  When I view it with
less, it doesn't, and I see:

   for the man, ms, me, mm, and mom packages.  Thanks to Eloi Monta<F1><E9>s.

Setting LESSCHARSET=latin1 doesn't fix this, perhaps because other
tools I'm using (xterm, tmux, etc.) still assume UTF-8.

When I view that line directly, with `sed -n 1056p NEWS`, I see:

   for the man, ms, me, mm, and mom packages.  Thanks to Eloi Monta�s.

(I see a single REPLACEMENT CHARACTER in place of the two accented
letters.)

The nano editor shows two REPLACEMENT CHARACTERs.
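
For anyone who wants to reproduce the bytes involved, here is a
minimal Python sketch.  It is only an illustration: the sample string
is typed in by hand rather than read from the NEWS file, and the
variable names are mine.  It shows that the two accented letters are
the bytes F1 and E9 in Latin-1, and that those bytes are not valid
UTF-8.

```python
# Illustrative only: sample text typed by hand, not read from NEWS.
line = "Thanks to Eloi Montañés."
latin1_bytes = line.encode("latin-1")

# Latin-1 is a single-byte encoding, so string and byte offsets agree.
start = line.index("ñ")
accented = latin1_bytes[start:start + 2]
print(accented.hex())  # the same two bytes that less shows as <F1><E9>

# Strict UTF-8 decoding rejects these bytes; errors="replace"
# substitutes U+FFFD (REPLACEMENT CHARACTER) for each invalid byte.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(as_utf8.count("\ufffd"))
```

Python's decoder replaces each stray byte separately; how many
replacement characters a given terminal or editor displays for the
same bytes evidently varies from tool to tool.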

If there's a consensus that the human-readable plain-text files
should be converted to UTF-8, I volunteer to submit a patch.

If the consensus is that they should be left as they are, I'll
accept that.
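
To make the proposal concrete, here is a rough sketch of how such a
conversion might be scripted.  It is not the patch itself.  The
function names (to_utf8, convert_file) are hypothetical, and it
assumes that any file that is not already valid UTF-8 is Latin-1,
which would need to be checked by eye for each file before committing
anything.

```python
import sys
from pathlib import Path


def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, assuming non-UTF-8 input is Latin-1."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so this cannot fail;
        # whether Latin-1 is the *right* guess is an assumption.
        return data.decode("latin-1").encode("utf-8")
    return data  # already valid UTF-8 (pure ASCII included)


def convert_file(path: Path) -> bool:
    """Rewrite path in place as UTF-8; return True if it changed."""
    original = path.read_bytes()
    converted = to_utf8(original)
    if converted != original:
        path.write_bytes(converted)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: python3 convert.py NEWS ChangeLog.* */README
    for name in sys.argv[1:]:
        if convert_file(Path(name)):
            print(f"converted {name}")
```

Files that are already ASCII or UTF-8 are left untouched, so running
it twice should change nothing the second time.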

-- Keith Thompson

On Sun, Apr 12, 2026 at 5:00 PM Damian McGuckin <[email protected]> wrote:
>
> On Sun, 12 Apr 2026, Keith Thompson wrote:
>
> > Meanwhile, I suggest converting only files that are treated as
> > plain text (NEWS, ChangeLog.*, */README, etc.), just to make things
> > a bit easier for human readers.
>
> Hi Keith,
>
> Why have we got to change plain text files?  I read files with vi, less,
> more and other simple tools.
>
> What UTF-8 symbols do you need that are not covered by groff's (or
> troff's) special characters or other mechanisms.
>
> I do not quite understand your comments about fr.tmac. Those hcode lines
> work for me.
>
> Thanks - Damian
