First, let me say that this is not a big deal. I initially thought that this would be a simple fix (convert all Latin-1 files to UTF-8 and make sure everything still works). It turns out to be more complicated than that.

I understand (now) that groff prefers Latin-1 for its input files and does not handle UTF-8 well. (That may be an oversimplification.) I do not propose to change that. (I'd like to see it change, but since I'm unwilling to do that work, I don't have much standing to advocate it.)

The issue for me is that some plain-text files are more difficult to read *in my environment* because they use Latin-1 rather than UTF-8 encoding. Most tools on my system (Ubuntu 24.04) are configured to use UTF-8 by default. I've configured other tools to do the same. I've found that most non-ASCII text files these days are encoded in UTF-8. Other legacy encodings are relatively rare. The groff source files are somewhat unusual in using Latin-1 rather than UTF-8.

Just one example: Line 1056 of the NEWS file (as of commit 0743fce49 on the master branch) is:

    for the man, ms, me, mm, and mom packages. Thanks to Eloi Montañés.

When I view the NEWS file with vim, it correctly determines the encoding and shows it directly. When I view it with less, it doesn't, and I see:

    for the man, ms, me, mm, and mom packages. Thanks to Eloi Monta<F1><E9>s.

Setting LESSCHARSET=latin1 doesn't fix this, perhaps because other tools I'm using (xterm, tmux, etc.) still assume UTF-8.

When I view that line directly, with `sed -n 1056p NEWS`, I see:

    for the man, ms, me, mm, and mom packages. Thanks to Eloi Monta�s.

(I see a single REPLACEMENT CHARACTER for the two accented letters). The nano editor shows two REPLACEMENT CHARACTERs.

If there's a consensus that the human-readable plain-text files should be converted to UTF-8, I volunteer to submit a patch. If the consensus is that they should be left as they are, I'll accept that.
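
To illustrate the display problem described above, here is a minimal Python sketch. It is not part of groff or of any proposed patch. The byte string is an assumption reconstructed from the <F1><E9> that less shows, and the variable names are made up for the example. It decodes the same bytes as Latin-1 and as UTF-8, then shows what a conversion to UTF-8 would store instead.

```python
# Assumed bytes for "Montañés" as stored in the Latin-1 NEWS file:
# 0xF1 is n-tilde and 0xE9 is e-acute in ISO-8859-1.
raw = b"Monta\xf1\xe9s"

# Read as Latin-1, every byte maps to exactly one character, so the
# name comes back intact (this is what vim ends up displaying).
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, 0xF1 looks like the start of a multi-byte sequence,
# but 0xE9 is not a valid continuation byte. A lenient decoder
# substitutes U+FFFD REPLACEMENT CHARACTER for the invalid bytes;
# how many replacement characters appear varies from tool to tool.
as_utf8 = raw.decode("utf-8", errors="replace")

# Converting the file means decoding as Latin-1 and re-encoding as
# UTF-8, which turns each accented letter into a two-byte sequence.
converted = as_latin1.encode("utf-8")

print(as_latin1)
print(as_utf8)
print(converted)
```

The last step is the same transformation that `iconv -f ISO-8859-1 -t UTF-8` performs on a whole file.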

-- Keith Thompson

On Sun, Apr 12, 2026 at 5:00 PM Damian McGuckin <[email protected]> wrote:
>
> On Sun, 12 Apr 2026, Keith Thompson wrote:
>
> > Meanwhile, I suggest converting only files that are treated as
> > plain text (NEWS, ChangeLog.*, */README, etc.), just to make things
> > a bit easier for human readers.
>
> Hi Keith,
>
> Why have we got to change plain text files? I read files with vi, less,
> more and other simple tools.
>
> What UTF-8 symbols do you need that are not covered by groff's (or
> troff's) special characters or other mechanisms.
>
> I do not quite understand your comments about fr.tmac. Those hcode lines
> work for me.
>
> Thanks - Damian
