On Sun, Apr 12, 2026 at 04:32:55PM -0700, Keith Thompson wrote:
> As an experiment, I tried building groff from source (from the git
> repo) after converting all Latin-1 files to UTF8.
>
> The build appeared to succeed, but there were about 9000 lines of
> diagnostics about invalid input characters.
>
> So obviously a naive approach isn't going to work.
>
> Apparently groff doesn't do well with UTF-8 input. I'd like to
> see that changed, but I don't know nearly enough about groff to
> even start that work, or to speculate about whether it would be a
> good idea.
>
> Meanwhile, I suggest converting only files that are treated as
> plain text (NEWS, ChangeLog.*, */README, etc.), just to make things
> a bit easier for human readers.
>
> Thoughts?
>
[...]
groff by itself deals directly only with ASCII and Latin-1 input.
For all other encodings a preprocessor, preconv, is used.
"groff -K utf8" adds that preprocessor to the pipeline for UTF-8 encoded
input.
For examples in the Makefile see:
# pdfmom command used to generate .pdf
#
# Use '-K utf8', not '-k', in case 'configure' didn't find uchardet.
MOMPDFMOM = \
  GROFF_COMMAND=test-groff \
  GROFF_COMMAND_PREFIX= \
  GROFF_BIN_PATH="$(GROFF_BIN_PATH)" \
  $(PDFMOMBIN) $(FFLAG) $(MFLAG) -M$(mom_srcdir) -K utf8 -p -e -t \
    -wall -b -P-W
-.-
# Use '-K utf8', not '-k', in case 'configure' didn't find uchardet.
# The French translation uses tbl; its English counterpart does not.
doc/meintro_fr.ps: doc/meintro_fr.me preconv tbl
	$(GROFF_V)$(MKDIR_P) `dirname $@` \
	&& $(DOC_GROFF) -K utf8 -t -Tps -me -mfr $< >$@
-.-
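Outside the build system the same thing can be tried by hand.  The
following is only a sketch: the sample file and its contents are made
up for illustration, and it assumes that groff and preconv are
installed and on the PATH (the commands are skipped otherwise).

```shell
# Hypothetical sample input: one line containing a UTF-8 e-acute
# (bytes C3 A9), written to a temporary file.
sample=$(mktemp)
printf 'caf\303\251\n' > "$sample"

if command -v groff >/dev/null 2>&1; then
    # Let groff insert preconv into the pipeline itself.
    groff -K utf8 -Tutf8 "$sample" > /dev/null
fi

if command -v preconv >/dev/null 2>&1 && command -v groff >/dev/null 2>&1; then
    # The same thing with the preprocessor run explicitly:
    # preconv's output is fed to groff on standard input.
    preconv -e utf8 "$sample" | groff -Tutf8 > /dev/null
fi
```

Running "groff -Tutf8" on the same file without "-K utf8" should be a
quick way to see the input warnings for comparison.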
The warning "invalid input character code ... [-w input]" lacks a
hint on how to fix the problem, for example something like
use option "-K <encoding>"