On Sun, Apr 12, 2026 at 04:32:55PM -0700, Keith Thompson wrote:
> As an experiment, I tried building groff from source (from the git
> repo) after converting all Latin-1 files to UTF8.
> 
> The build appeared to succeed, but there were about 9000 lines of
> diagnostics about invalid input characters.
> 
> So obviously a naive approach isn't going to work.
> 
> Apparently groff doesn't do well with UTF-8 input. I'd like to
> see that changed, but I don't know nearly enough about groff to
> even start that work, or to speculate about whether it would be a
> good idea.
> 
> Meanwhile, I suggest converting only files that are treated as
> plain text (NEWS, ChangeLog.*, */README, etc.), just to make things
> a bit easier for human readers.
> 
> Thoughts?
> 
[...]

  Plain groff handles only ASCII and Latin-1 input directly.
Any other encoding goes through a preprocessor, preconv.
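
  A small sketch of why converted files trip this up, assuming I read
the groff manual correctly that codes 0x80-0x9F are among the invalid
input characters in Latin-1 mode. The letter chosen here is only an
illustration, not taken from the groff sources. It dumps the UTF-8
bytes of U+00DC; the second byte is 0x9c, which falls in that range:

```shell
#!/bin/sh
# UTF-8 encoding of U+00DC, written as octal escapes so the script
# itself stays pure ASCII.
bytes=$(printf '\303\234' | od -An -tx1 | tr -d ' \n')

# Print the two bytes as hexadecimal.
printf '%s\n' "$bytes"
```

  Letters whose second UTF-8 byte is 0xA0 or above would not be
flagged, but would be read as two separate Latin-1 characters.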

  "groff -K utf8" adds the preprocessor to the pipeline for UTF-8
encoded input.
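
  As a rough sketch (assuming groff and preconv are installed and on
PATH; the sample text is made up), the two ways of getting UTF-8 input
through look like this. The first runs preconv by hand to show what it
hands to troff, the second lets groff add it via -K:

```shell
#!/bin/sh
# Sketch only: each step is skipped if the tool is not installed.
# One line of UTF-8 text; \303\251 is the UTF-8 encoding of U+00E9.
sample=$(printf 'caf\303\251')

if command -v preconv >/dev/null 2>&1; then
  # preconv rewrites non-ASCII input as groff special character escapes.
  converted=$(printf '%s\n' "$sample" | preconv -e utf-8)
  printf '%s\n' "$converted"
fi

if command -v groff >/dev/null 2>&1; then
  # -K utf8 makes groff run preconv itself before troff sees the input.
  # sed drops the blank lines that pad out the terminal page.
  printf '%s\n' "$sample" | groff -K utf8 -Tutf8 | sed '/^$/d'
fi
```

  Without -K (or -k), the same input would be read as Latin-1.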

  For examples in the Makefile see:

# pdfmom command used to generate .pdf
#
# Use '-K utf8', not '-k', in case 'configure' didn't find uchardet.
MOMPDFMOM = \
  GROFF_COMMAND=test-groff \
  GROFF_COMMAND_PREFIX= \
  GROFF_BIN_PATH="$(GROFF_BIN_PATH)" \
  $(PDFMOMBIN) $(FFLAG) $(MFLAG) -M$(mom_srcdir) -K utf8 -p -e -t \
  -wall -b -P-W
-.-

# Use '-K utf8', not '-k', in case 'configure' didn't find uchardet.
# The French translation uses tbl; its English counterpart does not.
doc/meintro_fr.ps: doc/meintro_fr.me preconv tbl
        $(GROFF_V)$(MKDIR_P) `dirname $@` \
        && $(DOC_GROFF) -K utf8 -t -Tps -me -mfr $< >$@
-.-

  The warning "invalid input character code ... [-w input]" lacks a
hint on how to fix the problem, something like

use option "-K <encoding>"

