Hi Keith,

I want to address only one part of your message, as other folks seem to be handling the remainder just fine.

At 2026-04-12T16:32:55-0700, Keith Thompson wrote:
> Apparently groff doesn't do well with UTF-8 input. I'd like to
> see that changed, but I don't know nearly enough about groff to
> even start that work, or to speculate about whether it would be a
> good idea.

This has been a goal of groff's developers for many years, since well before I joined them.

https://savannah.gnu.org/bugs/?40720

Also see:

https://www.gnu.org/software/groff/groff-mission-statement.html

The reason that this goal hasn't been achieved yet, in my opinion, is that James Clark gambled in 1989 on possible future configurations of character encoding popularity--I suspect to minimize groff programs' memory requirements and avoid critiques of consequent reduced performance--and lost, because Unicode happened.

The presumption that a single datum of the C/C++ `unsigned char` type is adequate to represent any desired character code on input is deeply woven into groff's architecture.

I've been working my way through the code to annotate and in some cases remove barriers to GNU troff's acceptance of UTF-8 input, but it is slow going and there are many frustrations. Here's an exhibit that came up recently.

https://savannah.gnu.org/bugs/?68230

Also, my efforts to date to prepare for a UTF-8 input future have not gone without occasional complaint.

https://lists.gnu.org/archive/html/groff/2026-03/msg00001.html

One point I could add or clarify here is that as the code base moves in the direction of expecting UTF-8/Unicode input, the tractability of inferring character properties by manually maintained sets of numeric tests of character codes diminishes dramatically. When dealing with Unicode input, it's hard to keep one's sanity without relying on library functions that classify characters according to various properties.

As a simplified analogy, handling of Unicode character streams makes writing `iscntrl(c)` rather than `if (c < 32)` a matter of survival rather than elegance.
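
To make the analogy concrete, here is a minimal sketch, not groff code: the function names are hypothetical and invented for illustration, and it assumes the default "C" locale in effect at program startup. It compares the hand-rolled numeric test with the standard library's classifier over the 7-bit codes.

```cpp
#include <cctype>

// Hypothetical hand-maintained numeric test of the kind described above.
bool naive_is_control(unsigned char c)
{
    return c < 32;
}

// Library-based classification. std::iscntrl() takes an int whose value
// must be representable as unsigned char (or be EOF), hence the
// unsigned char parameter type.
bool library_is_control(unsigned char c)
{
    return std::iscntrl(c) != 0;
}

// Count the 7-bit character codes on which the two tests disagree.
int count_disagreements()
{
    int n = 0;
    for (int c = 0; c < 128; ++c) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (naive_is_control(uc) != library_is_control(uc))
            ++n;
    }
    return n;
}
```

Under those assumptions the two tests disagree on one 7-bit code, DEL (0x7F), which the numeric test overlooks. Keeping such range checks correct by hand only gets harder once the input is Unicode.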

That's one reason I felt it necessary to disappoint John Gardner, who values being able to use control characters in the names of his *roff registers, strings, and macros.

Regards,
Branden
