Hi Ingo,
that's interesting. For a UTF-8 target, your observation is correct,
but for PDF output groff does not behave as one would naively expect.
When I write U+010C (whether as a literal character or in escape form
does not matter), my installation produces an "Ä" (A umlaut).
Try
printf '\xc4\x8cSSR' | groff -kT pdf > Ae.pdf
I get the following warning:
troff: <standard input>:1: warning: can't find special character 'u0043_030C'
and the PDF shows "ÄSSR".
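A possible explanation for the "Ä", offered as an untested assumption and not something I verified inside groff: the UTF-8 encoding of U+010C is the byte pair c4 8c, and the byte c4 read as ISO-8859-1 is U+00C4, A umlaut. The name in the warning, u0043_030C, is the decomposed form of the same character (U+0043 "C" followed by U+030C, combining caron). The sketch below uses only printf, iconv and od and does not call groff at all; the variable names are arbitrary.

```shell
# Lead byte of the UTF-8 encoding of U+010C, in octal for portability
# (\304 is the same byte as \xc4 in the command above).
# Assumption: the byte is misread as ISO-8859-1 somewhere in the pipeline.
lead=$(printf '\304' | iconv -f ISO-8859-1 -t UTF-8)
printf '%s\n' "$lead"

# Show the result as hex so it does not depend on the terminal's locale.
lead_hex=$(printf '%s' "$lead" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$lead_hex"
```

The second line printed is c384, the UTF-8 encoding of U+00C4, which matches the "Ä" in the PDF.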
My system: Linux fedora 5.11.15-200.fc33, groff 1.22.4, both default
installations, out of the box.
Anyway, for my purpose .AM solves the problem. Would it be possible to
document it in the man pages of the groff system? I only found it online,
as indicated in my original post.
Best regards,
Oliver.
On 17/05/2021 15:35, Ingo Schwarze wrote:
Hi Oliver,
Oliver Corff wrote on Sat, May 15, 2021 at 11:39:31PM +0200:
I am trying to use the correct abbreviation for the former Czechoslovak
Socialist Republic, which is U+010C SSR (C + hacek, caron, wedge).
The first attempt (entering Unicode 0x010C directly, leaving everything
to preconv(1)) did not work.
Works for me:
$ printf '\xc4\x8cSSR' | mandoc
$ printf '\xc4\x8cSSR' | groff -kT utf8
Both commands above produce the expected output for me (OpenBSD-current
with no fancy configuration changes, just using the default installation).
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
00000010 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |................|
Then I consulted groff_char(7) but there is no
predefined \[vC], only \[vS] etc. for base letters s, S, z and Z. No C!
I keep scratching my head.
Works for me:
$ printf '\\[u010C]SSR' | mandoc
$ printf '\\[u010C]SSR' | groff -T utf8
Both commands above produce the expected output; specifically:
$ printf '\\[u010C]SSR' | groff -T utf8 | hexdump -C
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
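To cross-check a dump like the one above without groff, the leading bytes can be converted to UTF-16BE, which makes the code points directly readable. This is only a sketch using printf, iconv and od; the variable name is arbitrary.

```shell
# First six bytes of the dump (c4 8c 53 53 52 0a), written as octal escapes.
# Converting UTF-8 to UTF-16BE yields one 16-bit unit per character here.
units=$(printf '\304\214SSR\n' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$units"
```

The first unit, 010c, is U+010C; the remaining units are S, S, R and a newline.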
None of the other suggested notations (like \[u0043_030C]; see
groff(7)) work out of the box.
Mandoc doesn't support that syntax, but with groff, even that works for me:
$ printf '\\[u0043_030C]SSR' | mandoc -T lint
mandoc: <stdin>:1:1: WARNING: invalid escape sequence: \[u0043_030C]
$ printf '\\[u0043_030C]SSR' | groff -T utf8 | hexdump -C
00000000 c4 8c 53 53 52 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..SSR...........|
.AM
I don't think any fancy workarounds are needed.
Yours,
Ingo