[self-follow-up with correction]

At 2024-01-19T18:56:37-0600, G. Branden Robinson wrote:
> This might be more accurately stated as:
>
> 2) \X behaves like .device used to (in groff 1.23.0 and earlier).

[correction follows]

And I repeat: this is _NOT_ a _hard_ prerequisite to expressing Unicode
sequences in the output, but it seems useful so that authors of output
drivers (and supporting macro files for them) can keep their sanity.

[elaboration]

What I mean is that we can pass Unicode between "pdf.tmac" and the
output driver _today_.

Consider the following notional macro.

.de pdfmark2
. nop \!x X ps:exec [\\$* pdfmark2
..

(The open bracket has something to do with PostScript syntax, I think.)

...and it gets called by some other macro that encodes the argument...

.de pdflink
. ds pdf*input \\$*\"
. encode pdf*input \" performs magic transformation, like "stringhex"
. pdfmark2 \\*[pdf*input]
..

...and I have a document using these.

.H 1 "This is my heading"
.pdflink "HI DERI 😈"

This ultimately would show up in the output as something like this.

x X ps: exec [4849204445524920F09F9888 pdfmark2

Something pretty close to that works on the deri-gropdf-ng branch
today, as I understand it.

But my _suggestion_ would be that we support something more like this.

x X ps: exec [HI DERI \[u00F0]\[u009F]\[u0098]\[u0088] pdfmark2

or this...

x X ps: exec [HI DERI \[uD83D]\[uDE08] pdfmark2

...or even this...

x X ps: exec [HI DERI \[u1F608] pdfmark2

These are groffish ways of expressing UTF-8, UTF-16, and UTF-32,
respectively. (In the UTF-16 form, the code units appear high
surrogate first; byte order within a code unit is not expressed at this
level.) The reuse of groff Unicode code point escape sequence syntax
is, I would hope, more helpful than confusing.

My concerns are that (1) people shouldn't have to use two different
escaping conventions _within the formatter_ to get byte sequences to the
output driver, and (2) driver-supporting macro file writers shouldn't
have to handle a bunch of special cases in device control commands.

Those factors are what drive my proposal.

Regards,
Branden
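
For readers who want to check the byte values in the examples above,
here is a minimal Python sketch. It is not part of groff, pdf.tmac, or
gropdf. The function names `stringhex` and `groff_escapes` are
hypothetical, chosen only for illustration; `stringhex` stands in for
the notional "encode" request. The UTF-16 form lists code units in the
standard order, high surrogate first.

```python
def stringhex(text: str) -> str:
    """Hex-encode the UTF-8 bytes of text (hypothetical stand-in for 'encode')."""
    return text.encode("utf-8").hex().upper()


def groff_escapes(text: str, encoding: str) -> str:
    """Keep ASCII as-is; write every other character as \\[uXXXX] escapes,
    one escape per code unit of the chosen encoding form."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        elif encoding == "utf-8":
            # One escape per UTF-8 byte.
            out.extend(f"\\[u{b:04X}]" for b in ch.encode("utf-8"))
        elif encoding == "utf-16":
            # One escape per 16-bit code unit, high surrogate first.
            data = ch.encode("utf-16-be")
            units = [int.from_bytes(data[i:i + 2], "big")
                     for i in range(0, len(data), 2)]
            out.extend(f"\\[u{u:04X}]" for u in units)
        elif encoding == "utf-32":
            # One escape per code point.
            out.append(f"\\[u{ord(ch):04X}]")
        else:
            raise ValueError(f"unsupported encoding: {encoding}")
    return "".join(out)


if __name__ == "__main__":
    s = "HI DERI \U0001F608"
    print(stringhex(s))                # 4849204445524920F09F9888
    print(groff_escapes(s, "utf-8"))   # HI DERI \[u00F0]\[u009F]\[u0098]\[u0088]
    print(groff_escapes(s, "utf-16"))  # HI DERI \[uD83D]\[uDE08]
    print(groff_escapes(s, "utf-32"))  # HI DERI \[u1F608]
```

The hex form is opaque to anyone reading the device control command,
whereas the escape forms leave the ASCII portion of the argument
readable.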