At 2022-09-19T18:59:25-0400, Peter Schaffter wrote:
> On Mon, Sep 19, 2022, G. Branden Robinson wrote:
> > I believe this issue came up last month as well.
>
> I missed it.  There've been an awful lot of groff posts in my inbox
> this past year. :)

That means we're thriving, right? ;-)

> Is that all it is, then?  'groff -ks' is invoking preconv before
> soelim?

I believe so.

> My confusion grows.  If soelim, invoked at the start of the
> chain, renders the sourced file correctly,

This is an incorrect assumption, I think.  soelim _doesn't_ result in a
correctly rendered input stream--merely one that preconv remains capable
of massaging into valid GNU troff input.

> it suggests that soelim is performing preconv magic on it.

No magic at all.  soelim is quite simple.

Consider the following example.  (UTF-8 follows, but I also show hex
dumps for those readers who aren't thus enabled.)

$ cat EXPERIMENTS/includer.groff
Minnesota is nice,
.so EXPERIMENTS/include-me.groff
.pl \n[nl]u
$ xxd EXPERIMENTS/include-me.groff
00000000: d0af 3f0a                                ..?.
$ groff -Tutf8 EXPERIMENTS/includer.groff
Minnesota is nice, Я?
$ groff -Tutf8 EXPERIMENTS/includer.groff | xxd
00000000: 4d69 6e6e 6573 6f74 6120 6973 206e 6963  Minnesota is nic
00000010: 652c 20c3 90c2 af3f 0a                   e, ....?.
$ groff -k -Tutf8 EXPERIMENTS/includer.groff
Minnesota is nice, Я?
$ groff -k -Tutf8 EXPERIMENTS/includer.groff | xxd
00000000: 4d69 6e6e 6573 6f74 6120 6973 206e 6963  Minnesota is nic
00000010: 652c 20c3 90c2 af3f 0a                   e, ....?.
$ groff -ks -Tutf8 EXPERIMENTS/includer.groff
Minnesota is nice, Я?
$ groff -ks -Tutf8 EXPERIMENTS/includer.groff | xxd
00000000: 4d69 6e6e 6573 6f74 6120 6973 206e 6963  Minnesota is nic
00000010: 652c 20c3 90c2 af3f 0a                   e, ....?.
$ soelim EXPERIMENTS/includer.groff | groff -k -Tutf8
Minnesota is nice, Я?
$ soelim EXPERIMENTS/includer.groff | groff -Tutf8 -k | xxd
00000000: 4d69 6e6e 6573 6f74 6120 6973 206e 6963  Minnesota is nic
00000010: 652c 20d0 af3f 0a                        e, ..?.

> Why, then, doesn't '-s' do the same, regardless of preprocessing
> order?

I added the following text to our preconv(1) man page in January.

Limitations
     preconv cannot perform any transformation on input that it cannot
     see.  Examples include files that are interpolated by
     preprocessors that run subsequently, including soelim(1); files
     included by troff itself through “so” and similar requests; and
     string definitions passed to troff through its -d command‐line
     option.

> > Ingo suggested deferring consideration of the issue, since we
> > were, ehrm, "close to a release".
>
> Agreed.

Well, we weren't all that close to a release, but I don't think we're
any closer to resolving the question of how users are to express
non-ASCII bytes in file names.

I have a twinge of conscience that we need some way of permitting users
to do this for the sake of both the `so` and `tm` families of requests.

I suspect the existing \[uXXXX] notation is good enough for this
purpose, but we don't have a mechanism for letting the user specify an
_output_ encoding for either the standard error stream or for file
system accesses.

I don't think we can resolve these questions for the groff 1.23 release.

But _if_ we solved it using the mechanism I propose above, we could
indeed, I think, change groff's pipeline ordering so that soelim, if
called for, precedes even preconv.

Regards,
Branden
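
The byte-level difference between the hex dumps above (c3 90 c2 af
versus d0 af) can be reproduced outside groff.  What follows is only a
rough Python sketch, not groff's code.  It assumes that bytes preconv
never sees reach troff as ISO 8859-1 (Latin-1) characters, which the
utf8 output device then encodes one by one, and that preconv turns each
non-ASCII character into a \[uXXXX] escape.  The function names
latin1_passthrough and preconv_like are hypothetical, chosen for
illustration.

```python
# Sketch only: imitates the two paths the included file can take.
# Both helper names are made up for this example; they are not groff APIs.

def latin1_passthrough(raw: bytes) -> bytes:
    """Assumed path when preconv never sees the bytes: each input byte is
    read as one Latin-1 character, then written out encoded as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

def preconv_like(raw: bytes) -> str:
    """Simplified imitation of preconv on UTF-8 input: ASCII passes
    through, anything else becomes a \\[uXXXX] special character escape."""
    out = []
    for ch in raw.decode("utf-8"):
        out.append(ch if ord(ch) < 0x80 else f"\\[u{ord(ch):04X}]")
    return "".join(out)

# Contents of EXPERIMENTS/include-me.groff, taken from the xxd dump above.
included = bytes.fromhex("d0af3f0a")

# Path 1: the file is spliced in after preconv has already run.
print(latin1_passthrough(included).hex(" "))

# Path 2: soelim runs first, so preconv sees the two-byte sequence.
print(preconv_like(included), end="")
```

The first call yields the doubly encoded bytes seen in the dumps for
'groff', 'groff -k', and 'groff -ks'; the second yields an escape that
troff can render as the intended single character, which corresponds
to the 'soelim ... | groff -k' case.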