At 2021-07-08T09:59:16-0500, Dave Kemper wrote:
> Groff troop,

Uh-oh--who has to be Ken Berry?

> I have a question concerning the comparison operator that seems to
> have neither a name nor a symbol (in that it can be invoked using
> nearly any character). It is documented in groff(7) as 's1's2' with
> the defined truth value of "s1 produces the same formatted output as
> s2." (This is a correction to the man page since 1.22.4, one of many
> changes in commit 356bc65d.)
>
> The info manual, in fact, has an example to show that this conditional
> compares formatted output rather than input strings. This behavior is
> in line with historical troff (although even CSTR #54 was unclear
> about this).
>
> But as Bjarni notes in http://savannah.gnu.org/bugs/?60836#comment1,
> there are cases where this operator doesn't behave as documented, and
> I'm trying to figure out whether this is a failure of the
> documentation or of the code.

I've looked at our source code and that of V7 troff and Heirloom, and
I'm going to say it's a documentation issue, though we have some
explaining to do. :-/

> A simplified version of the example he posted there is:
>
> .ie "ABC\h'2n'"ABC\h'4n'\h'-2n'" \{\
> strings are the same
> .\}
> .el \{\
> strings are different
> .br
> ABC\h'2n'D
> .br
> ABC\h'4n'\h'-2n'D
> .\}
>
> The two input strings:
>
> ABC\h'2n'D
> ABC\h'4n'\h'-2n'D
>
> do produce identical output, which can be deduced from looking at the
> strings and knowing how \h escapes work, and can be verified by
> looking at groff's intermediate output via its -Z option: following
> the respective strings' V commands, which each move to a different
> line, the rest of the intermediate output is identical for both cases:
>
> H72000
> tABC
> h10000
> tD
>
> Heirloom troff produces intermediate output different from groff's,
> but Heirloom's output from both strings matches each other, differing
> only in the vertical position (V) line:
>
> H72000
> V24000 [or 36000]
> cA
> h7220cB
> h6670cC
> h16670cD
>
> Yet both implementations display "strings are different," indicating
> that the nameless conditional in question does not think they produce
> identical output.
>
> This strongly looks like a bug, but the consistency across troff
> implementations gives me pause. Is there some subtlety to the way
> this comparison conditional is supposed to work that explains these
> results?

I think Ossanna implemented what was practical and easy without worrying
too much about how the functionality would have to be rationalized if
explained to someone without access to the source code. (And a lot of
time to spend reading it, given its style.)

It could also be that this comparison operator dates back to the
assembly-source versions of nroff.

At 2021-07-08T20:04:49+0000, Bjarni Ingi Gislason wrote:
> Saying "identical output" is a wrong interpretation of the text in
> "CSTR #54", which is "... if the strings compare identically (including
> motions and character size and font)...".
>
> The words "size" and "font" are used in singular case, and the
> word "motions" is plural.
>
> So the strings must compare identically for each motion!

Without doing a line-by-line breakdown of what's going on, here's what I
understand. My grasp of V7 and Heirloom is weaker due to less
familiarity on my part and the nigh-indecipherable function names.
("cmpstr" is obvious enough, but this function calls things like "wbf",
"wbt", "rbf0", and "incoff". Maybe the last increments an offset.)

GNU troff creates two temporary environments and renders the comparands
into them[1]. The GNU implementation has a data structure called a
"node" into which most input tokens are translated. The node lists of
the two comparands are what are actually compared[2].

A groff node is a class with many subclasses, and the subclasses
implement a comparison operator called "same". You can find many
examples in the node.cpp source file. The same_node_list() function
walks the two lists[3].

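To make the node-list explanation above concrete, here is a minimal
Python sketch of why Dave's two comparands can fail the test even though
they print identically. It is not groff's code: the Glyph and HMotion
classes, the net_output() helper, and the fixed GLYPH_WIDTH are
hypothetical stand-ins invented for illustration, and only the name
same_node_list is borrowed from node.cpp. It also assumes that each \h
escape becomes its own motion node, which is my reading of the thread
rather than something checked against the source. The 5000-unit en comes
from the "h10000" that the quoted intermediate output shows for \h'2n'.

```python
from dataclasses import dataclass

EN = 5000           # 1n in basic units, inferred from "h10000" for \h'2n'
GLYPH_WIDTH = 6000  # hypothetical fixed glyph width, for illustration only


@dataclass(frozen=True)
class Glyph:
    """Illustrative stand-in for a glyph node (not groff's class)."""
    char: str
    font: str = "R"
    size: int = 10


@dataclass(frozen=True)
class HMotion:
    """Illustrative stand-in for a horizontal motion node."""
    amount: int


def same_node_list(a, b):
    """Compare two node lists pairwise, node by node."""
    if len(a) != len(b):
        return False
    # Dataclass equality also requires both nodes to be the same class.
    return all(x == y for x, y in zip(a, b))


def net_output(nodes):
    """Collapse motions: return glyph placements and the final position."""
    x = 0
    placed = []
    for node in nodes:
        if isinstance(node, HMotion):
            x += node.amount
        else:
            placed.append((x, node.char, node.font, node.size))
            x += GLYPH_WIDTH
    return placed, x


abc = [Glyph("A"), Glyph("B"), Glyph("C")]
left = abc + [HMotion(2 * EN)]                    # ABC\h'2n'
right = abc + [HMotion(4 * EN), HMotion(-2 * EN)]  # ABC\h'4n'\h'-2n'

print(same_node_list(left, right))             # node-by-node comparison
print(net_output(left) == net_output(right))   # what ends up on the page
```

Under these assumptions the node-by-node test reports a difference
while the collapsed output is the same, which matches Bjarni's reading
of CSTR #54: the strings must compare identically for each motion, not
merely end up in the same place.
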
Heirloom Doctools troff, which remains similar to V7 Unix troff, has a
function caseif()[4] that handles the .if request and calls a function
cmpstr()[5] if it decides it has seen a delimiter. Observe that the
latter function saves and restores only three data: the point size, the
"actual point size" (because what was requested and what is fulfillable
with respect to type size may differ--recall that the C/A/T only had a
handful of discrete type sizes), and the current font/style.

Hmm, this email is demanding more time to write even in summary than I
can give it right now. The bottom line is that CSTR #54 seems to have
been honest, if idiosyncratically loose, about what a "string
comparison" is in its domain, and Bjarni was correct to point it out.

I think it is best if our documentation avoids calling this "string
comparison". As our Texinfo manual already notes[6], if one needs a
true string comparison, one can bracket the comparands with \? escape
sequences.

Even calling it the "output-equivalence operator" is a bit misleading,
sadly. As Dave notes, you can have an arbitrarily long sequence of
ultimately nilpotent motions that result in equivalent _output_. The
behavior of this operator is so specialized that I despair of coming up
with a better name for it.

I dislike the idea of referring to it as a "node list comparison
operator" because that exposes an implementation detail. Anyone have
any recommendations?

Right now I like "output comparison operator", as suggested by the
Subject line, and which lacks the baggage the term "equivalence"
carries.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp#n5793
[2] https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp#n5820
[3] https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp#n5090
[4] https://github.com/n-t-roff/heirloom-doctools/blob/master/troff/n5.c#L1591
[5] https://github.com/n-t-roff/heirloom-doctools/blob/master/troff/n5.c#L1875
[6] §5.20.1, Operators in Conditionals