Hi Doug,

Douglas McIlroy wrote on Mon, Jun 28, 2021 at 06:05:53PM -0400:

>> Not using such a file [a tmac stripper] makes the software less effective;
>> thus such a move ["skip the stripper"] is simply a sabotage.

> I am not at all convinced of the first claim above. Please provide some
> hard evidence for it. (A simple assertion that stripping dramatically
> shortens some tmac files is not evidence for any effect on software
> effectiveness, i.e. correct performance on all inputs, and timely
> performance on realistic inputs.)

I hesitate to spend a lot of time on a rigorous measurement,
but here is one data point:

The OpenBSD ksh(1) manual page is among the largest and most relevant
real-world mdoc(7) manual pages that i'm aware of.  When it is already
in the buffer cache, times for formatting it with commands like

   time groff -mdoc -Tascii ksh.1
   time mandoc -mdoc -Tascii ksh.1

are, on my notebook, about

   0.69 to 0.73 seconds with groff and stripped mdoc macros
   0.75 to 0.77 seconds with groff and unstripped mdoc macros
   about 0.04 seconds with mandoc
   about 0.10 seconds with mandoc
     when the page is not yet in the buffer cache

So, physical reading from disk takes about 1/20 of a second,
mandoc takes about the same time again for formatting,
groff takes about ten times the time of mandoc for formatting
(which surprises me a bit, what i remembered was more like a factor
of three than a factor of ten, but that was years ago, lots of things
may have happened in the meantime).

The difference between stripped and unstripped macros appears to be
measurable, but likely below 10%, which is a tiny effect compared to
performance differences between different implementations.

Either way, i don't think the difference between 0.71s and 0.76s
is particularly relevant for any conceivable application.
For typical interactive use, the difference between a response
time of 0.1s and 0.7s may be noticeable for impatient users, but i
don't consider even that a serious issue.

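As a rough check of the "likely below 10%" estimate, the arithmetic on
the terminal-output timings above can be sketched as follows.  Comparing
the midpoints of the reported ranges, and the helper names midpoint and
relative_overhead, are assumptions made for illustration only; they are
not part of the measurement itself.

```python
# Rough arithmetic on the reported groff -Tascii timings (seconds).
# Assumption: the midpoint of each reported range is a fair summary.

def midpoint(lo, hi):
    """Return the middle of a reported timing range."""
    return (lo + hi) / 2.0

def relative_overhead(stripped, unstripped):
    """Extra time of the unstripped run, as a fraction of the stripped run."""
    return (unstripped - stripped) / stripped

stripped = midpoint(0.69, 0.73)    # groff with stripped mdoc macros
unstripped = midpoint(0.75, 0.77)  # groff with unstripped mdoc macros
overhead = relative_overhead(stripped, unstripped)

print(f"{overhead:.1%}")  # prints 7.0%
```

With these midpoints the overhead comes out at roughly 7%, which is
consistent with the estimate above.
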
Times for PostScript and PDF are about:

   0.79 to 0.82 seconds groff -Tps stripped
   0.83 to 0.85 seconds groff -Tps unstripped
   2.34 to 2.38 seconds groff -Tpdf stripped
   2.37 to 2.57 seconds groff -Tpdf unstripped

So surprisingly, even though real typesetting takes longer than
terminal output, the performance loss is harder to measure in the
typesetting case.  Besides, real typesetting is rarely done
interactively, so even a 10% performance loss would be less relevant
than for terminal output.

I'm not providing performance numbers for mandoc -Tps / -Tpdf because
output quality of mandoc in these modes is so bad that a performance
comparison would not make sense.

The other macro sets in question, me and hdtbl, are significantly
simpler than mdoc, and almost never used interactively, so unless
shown otherwise, i think it is reasonable to assume that they do
not suffer in a practically relevant way either.

> Barring a surprise answer above, I vote a vigorous yes. Stripping, I
> believe, gratuitously impairs readability. If an infelicitous tmac file
> deploys so many comments and indenting spaces within time-significant
> macros as to perceptibly affect performance, the right solution is to
> correct, not embalm, these rare stylistic flaws.
>
> Furthermore, stripping is almost certainly impossible to do right.
> How, for example, do you know that a line in a macro that begins .\" is
> a comment? You have no idea whether . will be the control character
> when the macro is expanded. Yes, it's a cooked-up example that can
> be overcome by an equally cooked-up -u flag in the source repository.
> Occam would not approve of this multiplication of entities.

Yes.  I believe that is a good summary of the main arguments
against stripping.  Also, it did happen in the past that stripping
introduced bugs, so your argument is not a hypothetical one.

Yours,
  Ingo