On Tue, Apr 02, 2024 at 01:29:05PM -0500, G. Branden Robinson wrote:
> Subject: Re: *roff hyphenation trivia challenge
>
> At 2024-04-02T13:42:59-0400, Steve Izma wrote:
> > On Tue, Apr 02, 2024 at 06:51:51PM +0200, Tadziu Hoffmann wrote:
> > > Subject: Re: *roff hyphenation trivia challenge
> > >
> > > For "antidisestablishmen\%tarianism", groff prints
> > >
> > >   antidisestablishmen-
> > >   tar-
> > >   i-
> > >   an-
> > >   ism
> > >
> > > (which I think is strange), while TeX and Heirloom troff print
> > >
> > >   antidisestablishmen-
> > >   tarianism
> > >
> > > which I think is the only reasonable way of handling this case.
> >
> > I disagree.
>
> Oops.  I misread Tadziu's example, and hallucinated a leading `\%` in it.
>
> If there is no _leading_ `\%`, then infixed `\%` escape sequences can
> only add hyphenation points; they cannot remove them.  AIUI.
Hi Branden,

Thanks for the response. But I'm not clear about your comment here. I
get the same results as Tadziu, i.e., the hyphenation points prior to
the \% disappear. And now, testing with

  printf '.ll 1n\n\%antidisestab\%lishmentarianism\n' | nroff -Wbreak | cat -s

I get the same results:

  antidisestab‐
  lish‐
  men‐
  tar‐
  i‐
  an‐
  ism

This seems to mean that the function of a leading \% only works until a
subsequent \% -- but then the behaviour is the same even without a
leading \%.

In decades of using groff I've never noticed this. It's a good thing
you've started this discussion.

> > Also for \% at the beginning of a word, I rarely use this.
>
> I use it frequently in man(7) documents, because the `hw` request is not
> portable/reliable (in theory).  Also there's no mechanism for removing
> these, so if we tolerate/encourage their use, doing so deals a blow to
> reliable/predictable batch rendering.[1]

Good point.

> So let me amend my claim.
>
> I think it's weird that
>
> > > [f]or "antidisestablishmen\%tarianism", groff prints
> > >
> > >   antidisestablishmen-
> > >   tar-
> > >   i-
> > >   an-
> > >   ism
>
> whereas
>
> $ printf '.ll 1n\nantidisestablishment\n' | nroff -Wbreak | cat -s
> an‐
> tidis‐
> es‐
> tab‐
> lish‐
> ment
>
> seems like well-behaved formatting to me.
>
> ...except for the lack of a break point after "ti", of course.
> But I'm comfortable assuming that the discrepancy here is a
> limitation of the TeX hyphenation system aggravated by
> English's polyglot morphology.

Since most of my use of groff for books over the last thirty years has
been non-fiction (mostly scholarly) material, much of the terminology
used doesn't end up in hyphenation lists -- sometimes the words are just
too new or rare. The same applies to the preponderance of proper names
in scholarly material.
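A small shell sketch, in the same printf | nroff style used in this thread, for putting the cases side by side. The helper name `show_breaks` is hypothetical, and the `.hw` break points given for "antidisestablishment" are only an illustration (they add the missing "ti-dis" split). Which points groff actually uses may still depend on the hyphenation mode in effect and on the groff version, so no expected output is shown.

```sh
#!/bin/sh
# show_breaks is a hypothetical helper, not part of groff: it formats its
# argument at a 1n line length so that every break point groff chooses
# ends up at the end of its own output line.
show_breaks() {
  # Passing the word as a %s argument means printf copies any \% in it
  # verbatim instead of treating it as part of the format string.
  printf '.ll 1n\n%s\n' "$1" | nroff -Wbreak | cat -s
}

# The three placements of \% discussed in this thread.
for w in 'antidisestablishmentarianism' \
         'antidisestablishmen\%tarianism' \
         '\%antidisestab\%lishmentarianism'
do
  printf '== %s\n' "$w"
  show_breaks "$w"
done

# The .hw alternative: an exception word whose hyphens mark the break
# points wanted (illustrative list, adding the "ti-dis" split).
printf '== .hw\n'
printf '.ll 1n\n.hw an-ti-dis-es-tab-lish-ment\nantidisestablishment\n' |
  nroff -Wbreak | cat -s
```

Running each variant through the same pipeline makes it easy to see whether a given `\%` only adds break points or also suppresses the automatic ones before it.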
Often most hyphenation points were correct but, especially for long
words, a point that would make all the difference towards getting a
properly spaced line would be missing, as above with "ti-dis". That's
why it's convenient to use \% to add to the hyphenation points that arise
from hyphenation logic as opposed to exception lists. But now that I
think about it, we would often prefer to use .hw in these cases because
that allows you to define only what is desirable. I should really go
back through my various book projects and do some research here.

> Is TeX's hyphenation algorithm defeated by the pathological case of
> "antidisestablishmentarianism", and groff's implementation of it
> "recovers" differently?

I don't remember enough about TeX to answer this. I used TeX and LaTeX
up to about 15 years ago to typeset about 20 books from computer science
conferences, and the oversetting of lines caused by the periodic failure
of the paragraph-justification algorithms drove me nuts. That was not
due to hyphenation problems, but something to do with limits to
word-spacing that I probably didn't understand properly. The many lines
that overset by only a few points made proofreading really difficult.
That's why I'm suspicious of trying to add or replicate these algorithms
in groff.

-- Steve

-- 
Steve Izma - Home: 35 Locust St., Kitchener, Ontario, Canada N2H 1W6
E-mail: si...@golden.net  cellphone: 519-998-2684
==
The most erroneous stories are those we think we know best –
and therefore never scrutinize or question.
   -- Stephen Jay Gould, *Full House: The Spread of Excellence from Plato to Darwin*, 1996