Hi Dave,

At 2024-08-06T12:08:29-0500, Dave Kemper wrote:
> On Tue, Aug 6, 2024 at 9:48 AM G. Branden Robinson
> I'm [...]certain it has to do with when latin1.tmac is loaded and when
> it isn't.
>
> $ echo ".tm Hi, I'm latin1.tmac!" >> tmac/latin1.tmac
> $ groff-latest -a < /dev/null
> $ groff-latest -Tutf8 < /dev/null
> Hi, I'm latin1.tmac!
> $ groff-latest -Tascii < /dev/null
> $
[...]
> You DID reproduce it. Look at the first output line of each of your
> test cases:

Yes, you've got it. I:

1. hyperfocused on the full-caps RÉSUMÉ case because that was the
   failing instance in a regression test recently added to the suite (a
   case contributed by you, as I recall), and
2. forgot that "en.tmac" is going to have to select a character encoding
   even if none of the hyphenation patterns in "hyphen.en" actually use
   characters from the Latin-1 Supplement (and they don't).

You can even/still override the language's choice of character encoding.
Caveat dictator.

$ ./build/test-groff -Tps -a -m latin1 -ww -Wbreak EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>
$ ./build/test-groff -Tps -a -m latin9 -ww -Wbreak EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>

> OK, now I'm certain.
>
> > But as it happens I can't reproduce this misbehavior anyway.
>
> > $ ./build/test-groff -Tutf8 -ww -Wbreak EXPERIMENTS/resume-special.groff
> > troff:EXPERIMENTS/resume-special.groff:2: warning: setting computed line
> > length 0u to device horizontal motion quantum
> > ré‐
> > sumé
>
> vs
>
> > $ ./build/test-groff -Tps -a -ww -Wbreak EXPERIMENTS/resume-special.groff
> > <beginning of page>
> > r<'e>sum<'e>
>
> This is the only line in your test file output before any .hcode
> requests were run, so this shows the default hyphenation for the
> system.

Well, kind of. The hyphenation language (`.hla`) and hyphenation mode
(`.hy`) are the same for these two scenarios. What's happened is that
these requests in "latin1.tmac" didn't get read, because the file wasn't
sourced at all.

.hcode é é
.hcode É é

Therefore these characters did not acquire nonzero hyphenation codes,
and therefore were not valid hyphenation breakpoints.

Does this make sense? If so, what I will do is make "en.tmac" `.mso
latin1.tmac`. And add another regression test case.

Thanks for the report! The subtleties involved in machine-driven
hyphenation seem to be endless.
Someone ought to write a Ph.D. thesis about how hard it is.[1]

Regards,
Branden

[1] Yes, I know they did. I added a citation of it to the groff Texinfo
    manual a while back.