Re: an observation and proposal about hyphenation codes

G. Branden Robinson Tue, 06 Aug 2024 11:34:15 -0700

Hi Dave,

At 2024-08-06T12:08:29-0500, Dave Kemper wrote:
> On Tue, Aug 6, 2024 at 9:48 AM G. Branden Robinson
> I'm [...]certain it has to do with when latin1.tmac is loaded and when
> it isn't.
> 
> $ echo ".tm Hi, I'm latin1.tmac!" >> tmac/latin1.tmac
> $ groff-latest -a < /dev/null
> $ groff-latest -Tutf8 < /dev/null
> Hi, I'm latin1.tmac!
> $ groff-latest -Tascii < /dev/null
> $
[...]
> You DID reproduce it.  Look at the first output line of each of your
> test cases:


Yes, you've got it.  I:

1.  hyperfocused on the full-caps RÉSUMÉ case because that was the
    failing instance in a regression test recently added to the suite (a
    case contributed by you, as I recall), and

2.  forgot that "en.tmac" is going to have to select a character
    encoding even if none of the hyphenation patterns in "hyphen.en"
    actually use characters from the Latin-1 Supplement (and they
    don't).

You can even/still override the language's choice of character encoding.
Caveat dictator.

$ ./build/test-groff -Tps -a -m latin1 -ww -Wbreak EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>
$ ./build/test-groff -Tps -a -m latin9 -ww -Wbreak EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>

> OK, now I'm certain.
> 
> > But as it happens I can't reproduce this misbehavior anyway.
> 
> > $ ./build/test-groff -Tutf8 -ww -Wbreak EXPERIMENTS/resume-special.groff
> > troff:EXPERIMENTS/resume-special.groff:2: warning: setting computed line length 0u to device horizontal motion quantum
> > ré‐
> > sumé
> 
> vs
> 
> > $ ./build/test-groff -Tps -a -ww -Wbreak EXPERIMENTS/resume-special.groff
> > <beginning of page>
> > r<'e>sum<'e>
> 
> This is the only line in your test file output before any .hcode
> requests were run, so this shows the default hyphenation for the
> system.

Well, kind of.  The hyphenation language (`.hla`) and hyphenation mode
(`.hy`) are the same for these two scenarios.  What's happened is that
these requests in "latin1.tmac" didn't get read, because the file wasn't
sourced at all.

.hcode é é
.hcode É é

As a result, these characters did not acquire nonzero hyphenation codes,
and so were not valid hyphenation breakpoints.

Does this make sense?

If so, what I will do is make "en.tmac" `.mso latin1.tmac`.

And add another regression test case.
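
A minimal sketch of what that change could look like, assuming the
request is simply added to the language file (its exact placement within
"en.tmac" and the comment wording are illustrative assumptions, not the
committed patch):

```roff
.\" Hypothetical addition to en.tmac; placement is an assumption.
.\" Source latin1.tmac so that its .hcode requests are read and the
.\" accented Latin-1 letters acquire nonzero hyphenation codes
.\" whenever the English localization file is loaded.
.mso latin1.tmac
```

The intent is that the `-Tps -a` case above would then behave the same
with or without an explicit `-m latin1` on the command line.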

Thanks for the report!

The subtleties involved in machine-driven hyphenation seem to be
endless.  Someone ought to write a Ph.D. thesis about how hard it is.[1]

Regards,
Branden

[1] Yes, I know they did.  I added a citation of it to the groff Texinfo
    manual a while back.
