Re: Do Latin-2-based hyphenation files work with Unicode?

onf Wed, 13 Nov 2024 13:32:45 -0800

Hi Branden,

On Wed Nov 13, 2024 at 7:25 PM CET, G. Branden Robinson wrote:
> [...]
> > i.e. translation should happen on output, not on input,
>
> I'm not sure I agree with that, given the above.  When I see `tr` used,
> it is typically to make input more convenient.


I never said it's not used like that. I just meant to say that groff(7)
suggests the translation happens at the moment the character is
formatted for output rather than at the moment it is read in:
  .tr abcd...
      Translate ordinary or special characters a to b, c to d, and
      so on PRIOR TO OUTPUT. [emphasis added]

which is why I wondered about the things you quote below.

> [...]
> > meaning that using .hla might not be sufficient to switch between cs
> > and fr, because that doesn't switch the encoding used.
>
> I'll have to think about this.  It might not matter in the
> wide-character-type/UTF-8-reading GNU troff future.
>
> While I don't have an ETA for that, I don't want to complicate the
> formatter itself with any features to make eight-bit encodings more
> convenient to use.  That feels like throwing good money after bad.
> UTF-8 is the future.  Heck, it's the present, most places.

I think if anything, this thread demonstrates the complexity that
arises from using multiple character encodings. I was just trying
to make it work that way because that's what we have now, but it
would obviously be much better if one could use UTF-8 directly in
the hyphenation files (or at least the \[u...] characters) without
having to jump through all these hoops.

> [...]
> > groff(7) does mention it, but it's among the last things mentioned in
> > the Hyphenation section. The texinfo manual doesn't mention it at all
> > in its section 5.1.3 about Hyphenation where I would expect it.  (At
> > least the online version -- I haven't found any git source for it,
> > just tarballs.)
>
> You can review up-to-date documentation here:
>
> https://www.dropbox.com/sh/17ftu3z31couf07/AAC_9kq0ZA-Ra2ZhmZFWlLuva?dl=0
>
> The Git source for the bleeding edge of our documentation is at:
>
> https://git.savannah.gnu.org/cgit/groff.git/tree/doc/
> https://git.savannah.gnu.org/cgit/groff.git/tree/man/

Thanks; I overlooked the texinfo source in the doc/ directory. I don't
notice any changes to the hyphenation-related sections that would make
it obvious one should load the appropriate localization files rather
than do it 'by hand' (i.e. by using .hpf etc.), though.

(By the way, that Dropbox PDF viewer is borderline unusable and
downloading the PDF requires logging in. If you ever need something
less bloated, I recommend <https://paste.c-net.org>.)

> > [...]
> > Of course, this wouldn't be necessary if .hy worked like .ad,
>
> That's actually a bad example, but a very popular misconception.  You
> probably mean "if .hy worked like .ps". Or .ft, .ev, .in, .ll,
> .ls, .lt, .po, or .vs, or groff's .fam, .fcolor, .gcolor, or .pvs.
>
> Without an argument, neither .hy nor .ad restore the "previous"
> hyphenation mode or adjustment setting, respectively.

That's not a bad example; you just misunderstood. I know .ad without
an argument doesn't restore the previous adjustment mode; it caused me
some headaches in the past.

I eventually realized that .ad is not meant to switch back-and-forth
between adjustment modes, but to restore adjustment after it was
disabled with .na. What I was saying above is that if .hy worked in
this way too, i.e. if .hy without arguments restored hyphenation
after .nh was called, the macro I proposed wouldn't be necessary.

> [...]
> I think these are horrible warts in the *roff language that an
> iconoclast should have smashed years ago.  But they work fine for the
> most common cases (temporary disablement with `nh` and `na`,
> respectively) [...]

I would disagree that it works fine for temporary disablement with .nh;
see above.

> >  but (unless I am mistaken again :) it doesn't and cannot due to
> >  desired compatibility with AT&T troff.
>
> You might be interested in a feature in the forthcoming groff 1.24.0:
>
> NEWS:
> *  A new request, `hydefault`, and read-only register, `.hydefault`,
>    manage the default automatic hyphenation mode of an environment.
>    This resolves a long-standing problem of *roff formatting.
>
>      When processing input like this,
>      .nh
>      and we temporarily shut off automatic hyphenation,
>      .hy
>      the foregoing request would not do exactly what we expect.
>
>    AT&T and other troffs would set the hyphenation mode to 1 instead of
>    the previous value; for GNU troff this was not an appropriate value
>    for the English hyphenation patterns.  (For example, "alibi" would
>    break as "ali-bi" instead of "al-ibi" after this argumentless `hy`
>    invocation.)  With updates to groff's localization files, the
>    foregoing input now works as desired.

Sounds like what .hy should have been doing from the beginning :)

> I have plans to fix the argumentless `ad` request, but just today I
> decided to kick that out past 1.24.
>
> https://savannah.gnu.org/bugs/?65954

I don't feel like this fixes anything, honestly.
Before this, I could do:
  .ad r
  Lorem ipsum dolor sit amet...
  .br
  .na
  Lorem ipsum.
  .br
  .ad
  Lorem ipsum dolor sit amet...
and couldn't do:
  .ad r
  Lorem ipsum dolor sit amet...
  .br
  .ad c
  Lorem ipsum.
  .br
  .ad
  Lorem ipsum dolor sit amet...

Now I will not be able to do either. I suggest this instead:
  .ad
      Set adjustment mode to \n[.J] if set, b otherwise.
  .ad 0
      Disable adjustment.
      Update \n[.j] and \n[.J] (previous value of \n[.j]).
  .ad MODE
      Set adjustment mode to MODE (l,c,r,b,n).
      Update \n[.j] and \n[.J].
  .na
      As .ad 0.

This should make both scenarios work as expected without breaking any
other ways in which people currently use it. (At least I hope so.)
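
To sanity-check the two scenarios against the rules above, here is a
minimal Python sketch of the proposal. It is a toy model, not groff's
implementation: the class and function names are made up, \n[.j] is
modelled as a letter rather than groff's numeric encoding, "0" stands
for "adjustment disabled", and I assume argumentless .ad leaves \n[.J]
untouched, since the proposal only says "Update" for the other forms.

```python
class AdjustModel:
    """Toy model of the proposed .ad/.na rules (hypothetical, not groff code)."""

    MODES = {"l", "c", "r", "b", "n"}
    OFF = "0"  # assumption: "0" represents disabled adjustment

    def __init__(self):
        self.j = "b"   # stands in for \n[.j]; letters used for readability
        self.J = None  # stands in for the proposed \n[.J]; unset at start

    def _set(self, mode):
        # Remember the mode being left in J, then switch to the new one.
        self.J, self.j = self.j, mode

    def ad(self, mode=None):
        if mode is None:
            # .ad: use J if set, b otherwise; J itself is left alone (assumption).
            self.j = self.J if self.J is not None else "b"
        elif mode == self.OFF or mode in self.MODES:
            # .ad 0 and .ad MODE both update j and J.
            self._set(mode)
        else:
            raise ValueError(f"unknown adjustment mode: {mode!r}")

    def na(self):
        # .na: as .ad 0.
        self.ad(self.OFF)


def run(requests):
    """Apply a list of (request, *args) tuples; return j after each one."""
    model = AdjustModel()
    trace = []
    for name, *args in requests:
        getattr(model, name)(*args)
        trace.append(model.j)
    return trace


scenario_na = run([("ad", "r"), ("na",), ("ad",)])
scenario_c = run([("ad", "r"), ("ad", "c"), ("ad",)])
double_na = run([("ad", "r"), ("na",), ("na",), ("ad",)])

print(scenario_na)  # ['r', '0', 'r']
print(scenario_c)   # ['r', 'c', 'r']
print(double_na)    # ['r', '0', '0', '0']
```

Both scenarios end up back in mode r. Under this literal reading, two
consecutive .na requests leave \n[.J] at the disabled value, so the
final .ad would not re-enable adjustment; the proposal would probably
need an extra rule for that case.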

~ onf
