At 2020-09-19T16:01:22-0500, Dave Kemper wrote:
> Straying away from man-page considerations and comparing these two
> approaches in general:
>
> The .string* requests also have the advantage of handling alphabetic
> Latin-1 characters and the roff escapes that represent them (though
> .stringup fails kind of messily on roff escapes representing
> nonalphabetic Latin-1 characters, such as \[de]).
Admittedly, yes.

> However, if input must go through preconv, the .string* requests
> remove all non-ASCII characters (alphabetic or otherwise) from strings
> passed to them and emit warnings for each one.  The .tr approach,
> while failing to convert non-ASCII alphabetic characters, does
> preserve them.
>
> .tr is also portable to non-groff roffs.
>
> So there are trade-offs to either approach.

As the implementor of .string{up,down}, I grant that they are feeble.
The only thing that makes them bearable is that they are pretty much
adequate to the man page considerations you're straying away from.

Several weekends ago I started down the road of learning what it would
take to convert the GNU troff engine to use a wide character type for
handling of the input tokens.  That is, you would still read a byte at
a time, but immediately toss it into a wider type and then never have
to worry about its representation format again until emitting
device-independent output, which is already 7-bit ASCII, I think.

32 bits sounded good.  Unicode is only 21 bits, so I figured I'd just
move all the crazy groff enums[1] in src/roff/troff/input.h to the top
end of that space, or count backwards from the halfway point in case
someone made noise about signedness issues.  Either way, tons of
space, and they wouldn't even have to be #ifdef-ed for EBCDIC!

Complexity rapidly ramified.  First I was rewriting groff's built-in
C++ string library to be wchar_t-based, and I was already anticipating
this list getting swarmed by C++ weenies screaming "why are you
reinventing the wheel AGAIN when the C++ STL is RIGHT THERE?"
Fortunately, I think Zack Weinberg answered that question for me in
the meantime:

    This is because the test probes for C++11 library features, and
    the C++ standard library is notoriously heavyweight.
    The test program used by _AC_PROG_CXX_CXX11 is only about 150
    lines long, but it expands to 47,000 lines of gnarly template
    classes after preprocessing, and roughly 30,000 assembly
    instructions after compilation.  With -g enabled (as is the
    default), 770,000 lines of debug information are also emitted into
    the assembly.[2]

There were other problems I don't even remember now.  I should have
written up a report of what I saw that had to be dealt with, but I got
discouraged and did not.  Maybe I'll take another crack at it
sometime.

I don't yet perceive whether there is a way to do the char-width
migration in a modular way, or if everything's so tightly coupled that
you have to break the world and then put it back together.  Right now
you have to break libgroff along with the troff executable, and
breaking libgroff breaks tons of other things in the tree.  Maybe a
good start (probably on a branch) would be to give troff its own copy
of libgroff to which the violence can be done.

Regards,
Branden

[1] They're not really enums, just global integer constants.  But at
    least they're not preprocessor symbols.

[2] https://savannah.gnu.org/support/index.php?110285