Re: [groff] Mapping of \(bu to MIDDLE DOT

Jeff Conrad Thu, 28 Mar 2019 04:38:17 -0700

On Thursday, March 28, 2019 3:01 AM, G. Branden Robinson wrote:

> At 2019-03-27T04:34:18+0000, Jeff Conrad wrote:
> > Is there a reason that tty.tmac translates \(bu to \(pc or \(md
> > regardless of the output device or whether \(bu is available?
> >
> > .ie c\[pc] \
> > .  tr \[bu]\[pc]
> > .el \
> > .  if c\[md] \
> > .    tr \[bu]\[md]
> 
> Are you looking at an old implementation?  There's some important
> context missing here:


Yep--I'm using 1.22.3.  Running Windows, I've had to diddle a few things,
so the upgrade isn't as simple as it could be.

> $ nl /usr/share/groff/1.22.4/tmac/tty.tmac | sed -n '14,21p'
>     14  .if !'\*[.T]'utf8' \{\
>     15  .  ie c\[pc] \
>     16  .    tr \[bu]\[pc]
>     17  .  el \
>     18  .    if c\[md] \
>     19  .      tr \[bu]\[md]
>     20  .\}
>     21  .
> 

> It sure seems like you might be re-reporting a problem Carsten Kunze
> raised in June 2015, and which prompted Werner to wrap the conditional
> you mention in an "if device is not UTF-8" block:

> https://lists.gnu.org/archive/html/groff/2015-06/msg00040.html

Again, yep--I used the wrong search query ...

> Really we shouldn't be conditional on UTF-8 per se, but on the existence
> of the bullet glyph in the font for the tty device.

Completely agree.

> However, the tty device ignores fonts ...

> these devices can report their character repertoire up to an
> application.  VGA-style console devices, framebuffer consoles, and GUI
> terminal emulators can even change these on the fly.  (Who else
> remembers live-hacking the display font in MS-DOS?)

We're obviously at the mercy of the chosen font (on Windows, I use
Lucida Console as the best of very limited options).  But the device at
least gives us a reasonable idea of what's possible.

> So Werner's fix worked because there were (and are) no nroff/tty devices
> in the groff tree that supported the bullet character _except_ -Tutf8.
> 
> My recommendations are:
> 1) Upgrade to groff 1.22.4; and
> 2) Change the conditional on line 14 of tty.tmac from:
> 
>     14  .if !'\*[.T]'utf8' \{\
> 
> to:
> 
>     14  .if !c\[bu] \{\
> 
> ...and tell us if that fixes your problem.

Making this change (which I've already done) indeed fixes things.
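
For reference, a sketch of how that block of tty.tmac reads with the
suggested conditional in place.  The listing line numbers are dropped,
and everything after the first request is unchanged from the 1.22.4
listing quoted above:

```roff
.\" Translate \[bu] only when the device has no bullet glyph of its own.
.if !c\[bu] \{\
.  ie c\[pc] \
.    tr \[bu]\[pc]
.  el \
.    if c\[md] \
.      tr \[bu]\[md]
.\}
```

The "c" conditional tests whether the glyph is available, so the test
no longer depends on the device name.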

> Personally, I advocate incorporating cp1252 into groff.  It's only an
> 8-bit character set, should therefore be a low maintenance burden, and
> really should make life a bit more bearable for groff's Windows users.
> And that's good PR for groff, GNU, copyleft, and Free Software.

It's yours for the asking; it's really just latin1 with the additional
characters that Microsoft added to the C1 area.  I went a bit further
and added spelled-out representations of missing Greek characters (I
hate missing symbols; in the old, old days, I guess one would print the
document and write in the missing symbols.  Yeah, right ...).  But if
these additions aren't for everyone, they're easily deleted.
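
As a rough illustration only, a spelled-out fallback of this kind could
be expressed at the request level as below.  This is a hypothetical
sketch, not a copy of the actual cp1252 files, and the spellings are
placeholders:

```roff
.\" Hypothetical: each definition is used only where the device
.\" has no glyph of its own for the letter.
.fchar \[*a] alpha
.fchar \[*b] beta
.fchar \[*m] mu
```

Because .fchar defines a fallback, a device that does have the Greek
glyphs still prints them; deleting these lines removes the additions.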

> > Even for -Tlatin1, I'd prefer an asterisk or even the age-old
> > overstruck '+' and 'o'.  Isn't the general rule for nroff to make the
> > best possible visual approximation when the true character isn't
> > available?
> 
> As noted above, knowing what will actually show up on the output
> device is, in principle, impossible for nroff/tty output devices.

The user needs to pick the most appropriate font; there don't seem
to be all that many choices that we need to worry about.

> However, we can generally assume that users of 8-bit encodings will
> have comprehensive fonts available by default--they'd have to go out
> of their way to avoid them.

But 8-bit encodings (e.g., ISO 8859) have their limitations; in
particular, they're missing most of the common punctuation characters
used in typesetting. The MS extensions addressed most of this.

> Life is harder in UTF-8 world.

Yep.  Especially on Windows.  I had to hack the devutf8 font files to
use U+002D rather than U+2010 for a hyphen, because Lucida Console
doesn't include the latter. Ya do what ya gotta do ...
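
A request-level alternative to editing the font files might look like
the following.  It is only a sketch, resting on the assumption that
\N'45' selects code 45 (U+002D) in the devutf8 fonts; it is not the
change described above.

```roff
.\" Hypothetical: on -Tutf8, print "-" as U+002D instead of U+2010.
.if '\*[.T]'utf8' .char - \N'45'
```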

But Microsoft are working on it ...

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/

Skip to "Are we there yet?" near the end if you're less than fascinated
with the topic.

> To get that asterisk:
> 
> In your documents, or your .troffrc, could you not do this?
> 
> .fchar \[bu] *

Yes.  I've already done something similar.  But this won't help with the
few files I generate for general distribution.  For example, for GNU
units, we generate a man page from Texinfo source with a Perl script,
and obviously can't assume a customized .troffrc--so we include a few hacks
to override some groff settings (e.g., ".tr \(oq'").  We actually don't
even assume groff, so we try to cover all the bases; this probably is
overkill nowadays.
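
A minimal sketch of such a preamble, using only requests that do not
depend on groff.  The first request is the hack quoted above; the
second is a hypothetical addition in the same spirit, and is not
necessarily what the units man page contains:

```roff
.\" Print the opening quote as a plain apostrophe.
.tr \(oq'
.\" Hypothetical: in nroff mode only, print an asterisk for \(bu.
.if n .tr \(bu*
```

Note that the second request applies to every nroff device, including
ones that could have shown a real bullet.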

> As a minor point, I do think the existing fallback should be reversed in
> order:
> 
> From:
> 
> .fchar \[bu] \z+o
> 
> To:
> 
> .fchar \[bu] \zo+

Interesting how we differ on this.  I don't like either alternative, but
find the 'o' more instantly recognizable--it's sorta kinda a circle.  As
I recall, the AT&T version 2 nterm files that I had in the late 1980s
had it as you suggest, and I reversed it.  I guess it's a matter of
personal preference.  The asterisk avoids the problem.
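
To make the three candidates concrete, here they are side by side as a
sketch; only one would be active at a time, so the alternatives are
commented out:

```roff
.\" \z+o prints '+' without advancing, then 'o' in the same position.
.\" On a device that cannot overstrike, the 'o' is what remains.
.fchar \[bu] \z+o
.\" Reversed order, so that '+' remains instead:
.\" .fchar \[bu] \zo+
.\" Plain asterisk, with no overstriking involved:
.\" .fchar \[bu] *
```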

> The \z+o status quo seems to follow a pattern that makes sense for
> modified letterforms, i.e., \z'a; on a 7-bit ASCII, non-overstriking
> device, you want the "a" to "win", because it carries the more important
> semantic information.

In general, I completely agree.

> That reasoning does not hold for bullet substitutes, which simply need
> to stand out graphically (your argument for not using a middle dot or
> centered period, which may be as small as one pixel on some devices),
> and not be semantically confusable with text.

In this circumstance, I don't know whether we can really separate
graphics and semantics.

> As "o" is actually a word (even in English, though much more prominently
> in Spanish), I find the present arrangement unfortunate.

I think it's largely a matter of context.  As the tag for a list, I
think confusion would be unlikely.  And again, an asterisk--perhaps ugly
but arguably the most common ASCII approximation of a bullet--would seem
to avoid the problem.

In my senior year of high school, I had an English teacher--a PhD--who
tried to drill into us that the "best" English is that which provides
the maximum communication (and it generally avoids pompous polysyllabic
pronouncements).  I suggest something similar for the "best" groff.  Of
course, it's not always easy to reach consensus on the details.

Regards,
Jeff
