Hi Jan,

At 2023-10-28T15:18:05+0200, Jan Engelhardt wrote:
> A recent LWN.net article <https://lwn.net/Articles/947941/> (paywalled
> for a while)

For the benefit of those reading this in the future, the article should
be free to read starting about 2 November 2023.

> pointed at https://bugs.debian.org/1041731 and the topic of
> "-" vs "\-".
>
> Given the following input:
>
>     -\-\[u002D]\[u2013]\[u2014]+\[u2212]
>
> Feeding it through `groff -Tutf8`, I get
>
>     ‐−-–—+−
>     <U+2010><U+2212><U+002D><U+2013><U+002B><U+2014>
>
> groff_char(7) says \- maps to "minus sign/Unix dash". Ambiguous, but
> ok, it is what it is.

Yes. We're kind of trapped here; AT&T troff always documented `\-`
specifically and exclusively as a "minus sign". Not a "hyphen-minus" or
something like that. The "Unix dash" term might have been my invention
to try to advise the same people who aren't listening to me in that LWN
thread.

> Is there a better way though than to explicitly use \[u002D] to get a
> guaranteed U+002D?

Not a better one, no. (There's a worse one, involving `\N`.)

_Unless_ you're using man(7) or mdoc(7), your document can:

  1. Remap \- to \[u002D] with `tr` or `char`; or
  2. Define a string to interpolate \[u002D].

Man pages should not do either of these, because they will just make a
bad situation worse, causing more man pages to be inconsistent with each
other, Albert Cahalan-style.

> Second, I turn to PostScript output that is generated by
> `groff -Tps`. One observes:
>
>     troff:<standard input>:1: warning: special character 'u002D' not defined
>
> (Converting the PS to PDF and opening that with evince), the rendered
> view shows a hyphen, a minus, an endash, an emdash, and another minus
> but rendered in a different vertical position which does not line up
> with the '+' sign.

Let's see, your input was...

>     -\-\[u002D]\[u2013]\[u2014]+\[u2212]

That should be, in order:

  a. a hyphen (U+2010)
  b. a minus sign (U+2212) from the "current font" (likely a text font)
  c. a hyphen-minus (U+002D)
  d. an en dash (U+2013)
  e. an em dash (U+2014)
  f. a plus sign (U+002B) from the "current font" (likely a text font)
  g. and a minus sign (U+2212) from the "special font".

A shorter way to say \[u2212] is \[mi] (or `\(mi`; it's a venerable
special character identifier going back to Ossanna troff). GNU troff
maps certain Unicode code points back to special characters first.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/uniglyph.cpp?h=1.23.0#n392

groff_char(7) attempts to explain why all this "text font" and "special
font" business exists.

    Notes describes the glyph, elucidating the mnemonic value of the
    glyph name where possible.
    [...]
    Entries marked with “***” denote glyphs used for mathematical
    purposes. On typesetting devices, such glyphs are typically drawn
    from a special font (see groff_font(5)). Often, such glyphs lack
    bold or italic style forms or have metrics that look incongruous
    in ordinary prose. A few which are not uncommon in running text
    have “text variants”, which should work better in that context.
    Conversely, a handful of glyphs that are normally drawn from a
    text font may be required in mathematical equations. Both sets of
    exceptions are noted in the tables where they appear (“Logical
    symbols” and “Mathematical symbols”).

    Basic Latin
    [...]
    The vertical bar is overloaded; the \[ba] and \[or] escape
    sequences may render differently. See subsection “Mathematical
    symbols” below for special variants of the plus, minus, and equals
    signs normally drawn from this range.

    Mathematical symbols
    [...]
    Observe the two varieties of the plus‐minus, multiplication, and
    division signs; \[+-], \[mu], and \[di] are normally drawn from
    the special font, but have text font variants. Also be aware of
    three glyphs available in special font variants that are normally
    drawn from text fonts: the plus, minus, and equals signs. These
    variants may differ in appearance or spacing depending on the
    device and font selected.

...and the entire "History" section.
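
For a document that is not a man page, the two options listed earlier
(remapping \- or defining a string) might look roughly like the sketch
below. It is an illustration only: the string name `hm` is made up, you
would pick either the `char` or the `tr` request rather than both, and
it helps only on output devices that define \[u002D] (utf8 does; the
-Tps warning quoted above shows that device does not).

```roff
.\" Sketch only; not for man(7) or mdoc(7) documents.
.\"
.\" Option 1: make \- format as the hyphen-minus glyph.
.char \- \[u002D]
.\" An alternative spelling of option 1 uses `tr` instead of `char`:
.\" .tr \-\[u002D]
.\"
.\" Option 2: leave \- alone and interpolate a string instead.
.\" The string name "hm" is hypothetical.
.ds hm \[u002D]\"
Run ls \-l or ls \*[hm]l to see a long listing.
```

Formatting that with `groff -Tutf8` and inspecting the code points in
the output is one way to check that both spellings come out as U+002D.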

> Third, when one copy-pastes the string shown in evince, I get back:
>
>     -−–—+−
>     <U+002D><U+2212><U+2013><U+2014><U+002B><U+2212>
>
> I expected to receive:
>
>     <U+2010><U+002D><U+2013><U+2014><U+002B><U+2212>
>
> so that copypasting commands from PS/PDF would work "right"
> similarly as it does for manpages when they use \-.

That is because \- is not a "hyphen-minus" (except in man pages, where
we are forced to remap it for practical reasons). The C/A/T typesetter
that the Bell Labs CSRC acquired didn't _have_ a "hyphen-minus" glyph.
It had a hyphen, a minus sign, and an em dash.

So, to troff, \- is a minus sign, and when you format `\-` when not
using a man page macro package, that is what you get.

If you add \[pl] to your list, _that_ plus sign's crossbar should line
up with the U+2212 minus sign, and if it doesn't, I'd be curious to see
the output of "groff -Tps -Z".[1] (But it's always possible for a font
to be buggy.)

Does this clear things up? Please tell me if there is anything not
making sense, or any way I can improve the groff_char(7) man page.

Regards,
Branden

[1] For me, it's doing what it should.

$ printf -- '-\\-\\[u002D]\\[u2013]\\[u2014]+\\[u2212]\\[pl]\n' | groff -Tps -Z | tail
troff:<standard input>:1: warning: special character 'u002D' not defined
x font 11 S
f11
Cmi
h5490
Cpl
h5490
n12000 0
x trailer
V792000
x stop