I'm afraid this will be a long post. Sorry, but I don't see any way around it.
It's ironic and instructive that the thread, "Future direction of groff", which became a semantic-vs-presentational debate eerily similar to a previous discussion on the same subject, originated in the "space-width" thread, which deals with the minutiae of presentation.

Re-reading the posts, a number of things become clear. First, groffers on the list value quality typesetting, and there's considerable discomfort with moving away from the physical page, conceptual or otherwise, to the multi-modal paradigm. Secondly, there's a perceived conflict between backward compatibility and future development. Thirdly, more than one list member expressed fear that improving groff will somehow break it. Lastly, there's Eric's insistence that "if it ain't semantic it's a dinosaur," which was greeted with as much resistance this time as the last time the subject came up.

Allow me to say here that Eric may well be right. His indispensable contributions to open-source development lend considerable weight to any argument he makes. He's been an inspiration to me for years. I'm disinclined to dismiss out of hand any opinion he offers.

If groff is to have a future, if it is to remain vital, it is clear these issues need to be looked at. More than groff is at stake. Do we live in a world where typography itself is becoming an archaic craft, like stone masonry? Is print really dead, as some have asserted? Is there a future for documents that are beautifully typeset but only render properly as PDF or PostScript? And, if the answers are, in order, "no," "no," and "yes," do we really need both groff and TeX when TeX is, at present, more typographically sophisticated?

I think, too, we need to consider the distinction between "documentation" and "a document". Is the primary purpose of groff to produce documentation, which requires an ordered structuring of the subject matter, or to produce material of a more fluid nature where the presentation (the typography) serves expressive rather than structural needs?

* Backward Compatibility

Mike Bianchi summed up the backward compatibility concern best: "The fact that I can still format documents I wrote in the 1970s and beyond is valuable to me, and, should any of them ever become classics, possibly to others in the future. So no, do not break groff by 'modernizing' it."

Backward compatibility has been a mainstay of *roff throughout its history, and any effort to improve groff should remain true to that goal. However, ultra-strict adherence to backward compatibility must not stand in the way of improvements to existing functionality and the quality of final output.

While it's a wonderful convenience to use groff in 2014 to format documents from the 1970s, it must be remembered that a) those documents exist as plain text, thus changes to groff would have no impact on the readability of their content, and b) the formatting directives in those documents are, equally, plain text, and thus easily understood presentationally and semantically--by humans, if not by Eric's "baroque AI baby." :)

In short, if some backward compatibility were to be lost in future versions of groff--and frankly, I doubt it would--documents written in the 70s would remain as useful and valuable as ever. The only thing that might be lost is the time it would take to write a Perl script to parse and update said documents to reflect probably very minor changes to groff's behaviour.

Furthermore, let's not forget that under Werner's leadership, groff underwent a considerable amount of modernization.
Nothing got broken, and the whole groff picture improved splendidly. I see no reason to fear that future development will be any different.

* Fear of modernizing groff

Mike wasn't the only one who expressed fears about the modernization of groff. Walter Alejandro Iglesias wrote: "In general terms what in the name of *modernity* developers have being doing with Unix, by Unix I mean the idea behind, what made of it a well designed OS, has being just adapting it to those used to the MSWindows experience. Marketing. First that modifications took place in userland, but today, base, init and even the kernel are suffering them."

While I share Walter's feelings, I don't think they apply to the kinds of changes we've all been dancing around that need to be made to groff. For example, groff's line-at-a-time approach to formatting, if unchanged, will remain an impediment to high-quality typesetting and ensure groff's demise for anything other than writing manpages. Since the point of implementing page-at-once formatting (or, as Werner dreamed, document-at-once) would be to improve the quality of typeset output, not to change the fundamentals of groff usage, resisting such a change seems like misplaced Luddism.

* Groffers love good typography

I scarcely need comment on this. The most interesting discussions on the list, the ones that generate the most responses, deal with the fussiest of typographic details: how much space to put between sentences, the length of dashes, hanging punctuation and margin correction for glyphs, letter-spacing versus font expansion, and so on.

Why this love is important relates to the future of typography in general. We are concerned about these things because we do not accept shoddiness. Since the advent of DTP, and hastened by the rush toward generalized input suitable for multi-modal display, there has been, globally, a serious decline in the quality of typesetting, by which I mean the balanced, aesthetically-pleasing arrangement of words regardless of the display medium.

Good typography lends weight--gravitas--to content, in addition to facilitating the act of reading. It makes words come alive and encourages contemplation of the ideas they represent. Good typography says, "These words matter and deserve respect." It is an ongoing battle, tending the fires of quality in the face of widespread dilettantism. Perhaps I am guilty of adopting a doomed Romantic stance (wouldn't be the first time), but I believe groff, and the community of users it attracts, are essential to keeping the craft of typography alive. The same is true of TeX and its community.

People respond well to quality typesetting, and there's no reason why they shouldn't expect it, if not now, at least in the future. If we allow groff to remain as-is, if we cease to look for ways to improve the quality of output and refine the tools used to create documents, we are doing a disservice to the future, since it is in our hands to preserve what, of the past, should not be lost.

* The great presentational vs semantic markup debate

From a typographic standpoint, markup, whether presentational or semantic with linked stylesheets, is only as useful as the program or device interpreting it. There's a reason why _The Binbrook Caucus_, a novel I put online a few years back, isn't in html: browser rendering of type sucks. I doubt that's going to change any time soon.
The novel uses typography to express changes of tense and POV, as well as to convey things like verbatim newspaper articles, train station announcements, email correspondence, and typewritten copy. For readability, it requires fixed margins and a degree of control over justification that's impossible to achieve with, eg, xhtml, which is my preferred way of formatting web pages. For similar reasons, the novel isn't ePubbed or mobied on Kindle. Only PDF, page-centric and type-specific, is capable of rendering the work according to my intentions.

Eric says: "What I don't believe is that there will ever again be enough demand for printer-*only* output to justify markup formats and toolchains that don't also do web and ePub or functional equivalent."

In this he may well be right, but he is speaking of a world where precise control over typography no longer plays the role it does presently in document design. A world where "approximately" rules--fine for certain types of writing, notably documentation, technical papers, blogs, and the like, where, whether on paper, a monitor or a smartphone, it is enough for headings to "look like" headings, paragraphs to "look like" paragraphs, nested lists to "look like" nested lists, and so on.

And it is certainly in the best interests of future markup formats and toolchains to do their best at generating output renderable on the greatest number of devices. Interestingly, this was what the various *roffs aimed at originally: device-independent output that could be fed to various drivers. But somewhere along the line, I believe, that mandate lost its importance. After the valiant attempt of grohtml, no new drivers were added until Deri James contributed gropdf. Had the mandate retained its importance, the question of whether groff should have an XSL-FO driver would be moot because, in all likelihood, it would already have been written, along with several other much-needed drivers, grortf being the most pressing for many of us (and I wish to God we didn't all hate RTF so much, so that somebody would, in fact, write the driver).

I think what happened is that, over time, near-exclusive use of the PostScript driver caused many of us to confuse groff output with grops output--if not intellectually, at least at the conceptual level. We began to think of groff as a PostScript typesetting engine. Few are the posts in the last ten years dealing with, say, terminal output issues, while legion are the posts about what are clearly PostScript presentational issues (like the "space width" thread).

I suspect the uneasiness with what Eric has to say about groff's future, and the whole semantic-vs-presentational debate, stems from Eric addressing the issue in terms of groff's original mandate (device-independent output, a concept largely replaced by the notion of display-neutral xml), while everyone else looks at groff as a typesetter for paper or full-screen-viewable documents that, Eric's doom-and-gloom predictions aside, continue to form the bulk of list members' groff usage.

Consider Tadziu Hoffman's comment, which mirrors my observations above: "Personally, I use groff exclusively for printing out stuff (or creating PDFs as a sort of virtual paper)--something to be read, as is, *by humans*--nothing else. (For the web (if I must), I write HTML.) In principle, I could also print from the Browser, but the results are ghastly. I would not be willing to compromise typesetting quality in exchange for additional media I have no plans of using."
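Lest we forget what that original mandate looks like in practice, it's still right there at the command line: the same source file goes to whichever driver the -T flag selects. (The file name below is made up, and the last line is, of course, the driver we don't have.)

    groff -ms -Tps    doc.ms > doc.ps      # grops
    groff -ms -Tpdf   doc.ms > doc.pdf     # gropdf, Deri's driver
    groff -ms -Thtml  doc.ms > doc.html    # grohtml
    groff -ms -Tutf8  doc.ms > doc.txt     # grotty, for the terminal
    # groff -ms -Trtf doc.ms > doc.rtf     # no such device (the grortf we keep wishing for)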
We cannot ignore the need for groff to accommodate the Web and ePub-y type things, despite its paper-centricity. Eric is quite right that "printer-only" will never again be enough. However, as he also points out, attempting to extrapolate semantic meaning from groff output is impossible "... because the information required to do that is thrown away at macro expansion time." The conclusion he draws from this strikes me as self-evident: "The difficult but correct thing to do is to recover structural information by looking for cliches in the source markup *before* it goes through troff."

But why "difficult"? Well, mostly owing to historical groff (mis?)use, which fostered conditions where, in Mike's words, "...presentation and formating are horribly intermingled." The classical macrosets are not well set up for creating stylesheets "that really sing" (esr). Groff-hands got in the habit of dealing with presentational issues for semantic elements on-the-fly, either with groff primitives or by writing their own macros--the cliches of which Eric speaks. Had there been, historically, a macroset that provided a fairly complete set of semantic tags and an easily-parsable mechanism for applying presentational markup to those tags, Eric might never have had to write the miracle of cleverness that is DocLifter.

Without a single change to groff as it stands now, all that's really required to generate xml from groff source files are a) well-formed source files, and b) mechanisms for parsing and transforming them into xml. (I speak here of present and future documents, not the vintage stuff.) If I can convert a mom file to xhtml using sed (gasp!), xml and xml stylesheets are perfectly feasible with more sophisticated tools. The trick, of course, is writing well-formed source files, with a clear distinction between metadata, stylesheet, semantic tags, and discardable presentational markup.

* The mom macros

Mom is my baby, but that's not why I'm mentioning it here. I was really surprised by Mike's comment: "Done right, a really great macro package would have two clearly separated parts: presentation and format. But it seems *roff has never really provided the architecture to support that sort of separation, hence macro packages that mush the concepts together."

The mom macros were conceived, from the start, to do exactly what he describes: keep presentation and formatting separate. Every semantic tag has a bevy of "control macros" that permit the styling of tags separately from the tags themselves, macros that furthermore begin with the name of the tag, making their intent instantaneously clear, both to human eyes and to a parser assembling a stylesheet. A mom document begins with metadata, is followed by a stylesheet, and begins formatting, in the sense Mike means, only after the .START macro is issued. Throughout the remainder of the document, there's virtually no need for presentational markup, except discardable markup like tightening or loosening a line with track kerning (which, moreover, is done in macro space with the easily identified EW [extra white] and RW [reduce white]). In cases where it's desirable to make available "in-document" presentational markup of a tag, say to adjust the position of a table, the presentational markup is attached directly to the tag as an optional argument, which can be flagged and ignored at parse time.
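Here, schematically, is what that looks like in practice. I'm showing only a representative handful of control macros, and the values are arbitrary; the point is the shape of the file, not the particular styling.

    .\" Metadata
    .TITLE      "The Binbrook Caucus"
    .AUTHOR     "Peter Schaffter"
    .DOCTYPE    DEFAULT
    .PRINTSTYLE TYPESET
    .\" "Stylesheet": control macros, named after the tags they style
    .PARA_INDENT 2m
    .HEAD_FONT   B
    .HEAD_SIZE   +2
    .\" Formatting begins only after START
    .START
    .HEAD "A Semantic Heading"
    .PP
    Ordinary paragraph text, free of presentational markup.
    .EW .2
    A line that needed a touch of loosening (discardable at parse time).
    .EW 0
    .PP
    More paragraph text.

Everything before .START is metadata and style; everything after it is content, each element a line beginning with a period whose name identifies it. That is what makes the sed conversion mentioned above possible: map .HEAD to a heading element, .PP to a paragraph element, fold the *_FONT/*_SIZE control macros into a stylesheet, and drop the EW/RW lines altogether.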
Furthermore, mom's typesetting functions can be used independently of document formatting (again, in Mike's sense) to create documents never likely to be rendered anywhere but on paper (a poster advertising kittens free to a good home, for example). In short, groff has always had the "architecture" to support source files that are both human- and xml-friendly; it's just that it's rarely been taken advantage of.

* Summing up

Groff can, and does, have a future, one that accommodates both printer (or analogous) output and display-neutral xml. The typesetting engine/pipeline (ie ditroff=>grops/gropdf) needs to be overhauled to remove the hurdles posed by line-at-a-time processing. Historical baggage like not respecting order of precedence can, I believe, be fixed without earth-shattering consequences. Adding arrays, as has been suggested, wouldn't break anything. (Myself, I wouldn't mind case statements, as I find I'm increasingly using while loops to simulate getopts functionality.) I think we all agree some requests could do with a dose of sanity, which could be accomplished by creating new requests in the manner of .de1 or .am1 whenever altering their historical behaviour would break existing documents. And all this is doable, it seems to me, without affecting backward compatibility.

I don't think we have to consider forking; in fact, I'm against it. What we do have to do is acknowledge that groff, as a typesetting engine, has the potential to stand next to TeX, and, as such, remain a viable choice for quality typesetting. I really don't believe that printer output, or the physical page paradigm for screen documents, is anywhere near falling into desuetude. As for xml output, I'm convinced that's a source-file, macro-level issue. The mom macros point the way for xml-friendly structuring of source files; who knows what a joint-development effort in a similar vein could accomplish?

Again, sorry for the long post.

--
Peter Schaffter
http://www.schaffter.ca