I'm afraid this will be a long post. Sorry, but I don't see any way around it.
It's ironic and instructive that the thread, "Future direction of groff", which became a semantic-vs-presentational debate eerily similar to a previous discussion on the same subject, originated in the "space-width" thread, which deals with the minutiae of presentation.

Re-reading the posts, a number of things become clear. First, groffers on the list value quality typesetting, and there's considerable discomfort with moving away from the physical page, conceptual or otherwise, to the multi-modal paradigm. Secondly, there's a perceived conflict between backward compatibility and future development. Thirdly, more than one list member expressed fear that improving groff will somehow break it. Lastly, there's Eric's insistence that "if it ain't semantic it's a dinosaur," which was greeted with as much resistance this time as the last time the subject came up.

Allow me to say here that Eric may well be right. His indispensable contributions to open-source development lend considerable weight to any argument he makes. He's been an inspiration to me for years. I'm disinclined to dismiss out of hand any opinion he offers.

If groff is to have a future, if it is to remain vital, it is clear these issues need to be looked at. More than groff is at stake. Do we live in a world where typography itself is becoming an archaic craft, like stone masonry? Is print really dead, as some have asserted? Is there a future for documents that are beautifully typeset but only render properly as PDF or PostScript? And, if the answers are, in order, "no," "no," and "yes," do we really need both groff and TeX when TeX is, at present, more typographically sophisticated?

I think, too, we need to consider the distinction between "documentation" and "a document". Is the primary purpose of groff to produce documentation, which requires an ordered structuring of the subject matter, or to produce material of a more fluid nature where the presentation (the typography) serves expressive rather than structural needs?

* Backward Compatibility

Mike Bianchi summed up the backward compatibility concern best: "The fact that I can still format documents I wrote in the 1970s and beyond is valuable to me, and, should any of them ever become classics, possibly to others in the future. So no, do not break groff by 'modernizing' it."

Backward compatibility has been a mainstay of *roff throughout its history, and any effort to improve groff should remain true to that goal. However, ultra-strict adherence to backward compatibility must not stand in the way of improvements to existing functionality and the quality of final output.

While it's a wonderful convenience to use groff in 2014 to format documents from the 1970s, it must be remembered that a) those documents exist as plain text, thus changes to groff would have no impact on the readability of their content, and b) the formatting directives in those documents are, equally, plain text, and thus easily understood presentationally and semantically--by humans, if not by Eric's "baroque AI baby." :)

In short, if some backward compatibility were to be lost in future versions of groff--and frankly, I doubt it would--documents written in the 70s would remain as useful and valuable as ever. The only thing that might be lost is the time it would take to write a Perl script to parse and update said documents to reflect probably very minor changes to groff's behaviour.

Furthermore, let's not forget that under Werner's leadership, groff underwent a considerable amount of modernization.
Nothing got broken, and the whole groff picture improved splendidly. I see no reason to fear that future development will be any different.

* Fear of modernizing groff

Mike wasn't the only one who expressed fears about the modernization of groff. Walter Alejandro Iglesias wrote: "In general terms what in the name of *modernity* developers have being doing with Unix, by Unix I mean the idea behind, what made of it a well designed OS, has being just adapting it to those used to the MSWindows experience. Marketing. First that modifications took place in userland, but today, base, init and even the kernel are suffering them."

While I share Walter's feelings, I don't think they apply to the kinds of changes we've all been dancing around that need to be made to groff. For example, groff's line-at-a-time approach to formatting, if unchanged, will remain an impediment to high-quality typesetting and ensure groff's demise for anything other than writing manpages. Since the point of implementing page-at-once formatting (or, as Werner dreamed, document-at-once) would be to improve the quality of typeset output, not to change the fundamentals of groff usage, resisting such a change seems like misplaced Luddism.

* Groffers love good typography

I scarcely need comment on this. The most interesting discussions on the list, the ones that generate the most responses, deal with the fussiest of typographic details: how much space to put between sentences, the length of dashes, hanging punctuation and margin correction for glyphs, letter-spacing versus font expansion, and so on.

Why this love is important relates to the future of typography in general. We are concerned about these things because we do not accept shoddiness. Since the advent of DTP, and hastened by the rush toward generalized input suitable for multi-modal display, there has been, globally, a serious decline in the quality of typesetting, by which I mean the balanced, aesthetically-pleasing arrangement of words regardless of the display medium.

Good typography lends weight--gravitas--to content, in addition to facilitating the act of reading. It makes words come alive and encourages contemplation of the ideas they represent. Good typography says, "These words matter and deserve respect." It is an ongoing battle, tending the fires of quality in the face of widespread dilettantism. Perhaps I am guilty of adopting a doomed Romantic stance (wouldn't be the first time), but I believe groff, and the community of users it attracts, are essential to keeping the craft of typography alive. The same is true of TeX and its community.

People respond well to quality typesetting, and there's no reason why they shouldn't expect it, if not now, at least in the future. If we allow groff to remain as-is, if we cease to look for ways to improve the quality of output and refine the tools used to create documents, we are doing a disservice to the future, since it is in our hands to preserve what, of the past, should not be lost.

* The great presentational vs semantic markup debate

From a typographic standpoint, markup, whether presentational or semantic with linked stylesheets, is only as useful as the program or device interpreting it. There's a reason why _The Binbrook Caucus_, a novel I put online a few years back, isn't in html: browser rendering of type sucks. I doubt that's going to change any time soon.
The novel uses typography to express changes of tense and POV, as well as to convey things like verbatim newspaper articles, train station announcements, email correspondence, and typewritten copy. For readability, it requires fixed margins and a degree of control over justification that's impossible to achieve with, eg, xhtml, which is my preferred way of formatting web pages. For similar reasons, the novel isn't ePubbed or mobied on Kindle. Only PDF, page-centric and type-specific, is capable of rendering the work according to my intentions.

Eric says: "What I don't believe is that there will ever again be enough demand for printer-*only* output to justify markup formats and toolchains that don't also do web and ePub or functional equivalent."

In this he may well be right, but he is speaking of a world where precise control over typography no longer plays the role it does presently in document design. A world where "approximately" rules--fine for certain types of writing, notably documentation, technical papers, blogs, and the like, where, whether on paper, a monitor or a smartphone, it is enough for headings to "look like" headings, paragraphs to "look like" paragraphs, nested lists to "look like" nested lists, and so on.

And it is certainly in the best interests of future markup formats and toolchains to do their best at generating output renderable on the greatest number of devices. Interestingly, this was what the various *roffs aimed at originally: device-independent output that could be fed to various drivers. But somewhere along the line, I believe, that mandate lost its importance. After the valiant attempt of grohtml, no new drivers were added until Deri James contributed gropdf. Had the mandate retained its importance, the question of whether groff should have an XSL-FO driver would be moot because, in all likelihood, it would already have been written, along with several other much-needed drivers, grortf being the most pressing for many of us (and I wish to God we didn't all hate RTF so much, so that somebody would, in fact, write the driver).

I think what happened is that, over time, near-exclusive use of the PostScript driver caused many of us to confuse groff output with grops output--if not intellectually, at least at the conceptual level. We began to think of groff as a PostScript typesetting engine. Few are the posts in the last ten years dealing with, say, terminal output issues, while legion are the posts about what are clearly PostScript presentational issues (like the "space width" thread).

I suspect the uneasiness with what Eric has to say about groff's future, and the whole semantic-vs-presentational debate, stems from Eric addressing the issue in terms of groff's original mandate (device-independent output, a concept largely replaced by the notion of display-neutral xml), while everyone else looks at groff as a typesetter for paper or full-screen-viewable documents that, Eric's doom-and-gloom predictions aside, continue to form the bulk of list members' groff usage.

Consider Tadziu Hoffman's comment, which mirrors my observations above: "Personally, I use groff exclusively for printing out stuff (or creating PDFs as a sort of virtual paper)--something to be read, as is, *by humans*--nothing else. (For the web (if I must), I write HTML.) In principle, I could also print from the Browser, but the results are ghastly. I would not be willing to compromise typesetting quality in exchange for additional media I have no plans of using."
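Lest we forget what that original mandate looks like in practice, it's still right there at the command line: the same source file goes to whichever driver the -T flag selects. (The file name below is made up, and the last line is, of course, the driver we don't have.)

    groff -ms -Tps    doc.ms > doc.ps      # grops
    groff -ms -Tpdf   doc.ms > doc.pdf     # gropdf, Deri's driver
    groff -ms -Thtml  doc.ms > doc.html    # grohtml
    groff -ms -Tutf8  doc.ms > doc.txt     # grotty, for the terminal
    # groff -ms -Trtf doc.ms > doc.rtf     # no such device (the grortf we keep wishing for)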
We cannot ignore the need for groff to accommodate the Web and ePub-y type things, despite its paper-centricity. Eric is quite right that "printer-only" will never again be enough. However, as he also points out, attempting to extrapolate semantic meaning from groff output is impossible "... because the information required to do that is thrown away at macro expansion time." The conclusion he draws from this strikes me as self-evident: "The difficult but correct thing to do is to recover structural information by looking for cliches in the source markup *before* it goes through troff."

But why "difficult"? Well, mostly owing to historical groff (mis?)use, which fostered conditions where, in Mike's words, "...presentation and formating are horribly intermingled." The classical macrosets are not well set up for creating stylesheets "that really sing" (esr). Groff-hands got in the habit of dealing with presentational issues for semantic elements on-the-fly, either with groff primitives or by writing their own macros--the cliches of which Eric speaks. Had there been, historically, a macroset that provided a fairly complete set of semantic tags and an easily-parsable mechanism for applying presentational markup to those tags, Eric might never have had to write the miracle of cleverness that is DocLifter.

Without a single change to groff as it stands now, all that's really required to generate xml from groff source files are a) well-formed source files, and b) mechanisms for parsing and transforming them into xml. (I speak here of present and future documents, not the vintage stuff.) If I can convert a mom file to xhtml using sed (gasp!), xml and xml stylesheets are perfectly feasible with more sophisticated tools. The trick, of course, is writing well-formed source files, with a clear distinction between metadata, stylesheet, semantic tags, and discardable presentational markup.

* The mom macros

Mom is my baby, but that's not why I'm mentioning it here. I was really surprised by Mike's comment: "Done right, a really great macro package would have two clearly separated parts: presentation and format. But it seems *roff has never really provided the architecture to support that sort of separation, hence macro packages that mush the concepts together."

The mom macros were conceived, from the start, to do exactly what he describes: keep presentation and formatting separate. Every semantic tag has a bevy of "control macros" that permit the styling of tags separately from the tags themselves, macros that furthermore begin with the name of the tag, making their intent instantaneously clear, both to human eyes and to a parser assembling a stylesheet. A mom document begins with metadata, is followed by a stylesheet, and begins formatting, in the sense Mike means, only after the .START macro is issued. Throughout the remainder of the document, there's virtually no need for presentational markup, except discardable markup like tightening or loosening a line with track kerning (which, moreover, is done in macro space with the easily identified EW [extra white] and RW [reduce white]). In cases where it's desirable to make available "in-document" presentational markup of a tag, say to adjust the position of a table, the presentational markup is attached directly to the tag as an optional argument, which can be flagged and ignored at parse time.
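Here, schematically, is what that looks like in practice. I'm showing only a representative handful of control macros, and the values are arbitrary; the point is the shape of the file, not the particular styling.

    .\" Metadata
    .TITLE      "The Binbrook Caucus"
    .AUTHOR     "Peter Schaffter"
    .DOCTYPE    DEFAULT
    .PRINTSTYLE TYPESET
    .\" "Stylesheet": control macros, named after the tags they style
    .PARA_INDENT 2m
    .HEAD_FONT   B
    .HEAD_SIZE   +2
    .\" Formatting begins only after START
    .START
    .HEAD "A Semantic Heading"
    .PP
    Ordinary paragraph text, free of presentational markup.
    .EW .2
    A line that needed a touch of loosening (discardable at parse time).
    .EW 0
    .PP
    More paragraph text.

Everything before .START is metadata and style; everything after it is content, each element a line beginning with a period whose name identifies it. That is what makes the sed conversion mentioned above possible: map .HEAD to a heading element, .PP to a paragraph element, fold the *_FONT/*_SIZE control macros into a stylesheet, and drop the EW/RW lines altogether.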
Furthermore, mom's typesetting functions can be used independently of document formatting (again, in Mike's sense) to create documents never likely to be rendered anywhere but on paper (a poster advertising kittens free to a good home, for example). In short, groff has always had the "architecture" to support source files that are both human- and xml-friendly; it's just that it's rarely been taken advantage of.

* Summing up

Groff can, and does, have a future, one that accommodates both printer (or analogous) output and display-neutral xml. The typesetting engine/pipeline (ie ditroff=>grops/gropdf) needs to be overhauled to remove the hurdles posed by line-at-a-time processing. Historical baggage like not respecting order of precedence can, I believe, be fixed without earth-shattering consequences. Adding arrays, as has been suggested, wouldn't break anything. (Myself, I wouldn't mind case statements, as I find I'm increasingly using while loops to simulate getopts functionality.) I think we all agree some requests could do with a dose of sanity, which could be accomplished by creating new requests in the manner of .de1 or .am1 whenever altering their historical behaviour would break existing documents. And all this is doable, it seems to me, without affecting backward compatibility.

I don't think we have to consider forking; in fact, I'm against it. What we do have to do is acknowledge that groff, as a typesetting engine, has the potential to stand next to TeX, and, as such, remain a viable choice for quality typesetting. I really don't believe that printer output, or the physical page paradigm for screen documents, is anywhere near falling into desuetude. As for xml output, I'm convinced that's a source-file, macro-level issue. The mom macros point the way for xml-friendly structuring of source files; who knows what a joint-development effort in a similar vein could accomplish?

Again, sorry for the long post.

--
Peter Schaffter
http://www.schaffter.ca