Re: [Groff] Formatting algorithm

Peter Schaffter Tue, 22 Apr 2014 09:37:37 -0700

Ulrich --

On Mon, Apr 14, 2014, Ulrich Lauther wrote:
> On Mon, Apr 14, 2014 at 12:41:56PM +0100, Ralph Corderoy wrote:
> > Hi Ulrich,
> > 
> > > Even if a greedy algorithm will be implemented, it should have the
> > > whole paragraph available as input.  That way, one could easily switch
> > > over to a KP-implementation and compare the two approaches in terms of
> > > quality, running time, and code complexity.


Comparing the two is a problem.  It's all well and good to theorize
about a more flexible line-by-line approach versus para-at-once, but
at present there's no way (that I can think of) to prototype my
proposal and compare it against, say, output from Heirloom or TeX.

To keep things clear, I'm not attached to one approach or the
other.  In terms of describing how to balance lines, the KP
algorithm wins hands down, at least theoretically.  However, neither
approach is perfect; in real world usage, both inevitably fall
short under some circumstances, requiring user intervention.  Since
the goal of improving groff's formatting is to improve overall
paragraph grey, rather than to aim for automated perfection (an
impossibility), the question is really which approach can most
easily be implemented--assuming, as I say, that neither completely
circumvents the need for occasional tweaking.
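
To make the trade-off concrete, here is a minimal sketch in Python. It
is not groff code and not a full KP implementation: there is no
hyphenation and no stretchable or shrinkable space, and the cost (squared
trailing space on every line but the last) is only a simplified stand-in
for KP badness. All function names are made up for illustration.

```python
def greedy_breaks(words, width):
    """Line-by-line (first-fit): keep adding words while they fit."""
    lines, current = [], []
    for w in words:
        candidate = current + [w]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [w]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Para-at-once: dynamic programming over all feasible breakpoints,
    minimising the summed squared slack of every line except the last."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimum cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width and j > i + 1:
                break                # line too long; an overlong single word is kept
            slack = max(width - length, 0)
            cost = 0 if j == n else slack ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


def raggedness(lines, width):
    """Same cost as above, computed for any list of lines."""
    return sum(max(width - len(" ".join(l)), 0) ** 2 for l in lines[:-1])
```

With the words "aaa bb cc ddddd" and a width of 6, the greedy version
sets "aaa bb" / "cc" / "ddddd", while the whole-paragraph version sets
"aaa" / "bb cc" / "ddddd", which this particular cost function scores
lower. Note that the second function cannot start until it has every
word of the paragraph in hand.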

> > > Provided a clean interface and input/output specifications
> > > are available I would volunteer to implement the dynamic
> > > programming (KP) variant.

A generous and welcome offer.  Highly appreciated.

I hope I'm not speaking out of turn here, but the most knowledgeable
person on the list concerning these matters is Werner.

> > Perhaps it's gathering a whole paragraph together where the
> > large amount of change to groff lies, and not implementing KP
> > itself.

Yes, I believe that's the crux of the matter.

> The proposed algorithm assumes
> 
> # Assumptions
> # ===========
>   [skipped]
> # NextWord:
> #   - can be read from a buffer
> #
> 
> and later on uses
> 
> read NextWord
> 
> 
> If this function is available, it's easy to collect the paragraph:
> 
> while (! paragraph ends) {
>   read NextWord;
>   store NextWord in paragraph_buffer;
> }
> 
> But maybe just implementing "read NextWord" is the difficulty?

After adjusting and breaking a line, groff holds remaining
input-line text as a partially-filled line, so reading NextWord is
at least theoretically possible.  However, that's not the same thing
as gathering an entire paragraph, which, moreover, would have to be
delimited in some manner in the input file itself.  And *not*, as
Clarke pleads, by assuming a block of text in one line is a
paragraph!
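
As a rough illustration of the difference, the Python sketch below
layers paragraph collection on top of a word reader. The boundary rule
is purely an assumption for the example: a blank line, or a line
beginning with "." (standing in for an explicit paragraph macro), ends
the paragraph. Real groff input has no rule this simple, and the names
next_words and collect_paragraphs are hypothetical.

```python
def next_words(lines):
    """Yield words one at a time, and None at each paragraph boundary.

    Assumed boundary: a blank line or a line starting with '.'
    (a stand-in for an explicit paragraph macro in the input file).
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("."):
            yield None
        else:
            yield from stripped.split()


def collect_paragraphs(lines):
    """Buffer words until a boundary, then hand over the whole paragraph."""
    buffer = []
    for word in next_words(lines):
        if word is None:
            if buffer:
                yield buffer
                buffer = []
        else:
            buffer.append(word)
    if buffer:
        yield buffer
```

The loop itself is trivial, as Ulrich says. The hard part is everything
hidden inside the boundary test, since the formatter cannot emit a
single line until that test has fired.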

-- 
Peter Schaffter
http://www.schaffter.ca
