Ulrich --

On Mon, Apr 14, 2014, Ulrich Lauther wrote:
> On Mon, Apr 14, 2014 at 12:41:56PM +0100, Ralph Corderoy wrote:
> > Hi Ulrich,
> >
> > > Even if a greedy algorithm will be implemented, it should have the
> > > whole paragraph available as input.  That way, one could easily switch
> > > over to a KP-implementation and compare the two approaches in terms of
> > > quality, running time, and code complexity.

Comparing the two is a problem.  It's all well and good to theorize
about a more flexible line-by-line approach versus para-at-once, but at
present there's no way (that I can think of) to prototype my proposal
and compare it against, say, output from Heirloom or TeX.

To keep things clear, I'm not attached to one approach or the other.
In terms of describing how to balance lines, the KP algorithm wins
hands down, at least theoretically.  However, neither approach is
perfect; in real-world usage, both inevitably fall short under some
circumstances, requiring user intervention.  Since the goal of
improving groff's formatting is to improve overall paragraph grey,
rather than to aim for automated perfection (an impossibility), the
question is really which approach can most easily be
implemented--assuming, as I say, that neither completely circumvents
the need for occasional tweaking.

> > > Provided a clean interface and input/output specifications
> > > are available I would volunteer to implement the dynamic
> > > programming (KP) variant.

A generous and welcome offer.  Highly appreciated.  I hope I'm not
speaking out of turn here, but the most knowledgeable person on the
list concerning these matters is Werner.

> > Perhaps it's gathering a whole paragraph together where the
> > large amount of change to groff lies, and not implementing KP
> > itself.

Yes, I believe that's the crux of the matter.

> The proposed algorithm assumes
>
>   # Assumptions
>   # ===========
>   [skipped]
>   # NextWord:
>   #   - can be read from a buffer
>   #
>
> and later on uses
>
>   read NextWord
>
>
> If this function is available, it's easy to collect the paragraph:
>
>   while (! paragraph ends) {
>     read NextWord;
>     store NextWord in paragraph_buffer;
>   }
>
> But maybe just implementing "read NextWord" is the difficulty?

After adjusting and breaking a line, groff holds remaining input-line
text as a partially filled line, so reading NextWord is at least
theoretically possible.
However, that's not the same thing as gathering an entire paragraph,
which, moreover, would have to be delimited in some manner in the input
file itself.  And *not*, as Clarke pleads, by assuming a block of text
in one line is a paragraph!

--
Peter Schaffter
http://www.schaffter.ca
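
For illustration only, here is a minimal Python sketch of the
collection loop quoted above, followed by a plain greedy (first-fit)
fill over the gathered paragraph.  None of this is groff code.  The
names PARA_END, collect_paragraphs, and greedy_fill are hypothetical,
and the sketch assumes that paragraph ends arrive as an explicit
delimiter token in the word stream, which is exactly the piece that
would have to be defined for real input.

```python
from typing import Iterable, Iterator, List

# Hypothetical sentinel standing in for an explicit paragraph delimiter
# in the input; how groff input would actually mark this is left open.
PARA_END = object()


def collect_paragraphs(tokens: Iterable[object]) -> Iterator[List[str]]:
    """Gather words into whole paragraphs, splitting only at PARA_END."""
    buffer: List[str] = []
    for token in tokens:  # each iteration plays the role of "read NextWord"
        if token is PARA_END:
            if buffer:  # ignore empty paragraphs
                yield buffer
            buffer = []
        else:
            buffer.append(str(token))
    if buffer:  # flush a final paragraph with no trailing delimiter
        yield buffer


def greedy_fill(words: List[str], width: int) -> List[str]:
    """First-fit line breaking over a complete paragraph.

    A word longer than `width` is placed on a line of its own.
    """
    lines: List[str] = []
    current = ""
    for word in words:
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    stream = ["Even", "a", "greedy", "algorithm", "can", "take", "the",
              "whole", "paragraph", "as", "input.", PARA_END]
    for paragraph in collect_paragraphs(stream):
        for line in greedy_fill(paragraph, 24):
            print(line)
```

Because greedy_fill takes the complete word list, a
dynamic-programming breaker with the same signature could be swapped in
and compared against it on identical input.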