> In dynamic programming we maintain a set of optimum partial solutions,
> each stored as a node in an acyclic directed graph. Edges of the graph
> indicate from which partial solution an extended partial solution was
> derived.
> Finally we end up with a set of complete solutions, pick the best one
> and follow the path in the graph to recover the decisions made to
> arrive at this best solution.
> Afterwards all nodes of the graph are thrown away, i.e., their storage
> can be reused to solve the next problem - in our case, the next
> paragraph.
> Often, useless nodes can already be discarded before the final solution
> is found.
>
> All this could possibly also be implemented using recursion and/or forking,
> but that would be less efficient (in terms of memory and running time needs)
> and - in my opinion - also harder to understand.
I fully agree with the first paragraph. However, as there is essentially no limit on the amount (or kind) of path-dependent state that may be needed to calculate each partial solution, I see forking as the easiest way to implement the recipe. If there were only a few pertinent state variables, e.g. line length, vertical position, and page number, I wouldn't consider forking. But groff has to be prepared for all eventualities that can be triggered mid-paragraph by traps as well as markup. Even the repertoire of macros and requests can change underfoot and affect formatting.

The fact that the course of computation varies with state suggests making a separate thread for each active path, with state information kept as thread-local data. This entails either (1) a massive overt initialization for every new thread, or (2) a pervasive infrastructure for differential maintenance of state. Forking notionally implements option (1) in one line of code, and (at least on Unix-like systems) invisibly accomplishes it by a coarse variant of option (2): copy-on-write paging. If care were taken to allocate the most frequently touched state variables near each other, this would usually amount to one page of non-stack memory per process. Thus I have no qualms about memory usage.

Forking is still not trivial. There must be an umpire to pick the best path for each potential breakpoint--but an umpire is needed no matter how the dynamic program is implemented. To communicate with the several paths under forking, the umpire would use IPC--pipes and select(2). IPC volume would be modest, comparable to the size of the input. But process switches would be many, perhaps an order of magnitude greater than the word count. This could be a cause for concern.

Another complication of parallelizing, common to both forking and threading, is input. Every path needs its own input pointer, whereas inherited or duplicated file descriptors share a single read pointer. For a genuinely distinct read pointer, the input must be opened afresh at a different point for each path. This might be accomplished by a tee(1) cascade, but more likely a purpose-built input distributor will be needed.

I'm contemplating building a toy implementation to test the concept. (This is not a promise!)

Doug
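
P.S. For concreteness, the quoted recipe in miniature. This is a toy,
not groff code: the word widths and line width are made up, and
"badness" is just the squared trailing space on each line.

    #include <stdio.h>

    /* Toy data: word widths and a target line width (made up). */
    static const int w[] = { 3, 2, 2, 5, 1, 4, 3 };
    #define N ((int)(sizeof w / sizeof w[0]))
    static const int W = 9;

    int main(void) {
        /* best[i]: least badness for setting words 0..i-1;
         * from[i]: where that solution's last line starts --
         * the back edge in the graph. */
        long best[N + 1];
        int from[N + 1];
        best[0] = 0;
        from[0] = 0;
        for (int i = 1; i <= N; i++) {
            best[i] = -1;
            from[i] = 0;
            int len = -1;              /* width of words j..i-1 */
            for (int j = i - 1; j >= 0; j--) {
                len += w[j] + 1;       /* word plus one space */
                if (len > W)
                    break;             /* discard useless nodes early */
                if (best[j] < 0)
                    continue;          /* unreachable partial solution */
                long slack = W - len;
                long b = best[j] + slack * slack;
                if (best[i] < 0 || b < best[i]) {
                    best[i] = b;
                    from[i] = j;
                }
            }
        }
        /* Recover the decisions by walking the back edges
         * (printed last line first). */
        for (int i = N; i > 0; i = from[i])
            printf("line: words %d..%d\n", from[i], i - 1);
        return 0;
    }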
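
P.P.S. The umpire side of the forking scheme, again as a toy. Each
forked path inherits the full formatter state by copy-on-write and
reports one badness figure over its pipe; path_badness() merely stands
in for the real computation.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/wait.h>

    #define PATHS 4                   /* number of live paths (made up) */

    /* Stand-in for the real path computation. */
    static long path_badness(int path) { return 100L * (path + 1); }

    int main(void) {
        int pfd[PATHS][2];

        for (int i = 0; i < PATHS; i++) {
            if (pipe(pfd[i]) < 0) { perror("pipe"); return 1; }
            switch (fork()) {
            case -1:
                perror("fork"); return 1;
            case 0: {                 /* child: one path, state by COW */
                close(pfd[i][0]);
                long b = path_badness(i);
                write(pfd[i][1], &b, sizeof b);
                _exit(0);
            }
            default:
                close(pfd[i][1]);     /* umpire keeps only the read ends */
            }
        }

        long best = -1;
        int best_path = -1, open_paths = PATHS;
        while (open_paths > 0) {
            fd_set rd;
            int maxfd = -1;
            FD_ZERO(&rd);
            for (int i = 0; i < PATHS; i++)
                if (pfd[i][0] >= 0) {
                    FD_SET(pfd[i][0], &rd);
                    if (pfd[i][0] > maxfd)
                        maxfd = pfd[i][0];
                }
            if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0) {
                perror("select"); return 1;
            }
            for (int i = 0; i < PATHS; i++)
                if (pfd[i][0] >= 0 && FD_ISSET(pfd[i][0], &rd)) {
                    long b;
                    if (read(pfd[i][0], &b, sizeof b) == (ssize_t)sizeof b
                        && (best < 0 || b < best)) {
                        best = b;
                        best_path = i;
                    }
                    close(pfd[i][0]);
                    pfd[i][0] = -1;
                    open_paths--;
                }
        }
        while (wait(NULL) > 0)        /* reap the losers (and the winner) */
            ;
        printf("best path: %d (badness %ld)\n", best_path, best);
        return 0;
    }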
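
P.P.P.S. The read-pointer complication in miniature: after fork(2) the
parent and child share one file offset, so a path that wants its own
must reopen the input. The file name here is a stand-in.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/wait.h>

    int main(void) {
        /* "input.roff" stands in for the real input stream. */
        int shared = open("input.roff", O_RDONLY);
        if (shared < 0) { perror("open"); return 1; }

        if (fork() == 0) {            /* one path */
            char c;
            read(shared, &c, 1);      /* advances the offset both see */
            int own = open("input.roff", O_RDONLY); /* private pointer */
            read(own, &c, 1);         /* disturbs no other path */
            close(own);
            _exit(0);
        }
        wait(NULL);
        /* The parent's position moved too: this prints 1, not 0. */
        printf("offset after child's read: %ld\n",
               (long)lseek(shared, 0, SEEK_CUR));
        return 0;
    }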