Hi everyone,
Eric Atkinson and I whiteboarded some ideas for ensuring data race
freedom and memory safety while performing parallel constraint solving
in layout. Here are some random notes from that discussion as well as
previous discussions we've had:
* Parallel constraint solving is about structuring the layout passes as
tree traversals and performing those tree traversals in parallel.
Parallelizable tree traversals are bottom-up and top-down. In-order tree
traversals are not parallelizable in general. Therefore, layout should
favor bottom-up and top-down traversals wherever possible.
* Floats constrain some subtrees to be in-order traversals, and
therefore those subtrees are not parallelizable during the assign-widths
traversal. We didn't discuss any general strategy for making sure that
memory safety/data race freedom are not violated in the presence of
floats; that will come later. (I don't foresee any fundamental problems
with ensuring this though.)
* As Leo showed, table layout can be done with 7 parallel passes, but
it's probably simpler to just do it as a sequential in-order traversal
for now.
* In Servo we have three main tree traversals: bubble-widths
(bottom-up), assign-widths (top-down), and assign-heights (bottom-up).
* The traversals in Servo are designed not to be "ad hoc" traversals;
rather they are driven by a traversal function. An "ad hoc traversal" is
what (as I understand it) most browser engines do for reflow: each
frame/render object is responsible for laying out its children by
calling virtual methods (`Reflow()` or similar). There is no external
"driver" for the traversal beyond the initial call to `Reflow` on the
root frame.
* Instead of ad-hoc traversal, layout in Servo is driven by what I'm
calling "kernel functions". A kernel function is a small function that
looks only at the node and its children to perform its work, and never
recurses to children on its own. An example of a kernel function is the
"assign height" function for block flows: it computes the height of one
block flow by summing up the heights of its children and then returns.
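To make the kernel-function idea concrete, here is a rough sketch of what
such an assign-height kernel could look like, using a made-up, heavily
simplified `BlockFlow` type (the real flow structures are more involved,
and heights would be app units rather than floats):

```rust
// Hypothetical, stripped-down flow type: a block with a computed
// height and an owned list of child flows.
struct BlockFlow {
    height: f64,
    children: Vec<BlockFlow>,
}

// An "assign height" kernel: it looks only at one flow and its
// immediate children and never recurses further down the tree. The
// driver guarantees (by running bottom-up) that the children's
// heights are already computed when this runs.
fn assign_height_kernel(flow: &mut BlockFlow) {
    flow.height = flow.children.iter().map(|child| child.height).sum();
}
```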
* The problem with "ad hoc" traversals is that they give too much power
to the kernel functions. A `Reflow` virtual method is permitted to do
anything: it can start top-down, bottom-up, or in-order traversals; it
can walk up or down the tree; it can race on anything. This is what
makes ad-hoc traversals difficult to parallelize.
* By contrast, Servo kernel functions are written to operate on only one
flow at a time, consulting the children of the flow as necessary. This
means that we need higher-level *driver functions* that have the task of
invoking the kernel functions in the right order. We have two driver
functions in Servo: bottom-up traversal and top-down traversal. These
functions take a flow tree and a kernel function and perform the
appropriate traversal. Today, these traversals are sequential.
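For illustration, minimal sequential drivers over the made-up `BlockFlow`
type from the previous sketch might look like this (the real drivers
differ in the details, but the shape is the same):

```rust
// Sequential bottom-up driver: visit every child subtree first, then
// apply the kernel to the current flow. The kernel never recurses;
// all ordering decisions live here.
fn traverse_bottom_up<F>(flow: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow),
{
    for child in flow.children.iter_mut() {
        traverse_bottom_up(child, kernel);
    }
    kernel(flow);
}

// Top-down driver: the mirror image -- kernel first, children second.
fn traverse_top_down<F>(flow: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow),
{
    kernel(flow);
    for child in flow.children.iter_mut() {
        traverse_top_down(child, kernel);
    }
}
```

With drivers like these, the three passes above become three driver
calls: bubble-widths and assign-heights go through the bottom-up driver,
and assign-widths goes through the top-down one.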
* The key insight for ensuring data race freedom and memory safety in a
parallel setting (thanks to Eric!) is that, assuming the driver function
is correct, all we must do to rule out data races is to *limit the kernel
function to accessing only the node that it's operating on and its
children*. If we forbid access to the `get_parent`,
`get_prev_sibling`, and `get_next_sibling` accessors inside the kernel
functions, and ensure that kernel functions close over no mutable
layout-task-specific data, then kernel functions can do whatever they
want and there will be no data races or memory safety violations.
* We have a mechanism in Rust's type system for conditionally forbidding
access to methods in certain contexts: phantom types. We are already
using this for the COW DOM (`AbstractNode<View>`).
* We also have a mechanism to forbid kernel functions from closing over
mutable state: unique closures (soon to become thunk traits).
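In trait-bound terms (rather than the closure types mentioned above), the
restriction might look roughly like this hypothetical driver signature,
reusing the `BlockFlow` sketch from earlier:

```rust
// Hypothetical signature for a parallel driver. The bounds do the work
// that unique closures would do:
//   * `Fn`   -- the kernel is invoked through a shared reference, so it
//               cannot mutate plain captured state;
//   * `Sync` -- it may be shared by all worker threads at once, which
//               rules out captures like `Rc<RefCell<...>>` or `Cell<...>`;
//   * `Send` -- it may be handed to another thread in the first place.
fn parallel_bottom_up<F>(root: &mut BlockFlow, kernel: F)
where
    F: Fn(&mut BlockFlow) + Send + Sync,
{
    // Placeholder body: fall back to the sequential driver from the
    // earlier sketch until the parallel scheduling exists.
    traverse_bottom_up(root, &kernel);
}
```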
* What we can do in Servo is to parameterize flow contexts over a
phantom `View` type. `FlowContext<SingleThreadedView>` permits access to
parents, siblings, and children. `FlowContext<KernelView>` does not
permit access to parents or siblings, but does permit access to
children. (Actually, it'd be a bit more complicated than this: the child
of a `FlowContext<KernelView>` would be a
`FlowContext<KernelChildView>`, which would permit access to siblings
but not to parents. This is because kernels need to be able to access
siblings of the first child.)
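Here is a hedged sketch of how that parameterization could look, with
made-up names and a deliberately stripped-down flow type; it is meant to
show the mechanism, not the API we would actually ship:

```rust
use std::marker::PhantomData;

// Phantom "view" types: they never appear in the data, only in types.
struct SingleThreadedView;
struct KernelView;
struct KernelChildView;

// The underlying flow node (hypothetical and heavily simplified).
struct FlowData {
    height: f64,
    children: Vec<FlowData>,
    // parent and sibling links elided from this sketch
}

// A handle to a flow, tagged with a view that selects which accessors
// may be called. The tag costs nothing at runtime.
struct FlowContext<'a, V> {
    data: &'a mut FlowData,
    view: PhantomData<V>,
}

// Accessors that are safe from any view.
impl<'a, V> FlowContext<'a, V> {
    fn height(&self) -> f64 {
        self.data.height
    }
    fn set_height(&mut self, height: f64) {
        self.data.height = height;
    }
}

// Only the single-threaded view may walk upward or sideways.
impl<'a> FlowContext<'a, SingleThreadedView> {
    fn get_parent(&self) { /* elided in this sketch */ }
    fn get_next_sibling(&self) { /* elided in this sketch */ }

    // The "cast" the trusted driver performs before handing a flow to
    // a worker: same data, strictly fewer capabilities.
    fn into_kernel_view(self) -> FlowContext<'a, KernelView> {
        FlowContext { data: self.data, view: PhantomData }
    }
}

// The kernel view has no parent or sibling accessors at all; it can
// only descend, and each child comes back under the child view.
impl<'a> FlowContext<'a, KernelView> {
    fn children<'b>(
        &'b mut self,
    ) -> impl Iterator<Item = FlowContext<'b, KernelChildView>> {
        self.data.children.iter_mut().map(|child| FlowContext {
            data: child,
            view: PhantomData::<KernelChildView>,
        })
    }
}

// The child view restores sibling access (so a kernel can walk a child
// list) but still has no way to reach a parent.
impl<'a> FlowContext<'a, KernelChildView> {
    fn get_next_sibling(&self) { /* elided in this sketch */ }
}
```

The nice property is that the view parameter is erased at compile time,
so "casting" between views is free at runtime.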
* This scheme depends on the correctness of the traversal functions, as
well as the correctness of the tree manipulation code. Both are outside
the capability of Rust's type system to prove. If we want additional
assurance, we could use fuzz testing, or perhaps model-check our parallel
traversal with something like Promela/SPIN, or even explore proof
assistants like Coq for the tree code...
* That said, the nice thing about this scheme is that the traversal
functions and tree manipulation code are quite small compared to the
untrusted kernel functions. The invariants they must preserve are
generic and pretty simple. Assuming these functions are correct, the
safety of the rest of layout follows, and will continue to hold as we
add more and more CSS features.
* I suspect the most efficient dynamic mechanism to farm out the work
will be to have a thread pool and a concurrent, lock-free, fixed-size
queue of work. In the main layout thread, the traversal function writes
the correct sequence of pointers into a queue, then signals the thread
pool of workers to start. The workers then all dequeue work, perform the
cast from `FlowContext<SingleThreadedView>` to
`FlowContext<KernelView>`, and call the untrusted kernel function. When
finished, the workers send a message to the main layout thread to wake it
up and then go back to sleep. This is probably the easiest scheme to
implement first.
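To show the intended split between the trusted driver and the untrusted
kernels, here is a deliberately simplified, self-contained sketch. It
substitutes indices into a flat arena for the raw pointers, atomics for
the view cast, and scoped threads spawned per depth level for the
persistent pool, lock-free queue, and wakeup messages, purely so the
sketch stays in safe, std-only Rust:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

// Flat arena of flow nodes; children are referred to by index. Heights
// are integer app units kept in atomics so workers can publish results
// without locks.
struct FlowNode {
    height: AtomicI64,
    children: Vec<usize>,
    depth: usize,
}

// Untrusted kernel: reads only this node and its children.
fn assign_height_kernel(arena: &[FlowNode], node: usize) {
    if arena[node].children.is_empty() {
        return; // leaves keep their intrinsic height
    }
    let total: i64 = arena[node]
        .children
        .iter()
        .map(|&child| arena[child].height.load(Ordering::Acquire))
        .sum();
    arena[node].height.store(total, Ordering::Release);
}

// Trusted driver: process the tree one depth level at a time, deepest
// first, so every node's children are finished before the node itself
// is handed out.
fn parallel_assign_heights(arena: &[FlowNode], num_workers: usize) {
    let max_depth = arena.iter().map(|node| node.depth).max().unwrap_or(0);
    let workers = num_workers.max(1);
    for depth in (0..=max_depth).rev() {
        // The "queue" for this wave: every node at the current depth.
        let wave: Vec<usize> = (0..arena.len())
            .filter(|&i| arena[i].depth == depth)
            .collect();
        let chunk_size = ((wave.len() + workers - 1) / workers).max(1);
        thread::scope(|s| {
            for chunk in wave.chunks(chunk_size) {
                s.spawn(move || {
                    for &node in chunk {
                        assign_height_kernel(arena, node);
                    }
                });
            }
        });
    }
}
```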
* We don't want to use regular Rust message passing because that would
incur an extra allocation per node, which would probably dwarf the cost
of the actual kernel function in many cases. (There are good reasons for
this extra allocation in general, namely deadlock prevention, but this is
a case where it falls down.)
* If we want a static mechanism (tree tiling), then we could instead
have the traversal function divide up the tree and farm out work to
little sequential passes. This could be one way to handle floats,
although I suspect there's some relatively simple way to do it in the
dynamic setting as well.
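A tiling sketch over the earlier `BlockFlow`/driver types, splitting at
the root's children (a real implementation would pick the subtrees more
carefully, e.g. to balance work):

```rust
use std::thread;

// Static "tree tiling": hand each top-level subtree to its own worker,
// run the ordinary sequential driver inside it, then finish the root
// on the calling thread once every subtree is done.
fn tiled_bottom_up<F>(root: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow) + Send + Sync,
{
    thread::scope(|s| {
        for child in root.children.iter_mut() {
            s.spawn(move || traverse_bottom_up(child, kernel));
        }
    });
    kernel(root);
}
```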
* Since Servo is already deliberately set up to use traversal drivers
and kernel functions, we should be able to keep most of our layout code
exactly as it is (modulo floats). We just need to (1) add the phantom
type; (2) write the concurrent queue and scheduling code; (3) modify the
traversals to operate in parallel.
Sorry for the long e-mail :) Any thoughts on this scheme would be
much appreciated.
Patrick