Hi everyone,

Eric Atkinson and I whiteboarded some ideas for ensuring data race freedom and memory safety while performing parallel constraint solving in layout. Here are some random notes from that discussion as well as previous discussions we've had:

* Parallel constraint solving is about structuring the layout passes as tree traversals and performing those tree traversals in parallel. Parallelizable tree traversals are bottom-up and top-down. In-order tree traversals are not parallelizable in general. Therefore, layout should favor bottom-up and top-down traversals wherever possible.

* Floats constrain some subtrees to be in-order traversals, and therefore those subtrees are not parallelizable during the assign-widths traversal. We didn't discuss any general strategy for making sure that memory safety/data race freedom are not violated in the presence of floats; that will come later. (I don't foresee any fundamental problems with ensuring this though.)

* As Leo showed, table layout can be done with 7 parallel passes, but it's probably simpler to just do those as a sequential in-order traversal for now.

* In Servo we have three main tree traversals: bubble-widths (bottom-up), assign-widths (top-down), and assign-heights (bottom-up).

* The traversals in Servo are designed not to be "ad hoc" traversals; rather, they are driven by a traversal function. An "ad hoc" traversal is what (as I understand it) most browser engines do for reflow: each frame/render object is responsible for laying out its children by calling virtual methods (`Reflow()` or similar). There is no external "driver" of the traversal beyond the initial call to `Reflow` on the root frame.
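To make the contrast concrete, here's a minimal sketch of the "ad hoc" pattern in Rust (all names here are hypothetical, not Servo's or any engine's actual types): each node's `reflow` recurses into its children itself, with no external driver.

```rust
// Hypothetical sketch of ad-hoc traversal: the node drives the recursion.
trait Frame {
    fn reflow(&mut self);
    fn height(&self) -> f64;
}

struct BlockFrame {
    intrinsic: f64,             // content height of this frame itself
    height: f64,                // computed by reflow()
    children: Vec<Box<dyn Frame>>,
}

impl Frame for BlockFrame {
    fn reflow(&mut self) {
        // The node itself walks its children: nothing constrains the order
        // of the traversal, or what else this method might touch.
        for child in self.children.iter_mut() {
            child.reflow();
        }
        self.height = self.intrinsic
            + self.children.iter().map(|c| c.height()).sum::<f64>();
    }
    fn height(&self) -> f64 {
        self.height
    }
}

fn main() {
    let mut root = BlockFrame {
        intrinsic: 0.0,
        height: 0.0,
        children: vec![
            Box::new(BlockFrame { intrinsic: 5.0, height: 0.0, children: vec![] }),
            Box::new(BlockFrame { intrinsic: 7.0, height: 0.0, children: vec![] }),
        ],
    };
    root.reflow();
    assert_eq!(root.height, 12.0);
    println!("root height = {}", root.height);
}
```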

* Instead of ad-hoc traversal, layout in Servo is driven by what I'm calling "kernel functions". A kernel function is a small function that looks only at the node and its children to perform its work, and never recurses to children on its own. An example of a kernel function is the "assign height" function for block flows: it computes the height of one block flow by summing up the heights of its children and then returns.
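A minimal sketch of such a kernel function, with illustrative names (this is not Servo's actual flow type): it computes one node's height from its already-computed children and returns, without recursing.

```rust
// Hypothetical, simplified flow tree for illustration.
struct BlockFlow {
    height: f64,
    children: Vec<BlockFlow>,
}

// A kernel function: it looks only at one node and its children,
// and never recurses into the tree on its own.
fn assign_height_kernel(flow: &mut BlockFlow) {
    // The children's heights are assumed to have been computed already
    // by the (bottom-up) driver before this kernel runs on the parent.
    flow.height = flow.children.iter().map(|c| c.height).sum();
}

fn main() {
    let mut parent = BlockFlow {
        height: 0.0,
        children: vec![
            BlockFlow { height: 10.0, children: vec![] },
            BlockFlow { height: 20.0, children: vec![] },
        ],
    };
    assign_height_kernel(&mut parent);
    assert_eq!(parent.height, 30.0);
    println!("parent height = {}", parent.height);
}
```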

* The problem with "ad hoc" traversals is that they give too much power to the kernel functions. A `Reflow` virtual method is permitted to do anything: it can start top-down, bottom-up, or in-order traversals; it can walk up or down the tree; it can race on anything. This is what makes ad-hoc traversals difficult to parallelize.

* By contrast, Servo kernel functions are written to operate on only one flow at a time, consulting the children of the flow as necessary. This means that we need higher-level *driver functions* that have the task of invoking the kernel functions in the right order. We have two driver functions in Servo: bottom-up traversal and top-down traversal. These functions take a flow tree and a kernel function and perform the appropriate traversal. Today, these traversals are sequential.
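The two drivers might look something like this (a sequential sketch with hypothetical names, not Servo's actual code): each takes a tree and a kernel and decides the visit order, so the kernel never recurses itself.

```rust
// Hypothetical, simplified flow tree for illustration.
struct Flow {
    height: f64,
    children: Vec<Flow>,
}

// Bottom-up driver: apply the kernel to all children before the node.
fn traverse_bottom_up<F: Fn(&mut Flow)>(flow: &mut Flow, kernel: &F) {
    for child in flow.children.iter_mut() {
        traverse_bottom_up(child, kernel);
    }
    kernel(flow);
}

// Top-down driver: apply the kernel to the node before its children.
fn traverse_top_down<F: Fn(&mut Flow)>(flow: &mut Flow, kernel: &F) {
    kernel(flow);
    for child in flow.children.iter_mut() {
        traverse_top_down(child, kernel);
    }
}

fn main() {
    let mut root = Flow {
        height: 1.0,
        children: vec![
            Flow { height: 2.0, children: vec![] },
            Flow { height: 3.0, children: vec![] },
        ],
    };
    // An assign-heights-style kernel, driven bottom-up.
    traverse_bottom_up(&mut root, &|f: &mut Flow| {
        let child_sum: f64 = f.children.iter().map(|c| c.height).sum();
        f.height += child_sum;
    });
    assert_eq!(root.height, 6.0);
    // A trivial top-down pass, just to exercise the other driver.
    traverse_top_down(&mut root, &|f: &mut Flow| {
        f.height *= 1.0;
    });
    println!("root height = {}", root.height);
}
```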

* The key insight to ensuring data race freedom and memory safety in a parallel setting (thanks to Eric!) is to observe that, assuming the driver function is correct, all we must do to ensure no data races is to *limit the kernel function to accessing only the node that it's operating on and its children*. If we forbid access to the `get_parent`, `get_prev_sibling`, and `get_next_sibling` accessors inside the kernel functions, and ensure that kernel functions close over no mutable layout-task-specific data, then kernel functions can do whatever they want and there will be no data races or memory safety violations.

* We have a mechanism in Rust's type system for conditionally forbidding access to methods in certain contexts: phantom types. We are already using this for the COW DOM (`AbstractNode<View>`).

* We also have a mechanism to forbid kernel functions from closing over mutable state: unique closures (soon to become thunk traits).

* What we can do in Servo is to parameterize flow contexts over a phantom `View` type. `FlowContext<SingleThreadedView>` permits access to parents, siblings, and children. `FlowContext<KernelView>` does not permit access to parents or siblings, but does permit access to children. (Actually, it'd be a bit more complicated than this: the child of a `FlowContext<KernelView>` would be a `FlowContext<KernelChildView>`, which would permit access to siblings but not to parents. This is because kernels need to be able to access siblings of the first child.)
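Here's a rough sketch of the phantom-type idea in today's Rust (the type and method names are mine, for illustration, and the parent/sibling details from the bullet above are elided): child access is defined for every view, while parent/sibling accessors would live only on the single-threaded impl, so a kernel-view reference simply has no such methods to call.

```rust
use std::marker::PhantomData;

// Hypothetical, simplified flow node.
struct FlowNode {
    height: f64,
    children: Vec<FlowNode>,
}

// Zero-sized marker ("phantom") types naming the two views.
struct SingleThreadedView;
struct KernelView;

// A typed handle to a flow; `View` exists only at compile time.
struct FlowRef<'a, View> {
    node: &'a FlowNode,
    _view: PhantomData<View>,
}

impl<'a, View> FlowRef<'a, View> {
    fn height(&self) -> f64 {
        self.node.height
    }

    // Child access is permitted in every view.
    fn children(&self) -> Vec<FlowRef<'a, View>> {
        self.node
            .children
            .iter()
            .map(|c| FlowRef { node: c, _view: PhantomData })
            .collect()
    }
}

impl<'a> FlowRef<'a, SingleThreadedView> {
    fn new(node: &'a FlowNode) -> Self {
        FlowRef { node, _view: PhantomData }
    }

    // Parent/sibling accessors would be defined only on this impl, so a
    // FlowRef<KernelView> statically cannot reach them. Before handing a
    // node to a kernel, the driver downgrades the view:
    fn restrict(&self) -> FlowRef<'a, KernelView> {
        FlowRef { node: self.node, _view: PhantomData }
    }
}

fn main() {
    let tree = FlowNode {
        height: 0.0,
        children: vec![
            FlowNode { height: 4.0, children: vec![] },
            FlowNode { height: 6.0, children: vec![] },
        ],
    };
    let full = FlowRef::new(&tree);
    let kernel_view = full.restrict();
    let sum: f64 = kernel_view.children().into_iter().map(|c| c.height()).sum();
    assert_eq!(sum, 10.0);
    println!("children sum = {}", sum);
}
```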

* This scheme depends on the correctness of the traversal functions, as well as the correctness of the tree manipulation code. Both are beyond what Rust's type system can prove. If we want additional assurance, we could use fuzz testing, or we could maybe use something like Promela to prove the correctness of our parallel traversal, or even explore proof assistants like Coq for the tree code...

* That said, the nice thing about this scheme is that the traversal functions and tree manipulation code are quite small compared to the untrusted kernel functions. The invariants they must preserve are generic and pretty simple. Assuming these functions are correct, the safety of the rest of layout follows, and will continue to hold as we add more and more CSS features.

* I suspect the most efficient dynamic mechanism to farm out the work will be to have a thread pool and a concurrent, lock-free, fixed-size queue of work. In the main layout thread, the traversal function writes the correct sequence of pointers into the queue, then signals the thread pool of workers to start. The workers then all dequeue work, perform the cast from `FlowContext<SingleThreadedView>` to `FlowContext<KernelView>`, and call the untrusted kernel function. When finished, the workers send a message to the main layout thread to wake it up, and go to sleep. This is probably the easiest scheme to implement first.
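A toy version of the pool-plus-queue shape, with two simplifications I should flag: it uses a locked `VecDeque` where the real scheme calls for a lock-free fixed-size queue (the standard library has no lock-free queue), and it stands plain integers in for flow pointers. The `Fn + Send + Sync` bound is the modern analogue of the "kernels close over no mutable state" restriction.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch: workers drain a shared queue, running an untrusted kernel on each
// item. `Fn + Send + Sync` forbids the kernel from mutating its captures.
fn run_pass<F>(work: Vec<usize>, workers: usize, kernel: F) -> Vec<usize>
where
    F: Fn(usize) -> usize + Send + Sync + 'static,
{
    let queue = Arc::new(Mutex::new(work.into_iter().collect::<VecDeque<_>>()));
    let results = Arc::new(Mutex::new(Vec::new()));
    let kernel = Arc::new(kernel);

    let mut handles = Vec::new();
    for _ in 0..workers {
        let (q, r, k) = (queue.clone(), results.clone(), kernel.clone());
        handles.push(thread::spawn(move || loop {
            // Take one work item; the lock is released before the kernel runs.
            let item = q.lock().unwrap().pop_front();
            match item {
                Some(n) => {
                    let v = k(n); // the "untrusted kernel function"
                    r.lock().unwrap().push(v);
                }
                None => break, // queue drained; worker goes to sleep
            }
        }));
    }
    for h in handles {
        h.join().unwrap(); // stands in for the wake-up message to layout
    }

    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort(); // completion order is nondeterministic
    out
}

fn main() {
    let out = run_pass(vec![1, 2, 3, 4], 2, |n| n * 10);
    assert_eq!(out, vec![10, 20, 30, 40]);
    println!("{:?}", out);
}
```

Note that this only works for kernels whose items are independent (one level of the tree at a time, say); sequencing between levels still belongs to the driver.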

* We don't want to use regular Rust message passing because that would incur an extra allocation per node, which would probably dwarf the cost of the actual kernel function in many cases. (There are good reasons for this extra allocation in general to prevent deadlocks, but this is where it falls down.)

* If we want a static mechanism (tree tiling), then we could instead have the traversal function divide up the tree and farm out work to little sequential passes. This could be one way to handle floats, although I suspect there's some relatively simple way to do it in the dynamic setting as well.
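A crude illustration of the static idea, under the simplifying assumption that we tile at the root: each of the root's subtrees becomes one sequential job on its own thread (real tiling would balance subtree sizes, and the names here are hypothetical).

```rust
use std::thread;

// Hypothetical, simplified flow tree.
struct Flow {
    height: f64,
    children: Vec<Flow>,
}

// An ordinary sequential bottom-up pass, run within one tile.
fn assign_heights_seq(flow: &mut Flow) {
    for c in flow.children.iter_mut() {
        assign_heights_seq(c);
    }
    flow.height += flow.children.iter().map(|c| c.height).sum::<f64>();
}

// Static tiling: carve off the root's subtrees, process each on its own
// thread as a little sequential pass, then reattach and finish the root.
fn assign_heights_tiled(root: &mut Flow) {
    let subtrees = std::mem::take(&mut root.children);
    let handles: Vec<_> = subtrees
        .into_iter()
        .map(|mut t| {
            thread::spawn(move || {
                assign_heights_seq(&mut t);
                t
            })
        })
        .collect();
    root.children = handles.into_iter().map(|h| h.join().unwrap()).collect();
    root.height += root.children.iter().map(|c| c.height).sum::<f64>();
}

fn main() {
    let mut root = Flow {
        height: 0.0,
        children: vec![
            Flow { height: 1.0, children: vec![Flow { height: 2.0, children: vec![] }] },
            Flow { height: 4.0, children: vec![] },
        ],
    };
    assign_heights_tiled(&mut root);
    assert_eq!(root.height, 7.0);
    println!("root height = {}", root.height);
}
```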

* Since Servo is already deliberately set up to use traversal drivers and kernel functions, we should be able to keep most of our layout code exactly as it is (modulo floats). We just need to (1) add the phantom type; (2) write the concurrent queue and scheduling code; (3) modify the traversals to operate in parallel.

Sorry for the long e-mail :) Any thoughts on this scheme would be much appreciated.

Patrick
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo