Hi everyone,
Eric Atkinson and I whiteboarded some ideas for ensuring data race
freedom and memory safety while performing parallel constraint solving
in layout. Here are some random notes from that discussion as well as
previous discussions we've had:
* Parallel constraint solving is about structuring the layout passes as
tree traversals and performing those tree traversals in parallel.
Parallelizable tree traversals are bottom-up and top-down. In-order tree
traversals are not parallelizable in general. Therefore, layout should
favor bottom-up and top-down traversals wherever possible.
* Floats constrain some subtrees to be in-order traversals, and
therefore those subtrees are not parallelizable during the assign-widths
traversal. We didn't discuss any general strategy for making sure that
memory safety/data race freedom are not violated in the presence of
floats; that will come later. (I don't foresee any fundamental problems
with ensuring this though.)
* As Leo showed, table layout can be done with 7 parallel passes, but
it's probably simpler to just do it as a sequential in-order traversal
for now.
* In Servo we have three main tree traversals: bubble-widths
(bottom-up), assign-widths (top-down), and assign-heights (bottom-up).
* The traversals in Servo are designed not to be "ad hoc" traversals;
rather they are driven by a traversal function. An "ad hoc traversal" is
what (as I understand it) most browser engines do for reflow: each
frame/render object is responsible for laying out its children by
calling virtual methods (`Reflow()` or similar). There is no external
"driver" for the traversal beyond the initial call to `Reflow` on the
root frame.
* Instead of ad-hoc traversal, layout in Servo is driven by what I'm
calling "kernel functions". A kernel function is a small function that
looks only at the node and its children to perform its work, and never
recurses to children on its own. An example of a kernel function is the
"assign height" function for block flows: it computes the height of one
block flow by summing up the heights of its children and then returns.
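To make the kernel-function idea concrete, here is a rough sketch of what
such an assign-height kernel could look like, using a made-up, heavily
simplified `BlockFlow` type (the real flow structures are more involved,
and heights would be app units rather than floats):

```rust
// Hypothetical, stripped-down flow type: a block with a computed
// height and an owned list of child flows.
struct BlockFlow {
    height: f64,
    children: Vec<BlockFlow>,
}

// An "assign height" kernel: it looks only at one flow and its
// immediate children and never recurses further down the tree. The
// driver guarantees (by running bottom-up) that the children's
// heights are already computed when this runs.
fn assign_height_kernel(flow: &mut BlockFlow) {
    flow.height = flow.children.iter().map(|child| child.height).sum();
}
```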
* The problem with "ad hoc" traversals is that they give too much power
to the kernel functions. A `Reflow` virtual method is permitted to do
anything: it can start top-down, bottom-up, or in-order traversals; it
can walk up or down the tree; it can race on anything. This is what
makes ad-hoc traversals difficult to parallelize.
* By contrast, Servo kernel functions are written to operate on only one
flow at a time, consulting the children of the flow as necessary. This
means that we need higher-level *driver functions* that have the task of
invoking the kernel functions in the right order. We have two driver
functions in Servo: bottom-up traversal and top-down traversal. These
functions take a flow tree and a kernel function and perform the
appropriate traversal. Today, these traversals are sequential.
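For illustration, minimal sequential drivers over the made-up `BlockFlow`
type from the previous sketch might look like this (the real drivers
differ in the details, but the shape is the same):

```rust
// Sequential bottom-up driver: visit every child subtree first, then
// apply the kernel to the current flow. The kernel never recurses;
// all ordering decisions live here.
fn traverse_bottom_up<F>(flow: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow),
{
    for child in flow.children.iter_mut() {
        traverse_bottom_up(child, kernel);
    }
    kernel(flow);
}

// Top-down driver: the mirror image -- kernel first, children second.
fn traverse_top_down<F>(flow: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow),
{
    kernel(flow);
    for child in flow.children.iter_mut() {
        traverse_top_down(child, kernel);
    }
}
```

With drivers like these, the three passes above become three driver
calls: bubble-widths and assign-heights go through the bottom-up driver,
and assign-widths goes through the top-down one.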
* The key insight for ensuring data race freedom and memory safety in a
parallel setting (thanks to Eric!) is that, assuming the driver function
is correct, all we must do to rule out data races is to *limit the kernel
function to accessing only the node that it's operating on and its
children*. If we forbid access to the `get_parent`,
`get_prev_sibling`, and `get_next_sibling` accessors inside the kernel
functions, and ensure that kernel functions close over no mutable
layout-task-specific data, then kernel functions can do whatever they
want and there will be no data races or memory safety violations.
* We have a mechanism in Rust's type system for conditionally forbidding
access to methods in certain contexts: phantom types. We are already
using this for the COW DOM (`AbstractNode<View>`).
* We also have a mechanism to forbid kernel functions from closing over
mutable state: unique closures (soon to become thunk traits).
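In trait-bound terms (rather than the closure types mentioned above), the
restriction might look roughly like this hypothetical driver signature,
reusing the `BlockFlow` sketch from earlier:

```rust
// Hypothetical signature for a parallel driver. The bounds do the work
// that unique closures would do:
//   * `Fn`   -- the kernel is invoked through a shared reference, so it
//               cannot mutate plain captured state;
//   * `Sync` -- it may be shared by all worker threads at once, which
//               rules out captures like `Rc<RefCell<...>>` or `Cell<...>`;
//   * `Send` -- it may be handed to another thread in the first place.
fn parallel_bottom_up<F>(root: &mut BlockFlow, kernel: F)
where
    F: Fn(&mut BlockFlow) + Send + Sync,
{
    // Placeholder body: fall back to the sequential driver from the
    // earlier sketch until the parallel scheduling exists.
    traverse_bottom_up(root, &kernel);
}
```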
* What we can do in Servo is to parameterize flow contexts over a
phantom `View` type. `FlowContext<SingleThreadedView>` permits access to
parents, siblings, and children. `FlowContext<KernelView>` does not
permit access to parents or siblings, but does permit access to
children. (Actually, it'd be a bit more complicated than this: the child
of a `FlowContext<KernelView>` would be a
`FlowContext<KernelChildView>`, which would permit access to siblings
but not to parents. This is because kernels need to be able to access
siblings of the first child.)
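Here is a hedged sketch of how that parameterization could look, with
made-up names and a deliberately stripped-down flow type; it is meant to
show the mechanism, not the API we would actually ship:

```rust
use std::marker::PhantomData;

// Phantom "view" types: they never appear in the data, only in types.
struct SingleThreadedView;
struct KernelView;
struct KernelChildView;

// The underlying flow node (hypothetical and heavily simplified).
struct FlowData {
    height: f64,
    children: Vec<FlowData>,
    // parent and sibling links elided from this sketch
}

// A handle to a flow, tagged with a view that selects which accessors
// may be called. The tag costs nothing at runtime.
struct FlowContext<'a, V> {
    data: &'a mut FlowData,
    view: PhantomData<V>,
}

// Accessors that are safe from any view.
impl<'a, V> FlowContext<'a, V> {
    fn height(&self) -> f64 {
        self.data.height
    }
    fn set_height(&mut self, height: f64) {
        self.data.height = height;
    }
}

// Only the single-threaded view may walk upward or sideways.
impl<'a> FlowContext<'a, SingleThreadedView> {
    fn get_parent(&self) { /* elided in this sketch */ }
    fn get_next_sibling(&self) { /* elided in this sketch */ }

    // The "cast" the trusted driver performs before handing a flow to
    // a worker: same data, strictly fewer capabilities.
    fn into_kernel_view(self) -> FlowContext<'a, KernelView> {
        FlowContext { data: self.data, view: PhantomData }
    }
}

// The kernel view has no parent or sibling accessors at all; it can
// only descend, and each child comes back under the child view.
impl<'a> FlowContext<'a, KernelView> {
    fn children<'b>(
        &'b mut self,
    ) -> impl Iterator<Item = FlowContext<'b, KernelChildView>> {
        self.data.children.iter_mut().map(|child| FlowContext {
            data: child,
            view: PhantomData::<KernelChildView>,
        })
    }
}

// The child view restores sibling access (so a kernel can walk a child
// list) but still has no way to reach a parent.
impl<'a> FlowContext<'a, KernelChildView> {
    fn get_next_sibling(&self) { /* elided in this sketch */ }
}
```

The nice property is that the view parameter is erased at compile time,
so "casting" between views is free at runtime.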
* This scheme depends on the correctness of the traversal functions, as
well as the correctness of the tree manipulation code. Both are outside
the capability of Rust's type system to prove. If we want additional
assurance, we could use fuzz testing, or perhaps model-check our parallel
traversal with something like Promela/SPIN, or even explore proof
assistants like Coq for the tree code...
* That said, the nice thing about this scheme is that the traversal
functions and tree manipulation code are quite small compared to the
untrusted kernel functions. The invariants they must preserve are
generic and pretty simple. Assuming these functions are correct, the
safety of the rest of layout follows, and will continue to hold as we
add more and more CSS features.
* I suspect the most efficient dynamic mechanism to farm out the work
will be to have a thread pool and a concurrent, lock-free, fixed-size
queue of work. In the main layout thread, the traversal function writes
the correct sequence of pointers into a queue, then signals the thread
pool of workers to start. The workers then all dequeue work, perform the
cast from `FlowContext<SingleThreadedView>` to
`FlowContext<KernelView>`, and call the untrusted kernel function. When
finished, the workers send a message to the main layout thread to wake it
up and then go back to sleep. This is probably the easiest scheme to
implement first.
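To show the intended split between the trusted driver and the untrusted
kernels, here is a deliberately simplified, self-contained sketch. It
substitutes indices into a flat arena for the raw pointers, atomics for
the view cast, and scoped threads spawned per depth level for the
persistent pool, lock-free queue, and wakeup messages, purely so the
sketch stays in safe, std-only Rust:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

// Flat arena of flow nodes; children are referred to by index. Heights
// are integer app units kept in atomics so workers can publish results
// without locks.
struct FlowNode {
    height: AtomicI64,
    children: Vec<usize>,
    depth: usize,
}

// Untrusted kernel: reads only this node and its children.
fn assign_height_kernel(arena: &[FlowNode], node: usize) {
    if arena[node].children.is_empty() {
        return; // leaves keep their intrinsic height
    }
    let total: i64 = arena[node]
        .children
        .iter()
        .map(|&child| arena[child].height.load(Ordering::Acquire))
        .sum();
    arena[node].height.store(total, Ordering::Release);
}

// Trusted driver: process the tree one depth level at a time, deepest
// first, so every node's children are finished before the node itself
// is handed out.
fn parallel_assign_heights(arena: &[FlowNode], num_workers: usize) {
    let max_depth = arena.iter().map(|node| node.depth).max().unwrap_or(0);
    let workers = num_workers.max(1);
    for depth in (0..=max_depth).rev() {
        // The "queue" for this wave: every node at the current depth.
        let wave: Vec<usize> = (0..arena.len())
            .filter(|&i| arena[i].depth == depth)
            .collect();
        let chunk_size = ((wave.len() + workers - 1) / workers).max(1);
        thread::scope(|s| {
            for chunk in wave.chunks(chunk_size) {
                s.spawn(move || {
                    for &node in chunk {
                        assign_height_kernel(arena, node);
                    }
                });
            }
        });
    }
}
```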
* We don't want to use regular Rust message passing because that would
incur an extra allocation per node, which would probably dwarf the cost
of the actual kernel function in many cases. (There are good reasons for
this extra allocation in general, namely deadlock prevention, but this is
a case where it falls down.)
* If we want a static mechanism (tree tiling), then we could instead
have the traversal function divide up the tree and farm out work to
little sequential passes. This could be one way to handle floats,
although I suspect there's some relatively simple way to do it in the
dynamic setting as well.
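A tiling sketch over the earlier `BlockFlow`/driver types, splitting at
the root's children (a real implementation would pick the subtrees more
carefully, e.g. to balance work):

```rust
use std::thread;

// Static "tree tiling": hand each top-level subtree to its own worker,
// run the ordinary sequential driver inside it, then finish the root
// on the calling thread once every subtree is done.
fn tiled_bottom_up<F>(root: &mut BlockFlow, kernel: &F)
where
    F: Fn(&mut BlockFlow) + Send + Sync,
{
    thread::scope(|s| {
        for child in root.children.iter_mut() {
            s.spawn(move || traverse_bottom_up(child, kernel));
        }
    });
    kernel(root);
}
```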
* Since Servo is already deliberately set up to use traversal drivers
and kernel functions, we should be able to keep most of our layout code
exactly as it is (modulo floats). We just need to (1) add the phantom
type; (2) write the concurrent queue and scheduling code; (3) modify the
traversals to operate in parallel.
Sorry for the long e-mail :) Any thoughts on this scheme would be
much appreciated.
Patrick