(Copy and paste from GitHub issue #1650 because I figure mailing list
discussion is potentially more fruitful than GitHub issues for design
discussions.)
I realized this morning that we can probably avoid the need for a leaf
set by intertwining selector matching and flow construction, and flow
construction and intrinsic-width-bubbling, to some degree. This is
similar to what Gecko and WebKit do, but potentially somewhat cleaner
because they are still separate functions and the implementations are
strictly separate (no bouncing back and forth to handle special
incremental-reflow cases); the parallel driver just knows how to invoke
them. This works thanks to the heterogeneous nature of the `WorkQueue`:
it can accept heterogeneous tasks and can run them all in parallel.
* Once the selectors have been matched for a leaf node, we can
immediately start constructing its flows. Just call `construct_flows`
once the node has been matched.
* Trickier, but also likely possible: Once flows have been constructed
for a leaf node, immediately call `bubble_widths` on it. This works
because we always know when a flow is going to be a leaf since
e579daefc2956a2eb151588b628c51342de236d0.
* Once `assign_widths` has been called on a leaf, immediately start
assigning its heights via `assign_heights`.
Assuming this works out, all parallel traversals will start from the
root and go down, eliminating the need for a leaf set. We will probably
still want a "backdoor" that sequentially computes bubble-widths for two
reasons: (1) during incremental reflow, min/pref widths may have been
invalidated without invalidating the flow; (2) it's easier to benchmark
style recalc against Gecko and WebKit when it's not intertwined with
intrinsic width calculation.
This would have numerous benefits:
1. Leaf set construction is expensive. On Wikipedia it's 16% of selector
matching time on 4 cores. For comparison, that's difference between
getting a 2.4x speedup for selector matching and getting a 2.9x speedup
on 4 cores.
2. Eliminates one or two parallel traversals, reducing overhead. (In
particular the warmup phase will go to essentially zero.)
3. Eliminates the synchronization point between selector matching and
flow construction, allowing better multicore utilization.
4. Eliminates the necessity of ensuring that DOM nodes in the leaf set
are alive which will be a bit of a pain when we start doing incremental
reflow.
5. Better memory usage since the leaf set data structures will go away.
Patrick
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo