Take the comments below with a grain of salt - coming from a games
background, I am still learning about the complexities of browsers and
thus it's likely I'm over-simplifying things :)
When we first looked at the challenge of taking a single threaded game
engine and making it multi-threaded, the task seemed very difficult due
to the dependencies between systems (in a similar way to script / layout
task dependencies).
We eventually ended up with an architecture that required almost zero
locking, by treating the frame as a (simplified) CPU pipeline.
Data was allowed to flow in only one direction along the pipeline (i.e.
script -> layout -> render -> composite). Each stage was a separate
thread executing concurrently, working on the inputs from the previous
stage.
Results from the end of the pipeline could be fed back into the start of
the next frame. This obviously introduces a frame of latency into some
results (but a 'frame' needn't actually be a real display frame; it
could just be a single execution of the pipeline, e.g. to evaluate
results for script to use in DOM APIs, and certain steps can be
short-circuited or skipped where it makes sense).
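To make that concrete, here is a minimal Rust sketch of the shape
(stage names and payload types are invented for illustration, and real
stage payloads would obviously be richer than strings):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // One channel per pipeline boundary; data only flows forward.
        let (script_tx, layout_rx) = mpsc::channel::<String>();
        let (layout_tx, render_rx) = mpsc::channel::<String>();
        let (render_tx, composite_rx) = mpsc::channel::<String>();
        // Results from the end of the pipeline feed the next frame.
        let (feedback_tx, feedback_rx) = mpsc::channel::<String>();

        let layout = thread::spawn(move || {
            for dom in layout_rx {
                layout_tx.send(format!("{dom} -> flow tree")).unwrap();
            }
        });
        let render = thread::spawn(move || {
            for flow in render_rx {
                render_tx.send(format!("{flow} -> display list")).unwrap();
            }
        });
        let composite = thread::spawn(move || {
            for list in composite_rx {
                feedback_tx.send(format!("{list} -> frame")).unwrap();
            }
        });

        // "Script" drives three frames; each frame consumes the previous
        // pipeline run's feedback, i.e. one frame of latency.
        for frame in 0..3 {
            script_tx.send(format!("DOM@{frame}")).unwrap();
            println!("frame {frame}: {}", feedback_rx.recv().unwrap());
        }
        drop(script_tx); // closing the head channel drains the pipeline
        layout.join().unwrap();
        render.join().unwrap();
        composite.join().unwrap();
    }

Because each boundary is a channel rather than shared state, no stage
ever takes a lock on another stage's data.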
Data that was shared was copied as required. In the game engine case,
this was from the main "game logic" thread to the front-end render
thread(s), and then to the back end render threads. In theory this
should have caused some performance issues - however, in practice this
slightly sped up the engine, even when running in single threaded mode.
This is because the copy step involved mutating the data structures into
a format more suitable for the next pipeline stage (compressing,
optimizing, flattening a tree into a cache-friendly array, etc.). The effect
of this was very noticeable, as our target platforms were typically
constrained by memory bandwidth far more than available CPU cycles.
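As a toy illustration of that copy-as-conversion idea (the types are
made up, but the shape is the same):

    // A pointer-chasing tree, convenient for the producing stage.
    struct Node {
        value: u32,
        children: Vec<Node>,
    }

    // Depth-first flatten: the consuming stage then iterates a plain
    // Vec instead of walking the tree, trading one copy for locality.
    fn flatten(node: &Node, out: &mut Vec<u32>) {
        out.push(node.value);
        for child in &node.children {
            flatten(child, out);
        }
    }

    fn main() {
        let tree = Node {
            value: 1,
            children: vec![
                Node { value: 2, children: vec![] },
                Node {
                    value: 3,
                    children: vec![Node { value: 4, children: vec![] }],
                },
            ],
        };
        let mut flat = Vec::new();
        flatten(&tree, &mut flat);
        assert_eq!(flat, vec![1, 2, 3, 4]); // linear scan order
    }

The copy also gives each stage exclusive ownership of its input, which
is part of what lets the stages run without locks.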
In addition to the main pipeline there was a worker thread pool, where
small isolated jobs could be sent from the main pipeline stages. This is
where we got data parallelism (for example, the main render thread may
send off dozens of isolated tasks to transform the skeletal animation
matrices, or do visibility tests on independent parts of the scene).
This worked well with OS-level threads, as the total thread count was
quite low (typically 3-5 task threads and a worker pool of around 6
threads) but still extracted a large amount of parallelism.
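A rough sketch of that fan-out, assuming the batch splits into disjoint
chunks (std::thread::scope stands in here for the persistent worker
pool a real engine would keep alive between frames):

    use std::thread;

    fn main() {
        // Stand-in for per-bone animation data; a real job would
        // transform 4x4 matrices rather than scale floats.
        let mut matrices: Vec<f32> = (0..16).map(|i| i as f32).collect();
        let workers = 4;
        let chunk = (matrices.len() + workers - 1) / workers;

        thread::scope(|s| {
            for slice in matrices.chunks_mut(chunk) {
                // Each job is isolated: disjoint slices, no locking.
                s.spawn(move || {
                    for m in slice {
                        *m *= 2.0;
                    }
                });
            }
        });

        println!("{matrices:?}");
    }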
Perhaps this is not applicable to browsers at all. But for our use case
it worked very well, and required almost zero locking (outside of the
main frame sync lock at vsync time).
Cheers
On 27/08/14 09:25, Cameron Zwarich wrote:
Due to the recent news from the Rust workweek regarding the pending removal
of libgreen from the standard distribution (and any support in related
libraries like I/O, locking, etc. that goes with it), there has been a bunch of discussion
about Servo’s use of threads and tasks. This discussion has taken place via
different channels with different sets of people involved, so I figured I’d try
to bring it all to the mailing list.
Without even considering the implementation mechanism (OS threads, green
threads, etc.) what is the benefit of using multiple threads in the first
place? I can think of a few reasons:
1) Improved latency from decoupling independent tasks from a single event loop.
For example, rendering and compositing can happen independently of script and
layout, networking can avoid being blocked by the main browser event loop, etc.
This already happens to different degrees in currently shipping browser
engines, although none take it to the extent that Servo does. I’m not sure
whether this is more due to legacy architecture, or the cognitive overhead of
doing this in a language like C++.
2) Improved throughput for a single task, whether it be parallel style
resolution, layout, etc. This has some overlap with the previous item, but
could also happen with a single event loop using a parallel runtime like Cilk.
Existing browser engines generally don’t take much advantage of this
possibility.
3) Isolation between constellations and pipelines. Both WebKit and Blink
ship with the rough equivalent of separation between constellations today, and
Gecko is soon to follow with e10s. Blink is planning to use process isolation
for cross-origin iframes, treating them essentially like a plugin. As far as I
know there is no engine besides Servo that is attempting to isolate same-origin
iframes. Due to the need for a shared JS heap, it is basically impossible to do
this with process isolation without destroying its benefits. In theory, Servo
should be able to isolate iframes without requiring a separate OS process.
Because of Servo’s design, especially for point 3), each constellation and
pipeline has a number of distinct tasks. Today, most of these tasks are
implemented as green threads, although some use native threads due to API
restrictions, some of which are insurmountable (e.g. the compositing task) and
others which should be possible to avoid in the future, either by modifying our
dependencies or rewriting them in Rust.
One of the suggestions I have heard since last week is that we should use
native threads in places where we currently use green threads. There are
arguments for and against the pervasive use of native threads.
Pros:
- More obvious and understandable mapping onto native OS threading
abstractions.
- Allows for the direct use of OS I/O primitives.
- With the removal of segmented stacks, green threads suffer from some of the
problems of native stacks anyways.
Cons:
- Still requires an abstraction layer for features that differ across
platforms, unless we fix a target platform.
- There is a semantic gap between OS I/O primitives and Rust channels, and the
problem of having a task that processes events from both still needs to be
solved.
- It is not easy to reclaim the wasted stack space of native threads.
- We currently have a lot of per-pipeline tasks for loading resources, caches,
etc. If we used native threads, it seems unlikely that we would use individual
threads for each of these (e.g. every single loader task for every frame doing
independent multiplexing of I/O), and we would want to pool them across
pipelines. This introduces the (fairly certain, given past experiences) risk of
Rust task failure induced by one pipeline affecting the loading of resources
for another pipeline. Building on a smaller browser-agnostic abstraction means
that the amount of trusted code (for the sake of isolation, not mere memory
safety) is reduced, and there is less of a chance of browser-specific concerns
causing problems.
We’re probably going to have to reach a decision on the path forward here.
Another issue that has come up independently of the libgreen changes is the
interaction between script and layout. The layout task requires DOM access in
order to build a flow tree, and the script task needs to query layout results
to report them via DOM APIs. The obvious way to implement the former is to
have a gigantic lock around the DOM: script takes the lock when it requires
DOM access, and the layout task takes it when constructing a flow tree.
Similarly, you could have a giant lock around the laid-out flow tree, and
require the script task to take it for access.
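In sketch form (with a placeholder Dom type, not Servo's actual APIs),
the coarse-grained version looks something like this:

    use std::sync::{Arc, Mutex};
    use std::thread;

    struct Dom {
        nodes: Vec<String>,
    }

    fn main() {
        let dom = Arc::new(Mutex::new(Dom { nodes: vec!["html".into()] }));

        let layout_dom = Arc::clone(&dom);
        let layout = thread::spawn(move || {
            // Layout holds the lock only long enough to copy what the
            // flow tree needs, then builds it without the lock.
            let snapshot: Vec<String> =
                layout_dom.lock().unwrap().nodes.clone();
            snapshot.len()
        });

        // Script takes the same lock to mutate the DOM. Note that
        // nothing here orders the mutation against layout's snapshot,
        // which is exactly the consistency problem described below.
        dom.lock().unwrap().nodes.push("body".into());

        println!("flow tree built over {} nodes", layout.join().unwrap());
    }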
There is a bit of a complication, which is that for a single turn of the script
task’s event loop, it needs to have a consistent view of layout. For example,
if the following sequence of events were to occur:
1) Script task completes execution.
2) Some external stimulus triggers layout.
3) Flow tree construction takes the DOM lock, creates the flow tree, and
releases it.
4) Before layout actually begins, the script task begins execution and queries
layout.
What should happen here? Does the script task always wait for layout to
complete? Also, is there a solution that is better than the use of
coarse-grained locks that doesn’t require the use of copy-on-write data
structures? Does this really leave much practical script/layout parallelism on
the table?
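For concreteness, one way to make "script waits for layout" mechanical
is an epoch scheme: each layout publishes results tagged with an epoch,
and a script-side query blocks until the epoch it depends on is
available. This is only a sketch with invented names, not a proposal
for Servo's actual design:

    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;

    struct LayoutResults {
        epoch: u64,
        width: f64, // stand-in for real layout data
    }

    fn main() {
        let shared = Arc::new((
            Mutex::new(LayoutResults { epoch: 0, width: 0.0 }),
            Condvar::new(),
        ));

        let layout_side = Arc::clone(&shared);
        thread::spawn(move || {
            let (lock, cvar) = &*layout_side;
            // Layout finishes and publishes results for epoch 1.
            let mut results = lock.lock().unwrap();
            results.epoch = 1;
            results.width = 800.0;
            cvar.notify_all();
        });

        // Script needs results for epoch 1, so it blocks until layout
        // has published them.
        let (lock, cvar) = &*shared;
        let results = cvar
            .wait_while(lock.lock().unwrap(), |r| r.epoch < 1)
            .unwrap();
        println!("offsetWidth = {}", results.width);
    }

Whether blocking like this leaves enough script/layout parallelism on
the table is exactly the open question above.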
Anyways, I apologize if this email was a bit rambling. I’m sure that despite
that, I probably overlooked some important point. I’d be interested in hearing
the thoughts of others.
Cameron
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo