Take the comments below with a grain of salt - coming from a games background, I am still learning about the complexities of browsers and thus it's likely I'm over-simplifying things :)

When we first looked at the challenge of taking a single-threaded game engine and making it multi-threaded, the task seemed very difficult due to the dependencies between systems (much like the script/layout task dependencies discussed below).

We eventually ended up with an architecture that required almost zero locking, by treating the frame as a (simplified) CPU pipeline.

Data was allowed to flow in only one direction along the pipeline (i.e. script -> layout -> render -> composite). Each stage was a separate thread executing concurrently, working on the inputs from the previous stage.

Results from the end of the pipeline could be fed back into the start of the next frame. This obviously introduces a frame of latency into some results, but a 'frame' needn't actually be a real display frame: it could just be a single execution of the pipeline to evaluate results for script to use in DOM APIs, for example, and certain steps can be short-circuited or skipped where it makes sense.
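
To make the shape concrete, here is a minimal sketch of such a pipeline in today's Rust (our engine was not written in Rust; the stage payloads, channel names, and three-frame loop are all illustrative assumptions):

    use std::sync::mpsc::channel;
    use std::thread;

    struct Frame(u64);        // stand-in payload handed between stages
    struct FrameResult(u64);  // end-of-pipeline result fed back to the start

    fn main() {
        // One channel per hand-off: script -> layout -> render -> composite.
        let (script_tx, layout_rx) = channel::<Frame>();
        let (layout_tx, render_rx) = channel::<Frame>();
        let (render_tx, composite_rx) = channel::<Frame>();
        let (feedback_tx, feedback_rx) = channel::<FrameResult>();

        // Each stage is a separate thread working only on its predecessor's output.
        let layout = thread::spawn(move || {
            for f in layout_rx { layout_tx.send(f).unwrap(); }
        });
        let render = thread::spawn(move || {
            for f in render_rx { render_tx.send(f).unwrap(); }
        });
        let composite = thread::spawn(move || {
            for Frame(n) in composite_rx { feedback_tx.send(FrameResult(n)).unwrap(); }
        });

        // The "script" stage: each frame may consume last frame's fed-back result.
        for n in 0..3 {
            if let Ok(FrameResult(prev)) = feedback_rx.try_recv() {
                println!("frame {} sees the result of frame {}", n, prev);
            }
            script_tx.send(Frame(n)).unwrap();
        }
        drop(script_tx); // close the pipeline so every stage's loop terminates
        layout.join().unwrap();
        render.join().unwrap();
        composite.join().unwrap();
    }

Note that no stage ever locks shared state; ownership of each frame's data simply moves down the channels.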

Data that was shared was copied as required. In the game engine case, this was from the main "game logic" thread to the front-end render thread(s), and then to the back-end render threads. In theory this should have caused some performance issues; in practice it slightly sped up the engine, even when running in single-threaded mode, because the copy step involved mutating the data structures into a format more suitable for the next pipeline stage (compressing, optimizing, flattening a tree into a cache-friendly array, etc.). The effect was very noticeable, as our target platforms were typically constrained far more by memory bandwidth than by available CPU cycles.
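
As a rough illustration of that copy step, here is a sketch of flattening a pointer-chasing tree into a contiguous array that the next stage can scan linearly (the Node/FlatNode layouts are invented for the example):

    // Pointer-chasing representation owned by the producing stage.
    struct Node {
        value: f32,
        children: Vec<Node>,
    }

    // Flat record the consuming stage iterates over linearly.
    struct FlatNode {
        value: f32,
        depth: u32,
    }

    // Depth-first copy: the "mutation into a better format" happens here.
    fn flatten(node: &Node, depth: u32, out: &mut Vec<FlatNode>) {
        out.push(FlatNode { value: node.value, depth });
        for child in &node.children {
            flatten(child, depth + 1, out);
        }
    }

    fn main() {
        let tree = Node {
            value: 1.0,
            children: vec![
                Node { value: 2.0, children: vec![] },
                Node { value: 3.0, children: vec![] },
            ],
        };
        let mut flat = Vec::new();
        flatten(&tree, 0, &mut flat);
        // The receiving stage walks `flat` sequentially: one linear scan,
        // friendly to both the prefetcher and memory bandwidth.
        for n in &flat {
            println!("depth {}: {}", n.depth, n.value);
        }
    }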

In addition to the main pipeline there was a worker thread pool, to which small isolated jobs could be sent from the main pipeline stages. This is where we got data parallelism (for example, the main render thread might send off dozens of isolated tasks to transform the skeletal animation matrices, or run visibility tests on independent parts of the scene).
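
A sketch of that fan-out, using std::thread::scope (Rust 1.63+) in place of a real long-lived job system, with plain f32s standing in for animation matrices:

    use std::thread;

    fn main() {
        let mut bones: Vec<f32> = (0..64).map(|i| i as f32).collect();

        thread::scope(|s| {
            // Each chunk is an isolated job: no sharing, so no locking needed.
            for chunk in bones.chunks_mut(16) {
                s.spawn(move || {
                    for m in chunk.iter_mut() {
                        *m *= 2.0; // stand-in for a skeletal matrix transform
                    }
                });
            }
            // All jobs join here, before the stage hands its output downstream.
        });

        assert_eq!(bones[3], 6.0);
    }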

This worked well with OS-level threads, as the total thread count was quite low (typically 3-5 task threads and a worker pool of around 6 threads), yet it extracted a large amount of parallelism.

Perhaps this is not applicable to browsers at all. But for our use case it worked very well, and required almost zero locking (outside of the main frame sync lock at vsync time).

Cheers


On 27/08/14 09:25, Cameron Zwarich wrote:
Due to the recent news from the Rust workweek regarding the pending removal of 
libgreen from the standard distribution (and any support in related libraries 
like I/O, locking, etc. that goes with it) there has been a bunch of discussion 
about Servo’s use of threads and tasks. This discussion has taken place via 
different channels with different sets of people involved, so I figured I’d try 
to bring it all to the mailing list.

Without even considering the implementation mechanism (OS threads, green 
threads, etc.) what is the benefit of using multiple threads in the first 
place? I can think of a few reasons:

1) Improved latency from decoupling independent tasks from a single event loop. 
For example, rendering and compositing can happen independently of script and 
layout, networking can avoid being blocked by the main browser event loop, etc. 
This already happens to different degrees in currently shipping browser 
engines, although none take it to the extent that Servo does. I’m not sure 
whether this is more due to legacy architecture, or the cognitive overhead of 
doing this in a language like C++.

2) Improved throughput for a single task, whether it be parallel style 
resolution, layout, etc. This has some overlap with the previous item, but 
could also happen with a single event loop using a parallel runtime like Cilk. 
Existing browser engines generally don’t take much advantage of this 
possibility.

3) Isolation between constellations and pipelines. Both WebKit and Blink 
ship with the rough equivalent of separation between constellations today, and 
Gecko is soon to follow with e10s. Blink is planning to use process isolation 
for cross-origin iframes, treating them essentially like a plugin. As far as I 
know there is no engine besides Servo that is attempting to isolate same-origin 
iframes. Due to the need for a shared JS heap, it is basically impossible to do 
this with process isolation without destroying its benefits. In theory, Servo 
should be able to isolate iframes without requiring a separate OS process.

Because of Servo’s design, especially for point 3), each constellation and 
pipeline has a number of distinct tasks. Today, most of these tasks are 
implemented as green threads, although some use native threads due to API 
restrictions, some of which are insurmountable (e.g. the compositing task) and 
others which should be possible to avoid in the future, either by modifying our 
dependencies or rewriting them in Rust.

One of the suggestions I have heard since last week is that we should use 
native threads in places where we currently use green threads. There are 
arguments for and against the pervasive use of native threads.

Pros:
- More obvious and understandable mapping onto native OS threading 
abstractions.
- Allows for the direct use of OS I/O primitives.
- With the removal of segmented stacks, green threads suffer from some of the 
problems of native stacks anyway.

Cons:
- Still requires an abstraction layer for features that differ across 
platforms, unless we fix a target platform.
- There is a semantic gap between OS I/O primitives and Rust channels, and the 
problem of having a task that processes events from both still needs to be 
solved (one common workaround is sketched just after this list).
- It is not easy to reclaim the wasted stack space of native threads.
- We currently have a lot of per-pipeline tasks for loading resources, caches, 
etc. If we used native threads, it seems unlikely that we would use individual 
threads for each of these (e.g. every single loader task for every frame doing 
independent multiplexing of I/O), and we would want to pool them across 
pipelines. This introduces the (fairly certain, given past experiences) risk of 
Rust task failure induced by one pipeline affecting the loading of resources 
for another pipeline. Building on a smaller browser-agnostic abstraction means 
that the amount of trusted code (for the sake of isolation, not mere memory 
safety) is reduced, and there is less of a chance of browser-specific concerns 
causing problems.
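
Regarding the I/O/channel gap mentioned above, one common workaround is a dedicated forwarding thread that turns blocking I/O into messages on the same channel the task already processes, so the task's event loop only ever waits on a channel. A sketch, with an invented Event enum and stdin as a stand-in I/O source:

    use std::io::{self, BufRead};
    use std::sync::mpsc::channel;
    use std::thread;

    enum Event {
        FromIo(String),
        FromTask(&'static str),
    }

    fn main() {
        let (tx, rx) = channel::<Event>();

        // Forwarding thread: blocks on I/O, re-emits results as channel events.
        let io_tx = tx.clone();
        thread::spawn(move || {
            for line in io::stdin().lock().lines().flatten() {
                if io_tx.send(Event::FromIo(line)).is_err() { break; }
            }
        });

        // Some other task sends ordinary messages on the same channel.
        thread::spawn(move || {
            tx.send(Event::FromTask("hello from another task")).unwrap();
        });

        // The event loop multiplexes both sources through one receiver.
        for event in rx {
            match event {
                Event::FromIo(line) => println!("io: {}", line),
                Event::FromTask(msg) => println!("task: {}", msg),
            }
        }
    }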

We’re probably going to have to reach a decision on the path forward here.

Another issue that has come up independently of the libgreen changes is the 
interaction between script and layout. The layout task requires DOM access in 
order to build a flow tree, and the script task needs to query layout results 
to report them via DOM APIs. The obvious way to implement the former is to have 
a gigantic lock around the DOM, and have script take this lock when it requires 
DOM access, and have the layout task take it when constructing a flow tree. 
Similarly, you could have a giant lock around the laid out flow tree, and 
require the DOM to take it for access.
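
In code, the coarse-grained version might look something like this sketch (Dom is a placeholder stub, not Servo's actual DOM representation):

    use std::sync::{Arc, Mutex};
    use std::thread;

    struct Dom {
        dirty: bool, // stand-in for real DOM state
    }

    fn main() {
        let dom = Arc::new(Mutex::new(Dom { dirty: false }));

        // Script task: takes the giant lock for every DOM access.
        let script_dom = Arc::clone(&dom);
        let script = thread::spawn(move || {
            let mut dom = script_dom.lock().unwrap();
            dom.dirty = true;
        });

        // Layout task: takes the same lock to build the flow tree.
        let layout_dom = Arc::clone(&dom);
        let layout = thread::spawn(move || {
            let dom = layout_dom.lock().unwrap();
            let _flow_tree = dom.dirty; // stand-in for flow-tree construction
        });

        script.join().unwrap();
        layout.join().unwrap();
    }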

There is a bit of a complication, which is that for a single turn of the script 
task’s event loop, it needs to have a consistent view of layout. For example, 
if the following sequence of events were to occur:

1) Script task completes execution.

2) Some external stimulus triggers layout.

3) Flow tree construction takes the DOM lock, creates the flow tree, and 
releases it.

4) Before layout actually begins, the script task begins execution and queries 
layout.

What should happen here? Does the script task always wait for layout to 
complete? Also, is there a solution that is better than the use of 
coarse-grained locks that doesn’t require the use of copy-on-write data 
structures? Does this really leave much practical script/layout parallelism on 
the table?
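
One possible (though by no means definitive) answer to the first question is an epoch counter: layout bumps it on completion, and a script-side layout query blocks until the epoch it depends on is reached, so a single turn of the event loop sees one consistent view. A sketch, with names invented for the example:

    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;
    use std::time::Duration;

    struct LayoutEpoch {
        done: Mutex<u64>,
        cv: Condvar,
    }

    fn main() {
        let epoch = Arc::new(LayoutEpoch { done: Mutex::new(0), cv: Condvar::new() });

        // Layout task: completes epoch 1 after some work.
        let layout_epoch = Arc::clone(&epoch);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10)); // stand-in for layout work
            *layout_epoch.done.lock().unwrap() = 1;
            layout_epoch.cv.notify_all();
        });

        // Script-side layout query: blocks until the layout it depends on is
        // done, so this turn of the event loop sees one consistent view.
        let mut done = epoch.done.lock().unwrap();
        while *done < 1 {
            done = epoch.cv.wait(done).unwrap();
        }
        println!("layout epoch {} complete; safe to report via DOM APIs", *done);
    }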

Anyways, I apologize if this email was a bit rambling. I’m sure that despite 
that, I probably overlooked some important point. I’d be interested in hearing 
the thoughts of others.

Cameron
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
