Re: [dev-servo] New Servo WebGL Architecture Proposal (benchmarks included!)

Emilio Cobos Álvarez Sat, 22 Jul 2017 14:20:42 -0700


On 07/22/2017 11:14 PM, Imanol Fernández wrote:
> Hi Emilio!
> 
>  
> 
>     Yeah, I'm to blame for this one... This was one of my very first
>     contributions to Servo and I followed what Canvas2D did (which is
>     equally broken), and I never got around to clean it up.
>     Can't wait to see it going away!
> 
> 
> It's normal to follow a existing pattern in the code in your first
> contribution. Also things have changed a lot since you started: WR was
> not ready, you didn't have the external image API and more. So it's
> easier to make this kind of changes now. In any case I only mentioned
> the "bad" things. All the WebGL work you did in the DOM/script
> component, all the validations, error checking, conformance tests is
> great.  The same for the WebGLCommand enum and traits.  All that code is
> still there and WebGL will be imposible without all that work ;)
> 
> Yes, Canvas2D is broken too. Any volunteers to fix that? ;)
>  
> 
>     (and hopefully split it into a few commits? That way would make
>     reviewing it much easier... reviewing the WebVR landing was honestly
>     quite a pain, and I'm sure some things slipped through because of it was
>     a single massive commit)
> 
> 
> I know that the PR was big, sorry about that. Ok, I'll try to reduce the
> PR as much a as possible. Some of the webgl modes I mentioned are just
> tests to validate ideas and measure performance differences. 
> 
> Hopefully the review will be a lot easier this time. Github shows a lot
> of changes in the DOM objects but it's a repeated change (remove
> CanvasMsg wrapper and use the new sender method). WebGLCommands are the
> same ones as before. So a big number of lines is already reviewed (just
> moved to a new place) and almost all the modifications in the DOM follow
> the same pattern. The conformance tests will help to guarantee that we
> don't miss anything.
> 
>     Yeah, I agree multiplexing contexts is the way to go.
>     You still need the Webrender API channel (which in the case of Servo is
>     another IPC channel), right?
>     If not, I'd be really curious about how have you managed to get rid
>     of it. 
> 
> 
> There is a WebRenderAPI channel, but it's only used when a canvas is
> created or resized in order to send the required webrender_api::ImageKeys
> 
> I was talking about channel steps per WebGL command. I considered that
> other channel insignificant for performance.


Oh, ok, yeah, makes sense.

> 
>     Well, this is hard assuming you need to share textures cross-process,
>     right? If so, why would we want to do that? In general the content
>     process shouldn't have access to the GPU.
>     It'd be nice to have a small explanation about which commands are
>     executed in which process with your proposed architecture. 
> 
> 
> That's only a fake multiprocess mode. I wanted to validate that we won't
> need to change the WebGL types used in parts of the code (constellation,
> pipelinestate, script thread, dom) in the long term. For example, we
> could make this mode to batch all the WebGLCommands and send them to a
> separate GPU process using shared memory (still using the same types in
> all the code). I also used it to test the overhead difference between
> ipc and mpsc.
> 
> 
>     I don't love to have less error-checking in general, though I understand
>     the point... In any case, this sounds like the kind of standalone
>     feature that would be nice to have in its own PR (or commit)...
>     Also, is there any use case for this for now? I think if we're going to
>     introduce a cargo feature for "Servo as electron-like back-end"
>     officially, we should do so separately, and add tests for this.
> 
> 
> Right now it's also another test to validate the types used across all
> the parts of the code and measure performace. This mode it's a long term
> idea to squeeze performance.  It doesn't need to land now. I can do it
> in a separate PR if we decide to use it.
> 
> We want to talk with the "Synchro" team about all the packaging
> requirements that we need to publish WebGL/WebVR content in the stores
> (Google Play, Steam, App Store ...) ;)
> 
>     Right now the SM version is quite old, though there's an attempt to
>     update it going on[1].
>     I could believe that a bunch of the benchmarks are kinda biased because
>     of this. 
> 
>     I remember people opening bugs about the overhead of function calls into
>     JS, and a bunch of fixes a while ago. That's probably going to change
>     substantially with the JS engine update though, so you should probably
>     talk with Nick or Josh about it. 
> 
> 
> That's great news (in the sense of margin for improvement). I remember
> having nice performance boost when updating VM engines in the past ;)

Yeah, also I could believe that there are a bunch of easy wins to be had
re. bindings performance.

Btw, the link that was supposed to be [1] in my previous email was
<https://github.com/servo/mozjs/pull/119>.

> On 22 July 2017 at 19:35, Emilio Cobos Álvarez <emi...@crisal.io
> <mailto:emi...@crisal.io>> wrote:
> 
>     Hi Imanol,
> 
>     On 07/21/2017 10:58 PM, Imanol Fernández wrote:
>     > Hello,
>     >
>     > I want to share the ideas, code and some benchmarks of the new
>     Servo WebGL
>     > architecture and refactor I'm working on. I'd love to hear some
>     feedback
>     > about the big picture. For smaller things or nits we could do that
>     in the
>     > PR review (which I'll plan to do  beginning next week if there aren´t
>     > strong opinions about the overall architecture).
>     >
>     > First, I'll mention some of the problems of the current
>     Servo/WebRender
>     > WebGL implementation and source code organization
>     >
>     > * Part of the WebGL implementation is included in WebRender. This
>     makes
>     > very slow to make WebGL contributions. For example in order to add
>     a new
>     > WebGL method I usually need push a PR to gleam (wait until it's
>     merged and
>     > the crate published), then submit another PR to WebRender (wait
>     for the
>     > review & merge) and wait again until the new WR is merged into
>     Servo (with
>     > low control about it because WR can have in-progress non-related
>     features
>     > not ready-to merge)
>     >
>     > * When adding WebGL related code to WR, the WebGL code is not
>     tested when
>     > it lands into WR. It's only tested when a WR update lands into
>     Servo. When
>     > a WebGL related test fails in a WR update it's usually fixed by
>     reverting
>     > the commit that caused it to avoid delaying the WR update. This
>     makes the
>     > testing process difficult.
>     >
>     > * The current implementation creates a new WebGL renderer thread
>     (with it's
>     > ipc-channel) for each canvas created in JavaScript. This can quickly
>     > exhaust the file descriptor's available in the system for webpages
>     that
>     > create a lot of canvases (e.g Shadertoy) or on games that require many
>     > canvases. Creating a canvas instance in JavaScript is slower than
>     it should
>     > because of this too.
> 
>     Yeah, I'm to blame for this one... This was one of my very first
>     contributions to Servo and I followed what Canvas2D did (which is
>     equally broken), and I never got around to clean it up.
> 
>     Can't wait to see it going away!
> 
>     > * WebGL commands are currently run in the WR backend thread which could 
> be
>     > bad for UI latency (see https://github.com/servo/webrender/issues/607
>     <https://github.com/servo/webrender/issues/607>)
>     >
>     > * Currently there are two ipc-channels steps for each WebGL-call to hit 
> the
>     > driver (JS ==> WebGL Render Thread ==> WebRender). This adds a lot of
>     > overhead.
>     >
>     > * Slow compilation times: Modifying a WebGL command code recompiles
>     > WebRender_traits which causes additional component recompilations in 
> Servo.
>     >
>     > * Flickering and shyncronization problems with the WR compositor (I 
> created
>     > this issue some time ago: https://github.com/servo/servo/issues/14235
>     <https://github.com/servo/servo/issues/14235>)> Goals that I want to
>     achieve in the new WebGL architecture design:
>     >
>     > * Make WebGL development cycle easier.
>     > * Minimize the render path overhead for WebGL commands.
>     > * Flexible architecture for multiprocess and inprocess scenarios.
>     > * A special mode for packaging WebGL applications (not enabled by 
> default,
>     > more details later)
>     > * Faster compilation times
>     > * Proper shyncronization to avoid flickering issues
>     > * Compatibility with the WebVR render path
>     >
>     > Now I'm going into some detail of how I tried to best achieve each 
> stated
>     > goal. You can check the "almost ready for PR" source code here:
>     > 
> https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf
>     
> <https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf>
>     > (still have to add more docs, do a rebase and clean some tests)
> 
>     (and hopefully split it into a few commits? That way would make
>     reviewing it much easier... reviewing the WebVR landing was honestly
>     quite a pain, and I'm sure some things slipped through because of it was
>     a single massive commit)
> 
>     > Make WebGL development cycle easier.
>     > -------------------------------------------------------
>     >
>     > This is fixed by splitting the WebGL code into it's own component and
>     > removing all WebGL code from WebRender. Additional benefits for 
> WebRender
>     > (see https://github.com/servo/webrender/issues/1353
>     <https://github.com/servo/webrender/issues/1353>):
>     >
>     > * Simplify the internals of  resource cache by removing the GL context
>     > management code.
>     > * Remove the WebGL cargo feature, since not all clients use it.
>     >
>     > I opted to move the WebGL code into a Servo component. It eases
>     > development, testability and I don't think that there are use cases to 
> use
>     > the component outside Servo. Anyway, it's a low coupled component, so it
>     > will be easy to move if required but I don't think that it's worth it.
>     >
>     > The connection between a WebGL canvas and WR is implemented via the WR
>     > ExternalImage APIs as Glennw recommended.>
>     > When the new Architecture Lands in servo, I'll make a separate PR to
>     > removes all the unneeded WebGL management code in WR.
>     >
>     > Minimize the render path overhead for WebGL commands
>     > 
> ------------------------------------------------------------------------------
>     >
>     > The new architecture creates a single thread/process to manage all WebGL
>     > commands from multiple canvas sources. This will reduce the footprint 
> for
>     > creating a new canvas (avoid creating specific thread + ipc-channel) and
>     > reduces the render-path for each WebGL command by using a single 
> "channel
>     > step"
> 
>     Yeah, I agree multiplexing contexts is the way to go.
> 
>     You still need the Webrender API channel (which in the case of Servo is
>     another IPC channel), right?
> 
>     If not, I'd be really curious about how have you managed to get rid
>     of it.
> 
>     > Flexible architecture for multiprocess and inprocess scenarios
>     >
>     
> ----------------------------------------------------------------------------------
>     >
>     > One of the things I don't like about the current implementation is
>     that it
>     > uses a IpcSender<WebGLCommand> instance as the entry point to send the
>     > WebGL commands from the Script Thread to the renderer in many
>     parts of the
>     > source code. IMO this is not very flexible because we may be
>     interested in
>     > batching WebGLCommands, using shared memory, or other channel
>     approaches to
>     > improve performance.
>     >
>     > A WebGL demo can easily send e.g 200 channels messages per frame
>     (doubled
>     > in VR mode) and we need to achieve steady framerates (e.g. 90 fps on
>     > desktop VR).
>     >
>     > I decided to create custom/opaque types for all WebGL API traits:
>     > WebGLSender, WebGLReceiver, WebGLChan, etc. instead of using  fixed
>     > IpcSender<T> types. This types are defined at compilation time with no
>     > runtime overhead (e.g.: we can easily do pub type WebGLSender<T> =
>     IpcChanne
>     > l<T>).
>     >
>     > Additionally I also tried to make the threading implementation
>     flexible
>     > using some templating in the WebGLThread struct implementation,
>     which uses
>     > types that can be changed at compile time based on a cargo feature (or
>     > using a runtime preference just adding a enum wrapper). The thing
>     I like
>     > best is that we can easily change the threading and channel models
>     only
>     > modifying one or two small files, while the remaining WebGL will keep
>     > untouched. I created 2 WebGL threading/channel models based on
>     this idea
>     > (some interesting performance benchmarks at the end o the post!)
>     >
>     > * Multiprocess. This model creates a unique process/thread to
>     handle all
>     > WebGL commands from all canvases from all script threads. It uses
>     > ipc-channels for communication. Servo is not ready yet for a real
>     > multiprocess mode and we'll need to add a lot of platform
>     dependant code in
>     > Webrender and rust-offscreen-context (eg.
>     IOSurface/GLXPixmap/etc). But I
>     > wanted to add this model now so we can start testing or adding
>     some steps
>     > towards a real multiprocess mode.
> 
>     Well, this is hard assuming you need to share textures cross-process,
>     right? If so, why would we want to do that? In general the content
>     process shouldn't have access to the GPU.
> 
>     It'd be nice to have a small explanation about which commands are
>     executed in which process with your proposed architecture.
> 
>     > * Inprocess. This mode lazily creates a WebGL renderer thread for each
>     > Servo ScriptThread/tab and uses pure std mpsc channels. Using mpsc
>     channels
>     > has a lot of impact in the performace because the channel is
>     faster and it
>     > avoids serialization overhead (serialization also happens enabling the
>     > force-inprocess mode in servo/ipc-channel crate). w.r.t threading
>     model it
>     > could be easily changed to create a single thread for all
>     > ScriptThread/tabs. It was a bit more difficult to implement (and
>     to sync
>     > with the WR ExternalImage callbacks) but I opted for different
>     threads per
>     > tab because:
>     > -- This way we only need to use two threads at the same time for
>     sending
>     > WebGL commands from script to the renderer. This allows to use a spsc
>     > channel instead of a mpsc channel which allows a faster
>     shyncronization
>     > algorithm. AFAIK a rust std mpsc channel uses a faster spsc
>     algorithm until
>     > it is cloned. We could add another kind of ad-hoc spsc channel
>     > implementations for the fastest performance (have you used/ do you
>     > recommend any of them?)
>     > -- Having different threads per tab wii not exhaust CPU because
>     > requestAnimationFrame only runs in the current active tab per spec
>     so the
>     > other threads will be paused.
>     >
>     > A special mode for packaging WebGL applications
>     > --------------------------------------------------------------------
>     > In addition to the multiprocess/inprocess modes I also added a
>     "packaging
>     > mode" to improve performance in this scenario.
>     >
>     > Disclaimer: Before joining Mozilla I was a core developer in
>     Cocoon.io. We
>     > created a custom "browser" from scratch called canvas+ (using
>     standalone V8
>     > and JavaScriptCore VM engines) to run WebGL/Canvas2D games. We
>     were able to
>     > achieve better WebGL/Canvas performance than the Chromium/Safari
>     webviews
>     > available in Android/iOS. Some of the optimizations we did are not
>     allowed
>     > by the spec or security rules in a multiparadigm browser that loads
>     > arbitrary content. There are other engines/companies that opted
>     for this
>     > approach too instead of using multiparadigm webviews (e.g: Impact
>     Engine,
>     > Chukong and cocos2dx-html, etc.).
>     >
>     > My idea is to use this special mode (really a cargo feature) in order
>     > include some optimizations that these kind of standalone VM
>     engines do. It
>     > will only be enabled for packaging trusted source code (e.g.
>     android apks,
>     > electron, and more). When you package a tested and trusted source
>     code you
>     > don't really need a lot of the validations, error checking or security
>     > rules that the spec enforces or a full browser requires. Some
>     examples or
>     > forbidden things that could be allowed in this mode:
>     >
>     > * Immediate WebGL calls to a GL context living in the Script
>     thread (I did
>     > some benchmarks with this enabled (sometimes it's faster running
>     the gl
>     > call than sending it) . I'd like to focus on a thread bases render
>     path
>     > though in order to parallelize JS code better)
>     > * Rendering a full screen WebGL scene to the main window context (e.g.
>     > bypass all the compositor)
>     > * Disable some validations and error checking enforced by the spec.
> 
>     I don't love to have less error-checking in general, though I understand
>     the point... In any case, this sounds like the kind of standalone
>     feature that would be nice to have in its own PR (or commit)...
> 
>     Also, is there any use case for this for now? I think if we're going to
>     introduce a cargo feature for "Servo as electron-like back-end"
>     officially, we should do so separately, and add tests for this.
> 
>     > Faster compilation times
>     > -----------------------------------------------------------
>     >
>     > WebGL code is now splitted into canvas_traits and canvas components.
>     > Changing the DOM code or traits is still compilation costly but
>     now we can
>     > change the WebGL commands implementation or do things like adding
>     logs for
>     > each WebGL call (which I tend to use for testing) with a lot shorter
>     > compilation times (no need to recompile WebRender_traits anymore!)
>     >
>     > Proper synchronization to avoid flickering issues
>     > ----------------------------------------------------------------
>     >
>     > Flcikering issues have improved a lot in the three.js demos I have
>     tested.
>     > The synchronization is done using the ExternalImageHandler lock
>     and unlock
>     > calls that Glennw recommended combined with OpenGL Sync Objects.
>     >
>     > There are some flickering issues when moving the mouse on Linux. I
>     think
>     > that's probably a bug in servo. Mouse move seem to trigger extra
>     composites
>     > that aren't synced with the servo animation loop. IMO this could
>     be fixed
>     > when we finish the wip Glutin unfork (
>     > https://github.com/servo/servo/pull/17311
>     <https://github.com/servo/servo/pull/17311>)
>     >
>     > Benchmarks
>     > -----------------------------------------------------------------
>     > Here are the benchmark results for the old WebGL implementation,
>     the three
>     > modes in the new Architecture (multiprocess, inprocess and
>     packaging) and
>     > Chrome/Firefox.
>     >
>     > I used a desktop linux to do the benchmarks (Ubuntu 16.04, GTX 1060,
>     > Skylake i7 4GHz)
>     >
>     > I used the Three.js performance test with a fixed camera, and with
>     5K and
>     > 10 3D objects: https://threejs.org/examples/webgl_performance.html
>     <https://threejs.org/examples/webgl_performance.html>
>     >
>     > Browser------------------Average FPS-----------Average-FrameTime
>     >
>     > Servo Old WebGL 5K: -------26 fps------------------37.31 ms
>     > Servo Old WebGL 10K: ------13 fps------------------71.4  ms
>     > Servo "multiprocess" 5K----43 fps------------------22.61 ms
>     > Servo "multiprocess" 10K---21 fps------------------46.27 ms
>     > Servo "inprocess" 5K-------56-60fps----------------17.13 ms
>     > Servo "inprocess" 10K------28-30fps----------------33.5 ms
>     > Servo "packaging" 5K-------60fps-------------------15.83 ms
>     > Servo "packaging" 10K------32fps-------------------30.92 ms
>     >
>     > Firefox 5K------------------42-45fps---------------16.98 ms
>     > Firefox 10K-----------------25fps------------------34.01 ms
>     >
>     > Chrome 5K-------------------57-60fps---------------11.5 ms
>     > Chrome 10K------------------42-46ps----------------22.5 ms
>     >
>     >
>     > Future Steps
>     > -----------------------------------------------------------------
>     > There are other ideas that I wanted to implement but I'll do that
>     in later
>     > PRs (or contributions in this areas are welcomed too!)
>     >
>     > * WebGL double buffering: It may be a good idea to improve
>     performance and
>     > synchronization (e.g. ping pong WebGL render textures each frame,
>     so WR
>     > uses a texture to composite the main context while the WebGL frame is
>     > rendering the next frame to a different texture). This may add
>     some memory
>     > overhead. We can implement some texture pooling (I think that
>     Firefox does
>     > that)
>     >
>     > * Test/benchmark different spsc channels or message batching or shared
>     > memory in order to improve frame times. We want to be fastest ones
>     in the
>     > benchmarks!
>     >
>     > * WebGL 2.0 branch (My idea is to create the layer that reuses
>     some code
>     > with WebGL 1.0 implementation and open a issue will a list of TODOs in
>     > WebGL 2.0 so contributors can start implementing features step by
>     step).
>     >
>     > I have some additional questions about Servo that I think may
>     affect the
>     > benchmark performances:
>     >
>     > * How does the SpiderMonkey version used in Servo compare to V8?
> 
>     Right now the SM version is quite old, though there's an attempt to
>     update it going on[1].
> 
>     I could believe that a bunch of the benchmarks are kinda biased because
>     of this.
> 
>     > * Is the cost of JS to Rust call measured for normal functions and for
>     > functions with many arguments? is it compared to bindings implemented in
>     > Firefox/Chrome? When I worked on the standalone VM engine area the 
> binding
>     > JS/C++ binding code had a lot of impact.
> 
>     I remember people opening bugs about the overhead of function calls into
>     JS, and a bunch of fixes a while ago. That's probably going to change
>     substantially with the JS engine update though, so you should probably
>     talk with Nick or Josh about it.
> 
>     > Long post... Thanks for reading!
> 
>     Thanks for writing it, this is exciting stuff!
> 
>      -- Emilio
> 
> 
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo

Re: [dev-servo] New Servo WebGL Architecture Proposal (benchmarks included!)

Reply via email to