On 07/22/2017 11:14 PM, Imanol Fernández wrote: > Hi Emilio! > > > > Yeah, I'm to blame for this one... This was one of my very first > contributions to Servo and I followed what Canvas2D did (which is > equally broken), and I never got around to clean it up. > Can't wait to see it going away! > > > It's normal to follow a existing pattern in the code in your first > contribution. Also things have changed a lot since you started: WR was > not ready, you didn't have the external image API and more. So it's > easier to make this kind of changes now. In any case I only mentioned > the "bad" things. All the WebGL work you did in the DOM/script > component, all the validations, error checking, conformance tests is > great. The same for the WebGLCommand enum and traits. All that code is > still there and WebGL will be imposible without all that work ;) > > Yes, Canvas2D is broken too. Any volunteers to fix that? ;) > > > (and hopefully split it into a few commits? That way would make > reviewing it much easier... reviewing the WebVR landing was honestly > quite a pain, and I'm sure some things slipped through because of it was > a single massive commit) > > > I know that the PR was big, sorry about that. Ok, I'll try to reduce the > PR as much a as possible. Some of the webgl modes I mentioned are just > tests to validate ideas and measure performance differences. > > Hopefully the review will be a lot easier this time. Github shows a lot > of changes in the DOM objects but it's a repeated change (remove > CanvasMsg wrapper and use the new sender method). WebGLCommands are the > same ones as before. So a big number of lines is already reviewed (just > moved to a new place) and almost all the modifications in the DOM follow > the same pattern. The conformance tests will help to guarantee that we > don't miss anything. > > Yeah, I agree multiplexing contexts is the way to go. > You still need the Webrender API channel (which in the case of Servo is > another IPC channel), right? > If not, I'd be really curious about how have you managed to get rid > of it. > > > There is a WebRenderAPI channel, but it's only used when a canvas is > created or resized in order to send the required webrender_api::ImageKeys > > I was talking about channel steps per WebGL command. I considered that > other channel insignificant for performance.
Oh, ok, yeah, makes sense. > > Well, this is hard assuming you need to share textures cross-process, > right? If so, why would we want to do that? In general the content > process shouldn't have access to the GPU. > It'd be nice to have a small explanation about which commands are > executed in which process with your proposed architecture. > > > That's only a fake multiprocess mode. I wanted to validate that we won't > need to change the WebGL types used in parts of the code (constellation, > pipelinestate, script thread, dom) in the long term. For example, we > could make this mode to batch all the WebGLCommands and send them to a > separate GPU process using shared memory (still using the same types in > all the code). I also used it to test the overhead difference between > ipc and mpsc. > > > I don't love to have less error-checking in general, though I understand > the point... In any case, this sounds like the kind of standalone > feature that would be nice to have in its own PR (or commit)... > Also, is there any use case for this for now? I think if we're going to > introduce a cargo feature for "Servo as electron-like back-end" > officially, we should do so separately, and add tests for this. > > > Right now it's also another test to validate the types used across all > the parts of the code and measure performace. This mode it's a long term > idea to squeeze performance. It doesn't need to land now. I can do it > in a separate PR if we decide to use it. > > We want to talk with the "Synchro" team about all the packaging > requirements that we need to publish WebGL/WebVR content in the stores > (Google Play, Steam, App Store ...) ;) > > Right now the SM version is quite old, though there's an attempt to > update it going on[1]. > I could believe that a bunch of the benchmarks are kinda biased because > of this. > > I remember people opening bugs about the overhead of function calls into > JS, and a bunch of fixes a while ago. That's probably going to change > substantially with the JS engine update though, so you should probably > talk with Nick or Josh about it. > > > That's great news (in the sense of margin for improvement). I remember > having nice performance boost when updating VM engines in the past ;) Yeah, also I could believe that there are a bunch of easy wins to be had re. bindings performance. Btw, the link that was supposed to be [1] in my previous email was <https://github.com/servo/mozjs/pull/119>. > On 22 July 2017 at 19:35, Emilio Cobos Álvarez <emi...@crisal.io > <mailto:emi...@crisal.io>> wrote: > > Hi Imanol, > > On 07/21/2017 10:58 PM, Imanol Fernández wrote: > > Hello, > > > > I want to share the ideas, code and some benchmarks of the new > Servo WebGL > > architecture and refactor I'm working on. I'd love to hear some > feedback > > about the big picture. For smaller things or nits we could do that > in the > > PR review (which I'll plan to do beginning next week if there aren´t > > strong opinions about the overall architecture). > > > > First, I'll mention some of the problems of the current > Servo/WebRender > > WebGL implementation and source code organization > > > > * Part of the WebGL implementation is included in WebRender. This > makes > > very slow to make WebGL contributions. For example in order to add > a new > > WebGL method I usually need push a PR to gleam (wait until it's > merged and > > the crate published), then submit another PR to WebRender (wait > for the > > review & merge) and wait again until the new WR is merged into > Servo (with > > low control about it because WR can have in-progress non-related > features > > not ready-to merge) > > > > * When adding WebGL related code to WR, the WebGL code is not > tested when > > it lands into WR. It's only tested when a WR update lands into > Servo. When > > a WebGL related test fails in a WR update it's usually fixed by > reverting > > the commit that caused it to avoid delaying the WR update. This > makes the > > testing process difficult. > > > > * The current implementation creates a new WebGL renderer thread > (with it's > > ipc-channel) for each canvas created in JavaScript. This can quickly > > exhaust the file descriptor's available in the system for webpages > that > > create a lot of canvases (e.g Shadertoy) or on games that require many > > canvases. Creating a canvas instance in JavaScript is slower than > it should > > because of this too. > > Yeah, I'm to blame for this one... This was one of my very first > contributions to Servo and I followed what Canvas2D did (which is > equally broken), and I never got around to clean it up. > > Can't wait to see it going away! > > > * WebGL commands are currently run in the WR backend thread which could > be > > bad for UI latency (see https://github.com/servo/webrender/issues/607 > <https://github.com/servo/webrender/issues/607>) > > > > * Currently there are two ipc-channels steps for each WebGL-call to hit > the > > driver (JS ==> WebGL Render Thread ==> WebRender). This adds a lot of > > overhead. > > > > * Slow compilation times: Modifying a WebGL command code recompiles > > WebRender_traits which causes additional component recompilations in > Servo. > > > > * Flickering and shyncronization problems with the WR compositor (I > created > > this issue some time ago: https://github.com/servo/servo/issues/14235 > <https://github.com/servo/servo/issues/14235>)> Goals that I want to > achieve in the new WebGL architecture design: > > > > * Make WebGL development cycle easier. > > * Minimize the render path overhead for WebGL commands. > > * Flexible architecture for multiprocess and inprocess scenarios. > > * A special mode for packaging WebGL applications (not enabled by > default, > > more details later) > > * Faster compilation times > > * Proper shyncronization to avoid flickering issues > > * Compatibility with the WebVR render path > > > > Now I'm going into some detail of how I tried to best achieve each > stated > > goal. You can check the "almost ready for PR" source code here: > > > https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf > > <https://github.com/MortimerGoro/servo/commit/c2db42bbbae6d6cb612d0aa2580552a2ff3b0acf> > > (still have to add more docs, do a rebase and clean some tests) > > (and hopefully split it into a few commits? That way would make > reviewing it much easier... reviewing the WebVR landing was honestly > quite a pain, and I'm sure some things slipped through because of it was > a single massive commit) > > > Make WebGL development cycle easier. > > ------------------------------------------------------- > > > > This is fixed by splitting the WebGL code into it's own component and > > removing all WebGL code from WebRender. Additional benefits for > WebRender > > (see https://github.com/servo/webrender/issues/1353 > <https://github.com/servo/webrender/issues/1353>): > > > > * Simplify the internals of resource cache by removing the GL context > > management code. > > * Remove the WebGL cargo feature, since not all clients use it. > > > > I opted to move the WebGL code into a Servo component. It eases > > development, testability and I don't think that there are use cases to > use > > the component outside Servo. Anyway, it's a low coupled component, so it > > will be easy to move if required but I don't think that it's worth it. > > > > The connection between a WebGL canvas and WR is implemented via the WR > > ExternalImage APIs as Glennw recommended.> > > When the new Architecture Lands in servo, I'll make a separate PR to > > removes all the unneeded WebGL management code in WR. > > > > Minimize the render path overhead for WebGL commands > > > ------------------------------------------------------------------------------ > > > > The new architecture creates a single thread/process to manage all WebGL > > commands from multiple canvas sources. This will reduce the footprint > for > > creating a new canvas (avoid creating specific thread + ipc-channel) and > > reduces the render-path for each WebGL command by using a single > "channel > > step" > > Yeah, I agree multiplexing contexts is the way to go. > > You still need the Webrender API channel (which in the case of Servo is > another IPC channel), right? > > If not, I'd be really curious about how have you managed to get rid > of it. > > > Flexible architecture for multiprocess and inprocess scenarios > > > > ---------------------------------------------------------------------------------- > > > > One of the things I don't like about the current implementation is > that it > > uses a IpcSender<WebGLCommand> instance as the entry point to send the > > WebGL commands from the Script Thread to the renderer in many > parts of the > > source code. IMO this is not very flexible because we may be > interested in > > batching WebGLCommands, using shared memory, or other channel > approaches to > > improve performance. > > > > A WebGL demo can easily send e.g 200 channels messages per frame > (doubled > > in VR mode) and we need to achieve steady framerates (e.g. 90 fps on > > desktop VR). > > > > I decided to create custom/opaque types for all WebGL API traits: > > WebGLSender, WebGLReceiver, WebGLChan, etc. instead of using fixed > > IpcSender<T> types. This types are defined at compilation time with no > > runtime overhead (e.g.: we can easily do pub type WebGLSender<T> = > IpcChanne > > l<T>). > > > > Additionally I also tried to make the threading implementation > flexible > > using some templating in the WebGLThread struct implementation, > which uses > > types that can be changed at compile time based on a cargo feature (or > > using a runtime preference just adding a enum wrapper). The thing > I like > > best is that we can easily change the threading and channel models > only > > modifying one or two small files, while the remaining WebGL will keep > > untouched. I created 2 WebGL threading/channel models based on > this idea > > (some interesting performance benchmarks at the end o the post!) > > > > * Multiprocess. This model creates a unique process/thread to > handle all > > WebGL commands from all canvases from all script threads. It uses > > ipc-channels for communication. Servo is not ready yet for a real > > multiprocess mode and we'll need to add a lot of platform > dependant code in > > Webrender and rust-offscreen-context (eg. > IOSurface/GLXPixmap/etc). But I > > wanted to add this model now so we can start testing or adding > some steps > > towards a real multiprocess mode. > > Well, this is hard assuming you need to share textures cross-process, > right? If so, why would we want to do that? In general the content > process shouldn't have access to the GPU. > > It'd be nice to have a small explanation about which commands are > executed in which process with your proposed architecture. > > > * Inprocess. This mode lazily creates a WebGL renderer thread for each > > Servo ScriptThread/tab and uses pure std mpsc channels. Using mpsc > channels > > has a lot of impact in the performace because the channel is > faster and it > > avoids serialization overhead (serialization also happens enabling the > > force-inprocess mode in servo/ipc-channel crate). w.r.t threading > model it > > could be easily changed to create a single thread for all > > ScriptThread/tabs. It was a bit more difficult to implement (and > to sync > > with the WR ExternalImage callbacks) but I opted for different > threads per > > tab because: > > -- This way we only need to use two threads at the same time for > sending > > WebGL commands from script to the renderer. This allows to use a spsc > > channel instead of a mpsc channel which allows a faster > shyncronization > > algorithm. AFAIK a rust std mpsc channel uses a faster spsc > algorithm until > > it is cloned. We could add another kind of ad-hoc spsc channel > > implementations for the fastest performance (have you used/ do you > > recommend any of them?) > > -- Having different threads per tab wii not exhaust CPU because > > requestAnimationFrame only runs in the current active tab per spec > so the > > other threads will be paused. > > > > A special mode for packaging WebGL applications > > -------------------------------------------------------------------- > > In addition to the multiprocess/inprocess modes I also added a > "packaging > > mode" to improve performance in this scenario. > > > > Disclaimer: Before joining Mozilla I was a core developer in > Cocoon.io. We > > created a custom "browser" from scratch called canvas+ (using > standalone V8 > > and JavaScriptCore VM engines) to run WebGL/Canvas2D games. We > were able to > > achieve better WebGL/Canvas performance than the Chromium/Safari > webviews > > available in Android/iOS. Some of the optimizations we did are not > allowed > > by the spec or security rules in a multiparadigm browser that loads > > arbitrary content. There are other engines/companies that opted > for this > > approach too instead of using multiparadigm webviews (e.g: Impact > Engine, > > Chukong and cocos2dx-html, etc.). > > > > My idea is to use this special mode (really a cargo feature) in order > > include some optimizations that these kind of standalone VM > engines do. It > > will only be enabled for packaging trusted source code (e.g. > android apks, > > electron, and more). When you package a tested and trusted source > code you > > don't really need a lot of the validations, error checking or security > > rules that the spec enforces or a full browser requires. Some > examples or > > forbidden things that could be allowed in this mode: > > > > * Immediate WebGL calls to a GL context living in the Script > thread (I did > > some benchmarks with this enabled (sometimes it's faster running > the gl > > call than sending it) . I'd like to focus on a thread bases render > path > > though in order to parallelize JS code better) > > * Rendering a full screen WebGL scene to the main window context (e.g. > > bypass all the compositor) > > * Disable some validations and error checking enforced by the spec. > > I don't love to have less error-checking in general, though I understand > the point... In any case, this sounds like the kind of standalone > feature that would be nice to have in its own PR (or commit)... > > Also, is there any use case for this for now? I think if we're going to > introduce a cargo feature for "Servo as electron-like back-end" > officially, we should do so separately, and add tests for this. > > > Faster compilation times > > ----------------------------------------------------------- > > > > WebGL code is now splitted into canvas_traits and canvas components. > > Changing the DOM code or traits is still compilation costly but > now we can > > change the WebGL commands implementation or do things like adding > logs for > > each WebGL call (which I tend to use for testing) with a lot shorter > > compilation times (no need to recompile WebRender_traits anymore!) > > > > Proper synchronization to avoid flickering issues > > ---------------------------------------------------------------- > > > > Flcikering issues have improved a lot in the three.js demos I have > tested. > > The synchronization is done using the ExternalImageHandler lock > and unlock > > calls that Glennw recommended combined with OpenGL Sync Objects. > > > > There are some flickering issues when moving the mouse on Linux. I > think > > that's probably a bug in servo. Mouse move seem to trigger extra > composites > > that aren't synced with the servo animation loop. IMO this could > be fixed > > when we finish the wip Glutin unfork ( > > https://github.com/servo/servo/pull/17311 > <https://github.com/servo/servo/pull/17311>) > > > > Benchmarks > > ----------------------------------------------------------------- > > Here are the benchmark results for the old WebGL implementation, > the three > > modes in the new Architecture (multiprocess, inprocess and > packaging) and > > Chrome/Firefox. > > > > I used a desktop linux to do the benchmarks (Ubuntu 16.04, GTX 1060, > > Skylake i7 4GHz) > > > > I used the Three.js performance test with a fixed camera, and with > 5K and > > 10 3D objects: https://threejs.org/examples/webgl_performance.html > <https://threejs.org/examples/webgl_performance.html> > > > > Browser------------------Average FPS-----------Average-FrameTime > > > > Servo Old WebGL 5K: -------26 fps------------------37.31 ms > > Servo Old WebGL 10K: ------13 fps------------------71.4 ms > > Servo "multiprocess" 5K----43 fps------------------22.61 ms > > Servo "multiprocess" 10K---21 fps------------------46.27 ms > > Servo "inprocess" 5K-------56-60fps----------------17.13 ms > > Servo "inprocess" 10K------28-30fps----------------33.5 ms > > Servo "packaging" 5K-------60fps-------------------15.83 ms > > Servo "packaging" 10K------32fps-------------------30.92 ms > > > > Firefox 5K------------------42-45fps---------------16.98 ms > > Firefox 10K-----------------25fps------------------34.01 ms > > > > Chrome 5K-------------------57-60fps---------------11.5 ms > > Chrome 10K------------------42-46ps----------------22.5 ms > > > > > > Future Steps > > ----------------------------------------------------------------- > > There are other ideas that I wanted to implement but I'll do that > in later > > PRs (or contributions in this areas are welcomed too!) > > > > * WebGL double buffering: It may be a good idea to improve > performance and > > synchronization (e.g. ping pong WebGL render textures each frame, > so WR > > uses a texture to composite the main context while the WebGL frame is > > rendering the next frame to a different texture). This may add > some memory > > overhead. We can implement some texture pooling (I think that > Firefox does > > that) > > > > * Test/benchmark different spsc channels or message batching or shared > > memory in order to improve frame times. We want to be fastest ones > in the > > benchmarks! > > > > * WebGL 2.0 branch (My idea is to create the layer that reuses > some code > > with WebGL 1.0 implementation and open a issue will a list of TODOs in > > WebGL 2.0 so contributors can start implementing features step by > step). > > > > I have some additional questions about Servo that I think may > affect the > > benchmark performances: > > > > * How does the SpiderMonkey version used in Servo compare to V8? > > Right now the SM version is quite old, though there's an attempt to > update it going on[1]. > > I could believe that a bunch of the benchmarks are kinda biased because > of this. > > > * Is the cost of JS to Rust call measured for normal functions and for > > functions with many arguments? is it compared to bindings implemented in > > Firefox/Chrome? When I worked on the standalone VM engine area the > binding > > JS/C++ binding code had a lot of impact. > > I remember people opening bugs about the overhead of function calls into > JS, and a bunch of fixes a while ago. That's probably going to change > substantially with the JS engine update though, so you should probably > talk with Nick or Josh about it. > > > Long post... Thanks for reading! > > Thanks for writing it, this is exciting stuff! > > -- Emilio > > _______________________________________________ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo