Hi everyone,
Over the weekend I created a small repo to play around with selector
matching on the GPU:
https://github.com/pcwalton/selectron
There's a rough prototype of selector matching in there, with CPU,
OpenCL (GPU and CPU), and CUDA versions. I've only tried it on my MBP's
GeForce GT 650M.
So far the numbers have not been particularly good: 10%-100% slower than
the parallel CPU numbers, depending on the workload size, even without
counting memory transfer. (It is assumed that anywhere we would want to
deploy this would have zero-copy operation.)
From my profiling, it seems as though the branchiness of selector
matching is not the problem: selector matching is surprisingly
straight-line if the hash tables are implemented properly. Rather, the
issue is that my GPU, at least, really wants to read multiple DOM nodes
from the same 128-byte cache line. Because it's not realistic to expect
more than a couple of DOM nodes to fit in a 128-byte cache line, 89%
(!!) of instructions end up being replayed because of memory reads.
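To make the access pattern concrete, here is a rough CUDA-style sketch of
the kind of kernel I mean. The Node fields and the id-hash matching are
made-up illustrations, not selectron's actual layout or code: one thread
per node, array-of-structures style, so each thread's load lands in a
different 128-byte cache line and the warp's reads can't coalesce.

    #include <cstdint>

    // Assumed layout for illustration only: a "DOM node" that is itself
    // a full 128 bytes, so consecutive nodes sit in different cache lines.
    struct Node {
        uint32_t id_hash;          // hashed id attribute
        uint32_t class_hashes[8];  // hashed class list
        uint32_t local_name;       // hashed tag name
        uint32_t parent;           // index of parent node
        uint32_t other[21];        // attributes, flags, etc.; sizeof(Node) == 128
    };

    __global__ void match_ids(const Node* nodes,
                              const uint32_t* selector_id_hashes,
                              uint32_t n_nodes, uint32_t n_selectors,
                              uint8_t* out_matched) {
        uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_nodes) return;
        // Thread i reads nodes[i]; with 128-byte nodes, every thread in the
        // warp touches a different cache line for this single load.
        uint32_t id = nodes[i].id_hash;
        uint8_t matched = 0;
        for (uint32_t s = 0; s < n_selectors; ++s)
            matched |= (id == selector_id_hashes[s]);
        out_matched[i] = matched;
    }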
Artificially shrinking DOM nodes to unrealistic sizes and packing them
together (cheating, as far as I'm concerned) brings the performance back
up. But I don't know how to make that work in the face of a dynamically
changing DOM.
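For contrast, here is what the packed ("cheating") variant looks like,
again with made-up names: structure-of-arrays, so a warp's 32 id-hash
loads land in a single 128-byte line. The catch, as above, is keeping
arrays like these dense and current as nodes are inserted and removed.

    #include <cstdint>

    // Assumed layout for illustration only: the fields needed for matching
    // are split out into densely packed arrays.
    struct PackedNodes {
        const uint32_t* id_hashes;    // one entry per node, contiguous
        const uint32_t* local_names;  // likewise
    };

    __global__ void match_ids_packed(PackedNodes nodes,
                                     const uint32_t* selector_id_hashes,
                                     uint32_t n_nodes, uint32_t n_selectors,
                                     uint8_t* out_matched) {
        uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_nodes) return;
        // Threads 0..31 of a warp read id_hashes[0..31]: one 128-byte
        // transaction instead of 32.
        uint32_t id = nodes.id_hashes[i];
        uint8_t matched = 0;
        for (uint32_t s = 0; s < n_selectors; ++s)
            matched |= (id == selector_id_hashes[s]);
        out_matched[i] = matched;
    }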
I'll try it on Kaveri soon, but the indications are that we'll have some
hurdles to overcome.
Patrick