Ubuntu 14.04 x64, Linux gfxi 3.19.0-33-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, Intel(R) HD Graphics 5500

[Variants workgroupOpInThread]
1. Current single ADD(1) x16 times (simd=16) => 408ms => [Result: 1.608 Msum/S]
2. OP DP4(4) + ADD(1) x4 times (simd=16) => 384ms => [Result: 1.707 Msum/S]
3. OP ADD(4) + ADD(1) x4 times (simd=16) => 378ms => [Result: 1.730 Msum/S]
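
To make the three variants easier to compare, here is a minimal sketch of how
each one could sum the 16 values of a SIMD16 thread. It is only a model of my
reading of the variant names, not the code Beignet generates: the function
names (sum_add1, sum_dp4, sum_add4) are hypothetical, and the way the 4-wide
operations are combined with the scalar ADD(1) is an assumption.

```python
# Hypothetical model of the three variants (not Beignet's generated code).
# Each function sums the 16 per-lane values of one SIMD16 thread.

def sum_add1(values):
    """Variant 1: one scalar ADD(1) per value, 16 adds in total."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

def sum_dp4(values):
    """Variant 2 (assumed shape): DP4 of each 4-value chunk with a vector
    of ones, followed by one scalar ADD(1) into the accumulator."""
    ones = [1.0, 1.0, 1.0, 1.0]
    acc = 0.0
    for i in range(0, len(values), 4):
        chunk = values[i:i + 4]
        acc += sum(c * o for c, o in zip(chunk, ones))  # DP4, then ADD(1)
    return acc

def sum_add4(values):
    """Variant 3 (assumed shape): 4-wide ADD(4) into four partial sums,
    then scalar ADD(1)s to fold the partial sums together."""
    partial = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, len(values), 4):
        chunk = values[i:i + 4]
        partial = [p + c for p, c in zip(partial, chunk)]  # ADD(4)
    return partial[0] + partial[1] + partial[2] + partial[3]

lanes = [float(i) for i in range(1, 17)]  # example lane values 1..16
```

All three return the same sum for the same input; only the number and width
of the operations differ, which is what the timings above measure.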

No call to workgroupOpInThread => 347ms => [Result: 1.886 Msum/S].

Relative to the 347ms baseline, the call to workgroupOpInThread costs 61ms
with the current ADD(1) code (408ms) and 31ms with ADD(4) (378ms), so ADD(4)
roughly halves the overhead, from ~60ms to ~30ms.
DP4(4) comes close at 37ms of overhead (384ms), slightly behind ADD(4), and it
is restricted in data type (it can only be used with float).
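
The overhead figures follow directly from the timings in this mail. A small
sketch of the arithmetic, using only the numbers reported here (the variable
names are mine):

```python
# Timings reported in this mail, in milliseconds.
BASELINE_MS = 347  # no call to workgroupOpInThread

timings_ms = {
    "1. ADD(1) x16": 408,
    "2. DP4(4) + ADD(1) x4": 384,
    "3. ADD(4) + ADD(1) x4": 378,
}

# Overhead of workgroupOpInThread = variant time minus baseline time.
overhead_ms = {name: t - BASELINE_MS for name, t in timings_ms.items()}

# The variant with the smallest overhead.
best = min(overhead_ms, key=overhead_ms.get)

for name, extra in overhead_ms.items():
    print(f"{name}: {extra}ms over baseline")
print("lowest overhead:", best)
```

This is a single run per variant, so the 6ms gap between variants 2 and 3
should be read with that in mind.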

I conclude that variant 3, ADD(4) + ADD(1), is the best of the three.
_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet
