On 12/02/15 09:41, Alexander Monakov wrote:
On Wed, 2 Dec 2015, Nathan Sidwell wrote:
On 12/02/15 05:40, Jakub Jelinek wrote:
Don't know the HW well enough, is there any power consumption, heat etc.
difference between the two approaches? I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?
Having all threads busy will increase power consumption.
Is that from general principles (i.e. "if it doesn't increase power
consumption, the GPU is poorly optimized"), or is that based on specific
knowledge on how existing GPUs operate (presumably reverse-engineered or
privately communicated -- I've never seen any public statements on this
point)?
Nvidia told me.
The only certain case I imagine is instructions that go to SFU rather than
normal SPs -- but those are relatively rare.
It's also bad if the other vectors are executing memory access instructions.
How so? The memory accesses are the same independent of whether you are
reading the same data from 1 thread or 32 synchronous threads.
Nvidia told me.