On 10/24/2016 04:35 PM, Ilia Mirkin wrote:
On Mon, Oct 24, 2016 at 10:29 AM, Samuel Pitoiset
wrote:
Shared memory is local to a CTA, so we only need to wait for
prior memory writes to become visible to other threads in
the same CTA, not at the global level. This should speed up
compute shaders which use shared memory.
Signed-off-by: Samuel Pitoiset
---
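For illustration only, not part of the patch: a minimal sketch of the
idea described above, assuming a simplified model with just two memory
spaces and two barrier scopes. MemSpace, BarrierScope and
membarScopeFor are hypothetical names chosen for this example; they
are not identifiers from nv50_ir. The two scopes loosely correspond to
the CTA and global levels of the PTX membar instruction.

```cpp
// Hypothetical, simplified model. These names are illustrative only
// and are not the identifiers used in the nouveau codegen.
enum class MemSpace { Shared, Global };
enum class BarrierScope { Cta, Global };

// Pick the narrowest barrier scope that still makes prior writes
// visible to every thread able to observe the given memory space.
constexpr BarrierScope membarScopeFor(MemSpace space)
{
    // Shared memory is only reachable from threads of the same CTA,
    // so a CTA-level barrier is sufficient. Global memory can be
    // observed by any CTA and needs the wider scope.
    return space == MemSpace::Shared ? BarrierScope::Cta
                                     : BarrierScope::Global;
}

// Compile-time check of the mapping described in the commit message.
static_assert(membarScopeFor(MemSpace::Shared) == BarrierScope::Cta,
              "shared memory only needs a CTA-level barrier");
```

The narrower scope is what is expected to help: a CTA-level barrier
does not have to wait for writes to become visible outside the CTA.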
src/gallium/drivers/nouveau/codegen/nv50_ir