I'm not sure if using a scratch buffer per command buffer is correct.
AFAIU each ring has a separate counter for the scratch offsets, and if a
command buffer is used in multiple compute rings at the same time, these
separate counters could conflict.
I'd think we need a preamble IB per queue that s
From: Dave Airlie
Currently LLVM 5.0 has support for spilling to a place
pointed to by the user sgprs instead of using relocations.
This is enabled by using the amdgcn-mesa-mesa3d triple.
For compute gfx shaders we spill to a buffer pointed to
by a 64-bit address stored in sgprs 0/1.
For other gf