On 31/01/2020 13:56, Kwok Cheung Yeung wrote:
The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs
per SIMD unit. OpenMP threads or OpenACC workers must be distributed
across the SIMD units, with each thread/worker fitting entirely within a
single SIMD unit. VGPRs are shared b
The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs per SIMD
unit. OpenMP threads or OpenACC workers must be distributed across the SIMD
units, with each thread/worker fitting entirely within a single SIMD unit. VGPRs
are shared by the kernels running in a SIMD unit, so we can