Hello

Although GCN has a large register file, these registers are shared among the wavefronts running on the same compute unit, so (up to a point) the fewer registers a kernel uses, the more wavefronts can run concurrently. While this is of limited use in trunk at the moment, with only single-worker offloading, it will hopefully be more useful in the future.

These patches free up some of the registers that were previously fixed, and restrict non-kernel functions to 64 SGPRs and 24 VGPRs, down from 102 SGPRs and 64 VGPRs before. Kernels can still use as many registers as they need, but the minimum number of registers a kernel must reserve is now reduced to the non-kernel function limit. Since a kernel cannot in general know how many registers are used by the functions it calls, it has to reserve the maximum number of registers usable by any callee.

These patches need the patch 'Stash reent marker in upper bits of s1 on AMD GCN' in newlib to free up s[2:3] (recently committed as commit d14714c690c0b11b0aa7e6d09c930a321eeac7f9).

Tested in the standalone configuration on a gfx900 target. I have not yet tested the offload configuration with trunk sources, as testsuite support has not been committed yet; I will retest once it is. Internal offload testing (based on a branch of OG9) revealed a number of regressions, but these are due to latent bugs exposed by the changes rather than issues with this patchset. I have already posted fixes for them in the following patches:

[PATCH] Support multiple registers for the frame pointer
[PATCH] [LRA] Do not use eliminable registers for spilling
[PATCH] Check suitability of spill register for mode
[PATCH] [GCN] Fix handling of VCC_CONDITIONAL_REG

Kwok
