Hello
Although GCN has a large register file, these registers are distributed
among the threads (wavefronts) running on the same compute unit, so (up
to a point) the fewer registers used in a kernel, the more kernels can
run concurrently. While this is of limited use in trunk at the moment
with only single-worker offloading, hopefully it will be of more use in
the future.
These patches free up some of the registers that were previously fixed,
and restrict the number of registers used in non-kernel functions to 64
SGPRs and 24 VGPRs, as opposed to 102 SGPRs and 64 VGPRs before. Kernels
can still use as many registers as they need, but the minimum number
they must reserve is now reduced to the limit for non-kernel functions
(since kernels cannot in general know how many registers are used by the
functions they call, they need to reserve the maximum number of
registers usable by the callees).
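To illustrate the effect of the new limits, here is a rough sketch of the
occupancy arithmetic. It is not taken from the patches. The per-SIMD
figures (256 VGPRs, 800 SGPRs, at most 10 wavefronts) are assumptions
about gfx900-class hardware, the function names are hypothetical, and
register allocation granularity is ignored.

```c
/* Assumed per-SIMD hardware limits (illustrative, not from the patches). */
enum {
  ASSUMED_VGPRS_PER_SIMD = 256,
  ASSUMED_SGPRS_PER_SIMD = 800,
  ASSUMED_MAX_WAVES_PER_SIMD = 10
};

static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a > b ? a : b; }

/* Registers a kernel must reserve: its own usage, or the limit imposed
   on non-kernel functions if that is larger, since the kernel cannot
   know what its callees actually use.  */
int
reserved_regs (int kernel_usage, int callee_limit)
{
  return max_int (kernel_usage, callee_limit);
}

/* Estimated wavefronts that fit on one SIMD for a given per-wavefront
   register reservation.  Allocation granularity is ignored.  */
int
estimated_waves_per_simd (int sgprs, int vgprs)
{
  /* Treat non-positive counts as 1 to avoid dividing by zero.  */
  int by_sgpr = ASSUMED_SGPRS_PER_SIMD / max_int (sgprs, 1);
  int by_vgpr = ASSUMED_VGPRS_PER_SIMD / max_int (vgprs, 1);
  return min_int (ASSUMED_MAX_WAVES_PER_SIMD, min_int (by_sgpr, by_vgpr));
}
```

Under these assumptions, a small kernel that previously had to reserve
102 SGPRs and 64 VGPRs would be estimated at 4 wavefronts per SIMD,
whereas one reserving 64 SGPRs and 24 VGPRs would reach the assumed
maximum of 10.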
These patches need the patch 'Stash reent marker in upper bits of s1 on
AMD GCN' in newlib to free up s[2:3] (recently committed as commit
d14714c690c0b11b0aa7e6d09c930a321eeac7f9).
Tested in standalone configuration on a gfx900 target. I have not yet
tested the offload configuration with trunk sources as testsuite support
has not been committed yet - I will retest when this is done.
Internal offload testing (based on a branch of OG9) revealed a number of
regressions, but they are due to latent bugs exposed by the changes
rather than issues with this patchset. I have already posted fixes for
these in the following patches:
[PATCH] Support multiple registers for the frame pointer
[PATCH] [LRA] Do not use eliminable registers for spilling
[PATCH] Check suitability of spill register for mode
[PATCH] [GCN] Fix handling of VCC_CONDITIONAL_REG
Kwok