Skylake changes the representation of shared local memory size: Size | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB | ------------------------------------------------------------------- Gen7-8 | 0 | none | none | 1 | 2 | 3 | 4 | 5 | ------------------------------------------------------------------- Gen9+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
The old formula would substantially underallocate the amount of space. This fixes GPU hangs on Skylake when running with full thread counts. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> --- src/mesa/drivers/dri/i965/gen7_cs_state.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c index 750aa2c..aff1f4e 100644 --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c @@ -150,11 +150,16 @@ brw_upload_cs_state(struct brw_context *brw) assert(prog_data->total_shared <= 64 * 1024); uint32_t slm_size = 0; if (prog_data->total_shared > 0) { - /* slm_size is in 4k increments, but must be a power of 2. */ - slm_size = 4 * 1024; - while (slm_size < prog_data->total_shared) - slm_size <<= 1; - slm_size /= 4 * 1024; + /* Shared Local Memory Size is specified as powers of two. */ + slm_size = util_next_power_of_two(prog_data->total_shared); + + if (brw->gen >= 9) { + /* Use a minimum of 1kB; turn an exponent of 10 (1024 kB) into 1. */ + slm_size = ffs(MAX2(slm_size, 1024)) - 10; + } else { + /* Use a minimum of 4kB; convert to the pre-Gen9 representation. */ + slm_size = MAX2(slm_size, 4096) / 4096; + } } desc[dw++] = -- 2.8.3 _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
