On 5/3/21 12:31 PM, Jakub Jelinek wrote: > On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote: >> The test-case included in this patch contains this target region: >> ... >> for (int i0 = 0 ; i0 < N0 ; i0++ ) >> counter_N0.i += 1; >> ... >> >> When running with nvptx accelerator, the counter variable is expected to >> be N0 after the region, but instead is N0 / 32. The problem is that rather >> than getting the result for all warp lanes, we get it for just one lane. >> >> This is caused by the implementation of SIMT being incomplete. It handles >> regular reductions, but appearantly not user-defined reductions. >> >> For now, make this explicit by erroring out for nvptx, like this: >> ... >> target-44.c: In function 'main': >> target-44.c:20:9: error: SIMT reduction not fully implemented >> ... >> >> Tested libgomp on x86_64-linux with and without nvptx accelerator. >> >> Any comments? > > If you want a workaround, the workaround should be to disable SIMT if > UDR reductions are seen, rather than erroring out. > So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf > isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER > and punt (set max_vf = 1) in that case. >
Thanks for the review, I've tried to implement this, see patch below. > The right thing is to implement it properly of course. Ack, I've taken a look, and for me itd doesn't look like a below-a-day kind of task, so unfortunately I don't have the time for this right now. Thanks, - Tom
[openmp, simt] Disable SIMT for user-defined reduction The test-case included in this patch contains this target region: ... for (int i0 = 0 ; i0 < N0 ; i0++ ) counter_N0.i += 1; ... When running with nvptx accelerator, the counter variable is expected to be N0 after the region, but instead is N0 / 32. The problem is that rather than getting the result for all warp lanes, we get it for just one lane. This is caused by the implementation of SIMT being incomplete. It handles regular reductions, but appearantly not user-defined reductions. For now, handle this by disabling SIMT in this case, specifically by setting sctx->max_vf to 1. Tested libgomp on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2021-05-03 Tom de Vries <tdevr...@suse.de> PR target/100321 * omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined reduction. libgomp/ChangeLog: 2021-05-03 Tom de Vries <tdevr...@suse.de> PR target/100321 * testsuite/libgomp.c/target-44.c: New test. --- gcc/omp-low.c | 8 ++++++++ libgomp/testsuite/libgomp.c/target-44.c | 27 +++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 7b122059c6e..bb8d3188c26 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -4385,6 +4385,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, sctx->max_vf = lower_bound (sctx->max_vf, safe_len); } } + if (sctx->is_simt && !known_eq (sctx->max_vf, 1U)) + { + tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt), + OMP_CLAUSE_REDUCTION); + if (c && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) + /* UDR reductions are not supported yet for SIMT, disable SIMT. */ + sctx->max_vf = 1; + } if (maybe_gt (sctx->max_vf, 1U)) { sctx->idx = create_tmp_var (unsigned_type_node); diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c new file mode 100644 index 00000000000..13e0c757845 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-44.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */ + +#include <stdlib.h> + +struct s +{ + int i; +}; + +#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i) + +int +main (void) +{ + const int N0 = 32768; + + struct s counter_N0 = { 0 }; +#pragma omp target +#pragma omp for simd reduction(+: counter_N0) + for (int i0 = 0 ; i0 < N0 ; i0++ ) + counter_N0.i += 1; + + if (counter_N0.i != N0) + abort (); + + return 0; +}