https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 43992
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43992&action=edit
tentative patch

(In reply to Tom de Vries from comment #4)
> This looks like a JIT bug, but with this tentative patch:
> ...
> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index 8c478c874bd..ac394ee1ae6 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -4479,7 +4479,7 @@ nvptx_process_pars (parallel *par)
>           threads = nvptx_mach_vector_length ();
>         }
>  
> -      if (!empty || !is_call)
> +      if (!(empty || is_call))
>         {
>           /* Insert begin and end synchronizations.  */
>           emit_insn_before (nvptx_cta_sync (barrier, threads),
> ...
> no barriers are generated, and the minimized testcase passes.

Actually, it's a bit more complicated than that.

The condition correctly identifies the situation of a call and no state
propagation as not needing barriers.

But when it's not a call (so, fork/join), we need synchronization at the end
of worker and vector loops, even if there's no state propagation.

[ More precisely, we need it in between loops, but we're currently not
detecting that situation.

And even more precisely, we need it in between dependent loops, but we're also
currently not detecting that situation. ]

This patch skips the first of the bar.syncs, keeping the one after the loop,
and that allows parallel-loop-1.c to pass with vector length 128.
