https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81069

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to me from comment #1)
> With trunk r239357, the problem disappears when disabling the "Neuter whole
> SESE regions" optimization:
> 
>     --- gcc/config/nvptx/nvptx.c
>     +++ gcc/config/nvptx/nvptx.c
>     @@ -3719,7 +3719,7 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
>          {
>            int ix, len;
>      
>     -      if (nvptx_optimize)
>     +      if (0 && nvptx_optimize)
>             {
>               /* Neuter whole SESE regions.  */
>               bb_pair_vec_t regions;
> 
> Editing out any insignificant code changes, only the following diff remains
> in the PTX code generated with that optimization disabled/enabled:
> 
>     --- ["Neuter whole SESE regions" disabled]
>     +++ ["Neuter whole SESE regions" enabled (default)]
>     @@ -253,20 +253,22 @@
>                     setp.eq.u64     %r145, %r99, %r155;
>                     selp.u32        %r166, 1, 0, %r145;
>                     st.shared.u32   [__worker_bcast], %r166;
>      $L20:
>      $L19:
>                     bar.sync        0;
>                     ld.shared.u32   %r167, [__worker_bcast];
>                     setp.ne.u32     %r145, %r167, 0;
>                     bar.sync        1;
>             @!%r145 bra.uni $L9;
>     +       @%r162  bra.uni $L17;
>     +       @%r163  bra     $L18;
>                     bra     $L8;
>      $L15:
>             @%r162  bra.uni $L23;
>             @%r163  bra     $L24;
>                     ld.u32  %r146, [%r93+12];
>                     setp.ne.u32     %r148, %r146, %r142;
>                     selp.u32        %r170, 1, 0, %r148;
>                     st.shared.u32   [__worker_bcast], %r170;
>      $L24:
>      $L23:
>     @@ -291,22 +293,20 @@
>      $L6:
>             @%r162  bra.uni $L25;
>             @%r163  bra     $L26;
>             {
>                     call _gfortran_abort;
>                     trap; // (noreturn)
>             }
>      $L26:
>      $L25:
>      $L8:
>     -       @%r162  bra.uni $L17;
>     -       @%r163  bra     $L18;
>                     add.u32 %r56, %r56, %r87;
>                     add.u32 %r152, %r56, -1;
>                     setp.le.s32     %r154, %r152, %r158;
>                     selp.u32        %r164, 1, 0, %r154;
>                     st.shared.u32   [__worker_bcast], %r164;
>      $L18:
>      $L17:
>                     bar.sync        0;
>                     ld.shared.u32   %r165, [__worker_bcast];
>                     setp.ne.u32     %r154, %r165, 0;

I played with this some more.  Hand-editing the PTX code, I found that we still
get the hang when adding back the two !W0, !V0 "bra"s after "$L8" (while
keeping the two before "$L15") -- which, to my current understanding, should
make the code identical in behavior to the "good" version.  (And I do confirm
that when additionally removing the two !W0, !V0 "bra"s before "$L15", the
hand-edited code works fine.)

So, after all, maybe this also involves a problem (wrong code generation) in
PTX -> SASS compilation ("ptxas")?  (I say "also" because I still think that
the missing !W0, !V0 neutering/skipping after "$L8" is wrong.)
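
For reference, the hand-edited variant that still hangs would look roughly
like the following.  This is only a sketch reconstructed from the diff quoted
above, not the actual file: the "//" comments are annotations added here, and
unrelated instructions are elided as "[...]".

            @!%r145 bra.uni $L9;
            @%r162  bra.uni $L17;   // present with the optimization enabled
            @%r163  bra     $L18;   // present with the optimization enabled
                    bra     $L8;
     $L15:
            // [...]
     $L8:
            @%r162  bra.uni $L17;   // added back by hand
            @%r163  bra     $L18;   // added back by hand
                    add.u32 %r56, %r56, %r87;
                    add.u32 %r152, %r56, -1;
                    setp.le.s32     %r154, %r152, %r158;
                    selp.u32        %r164, 1, 0, %r154;
                    st.shared.u32   [__worker_bcast], %r164;
     $L18:
     $L17:
                    bar.sync        0;

Dropping the first pair of "bra"s (the ones before "bra $L8") from this sketch
gives the code generated with the optimization disabled, which does not hang.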


I also found that after removing the "call _gfortran_abort" and/or the "trap",
or replacing them with an "exit", we still see the hang, so I would assume
it's not something related to PR80035.
