https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81069
--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to me from comment #1)
> With trunk r239357, the problem disappears when disabling the "Neuter whole
> SESE regions" optimization:
>
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -3719,7 +3719,7 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
>  {
>    int ix, len;
>
> -  if (nvptx_optimize)
> +  if (0 && nvptx_optimize)
>      {
>        /* Neuter whole SESE regions.  */
>        bb_pair_vec_t regions;
>
> Editing out any insignificant code changes, only the following diff remains
> in the PTX code generated with that optimization disabled/enabled:
>
> --- ["Neuter whole SESE regions" disabled]
> +++ ["Neuter whole SESE regions" enabled (default)]
> @@ -253,20 +253,22 @@
>      setp.eq.u64 %r145, %r99, %r155;
>      selp.u32 %r166, 1, 0, %r145;
>      st.shared.u32 [__worker_bcast], %r166;
>  $L20:
>  $L19:
>      bar.sync 0;
>      ld.shared.u32 %r167, [__worker_bcast];
>      setp.ne.u32 %r145, %r167, 0;
>      bar.sync 1;
>      @!%r145 bra.uni $L9;
> +    @%r162 bra.uni $L17;
> +    @%r163 bra $L18;
>      bra $L8;
>  $L15:
>      @%r162 bra.uni $L23;
>      @%r163 bra $L24;
>      ld.u32 %r146, [%r93+12];
>      setp.ne.u32 %r148, %r146, %r142;
>      selp.u32 %r170, 1, 0, %r148;
>      st.shared.u32 [__worker_bcast], %r170;
>  $L24:
>  $L23:
> @@ -291,22 +293,20 @@
>  $L6:
>      @%r162 bra.uni $L25;
>      @%r163 bra $L26;
>      {
>      call _gfortran_abort;
>      trap; // (noreturn)
>      }
>  $L26:
>  $L25:
>  $L8:
> -    @%r162 bra.uni $L17;
> -    @%r163 bra $L18;
>      add.u32 %r56, %r56, %r87;
>      add.u32 %r152, %r56, -1;
>      setp.le.s32 %r154, %r152, %r158;
>      selp.u32 %r164, 1, 0, %r154;
>      st.shared.u32 [__worker_bcast], %r164;
>  $L18:
>  $L17:
>      bar.sync 0;
>      ld.shared.u32 %r165, [__worker_bcast];
>      setp.ne.u32 %r154, %r165, 0;

I played with this some more.  Hand-editing the PTX code, I found that we
still get the hang if adding back the two !W0, !V0 "bra"s after "$L8" --
which, to my current understanding, should make the code identical in
behavior to the "good" version.
(And, I do confirm that when additionally removing the two !W0, !V0 "bra"s
before "$L15", the hand-edited code works fine.)

So, after all, maybe this also involves a problem (wrong code generation) in
PTX -> SASS compilation ("ptxas")?  (I say "also" because I still think that
the missing !W0, !V0 neutering/skipping after "$L8" is wrong.)

I also found that when removing the "call _gfortran_abort" and/or the "trap",
or when replacing these with an "exit", we still see the hang, so I assume
this is not related to PR80035.
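
As a reading aid, here is the neutered block after "$L8" from the "good" version, with comments added. This is an annotated fragment of the PTX quoted above, not a standalone kernel. The register meanings are an assumption inferred from the "!W0, !V0" labels used in this comment: %r162 is taken to be set for threads outside worker 0, and %r163 for threads outside vector lane 0.

```ptx
$L8:
    // Assumption: %r162 == "not worker 0" (!W0), %r163 == "not vector lane 0" (!V0).
    @%r162 bra.uni $L17;                    // !W0: skip the single-threaded block
    @%r163 bra $L18;                        // !V0: skip the single-threaded block
    add.u32 %r56, %r56, %r87;               // only the W0/V0 thread gets here
    add.u32 %r152, %r56, -1;
    setp.le.s32 %r154, %r152, %r158;
    selp.u32 %r164, 1, 0, %r154;
    st.shared.u32 [__worker_bcast], %r164;  // publish the condition for the others
$L18:
$L17:
    bar.sync 0;                             // neutered and active threads meet here
    ld.shared.u32 %r165, [__worker_bcast];  // every thread reads the broadcast value
    setp.ne.u32 %r154, %r165, 0;            // and rebuilds the predicate locally
```

In the "bad" version the two predicated "bra"s sit before "bra $L8" instead, so any other path that reaches "$L8" runs the block without being neutered, which is the placement questioned above.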