https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #14 from Tom de Vries <vries at gcc dot gnu.org> ---
An observation from playing around with vector-length-128-4.c: there are two
ways in which I can make the example pass.

1. Add barrier.sync.aligned 0 or membar.cta after the first broadcast receive.

2. Unroll the loop in the first broadcast send.

At first glance, though, neither looks entirely trivial to implement.
