jdoerfert added inline comments.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:65-68
+ asm volatile("barrier.sync %0;"
+ :
+ : "r"(barrier)
+ : "memory");
----------------
ABataev wrote:
> Why not `__syncthreads`? It is safer to use `__syncthreads` as it is
> `convergent`. Would be good to mark this code somehow as `convergent` too to
> avoid incorrect optimizations
The problem is that `__syncthreads` is basically a `bar.sync`, which is a
`barrier.sync.aligned`, if I understood everything properly. This worked so far
because the "main thread" (lane 0, last warp) was alone in its warp and all
other threads in that warp had already terminated. Now, we simplify the control
flow (and later get rid of the last warp) such that the threads of the last
warp and the main thread will hit different barriers. The former hit the one in
the state machine while the latter will be in `parallel_51`. The `.aligned`
version doesn't allow that, since it requires all threads of a warp to execute
the same barrier instruction. Does that make sense?
I'm not concerned about `convergent` though; we solved that wholesale: we mark
all functions that clang compiles for the GPU via openmp-target as `convergent`
(IIRC). The entire device runtime is certainly `convergent`.
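To illustrate the `.aligned` point, here is a minimal standalone sketch, not code from the patch. The names `unaligned_barrier` and `divergent_sync` are made up for illustration. It assumes an sm_70 or newer target (e.g. `nvcc -arch=sm_70`), since on older targets `barrier.sync` without `.aligned` behaves like the aligned variant as far as I know. Lane 0 and the other lanes of the same warp reach barrier 0 from two different call sites, which is the situation described for the main thread and the rest of the last warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper mirroring the inline asm under review: a named barrier
// without the .aligned modifier.
__device__ void unaligned_barrier(unsigned barrier) {
  asm volatile("barrier.sync %0;" : : "r"(barrier) : "memory");
}

// Hypothetical kernel: lane 0 and the remaining lanes of its warp reach
// barrier 0 from two different call sites.
__global__ void divergent_sync(int *out) {
  __shared__ int value;
  if (threadIdx.x == 0) {
    value = 42;            // the "main thread" publishes a value
    unaligned_barrier(0);  // call site A
  } else {
    unaligned_barrier(0);  // call site B, same barrier id
  }
  out[threadIdx.x] = value;  // read only after passing the barrier
}

int main() {
  const int threads = 64;  // two warps in a single block
  int host[threads];
  int *dev = nullptr;
  if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return 1;

  divergent_sync<<<1, threads>>>(dev);
  // cudaMemcpy waits for the kernel and reports launch or execution errors.
  if (cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost) !=
      cudaSuccess)
    return 1;
  cudaFree(dev);

  // Count threads that did not observe the value written by thread 0.
  int mismatches = 0;
  for (int i = 0; i < threads; ++i)
    mismatches += (host[i] != 42);
  printf("mismatches: %d\n", mismatches);
  return mismatches != 0;
}
```

Replacing both calls with `__syncthreads()` would put an aligned barrier inside divergent code, which is what the `.aligned` restriction rules out. Note that the compiler is free to merge the two call sites here, so this only shows the shape of the problem and is not a reliable reproducer.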
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101976/new/
https://reviews.llvm.org/D101976
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits