jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:65-68
+  asm volatile("barrier.sync %0;"
+               :
+               : "r"(barrier)
+               : "memory");
----------------
ABataev wrote:
> Why not `__syncthreads`? It is safer to use `__syncthreads` as it is 
> `convergent`. Would be good to mark this code somehow as `convergent` too to 
> avoid incorrect optimizations
The problem is that `__syncthreads` is basically a `bar.sync`, which is a 
`barrier.sync.aligned`, if I understood everything properly. This worked so far 
because the "main thread" (lane 0, last warp) was alone in its warp and all 
other threads had been terminated. Now we simplify the control flow (and 
later get rid of the last warp) such that the threads of the last warp and the 
main thread will hit different barriers. The former hit the one in the state 
machine while the latter will be in `parallel_51`. The `.aligned` version 
doesn't allow that. Does that make sense?
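
To make the scenario concrete, here is a minimal standalone sketch, not code from the patch. The names `namedBarrierSync` and `divergentBarrierKernel` and the choice of barrier 1 are made up for illustration. Lane 0 of the last warp and the remaining threads of that same warp reach the barrier through different call sites, which is the situation described above. As far as I understand the PTX documentation, the non-`.aligned` form only behaves differently from the aligned one on sm_70 and newer, so the sketch assumes something like `nvcc -arch=sm_70`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper around the unaligned PTX barrier, mirroring the inline
// asm in the patch. Barrier 0 is the one __syncthreads() uses, so this sketch
// picks barrier 1 (an arbitrary choice for illustration).
__device__ __forceinline__ void namedBarrierSync(unsigned Barrier) {
  asm volatile("barrier.sync %0;" : : "r"(Barrier) : "memory");
}

__global__ void divergentBarrierKernel(int *Flag, int *Seen) {
  // First thread of the last warp, i.e. the "main thread" (lane 0, last warp).
  const unsigned LastWarpStart = ((blockDim.x - 1) / warpSize) * warpSize;
  const bool IsMainThread = threadIdx.x == LastWarpStart;

  if (IsMainThread) {
    *Flag = 42;          // written before the barrier
    namedBarrierSync(1); // call site A: only the main thread arrives here
  } else {
    namedBarrierSync(1); // call site B: all other threads, including the
                         // rest of the last warp
  }
  // After the barrier every thread reads the value the main thread wrote.
  Seen[threadIdx.x] = *Flag;
}

int main() {
  const int Threads = 64; // two warps, so the last warp is threads 32..63
  int *Flag = nullptr;
  int *Seen = nullptr;
  cudaMallocManaged(&Flag, sizeof(int));
  cudaMallocManaged(&Seen, Threads * sizeof(int));
  *Flag = 0;

  divergentBarrierKernel<<<1, Threads>>>(Flag, Seen);
  cudaError_t Err = cudaDeviceSynchronize();
  if (Err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(Err));
    return 1;
  }

  // Check on the host that every thread observed the main thread's write.
  bool AllSaw = true;
  for (int I = 0; I < Threads; ++I)
    AllSaw = AllSaw && (Seen[I] == 42);
  std::printf("%s\n", AllSaw ? "all threads saw the flag" : "mismatch");

  cudaFree(Flag);
  cudaFree(Seen);
  return AllSaw ? 0 : 1;
}
```

Replacing both calls with `__syncthreads()` would put an aligned barrier inside a branch on which the last warp diverges, which is what the `.aligned` form does not permit.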

I'm not concerned about `convergent` though; we solved that wholesale: we mark 
all functions that clang compiles for the GPU via openmp-target as convergent 
(IIRC). The entire device runtime is certainly convergent.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101976/new/

https://reviews.llvm.org/D101976

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
