Author: jketema
Date: Mon Oct 9 11:36:48 2017
New Revision: 315228

URL: http://llvm.org/viewvc/llvm-project?rev=315228&view=rev
Log:
Make ptx barrier work irrespective of the cl_mem_fence_flags

This generates a "bar.sync 0" instruction, which not only causes the threads
to wait, but also acts as a memory fence, as required by OpenCL. The fence
does not differentiate between local and global memory. Unfortunately, there
is no similar instruction which does not include a memory fence. Hence, we
cannot optimize the case where neither CLK_LOCAL_MEM_FENCE nor
CLK_GLOBAL_MEM_FENCE is passed.

Modified:
    libclc/trunk/ptx-nvidiacl/lib/synchronization/barrier.cl

Modified: libclc/trunk/ptx-nvidiacl/lib/synchronization/barrier.cl
URL: http://llvm.org/viewvc/llvm-project/libclc/trunk/ptx-nvidiacl/lib/synchronization/barrier.cl?rev=315228&r1=315227&r2=315228&view=diff
==============================================================================
--- libclc/trunk/ptx-nvidiacl/lib/synchronization/barrier.cl (original)
+++ libclc/trunk/ptx-nvidiacl/lib/synchronization/barrier.cl Mon Oct 9 11:36:48 2017
@@ -1,8 +1,6 @@
 #include <clc/clc.h>
 
 _CLC_DEF void barrier(cl_mem_fence_flags flags) {
-  if (flags & CLK_LOCAL_MEM_FENCE) {
-    __syncthreads();
-  }
+  __syncthreads();
 }
 

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
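For readers who want to see which calls were affected, here is a minimal host-side sketch in plain C of the flag check before and after this revision. It is only a model, not libclc code: the names old_barrier_syncs, new_barrier_syncs, check_barrier_model and the MODEL_* constants are hypothetical, and the flag values are stand-ins chosen only to be distinct bits. Each function returns whether __syncthreads() would be reached for a given flags argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for this model (assumption: only the fact that the
   two flags occupy distinct bits matters here). */
enum {
    MODEL_LOCAL_MEM_FENCE  = 1 << 0,
    MODEL_GLOBAL_MEM_FENCE = 1 << 1
};

/* Before r315228: __syncthreads() was reached only when the local flag
   was set, so other flag values produced no barrier at all. */
bool old_barrier_syncs(unsigned flags) {
    return (flags & MODEL_LOCAL_MEM_FENCE) != 0;
}

/* After r315228: __syncthreads() is reached unconditionally. */
bool new_barrier_syncs(unsigned flags) {
    (void)flags; /* the flags no longer influence the result */
    return true;
}

/* Checks all four flag combinations against the behaviour described in
   the log message. */
void check_barrier_model(void) {
    const unsigned both = MODEL_LOCAL_MEM_FENCE | MODEL_GLOBAL_MEM_FENCE;

    assert(!old_barrier_syncs(0));
    assert(old_barrier_syncs(MODEL_LOCAL_MEM_FENCE));
    assert(!old_barrier_syncs(MODEL_GLOBAL_MEM_FENCE)); /* the fixed case */
    assert(old_barrier_syncs(both));

    assert(new_barrier_syncs(0));
    assert(new_barrier_syncs(MODEL_LOCAL_MEM_FENCE));
    assert(new_barrier_syncs(MODEL_GLOBAL_MEM_FENCE));
    assert(new_barrier_syncs(both));
}
```

Under this model, barrier(CLK_GLOBAL_MEM_FENCE) and barrier(0) are the two calls whose behaviour changes: previously they skipped the synchronization entirely, and now they synchronize like the other two combinations.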