Author: jvesely
Date: Thu Mar 8 10:58:07 2018
New Revision: 327044

URL: http://llvm.org/viewvc/llvm-project?rev=327044&view=rev
Log:
amdgcn,popcount: Workaround broken llvm.ctpop intrinsic on some GCN ASICs

This is only really needed for VI+ ASICs. However, llvm would cast the value
to i32 for older asics anyway. The proper fix is in LLVM-7 (r326535).
Fixes CTS popcount on carrizo.

Reviewer: Aaron Watry <awa...@gmail.com>
Signed-off-by: Jan Vesely <jan.ves...@rutgers.edu>

Added:
    libclc/trunk/amdgcn/lib/integer/
    libclc/trunk/amdgcn/lib/integer/popcount.cl
    libclc/trunk/amdgcn/lib/integer/popcount.inc
Modified:
    libclc/trunk/amdgcn/lib/SOURCES

Modified: libclc/trunk/amdgcn/lib/SOURCES
URL: http://llvm.org/viewvc/llvm-project/libclc/trunk/amdgcn/lib/SOURCES?rev=327044&r1=327043&r2=327044&view=diff
==============================================================================
--- libclc/trunk/amdgcn/lib/SOURCES (original)
+++ libclc/trunk/amdgcn/lib/SOURCES Thu Mar 8 10:58:07 2018
@@ -1,4 +1,5 @@
 cl_khr_int64_extended_atomics/minmax_helpers.ll
+integer/popcount.cl
 math/ldexp.cl
 mem_fence/fence.cl
 synchronization/barrier.cl

Added: libclc/trunk/amdgcn/lib/integer/popcount.cl
URL: http://llvm.org/viewvc/llvm-project/libclc/trunk/amdgcn/lib/integer/popcount.cl?rev=327044&view=auto
==============================================================================
--- libclc/trunk/amdgcn/lib/integer/popcount.cl (added)
+++ libclc/trunk/amdgcn/lib/integer/popcount.cl Thu Mar 8 10:58:07 2018
@@ -0,0 +1,6 @@
+#include <clc/clc.h>
+#include <utils.h>
+#include <integer/popcount.h>
+
+#define __CLC_BODY "popcount.inc"
+#include <clc/integer/gentype.inc>

Added: libclc/trunk/amdgcn/lib/integer/popcount.inc
URL: http://llvm.org/viewvc/llvm-project/libclc/trunk/amdgcn/lib/integer/popcount.inc?rev=327044&view=auto
==============================================================================
--- libclc/trunk/amdgcn/lib/integer/popcount.inc (added)
+++ libclc/trunk/amdgcn/lib/integer/popcount.inc Thu Mar 8 10:58:07 2018
@@ -0,0 +1,17 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE popcount(__CLC_GENTYPE x) {
+/* LLVM-4+ implements i16 ops for VI+ ASICs. However, ctpop implementation
+ * is missing until r326535. Therefore we have to convert sub i32 types to uint
+ * as a workaround. */
+#if __clang_major__ < 7 && __clang_major__ > 3 && __CLC_GENSIZE < 32
+  /* Prevent sign extension on uint conversion */
+  const __CLC_U_GENTYPE y = __CLC_XCONCAT(as_, __CLC_U_GENTYPE)(x);
+  /* Convert to uintX */
+  const __CLC_XCONCAT(uint, __CLC_VECSIZE) z = __CLC_XCONCAT(convert_uint, __CLC_VECSIZE)(y);
+  /* Call popcount on uintX type */
+  const __CLC_XCONCAT(uint, __CLC_VECSIZE) res = __clc_native_popcount(z);
+  /* Convert the result back to gentype. */
+  return __CLC_XCONCAT(convert_, __CLC_GENTYPE)(res);
+#else
+  return __clc_native_popcount(x);
+#endif
+}

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits