POCL built against LLVM 10 (sid) or LLVM 11 (experimental) causes an autopkgtest regression on armhf in libgpuarray, while the test succeeded with LLVM 9:
https://ci.debian.net/packages/libg/libgpuarray/testing/armhf/

(The autopkgtest cannot be run in pure testing because the RC-buggy libclblas is missing there; it only works (and previously passed) in sid, or rather testing+sid. There are no problems on x86.)
The failing test can be called with

    POCL_CACHE_DIR=$(mktemp -d)/pocl-cache \
    DEVICE=opencl0:0 python3.9 -m nose -v pygpu.tests.test_blas

It terminates with a segmentation fault in LLVM.

The CL kernel is a piece of generated source code created by the (simplified) stack: python - libgpuarray - libclblas before it gets handed over to pocl. While I managed to extract the CL kernel source, I couldn't produce an OpenCL program that builds the kernel in the same way s.t. it triggers the segmentation fault.

Backtraces from coredumps:

#0 getEmissionKind () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/include/llvm/IR/DebugInfoMetadata.h:1244
#1 initialize () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LexicalScopes.cpp:53
#2 0xf827f2f0 in computeIntervals () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:979
#3 runOnMachineFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:996
#4 runOnMachineFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
#5 0xf82f46c8 in runOnFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6 0xf816e494 in runOnFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#7 0xf816e750 in runOnModule () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1517
#8 0xf816eba8 in runOnModule () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1582
#9 run () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1694
#10 0xfdcd2446 in pocl_llvm_codegen (Device=Device@entry=0x839f60, Modp=0x321bdc0, Output=Output@entry=0xfffea5b4, OutputSize=OutputSize@entry=0xfffea5c8) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xfdc9669e in llvm_codegen (output=output@entry=0x3763c40 "/tmp/tmp.hvljjDK8aD/pocl-cache/EG/BKJEEKFFENDHPDCNOBDADIAOJNAPPBJKDBOEM/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xfffebf88, device=0x839f60, command=command@entry=0xfffebfc0, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xfdc98304 in pocl_check_kernel_disk_cache (command=command@entry=0xfffebfc0, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xfdc98722 in pocl_check_kernel_dlhandle_cache (command=0xfffebfc0, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xfdc70534 in program_compile_dynamic_wg_binaries (program=program@entry=0x31af008) at ./lib/CL/pocl_build.c:179
#15 0xfdc8153c in get_binary_sizes (sizes=0xfffec0b8, program=0x31af008) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0x31af008, param_name=<optimized out>, param_value_size=4, param_value=0xfffec0b8, param_value_size_ret=0x0) at ./lib/CL/clGetProgramInfo.c:116
#17 0xcaf53722 in getSingleBinaryFromProgram (binary=std::vector of length 0, capacity 0, program=0x31af008) at ./src/library/blas/generic/binary_lookup.cc:392
#18 BinaryLookup::populateCache (this=this@entry=0xfffec138) at ./src/library/blas/generic/binary_lookup.cc:466
#19 0xcaf4f738 in makeKernelCached (device=0x839f60, context=0x820cd0, sid=sid@entry=320, key=key@entry=0xfffec2bc, kernelGenerator=kernelGenerator@entry=0xcaf7aad9 <generator(char*, size_t, SubproblemDim const*, PGranularity const*, void*)>, dims=0x2fb03d0, pgran=pgran@entry=0x2fb040c, extra=extra@entry=0xfffec304, buildOpts=buildOpts@entry=0xfffec55c "-g -DINCX_NONUNITY -DINCY_NONUNITY", error=error@entry=0xfffec240) at ./src/library/blas/generic/common2.cc:90
#20 0xcaf52662 in makeSolutionSeq (funcID=funcID@entry=CLBLAS_DOT, args=args@entry=0xfffec820, numCommandQueues=numCommandQueues@entry=1, commandQueues=commandQueues@entry=0x635598, numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, events=events@entry=0xfffec6c4, seq=seq@entry=0xfffec6c8) at ./src/library/blas/generic/solution_seq_make.c:587
#21 0xcaf3e9b6 in doDot (kargs=kargs@entry=0xfffec820, N=1, dotProduct=<optimized out>, offDP=0, X=0xe2afe8, offx=1, incx=2, Y=0xab71a8, offy=1, incy=2, scratchBuff=0x9d7ff0, doConj=0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:132
#22 0xcaf3eac8 in clblasSdot (N=<optimized out>, dotProduct=<optimized out>, offDP=<optimized out>, X=0xe2afe8, offx=1, incx=2, Y=0xab71a8, offy=1, incy=2, scratchBuff=0x9d7ff0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:193
#23 0xfea574c2 in sdot (N=<optimized out>, X=0xde9630, offX=1, incX=2, Y=0x4ccd20, offY=1, incY=2, Z=0xdf7410, offZ=0) at ./src/gpuarray_blas_opencl_clblas.c:212
#24 0xfea4425c in GpuArray_rdot (X=X@entry=0xca593174, Y=Y@entry=0xca593134, Z=Z@entry=0xca5931b4, nocopy=nocopy@entry=0) at ./src/gpuarray_array_blas.c:77
#25 0xca38e7d4 in __pyx_f_5pygpu_4blas_pygpu_blas_rdot (__pyx_v_X=__pyx_v_X@entry=0xca593168, __pyx_v_Y=__pyx_v_Y@entry=0xca593128, __pyx_v_Z=__pyx_v_Z@entry=0xca5931a8, __pyx_v_nocopy=__pyx_v_nocopy@entry=0) at pygpu/blas.c:1931
#26 0xca38edb4 in __pyx_pf_5pygpu_4blas_dot (__pyx_self=<optimized out>, __pyx_v_overwrite_z=<optimized out>, __pyx_v_Z=0xca5931a8, __pyx_v_Y=<optimized out>, __pyx_v_X=<optimized out>) at pygpu/blas.c:2871
#27 __pyx_pw_5pygpu_4blas_1dot (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at pygpu/blas.c:2757
#28 0x0009fff4 in cfunction_call (func=<built-in function dot>, args=<optimized out>, kwargs={'overwrite_z': True}) at ../Objects/methodobject.c:539
#29 0x00084ef8 in _PyObject_MakeTpCall (tstate=0x3eb0d8, callable=<built-in function dot>, args=0xfddde4b4, nargs=<optimized out>, keywords=<optimized out>) at ../Objects/call.c:191
#30 0x0007e618 in _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:116
#31 _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:103
#32 PyObject_Vectorcall (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>) at ../Include/cpython/abstract.h:127
[...]

#0 getEmissionKind () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/include/llvm/IR/DebugInfoMetadata.h:1282
#1 initialize () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LexicalScopes.cpp:54
#2 0xf7b02dfc in computeIntervals () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:971
#3 runOnMachineFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:988
#4 runOnMachineFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:1015
#5 0xf7b7c198 in runOnFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6 0xf79d43e4 in runOnFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1516
#7 0xf79d990c in runOnModule () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1552
#8 0xf79d494c in runOnModule () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1617
#9 run () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:614
#10 0xfdcd1b52 in pocl_llvm_codegen (Device=Device@entry=0x81dca8, Modp=0x36e3150, Output=Output@entry=0xfffea5b4, OutputSize=OutputSize@entry=0xfffea5c8) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xfdc95bb6 in llvm_codegen (output=output@entry=0xe5ce88 "/tmp/tmp.9VVoi1yAx0/pocl-cache/EK/PMEOKDJDCAGIGHNLHHOJHFMFBJEIDPNFKHIHE/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xfffebf88, device=0x81dca8, command=command@entry=0xfffebfc0, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xfdc9781c in pocl_check_kernel_disk_cache (command=command@entry=0xfffebfc0, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xfdc97c3a in pocl_check_kernel_dlhandle_cache (command=0xfffebfc0, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xfdc6fe20 in program_compile_dynamic_wg_binaries (program=program@entry=0x36df0f8) at ./lib/CL/pocl_build.c:179
#15 0xfdc80998 in get_binary_sizes (sizes=0xfffec0b8, program=0x36df0f8) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0x36df0f8, param_name=4453, param_value_size=4, param_value=0xfffec0b8, param_value_size_ret=0x0) at ./lib/CL/clGetProgramInfo.c:115
#17 0xca659722 in getSingleBinaryFromProgram (binary=std::vector of length 0, capacity 0, program=0x36df0f8) at ./src/library/blas/generic/binary_lookup.cc:392
#18 BinaryLookup::populateCache (this=this@entry=0xfffec138) at ./src/library/blas/generic/binary_lookup.cc:466
#19 0xca655738 in makeKernelCached (device=0x81dca8, context=0x821cd8, sid=sid@entry=320, key=key@entry=0xfffec2bc, kernelGenerator=kernelGenerator@entry=0xca680ad9 <generator(char*, size_t, SubproblemDim const*, PGranularity const*, void*)>, dims=0xe1b0c8, pgran=pgran@entry=0xe1b104, extra=extra@entry=0xfffec304, buildOpts=buildOpts@entry=0xfffec55c "-g -DINCX_NONUNITY -DINCY_NONUNITY", error=error@entry=0xfffec240) at ./src/library/blas/generic/common2.cc:90
#20 0xca658662 in makeSolutionSeq (funcID=funcID@entry=CLBLAS_DOT, args=args@entry=0xfffec820, numCommandQueues=numCommandQueues@entry=1, commandQueues=commandQueues@entry=0x635598, numEventsInWaitList=numEventsInWaitList@entry=0, eventWaitList=eventWaitList@entry=0x0, events=events@entry=0xfffec6c4, seq=seq@entry=0xfffec6c8) at ./src/library/blas/generic/solution_seq_make.c:587
#21 0xca6449b6 in doDot (kargs=kargs@entry=0xfffec820, N=1, dotProduct=<optimized out>, offDP=0, X=0xe5fc08, offx=1, incx=2, Y=0xe5f890, offy=1, incy=2, scratchBuff=0xe5d458, doConj=0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:132
#22 0xca644ac8 in clblasSdot (N=<optimized out>, dotProduct=<optimized out>, offDP=<optimized out>, X=0xe5fc08, offx=1, incx=2, Y=0xe5f890, offy=1, incy=2, scratchBuff=0xe5d458, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:193
#23 0xfea574c2 in sdot (N=<optimized out>, X=0xe5ce08, offX=1, incX=2, Y=0x4ccd20, offY=1, incY=2, Z=0x7ba990, offZ=0) at ./src/gpuarray_blas_opencl_clblas.c:212
#24 0xfea4425c in GpuArray_rdot (X=X@entry=0xc9c99174, Y=Y@entry=0xc9c99134, Z=Z@entry=0xc9c991b4, nocopy=nocopy@entry=0) at ./src/gpuarray_array_blas.c:77
#25 0xc9a947d4 in __pyx_f_5pygpu_4blas_pygpu_blas_rdot (__pyx_v_X=__pyx_v_X@entry=0xc9c99168, __pyx_v_Y=__pyx_v_Y@entry=0xc9c99128, __pyx_v_Z=__pyx_v_Z@entry=0xc9c991a8, __pyx_v_nocopy=__pyx_v_nocopy@entry=0) at pygpu/blas.c:1931
#26 0xc9a94db4 in __pyx_pf_5pygpu_4blas_dot (__pyx_self=<optimized out>, __pyx_v_overwrite_z=<optimized out>, __pyx_v_Z=0xc9c991a8, __pyx_v_Y=<optimized out>, __pyx_v_X=<optimized out>) at pygpu/blas.c:2871
#27 __pyx_pw_5pygpu_4blas_1dot (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at pygpu/blas.c:2757
#28 0x0009fff4 in cfunction_call (func=<built-in function dot>, args=<optimized out>, kwargs={'overwrite_z': True}) at ../Objects/methodobject.c:539
#29 0x00084ef8 in _PyObject_MakeTpCall (tstate=0x3eb0d8, callable=<built-in function dot>, args=0xfddde4b4, nargs=<optimized out>, keywords=<optimized out>) at ../Objects/call.c:191
#30 0x0007e618 in _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:116
#31 _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:103
#32 PyObject_Vectorcall (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>) at ../Include/cpython/abstract.h:127
[...]

Andreas
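
As a quick cross-check, the two backtraces above can be compared mechanically to confirm that the LLVM 10 and LLVM 11 builds crash in the same functions. The following is only a minimal Python sketch and not part of the original analysis: the helper name frame_functions and the regular expression are assumptions made for illustration, and the sample strings are the first three frames of each trace copied from above.

```python
import re

# Start of a gdb frame: "#N", an optional "0xADDR in", then the function name
# up to the opening parenthesis of the argument list.
# (Illustrative pattern, not an official gdb output grammar.)
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?([^\s(]+) \(")


def frame_functions(backtrace):
    """Return the function names, in frame order, found in gdb backtrace text."""
    return [name for _, name in FRAME_RE.findall(backtrace)]


# Top three frames of the LLVM 10 coredump, copied from the report.
LLVM10_TOP = (
    "#0 getEmissionKind () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/include/llvm/IR/DebugInfoMetadata.h:1244 "
    "#1 initialize () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LexicalScopes.cpp:53 "
    "#2 0xf827f2f0 in computeIntervals () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:979"
)

# Top three frames of the LLVM 11 coredump, copied from the report.
LLVM11_TOP = (
    "#0 getEmissionKind () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/include/llvm/IR/DebugInfoMetadata.h:1282 "
    "#1 initialize () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LexicalScopes.cpp:54 "
    "#2 0xf7b02dfc in computeIntervals () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:971"
)

# True when both traces name the same functions in the same order.
same_crash_path = frame_functions(LLVM10_TOP) == frame_functions(LLVM11_TOP)
```

Feeding the complete traces to frame_functions works the same way, since the pattern only keys on the "#N" frame markers and ignores addresses, arguments and build paths.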