Control: retitle -1 clblas: *gemm wrong answers in out-of-order queues
Control: reassign -1 src:clblas
Control: found -1 2.12-1
I think I've found the actual bug, in clblas src/library/blas/xgemm.cc: clblasGemm (with a single command queue) enqueues up to 4 kernels but returns an event that depends only on the last of them. If the queue is out-of-order, waiting on this event therefore doesn't necessarily wait for all of the kernels to finish.
This was previously noticed in https://github.com/clMathLibraries/clBLAS/issues/269#issuecomment-225453543 , but not actually reported as a bug.
clblas includes a client/performance tester that creates an out-of-order queue (at src/client/clfunc_common.hpp:306), implying that it intends to allow such queues. (We don't run clblas' own tests, possibly because of https://github.com/clMathLibraries/clBLAS/issues/338.)
The real fix would be to return an event that depends on all the kernels' events (e.g. created with clEnqueueMarkerWithWaitList).
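To illustrate the difference, here is a toy model in Python. It is not OpenCL or clblas code: the class OutOfOrderQueue, the two gemm_* functions and the kernel names are all made up for this sketch. The "newest ready command first" scheduling is an assumption chosen to expose the problem; a real runtime may pick any order that respects the wait lists. The marker command stands in for what clEnqueueMarkerWithWaitList would provide.

```python
class OutOfOrderQueue:
    """Toy stand-in for an out-of-order command queue (not the OpenCL API)."""

    def __init__(self):
        self.pending = []   # (name, wait_list) pairs enqueued but not yet run
        self.finished = []  # names of completed commands, in completion order

    def enqueue(self, name, wait_list=()):
        """Enqueue a command; its name doubles as its event."""
        self.pending.append((name, tuple(wait_list)))
        return name

    def wait_for(self, event):
        """Run ready commands until `event` completes; return what finished."""
        while event not in self.finished:
            # Assumed scheduling: newest command whose wait list is satisfied.
            for cmd in reversed(self.pending):
                name, deps = cmd
                if all(d in self.finished for d in deps):
                    self.pending.remove(cmd)
                    self.finished.append(name)
                    break
            else:
                raise RuntimeError("no runnable command; event can never complete")
        return list(self.finished)


def gemm_last_event_only(queue):
    """Models the reported behaviour: return only the last kernel's event."""
    events = [queue.enqueue(f"kernel{i}") for i in range(4)]
    return events[-1]


def gemm_with_marker(queue):
    """Models the proposed fix: return a marker that waits on every kernel."""
    events = [queue.enqueue(f"kernel{i}") for i in range(4)]
    return queue.enqueue("marker", wait_list=events)


if __name__ == "__main__":
    q = OutOfOrderQueue()
    print(q.wait_for(gemm_last_event_only(q)), "still pending:", len(q.pending))

    q = OutOfOrderQueue()
    print(q.wait_for(gemm_with_marker(q)), "still pending:", len(q.pending))
```

In this model, waiting on the last kernel's event can return while the three earlier kernels are still pending, whereas waiting on the marker cannot return until all four have completed.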
As a workaround for now, I intend to disable out-of-order queues in libgpuarray. (It appears to be the only reverse dependency of clblas that also uses out-of-order queues.)