Control: retitle -1 beignet: silently does nothing on large arrays
(previous discussion:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=781875 )
This isn't a Haswell-specific problem (and might not be ICD-specific
either: did you test that at these sizes, or only with the test suite?);
it's a large-array-specific problem. I simply hadn't tried arrays that
big before.
The array size needed to trigger it decreases as the number of arrays
increases (counting the arrays passed as arguments to the kernel being
run, not all arrays that exist), which suggests a limit on total memory.
On my system the thresholds are approximately 2 x 240 Mi floats,
3 x 160 Mi floats, or 5 x 100 Mi floats (Mi = 2^20), but these vary by
about 10% from run to run. (Hence, re-adding the per-array size limit
probably wouldn't completely avoid the problem, though I haven't
actually tried that.)
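Multiplying out the approximate thresholds shows why they point to a total-memory limit: all three configurations come to roughly 1.9 to 2 GiB of float32 buffers. The helper name `footprint_gib` is introduced here for illustration only, and the idea that 2 GiB (2^31 bytes) is the actual boundary is a guess prompted by these numbers; with ~10% run-to-run variation they cannot pin it down.

```python
# Total float32 buffer footprint of the approximate failing configurations
# above: (number of array arguments to the kernel, elements per array in Mi).
FLOAT32_BYTES = 4
MI = 2**20


def footprint_gib(n_arrays, mi_elems):
    """Hypothetical helper: total size in GiB of n_arrays float32 arrays."""
    total_bytes = n_arrays * mi_elems * MI * FLOAT32_BYTES
    return total_bytes / 2**30


for n_arrays, mi_elems in [(2, 240), (3, 160), (5, 100)]:
    print(f"{n_arrays} x {mi_elems} Mi float32 = "
          f"{footprint_gib(n_arrays, mi_elems):.3f} GiB")
# 2 x 240 Mi float32 = 1.875 GiB
# 3 x 160 Mi float32 = 1.875 GiB
# 5 x 100 Mi float32 = 1.953 GiB
```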
These sizes do _not_ appear to depend on free system memory, but given
their variability and the limited range I can test before running into
"running out of memory in beignet hangs the entire system"
(https://bugs.launchpad.net/ubuntu/+source/beignet/+bug/1354086 ), I
cannot be completely sure of this.
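If the threshold doesn't track free system memory, one thing it might track is what the ICD itself advertises. A sketch that prints the device's reported CL_DEVICE_GLOBAL_MEM_SIZE and CL_DEVICE_MAX_MEM_ALLOC_SIZE (exposed by pyopencl as `global_mem_size` and `max_mem_alloc_size`) next to the 5 x 100 Mi case; `check_footprint` is a hypothetical helper, and whether beignet's failure relates to either reported value is an assumption to test, not something established here.

```python
# Sketch: compare a kernel's buffer footprint with the limits the OpenCL
# device reports. check_footprint is a hypothetical helper, not a pyopencl API.


def check_footprint(n_arrays, n_elems, itemsize, max_alloc, global_mem):
    """Return (per_array_ok, total_ok) for n_arrays buffers of n_elems items."""
    per_array = n_elems * itemsize
    total = n_arrays * per_array
    return per_array <= max_alloc, total <= global_mem


def main():
    try:
        import pyopencl
    except ImportError:
        print("pyopencl not installed; skipping device query")
        return
    try:
        for platform in pyopencl.get_platforms():
            for dev in platform.get_devices():
                print(dev.name,
                      "global_mem_size", dev.global_mem_size,
                      "max_mem_alloc_size", dev.max_mem_alloc_size)
                # The 5-array case from the report: 5 x 100 Mi float32 values
                print("  5 x 100 Mi float32 (per-array ok, total ok):",
                      check_footprint(5, 100 * 2**20, 4,
                                      dev.max_mem_alloc_size,
                                      dev.global_mem_size))
    except pyopencl.Error as exc:
        print("OpenCL query failed:", exc)


if __name__ == "__main__":
    main()
```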
#!/usr/bin/env python3
#Depends: python3-pyopencl python3-numpy
from __future__ import division,print_function
import numpy as np
import pyopencl
import pyopencl.array
import pyopencl.elementwise
import pyopencl.tools
ctx=pyopencl.create_some_context()
cq=pyopencl.CommandQueue(ctx)
asize=100*(2**20)#fails above approx. 235*2**20 elements for the 2-array kernel, 162*2**20 for 3-array, 100*2**20 for 5-array, but the exact number varies
#Warning: very large sizes will hang your system, https://bugs.launchpad.net/ubuntu/+source/beignet/+bug/1354086
aCL=pyopencl.array.arange(cq,0,asize,1,dtype='float32')
bCL=pyopencl.array.arange(cq,0,asize,1,dtype='float32')
cCL=pyopencl.array.arange(cq,0,asize,1,dtype='float32')
dCL=pyopencl.array.arange(cq,0,asize,1,dtype='float32')
eCL=pyopencl.array.arange(cq,0,asize,1,dtype='float32')
print("CL arrays created")
ans=aCL[0:1000].get()*4 #expected: every kernel below sets c=4*a, since a, b, c, d and e all start as arange(asize)
ctype=pyopencl.tools.dtype_to_ctype(aCL.dtype)
f2=pyopencl.elementwise.ElementwiseKernel(ctx,ctype+" *a,"+ctype+" *c","c[i]=3*a[i]+c[i]","twoarray")
f3=pyopencl.elementwise.ElementwiseKernel(ctx,ctype+" *a,"+ctype+" *b,"+ctype+" *c","c[i]=3*a[i]+b[i]","threearray")
f5=pyopencl.elementwise.ElementwiseKernel(ctx,ctype+" *a,"+ctype+" *b,"+ctype+" *c,"+ctype+" *d,"+ctype+" *e","c[i]=a[i]+b[i]+d[i]+e[i]","fivearray")
f5b=pyopencl.elementwise.ElementwiseKernel(ctx,ctype+" *a,"+ctype+" *b,"+ctype+" *c,"+ctype+" *d,"+ctype+" *e","c[i]=4*e[i]","fivearray_usetwo")
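#Run one kernel at a time: each sets c=4*a, so if an earlier kernel succeeds,
#a later one that silently does nothing still leaves the correct answer in c
#f2(aCL,cCL).wait()
#f3(aCL,bCL,cCL).wait()
f5(aCL,bCL,cCL,dCL,eCL).wait()
#f5b(aCL,bCL,cCL,dCL,eCL).wait()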
print("size",len(aCL)," error ",np.max(np.nan_to_num(np.abs(cCL[0:1000].get()-ans))),"first 10 ",ans[0:10],cCL[0:10].get())
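The error printout separates correct from incorrect results but not why. Because every kernel writes 4*a into c, whose initial contents equal a, comparing the sample against both values distinguishes "silently did nothing" from "computed something wrong". A sketch using a hypothetical helper `classify_result` (not part of pyopencl or numpy); it assumes c has not already been written by an earlier kernel in the same run.

```python
def classify_result(result, expected, initial, tol=0.0):
    """Classify a sample of a kernel's output buffer (hypothetical helper).

    Returns "ok" if it matches the expected values, "unchanged" if it still
    equals the buffer's initial contents (the kernel silently did nothing),
    and "wrong" otherwise.
    """
    if not (len(result) == len(expected) == len(initial)):
        raise ValueError("samples must have the same length")

    def matches(xs, ys):
        return all(abs(x - y) <= tol for x, y in zip(xs, ys))

    if matches(result, expected):
        return "ok"
    if matches(result, initial):
        return "unchanged"
    return "wrong"


# Example mirroring the script: c starts as arange, the kernels write 4*a.
initial = [float(i) for i in range(10)]
expected = [4 * x for x in initial]
print(classify_result(expected, expected, initial))  # ok
print(classify_result(initial, expected, initial))   # unchanged
```

In the reproducer it could be called as `classify_result(cCL[0:1000].get(), ans, aCL[0:1000].get())`, since the numpy arrays returned by `.get()` support `len`, iteration and `abs`.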