The basic idea is for each thread to save its own result in SLM and then read
back all the workgroup results. I've corrected the initial mistake with the SLM
offset as follows:

Correctness problem 1:
/* write result data to SLM at dword offset (threadID + threadNum*groupID),
   scaled to bytes and added to the SLM base address */
threadNum = sel.selReg(ocl::threadn, ir::TYPE_U32);
sel.MUL(addr, groupID, threadNum);
sel.ADD(addr, addr, sel.selReg(ocl::threadid, ir::TYPE_U32));
sel.MUL(addr, addr, GenRegister::immud(4));
sel.ADD(addr, addr, GenRegister::immud(slmAddr));
Should this be enough to ensure that SLM conflicts don't occur?
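
For reference, the address arithmetic above can be checked on the host with a
small sketch. It only mirrors the MUL/ADD sequence and says nothing about how
the hardware allocates SLM between workgroups. The function names and the
slmBase parameter (standing in for slmAddr) are made up for illustration, and
it assumes threadID < threadNum and one dword per thread.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Byte offset mirroring the instruction sequence:
// (groupID * threadNum + threadID) * 4 + slmBase.
// slmBase is a hypothetical stand-in for slmAddr.
inline uint32_t slmByteOffset(uint32_t groupID, uint32_t threadNum,
                              uint32_t threadID, uint32_t slmBase) {
  assert(threadID < threadNum);  // assumption: thread IDs are dense per group
  return (groupID * threadNum + threadID) * 4u + slmBase;
}

// Returns true when every (group, thread) pair maps to a distinct dword slot.
inline bool slotsAreDistinct(uint32_t groups, uint32_t threadNum,
                             uint32_t slmBase) {
  std::set<uint32_t> seen;
  for (uint32_t g = 0; g < groups; ++g)
    for (uint32_t t = 0; t < threadNum; ++t)
      if (!seen.insert(slmByteOffset(g, threadNum, t, slmBase)).second)
        return false;  // two threads would write the same slot
  return true;
}
```

Under those assumptions, slotsAreDistinct(2, 4, 0) holds for the 2 workgroup x
4 thread case, so the offsets themselves do not overlap as long as threadNum
is the same for every group.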

Correctness problem 2:
It looks as though the input data that gets read belongs only to the first
local workgroup. For example, if I have (1 2 3 ... 128) as the input src and
4 threads (x16 elements each) x 2 workgroups, only the elements
(1 2 3 ... 64) are read by each workgroup.
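
To make the expected behaviour concrete, here is a rough sketch of the index
mapping I would expect for the input, assuming a flat 1D layout in which each
thread covers simdWidth consecutive elements. The function name and parameters
are hypothetical and are not taken from the backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flat index of the element handled by a given lane:
// groupID * localSize + threadID * simdWidth + lane,
// where localSize = threadsPerGroup * simdWidth.
inline uint32_t globalElemIndex(uint32_t groupID, uint32_t threadsPerGroup,
                                uint32_t threadID, uint32_t simdWidth,
                                uint32_t lane) {
  assert(threadID < threadsPerGroup);
  assert(lane < simdWidth);
  const uint32_t localSize = threadsPerGroup * simdWidth;
  return groupID * localSize + threadID * simdWidth + lane;
}
```

With 4 threads x 16 lanes, workgroup 1 would start at index 64 (value 65 in
the 1..128 input) under this mapping. Dropping the groupID term gives indices
0..63 for both workgroups, which matches what I observe.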

Efficiency problem 1:
In a configuration of 512 elements (local size 64), only 4 threads launch at a
time (according to the threadID register). The execution looks like the dump
below. Launching 4096 or more elements with a local size of 64 again issues
only 4 threads. I would expect this to be a waste of hardware. Is this normal?
Or maybe this is per EU?
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 
3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 
3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
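
The pattern above is what one would get if threadID were a workgroup-local
index, i.e. local size / SIMD width = 64 / 16 = 4 distinct values, repeated
once per workgroup (512 / 64 = 8 times). That interpretation, and the SIMD16
width, are assumptions on my side; the sketch below only reproduces the
arithmetic, and the function name is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Expected per-element thread ID under the assumption that threadID is
// local to a workgroup: (element index within the group) / simdWidth.
inline std::vector<uint32_t> expectedThreadIds(uint32_t globalSize,
                                               uint32_t localSize,
                                               uint32_t simdWidth) {
  assert(localSize != 0 && simdWidth != 0);
  std::vector<uint32_t> ids(globalSize);
  for (uint32_t i = 0; i < globalSize; ++i)
    ids[i] = (i % localSize) / simdWidth;  // wraps back to 0 at each new group
  return ids;
}
```

If this assumption holds, seeing only IDs 0..3 would not by itself mean that
only 4 hardware threads run at once, since each workgroup would reuse the same
IDs.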


_______________________________________________
Beignet mailing list
http://lists.freedesktop.org/mailman/listinfo/beignet
