On Wed, 21 Mar 2018, David G. Wonnacott wrote:
for i in 0..w-1 {
  xm1 = 0.0;
  y1.resetScalars();
  for j in 0..h-1 {
    y1.set(i,j, a1*imgIn[i,j] + a2*xm1 + b1*y1.get_mRW() + b2*y1.get_pW());
    xm1 = imgIn[i,j];
  }
}
What is the advantage of using set/get as above instead of something like
// Assume w > 1 and h > 1
forall i in 0..w-1
{
  var xy3 = imgIn[i, 0];
  var xy2 = a1 * xy3;
  var xy1 = 0.0;
  var xy0 = xy1;
  // should this loop be unrolled?
  for j in 1..h-1
  {
    y1[i, j-1] = xy2;
    xy0 = xy1;
    xy1 = xy2;
    xy2 = xy3;
    xy3 = imgIn[i, j];
    // the following computation should get
    // done in parallel with any branching??
    xy2 = a1 * xy3 + a2 * xy2 + b1 * xy1 + b2 * xy0;
  }
  y1[i, h-1] = xy2;
}
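For anyone who wants to try the comparison, here is a minimal self-contained
sketch. It is not the code from either message: it assumes that get_mRW() and
get_pW() return the last and second-to-last values written to the current row,
so that the recurrence is y[j] = a1*x[j] + a2*x[j-1] + b1*y[j-1] + b2*y[j-2].
The sizes, coefficient values, and the names yRef, yRoll, xPrev, yPrev1 and
yPrev2 are all made up for illustration. It checks a rolling-scalar loop
against a reference that reads earlier outputs back from the array.

```chapel
// Hypothetical sizes and coefficients, chosen only for illustration.
config const w = 4, h = 8;
const a1 = 0.5, a2 = 0.25, b1 = 0.125, b2 = 0.0625;

const D = {0..w-1, 0..h-1};
var imgIn, yRef, yRoll: [D] real;

// Arbitrary input data.
forall (i, j) in D do
  imgIn[i, j] = (i * h + j): real;

// Reference: re-read earlier inputs and outputs from the arrays.
forall i in 0..w-1 {
  for j in 0..h-1 {
    const xPrev  = if j >= 1 then imgIn[i, j-1] else 0.0;
    const yPrev1 = if j >= 1 then yRef[i, j-1] else 0.0;
    const yPrev2 = if j >= 2 then yRef[i, j-2] else 0.0;
    yRef[i, j] = a1*imgIn[i, j] + a2*xPrev + b1*yPrev1 + b2*yPrev2;
  }
}

// Rolling scalars: keep the three history values in local variables,
// so each array element is read or written exactly once.
forall i in 0..w-1 {
  var xPrev = 0.0, yPrev1 = 0.0, yPrev2 = 0.0;
  for j in 0..h-1 {
    const x = imgIn[i, j];
    const y = a1*x + a2*xPrev + b1*yPrev1 + b2*yPrev2;
    yRoll[i, j] = y;
    xPrev = x;
    yPrev2 = yPrev1;
    yPrev1 = y;
  }
}

// Largest absolute difference between the two results.
writeln("max difference: ", max reduce abs(yRef - yRoll));
```

Unlike the loop above, this variant stores each output in the same iteration
that computes it, which avoids the special case for the last column at the
cost of not overlapping the store with the next computation.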
Do you need to consider reimplementing the kernel
xy2 = a1 * xy3 + a2 * xy2 + b1 * xy1 + b2 * xy0;
so that it gets compiled into vector instructions? That would probably
mean a lot more memory traffic, which we probably do not want.
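To make the trade-off concrete, here is one possible sketch, not something
taken from the original code. The recurrence carries a dependence along j, so
the independent work is across rows i. Interchanging the loops exposes that
independence, but the history values then have to live in per-row arrays
(hypothetical names xPrev, yPrev1, yPrev2) rather than in scalars, which is
where the extra memory traffic comes from. Sizes and coefficients are again
made up.

```chapel
// Hypothetical sizes and coefficients, chosen only for illustration.
config const w = 4, h = 8;
const a1 = 0.5, a2 = 0.25, b1 = 0.125, b2 = 0.0625;

var imgIn, y1: [0..w-1, 0..h-1] real;

// Arbitrary input data.
forall (i, j) in imgIn.domain do
  imgIn[i, j] = (i * h + j): real;

// One history slot per row; Chapel zero-initializes real arrays.
var xPrev, yPrev1, yPrev2: [0..w-1] real;

// The column loop stays serial because of the dependence along j.
for j in 0..h-1 {
  // Rows are independent, so this loop may run in parallel or be vectorized.
  forall i in 0..w-1 {
    const x = imgIn[i, j];
    const y = a1*x + a2*xPrev[i] + b1*yPrev1[i] + b2*yPrev2[i];
    y1[i, j] = y;
    // Every update below is a load and a store to memory,
    // where the row-wise version only touched local scalars.
    xPrev[i] = x;
    yPrev2[i] = yPrev1[i];
    yPrev1[i] = y;
  }
}

writeln("sum of outputs: ", + reduce y1);
```

Note also that with Chapel's default row-major layout the inner loop over i
walks imgIn and y1 with stride h, which adds to the memory cost unless the
arrays are stored transposed.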
Sorry if this is a dumb question.
Regards - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-users mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-users