On Wed, 21 Mar 2018, David G. Wonnacott wrote:

for i in 0..w-1 {
  xm1 = 0.0;
  y1.resetScalars();
  for j in 0..h-1 {
    y1.set(i,j, a1*imgIn[i,j] + a2*xm1 + b1*y1.get_mRW() + b2*y1.get_pW());
    xm1 = imgIn[i,j];
  }
}

What is the advantage of using set/get as above instead of something like

        // Assume w > 1 and h > 1

        forall i in 0..w-1
        {
                var xy3 = imgIn[i, 0];
                var xy2 = a1 * xy3;
                var xy1 = 0.0;
                var xy0 = xy1;

                // should this loop be unrolled?

                for j in 1..h-1
                {
                        y1[i, j-1] = xy2;

                        xy0 = xy1;
                        xy1 = xy2;
                        xy2 = xy3;
                        xy3 = imgIn[i, j];

                        // the following computation should get
                        // done in parallel with any branching??

                        xy2 = a1 * xy3 + a2 * xy2 + b1 * xy1 + b2 * xy0;
                }
                y1[i, h-1] = xy2;
        }
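
As a sanity check that the two forms compute the same thing, here is a
minimal sketch. It assumes that get_mRW() and get_pW() return the two
previous outputs of the current row and that resetScalars() zeroes them;
neither is confirmed by the code quoted above. The sizes, coefficient
values and the names yRef, yAlt and maxDiff are made up for illustration.

```chapel
// Hypothetical sizes and coefficients, chosen only for this check.
config const w = 4, h = 8;
const a1 = 0.5, a2 = 0.25, b1 = 0.4, b2 = -0.1;

var imgIn, yRef, yAlt: [0..w-1, 0..h-1] real;

// Arbitrary deterministic input.
for (i, j) in imgIn.domain do
  imgIn[i, j] = (i * h + j + 1): real;

// Reference form: read earlier outputs back from the array and treat
// everything before column 0 as zero (assumed meaning of the set/get code).
for i in 0..w-1 {
  for j in 0..h-1 {
    const xm1 = if j >= 1 then imgIn[i, j-1] else 0.0;
    const ym1 = if j >= 1 then yRef[i, j-1] else 0.0;
    const ym2 = if j >= 2 then yRef[i, j-2] else 0.0;
    yRef[i, j] = a1*imgIn[i, j] + a2*xm1 + b1*ym1 + b2*ym2;
  }
}

// Scalar form: carry the recurrence state in four local variables.
forall i in 0..w-1 with (ref yAlt) {
  var xy3 = imgIn[i, 0];   // current input
  var xy2 = a1 * xy3;      // current output
  var xy1 = 0.0;           // previous output
  var xy0 = xy1;           // output before that

  for j in 1..h-1 {
    yAlt[i, j-1] = xy2;
    xy0 = xy1;
    xy1 = xy2;
    xy2 = xy3;             // temporarily holds the previous input
    xy3 = imgIn[i, j];
    xy2 = a1*xy3 + a2*xy2 + b1*xy1 + b2*xy0;
  }
  yAlt[i, h-1] = xy2;
}

// Largest element-wise disagreement between the two results.
const maxDiff = max reduce abs(yRef - yAlt);
writeln("max difference: ", maxDiff);
```

If the assumptions hold, the reported difference should be zero or at
rounding level, since both loops evaluate the same expression on the
same operands.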

Do you need to consider reimplementing the kernel

        xy2 = a1 * xy3 + a2 * xy2 + b1 * xy1 + b2 * xy0;

so that it gets compiled into vector instructions? That would probably
mean a lot more memory traffic, which we probably do not want.
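
To make the trade-off concrete, here is one possible rearrangement,
offered only as a sketch. The recurrence carries a dependence along j,
but the rows are independent of each other, so the loops can be
interchanged with the state held per row in small arrays. The names xm1,
ym1 and ym2 and all sizes and values are made up for illustration, and
the sketch assumes zero initial state. Whether the compiler actually
emits vector instructions for the inner loop is not something this
establishes.

```chapel
// Hypothetical sizes and coefficients, chosen only for illustration.
config const w = 4, h = 8;
const a1 = 0.5, a2 = 0.25, b1 = 0.4, b2 = -0.1;

var imgIn, y1: [0..w-1, 0..h-1] real;

// Arbitrary deterministic input.
for (i, j) in imgIn.domain do
  imgIn[i, j] = (i * h + j + 1): real;

// Per-row recurrence state; Chapel zero-initializes these arrays.
var xm1, ym1, ym2: [0..w-1] real;

for j in 0..h-1 {
  // No iteration of this inner loop depends on another, so it is a
  // candidate for vectorization. The cost is that the state now lives
  // in arrays, and imgIn and y1 are walked along their first index.
  for i in 0..w-1 {
    const x = imgIn[i, j];
    const y = a1*x + a2*xm1[i] + b1*ym1[i] + b2*ym2[i];
    y1[i, j] = y;
    xm1[i] = x;
    ym2[i] = ym1[i];
    ym1[i] = y;
  }
}

writeln(y1);
```

The three state arrays are read and written on every step, which is the
extra memory traffic the scalar version avoids by keeping its state in
local variables.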

Sorry if this is a dumb question.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer