Dmitry,
On Thursday, September 20, 2018 at 5:05:34 PM UTC-6, Dmitry Babokin wrote:
>
> First of all, I don't see how foreach may increase parallelism in this
> case, as the swap happens for varying values, not for scalars. I.e. the
> following code fragment
> float aux[2];
> aux[0] = c[2*x + 0];
> aux[1] = c[2*x + 1];
> c[2*x + 0] = c[2*y + 0];
> c[2*x + 1] = c[2*y + 1];
> c[2*y + 0] = aux[0];
> c[2*y + 1] = aux[1];
> operates on vectors. x and y are vectors, so all accesses to "c" are
> effectively gathers.
>
Even if there's no more parallelism available, I was hoping to expose more
concurrency to give the compiler (or a future instance of the compiler)
more scheduling flexibility.

> As for why foreach behaves differently than "serial" version.
> The documentation for foreach states that nested foreach is not supported.
> That is not an exactly accurate statement. The accurate statement would be
> that foreach re-establishes the execution mask, so in a context that already
> has an execution mask that is intended to be preserved, this will cause
> unpredictable effects.
>
Ah, that makes the situation a lot clearer. Could you please add that
explanation to the documentation?

> Practically speaking, you can use "printf("x=%, y=%\n", x, y);" to print
> the real values in both cases and see that the behaviour is not the one
> that you expect. Note that printed values in double braces are masked out.
>
[printf → print]
Okay, that's a handy debugging tip.
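
For anyone following along, here is a minimal sketch of that tip as I understand it. The kernel name show_lanes and the parameter n are hypothetical and not from my original code; the only ISPC features assumed are foreach, a varying "if", and the built-in print with "%" placeholders.

```ispc
// Hypothetical kernel, for illustration only.
export void show_lanes(uniform int n) {
    foreach (i = 0 ... n) {
        int x = i;          // varying: one value per program instance
        int y = n - 1 - i;  // varying
        if (x < y) {
            // Only the lanes where x < y are active inside this branch.
            // print shows every lane of x and y, and the lanes that are
            // inactive here should be marked as masked out in the output.
            print("x=%, y=%\n", x, y);
        }
    }
}
```

Calling this with a small n (say 8) and comparing the marked lanes against the condition should make it easy to see which program instances are actually running at that point.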
Thanks for the help,
— Scott