I am tuning a population genetics simulation, and after replacing the two
main computational kernels (formerly intrinsics and SLEEF) with ISPC code,
my reference benchmark performance regressed from ~4.5s to ~19s.
Here is the ISPC code:
// Update the frequencies, performing n *= exp(w * growth_factor) on len
// elements, and return the sum of the updated n
export uniform double update_frequencies_sum_n(
    uniform uint len,
    uniform double n_arr[],
    uniform const double w_arr[],
    uniform double growth_factor
) {
    double sum_n = 0;
    foreach (i = 0 ... len) {
        double temp = w_arr[i] * growth_factor;
        temp = exp(temp);
        n_arr[i] *= temp;
        sum_n += n_arr[i];
    }
    return reduce_add(sum_n);
}
/// Normalize n so that it sums to 1, and return the sum of w weighted by n
export uniform double renormalize_n_weighted_sum_w(
    uniform uint len,
    uniform double n_arr[],
    uniform const double w_arr[],
    uniform double sum_n
) {
    double sum_w = 0;
    foreach (i = 0 ... len) {
        n_arr[i] /= sum_n;
        sum_w += n_arr[i] * w_arr[i];
    }
    return reduce_add(sum_w);
}
Both the ISPC code and the original SLEEF + intrinsics kernels use AVX2
(i64x4 settings for ISPC/SLEEF). The original kernels consisted of a scalar
peel loop for alignment, a pure vector loop, and a scalar remainder loop to
process the leftover elements.
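To make the peel / vector / remainder structure concrete, here is a rough, portable C sketch of the second kernel laid out that way. It is an illustration under assumptions, not the actual original code: the function name is made up, the inner 4-wide loop merely stands in for the AVX2 intrinsics, and the 32-byte alignment target on n_arr is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 }; /* 4 doubles fit in one 256-bit AVX2 register */

/* Hypothetical name; same contract as renormalize_n_weighted_sum_w. */
double renormalize_n_weighted_sum_w_blocked(size_t len, double *n_arr,
                                            const double *w_arr, double sum_n)
{
    assert(sum_n != 0.0);
    double sum_w = 0.0;
    size_t i = 0;

    /* 1. Scalar peel loop: advance until n_arr + i is 32-byte aligned
     *    (assumed alignment target) or the data runs out. */
    while (i < len &&
           ((uintptr_t)(n_arr + i) % (LANES * sizeof(double))) != 0) {
        n_arr[i] /= sum_n;
        sum_w += n_arr[i] * w_arr[i];
        i++;
    }

    /* 2. Vector loop: whole blocks of LANES elements, with one partial
     *    sum per lane. The inner loop stands in for the intrinsics. */
    double lane_sum[LANES] = {0.0, 0.0, 0.0, 0.0};
    for (; i + LANES <= len; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            n_arr[i + l] /= sum_n;
            lane_sum[l] += n_arr[i + l] * w_arr[i + l];
        }
    }
    /* Horizontal reduction of the per-lane partial sums. */
    sum_w += (lane_sum[0] + lane_sum[1]) + (lane_sum[2] + lane_sum[3]);

    /* 3. Scalar remainder loop: fewer than LANES elements are left. */
    for (; i < len; i++) {
        n_arr[i] /= sum_n;
        sum_w += n_arr[i] * w_arr[i];
    }
    return sum_w;
}
```

Because the per-lane partial sums are added in a different order than in a plain scalar loop, the result can differ from a scalar version in the last few bits.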
I noticed that the gap is much smaller when the data being processed is
float rather than double (~1.2s for the original kernel switched to floats
vs. ~2s for ISPC). However, double precision is needed here, and even with
floats the ISPC code is still clearly slower.
Is there some performance optimization I'm missing with this ISPC code?