https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104722

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|og11 (devel/omp/gcc-11)     |12.0
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
In this case there is actually no rounding error at all.
The numbers are small enough that everything is representable in double.
It is even representable in int, so changing the testcase to s/double/int/g
and s/\.0//g should work too.
If OMP_NUM_THREADS=1 is used, the result is the same in both.
If OMP_NUM_THREADS=2 is used and I add extra debugging printouts in the lambda
(print omp_get_thread_num (), tot, vA * vA and tot + vA * vA), then it seems in
that case thread 0 computes correctly the sum of 1^2 to 499^2 like it does in
the OMP_NUM_THREADS=1 case, but in thread 1 the first call is with
tot 500, vA^2 251001 (aka vA 501) and then keeps adding up to vA^2 998001.0
(aka vA 999) and finally there are extra 2 lambda calls in thread 0,
one that sums up 0 and the previously computed sum from thread 0,
and another one that sums up the result of the above and the accumulated result
from thread 1 (again, squared).
I don't know how exactly std::accumulate constrains what the lambda can or
can't do, but if it is to be parallelized in any way (and doesn't really matter
if using OpenMP or TBB or plain pthread_create managed threads etc.), it at
least needs to compute the partial results from each thread and then needs to
add those together.  If that "add things together" is done through the same
lambda as the rest, then this lambda isn't appropriate for that, because it
doesn't treat the operands the same, one isn't squared and one is squared.
Even in thread 0 it starts with tot = 0, vA = 1 rather than what you'd probably
expect - tot = 0, vA = 0, but as 0 * 0 is 0, it doesn't make a difference.
So, if std::accumulate's lambda is allowed to treat the two operands
differently, we'd instead need to invoke it in thread 0 with tot = 0, vA = 0,
then tot = prev, vA = 1 until tot = prev, vA = 499, and in thread 1 with
tot = 0, vA = 500 until tot = prev, vA = 999.  But we'd still need to somehow
add the 2 results together.
If more than two threads participate in the work, of course more extra
accumulations are needed.
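
To illustrate the point about combining partial results, here is a minimal
sketch.  It is a hand-written model of a two-way split, not the actual
libstdc++ parallel mode code, and the names add_square, iota_values,
sequential_sum and two_chunks are made up for this example.  The input is
assumed to be the values 0 through 999, and long long is used so that the
squared partial sum in the wrong variant does not overflow.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// The operation as described in the comment: only the second operand is
// squared, so the operation is neither associative nor commutative.
inline long long add_square(long long tot, long long vA) {
    return tot + vA * vA;
}

// Assumed input: the values 0, 1, ..., n-1.
inline std::vector<long long> iota_values(int n) {
    std::vector<long long> v(n);
    std::iota(v.begin(), v.end(), 0LL);
    return v;
}

// Reference result: plain left-to-right accumulation in one thread.
inline long long sequential_sum(const std::vector<long long>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL, add_square);
}

// Hypothetical two-chunk split (run serially here; the threading itself
// does not matter for the argument).  Each chunk starts from tot = 0.
// combine_with_op == true reuses add_square to merge the partial sums,
// which squares the second partial sum; false merges with plain addition.
inline long long two_chunks(const std::vector<long long>& v,
                            bool combine_with_op) {
    assert(v.size() >= 2);
    auto mid = v.begin() + v.size() / 2;
    long long lo = std::accumulate(v.begin(), mid, 0LL, add_square);
    long long hi = std::accumulate(mid, v.end(), 0LL, add_square);
    return combine_with_op ? add_square(lo, hi) : lo + hi;
}
```

With v = iota_values(1000), two_chunks(v, false) matches sequential_sum(v),
while two_chunks(v, true) does not, because the merge step squares the
partial sum of the second chunk.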
