https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564
--- Comment #29 from Jeffrey A. Law <law at redhat dot com> ---
So to bring this BZ back to the core questions (the scope seems to have widened
through the years since this was originally reported): namely, is the use of LTO
or C++ making things slower, particularly for scimark's LU factorization test?
From my experiments, the answer is a very clear yes. I hacked up the test a bit
to only run LU and run a fixed number of iterations. That makes comparisons
with something like callgrind much easier.
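For the record, something along these lines is enough to get stable numbers. It
is not the exact hack, just an illustration of the methodology: it only
reproduces the elimination loops quoted further down (no pivoting, unlike the
real scimark LU_factor), and the matrix size and iteration count are arbitrary:

  #include <stdio.h>
  #include <stdlib.h>

  #define M 100
  #define N 100
  #define ITERS 200

  /* Refill the matrix before each run so the kernel always does the
     same work regardless of what the previous iteration left behind.  */
  static void fill_matrix (double **A)
  {
    int i, j;
    for (i = 0; i < M; i++)
      for (j = 0; j < N; j++)
        A[i][j] = (double) rand () / RAND_MAX + 1.0;
  }

  /* The elimination loops from LU_factor, without pivoting.  */
  static void lu_kernel (double **A)
  {
    int j;
    for (j = 0; j < N - 1; j++)
      {
        int ii;
        for (ii = j + 1; ii < M; ii++)
          {
            double *Aii = A[ii];
            double *Aj = A[j];
            double AiiJ = Aii[j];
            int jj;
            for (jj = j + 1; jj < N; jj++)
              Aii[jj] -= AiiJ * Aj[jj];
          }
      }
  }

  int main (void)
  {
    double **A = malloc (M * sizeof (double *));
    int i;
    for (i = 0; i < M; i++)
      A[i] = malloc (N * sizeof (double));
    for (i = 0; i < ITERS; i++)
      {
        fill_matrix (A);
        lu_kernel (A);
      }
    printf ("%g\n", A[M - 1][N - 1]);  /* keep the result live */
    return 0;
  }

Running that under valgrind --tool=callgrind and feeding the output to
callgrind_annotate gives per-line instruction counts that are directly
comparable between the C, C++, and LTO builds.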
Use of C++ adds 2-3% in terms of instruction counts. LTO adds an additional
2-3% to the instruction counts. These are additive; C++ with LTO is about 5%
higher than C without LTO.
The time (not surprisingly) is lost in LU_factor; the main culprit seems to be
this pair of nested loops:
  int ii;
  for (ii = j + 1; ii < M; ii++)
    {
      double *Aii = A[ii];
      double *Aj = A[j];
      double AiiJ = Aii[j];   /* Here */
      int jj;
      for (jj = j + 1; jj < N; jj++)
        Aii[jj] -= AiiJ * Aj[jj];
    }
Callgrind calls out the marked line, which probably in reality means the
preheader for the inner loop. For C w/o LTO it's ~12 million instructions. For
C++ with LTO it's ~21 million instructions (remember, I'm just running LU, and
only for a relatively small number of iterations).
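To make the preheader remark concrete, after loop rotation the inner loop looks
roughly like this (hand-written approximation, not actual GCC output); the
invariant loads end up in the block that runs once per outer iteration, which is
where the counts attributed to the marked line really come from:

  for (ii = j + 1; ii < M; ii++)
    {
      /* Inner-loop preheader: executed once per outer iteration.  The
         invariant loads of A[ii], A[j] and A[ii][j] land here, so this
         is where callgrind attributes the cost of the marked line.  */
      double *Aii = A[ii];
      double *Aj = A[j];
      double AiiJ = Aii[j];
      int jj = j + 1;
      if (jj < N)
        do
          Aii[jj] -= AiiJ * Aj[jj];
        while (++jj < N);
    }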
It's a bit of a surprise as these loops are dead simple, but it appears we've
got to be doing something dumb somewhere. Hopefully that narrows things down a
bit.