https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

            Bug ID: 79390
           Summary: 10% performance drop in SciMark2 LU after r242550
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 40677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40677&action=edit
The relevant source code and generated asm before/after this change

The dense LU matrix factorization test from the old SciMark2
(http://math.nist.gov/scimark), which is used in the Phoronix compiler test
suite, has regressed by 10% compared to the November trunk when run on an
Intel i7 6800K Broadwell (compiled with "-O3 -march=native"). GCC 6 generated
much slower code, so this is not a regression compared to released versions
of the compiler.

The regression was introduced in r242550:
------------------------------------------------------------------------
r242550 | wschmidt | 2016-11-17 15:22:17 +0100 (tor, 17 nov 2016) | 18 lines

[gcc]

2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
            Richard Biener  <rguent...@suse.de>

        PR tree-optimization/77848
        * tree-if-conv.c (tree_if_conversion): Always version loops unless
        the user specified -ftree-loop-if-convert.

[gcc/testsuite]

2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
            Richard Biener  <rguent...@suse.de>

        PR tree-optimization/77848
        * gfortran.dg/vect/pr77848.f: New test.
------------------------------------------------------------------------
and has the effect that the pivot-finding loop

    int LU_factor(int M, int N, double **A,  int *pivot)
    {
      int minMN =  M < N ? M : N;
      int j=0;

      for (j=0; j<minMN; j++)
      {
        /* find pivot in column j and  test for singularity. */

        int jp=j;
        int i;

        double t = fabs(A[j][j]);
        for (i=j+1; i<M; i++)
        {
          double ab = fabs(A[i][j]);
          if ( ab > t)
          {
            jp = i;
            t = ab;
          }
        }

        pivot[j] = jp;
        ...

is transformed. The perf output suggests that the slowdown is due to branch
mispredictions, but I do not know x86 assembly well enough to determine the
cause (or to say whether this really is a bug or just something the compiler
cannot know about).
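
For anyone who wants to look at the loop in isolation, here is a minimal
sketch that extracts the pivot search into a standalone function. The names
find_pivot_branchy and find_pivot_select are hypothetical and are not part of
SciMark2. The second function is a hand-written select form, given only to
illustrate what if-conversion does conceptually. It is not the code GCC
generates, and whether either form becomes a branch or a conditional move
depends on the compiler version and flags.

```c
#include <math.h>

/* Hypothetical helper (not in SciMark2): the pivot search from LU_factor,
   with the conditional update written as in the original source. */
int find_pivot_branchy(int M, int j, double **A)
{
  int jp = j;
  double t = fabs(A[j][j]);
  for (int i = j + 1; i < M; i++)
  {
    double ab = fabs(A[i][j]);
    if (ab > t)
    {
      jp = i;
      t = ab;
    }
  }
  return jp;
}

/* Hypothetical hand-written select form of the same loop: both updates are
   expressed as selects on one condition instead of a guarded block. This
   only illustrates the idea of if-conversion; it is an assumption, not a
   copy of what the compiler emits. */
int find_pivot_select(int M, int j, double **A)
{
  int jp = j;
  double t = fabs(A[j][j]);
  for (int i = j + 1; i < M; i++)
  {
    double ab = fabs(A[i][j]);
    int take = ab > t;
    jp = take ? i : jp;
    t = take ? ab : t;
  }
  return jp;
}
```

Both functions return the same row index for the same input. Compiling the
file with "-O3 -march=native -S" on the compilers before and after r242550
should make it possible to compare the generated loops directly.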
