http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51269

             Bug #: 51269
           Summary: Vectorization profitability threshold is not actually
                    used
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jamb...@gcc.gnu.org
              Host: x86_64-linux-gnu
            Target: x86_64-linux-gnu


Created attachment 25883
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25883
Testcase

While analyzing the performance of 410.bwaves from SPEC 2006, I
compiled a slightly modified source and noticed that the vectorizer
generates only useless run-time checks for the "Profitability
threshold."

It is observable when compiling the attached simple file with trunk
revision 181552 with the following options:

/path/to/compiler/gfortran -S -Ofast -g -funroll-loops -fpeel-loops
-fno-prefetch-loop-arrays -march=bdver1 -mtune=bdver1 -mveclibabi=svml
-DSPEC_CPU_LP64 test.f -fdump-tree-vect-details -fverbose-asm --param
min-vect-loop-bound=8

The last parameter just forces the threshold to be 19, so that it is
easier to spot in the IL than the original 2.  The part of the IL with
the test looks like this:

  D.1883_6 = *nb_5(D);
  ...

  D.1962_99 = (character(kind=4)) D.1883_6;
  D.1963_100 = D.1962_99 <= 19;
  if (D.1963_100 != 0)
    goto <bb 17>;
  else
    goto <bb 18>;

<bb 17>:
  prologue_after_cost_adjust.24_101 = (character(kind=4)) D.1883_6;

<bb 18>:
Invalid sum of incoming frequencies 2800, should be 1400
  # prologue_after_cost_adjust.24_102 = PHI <prologue_after_cost_adjust.24_101(17), prolog_loop_niters.22_90(16)>
  D.1965_103 = prolog_loop_niters.22_90 == 0;
  if (D.1965_103 != 0)
    goto <bb 22>;
  else
    goto <bb 19>;

The problem is that prologue_after_cost_adjust.24_102 does not appear
anywhere else in the IL (all occurrences of all SSA names of the
variable are in the above snippet).  The PHI node is therefore
useless, as is the condition that feeds it, and both are promptly
removed by subsequent passes.  I therefore think the vectorizer should
either not generate this calculation at all or should do something
with the result.

Finally, the whole point of the examination was to explore ways of
avoiding an expensive prologue when the number of iterations is small,
because then the vectorizer actually makes the generated code much
slower.  It would therefore be great if we could avoid as much of the
prologue as possible when the threshold is not exceeded.
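
To make the request concrete, here is a rough C sketch of the control
flow I have in mind.  It is not what the vectorizer emits and not a
proposed patch: axpy_guarded, PROFIT_THRESHOLD and VF are hypothetical
names, the value 19 is only borrowed from the dump above, and the
inner fixed-length loop merely stands in for a real vector operation.
The point is just that the threshold test sits in front of the
prologue, so a short loop never pays for peeling.

```c
#include <stddef.h>
#include <stdint.h>

#define PROFIT_THRESHOLD 19  /* hypothetical; mirrors the 19 in the dump */
#define VF 4                 /* assumed vectorization factor */

enum path { PATH_SCALAR, PATH_VECTOR };

/* Computes y[i] += a * x[i] for i in [0, n) and reports the path taken. */
enum path axpy_guarded(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;

    /* Profitability guard: short loops skip the prologue, the vector
       body and the epilogue altogether. */
    if (n <= PROFIT_THRESHOLD) {
        for (; i < n; i++)
            y[i] += a * x[i];
        return PATH_SCALAR;
    }

    /* Prologue: peel scalar iterations until &y[i] is aligned to a
       multiple of VF elements.  peel < VF < n holds on this path. */
    size_t peel = (VF - ((uintptr_t)y / sizeof(double)) % VF) % VF;
    for (; i < peel; i++)
        y[i] += a * x[i];

    /* Main body: VF elements per trip, standing in for one vector op. */
    for (; i + VF <= n; i += VF)
        for (size_t k = 0; k < VF; k++)
            y[i + k] += a * x[i + k];

    /* Epilogue: leftover iterations. */
    for (; i < n; i++)
        y[i] += a * x[i];
    return PATH_VECTOR;
}
```

With n = 19 the function returns PATH_SCALAR without ever computing
the peel count; with n = 20 or more it goes through the prologue, main
body and epilogue.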
