http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51269
             Bug #: 51269
           Summary: Vectorization profitability threshold is not actually used
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jamb...@gcc.gnu.org
              Host: x86_64-linux-gnu
            Target: x86_64-linux-gnu

Created attachment 25883
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25883
Testcase

While analyzing the performance of 410.bwaves from SPEC 2006, I compiled
a slightly modified source and noticed that the vectorizer generates only
a useless run-time check for the "Profitability threshold."

This can be observed by compiling the attached simple file with trunk
revision 181552 and the following options:

  /path/to/compiler/gfortran -S -Ofast -g -funroll-loops -fpeel-loops \
    -fno-prefetch-loop-arrays -march=bdver1 -mtune=bdver1 \
    -mveclibabi=svml -DSPEC_CPU_LP64 test.f -fdump-tree-vect-details \
    -fverbose-asm --param min-vect-loop-bound=8

The last parameter just forces the threshold to be 19, so that it is
easier to spot in the IL than the original 2. The part of the IL
containing the test looks like this:

  D.1883_6 = *nb_5(D);
  ...
  D.1962_99 = (character(kind=4)) D.1883_6;
  D.1963_100 = D.1962_99 <= 19;
  if (D.1963_100 != 0)
    goto <bb 17>;
  else
    goto <bb 18>;

<bb 17>:
  prologue_after_cost_adjust.24_101 = (character(kind=4)) D.1883_6;

<bb 18>:
Invalid sum of incoming frequencies 2800, should be 1400
  # prologue_after_cost_adjust.24_102 = PHI <prologue_after_cost_adjust.24_101(17), prolog_loop_niters.22_90(16)>
  D.1965_103 = prolog_loop_niters.22_90 == 0;
  if (D.1965_103 != 0)
    goto <bb 22>;
  else
    goto <bb 19>;

The problem is that prologue_after_cost_adjust.24_102 does not appear
anywhere else in the IL (all occurrences of all SSA names of the variable
are in the above snippet). The phi node is therefore useless, as is the
condition feeding it, and both are promptly removed by subsequent passes.
I therefore think the vectorizer should either not generate this
calculation at all or do something with the result.

Finally, the whole point of the examination was to explore ways of
avoiding an expensive prologue when the number of iterations is small,
because then the vectorizer actually makes the generated code much slower.
It would therefore be great if we could avoid as much of the prologue as
possible when the threshold is not exceeded.
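
To illustrate the control flow asked for above, here is a rough hand-written C sketch. It is not what GCC emits and not a proposed patch. The function names, the sum reduction, the 32-byte alignment target and the vectorization factor of 4 are all assumptions made for the example; only the threshold value 19 and the "<= 19" comparison come from the IL in this report. The point is that the threshold test is taken before any prologue work, so a short loop goes straight to the scalar path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PROFIT_THRESHOLD mirrors the 19 seen in the IL above; VF is an
   assumed vectorization factor chosen only for this illustration. */
enum { PROFIT_THRESHOLD = 19, VF = 4 };

/* Plain scalar loop: the cheap path for short trip counts. */
static double sum_scalar(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical guarded version of a vectorized reduction. */
double sum_guarded(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    /* Threshold test first: at or below it, skip the prologue and
       the vector body entirely. */
    if (n <= PROFIT_THRESHOLD)
        return sum_scalar(a, n);

    /* Prologue: peel scalar iterations until a + i is 32-byte aligned
       (the alignment target is an assumption). */
    size_t i = 0;
    double s = 0.0;
    while (i < n && ((uintptr_t)(a + i) % 32) != 0)
        s += a[i++];

    /* "Vector" body, modelled as VF independent accumulator lanes. */
    double lane[VF] = { 0.0 };
    for (; i + VF <= n; i += VF)
        for (int k = 0; k < VF; k++)
            lane[k] += a[i + k];
    for (int k = 0; k < VF; k++)
        s += lane[k];

    /* Epilogue: leftover iterations that do not fill a whole vector. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

With this shape, a call such as sum_guarded(a, 5) never executes the alignment-peeling loop. In the IL quoted above, by contrast, the result of the "<= 19" test feeds only a dead phi node and so has no effect on the generated code.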