https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

--- Comment #30 from Jeffrey A. Law <law at redhat dot com> ---
Looking at the dumps, the tests for whether or not to use the vectorized loop
are considerably more complex with LTO than in the non-LTO version, and the
LTO tests include memory references as well.  It's almost as if the
program-wide visibility provided by LTO has *hidden* redundancies and mucked
up the IV analysis.

The first block of the LTO check looks like:

  # ivtmp.66_807 = PHI <ivtmp.87_773(55), ivtmp.66_812(67)>
  Aii_206 = MEM[base: A_139, index: ivtmp.66_807, step: 8, offset: 0B];
  _208 = Aii_206 + _765;
  AiiJ_209 = *_208;
  _449 = SOR_size_15 > i_167;
  _763 = (unsigned int) i_167;
  _762 = _686 - _763;
  _444 = _762 > 8;
  _443 = _444 & _449;
  _438 = Aii_206 + _760;
  _433 = _434 >= _438;
  _425 = Aii_206 + _435;
  _424 = _425 >= _429;
  _423 = _424 | _433;
  _422 = _423 & _443;
  if (_422 != 0)
    goto <bb 57>; [80.00%]
  else
    goto <bb 65>; [20.00%]

Compared to the non-LTO block:

  _156 = ivtmp.112_242 + 16;
  _155 = prephitmp_168 + _156;
  _151 = Aii_72 + ivtmp.112_242;
  _150 = _151 >= _155;
  _146 = Aii_72 + _156;
  _142 = prephitmp_168 + ivtmp.112_242;
  _141 = _142 >= _146;
  _140 = _141 | _150;
  _139 = _140 & _160;
  if (_139 != 0)
    goto <bb 23>;
  else
    goto <bb 32>;
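
For anyone not used to reading these dumps, the non-LTO test reads like a
runtime no-overlap check on two 16-byte accesses, ANDed with one further
condition (_160) that is computed outside this excerpt.  A rough C sketch of
that reading follows.  The function and parameter names are hypothetical, and
treating _160 as an opaque boolean is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when [a, a+len) and [b, b+len) do not overlap.
   Mirrors "_141 | _150" in the non-LTO dump, where len is 16 bytes. */
static bool ranges_disjoint(uintptr_t a, uintptr_t b, uintptr_t len)
{
    assert(len > 0);
    return a >= b + len || b >= a + len;
}

/* Sketch of the whole guard.  "other_cond" stands in for _160, whose
   definition is not shown in the excerpt (assumption). */
static bool use_vector_loop(const double *aii, const double *g,
                            uintptr_t iv, bool other_cond)
{
    uintptr_t a = (uintptr_t)aii + iv;   /* _151 = Aii_72 + ivtmp          */
    uintptr_t b = (uintptr_t)g + iv;     /* _142 = prephitmp_168 + ivtmp   */
    return ranges_disjoint(a, b, 16) && other_cond;
}
```

The LTO version computes the same kind of overlap test but additionally loads
Aii from memory and folds two iteration-count comparisons into the guard.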


Then the next block in the check (LTO):

;;   basic block 57, loop depth 3, count 0, freq 66, maybe hot
;;    prev block 56, next block 58, flags: (NEW, REACHABLE, VISITED)
;;    pred:       56 [80.0%]  (TRUE_VALUE,EXECUTABLE)
  niters.2_408 = SOR_size_15 > i_167 ? _762 : 1;
  _366 = (unsigned long) _425;
  _365 = _366 >> 3;
  _364 = -_365;
  _363 = (unsigned int) _364;
  prolog_loop_niters.4_367 = _363 & 1;
  _740 = (unsigned int) ivtmp.86_769;
  _509 = _686 + 4294967294;
  _738 = _509 - _740;
  _295 = SOR_size_15 > i_167 ? _738 : 0;
  _283 = prolog_loop_niters.4_367 == 0 ? 1 : 2;
  if (_283 > _295)
    goto <bb 63>; [10.00%]
  else
    goto <bb 58>; [90.00%]


non-LTO:

;;   basic block 23, loop depth 2, count 0, freq 2, maybe hot
;;    prev block 22, next block 24, flags: (NEW, REACHABLE)
;;    pred:       22 [80.0%]  (TRUE_VALUE,EXECUTABLE)
  _119 = (unsigned long) _151;
  _118 = _119 & 15;
  _117 = _118 >> 3;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 1;
  prolog_loop_niters.46_120 = MIN_EXPR <_114, ivtmp.115_244>;
  if (prolog_loop_niters.46_120 == 0)
    goto <bb 25>;
  else
    goto <bb 24>;
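
The non-LTO prolog computation above appears to work out how many scalar
iterations to peel so that an 8-byte element pointer reaches 16-byte
alignment, clamped by a second value.  A minimal C sketch of that reading,
with hypothetical names, and assuming ivtmp.115_244 is a bound on the
remaining iterations (the dump does not say so directly):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the peel count for 8-byte elements and a
   16-byte vector alignment target. */
static unsigned prolog_iters(uintptr_t addr, unsigned bound)
{
    assert((addr & 7) == 0);                  /* assumes 8-byte aligned data */
    uintptr_t misalign = (addr & 15) >> 3;    /* _118, _117: 0 or 1          */
    unsigned peel = (unsigned)(-misalign) & 1u;   /* _116 .. _114            */
    return peel < bound ? peel : bound;       /* MIN_EXPR <_114, bound>      */
}
```

With two elements per vector the result is 1 when the address is 8 bytes past
a 16-byte boundary and 0 otherwise, unless the bound is 0.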


Egad.  No wonder LTO loses.  I don't think the loop iterates enough to make up
for this mess.
