https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564
--- Comment #30 from Jeffrey A. Law <law at redhat dot com> ---
Looking at the dumps, the tests for whether or not to use the vectorized loop
are considerably more complex with LTO, and involve memory references as well,
when compared to the non-LTO version. It's almost as if the program-wide
visibility provided by LTO has *hidden* redundancies and mucked up the IV
analysis.

The first block of the LTO check looks like:

  # ivtmp.66_807 = PHI <ivtmp.87_773(55), ivtmp.66_812(67)>
  Aii_206 = MEM[base: A_139, index: ivtmp.66_807, step: 8, offset: 0B];
  _208 = Aii_206 + _765;
  AiiJ_209 = *_208;
  _449 = SOR_size_15 > i_167;
  _763 = (unsigned int) i_167;
  _762 = _686 - _763;
  _444 = _762 > 8;
  _443 = _444 & _449;
  _438 = Aii_206 + _760;
  _433 = _434 >= _438;
  _425 = Aii_206 + _435;
  _424 = _425 >= _429;
  _423 = _424 | _433;
  _422 = _423 & _443;
  if (_422 != 0)
    goto <bb 57>; [80.00%]
  else
    goto <bb 65>; [20.00%]

Compared to the non-LTO block:

  _156 = ivtmp.112_242 + 16;
  _155 = prephitmp_168 + _156;
  _151 = Aii_72 + ivtmp.112_242;
  _150 = _151 >= _155;
  _146 = Aii_72 + _156;
  _142 = prephitmp_168 + ivtmp.112_242;
  _141 = _142 >= _146;
  _140 = _141 | _150;
  _139 = _140 & _160;
  if (_139 != 0)
    goto <bb 23>;
  else
    goto <bb 32>;

Then the next block in the check (LTO):

;; basic block 57, loop depth 3, count 0, freq 66, maybe hot
;; prev block 56, next block 58, flags: (NEW, REACHABLE, VISITED)
;; pred: 56 [80.0%] (TRUE_VALUE,EXECUTABLE)
  niters.2_408 = SOR_size_15 > i_167 ? _762 : 1;
  _366 = (unsigned long) _425;
  _365 = _366 >> 3;
  _364 = -_365;
  _363 = (unsigned int) _364;
  prolog_loop_niters.4_367 = _363 & 1;
  _740 = (unsigned int) ivtmp.86_769;
  _509 = _686 + 4294967294;
  _738 = _509 - _740;
  _295 = SOR_size_15 > i_167 ? _738 : 0;
  _283 = prolog_loop_niters.4_367 == 0 ? 1 : 2;
  if (_283 > _295)
    goto <bb 63>; [10.00%]
  else
    goto <bb 58>; [90.00%]

non-LTO:

;; basic block 23, loop depth 2, count 0, freq 2, maybe hot
;; prev block 22, next block 24, flags: (NEW, REACHABLE)
;; pred: 22 [80.0%] (TRUE_VALUE,EXECUTABLE)
  _119 = (unsigned long) _151;
  _118 = _119 & 15;
  _117 = _118 >> 3;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 1;
  prolog_loop_niters.46_120 = MIN_EXPR <_114, ivtmp.115_244>;
  if (prolog_loop_niters.46_120 == 0)
    goto <bb 25>;
  else
    goto <bb 24>;

Egad. No wonder LTO loses. I don't think the loop iterates enough to make up
for this mess.
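
For anyone less used to reading GIMPLE, the non-LTO guard appears to boil down to two small computations: an overlap test on two 16-byte windows (the `>=` / `|` chain) and a count of scalar iterations to peel before the vector loop (the `>> 3`, negate, `& 1` chain). The C below is only a rough sketch of that reading, not GCC's implementation. The function names are hypothetical, and the 16-byte window and 8-byte element size are assumptions taken from the `+ 16` and `>> 3` constants in the dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the byte ranges [a, a+bytes) and
   [b, b+bytes) do not overlap.  This mirrors the shape of
   _150 = _151 >= _155;  _141 = _142 >= _146;  _140 = _141 | _150;
   with bytes == 16 in the non-LTO dump (an assumption). */
bool segments_disjoint(uintptr_t a, uintptr_t b, uintptr_t bytes)
{
    return (a >= b + bytes) || (b >= a + bytes);
}

/* Hypothetical helper: scalar iterations to run before the vector loop
   so that an 8-byte-aligned address reaches 16-byte alignment.
   This mirrors the chain
   _117 = (addr & 15) >> 3;  _116 = -_117;  _114 = (unsigned) _116 & 1;
   The "& 15" is dropped here because it does not change bit 0 of the
   shifted value. */
unsigned prolog_iters(uintptr_t addr)
{
    return (unsigned)(-(addr >> 3)) & 1u;
}
```

With these definitions, `segments_disjoint(0, 16, 16)` holds while `segments_disjoint(0, 8, 16)` does not, and `prolog_iters` gives 0 for address 16 and 1 for address 24. The LTO version computes the same kind of answer but routes it through extra loads, comparisons and conditional expressions, which is the overhead the comment is complaining about.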