https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so what I see is (good assembly):

.L4:
        movq    (%r15,%rax,8), %rcx
        vmovsd  (%rcx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        vmaxsd  %xmm1, %xmm0, %xmm1
        cmova   %eax, %edx
        addq    $1, %rax
        cmpl    %eax, %r14d
        jg      .L4

vs.

.L4:
        movq    (%r15,%rax,8), %rdx
        movl    %eax, %edi
        addq    $1, %rax
        vmovsd  (%rdx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        jbe     .L56
        cmpl    %eax, %r14d
        jle     .L68
        vmovapd %xmm0, %xmm1
        movl    %edi, %r8d
        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L56:
        cmpl    %eax, %r14d
        jg      .L4
...
.L68:
        movl    %edi, %r8d
        jmp     .L8
...

which is split-paths going amok again, this time on IL that is no longer if-converted on GIMPLE:

  <bb 6> [16.86%]:
  # jp_62 = PHI <j_94(5), jp_176(8)>
  # t_184 = PHI <t_70(5), t_175(8)>
  # ivtmp.59_327 = PHI <ivtmp.59_329(5), ivtmp.59_328(8)>
  i_185 = (int) ivtmp.59_327;
  _180 = MEM[base: A_69(D), index: ivtmp.59_327, step: 8, offset: 0B];
  _179 = _180 + _2;
  _178 = *_179;
  ab_177 = ABS_EXPR <_178>;
  if (ab_177 > t_184)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [8.43%]:

  <bb 8> [16.86%]:
  # jp_176 = PHI <jp_62(6), i_185(7)>
  # t_175 = PHI <t_184(6), ab_177(7)>
  ivtmp.59_328 = ivtmp.59_327 + 1;
  if (ivtmp.59_328 != _339)
    goto <bb 6>; [85.00%]

So I wonder whether -fno-split-paths restores the performance. It's the threading heuristic again, btw, and both predecessors of the joiner are empty.

The loop is basically a max-index reduction.