https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2018-02-01 00:00:00         |2025-2-10

--- Comment #78 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #77)
> Just wondering... has this been fixed in the meantime?

With GCC 3.4.6 I get a score of 7074.5 while GCC 15 gets 3304.7, with
-fno-tree-pre the score is 7001.  So no.  Note even GCC 4.0.4 gets just
3342.7, with -fno-tree-pre it's 5449.3 there.  These are now measurements
on the Zen4 uarch.

Profile from 3.4.6:

       │150:┌─→mov    %rdx,(%rdi,%rax,8)
   215 │    │  mov    %rcx,(%rdi,%r9,8)
   473 │    │  mov    %rax,%r9
    11 │    │  lea    (%r9,%r9,1),%rdx
   291 │    │  cmp    %r8,%rdx
       │    │↓ ja     197
       │164:│  cmp    %r8,%rdx
   110 │    │  mov    %rdx,%rax
   225 │    │↓ jae    17d
       │    │  mov    0x8(%rdi,%rdx,8),%rbx
  3603 │    │  cmp    %rbx,(%rdi,%rdx,8)
  1924 │    │  lea    0x1(%rdx),%rax
   364 │    │  cmovge %rdx,%rax
  1554 │17d:│  mov    (%rdi,%r9,8),%rdx
  1681 │    │  mov    (%rdi,%rax,8),%rcx
  7092 │    ├──cmp    %rcx,%rdx
  1130 │    └──jl     150

and from GCC 15:

       │240:┌─→mov    %rbp,(%r12)
   799 │    │  mov    %r8,(%rax)
    77 │    │  lea    (%rdx,%rdx,1),%rax
   113 │    │  cmp    %rax,%rdi
     5 │    │↓ jb     2b0
       │250:│  mov    %rdx,%rbp
    92 │253:│  cmp    %rdi,%rax
     3 │    │↓ jae    271
       │    │  lea    0x1(%rax),%rdx
    77 │    │  mov    %rbp,%r15
   125 │    │  lea    (%rbx,%rdx,8),%r12
   265 │    │  shl    $0x4,%r15
   448 │    │  mov    (%r12),%r8
  3436 │    │  cmp    %r8,(%rbx,%r15,1)
   770 │    │↓ jl     27c
       │271:│  lea    (%rbx,%rax,8),%r12
  3344 │    │  mov    %rax,%rdx
    11 │    │  mov    (%r12),%r8
   923 │27c:│  lea    (%rbx,%rbp,8),%rax
  3927 │    │  mov    (%rax),%rbp
  2308 │    ├──cmp    %r8,%rbp
    51 │    └──jl     240

so it looks like the if-conversion done by GCC 3.4 is missing here and
the PRE of the (%r12) load to %r8 is inhibiting it.
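
For readers who want a source-level picture of the two profiles, here is a minimal C sketch. It assumes the hot loop is a heap sift-down, which is what the index doubling (lea (%r9,%r9,1)), the k+1 load and the final compare-and-swap suggest. All names (sift_branchy, sift_select, heap_sort, swap_long) are hypothetical and the code is not taken from the benchmark source. The first variant writes the child selection as a branch, roughly the shape of the GCC 15 code. The second writes it as a select, roughly what the cmovge in the 3.4.6 code computes. A compiler is free to emit either form for either variant, so this only illustrates the transformation and does not force it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; the loop shape is inferred from the
   profiles above and is an assumption, not the benchmark's source. */

static void swap_long(long *x, long *y) { long t = *x; *x = *y; *y = t; }

/* Sift a[i] down within a[0..last]; the children of i are 2i and 2i+1.
   Child selection is written as a branch. */
void sift_branchy(long *a, size_t i, size_t last)
{
    assert(i <= last);
    for (;;) {
        size_t k = i + i;
        if (k > last)
            break;
        if (k < last && a[k] < a[k + 1])
            ++k;                      /* pick the larger child */
        if (!(a[i] < a[k]))
            break;                    /* heap property holds, done */
        swap_long(&a[i], &a[k]);
        i = k;
    }
}

/* Same loop, with the child selection written as a select: the bounds
   check stays a branch, the comparison result is added to k. */
void sift_select(long *a, size_t i, size_t last)
{
    assert(i <= last);
    for (;;) {
        size_t k = i + i;
        if (k > last)
            break;
        if (k < last)
            k += (size_t)(a[k] < a[k + 1]);
        if (!(a[i] < a[k]))
            break;
        swap_long(&a[i], &a[k]);
        i = k;
    }
}

typedef void (*sift_fn)(long *, size_t, size_t);

/* Heap sort driver: build a heap on a[1..n-1], then repeatedly sift
   from index 0 and move the maximum to the end. */
void heap_sort(long *a, size_t n, sift_fn sift)
{
    if (n < 2)
        return;
    size_t top = n - 1;
    for (size_t i = top / 2; i > 0; --i)
        sift(a, i, top);
    for (size_t i = top; i > 0; --i) {
        sift(a, 0, i);
        swap_long(&a[0], &a[i]);
    }
}
```

Compiling both variants with and without -fno-tree-pre and comparing the generated loops should show whether the a[k + 1] load is hoisted and whether a conditional move is emitted.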