https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2018-02-01 00:00:00         |2025-2-10

--- Comment #78 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #77)
> Just wondering... has this been fixed in the meantime?

With GCC 3.4.6 I get a score of 7074.5 while GCC 15 gets 3304.7; with
-fno-tree-pre GCC 15 scores 7001.

So no.

Note even GCC 4.0.4 gets just 3342.7; with -fno-tree-pre it's 5449.3 there.

These are now measurements on the Zen4 uarch.

Profile from 3.4.6:

       │150:┌─→mov    %rdx,(%rdi,%rax,8)
   215 │    │  mov    %rcx,(%rdi,%r9,8)
   473 │    │  mov    %rax,%r9
    11 │    │  lea    (%r9,%r9,1),%rdx
   291 │    │  cmp    %r8,%rdx
       │    │↓ ja     197
       │164:│  cmp    %r8,%rdx
   110 │    │  mov    %rdx,%rax
   225 │    │↓ jae    17d
       │    │  mov    0x8(%rdi,%rdx,8),%rbx
  3603 │    │  cmp    %rbx,(%rdi,%rdx,8)
  1924 │    │  lea    0x1(%rdx),%rax
   364 │    │  cmovge %rdx,%rax
  1554 │17d:│  mov    (%rdi,%r9,8),%rdx
  1681 │    │  mov    (%rdi,%rax,8),%rcx
  7092 │    ├──cmp    %rcx,%rdx
  1130 │    └──jl     150

and from GCC 15:

       │240:┌─→mov    %rbp,(%r12)
   799 │    │  mov    %r8,(%rax)
    77 │    │  lea    (%rdx,%rdx,1),%rax
   113 │    │  cmp    %rax,%rdi
     5 │    │↓ jb     2b0
       │250:│  mov    %rdx,%rbp
    92 │253:│  cmp    %rdi,%rax
     3 │    │↓ jae    271
       │    │  lea    0x1(%rax),%rdx
    77 │    │  mov    %rbp,%r15
   125 │    │  lea    (%rbx,%rdx,8),%r12
   265 │    │  shl    $0x4,%r15
   448 │    │  mov    (%r12),%r8
  3436 │    │  cmp    %r8,(%rbx,%r15,1)
   770 │    │↓ jl     27c
       │271:│  lea    (%rbx,%rax,8),%r12
  3344 │    │  mov    %rax,%rdx
    11 │    │  mov    (%r12),%r8
   923 │27c:│  lea    (%rbx,%rbp,8),%rax
  3927 │    │  mov    (%rax),%rbp
  2308 │    ├──cmp    %r8,%rbp
    51 │    └──jl     240

So it looks like the if-conversion done by GCC 3.4 (the cmovge in the first
profile, where GCC 15 has the jl to 27c) is missing here, and the PRE of the
(%r12) load into %r8 is inhibiting it.
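
For readers without the testcase at hand, here is a rough C sketch of the loop
shape the two profiles suggest: a sift-down over a 1-based heap in which the
larger child is chosen either with a branch or with a select. The function
names, the 1-based layout, and the early exit are assumptions for illustration,
not the benchmark's actual source, and no particular compiler is guaranteed to
emit a cmov for either form.

```c
#include <stddef.h>

/* Hypothetical sift-down on a 1-based max-heap a[1..n].
   Branchy form: the larger child is picked behind a data-dependent
   conditional jump, comparable to the "jl 27c" in the GCC 15 profile. */
static void sift_branchy(long *a, size_t i, size_t n)
{
    while (i + i <= n) {
        size_t k = i + i;                 /* left child */
        if (k < n && a[k] < a[k + 1])
            ++k;                          /* right child is larger */
        if (a[i] < a[k]) {
            long t = a[k];
            a[k] = a[i];
            a[i] = t;
            i = k;
        } else {
            break;                        /* heap property restored */
        }
    }
}

/* Hand if-converted form: the bounds check stays a branch (it is well
   predicted), but the child choice becomes arithmetic on the comparison
   result, which is the role the cmovge plays in the GCC 3.4.6 profile. */
static void sift_select(long *a, size_t i, size_t n)
{
    while (i + i <= n) {
        size_t k = i + i;
        if (k < n)
            k += (size_t)(a[k] < a[k + 1]);   /* adds 0 or 1, no jump needed */
        if (a[i] < a[k]) {
            long t = a[k];
            a[k] = a[i];
            a[i] = t;
            i = k;
        } else {
            break;
        }
    }
}
```

Both functions compute the same result; they differ only in how the child
index is formed, so comparing their generated code with and without
-fno-tree-pre is one way to see whether the select survives.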
