https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109690

            Bug ID: 109690
           Summary: bad SLP vectorization on zen
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

model name      : AMD Ryzen 7 5800X 8-Core Processor
This also reproduces on my znver1 laptop.

jh@ryzen3:~/gcc-kub/build/gcc> cat tt.c
int a[100];

[[gnu::noipa]]
void loop()
{
        for (int i = 0; i < 3; i++)
                a[i]+=a[i];
}
int
main()
{
        for (int j = 0; j < 1000000000; j++)
                loop ();
        return 0;
}


jh@ryzen3:~/gcc-kub/build/gcc> ./xgcc -B ./ -O2 -march=native tt.c ; perf stat ./a.out

 Performance counter stats for './a.out':

           2683.95 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                52      page-faults:u                    #   19.374 /sec
       13001141361      cycles:u                         #    4.844 GHz                         (83.31%)
            691180      stalled-cycles-frontend:u        #    0.01% frontend cycles idle        (83.31%)
            101980      stalled-cycles-backend:u         #    0.00% backend cycles idle         (83.31%)
       12999928665      instructions:u                   #    1.00  insn per cycle
                                                         #    0.00  stalled cycles per insn     (83.31%)
        3000013809      branches:u                       #    1.118 G/sec                       (83.41%)
              1525      branch-misses:u                  #    0.00% of all branches             (83.36%)

       2.684376360 seconds time elapsed

       2.684369000 seconds user
       0.000000000 seconds sys


jh@ryzen3:~/gcc-kub/build/gcc> ./xgcc -B ./ -O2 -march=native tt.c -fno-tree-vectorize ; perf stat ./a.out

 Performance counter stats for './a.out':

           1238.92 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                52      page-faults:u                    #   41.972 /sec
        6000338140      cycles:u                         #    4.843 GHz                         (83.21%)
            314660      stalled-cycles-frontend:u        #    0.01% frontend cycles idle        (83.21%)
                 0      stalled-cycles-backend:u         #    0.00% backend cycles idle         (83.23%)
        7999796562      instructions:u                   #    1.33  insn per cycle
                                                         #    0.00  stalled cycles per insn     (83.53%)
        2999887795      branches:u                       #    2.421 G/sec                       (83.53%)
               698      branch-misses:u                  #    0.00% of all branches             (83.28%)

       1.239116606 seconds time elapsed

       1.239121000 seconds user
       0.000000000 seconds sys
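
To make the two runs easier to compare, here is a minimal sketch that normalizes the perf totals to per-call figures. The helper names per_call and report are hypothetical and not part of tt.c; the counter values passed to them would be the cycles:u and instructions:u totals from the outputs above, and the call count is the 1000000000 iterations of the loop in main. Note that perf multiplexed the counters (the roughly 83% figures), so the totals are scaled estimates.

```c
#include <stdio.h>

/* Average a whole-run counter total over the number of loop() calls. */
double per_call(unsigned long long total, unsigned long long calls)
{
    return (double)total / (double)calls;
}

/* Print per-call cycles and instructions for one build of tt.c. */
void report(const char *label, unsigned long long cycles,
            unsigned long long insns, unsigned long long calls)
{
    printf("%-14s %6.2f cycles/call  %6.2f insns/call\n",
           label, per_call(cycles, calls), per_call(insns, calls));
}
```

Calling report("vectorized", 13001141361ULL, 12999928665ULL, 1000000000ULL) and report("no-vectorize", 6000338140ULL, 7999796562ULL, 1000000000ULL) gives about 13 cycles and 13 instructions per call for the default -O2 build versus about 6 cycles and 8 instructions with -fno-tree-vectorize, i.e. the vectorized build needs roughly 2.17 times as many cycles.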
