https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91776

            Bug ID: 91776
           Summary: `-fsplit-paths` generates slower code on arm
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yhr-_-yhr at qq dot com
  Target Milestone: ---

I'm doing this test on a Raspberry Pi Model 3B+. The CPU is BCM2835 ARMv7.
I wrote a silly program that calculates the cycle length of the Fibonacci
sequence modulo m.

version: gcc (Raspbian 8.3.0-6+rpi1) 8.3.0

#include <stdio.h>
#include <time.h>
typedef unsigned int uint;
typedef unsigned long long ullong;
int main(){
        uint m;
        ullong cyc=0,lastcyc=0;
        clock_t lastclock=0;
        for(m=2;;m++){
                uint
                        a=0,
                        b=1,
                        n=0;
                do{
                        b+=a;
                        a=b-a;
                        n++;
                        if(b>=m)
                                b-=m;
                }while(
                        a!=0||
                        b!=1
                );
                cyc+=n;
                //if(n>=4*m)
                //      printf("%u: %u %.2f\n",m,n,(double)n/m);
                if(cyc-lastcyc>100000000){
                        clock_t now=clock();
                        printf("~ %.0f loop/s\n",(double)(cyc-lastcyc)/(now-lastclock)*CLOCKS_PER_SEC);
                        lastclock=now;
                        lastcyc=cyc;
                }
        }
}
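
For sanity-checking the inner loop apart from the timing harness, it can be
isolated into a small function. This is only a sketch: the name
`fib_cycle_length` is hypothetical and not part of the reported program, and the
`m >= 2` precondition is made explicit because the loop never returns to (0, 1)
when m is 1.

```c
#include <assert.h>

/* Cycle length of the Fibonacci sequence modulo m.
   Hypothetical helper wrapping the same inner loop as the test program.
   Intended for small m, so that b + a cannot overflow unsigned int. */
unsigned int fib_cycle_length(unsigned int m)
{
    unsigned int a = 0, b = 1, n = 0;

    assert(m >= 2);          /* for m == 1 the pair (0, 1) never recurs */
    do {
        b += a;              /* b = F(k+1) before reduction */
        a = b - a;           /* a = previous b */
        n++;
        if (b >= m)          /* b < 2*m here, so one subtraction suffices */
            b -= m;
    } while (a != 0 || b != 1);
    return n;
}
```

Called with m = 2, 3 and 10, this should return 3, 8 and 60 respectively.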

(1)
pi@rpi:~/Desktop $ gcc -Wall -march=native -mtune=native -o fibmod -O2 fibmod.c
pi@rpi:~/Desktop $ ./fibmod
~ 240755135 loop/s
~ 277965738 loop/s
~ 276675919 loop/s
~ 277244469 loop/s
~ 277207289 loop/s
~ 277303633 loop/s
^C

(2)
pi@rpi:~/Desktop $ gcc -Wall -march=native -mtune=native -o fibmod -O2 -fsplit-paths fibmod.c
pi@rpi:~/Desktop $ ./fibmod
~ 137691044 loop/s
~ 144593838 loop/s
~ 144397428 loop/s
~ 144519131 loop/s
~ 144392500 loop/s
^C

Also tested with `-Ofast -fno-split-paths`; the measured speed is almost the
same as (1).
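
The GCC manual describes `-fsplit-paths` as splitting paths leading to loop
backedges. As a rough source-level illustration only, the two functions below
show the inner loop in its original shape and in a hand-split shape where the
exit test is duplicated on each arm of the `if`. The function names are
hypothetical, and this is an assumption about the general shape of the
transformation, not the code GCC 8.3.0 actually emits on ARM, which would need
to be checked in the generated assembly.

```c
/* Original shape: both arms of the `if` rejoin before the exit test. */
unsigned int cycle_joined(unsigned int m)
{
    unsigned int a = 0, b = 1, n = 0;
    do {
        b += a;
        a = b - a;
        n++;
        if (b >= m)
            b -= m;
    } while (a != 0 || b != 1);
    return n;
}

/* Hand-split shape: the exit test is copied onto each path to the backedge. */
unsigned int cycle_split(unsigned int m)
{
    unsigned int a = 0, b = 1, n = 0;
    for (;;) {
        b += a;
        a = b - a;
        n++;
        if (b >= m) {
            b -= m;
            if (a == 0 && b == 1)
                break;
        } else {
            if (a == 0 && b == 1)
                break;
        }
    }
    return n;
}
```

Both functions compute the same result for any m >= 2; only the control-flow
shape differs.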

On other hardware with the x86_64 arch, this option doesn't seem to make an
observable difference in running time.

btw, clang without `-march=native -mtune=native` also produces the same speed
as (1), but with these two options, the speed is even higher.

(3)
pi@rpi:~/Desktop $ clang -Wall -march=native -mtune=native -o fibmodclang -Ofast fibmod.c
pi@rpi:~/Desktop $ ./fibmodclang 
~ 291343047 loop/s
~ 347350967 loop/s
~ 349217005 loop/s
~ 349320149 loop/s
~ 349367926 loop/s
~ 349372536 loop/s
^C
