https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91776
Bug ID: 91776
Summary: `-fsplit-paths` generates slower code on arm
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: yhr-_-yhr at qq dot com
Target Milestone: ---

I'm running this test on a Raspberry Pi 3 Model B+. The CPU is reported as a BCM2835 ARMv7. The board's SoC is actually a BCM2837B0 with Cortex-A53 cores, running 32-bit Raspbian in ARMv7 mode.

The test is a simple program that calculates the cycle length of the Fibonacci sequence modulo m (the Pisano period) for m = 2, 3, 4, ... and periodically prints the inner-loop throughput.

Compiler version: gcc (Raspbian 8.3.0-6+rpi1) 8.3.0

    #include <stdio.h>
    #include <time.h>

    typedef unsigned int uint;
    typedef unsigned long long ullong;

    int main(){
        uint m;
        ullong cyc=0,lastcyc=0;
        clock_t lastclock=0;
        for(m=2;;m++){
            uint a=0, b=1, n=0;
            do{
                b+=a;
                a=b-a;
                n++;
                if(b>=m) b-=m;
            }while( a!=0|| b!=1 );
            cyc+=n;
            //if(n>=4*m)
            //    printf("%u: %u %.2f\n",m,n,(double)n/m);
            if(cyc-lastcyc>100000000){
                clock_t now=clock();
                printf("~ %.0f loop/s\n",(double)(cyc-lastcyc)/(now-lastclock)*CLOCKS_PER_SEC);
                lastclock=now;
                lastcyc=cyc;
            }
        }
    }

(1) `-O2`:

    pi@rpi:~/Desktop $ gcc -Wall -march=native -mtune=native -o fibmod -O2 fibmod.c
    pi@rpi:~/Desktop $ ./fibmod
    ~ 240755135 loop/s
    ~ 277965738 loop/s
    ~ 276675919 loop/s
    ~ 277244469 loop/s
    ~ 277207289 loop/s
    ~ 277303633 loop/s
    ^C

(2) `-O2 -fsplit-paths` runs at roughly half the throughput of (1):

    pi@rpi:~/Desktop $ gcc -Wall -march=native -mtune=native -o fibmod -O2 -fsplit-paths fibmod.c
    pi@rpi:~/Desktop $ ./fibmod
    ~ 137691044 loop/s
    ~ 144593838 loop/s
    ~ 144397428 loop/s
    ~ 144519131 loop/s
    ~ 144392500 loop/s
    ^C

I also tested `-Ofast -fno-split-paths`. `-fsplit-paths` is enabled by default at `-O3` and above, so `-Ofast` alone would turn it on. The measured speed is almost the same as (1). On x86_64 hardware, this option makes no observable difference in running time.

By the way, clang without `-march=native -mtune=native` produces about the same speed as (1). With these two options it is even faster:

(3) clang `-Ofast -march=native -mtune=native`:

    pi@rpi:~/Desktop $ clang -Wall -march=native -mtune=native -o fibmodclang -Ofast fibmod.c
    pi@rpi:~/Desktop $ ./fibmodclang
    ~ 291343047 loop/s
    ~ 347350967 loop/s
    ~ 349217005 loop/s
    ~ 349320149 loop/s
    ~ 349367926 loop/s
    ~ 349372536 loop/s
    ^C
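The inner `do`/`while` loop of fibmod.c can be pulled into a function to compare its shape with a hand-split version outside the benchmark. The sketch below rests on several assumptions:

- The function names `pisano_period`, `pisano_period_split` and `check_split_equivalence` are hypothetical.
- The split variant is only a source-level approximation of the kind of duplication `-fsplit-paths` performs on paths leading to a loop backedge.
- GCC's actual GIMPLE transformation, and the ARM code generated from it, may differ.

```c
#include <assert.h>

/* Inner loop of fibmod.c wrapped in a function (hypothetical name).
   Counts steps until the pair (a, b) = (F(k) mod m, F(k+1) mod m)
   returns to (0, 1), i.e. the Pisano period of m. Requires m >= 2:
   for m == 1 the pair gets stuck at (0, 0) and the loop never ends. */
unsigned int pisano_period(unsigned int m)
{
    unsigned int a = 0, b = 1, n = 0;
    do {
        b += a;          /* a, b < m on entry, so b < 2m here */
        a = b - a;       /* a takes the old value of b */
        n++;
        if (b >= m)      /* reduce b modulo m with one conditional subtract */
            b -= m;
    } while (a != 0 || b != 1);
    return n;
}

/* Hand-split variant (hypothetical, approximate): the loop-exit test that
   follows the if-join is copied into both arms, so each path reaches the
   backedge without passing through a shared join block. */
unsigned int pisano_period_split(unsigned int m)
{
    unsigned int a = 0, b = 1, n = 0;
    for (;;) {
        b += a;
        a = b - a;
        n++;
        if (b >= m) {
            b -= m;
            if (a == 0 && b == 1)   /* duplicated exit test */
                break;
        } else {
            if (a == 0 && b == 1)   /* duplicated exit test */
                break;
        }
    }
    return n;
}

/* Asserts that both variants agree for every modulus in [2, max_m]. */
void check_split_equivalence(unsigned int max_m)
{
    for (unsigned int m = 2; m <= max_m; m++)
        assert(pisano_period(m) == pisano_period_split(m));
}
```

Both functions return 60 for m = 10, and `check_split_equivalence(200)` should return without firing an assertion. One way to investigate the slowdown is to compare `gcc -O2 -S` output for the two functions on the Pi. That shows whether the duplicated exit test leaves `b -= m` as a branch rather than a branch-free conditional subtract. This is a hypothesis only; the report does not confirm it.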