https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86029
--- Comment #3 from Zsolt <hg2ecz at ham dot hu> ---
smallrx$ clang-6.0 -O3 -march=native rx.c -S
smallrx$ cat rx.s | grep mulsc
        callq   __mulsc3
        callq   __mulsc3
        callq   __mulsc3
smallrx$ gcc -O3 -march=native rx.c -S
smallrx$ cat rx.s | grep mulsc
        call    __mulsc3@PLT
        call    __mulsc3@PLT
        call    __mulsc3@PLT

Both compilers emit three calls to __mulsc3. What is the difference between gcc's and clang's __mulsc3?

More test results:

$ gcc -O1 -march=native rx.c -lm -o rx
$ dd if=/dev/urandom bs=1M count=30 | time -p ./rx 145500000 145525000 f > /dev/null
...

So gcc -O1 is no slower than gcc -O2 or gcc -O3?

$ clang-6.0 -O1 -march=native rx.c -lm -o rx
$ dd if=/dev/urandom bs=1M count=30 | time -p ./rx 145500000 145525000 f > /dev/null
...

"clang -O1" is 5x faster than "gcc -O3".
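__mulsc3 is the runtime helper (in libgcc, and in compiler-rt for clang) for multiplying two `float complex` values under the C99 Annex G rules. The sketch below is a hedged illustration, not code from rx.c: it assumes default floating-point flags (no -ffast-math) and contrasts the compiler-lowered `x * y`, which may fall back to __mulsc3 when the inline result is NaN + i*NaN, with the textbook formula. The helper names `make_cf`, `mul_annex_g` and `mul_naive` are hypothetical.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: build re + im*i without complex arithmetic, relying on
   C99 6.2.5p13 (float complex has the representation of float[2]). This keeps
   Inf/NaN parts intact, which re + im * I would not guarantee. */
float complex make_cf(float re, float im)
{
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Compiler-lowered product: with default flags the compiler emits inline
   arithmetic plus a NaN check that can call __mulsc3 to recover infinities. */
float complex mul_annex_g(float complex x, float complex y)
{
    return x * y;
}

/* Hypothetical helper: (a + bi)(c + di) = (ac - bd) + (ad + bc)i with no
   NaN/Inf recovery, roughly what GCC's -fcx-limited-range permits. */
void mul_naive(float complex x, float complex y, float *re, float *im)
{
    float a = crealf(x), b = cimagf(x);
    float c = crealf(y), d = cimagf(y);
    *re = a * c - b * d;
    *im = a * d + b * c;
}
```

For finite inputs such as (1 + 2i)(3 + 4i) both functions should return -5 + 10i. For (Inf + NaN i)(1 + 0i) the naive formula yields NaN in both parts, while the Annex G path (via the __mulsc3 fallback) should return a value with an infinite real part. Compiling such code with -O3 -S and grepping for mulsc, as above, shows where the fallback call remains.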