http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55623
--- Comment #3 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-09 11:18:56 UTC ---
(In reply to comment #2)
> This is an ARM (both arm32 and arm64) specific issue due to the shifts being
> "free". If you look at the mips assembly, it looks good for a dual issue
> processor as it is scheduled as an add followed by a shift.
>
> I think the issue is reassoc does not know that shifts are free on arm.

This does not look like only an ARM issue. To properly demonstrate it on MIPS
and even without dual-issue, all the additions can be just changed with
multiplications (because it is a long latency instruction). In this case we
get:

unsigned int f1(unsigned int x)
{
    unsigned int a, b;
    a = x >> 1;
    b = x >> 2;
    a *= x >> 3;
    b *= x >> 4;
    a *= x >> 5;
    b *= x >> 6;
    a *= x >> 7;
    b *= x >> 8;
    a *= x >> 9;
    b *= x >> 10;
    a *= x >> 11;
    b *= x >> 12;
    a *= x >> 13;
    b *= x >> 14;
    a *= x >> 15;
    b *= x >> 16;
    a *= x >> 17;
    b *= x >> 18;
    a *= x >> 19;
    b *= x >> 20;
    a *= x >> 21;
    b *= x >> 22;
    a *= x >> 23;
    b *= x >> 24;
    return a * b;
}

unsigned int f2(unsigned int x)
{
    unsigned int a, b;
    a = x >> 1;
    b = x >> 2;
    a *= x >> 3;
    b *= x >> 4;
    a *= x >> 5;
    b *= x >> 6;
    a *= x >> 7;
    b *= x >> 8;
    a *= x >> 9;
    b *= x >> 10;
    a *= x >> 11;
    b *= x >> 12;
    a *= x >> 13;
    b *= x >> 14;
    a *= x >> 15;
    b *= x >> 16;
    a *= x >> 17;
    b *= x >> 18;
    a *= x >> 19;
    b *= x >> 20;
    a *= x >> 21;
    b *= x >> 22;
    a *= x >> 23;
    b *= x >> 24;
    asm ("" : "+r" (a));
    return a * b;
}

And the benchmark run on MIPS 74K:

$ gcc -O2 -march=mips32r2 -mtune=74kc -o badschedmul badschedmul.c

$ time ./badchedmul 1
real    0m34.934s
user    0m34.689s
sys     0m0.073s

$ time ./badchedmul 2
real    0m19.261s
user    0m19.122s
sys     0m0.050s

The symptoms are still the same. GCC just merges two independent calculations
into a single dependency chain. While I would have expected it to be the other
way around (breaking dependency chains to run faster on the target CPU).
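The comment shows f1 and f2 but not the driver that the benchmark commands invoke. Below is a minimal sketch of what such a driver could look like. It is not the original badschedmul.c: the names chains_plain, chains_barrier, run_variant and variants_agree are hypothetical, and the chains are shortened to four steps each to keep the example small. The only difference between the two variants is the empty asm statement with a "+r" constraint, which tells GCC that a may have been modified and so prevents the two multiply chains from being reassociated into one.

```c
/* Shortened stand-in for f1: two independent multiply chains,
   joined only by the final a * b. */
unsigned int chains_plain(unsigned int x)
{
    unsigned int a = x >> 1, b = x >> 2;
    a *= x >> 3;  b *= x >> 4;
    a *= x >> 5;  b *= x >> 6;
    a *= x >> 7;  b *= x >> 8;
    return a * b;
}

/* Shortened stand-in for f2: identical arithmetic, plus an empty
   GCC extended asm that makes 'a' opaque to the optimizer. */
unsigned int chains_barrier(unsigned int x)
{
    unsigned int a = x >> 1, b = x >> 2;
    a *= x >> 3;  b *= x >> 4;
    a *= x >> 5;  b *= x >> 6;
    a *= x >> 7;  b *= x >> 8;
    __asm__ ("" : "+r" (a));   /* emits no instructions */
    return a * b;
}

/* Hypothetical driver loop: variant 1 runs the plain version, any other
   value runs the barrier version. The results are summed and returned
   so that the calls cannot be discarded as dead code. */
unsigned int run_variant(int variant, unsigned int iterations)
{
    unsigned int acc = 0;
    for (unsigned int i = 0; i < iterations; i++)
        acc += (variant == 1) ? chains_plain(i) : chains_barrier(i);
    return acc;
}

/* Sanity check: both variants must compute the same values, since the
   asm statement changes scheduling only, not the result. */
int variants_agree(unsigned int iterations)
{
    return run_variant(1, iterations) == run_variant(2, iterations);
}
```

A main that parses argv[1] as the variant number and calls run_variant with a large iteration count would give something comparable to the two timed runs above. The iteration count used in the original benchmark is not stated, so any value chosen here is an assumption.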