+++ This bug was initially created as a clone of Bug #36487 +++

The code generated by GCC 4.4 (trunk) for 68K/ColdFire runs slowly on the superscalar pipelines of the 68060 and ColdFire V4/V5 cores.
If you compile this example:

    #include <stddef.h>
    #include <stdint.h>

    uint32_t fletcher( uint16_t *data, size_t len )
    {
        uint32_t sum1 = 0xffff, sum2 = 0xffff;

        while (len) {
            unsigned tlen = len > 360 ? 360 : len;
            len -= tlen;
            do {
                sum1 += *data++;
                sum2 += sum1;
            } while (--tlen);
            sum1 = (sum1 & 0xffff) + (sum1 >> 16);
            sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        }
        /* Second reduction step to reduce sums to 16 bits */
        sum1 = (sum1 & 0xffff) + (sum1 >> 16);
        sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        return sum2 << 16 | sum1;
    }

with

    m68k-linux-gnu-gcc -mcpu=54470 -fomit-frame-pointer -O3 -S -o example.s example.c

then you will see that this code is generated:

    1 clr.w %d3
    2 swap %d3
    3 clr.w %d4
    4 swap %d4

Instruction 2 depends on instruction 1.
Instruction 4 depends on instruction 3.

Each instruction is immediately followed by one that depends on its result, so a superscalar design such as the 68060 or the V5 ColdFire cannot issue them together. Simply reordering the code as follows would let it execute more instructions in parallel and could improve the performance of this sequence by up to a factor of two:

    1 clr.w %d3
    2 clr.w %d4
    3 swap %d3
    4 swap %d4

GCC does not try to reduce the instruction dependencies. The code that GCC generates does not follow the scheduling recommendations for the 68040/68060 and later superscalar CPUs.

Can you please be so kind and correct this?

-- 
           Summary: Generated 68K code bad for pipelining (case swap)
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gunnar at greyhound-data dot com
  GCC host triplet: m68k-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36488
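
To make the dependency argument concrete, here is a minimal sketch of a toy in-order dual-issue model. It is only an illustration and makes assumptions the report does not state: two adjacent instructions pair in one cycle unless the second reads or writes the register the first one writes, and every instruction takes one cycle. The names `struct insn`, `depends_on`, `issue_cycles`, `original_order` and `reordered` are hypothetical. The real 68060 and ColdFire pairing rules are more detailed than this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction record: at most one register written and one read.
   -1 means "none". Register numbers are just small integers (3 = %d3). */
struct insn {
    const char *text;
    int writes;
    int reads;
};

/* Nonzero if `later` must wait for `earlier` (read-after-write or
   write-after-write on the same register). Other hazards are ignored. */
static int depends_on(const struct insn *later, const struct insn *earlier)
{
    if (earlier->writes < 0)
        return 0;
    return later->reads == earlier->writes ||
           later->writes == earlier->writes;
}

/* Count issue cycles for an in-order machine that can issue the next two
   instructions together only when the second does not depend on the first. */
size_t issue_cycles(const struct insn *code, size_t n)
{
    size_t cycles = 0, i = 0;

    assert(code != NULL || n == 0);
    while (i < n) {
        if (i + 1 < n && !depends_on(&code[i + 1], &code[i]))
            i += 2;   /* pair issued in the same cycle */
        else
            i += 1;   /* single issue */
        cycles++;
    }
    return cycles;
}

/* The sequence GCC emits, as quoted in the report. */
const struct insn original_order[4] = {
    { "clr.w %d3", 3, -1 },
    { "swap %d3",  3,  3 },
    { "clr.w %d4", 4, -1 },
    { "swap %d4",  4,  4 },
};

/* The reordered sequence proposed in the report. */
const struct insn reordered[4] = {
    { "clr.w %d3", 3, -1 },
    { "clr.w %d4", 4, -1 },
    { "swap %d3",  3,  3 },
    { "swap %d4",  4,  4 },
};
```

In this simplified model, `issue_cycles(original_order, 4)` returns 3 and `issue_cycles(reordered, 4)` returns 2, because only the reordered sequence lets both pairs issue together. The actual gain on hardware depends on the real pairing and forwarding rules of each core.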