Hi! I guess many people have noticed that for floating-point multiplications msp430-gcc does not use the hardware multiplier, while IAR does. I will present a solution for discussion. All source code fragments are given with a leading line number.
Floating-point arithmetic is implemented in gcc-3.2.3/gcc/fp-bit.c and gcc-3.2.3/gcc/config/fp-bit.h. fp-bit.c defines the float type only (no double):

    2: #define FLOAT_ONLY

This results in the following in fp-bit.h:

    110: #ifdef FLOAT_ONLY
    111: #define NO_DI_MODE
    112: #endif

The reason for this can be found in fp-bit.h:

    102: typedef unsigned int UHItype __attribute__ ((mode (HI)));
    103: typedef unsigned int USItype __attribute__ ((mode (SI)));
    104: typedef unsigned int UDItype __attribute__ ((mode (DI)));

These lines define UHItype as a 16-bit integer, USItype as 32 bits and, unfortunately, UDItype as 32 bits, too. Normally one would expect DI mode to be 64 bits, as noted in <http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gcc/vector-extensions.html>.

And now to the problem itself, in fp-bit.c:

    635: #if defined(NO_DI_MODE)
    636:   {
    637:     fractype x = a->fraction.ll;
    638:     fractype ylow = b->fraction.ll;
    639:     fractype yhigh = 0;
    640:     int bit;
    641:
    642:     /* ??? This does multiplies one bit at a time. Optimize.  */
    643:     for (bit = 0; bit < FRAC_NBITS; bit++)
    644:       {
    645:         int carry;
    646:
    647:         if (x & 1)
    648:           {
    649:             carry = (low += ylow) < ylow;
    650:             high += yhigh + carry;
    651:           }
    652:         yhigh <<= 1;
    653:         if (ylow & FRACHIGH)
    654:           {
    655:             yhigh |= 1;
    656:           }
    657:         ylow <<= 1;
    658:         x >>= 1;
    659:       }
    660:   }
    661: #elif defined(FLOAT)
    662:   /* Multiplying two USIs to get a UDI, we're safe.  */
    663:   {
    664:     UDItype answer = (UDItype)a->fraction.ll * (UDItype)b->fraction.ll;
    665:
    666:     high = answer >> BITS_PER_SI;
    667:     low = answer;
    668:   }
    669: #else
    670: ...

Because NO_DI_MODE is defined, the first branch is taken, which obviously implements a software multiplication: a shift-and-add loop that handles one bit of the multiplier per iteration. The second branch cannot be used, because UDItype is only 32 bits wide, so the upper half of the 64-bit product would be lost.
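For comparison, the NO_DI_MODE loop can be lifted out into a standalone function and checked against a plain widening multiply. The sketch below uses a hypothetical helper, mul32x32_shift_add, which is not part of fp-bit.c; it assumes FRAC_NBITS is 32 and FRACHIGH is the top bit (0x80000000) for the single-precision fractype, and uses <stdint.h> types in place of fractype. It shows where the time goes: 32 iterations, each with a conditional 64-bit add and a 64-bit shift built from two 32-bit halves, instead of a single widening multiplication in C.

```c
#include <stdint.h>

/* Hypothetical helper (not in fp-bit.c): bit-serial 32x32->64 multiply
   modeled on the NO_DI_MODE loop. The shifted multiplicand is kept as
   two 32-bit halves (yhigh:ylow), the product as (h:l). */
void mul32x32_shift_add(uint32_t a, uint32_t b,
                        uint32_t *high, uint32_t *low)
{
    uint32_t x = a;        /* multiplier, consumed one bit per iteration */
    uint32_t ylow = b;     /* low half of the shifted multiplicand */
    uint32_t yhigh = 0;    /* high half of the shifted multiplicand */
    uint32_t h = 0, l = 0; /* 64-bit accumulator, split into halves */
    int bit;

    for (bit = 0; bit < 32; bit++) {   /* assumes FRAC_NBITS == 32 */
        if (x & 1) {
            /* 64-bit add of (yhigh:ylow) into (h:l): the low half wraps
               on overflow, and that wrap is detected as the carry */
            uint32_t carry = (l += ylow) < ylow;
            h += yhigh + carry;
        }
        /* shift the 64-bit multiplicand left by one bit */
        yhigh <<= 1;
        if (ylow & 0x80000000u)        /* assumes FRACHIGH == 0x80000000 */
            yhigh |= 1;
        ylow <<= 1;
        x >>= 1;
    }
    *high = h;
    *low = l;
}
```

On any host compiler, the combined result ((uint64_t)high << 32) | low should equal (uint64_t)a * b, which is exactly what the #elif branch expresses when UDItype really is 64 bits wide.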
Proposal: replace lines 637 to 659 with

    unsigned long long int answer;  /* 64 bits */
    /* 64 bits <- 32 bits * 32 bits */
    answer = (unsigned long long int)a->fraction.ll * (unsigned long int)b->fraction.ll;
    high = answer >> 32;
    low = answer;

This computes the 32 x 32 -> 64 bit product with a single multiplication, so the hardware multiplier is used if one is available, which gives quite a nice speedup. (But remember that unpacking and packing the operands also take some time, so don't expect a speedup as large as for integer multiplication.)

Drawbacks: the proposed solution is tied to the type sizes that msp430-gcc currently uses (64-bit unsigned long long, 32-bit unsigned long). A cleaner solution might be to make UDItype 64 bits wide (but I don't know how to do this) and not define NO_DI_MODE, so that the second branch would be used.

Ralf
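One way to reduce the dependence on msp430-gcc's current type sizes, without touching UDItype, might be to spell out the widths with the C99 <stdint.h> types. The sketch below wraps the proposed replacement in a hypothetical helper, fp_mul_fraction, which is not part of fp-bit.c; that <stdint.h> is usable when libgcc's fp-bit.c is compiled for msp430 is an assumption here, not something the post establishes.

```c
#include <stdint.h>

/* Hypothetical helper (not in fp-bit.c) mirroring the proposed
   replacement for lines 637-659. uint64_t and uint32_t (assumed to be
   available via <stdint.h>) replace unsigned long long and unsigned
   long, so the widths are stated explicitly. */
void fp_mul_fraction(uint32_t a_frac, uint32_t b_frac,
                     uint32_t *high, uint32_t *low)
{
    /* 64 bits <- 32 bits * 32 bits: casting one operand is enough,
       the other is converted by the usual arithmetic conversions */
    uint64_t answer = (uint64_t)a_frac * b_frac;

    *high = (uint32_t)(answer >> 32);  /* upper 32 bits of the product */
    *low = (uint32_t)answer;           /* lower 32 bits of the product */
}
```

Inside the NO_DI_MODE branch, a call such as fp_mul_fraction(a->fraction.ll, b->fraction.ll, &high, &low); would take the place of the loop, assuming high and low are 32-bit fractype variables as in the FLOAT configuration.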
