Hi!

I guess many people have noticed that msp430-gcc does not use the hardware
multiplier for floating point multiplications, whereas IAR does. I would like
to present a solution for discussion. All source code fragments are shown
with their leading line numbers.

Floating point arithmetic is implemented in gcc-3.2.3/gcc/fp-bit.c and
gcc-3.2.3/gcc/config/fp-bit.h.

fp-bit.c is built for the float type only (no double):
2:      #define FLOAT_ONLY

In fp-bit.h, this leads to:
110:    #ifdef FLOAT_ONLY
111:    #define NO_DI_MODE
112:    #endif

The reason for this can be located in fp-bit.h:
102:    typedef unsigned int UHItype __attribute__ ((mode (HI)));
103:    typedef unsigned int USItype __attribute__ ((mode (SI)));
104:    typedef unsigned int UDItype __attribute__ ((mode (DI)));

which defines UHItype as a 16-bit integer, USItype as 32 bits and,
unfortunately, UDItype as 32 bits, too. Normally one would expect 64 bits, as
noted in
<http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gcc/vector-extensions.html>.
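A quick way to check these widths on a given target is to compile the same
typedefs and look at sizeof. This is only a sketch: the three helper functions
are hypothetical names, not part of fp-bit.h, and the mode attribute is a GCC
extension. On a typical desktop GCC one would expect 16, 32 and 64; what
msp430-gcc reports for DI is exactly the point in question.

```c
#include <limits.h>
#include <stddef.h>

/* Same typedefs as fp-bit.h lines 102 to 104. */
typedef unsigned int UHItype __attribute__ ((mode (HI)));
typedef unsigned int USItype __attribute__ ((mode (SI)));
typedef unsigned int UDItype __attribute__ ((mode (DI)));

/* Hypothetical helpers: width of each type in bits on this target. */
size_t uhi_bits(void) { return sizeof(UHItype) * CHAR_BIT; }
size_t usi_bits(void) { return sizeof(USItype) * CHAR_BIT; }
size_t udi_bits(void) { return sizeof(UDItype) * CHAR_BIT; }
```

Printing the three values from main(), or inspecting them in a debugger,
shows whether UDItype is really narrower than 64 bits on the target.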

And now to the problem itself - fp-bit.c:
635:    #if defined(NO_DI_MODE)
636:        {
637:          fractype x = a->fraction.ll;
638:          fractype ylow = b->fraction.ll;
639:          fractype yhigh = 0;
640:          int bit;
641:    
642:          /* ??? This does multiplies one bit at a time.  Optimize.  */
643:          for (bit = 0; bit < FRAC_NBITS; bit++)
644:            {
645:              int carry;
646:    
647:              if (x & 1)
648:                {
649:                  carry = (low += ylow) < ylow;
650:                  high += yhigh + carry;
651:                }
652:              yhigh <<= 1;
653:              if (ylow & FRACHIGH)
654:                {
655:                  yhigh |= 1;
656:                }
657:              ylow <<= 1;
658:              x >>= 1;
659:            }
660:        }
661:    #elif defined(FLOAT) 
662:        /* Multiplying two USIs to get a UDI, we're safe.  */
663:        {
664:          UDItype answer = (UDItype)a->fraction.ll * (UDItype)b->fraction.ll;
665:          
666:          high = answer >> BITS_PER_SI;
667:          low = answer;
668:        }
669:    #else
670:    ...

Because NO_DI_MODE is defined, the 1st branch is taken, which obviously
implements the multiplication in software, one bit at a time. The 2nd branch
cannot be taken, because UDItype is only 32 bits wide.


Proposal: Replace lines 637 to 659 with
    unsigned long long int answer; /* 64 bits */
    /* 64 bits <- 32 bits * 32 bits */
    answer = (unsigned long long int)a->fraction.ll * (unsigned long int)b->fraction.ll;
    high = answer >> 32;
    low = answer;

to get a hardware multiplication if a hardware multiplier is available,
which gives quite a nice speedup. (But remember that unpacking and packing
also take some time, so don't expect a speedup as large as for integer
multiplication.)
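To see that the single wide multiply yields the same high and low words as the
bit-by-bit loop, both can be put side by side in a standalone sketch. The
assumptions here are that fractype is 32 bits wide, FRAC_NBITS is 32 and
FRACHIGH is the top bit; mul_bitwise and mul_wide are hypothetical names, and
uint32_t/uint64_t stand in for the fp-bit.h types.

```c
#include <stdint.h>

typedef uint32_t fractype;       /* assumed stand-in for fp-bit.h's fractype */
#define FRAC_NBITS 32            /* assumed: number of bits in fractype */
#define FRACHIGH   0x80000000u   /* assumed: most significant bit of fractype */

/* Shift-and-add product, following the NO_DI_MODE branch (lines 637 to 659). */
void mul_bitwise(fractype x, fractype ylow, fractype *high, fractype *low)
{
    fractype yhigh = 0;
    int bit;

    *high = 0;
    *low = 0;
    for (bit = 0; bit < FRAC_NBITS; bit++) {
        if (x & 1) {
            /* Add the shifted multiplicand; detect overflow of the low word. */
            int carry = (*low += ylow) < ylow;
            *high += yhigh + carry;
        }
        /* Shift the 64-bit pair yhigh:ylow left by one bit. */
        yhigh <<= 1;
        if (ylow & FRACHIGH)
            yhigh |= 1;
        ylow <<= 1;
        x >>= 1;
    }
}

/* Same product via one 32 x 32 -> 64 bit multiplication, as proposed. */
void mul_wide(fractype a, fractype b, fractype *high, fractype *low)
{
    uint64_t answer = (uint64_t)a * b;

    *high = (fractype)(answer >> 32);
    *low = (fractype)answer;
}
```

Calling both functions with the same operands should fill in identical high
and low values. Whether the wide version actually ends up on the hardware
multiplier depends on how the compiler and its support library implement the
64-bit product.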


Drawbacks: As everyone can see, the proposed solution is tied to the current
type definitions used in msp430-gcc. One alternative might be to make UDItype
64 bits wide (but I don't know how to do this) and to not define NO_DI_MODE,
which would select the 2nd branch.
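One way to make that dependency fail loudly is a compile-time guard next to
the patched code. This is a minimal sketch, assuming only that the proposal
relies on unsigned long long being exactly 64 bits wide; the typedef name is
hypothetical.

```c
#include <limits.h>

/* Compile-time guard: the array size becomes -1 (a compile error) unless
   unsigned long long is exactly 64 bits wide on this target. */
typedef char ull_must_be_64_bits[
    (sizeof(unsigned long long) * CHAR_BIT == 64) ? 1 : -1];
```

If the type definitions ever change, the build stops at this typedef instead
of silently computing a wrong product.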


Ralf
