Re: Problem exposed by recent ARM multiply changes

Segher Boessenkool Thu, 26 Sep 2019 15:35:36 -0700

On Fri, Sep 27, 2019 at 12:30:50AM +0200, Jakub Jelinek wrote:
> On Thu, Sep 26, 2019 at 05:17:40PM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 25, 2019 at 10:06:13PM -0600, Jeff Law wrote:
> > > (insn 14 13 16 2 (parallel [
> > >             (set (reg:SI 132)
> > >                 (plus:SI (mult:SI (zero_extend:DI (reg/v:SI 115 [ sec ]))
> > >                         (zero_extend:DI (reg:SI 124)))
> > >                     (reg:SI 130)))
> > >             (set (reg:SI 133 [+4 ])
> > >                 (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI 
> > > (zero_extend:DI (reg/v:SI 115 [ sec ]))
> > >                                     (zero_extend:DI (reg:SI 124)))
> > >                                 (zero_extend:DI (reg:SI 130)))
> > >                             (const_int 32 [0x20])))
> > >                     (reg:SI 131 [+4 ])))
> > >         ]) "j.c":10:54 60 {umlal}
> > >      (expr_list:REG_DEAD (reg:SI 131 [+4 ])
> > >         (expr_list:REG_DEAD (reg:SI 130)
> > >             (expr_list:REG_DEAD (reg:SI 124)
> > >                 (expr_list:REG_DEAD (reg/v:SI 115 [ sec ])
> > >                     (nil))))))
> > 
> > This is not a correct pattern for umlal (it should have a carry from the
> > low half of the addition to the high half).
> 
> Are you sure?
> "The UMLAL instruction interprets the values from Rn and Rm as unsigned
> integers.  It multiplies these integers, and adds the 64-bit result to the
> 64-bit unsigned integer contained in RdHi and RdLo."


Yes.  It adds two 64-bit numbers.  That is not the same as adding the
top and bottom 32-bit halfs of that separately.

> Appart from the bug I've provided untested fix in the PR, the above pattern
> seems to do what the documentation says, the low 32 bits are
> (unsigned int) (Rn*Rm+RdLo), and the whole result for unsigned int
> Rn, Rm, RdHi and RdLo is
> (unsigned long long)Rn*Rm+((unsigned long long)RdHi<<32)+RdLo
> and the upper 32 bits of that are IMHO equivalent to
> (unsigned) (((unsigned long long)Rn*Rm+RdLo)>>32)+RdHi

The upper half of the result should be one bigger if the addition of the
low half carries.


Segher

Re: Problem exposed by recent ARM multiply changes

Reply via email to