On Wed, 9 Jun 2021 at 02:05, Richard Henderson
<[email protected]> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +#define DO_LDAVH(OP, ESIZE, TYPE, H, XCHG, EVENACC, ODDACC, TO128) \
> > + uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
> > + void *vm, uint64_t a) \
> > + { \
> > + uint16_t mask = mve_element_mask(env); \
> > + unsigned e; \
> > + TYPE *n = vn, *m = vm; \
> > + Int128 acc = TO128(a); \
>
> This seems to miss the << 8.

Oops, yes it does.

> Which suggests that the whole thing can be done without Int128:
>
> > + for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
> > + if (mask & 1) { \
> > + if (e & 1) { \
> > + acc = ODDACC(acc, TO128(n[H(e - 1 * XCHG)] * m[H(e)])); \
>
> tmp = n * m;
> tmp = (tmp >> 8) + ((tmp >> 7) & 1);
> acc ODDACC tmp;

I'm not sure about this suggestion, though. It throws away the
bottom 7 bits of each product, but we go round this loop 4 times and
add (potentially) four of these products together, so the discarded
low bits of the 4 products might sum to something large enough to
carry into the final result.
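
To make the concern concrete, here is a minimal standalone sketch, not
the helper code itself. The function names are made up for illustration,
and it uses plain int64_t with small non-negative products rather than
the wide accumulator the real helper needs. It compares rounding each
product before accumulating against accumulating at full precision and
rounding once at the end.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical helpers, not part of the patch.
 * Inputs are assumed non-negative so that >> is well defined.
 */

/* Shift each product down by 8 with rounding, then accumulate. */
static int64_t sum_rounded_each(const int64_t *prod, int n)
{
    int64_t acc = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int64_t tmp = prod[i];
        tmp = (tmp >> 8) + ((tmp >> 7) & 1);
        acc += tmp;
    }
    return acc;
}

/* Accumulate at full precision, then shift down by 8 with rounding once. */
static int64_t sum_then_round(const int64_t *prod, int n)
{
    int64_t acc = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        acc += prod[i];
    }
    return (acc >> 8) + ((acc >> 7) & 1);
}
```

With four products of 0x40 each, sum_rounded_each() returns 0, because
every product individually rounds down to zero, whereas sum_then_round()
returns 1, because the products add up to 0x100 before the shift.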
-- PMM