> -----Original Message-----
> From: Richard Sandiford <[email protected]>
> Sent: Monday, November 16, 2020 11:56 AM
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; Richard Earnshaw
> <[email protected]>; nd <[email protected]>; Marcus Shawcroft
> <[email protected]>
> Subject: Re: [PATCH v2 10/16]AArch64: Add NEON RTL patterns for Complex
> Addition, Multiply and FMA.
>
> > +;; A conjucate is a rotation of 180* around the argand plane, or * I.
>
> Hmm, but a complex conjugate is a reflection around the real axis rather than
> a rotation. Also, 180 degrees around the Argand plane is * -1 rather than *
> I.
> So…
Yes indeed, Sorry this is a comment from waaaaay in the beginning that I forgot
to update..
The operation itself expects an actual reflection :)
>
> > +(define_int_attr rot_op [(UNSPEC_FCMLS "")
> > + (UNSPEC_FCMLS180 "_conj")
> > + (UNSPEC_FCMLA "")
> > + (UNSPEC_FCMLA180 "_conj")
> > + (UNSPEC_FCMUL "")
> > + (UNSPEC_FCMUL180 "_conj")
> > + (UNSPEC_CMLS "")
> > + (UNSPEC_CMLA "")
> > + (UNSPEC_CMLA180 "_conj")
> > + (UNSPEC_CMUL "")
> > + (UNSPEC_CMUL180 "_conj")])
> > +
> > +(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0")
> > + (UNSPEC_FCMLA180 "0")
> > + (UNSPEC_FCMUL "0")
> > + (UNSPEC_FCMUL180 "0")
> > + (UNSPEC_FCMLS "270")
> > + (UNSPEC_FCMLS180 "90")])
> > +
> > +(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
> > + (UNSPEC_FCMLA180 "270")
> > + (UNSPEC_FCMUL "90")
> > + (UNSPEC_FCMUL180 "270")
> > + (UNSPEC_FCMLS "180")
> > + (UNSPEC_FCMLS180 "180")])
>
> I think it would be worth clarifying what the patterns do. AIUI, the effect
> of
> the MUL180 and MLA180 “rot_op” cases as written in the patch is to multiply
> the conjugate of the first operand by the second operand, whereas the
> documentation in the earlier patches implied that they should multiply the
> conjugate of the second operand by the first.
> But maybe I've got this wrong.
>
> operands to the pattern: a + bi, c + di
>
> conjugate of first operand * second operand [A]:
>
> (a - bi) * (c + di) = (ac + bd) + (ad - bc)i
>
> first operand * conjugate of second operand [B]:
>
> (a + bi) * (c - di) = (ac + bd) + (bc - ad)i
>
> the FCMUL180 sequence chosen by the patch:
>
> 0: a * (c + di) = ac + adi
> 270: b * (d - ci) = bd - bci
>
> which gives A rather than B.
Correct, it expects the conjucate In the second operand and corrects the
argument order to account
for it during vectorization if it's on the first operand.
i.e. conjucate first returns
mov v0.16b, v3.16b
ldr q1, [x0, x3]
ldr q2, [x1, x3]
fcmla v0.4s, v1.4s, v2.4s, #0
fcmla v0.4s, v1.4s, v2.4s, #270
str q0, [x2, x3]
and conjucate second returns
mov v0.16b, v3.16b
ldr q1, [x1, x3]
ldr q2, [x0, x3]
fcmla v0.4s, v1.4s, v2.4s, #0
fcmla v0.4s, v1.4s, v2.4s, #270
str q0, [x2, x3]
I think at some point the documentation in got out of sync with the
implementation because it turned out to be easier to conj first
and flip conj second. I'll update the optab docs.
>
> > +;; The complex mla/mls operations always need to expand to two
> instructions.
> > +;; The first operation does half the computation and the second does
> > +the ;; remainder. Because of this, expand early.
> > +(define_expand "cml<fcmac1><rot_op><mode>4"
> > + [(set (match_operand:VHSDF 0 "register_operand")
> > + (plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > + (unspec:VHSDF [(match_operand:VHSDF 2
> "register_operand")
> > + (match_operand:VHSDF 3
> "register_operand")]
> > + FCMLA_OP)))]
> > + "TARGET_COMPLEX"
> > +{
> > + emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (operands[0],
> operands[1],
> > + operands[2], operands[3]));
> > + emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0],
> operands[0],
> > + operands[2], operands[3]));
> > + DONE;
> > +})
>
> The interface doesn't guarantee that operands[0] is distinct from the inputs,
> so I think the result of the first instruction should be a temporary register.
Hmm sorry I'm not sure I follow.. operands[0] doesn't have to be distinct from
the inputs,
and in most cases it wouldn't be.
In c = c + (a * b)
operands[0] should be the same as operands[1].
I'm just trying to figure out why It needs to be different :) since it's an
accumulation instruction..
Thanks,
Tamar
>
> Thanks,
> Richard