https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123260

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Yichao Yu from comment #0)
> 
> but refuses to do so for the scalar version, even though they are doing
> exactly the same operations AFAICT,
> 
> 
> ```
>         ldp     d30, d28, [x2]
>         ldp     d31, d29, [x1]
>         ldp     d27, d26, [x0]
>         fmadd   d27, d31, d30, d27
>         fmadd   d26, d31, d28, d26
>         fmsub   d27, d29, d28, d27
>         fmadd   d26, d30, d29, d26
>         stp     d27, d26, [x0]
> ```
> 
> Maybe related https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121925

This is because we are missing handling for the commutativity of + in the FMA.

The nodes get changed to

>>> p debug (vals[1])
fcmla_scal.c:6:9: note: node 0x66ad4e0 (max_nunits=2, refcnt=2) vector(2) double
fcmla_scal.c:6:9: note: op template: a$real_10 = a.real;
fcmla_scal.c:6:9: note:         stmt 0 a$real_10 = a.real;
fcmla_scal.c:6:9: note:         stmt 1 a$imag_11 = a.imag;
$5 = void
>>> p debug (l0node[0])
fcmla_scal.c:6:9: note: node 0x66ad220 (max_nunits=2, refcnt=3) vector(2) double
fcmla_scal.c:6:9: note: op template: _2 = _1 + a$real_10;
fcmla_scal.c:6:9: note:         stmt 0 _2 = _1 + a$real_10;
fcmla_scal.c:6:9: note:         stmt 1 _6 = _5 + a$imag_11;
fcmla_scal.c:6:9: note:         children 0x66ad2d0 0x66ad4e0
$6 = void
>>> p debug (vals[0])
fcmla_scal.c:6:9: note: node 0x66ad2d0 (max_nunits=2, refcnt=2) vector(2) double
fcmla_scal.c:6:9: note: op template: _1 = b$real_12 * c$real_14;
fcmla_scal.c:6:9: note:         stmt 0 _1 = b$real_12 * c$real_14;
fcmla_scal.c:6:9: note:         stmt 1 _5 = b$real_12 * c$imag_15;
fcmla_scal.c:6:9: note:         children 0x66ad380 0x66ad430
$7 = void

by match.pd, which leaves the multiplication as the first operand of the +, and
we only check the first one.

Fixing that gives the right sequence.
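
As a rough illustration of the idea only, and not the actual GCC change: the
sketch below uses a made-up `struct node` (a hypothetical stand-in, not GCC's
`slp_tree`) and hypothetical helpers `match_fma_at` and
`match_fma_commutative`. It shows how a matcher that looks for the
multiplication at one fixed operand position of the + can be extended to try
both positions, since a + b == b + a.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an SLP node (not GCC's slp_tree). */
enum op_kind { OP_LOAD, OP_MULT, OP_PLUS };

struct node {
    enum op_kind op;
    const struct node *children[2];   /* NULL for leaves */
};

/* Match a PLUS whose multiplication sits in operand `mul_idx` only.
   On success, *mul is the multiplication and *acc the other operand. */
static bool match_fma_at(const struct node *plus, int mul_idx,
                         const struct node **mul, const struct node **acc)
{
    assert(mul_idx == 0 || mul_idx == 1);
    if (plus == NULL || plus->op != OP_PLUS)
        return false;
    const struct node *m = plus->children[mul_idx];
    if (m == NULL || m->op != OP_MULT)
        return false;
    *mul = m;
    *acc = plus->children[1 - mul_idx];
    return true;
}

/* Since a + b == b + a, accept the multiplication in either operand. */
static bool match_fma_commutative(const struct node *plus,
                                  const struct node **mul,
                                  const struct node **acc)
{
    return match_fma_at(plus, 0, mul, acc)
           || match_fma_at(plus, 1, mul, acc);
}
```

With a matcher of this shape, `_2 = _1 + a$real_10` and a hypothetical
`_2 = a$real_10 + _1` would both be recognised as multiply-then-add.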
