https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101

--- Comment #2 from Gael Guennebaud <gael.guennebaud at gmail dot com> ---
Indeed, it fails to remove the dup only if the coefficient is used multiple
times, as in the following reduced example (https://godbolt.org/z/hmSaE0):


#include <arm_neon.h>

void foo(const float* a, const float * b, float * c, int n) {
    float32x4_t c0, c1, c2, c3;
    c0 = vld1q_f32(c+0*4);
    c1 = vld1q_f32(c+1*4);
    for(int k=0; k<n; k++)
    {
        float32x4_t a0 = vld1q_f32(a+0*4+k*4);
        float32x4_t b0 = vld1q_f32(b+k*4);
        c0 = vfmaq_laneq_f32(c0, a0, b0, 0);
        c1 = vfmaq_laneq_f32(c1, a0, b0, 0);
    }
    vst1q_f32(c+0*4, c0);
    vst1q_f32(c+1*4, c1);
}


I tested with gcc 7 and 8.
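
For anyone checking the output of a fixed compiler, here is a rough scalar
model of what the reduced example computes. It is only a sketch: the names
fma_lane_ref and foo_ref are made up for illustration, it assumes a, b and c
do not alias, and it uses a plain multiply-add, so rounding may differ
slightly from the fused vfmaq_laneq_f32.

```c
/* Scalar model of vfmaq_laneq_f32(acc, x, v, lane):
   each element becomes acc[i] + x[i] * v[lane].
   Hypothetical helper, not part of arm_neon.h. */
static void fma_lane_ref(float acc[4], const float x[4],
                         const float v[4], int lane)
{
    for (int i = 0; i < 4; i++)
        acc[i] += x[i] * v[lane];   /* not fused: rounding may differ */
}

/* Hypothetical portable counterpart of foo() from the reduced example.
   Both accumulators use the same a0 and the same lane 0 of b0, i.e. the
   coefficient b0[0] is used twice per iteration. Assumes c does not
   alias a or b, since it accumulates directly into c. */
void foo_ref(const float *a, const float *b, float *c, int n)
{
    for (int k = 0; k < n; k++) {
        fma_lane_ref(c + 0*4, a + 0*4 + k*4, b + k*4, 0);  /* c0 */
        fma_lane_ref(c + 1*4, a + 0*4 + k*4, b + k*4, 0);  /* c1 */
    }
}
```

Comparing foo_ref against foo on small integer inputs (where rounding does
not matter) should give identical results with or without the extra dup.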
