https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
With open-coded arithmetic, GCC successfully optimizes

#include <immintrin.h>

__m128d f1(__m128d x, __m128d y, __m128d z){
    __m128d tem = _mm_mul_pd (x,y);
    __m128d tem2 = tem + z;
    __m128d tem3 = tem - z;
    return __builtin_shuffle (tem2, tem3, (__m128i) {0, 3});
}

to

f1:
.LFB5481:
        .cfi_startproc
        vfmsubadd132pd  %xmm1, %xmm2, %xmm0
        ret
        .cfi_endproc
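
For reference, here is a rough scalar sketch of what f1 computes per lane, which is why a single vfmsubadd can cover it. The name f1_model is made up for illustration, and unlike a real FMA the model rounds the product before the add/subtract, so results can differ in the last bit for inexact inputs.

```c
/* Hypothetical scalar model of f1 (not GCC code).
   __builtin_shuffle (tem2, tem3, {0, 3}) picks lane 0 from tem2 (x*y + z)
   and lane 1 from tem3 (x*y - z): add in even lanes, subtract in odd lanes. */
static void f1_model(const double x[2], const double y[2],
                     const double z[2], double out[2])
{
    for (int i = 0; i < 2; i++) {
        double prod = x[i] * y[i];   /* rounded separately, unlike a fused FMA */
        out[i] = (i % 2 == 0) ? prod + z[i] : prod - z[i];
    }
}
```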


But it fails to optimize

__m256d f2(__m256d x, __m256d y, __m256d z){
    __m256d tem = _mm256_mul_pd (x,y);
    __m256d tem2 = tem + z;
    __m256d tem3 = tem - z;
    return __builtin_shuffle (tem2, tem3, (__m256i) {0, 5, 2, 7});
}

because simplify_rtx does not realize that

Failed to match this instruction:
(set (reg:V4SF 88)
    (vec_merge:V4SF (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (neg:V4SF (reg/v:V4SF 87 [ z ])))
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (reg/v:V4SF 87 [ z ]))
        (const_int 10 [0xa])))

is equal to

(set (reg:V4SF 88)
    (vec_merge:V4SF 
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (reg/v:V4SF 87 [ z ]))
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (neg:V4SF (reg/v:V4SF 87 [ z ])))
        (const_int 5 [0x5])))

The latter is how our pattern is defined.

So is there any canonical RTX form for vec_merge?
For a four-element vector, (vec_merge A B (const_int 10)) should obviously be
equivalent to (vec_merge B A (const_int 5)).
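
To make the claimed equivalence concrete, here is a small sketch that models vec_merge on plain arrays, assuming the documented RTL semantics (a set mask bit takes the lane from the first operand, a clear bit from the second). The names vec_merge_model and count_swap_mismatches are made up for illustration and are not GCC internals.

```c
#include <stddef.h>

#define NELT 4  /* four lanes, matching the four-element modes in the dump */

/* Model of RTL vec_merge: bit i of mask set -> lane i comes from a,
   bit i clear -> lane i comes from b. */
static void vec_merge_model(const double *a, const double *b,
                            unsigned mask, double *out)
{
    for (size_t i = 0; i < NELT; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Count the masks for which (vec_merge a b mask) differs from
   (vec_merge b a ~mask); swapping the operands and complementing the
   mask should never change the result. */
static int count_swap_mismatches(void)
{
    const double a[NELT] = {1, 2, 3, 4};
    const double b[NELT] = {5, 6, 7, 8};
    const unsigned all = (1u << NELT) - 1u;   /* 0xf for four lanes */
    int mismatches = 0;

    for (unsigned mask = 0; mask <= all; mask++) {
        double r1[NELT], r2[NELT];
        vec_merge_model(a, b, mask, r1);
        vec_merge_model(b, a, ~mask & all, r2);
        for (size_t i = 0; i < NELT; i++) {
            if (r1[i] != r2[i]) {
                mismatches++;
                break;
            }
        }
    }
    return mismatches;
}
```

With mask 10 (0b1010) the complemented mask is 5 (0b0101), which is the pair from the dump. A canonical form could, for example, always pick one of the two orderings, but which one GCC should prefer is the open question here.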
