https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> So we now have a "real" FRE after the vectorizer but we fail to CSE
> 
>   MEM <vector(4) double> [(double *)&r] = vect__3.20_74;
> ...
>   MEM <vector(2) double> [(double *)&r + 32B] = vect__62.26_88;
> ...
>   vect__5.7_34 = MEM <vector(4) double> [(double *)&r + 16B];
> 
> mine for GCC 11 to look at.  The code to CSE that load for _74 and _88
> is going to be a bit awkward though but it will nicely combine with the
> following stmts
> 
>   vect__5.8_35 = VEC_PERM_EXPR <vect__5.7_34, vect__5.7_34, { 3, 2, 1, 0 }>;
>   stmp_t_12.9_36 = BIT_FIELD_REF <vect__5.8_35, 64, 0>;
>   stmp_t_12.9_37 = stmp_t_12.9_36 + 0.0;
>   stmp_t_12.9_38 = BIT_FIELD_REF <vect__5.8_35, 64, 64>;
>   stmp_t_12.9_39 = stmp_t_12.9_37 + stmp_t_12.9_38;
>   stmp_t_12.9_40 = BIT_FIELD_REF <vect__5.8_35, 64, 128>;
>   stmp_t_12.9_41 = stmp_t_12.9_39 + stmp_t_12.9_40;
>   stmp_t_12.9_42 = BIT_FIELD_REF <vect__5.8_35, 64, 192>;
>   t_12 = stmp_t_12.9_41 + stmp_t_12.9_42;
> 
> and hopefully elide 'r' completely.

So the difficult part is that we need to compose the loaded value from
the upper v2df half of vect__3.20_74 and the v2df vect__62.26_88.  Assembly
for that would be something like

        vextractf128    $0x1, %ymm0, %xmm0
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0

and on GIMPLE

    tem_42 = BIT_FIELD_REF <vect__3.20_74, 128, 128>;
    vect__5.7_34 = { tem_42, vect__62.26_88 };

That's two stmts, which VN simplification insertion doesn't support at
the moment.  It would be "nicer" to enhance, for example, VEC_PERM to allow

    vect__5.7_34 = VEC_PERM <vect__3.20_74, vect__62.26_88, { 2, 3, 4, 5 }>

"implicitly" extending _88 to v4df (aka a paradoxical v4df subreg of
the v2df SSE reg).  That would turn VEC_PERM into a concat + select
operation that does not require the intermediate to have a vector mode
(in this case, without introducing subregs, it would be v6df, a mode that
is not possible).
On RTL, unfortunately, (vec_select:V4DF (vec_concat (reg:V4DF ..)
(reg:V2DF ..)) ..) is not possible because of that restriction.  OTOH RTL
lacks such a concat-and-select operation; having one would allow the cited
form and vec_merge to be "merged" (vec_merge doesn't require such an
intermediate mode either).
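
To make the intended semantics concrete, here is a minimal scalar sketch in
plain C, not GCC internals.  It assumes 8-byte doubles, so the +16B and +32B
offsets correspond to 2 and 4 elements.  The function names
load_through_memory and concat_select are hypothetical and exist only for
this illustration.  The first models the store/store/load sequence through
'r'; the second models a concat-and-select with selector { 2, 3, 4, 5 }
indexing the notional 6-element concatenation of a v4df and a v2df.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the memory traffic: store a 4-element vector at
   r + 0, a 2-element vector at r + 32 bytes, then load 4 elements from
   r + 16 bytes.  Assumes sizeof (double) == 8.  */
void
load_through_memory (const double a[4], const double b[2], double out[4])
{
  double r[6];
  memcpy (r, a, 4 * sizeof (double));       /* MEM [&r] = a */
  memcpy (r + 4, b, 2 * sizeof (double));   /* MEM [&r + 32B] = b */
  memcpy (out, r + 2, 4 * sizeof (double)); /* out = MEM [&r + 16B] */
}

/* Hypothetical concat-and-select: sel[i] indexes the 6-element
   concatenation a:b without ever materializing a 6-element vector.  */
void
concat_select (const double a[4], const double b[2], const int sel[4],
               double out[4])
{
  for (int i = 0; i < 4; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 6);
      out[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
    }
}
```

With sel = { 2, 3, 4, 5 } both functions produce the upper half of the
first operand followed by the second operand, which is the value the load
from 'r' would need to be replaced with.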

I'll see how difficult it is to teach VN multi-stmt insertions.
