https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> So we now have a "real" FRE after the vectorizer but we fail to CSE
>
>   MEM <vector(4) double> [(double *)&r] = vect__3.20_74;
> ...
>   MEM <vector(2) double> [(double *)&r + 32B] = vect__62.26_88;
> ...
>   vect__5.7_34 = MEM <vector(4) double> [(double *)&r + 16B];
>
> mine for GCC 11 to look at.  The code to CSE that load for _74 and _88
> is going to be a bit awkward though but it will nicely combine with the
> following stmts
>
>   vect__5.8_35 = VEC_PERM_EXPR <vect__5.7_34, vect__5.7_34, { 3, 2, 1, 0 }>;
>   stmp_t_12.9_36 = BIT_FIELD_REF <vect__5.8_35, 64, 0>;
>   stmp_t_12.9_37 = stmp_t_12.9_36 + 0.0;
>   stmp_t_12.9_38 = BIT_FIELD_REF <vect__5.8_35, 64, 64>;
>   stmp_t_12.9_39 = stmp_t_12.9_37 + stmp_t_12.9_38;
>   stmp_t_12.9_40 = BIT_FIELD_REF <vect__5.8_35, 64, 128>;
>   stmp_t_12.9_41 = stmp_t_12.9_39 + stmp_t_12.9_40;
>   stmp_t_12.9_42 = BIT_FIELD_REF <vect__5.8_35, 64, 192>;
>   t_12 = stmp_t_12.9_41 + stmp_t_12.9_42;
>
> and hopefully elide 'r' completely.

So the difficult thing is that we need to compose the upper v2df half
of vect__3.20_74 and the v2df vect__62.26_88.  Assembly for that would be
something like

  vextractf128 $0x1, %ymm0, %xmm0
  vinsertf128 $0x1, %xmm1, %ymm0, %ymm0

and on GIMPLE

  tem_42 = BIT_FIELD_REF <vect__3.20_74, 128, 128>;
  vect__5.7_34 = { tem_42, vect__62.26_88 };

That's two stmts, which at the moment VN simplification insertion doesn't
support.  It would be "nicer" to enhance, for example, VEC_PERM to allow

  vect__5.7_34 = VEC_PERM <vect__3.20_74, vect__62.26_88, { 2, 3, 4, 5 }>

"implicitly" extending _88 to v4df (aka a paradoxical v4df subreg of the
v2df SSE reg).  It would turn VEC_PERM into a concat + select operation
without requiring the intermediate to have vector mode (in this case it
would have v6df without introducing subregs, a mode that is not possible).

On RTL, unfortunately,

  (vec_select:V4DF (vec_concat (reg:V4DF ..) (reg:V2DF ..)) ..)

is not possible because of that restriction.  OTOH RTL lacks that
concat-and-select operation, which would allow the cited form and
vec_merge to be "merged" (vec_merge doesn't require such an intermediate
mode either).

I'll see how difficult it is to teach VN multi-stmt insertions.
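
To make the element bookkeeping concrete, here is a small Python model (not GCC code) of the two stores and the overlapping load, compared against both proposed forms: the two-stmt BIT_FIELD_REF + constructor sequence and the single concat-and-select VEC_PERM. The helper names (store, load, bit_field_ref, concat_select), the 48-byte size of 'r', and the sample values are assumptions made for illustration only. Selector indices are taken to index into the concatenation of the two operands, so with a v4df first operand the v2df elements are numbered 4 and 5.

```python
import struct

def store(mem, offset, values):
    """Model 'MEM <vector(N) double> [&r + offset] = values'."""
    for i, v in enumerate(values):
        struct.pack_into("<d", mem, offset + 8 * i, v)

def load(mem, offset, n):
    """Model 'MEM <vector(n) double> [&r + offset]' as a list of doubles."""
    return list(struct.unpack_from("<%dd" % n, mem, offset))

def bit_field_ref(vec, size_bits, pos_bits, elem_bits=64):
    """Model BIT_FIELD_REF <vec, size, pos> on a vector of 64-bit elements."""
    start = pos_bits // elem_bits
    return vec[start:start + size_bits // elem_bits]

def concat_select(a, b, sel):
    """Hypothetical VEC_PERM variant: concatenate a and b, then select."""
    both = a + b          # 6 elements here; no vector mode needed in the model
    return [both[i] for i in sel]

r = bytearray(48)                  # assumed: room for six doubles
v74 = [1.0, 2.0, 3.0, 4.0]         # stands in for vect__3.20_74 (v4df)
v88 = [5.0, 6.0]                   # stands in for vect__62.26_88 (v2df)

store(r, 0, v74)                   # store at &r
store(r, 32, v88)                  # store at &r + 32B
loaded = load(r, 16, 4)            # load at &r + 16B (vect__5.7_34)

# Two-stmt form: upper half of _74, then a constructor with _88.
tem_42 = bit_field_ref(v74, 128, 128)
two_stmt = tem_42 + v88

# One-stmt form: VEC_PERM <_74, _88, { 2, 3, 4, 5 }> as concat + select.
one_stmt = concat_select(v74, v88, [2, 3, 4, 5])

print(loaded, two_stmt, one_stmt)
```

In this model all three lists agree, which is the equivalence VN would rely on when replacing the load by a value computed from _74 and _88.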