https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
             Blocks|                            |53947
         Depends on|                            |65832

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The latter is because of 'convert' leaving us with

  _1 = BIT_FIELD_REF <x_32(D), 32, 0>;
  _2 = (double) _1;
  _3 = BIT_FIELD_REF <x_32(D), 32, 32>;
  _4 = (double) _3;
  _5 = BIT_FIELD_REF <x_32(D), 32, 64>;
  _6 = (double) _5;
  _7 = BIT_FIELD_REF <x_32(D), 32, 96>;
  _8 = (double) _7;
  _9 = {_2, _4, _6, _8};

rather than

  vect__1.83_46 = x;
  vect__2.84_47 = [vec_unpack_lo_expr] vect__1.83_46;
  vect__2.84_48 = [vec_unpack_hi_expr] vect__1.83_46;
  MEM[(vector(4) double *)&dx] = vect__2.84_47;
  MEM[(vector(4) double *)&dx + 16B] = vect__2.84_48;

(which is in itself not optimal because it is not in SSA form).  This means
generic vector support lacks widening/shortening conversions, so you have to
jump through hoops with things like 'convert'.  And SLP vectorization doesn't
"vectorize" with vector CONSTRUCTORs as the root (a possible enhancement,
I think).

For the original testcase it's a duplicate of PR65832, as we get

  <bb 2>:
  _1 = *x_5(D);
  _7 = BIT_FIELD_REF <_1, 128, 0>;
  _9 = _7 + _7;
  _10 = BIT_FIELD_REF <_1, 128, 128>;
  _12 = _10 + _10;
  _14 = _7 + _9;
  _16 = _10 + _12;
  _3 = {_14, _16};
  *x_5(D) = _3;

Without fixing PR65832 this can be improved by "combining" the loads with the
extracts and the CONSTRUCTOR with the store.  I have done something similar
for COMPLEX_EXPR in tree-ssa-forwprop.c ... (not that I am very proud of
that - heh).

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65832
[Bug 65832] Inefficient vector construction