https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What           |Removed      |Added
----------------------------------------------------------------------------
   Target Milestone       |---          |10.2
   Blocks                 |             |53947
   Keywords               |             |missed-optimization
   Ever confirmed         |0            |1
   Status                 |UNCONFIRMED  |NEW
   Last reconfirmed       |             |2020-07-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

0x3c02500 _10 1 times vector_store costs 12 in body
0x3c02500 <unknown> 1 times vec_construct costs 8 in prologue
0x3a3e900 _10 1 times scalar_store costs 12 in body
0x3a3e900 _9 1 times scalar_store costs 12 in body
t.i:14:1: note: Cost model analysis:
  Vector inside of basic block cost: 12
  Vector prologue cost: 8
  Vector epilogue cost: 0
  Scalar cost of basic block: 24
t.i:14:1: note: Basic block will be vectorized using SLP

and we end up with

  <bb 2> [local count: 1073741824]:
  _3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  _11 = {_10, _9};
  _7 = VIEW_CONVERT_EXPR<long unsigned int>(_11);
  MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _7;

The IL we feed into the vectorizer and the earlier bswap pass is

  _3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  y = _10;
  MEM[(int &)&y + 4] = _9;
  _4 = MEM <long unsigned int> [(char * {ref-all})&y];
  MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _4;

I guess teaching the vectorizer to handle the "grouped load" here would
eventually allow fixing this.  I don't think there is anything to do from the
costing side...

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
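For reference, a reduced reproducer consistent with the IL dumped in comment #1
might look like the sketch below.  This is an illustration only; the function
and variable names are hypothetical and it is not the original t.i testcase.
It swaps the two 32-bit halves of a 64-bit object through a local temporary,
which is what yields the 64-bit "char * {ref-all}" loads/stores, the (int)
conversion and the BIT_FIELD_REF above; ideally the whole function would fold
to a single 64-bit rotate by 32 rather than the vec_construct sequence the SLP
vectorizer currently produces.

  #include <string.h>

  /* Hypothetical sketch of the kind of code involved: swap the two int
     halves of a 64-bit object in place.  The memcpy calls correspond to
     the 64-bit loads/stores in the IL, and the element swap through the
     temporary corresponds to the (int) conversion and BIT_FIELD_REF.  */
  void
  swap_halves (char *x)
  {
    int a[2], t;
    memcpy (a, x, sizeof a);
    t = a[0];
    a[0] = a[1];
    a[1] = t;
    memcpy (x, a, sizeof a);
  }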