https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.2
             Blocks|                            |53947
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-07-13
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.
0x3c02500 _10 1 times vector_store costs 12 in body
0x3c02500 <unknown> 1 times vec_construct costs 8 in prologue
0x3a3e900 _10 1 times scalar_store costs 12 in body
0x3a3e900 _9 1 times scalar_store costs 12 in body
t.i:14:1: note: Cost model analysis:
Vector inside of basic block cost: 12
Vector prologue cost: 8
Vector epilogue cost: 0
Scalar cost of basic block: 24
t.i:14:1: note: Basic block will be vectorized using SLP
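(My tally of the dump above: vector = 12 body + 8 prologue + 0 epilogue = 20,
scalar = 12 + 12 = 24; 20 < 24, so the cost model rightly picks the vector
form.)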
After SLP vectorization we end up with:
<bb 2> [local count: 1073741824]:
_3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
_9 = (int) _3;
_10 = BIT_FIELD_REF <_3, 32, 32>;
_11 = {_10, _9};
_7 = VIEW_CONVERT_EXPR<long unsigned int>(_11);
MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _7;
The IL we feed into the vectorizer (and into the earlier bswap pass) is:
_3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
_9 = (int) _3;
_10 = BIT_FIELD_REF <_3, 32, 32>;
y = _10;
MEM[(int &)&y + 4] = _9;
_4 = MEM <long unsigned int> [(char * {ref-all})&y];
MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _4;
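For context, a minimal C sketch that produces IL of this shape (my
reconstruction from the dump above; the actual testcase in t.i may differ):
the two 32-bit halves of an 8-byte object are copied into a temporary,
swapped, and copied back.

/* Hypothetical reproducer, reconstructed from the IL; not the actual t.i.  */
#include <string.h>

void swap_halves (char *x)
{
  int y[2], tmp;
  memcpy (y, x, 8);   /* _3 = 64-bit load from x          */
  tmp = y[0];         /* _9  = low 32 bits of _3          */
  y[0] = y[1];        /* _10 = BIT_FIELD_REF <_3, 32, 32> */
  y[1] = tmp;
  memcpy (x, y, 8);   /* 64-bit store back to x           */
}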
I guess fixing the vectorizer to handle the "grouped load" would
eventually allow fixing this. I don't think there's anything to
do from the costing side...
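For reference, and only as my illustration of what the missed optimization
amounts to: the block just exchanges the two 32-bit halves of the 64-bit
value, i.e. it is a rotate by 32 bits, so the scalar form one would hope for
is roughly:

/* Illustration only (not GCC output): the halves swap is a 64-bit rotate
   by 32, so a single load, rotate and store would cover it.  */
#include <stdint.h>
#include <string.h>

void swap_halves_ideal (char *x)
{
  uint64_t v;
  memcpy (&v, x, 8);
  v = (v << 32) | (v >> 32);   /* exchange the two 32-bit halves */
  memcpy (x, &v, 8);
}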
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations