https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.2
             Blocks|                            |53947
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-07-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

0x3c02500 _10 1 times vector_store costs 12 in body
0x3c02500 <unknown> 1 times vec_construct costs 8 in prologue
0x3a3e900 _10 1 times scalar_store costs 12 in body
0x3a3e900 _9 1 times scalar_store costs 12 in body
t.i:14:1: note:  Cost model analysis:
  Vector inside of basic block cost: 12
  Vector prologue cost: 8
  Vector epilogue cost: 0
  Scalar cost of basic block: 24
t.i:14:1: note:  Basic block will be vectorized using SLP
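The decision in the dump comes down to simple arithmetic. The vector side is one vector_store (12, body) plus one vec_construct (8, prologue) plus no epilogue, so 12 + 8 + 0 = 20. The scalar side is two scalar_stores at 12 each, so 24. Below is a minimal sketch of that comparison. The struct and function names are hypothetical, this is not GCC's code, and treating a tie as profitable is an assumption of the sketch:

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the basic-block SLP cost check
   summarized in the dump; not GCC's actual implementation.  Costs are
   in the target's abstract cost units. */
struct slp_bb_costs
{
  int vec_inside;    /* "Vector inside of basic block cost" */
  int vec_prologue;  /* "Vector prologue cost" (here the vec_construct) */
  int vec_epilogue;  /* "Vector epilogue cost" */
  int scalar;        /* "Scalar cost of basic block" */
};

/* Assumption: equal cost is treated as profitable (ties favour vector). */
static bool
slp_bb_profitable (struct slp_bb_costs c)
{
  int vec_total = c.vec_inside + c.vec_prologue + c.vec_epilogue;
  return vec_total <= c.scalar;
}
```

With the dump's figures, `slp_bb_profitable ((struct slp_bb_costs) {12, 8, 0, 24})` compares 20 against 24 and returns true, which matches the "will be vectorized using SLP" note. The cost lines shown above list only the stores and the vec_construct.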

and we end up with

  <bb 2> [local count: 1073741824]:
  _3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  _11 = {_10, _9};
  _7 = VIEW_CONVERT_EXPR<long unsigned int>(_11);
  MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _7;
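The vectorized IL can also be read as C. The version below uses GCC's generic vector extension (`vector_size`) and is a hypothetical rendering, not code from the report. It assumes a little-endian target, and it uses `uint32_t` in place of the IL's `int` for `_9`, which moves the same bits. It shows where the prologue cost comes from: both halves are still extracted with scalar operations and then packed into a two-lane vector (the vec_construct) before the single 8-byte store.

```c
#include <stdint.h>
#include <string.h>

/* GCC generic vector extension: two 32-bit lanes in 8 bytes. */
typedef uint32_t v2u32 __attribute__ ((vector_size (8)));

/* Hypothetical C rendering of the vectorized IL above; assumes a
   little-endian target, so the low half of the 64-bit load is the
   first 4 bytes in memory. */
void
swap_halves_slp (void *x)
{
  uint64_t v;
  memcpy (&v, x, sizeof v);               /* _3 = MEM <long unsigned int> [x] */
  uint32_t lo = (uint32_t) v;             /* _9 = (int) _3 */
  uint32_t hi = (uint32_t) (v >> 32);     /* _10 = BIT_FIELD_REF <_3, 32, 32> */
  v2u32 lanes = { hi, lo };               /* _11 = {_10, _9}: the vec_construct */
  memcpy (x, &lanes, sizeof lanes);       /* _7 = VIEW_CONVERT_EXPR; one 8-byte store */
}
```

On a little-endian target, applying it to the bytes `{0,1,2,3,4,5,6,7}` leaves `{4,5,6,7,0,1,2,3}`.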

The IL we feed into both the vectorizer and the earlier bswap pass is

  _3 = MEM <long unsigned int> [(char * {ref-all})x_2(D)];
  _9 = (int) _3;
  _10 = BIT_FIELD_REF <_3, 32, 32>;
  y = _10;
  MEM[(int &)&y + 4] = _9;
  _4 = MEM <long unsigned int> [(char * {ref-all})&y];
  MEM <long unsigned int> [(char * {ref-all})x_2(D)] = _4;
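Read as C, the input IL swaps the two 32-bit halves of the 8 bytes at `x` through the temporary `y`. The sketch below is a hypothetical reconstruction, since the reporter's t.i is not included here. The function names are made up, and a little-endian target is assumed. For comparison, `swap_halves_rotate` gets the same effect on memory with a single 64-bit rotate by 32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C reconstruction of the input IL above (t.i is not shown);
   assumes a little-endian target. */
void
swap_halves (void *x)
{
  uint64_t v;
  uint32_t y[2];                          /* the temporary y */
  memcpy (&v, x, sizeof v);               /* _3 = MEM <long unsigned int> [x] */
  uint32_t lo = (uint32_t) v;             /* _9 = (int) _3 */
  uint32_t hi = (uint32_t) (v >> 32);     /* _10 = BIT_FIELD_REF <_3, 32, 32> */
  y[0] = hi;                              /* y = _10 */
  y[1] = lo;                              /* MEM[(int &)&y + 4] = _9 */
  memcpy (x, y, sizeof y);                /* _4 = MEM [&y]; MEM [x] = _4 */
}

/* Same effect on the 8 bytes at x: one 64-bit rotate by 32. */
void
swap_halves_rotate (void *x)
{
  uint64_t v;
  memcpy (&v, x, sizeof v);
  v = (v >> 32) | (v << 32);
  memcpy (x, &v, sizeof v);
}
```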

I guess teaching the vectorizer to handle the "grouped load" (_9 and
_10 are both extracted from the single load _3) would eventually allow
fixing this.  I don't think there's anything to do on the costing side...


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
