https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121536

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-08-13
                 CC|                            |tnfchris at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The main loop originally picks VNx16QI, and the costing of the two looks almost
identical; only the scalar iteration cost differs (2 vs. 4):

: note:  Original vector body cost = 6
: note:  Vector loop iterates at most 0 times
: note:  Scalar issue estimate:
: note:    load operations = 0
: note:    store operations = 0
: note:    general operations = 2
: note:    reduction latency = 2
: note:    estimated min cycles per iteration = 2.000000
: note:    estimated cycles per vector iteration (for VF 2) = 4.000000
: note:  SVE issue estimate:
: note:    load operations = 0
: note:    store operations = 0
: note:    general operations = 2
: note:    predicate operations = 2
: note:    reduction latency = 4
: note:    estimated cycles per iteration to rename = 1.000000
: note:    estimated min cycles per iteration without predication = 4.000000
: note:    estimated min cycles per iteration for predication = 1.000000
: note:    estimated min cycles per iteration = 4.000000
: note:  Cost model analysis: 
  Vector inside of loop cost: 6
  Vector prologue cost: 2
  Vector epilogue cost: 6
  Scalar iteration cost: 2
  Scalar outside cost: 2
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0

vs

: note:  Original vector body cost = 6
: note:  Vector loop iterates at most 0 times
: note:  Scalar issue estimate:
: note:    load operations = 0
: note:    store operations = 0
: note:    general operations = 2
: note:    reduction latency = 2
: note:    estimated min cycles per iteration = 2.000000
: note:    estimated cycles per vector iteration (for VF 2) = 4.000000
: note:  SVE issue estimate:
: note:    load operations = 0
: note:    store operations = 0
: note:    general operations = 2
: note:    predicate operations = 2
: note:    reduction latency = 4
: note:    estimated cycles per iteration to rename = 1.000000
: note:    estimated min cycles per iteration without predication = 4.000000
: note:    estimated min cycles per iteration for predication = 1.000000
: note:    estimated min cycles per iteration = 4.000000
: note:  Cost model analysis: 
  Vector inside of loop cost: 6
  Vector prologue cost: 2
  Vector epilogue cost: 6
  Scalar iteration cost: 4
  Scalar outside cost: 2
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 4
  Calculated minimum iters for profitability: 8

but the conclusion is different.

The old code still thinks the vector loop iterates 4 times, whereas the new one
thinks it doesn't...
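A minimal sketch of why the two dumps can reach different conclusions from such similar numbers. It uses a simplified break-even model, not GCC's actual cost-model computation. It also assumes VF = 2, taken from the "for VF 2" lines in the dumps. The helper name min_profitable_vector_iters is hypothetical. The model asks for the smallest number of vector iterations k for which the vector loop is strictly cheaper than the scalar loop.

```python
def min_profitable_vector_iters(vec_inside, vec_outside,
                                scalar_iter, scalar_outside, vf):
    """Smallest k (vector iterations) with vector cost < scalar cost.

    Simplified model (an assumption, not GCC's exact formula):
        vector cost(k) = k * vec_inside + vec_outside
        scalar cost(k) = k * vf * scalar_iter + scalar_outside

    Returns None if one vector iteration never beats VF scalar ones.
    """
    saving = scalar_iter * vf - vec_inside  # gain per vector iteration
    if saving <= 0:
        return None  # vectorizing never pays off under this model
    overhead = max(vec_outside - scalar_outside, 0)  # one-off extra cost
    # Strictly profitable once k * saving > overhead.
    return overhead // saving + 1


VF = 2  # assumption: taken from "for VF 2" in the dumps

# First dump: inside 6, outside 8, scalar iteration 2, scalar outside 2.
first = min_profitable_vector_iters(6, 8, 2, 2, VF)
# Second dump: same, except the scalar iteration cost is 4.
second = min_profitable_vector_iters(6, 8, 4, 2, VF)

print(first)               # None
print(second, second * VF)  # 4 8
```

Under this model, a scalar iteration cost of 2 means one vector iteration (cost 6) never beats two scalar iterations (cost 4), so there is no break-even point. With a cost of 4, the model happens to reproduce the 4 vector iterations and 8 scalar iterations reported in the second dump.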

But this seems like a backend issue, so I'll take a look.

Mine.
