[Bug target/91735] [9/10 Regression] Runtime regression for SPEC2000 177.mesa on Haswell around the end of August 2018

rguenth at gcc dot gnu.org Wed, 11 Sep 2019 03:34:17 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91735


--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Reducing the VF here should be the goal.  For this particular case, "filling"
the holes with neutral data and blending in the original values at store time
will likely be optimal.  That is, do

  tem = vector load
  zero all [4] elements
  compute
  blend 'tem' into the [4] elements
  vector store

This elides all the shuffling/striding and should end up at a VF of 4 (SSE)
or 8 (AVX).
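
A minimal scalar sketch of the scheme above, under stated assumptions: groups
of 4 char elements of which only the first 3 are written (the 4th is the
"hole"), a VF of 4 so one "vector" covers 16 bytes, and a made-up kernel that
adds a constant.  The names add_rgb_blend, GROUP, ACTIVE and VF are
hypothetical and are not taken from mesa or from GCC; each inner loop stands
in for a single full-width vector operation.

```c
#include <stddef.h>
#include <stdint.h>

enum { GROUP = 4, ACTIVE = 3, VF = 4, VBYTES = GROUP * VF };

/* Hypothetical kernel: add `delta` (wrapping) to the first ACTIVE bytes of
   every GROUP-byte group and leave the remaining byte (the hole) untouched.
   Assumes n_groups is a multiple of VF, so there is no scalar epilogue.  */
void add_rgb_blend(uint8_t *data, size_t n_groups, uint8_t delta)
{
    for (size_t g = 0; g < n_groups; g += VF) {
        uint8_t *p = data + g * GROUP;
        uint8_t tem[VBYTES], v[VBYTES];

        for (int i = 0; i < VBYTES; i++)        /* tem = vector load */
            tem[i] = p[i];
        for (int i = 0; i < VBYTES; i++)        /* zero the hole lanes */
            v[i] = (i % GROUP == ACTIVE) ? 0 : tem[i];
        for (int i = 0; i < VBYTES; i++)        /* compute on all lanes */
            v[i] = (uint8_t)(v[i] + delta);
        for (int i = 0; i < VBYTES; i++)        /* blend 'tem' into the holes */
            v[i] = (i % GROUP == ACTIVE) ? tem[i] : v[i];
        for (int i = 0; i < VBYTES; i++)        /* vector store */
            p[i] = v[i];
    }
}
```

For a wrapping integer add the zeroing step does not change the result; it is
kept to mirror the steps above, and it would matter for operations where
garbage in the hole lanes could trap or be slow.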

This doesn't fit very well into the current vectorizer architecture, so
currently we can only address it from the costing side.

arm can probably leverage load/store-lanes here.

With char elements and an SLP size of 3 it's probably the worst case we can
think of.

[Bug target/91735] [9/10 Regression] Runtime regression for SPEC2000 177.mesa on Haswell around the end of August 2018
