https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
            What          |Removed     |Added
----------------------------------------------------------------------------
 Last reconfirmed         |            |2021-01-21
 CC                       |            |rsandifo at gcc dot gnu.org
 Status                   |UNCONFIRMED |NEW
 Ever confirmed           |0           |1
--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
To try to summarise a conversation we had on IRC:
As things stand, codes like WIDEN_MULT_EXPR are intended
to be code-generated as a hi/lo pair, with both the hi
and lo operations being vector(N*2) → vector(N) operations.
This works for BB SLP if the SLP group size is ≥ N*2,
but is bound to fail otherwise.
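For reference, a scalar sketch of those hi/lo semantics
(my illustration, not code from GCC; which half "lo" selects
is endian-dependent, which I ignore here). Each operation
consumes 2*N narrow elements and produces N wide ones, which
is why the group has to supply at least N*2 scalars:

#include <stdint.h>

#define N 8  /* e.g. V16QI inputs, V8HI results on a 128-bit target */

/* "lo" multiplies the first N of the 2*N narrow elements...  */
void
widen_mult_lo (uint16_t out[N], const uint8_t a[2 * N],
               const uint8_t b[2 * N])
{
  for (int i = 0; i < N; i++)
    out[i] = (uint16_t) a[i] * b[i];
}

/* ...and "hi" multiplies the remaining N.  */
void
widen_mult_hi (uint16_t out[N], const uint8_t a[2 * N],
               const uint8_t b[2 * N])
{
  for (int i = 0; i < N; i++)
    out[i] = (uint16_t) a[N + i] * b[N + i];
}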
On targets that operate on only a single vector size,
a hard failure is not a problem for group sizes < N*2,
since we would have failed in the same place even if
we hadn't matched a WIDEN_MULT_EXPR. But it hurts on
aarch64 because we could vectorise the multiplication
and conversions using mixed vector sizes.
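As a concrete illustration (a made-up example, not the PR's
own testcase): a straight-line group of eight u8 * u8 → u16
multiplications has group size 8, below the N*2 = 16 needed
for a V16QI hi/lo pair, yet aarch64 could vectorise it with a
single 64-bit → 128-bit operation such as umull:

#include <stdint.h>

void
f (uint16_t *__restrict d, const uint8_t *__restrict a,
   const uint8_t *__restrict b)
{
  d[0] = (uint16_t) a[0] * b[0];
  d[1] = (uint16_t) a[1] * b[1];
  d[2] = (uint16_t) a[2] * b[2];
  d[3] = (uint16_t) a[3] * b[3];
  d[4] = (uint16_t) a[4] * b[4];
  d[5] = (uint16_t) a[5] * b[5];
  d[6] = (uint16_t) a[6] * b[6];
  d[7] = (uint16_t) a[7] * b[7];
}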
I think the conclusion was that:
(1) We should define vector(N) → vector(N) optabs for
each current widening operation. E.g. in the testcase,
aarch64 would provide v8qi → v8hi widening operations
(see the sketch after this list).
(2) We should add directly-mapped internal functions for the new optabs.
(3) We should make the modifier==NONE paths in vectorizable_conversion
use the new internal functions for WIDEN_*_EXPRs.
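To make (1) concrete, the semantics of a single vector(N) →
vector(N) widening multiply can be written with GNU C vector
extensions (requires GCC >= 9 for __builtin_convertvector;
the function name is made up, not a proposed interface):

#include <stdint.h>

typedef uint8_t v8qi __attribute__ ((vector_size (8)));
typedef uint16_t v8hi __attribute__ ((vector_size (16)));

/* Widen each operand from 8 to 16 bits, then multiply
   elementwise; on aarch64 this could map to a single
   umull instruction.  */
v8hi
widen_mult_v8qi (v8qi a, v8qi b)
{
  return __builtin_convertvector (a, v8hi)
         * __builtin_convertvector (b, v8hi);
}

(2) and (3) would then let vectorizable_conversion emit a
single call to a directly-mapped internal function with these
semantics instead of failing when the group is too small for
the hi/lo pair.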