On 12/05/2023 14:28, Richard Biener wrote:
On Fri, 12 May 2023, Andre Vieira (lists) wrote:
I have dealt with, I think..., most of your comments. There's quite a few
changes, I think it's all a bit simpler now. I made some other changes to the
costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
the same behaviour as we had with the tree codes before. Also added some extra
checks to tree-cfg.cc that made sense to me.
I am still regression testing the gimple-range-op change, as that was a last
minute change, but the rest survived a bootstrap and regression test on
aarch64-unknown-linux-gnu.
cover letter:
This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.
DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
except they provide convenience wrappers for defining conversions that require
a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo
and each of those will also require a signed and unsigned version in the case
of widening. The hi/lo pair is necessary because the widening and narrowing
operations take n narrow elements as inputs and return n/2 wide elements as
outputs. The 'lo' operation operates on the first n/2 elements of input. The
'hi' operation operates on the second n/2 elements of input. Defining an
internal_fn along with hi/lo variations allows a single internal function to
be returned from a vect_recog function that will later be expanded to hi/lo.
For example:
IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> ->
(u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode>
-> (u/s)addl
This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
What I still don't understand is how we are so narrowly focused on
HI/LO? We need a combined scalar IFN for pattern selection (not
sure why that's now called _HILO, I expected no suffix). Then there's
three possibilities the target can implement this:
1) with a widen_[su]add<mode> instruction - I _think_ that's what
RISCV is going to offer since it is a target where vector modes
have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead
RVV can do a V4HI to V4SI widening and widening add/subtract
using vwadd[u] and vwsub[u] (the HI->SI widening is actually
done with a widening add of zero - eh).
IIRC GCN is the same here.
2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
codes currently support (exclusively)
3) similar, but widen_[su]add{_even,_odd}<mode>
that said, things like decomposes_to_hilo_fn_p look to paint us into
a 2) corner without good reason.
I was kind of just keeping the naming, I had forgotten to mention I was
also going to add _EVENODD but you are right, the pattern selection IFN
does not need to be restrictive.
And then at supportable_widening_operation we could check what the
target offers support for (either 1, 2 or 3). We can then actually just
get rid of decomposes_to_hilo_fn_p and just assume that for all
narrowing or widening IFN's there are optabs (that may or may not be
implemented by a target) for all three variants
Having said that, that means we should have an optab to cover 1, which
should probably just have the original name. Let me write it out...
Say we have a IFN_VEC_WIDEN_PLUS pattern and assume its signed,
supportable_widening_operation would then first check if the target
supported vec_widen_sadd_optab for say V8HI -> V8SI? Risc-V would take
this path I guess?
If the target doesn't then it could check for support for:
vec_widen_sadd_lo_optab V4HI -> V4SI
vec_widen_sadd_hi_optab V4HI -> V4SI
AArch64 Advanced SIMD would implement this.
If the target still didn't support this it would check for (not sure
about the modes here):
vec_widen_sadd_even_optab VNx8HI -> VNx4SI
vec_widen_sadd_odd_optab VNx8HI -> VNx4SI
This is one SVE would implement.
So that would mean that I'd probably end up rewriting
#define DEF_INTERNAL_OPTAB_WIDENING_FN (NAME, FLAGS, SELECTOR, SOPTAB,
UOPTAB, TYPE)
as:
for1)
DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB,
TYPE)
for 2)
DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_LO, FLAGS, SELECTOR, SOPTAB,
UOPTAB, TYPE)
DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_HI, FLAGS, SELECTOR, SOPTAB,
UOPTAB, TYPE)
for 3)
DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_EVEN, FLAGS, SELECTOR, SOPTAB,
UOPTAB, TYPE)
DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_ODD, FLAGS, SELECTOR, SOPTAB,
UOPTAB, TYPE)
And the same for narrowing (but with DEF_INTERNAL_OPTAB_FN instead of
SIGNED_OPTAB).
So each widening and narrowing IFN would have optabs for all its
variants and each target implements the ones it supports.
I'm happy to do this, but implementing support to handle the 1 and 3
variants without having optabs for them right now seems a bit odd and it
would delay this patch, so I suggest I add the framework and the optabs
but leave adding the vectorizer support for later? I can add comments to
where I think that should go.
Richard.
gcc/ChangeLog:
2023-05-12 Andre Vieira <andre.simoesdiasvie...@arm.com>
Joel Hutton <joel.hut...@arm.com>
Tamar Christina <tamar.christ...@arm.com>
* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ...to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to
define an
internal_fn that expands into multiple internal_fns for widening.
(DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
(ifn_cmp): Function to compare ifn's for sorting/searching.
(lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(decomposes_to_hilo_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define
widening
plus,minus functions.
(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
(direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(Narrowing_fn_p): Likewise.
(decomposes_to_hilo_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-cfg.cc (verify_gimple_call): Add checks for new widen
add and sub IFNs.
* tree-inline.cc (estimate_num_insns): Return same
cost for widen add and sub IFNs as previous tree_codes.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.