On Wed, Nov 14, 2018 at 12:09 AM Joern Wolfgang Rennecke
<joern.renne...@riscy-ip.com> wrote:
>
> On 12/11/18 14:30, Richard Biener wrote:
> > I guess I already asked this question when WIDEN_MULT_PLUS_EXPR was
> > introduced - but isn't that fully contained within a DOT_PROD_EXPR?
>
> I'm not sure what exactly you mean here.
> A mailing list search to find that post was unsuccessful.  The earliest
> it found for widen_sum_expr and richard and (biener or guenther) was from
> October 2012, and WIDEN_SUM_EXPR already existed then, as was visible in
> the context of the patch under discussion.
>
> From the perspective of expressing scalar operations, or expressing
> reduction operations on (originally) scalars in the vectorizer,
> WIDEN_SUM_EXPR is redundant, as we could use DOT_PROD_EXPR with one
> constant input.
> For describing the vectorization of narrow vector operations,
> WIDEN_SUM_EXPR would be more suitable, as we wouldn't need to bring in
> matrix operations, but the current non-widening semantics suffer from
> being insufficiently precisely defined in what they are actually
> summing up.
>
> From the perspective of providing optabs, there are two aspects to
> consider: First, if a dot_prod optab were to be used for expanding
> WIDEN_SUM_EXPR (by that or any other name), a target that implements
> WIDEN_SUM but not DOT_PROD in general would have to have a dot_prod
> expander that requires a vector of constant ones for one of its
> operands.  And then the vectorizer, when it wants to consider
> DOT_PROD_EXPR for two vector variables, would have to test the operand
> predicates of the expander.  That sounds like a lot more trouble than
> the little reduction in the set of optabs is worth.
>
> > Some comments on the patch.
> >
> > +  tree vecotype
> > +    = build_vector_type (otype, GET_MODE_NUNITS (TYPE_MODE (vecitype)));
> >
> > TYPE_VECTOR_SUBPARTS (vecitype)
> >
> > You want to pass in the half/full types and use get_vectype_for_scalar_type
> > which also makes sure the target supports the vector type.
>
> WIDEN_MULT_PLUS is special on our target in that it creates double-sized
> vectors.
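A minimal scalar sketch of the equivalence described in the quoted mail
above, treating WIDEN_SUM_EXPR as DOT_PROD_EXPR with one operand fixed to
the constant 1; the types, helper names, and main() driver are invented
for illustration and are not GCC internals:

    /* Illustrative only: scalar reference semantics, assuming 16-bit
       inputs widened into a 32-bit accumulator.  */
    #include <stdint.h>
    #include <stdio.h>

    /* DOT_PROD_EXPR <a, b, acc>: widen both operands, multiply,
       accumulate.  */
    static int32_t
    dot_prod_step (int16_t a, int16_t b, int32_t acc)
    {
      return acc + (int32_t) a * (int32_t) b;
    }

    /* WIDEN_SUM_EXPR <a, acc>: widen one operand and accumulate; the
       same result as a dot product whose second operand is 1.  */
    static int32_t
    widen_sum_step (int16_t a, int32_t acc)
    {
      return dot_prod_step (a, 1, acc);
    }

    int
    main (void)
    {
      int16_t v[4] = { 100, -200, 300, -400 };
      int32_t acc = 0;
      for (int i = 0; i < 4; i++)
        acc = widen_sum_step (v[i], acc);
      printf ("%d\n", acc);   /* prints -200 */
      return 0;
    }
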
Are there really double-size vectors, or does the target simply produce
the output in two vectors?  Usually targets have WIDEN_MULT_PLUS_LOW/HIGH
or _EVEN/ODD split operations.  Or, like - what I now remember - for the
DOT_PROD_EXPR optab, the target already reduces element pairs of the
result vector (unspecified which ones), so the result vector is of the
same size as the inputs.

That is, if your target produces two vectors you may instead want to hide
that fact by claiming you support DOT_PROD_EXPR and expanding it to the
widen-mult-plus plus reducing (adding) the two result vectors to get a
single one.

The vectorizer cannot really deal with multiple vector sizes, thus for
example with a V4SI * V4SI + V4DI operation.  That all those tree codes
are exposed as "scalar" is something that continues to confuse me, but it
is mainly done because at pattern-recognition time there are only the
scalars.

> To get a full-sized vector in the reduction loop, you have to have
> a double-sized vector result.  We already make the reduction special in
> that the result can vary in size: it's scalar for an ordinary
> DOT_PROD_EXPR, while a vector for other operations.
>
> > I think you want to extend and re-use supportable_widening_operation
> > here anyways.
>
> I see.  Other targets generally use even/odd or lo/hi vectorized
> widening mult operations.
> ia64/vect.md, s390/vector.md, spu/spu.md and tilegx/tilegx.md have
> mult_lo / mult_even patterns, but no dot_prod pattern, so if I go via an
> enhanced supportable_widening_operation, my patch should become relevant
> to these platforms.
>
> To make it cover our target as well, I'll have to make it able, if the
> optabs indicate that, to specify a single WIDEN_MULT_PLUS_EXPR with a
> double-sized vector result instead of a split into two halves.

There doesn't seem to be any target providing vector variants for any of
the [us][us]madd/sub optabs, so I believe those were added for scalar
operations only.

> grep [us\"][us]madd config/*/*.md
config/arc/arc.md:(define_expand "umaddsidi4"
config/arc/arc.md:(define_insn_and_split "umaddsidi4_split"
config/arm/arm.md:(define_expand "umaddsidi4"
config/avr/avr.md:;; "*usmaddqihi4"
config/avr/avr.md:;; "*sumaddqihi4"
config/avr/avr.md:;; "umaddqihi4.uconst"
config/avr/avr.md:(define_insn_and_split "*sumaddqihi4.uconst"
config/avr/avr.md:      ; *sumaddqihi4
config/csky/csky_insn_dsp.md:(define_insn "umaddsidi4"
config/mips/mips-fixed.md:(define_insn "ssmaddsqdq4"
config/nds32/nds32-dspext.md:(define_insn "smaddhidi"
config/nds32/nds32-dspext.md:(define_insn "smaddhidi2"
config/tilegx/tilegx.md:(define_insn "umaddsidi4"
config/tilepro/tilepro.md:(define_insn "umaddhisi4"

> grep [us\"][us]msub config/*/*.md
config/avr/avr.md:;; "*usmsubqihi4"
config/avr/avr.md:;; "*sumsubqihi4"
config/avr/avr.md:(define_insn_and_split "*sumsubqihi4.uconst"
config/avr/avr.md:      ; *sumsubqihi4
config/csky/csky_insn_dsp.md:(define_insn "umsubsidi4"
config/mips/mips-fixed.md:(define_insn "ssmsubsqdq4"

For vectorization I would advise providing expansion patterns for codes
that are already supported, in your case DOT_PROD_EXPR.

Richard.
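A minimal element-wise C model of the expansion suggested above, assuming
hypothetical V4HI inputs with a V2SI accumulator/result for the dot_prod
optab, and a widening multiply whose double-size result is modeled as
four widened lanes; the function name, lane pairing, and array layout are
invented for illustration (the pairing is unspecified anyway):

    /* Illustrative only: dot_prodv4hi via a double-size widening
       multiply followed by adding the two result halves.  */
    #include <stdint.h>

    static void
    dot_prod_v4hi_via_widen_mult_plus (const int16_t a[4],
                                       const int16_t b[4],
                                       const int32_t acc[2],
                                       int32_t res[2])
    {
      int32_t wide[4];

      /* Widening multiply of all four lanes; on the real target this
         would be the single double-size widen-mult(-plus) operation.  */
      for (int i = 0; i < 4; i++)
        wide[i] = (int32_t) a[i] * (int32_t) b[i];

      /* Reduce the double-size result back to the input size by adding
         the two halves, then fold in the accumulator; only the overall
         sum of res[0] + res[1] is specified.  */
      res[0] = acc[0] + wide[0] + wide[2];
      res[1] = acc[1] + wide[1] + wide[3];
    }
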
>
> >
> > Index: gcc/tree-vect-stmts.c
> > ===================================================================
> > --- gcc/tree-vect-stmts.c      (revision 266008)
> > +++ gcc/tree-vect-stmts.c      (working copy)
> > @@ -10638,7 +10638,11 @@ vect_get_vector_types_for_stmt (stmt_vec
> >                                                    scalar_type);
> >
> >    if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> > -               GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
> > +               GET_MODE_SIZE (TYPE_MODE (nunits_vectype)))
> > +      /* Reductions that use a widening reduction would show
> > +        a mismatch but that's already been checked to be OK.  */
> > +      && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
> > +
> >      return opt_result::failure_at (stmt,
> >                                    "not vectorized: different sized vector "
> >                                    "types in statement, %T and %T\n",
> >
> > that change doesn't look good.
>
> Would it become acceptable if I made it specific to WIDEN_MULT_PLUS_EXPR ?
>