On Fri, 4 Nov 2011, Jakub Jelinek wrote: > On Fri, Nov 04, 2011 at 10:52:44AM +0100, Richard Guenther wrote: > > > - Gather vectorization patch + incremental patches > > > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02411.html > > > > I'm not sure I like using new builtins for gather representation > > on the tree level too much, given that we are now moving > > towards using tree codes for suffle. Thus, how complicated > > would it be to have a gather tree code and optab and to > > handle the mixed size index issue in the expander? > > > > I realize this would be quite some reorg to the patchset ... > > so, why did you choose builtins over a more generic approach? > > Because while permutations etc. are common to most targets, > currently gather is very specialized, specific to one target, > with lots of details how it must look like (e.g. the mask stuff where > we currently don't even have tree code for conditional loads or conditional > stores), and additionally unlike VEC_PERM_EXPR etc. which are normal > expressions this one is a reference (and conditional one too), so > I'm afraid I'd need to touch huge amounts of code (most places that > currently handle MEM_REF/TARGET_MEM_REF would need to handle > VEC_GATHER_MEM_REF too, as it is a memory read (well, set of conditional > memory reads). The i?86 backend already has (except 4) all the needed > builtins anyway and handles expanding them too, the 4 ones are just > to cope with the weirdo definition of some of them (half sized vectors). > And when it is represented as builtin, the side-effects are handled by > all optimization passes automatically, similarly how e.g. atomic builtins > are right now builtins and not expressions. > > So I thought it is better to use builtins right now, then when we in 4.8+ > hopefully do something about conditional loads/stores and their > vectorization and if we introduce for that some new GIMPLE representation, > this could be done on top of that.
Ok. I guess it's ok to use builtins for now - I didn't think of the memory reference issue ;) > > > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02846.html > > > > + if (TREE_CODE (base) == MEM_REF) > > { > > - off = TREE_OPERAND (base, 1); > > + if (!integer_zerop (TREE_OPERAND (base, 1))) > > + { > > + if (off == NULL_TREE) > > + { > > + double_int moff = mem_ref_offset (base); > > + off = double_int_to_tree (sizetype, moff); > > + } > > + else > > + off = size_binop (PLUS_EXPR, off, TREE_OPERAND (base, 1)); > > > > that's not safe, TREE_OPEAND (base, 1) is of pointer type, so > > you unconditionally need to do the conversion to sizetype of > > TREE_OPEAND (base, 1). > > Ok, will fix. > > > The routine lacks comments - it's got quite big and fails to > > And add the comments. > > > state any reason for its complexity. I'm also not sure why > > DR would include any loop invariant parts of the SCEV - doesn't > > it instantiate just for the loop in question? > > I'm not sure I understand your question. With the incremental > patch I'm not using any DR info appart from DR_REF to determine > what is loop invariant part of the address and what is not. > The reason for that is that split_constant_offset turns the GIMPLE > code into a sometimes big tree, which actually may contain a mixture > of loop invariants/constants and SSA_NAMEs defined in the loop, > that all with casts, multiplications/additions/subtractions. > For gather I need to split it into a single loop invariant > argument (which can be computed before the loop as loop invariant > and thus can be arbitrary tree expression that is just gimplified > there) and another SSA_NAME defined into the loop which can be > vectorized which is perhaps sign-extended and multiplied by 1/2/4/8. > > With the approach the incremental patch does I just > walk what split_constant_offset during DR walks and peel off > loop invariants until I have something that should be used as the > vectorized index. It looks like split_constant_offset walks def stmts in an unbound fashion. That's surely a bad idea - SCEV should already have expanded everything non-loop-invariant, thus it should at most look through DEFs that trivially add to the constant offset, not through others. Richard.