On Fri, 4 Nov 2011, Jakub Jelinek wrote:

> On Fri, Nov 04, 2011 at 10:52:44AM +0100, Richard Guenther wrote:
> > > - Gather vectorization patch + incremental patches
> > >   http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02411.html
> > 
> > I'm not sure I like using new builtins for gather representation
> > on the tree level too much, given that we are now moving
> > towards using tree codes for suffle.  Thus, how complicated
> > would it be to have a gather tree code and optab and to
> > handle the mixed size index issue in the expander?
> > 
> > I realize this would be quite some reorg to the patchset ...
> > so, why did you choose builtins over a more generic approach?
> 
> Because while permutations etc. are common to most targets,
> currently gather is very specialized, specific to one target,
> with lots of details how it must look like (e.g. the mask stuff where
> we currently don't even have tree code for conditional loads or conditional
> stores), and additionally unlike VEC_PERM_EXPR etc. which are normal
> expressions this one is a reference (and conditional one too), so
> I'm afraid I'd need to touch huge amounts of code (most places that
> currently handle MEM_REF/TARGET_MEM_REF would need to handle
> VEC_GATHER_MEM_REF too, as it is a memory read (well, set of conditional
> memory reads).  The i?86 backend already has (except 4) all the needed
> builtins anyway and handles expanding them too, the 4 ones are just
> to cope with the weirdo definition of some of them (half sized vectors).
> And when it is represented as builtin, the side-effects are handled by
> all optimization passes automatically, similarly how e.g. atomic builtins
> are right now builtins and not expressions.
> 
> So I thought it is better to use builtins right now, then when we in 4.8+
> hopefully do something about conditional loads/stores and their
> vectorization and if we introduce for that some new GIMPLE representation,
> this could be done on top of that.

Ok.  I guess it's ok to use builtins for now - I didn't think of
the memory reference issue ;)

> > >   http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02846.html
> > 
> > +  if (TREE_CODE (base) == MEM_REF)
> >      {
> > -      off = TREE_OPERAND (base, 1);
> > +      if (!integer_zerop (TREE_OPERAND (base, 1)))
> > +   {
> > +     if (off == NULL_TREE)
> > +       {
> > +         double_int moff = mem_ref_offset (base);
> > +         off = double_int_to_tree (sizetype, moff);
> > +       }
> > +     else
> > +       off = size_binop (PLUS_EXPR, off, TREE_OPERAND (base, 1));
> > 
> > that's not safe, TREE_OPEAND (base, 1) is of pointer type, so
> > you unconditionally need to do the conversion to sizetype of
> > TREE_OPEAND (base, 1).
> 
> Ok, will fix.
> 
> > The routine lacks comments - it's got quite big and fails to
> 
> And add the comments.
> 
> > state any reason for its complexity.  I'm also not sure why
> > DR would include any loop invariant parts of the SCEV - doesn't
> > it instantiate just for the loop in question?
> 
> I'm not sure I understand your question.  With the incremental
> patch I'm not using any DR info appart from DR_REF to determine
> what is loop invariant part of the address and what is not.
> The reason for that is that split_constant_offset turns the GIMPLE
> code into a sometimes big tree, which actually may contain a mixture
> of loop invariants/constants and SSA_NAMEs defined in the loop,
> that all with casts, multiplications/additions/subtractions.
> For gather I need to split it into a single loop invariant
> argument (which can be computed before the loop as loop invariant
> and thus can be arbitrary tree expression that is just gimplified
> there) and another SSA_NAME defined into the loop which can be
> vectorized which is perhaps sign-extended and multiplied by 1/2/4/8.
> 
> With the approach the incremental patch does I just
> walk what split_constant_offset during DR walks and peel off
> loop invariants until I have something that should be used as the
> vectorized index.

It looks like split_constant_offset walks def stmts in an unbound
fashion.  That's surely a bad idea - SCEV should already have
expanded everything non-loop-invariant, thus it should at most
look through DEFs that trivially add to the constant offset,
not through others.

Richard.

Reply via email to