Richard Guenther wrote:
> On Thu, 23 Feb 2012, Ulrich Weigand wrote:
> > The assert in question looks like:
> > 
> >   if (nested_in_vect_loop
> >       && (TREE_INT_CST_LOW (STMT_VINFO_DR_STEP (stmt_info))
> >           % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
> >     { 
> >       gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
> >       compute_in_loop = true;
> >     }
> > 
> > where your patch changed DR_STEP to STMT_VINFO_DR_STEP (reverting just this
> > one change makes the ICEs go away).
> > 
> > However, at the place where the decision to use the
> > dr_explicit_realign_optimized strategy is made
> > (tree-vect-data-refs.c:vect_supportable_dr_alignment), we still have:
> > 
> >           if ((nested_in_vect_loop
> >                && (TREE_INT_CST_LOW (DR_STEP (dr))
> >                    != GET_MODE_SIZE (TYPE_MODE (vectype))))
> >               || !loop_vinfo)
> >             return dr_explicit_realign;
> >           else
> >             return dr_explicit_realign_optimized;
> > 
> > Should this now also use STMT_VINFO_DR_STEP?
> 
> Yes, I think so.

Hmmm.  Reading the comment in vect_supportable_dr_alignment:

     However, in the case of outer-loop vectorization, when vectorizing a
     memory access in the inner-loop nested within the LOOP that is now being
     vectorized, while it is guaranteed that the misalignment of the
     vectorized memory access will remain the same in different outer-loop
     iterations, it is *not* guaranteed that it will remain the same throughout
     the execution of the inner-loop.  This is because the inner-loop advances
     with the original scalar step (and not in steps of VS).  If the inner-loop
     step happens to be a multiple of VS, then the misalignment remains fixed
     and we can use the optimized realignment scheme. 

it would appear that in this case, checking the inner-loop step is deliberate.
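
To make the condition in that comment concrete, here is a minimal standalone
sketch (not GCC code; the helper name and parameters are hypothetical) that
checks whether the misalignment, taken as the byte offset modulo the vector
size VS, stays fixed while an access advances by a given step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: return true if the misalignment
   (byte offset modulo the vector size VS) is identical for each of NITERS
   accesses that start at byte offset BASE and advance by STEP bytes.  */
static bool
misalignment_is_invariant (size_t base, size_t step, size_t vs, size_t niters)
{
  assert (vs != 0);

  size_t first = base % vs;
  for (size_t i = 1; i < niters; i++)
    if ((base + i * step) % vs != first)
      return false;
  return true;
}
```

With VS = 16, a step of 16 or 32 keeps the misalignment fixed, while a step
of 4 does not, which matches the "multiple of VS" wording in the comment.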

Given the comment in vectorizable_load:

  /* If the misalignment remains the same throughout the execution of the
     loop, we can create the init_addr and permutation mask at the loop
     preheader.  Otherwise, it needs to be created inside the loop.
     This can only occur when vectorizing memory accesses in the inner-loop
     nested within an outer-loop that is being vectorized.  */

it looks to me as though, since the check is intended to verify that
"misalignment remains the same throughout the execution of the loop",
we actually want to check the inner-loop step here as well, i.e. revert
this chunk of your patch ...
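
As a rough illustration of why the two places need to look at the same step,
here is a simplified standalone model of the two conditions quoted above (not
GCC code; the enum, function names and the sample step values are made up for
the sketch):

```c
#include <assert.h>
#include <stdbool.h>

enum scheme { REALIGN, REALIGN_OPTIMIZED };

/* Models the decision quoted from vect_supportable_dr_alignment:
   the optimized scheme is chosen only if the step equals VS
   (or the access is not nested) and loop info is available.  */
static enum scheme
choose_scheme (bool nested, bool have_loop_vinfo, long step, long vs)
{
  if ((nested && step != vs) || !have_loop_vinfo)
    return REALIGN;
  return REALIGN_OPTIMIZED;
}

/* Models the guard around the gcc_assert in vectorizable_load:
   returns true if the assert would fire for the given step.  */
static bool
assert_would_fail (bool nested, long step, long vs, enum scheme s)
{
  assert (vs > 0);
  return nested && step % vs != 0 && s == REALIGN_OPTIMIZED;
}
```

If both functions are given the same step, a step equal to VS is trivially a
multiple of VS, so the assert cannot fire.  If the scheme is chosen with one
step (say 16 with VS = 16) and the guard is evaluated with a different one
(say 24), the model reports a failure, which is the shape of the ICE
discussed here.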

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
