On Wed, 3 Jul 2019, Kewen.Lin wrote:

> Hi Richard,
> 
> Thanks very much for reviewing my patch.  I'll update it per your comments.
> Before sending the next version, I have several questions embedded below for
> further discussion.
> 
> on 2019/7/2 8:43 PM, Richard Biener wrote:
> > On Wed, 20 Mar 2019, Kewen.Lin wrote:
> > 
> >> +/* { dg-require-effective-target vect_double } */
> >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> >> +/* { dg-options "-O2 -ffast-math" } */
> >> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> > 
> > Use
> > 
> > /* { dg-additional-options "-mvsx" { target { powerpc...
> > 
> > that saves duplicate typing.  I guess that x86_64/i?86 also works
> > if you enable SSE2 - can you check that and do adjustments accordingly?
> > 
> 
> OK, I'll learn SSE2 and update it. 

I think most testcases should just pass with SSE2.
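One possible shape for the directives, using dg-additional-options as suggested so the base options are written only once (the x86 target selector and "-msse2" option below are assumptions for illustration, not tested directives):

```c
/* { dg-require-effective-target vect_double } */
/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
/* { dg-additional-options "-mvsx" { target { powerpc*-*-* } } } */
/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
```

Whether an additional effective-target check is needed for the x86 case would have to be confirmed against the testsuite conventions.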

> >> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> >> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
> >> +    - ops_indexes: the index of vec ops* for each relevant BIT_FIELD_REF.  */
> >> +struct v_info
> >> +{
> >> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
> > 
> > with SVE this probably needs to be poly_int64 or so
> > 
> 
> OK, will extend it to poly_int64 (can it be negative? or is poly_uint64 better?)
> 
> >> +  auto_vec<unsigned, 32> ops_indexes;
> >> +};
> > 
> > To have less allocations you could use a
> > 
> >      auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;
> > 
> > or even
> > 
> >  hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >
> > 
> > where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS 
> > (vector_type)) during collecting and then can use quick_push ()
> > (ah, no - duplicates...).
> > 
> 
> Good idea!
> 
> >> +
> >> +typedef struct v_info *v_info_ptr;
> >> +
> >> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
> >> +static int
> >> +unsigned_cmp (const void *p_i, const void *p_j)
> >> +{
> >> +  if (*(const unsigned HOST_WIDE_INT *) p_i
> >> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
> >> +    return 1;
> >> +  else
> >> +    return -1;
> > 
> > That's an issue with some qsort implementations comparing
> > an element against itself.
> > 
> > I think so you should add the equality case.
> > 
> 
> The equality case seems already covered by ">=".
> Do you mean that I need to handle the equality case explicitly,
> returning zero for "=="?  Like:
> 
>    
>    const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i;
>    const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j;
>    if (val_i != val_j)
>      return val_i > val_j ? 1 : -1;
>    else
>      return 0;

Yes.  It needs to return zero, otherwise some qsort will endlessly
swap two same elements.

> >> +
> >> +   TODO:
> >> +    1) The current implementation requires all candidate VECTORs to have
> >> +       the same VECTOR type, but it can be extended to handle different
> >> +       groups of VECTOR types in future if profitable cases are found.
> >> +    2) The current implementation requires the whole VECTOR to be fully
> >> +       covered, but it can be extended to support partial coverage
> >> +       (adjacent lanes that do not fill the whole vector); that may need
> >> +       a cost model to decide when it is worthwhile.
> >> +
> 
> >> +      tree elem_type = TREE_TYPE (vec_type);
> >> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> >> +      if (size != TREE_INT_CST_LOW (op1))
> > 
> >   if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
> > 
> > avoids some of the TREE_INT_CST_LOW we like to avoid.
> > 
> > You miss a check for op2 % op1 being zero.  Given you store a 
> > HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
> > (beware of SVE...).
> 
> OK, thanks!  For op2 % op1 == zero, I had taken that for granted; I'll
> fix it.  I think such IR can be constructed artificially, but since I have
> almost no knowledge of other targets' vector support, I'm curious whether
> it exists in real-world code?

BIT_FIELD_REF is quite unconstrained, you could build a testcase
for the GIMPLE frontend quite easily.  Note that the first reassoc
runs before vector lowering so vector code written via vector
extensions does not necessarily have target support but will be lowered
later.

> btw, BIT_FIELD_REF in tree.def says op1/op2 are constant; does that need
> to be updated due to SVE?

A POLY_INT_CST is also "constant" in some sense ;)
 
> > 
> > There's also a poly-int friendly multiple_p predicate so you could
> > have the initial checks SVE friendly but bail out on non-INTEGER_CST
> > offset later.
> > 
> 
> Got it, thanks!
> 
> > 
> > Since you are using a hashtable keyed off SSA name pointers code
> > generation will depend on host addresses here if you consider
> > there's one mismatching vector type that might become ref_vec
> > dependent on the order of elements in the hashtable.  That's
> > a no-no.
> > 
> > Even if doing it like above looks to possibly save compile-time
> > by avoiding qsort calls I think the appropriate thing to do is
> > to partition the vector candidates into sets of compatible
> > vectors (consider summing two v2df and two v4df vectors for example)
> > and then take out ones you disqualify because they do not cover
> > the vector in full.
> > 
> 
> You are absolutely right, the randomness of the hashtable key order could
> be a problem here.  The partition part is TODO 1).  Since Power currently
> has only one full vector width, v2df and v4df cannot co-exist, so it was
> put into the TODO list.  I will extend it in the next version of the patch,
> adding one more hashtable from vector type mode to v_info_map; feel free to
> correct me if you have any concerns.

I think avoiding the hashtable ordering issue is enough for now,
you could simply remember the first vector you insert (thus
pick the first candidate from the original ops[] vector).

You could also do sth like

 // move hashtable elements to a vector
 for (hashtable)
   v.push (element);
 v.qsort (sort-after-mode);

and then you have chunks of candidates with the same mode.  You
can further discriminate the candidates by their SSA name version
after the mode to get a stable sort.

Richard.

> > I think whether the vector is fully covered can be done way cheaper
> > as well by using a sbitmap and clearing a bit whenever its 
> > corresponding lane is in the vector (and bailing out if the bit
> > was already clear since you then got a duplicate).  So start
> > with bitmap_ones () and at the end check bitmap_empty_p ().
> > I don't think you depend on the vector being sorted later?
> > 
> 
> Good idea, and yes, it doesn't depend on the vector being sorted.
> 
> >> +    {
> >> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
> >> +      info = *(v_info_map.get (tr));
> >> +      unsigned j;
> >> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
> >> +  {
> >> +    gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> >> +    gimple_set_visited (def, true);
> > 
> > you set the visited flag to get the ops definition DCEd eventually, right?
> > Note this in a comment.
> > 
> 
> Yes, it's for that.  Will add the comment, thanks!
