Hi,

I have revised the patch based on your suggestions. The following are some
points that I think need to be mentioned in the mail.

> > @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct
> cgraph_node *node)
> >      veclen = node->simdclone->vecsize_int;
> >    else
> >      veclen = node->simdclone->vecsize_float;
> > -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> > -  if (veclen > node->simdclone->simdlen)
> > +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE
> > + (t)));  if (known_gt (veclen, node->simdclone->simdlen))
> >      veclen = node->simdclone->simdlen;
>
> Although again it probably doesn't make a difference in practice, the
> known/maybe situation is similar here.  When comparing:
>
> - an SVE vector of 2 + 2 * (VQ - 1) doubles and
> - an Advanced SIMD vector of 2 doubles
>
> the Advanced SIMD version is conceptually ordered <= the SVE one, in the
> sense that the SVE vector always contains a whole number of Advanced SIMD
> vectors whereas the Advanced SIMD vector might not contain a whole number
> of SVE vectors.
>
> In other words, the number of lanes in the Advanced SIMD vector is known_le
> the number of lanes in the SVE vector, and the number of lanes in the SVE
> vector is known_ge and maybe_gt (but not known_gt) the number of lanes in
> the Advanced SIMD vector.  So for these kinds of comparison, known_gt can
> look a bit unexpected, even if (as here) it's probably fine in practice.
>
> There's currently a hard-coded assumption in this code and in the vectoriser
> that both constant-length software vectors and constant-length hardware
> vectors are a power of 2 in size.  This means that the > above is effectively
> testing whether veclen contains a whole number of
> node->simdclone->simdlens, not just whether the veclen is bigger.
>
> So when adding the initial autovec support for SVE, we generally rewrote these
> kinds of test to use multiple_p instead of comparisons.  I think that makes 
> the
> intent more obvious (but others might disagree :-)).
>
> That was a long-winded way of saying that I suggest we use:
>
>   if (multiple_p (veclen, node->simdclone->simdlen))
>     veclen = node->simdclone->simdlen;
>
> instead.

As far as I understand, under the hard-coded power-of-two assumption
multiple_p (a, b) is equivalent to known_ge (a, b), and !multiple_p (b, a) is
equivalent to known_gt (a, b). So !multiple_p (node->simdclone->simdlen, veclen)
would be the closest match here, but multiple_p (veclen, node->simdclone->simdlen)
is just as good, since performing the assignment in the equal case changes
nothing. I used multiple_p (veclen, node->simdclone->simdlen) in the revised
patch.

> >      }
> >    TREE_TYPE (TREE_TYPE (fndecl)) = t;
> >    if (!node->definition)
> > @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct
> cgraph_node *node)
> >     veclen = sc->vecsize_int;
> >        else
> >     veclen = sc->vecsize_float;
> > -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> > -      if (veclen > sc->simdlen)
> > +      veclen = exact_div (veclen,
> > +                     GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> > +      if (known_gt (veclen, sc->simdlen))
> >     veclen = sc->simdlen;
>
> Similarly here.

Similarly here: the revised patch uses multiple_p (veclen, sc->simdlen).

> > @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct
> cgraph_node *node,
> >               iter, NULL_TREE, NULL_TREE);
> >        adjustments->register_replacement
> > (&(*adjustments->m_adj_params)[j], r);
> >
> > -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> > -   j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> > +      if (known_lt (simd_clone_subparts (vectype),
> node->simdclone->simdlen))
> > +   {
> > +     j += vector_unroll_factor (node->simdclone->simdlen,
> > +                                simd_clone_subparts (vectype)) - 1;
> > +   }
>
> And similarly here:
>
>       if (multiple_p (node->simdclone->simdlen, simd_clone_subparts
> (vectype)))
>

Similarly here: the revised patch uses multiple_p (node->simdclone->simdlen,
simd_clone_subparts (vectype)).

> >  /* Wrapper for poly_int arguments to target macros, so that if a target
> >     doesn't need polynomial-sized modes, its header file can continue to
> >     treat the argument as a normal constant.  This should go away once
> > @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >      n = n->simdclone->next_clone)
> >        {
> >     unsigned int this_badness = 0;
> > -   if (n->simdclone->simdlen > vf
> > +   if (known_gt (n->simdclone->simdlen, vf)
>
> I think !constant_multiple_p rather than known_gt here.  Until we add
> support for vectorising simd clones with partial vectors, a whole number of
> simdlens must fit within VF.  So maybe:
>
>      unsigned int num_calls;
>      if (!constant_multiple_p (n->simdclone->simdlen, vf, &num_calls)
>
> and then…
>
> >         || n->simdclone->nargs != nargs)
> >       continue;
> > -   if (n->simdclone->simdlen < vf)
> > -     this_badness += (exact_log2 (vf)
> > -                      - exact_log2 (n->simdclone->simdlen)) * 1024;
> > +   if (known_lt (n->simdclone->simdlen, vf))
> > +     this_badness += exact_log2
> > +       (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
>
> …use num_calls instead of vector_unroll_factor here.

The revised patch writes this code as:

      unsigned int num_calls;
      if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
	  || n->simdclone->nargs != nargs)
	continue;
      if (!multiple_p (n->simdclone->simdlen, vf))
	this_badness += exact_log2 (num_calls) * 1024;

>
> >     if (n->simdclone->inbranch)
> >       this_badness += 2048;
> >     int target_badness = targetm.simd_clone.usable (n); @@ -3988,19
> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info
> stmt_info,
> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> >                                                       slp_node);
> >     if (arginfo[i].vectype == NULL
> > -       || (simd_clone_subparts (arginfo[i].vectype)
> > -           > bestn->simdclone->simdlen))
> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> > +                     bestn->simdclone->simdlen)))
>
> Here too I think we want constant_multiple_p:
>
>           || !constant_multiple_p (bestn->simdclone->simdlen,
>                                    simd_clone_subparts (arginfo[i].vectype))
>

I used multiple_p here, since the multiple itself is not needed.

> >       if (ratype)
> >         new_temp = create_tmp_var (ratype);
> >       else if (useless_type_conversion_p (vectype, rtype)) @@ -4264,12
> > +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> >
> >        if (vec_dest)
> >     {
> > -     if (simd_clone_subparts (vectype) < nunits)
> > +     if (known_lt (simd_clone_subparts (vectype), nunits))
>
> multiple_p (nunits, …) here too.  It would also be possible to do:
>
>         if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)
>
> and avoid the separate vector_unroll_factor.

I used !multiple_p (simd_clone_subparts (vectype), nunits) here, since the
code that follows should not be executed in the equal case.

>
> >         {
> >           unsigned int k, l;
> >           poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
> >           poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> > -         k = nunits / simd_clone_subparts (vectype);
> > +         k = vector_unroll_factor (nunits,
> > +                                   simd_clone_subparts (vectype));
> >           gcc_assert ((k & (k - 1)) == 0);
> >           for (l = 0; l < k; l++)
> >             {
> > @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >             vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
> >           continue;
> >         }
> > -     else if (simd_clone_subparts (vectype) > nunits)
> > +     else if (known_gt (simd_clone_subparts (vectype), nunits))
>
> multiple_p here too.
>

Similarly, I used !multiple_p (nunits, simd_clone_subparts (vectype)) here.

Bootstrapped and tested on both aarch64 and x86 Linux platforms; no new
regressions observed.

Any suggestions?

Thanks,
Yang Yang

Attachment: PR96342-part1-v2.patch
