On Wed, Nov 16, 2022 at 4:25 AM Richard Biener via Gcc-patches
<[email protected]> wrote:
>
> On Tue, 15 Nov 2022, Richard Sandiford wrote:
>
> > "Andre Vieira (lists)" <[email protected]> writes:
> > > On 07/11/2022 11:05, Richard Biener wrote:
> > >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> > >>
> > >>> Sorry for the delay; I was just reminded that I still had this patch
> > >>> outstanding from last stage 1. Since it has mostly been reviewed,
> > >>> hopefully it can go in for this stage 1?
> > >>>
> > >>> I addressed the comments and gave the SLP part of vectorizable_call
> > >>> some TLC to make it work.
> > >>>
> > >>> I also changed vect_get_slp_defs, as I noticed that the call from
> > >>> vectorizable_call was creating an auto_vec with 'nargs' elements,
> > >>> which might be fewer than the number of children in the slp_node
> > >> How so?  Please fix that in the caller.  It looks like it probably
> > >> should use vect_nargs instead?
> > > Well, that was my first intuition, but when I looked at it further, the
> > > variant it's calling:
> > > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
> > > *vec_oprnds, unsigned n)
> > >
> > > is actually creating a vector of vectors of SLP defs. So for each child
> > > of slp_node it calls:
> > > void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> > >
> > > which returns a vector of vectorized defs. So vect_nargs would be the
> > > right size for the inner vec<tree> of vec_defs, but the outer should
> > > have the same number of elements as the original slp_node has children.
> > >
> > > However, at the call site (vectorizable_call), the operand we pass to
> > > vect_get_slp_defs, 'vec_defs', is initialized before the code path is
> > > specialized for slp_node. I'll see if I can change the call site so it
> > > doesn't have to do that; given the 'continue' at the end of the
> > > if (slp_node) block, I don't think vec_defs is used after it, but it may
> > > require some massaging to define it separately for each code path.
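> > >
> > > Concretely, something like this at the call site (just a sketch, reusing
> > > the signatures quoted above; I'm assuming SLP_TREE_CHILDREN gives the
> > > child vector here):
> > >
> > >   /* Size the outer vector by the number of SLP children rather than
> > >      by nargs; each inner vec<tree> then holds the vectorized defs
> > >      for one child.  */
> > >   unsigned int nchildren = SLP_TREE_CHILDREN (slp_node).length ();
> > >   auto_vec<vec<tree> > vec_defs (nchildren);
> > >   vect_get_slp_defs (vinfo, slp_node, &vec_defs, nchildren);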
> > >
> > >>
> > >>> , so that quick_push might not be safe as is, so I added the
> > >>> reserve (n) to ensure it's safe to push. I didn't actually come
> > >>> across any failure because of it, though. Happy to split this into
> > >>> a separate patch if needed.
> > >>>
> > >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> > >>> x86_64-pc-linux-gnu.
> > >>>
> > >>> OK for trunk?
> > >> I'll leave final approval to Richard but
> > >>
> > >> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> > >> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> > >>        layout.  */
> > >>     unsigned int vectorizable : 16;
> > >>
> > >> you don't actually change the width of the bitfield.  I would find
> > >> it more natural to have
> > >>
> > >>    signed int type0 : 7;
> > >>    signed int type0_vtrans : 1;
> > >>    signed int type1 : 7;
> > >>    signed int type1_vtrans : 1;
> > >>
> > >> with typeN_vtrans specifying how the types transform when vectorized.
> > >> I would imagine another variant we could need is narrowing/widening
> > >> according to either the result or the other argument type?  That said,
> > >> just your flag would then be
> > >>
> > >>    signed int type0 : 7;
> > >>    signed int pad   : 1;
> > >>    signed int type1 : 7;
> > >>    signed int type1_vect_as_scalar : 1;
> > >>
> > >> ?
> > > That's a cool idea! I'll do it like that but leave it as a single bit
> > > for now; if we want to re-use it for multiple transformations, we will
> > > obviously need to rename it and give it more bits.
> >
> > I think we should steal bits from vectorizable rather than shrink
> > type0 and type1 though.  Then add a 14-bit padding field to show
> > how many bits are left.
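> >
> > I.e. something along these lines (a sketch only; the pad width assumes
> > the existing 16-bit vectorizable field is what gets split up, and
> > type1_is_scalar_p is the name from your patch):
> >
> >   signed int type0 : 8;
> >   signed int type1 : 8;
> >   unsigned int vectorizable : 1;
> >   unsigned int type1_is_scalar_p : 1;
> >   unsigned int pad : 14;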
> >
> > > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
> > >        rhs_type = unsigned_type_node;
> > >      }
> > >
> > > +  /* The argument that is not of the same type as the others.  */
> > >    int mask_opno = -1;
> > > +  int scalar_opno = -1;
> > >    if (internal_fn_p (cfn))
> > > -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> > > +    {
> > > +      internal_fn ifn = as_internal_fn (cfn);
> > > +      if (direct_internal_fn_p (ifn)
> > > +     && direct_internal_fn (ifn).type1_is_scalar_p)
> > > +   scalar_opno = direct_internal_fn (ifn).type1;
> > > +      else
> > > +   /* For masked operations this represents the argument that carries the
> > > +      mask.  */
> > > +   mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >
> > This doesn't seem logically like an else.  We should do both.
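> >
> > I.e. (sketch only, reusing the names from the hunk above):
> >
> >   if (internal_fn_p (cfn))
> >     {
> >       internal_fn ifn = as_internal_fn (cfn);
> >       /* Compute the mask index unconditionally; it is -1 when the IFN
> >          carries no mask, and an IFN with a scalar operand could in
> >          principle also be masked.  */
> >       mask_opno = internal_fn_mask_index (ifn);
> >       if (direct_internal_fn_p (ifn)
> >           && direct_internal_fn (ifn).type1_is_scalar_p)
> >         scalar_opno = direct_internal_fn (ifn).type1;
> >     }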
> >
> > LGTM otherwise for the bits outside match.pd.  If Richard's happy with
> > the match.pd bits then I think the patch is OK with those changes and
> > without the vect_get_slp_defs thing (as you mentioned downthread).
>
> Yes, the match.pd part looked OK.

I was in the process of cleaning up Patchwork for aarch64 patches and
came across this one, which looks like it was approved but never went
in.
I doubt it applies cleanly now, and we are already in stage 3; maybe
this patch can be revived/revisited for stage 1.
I have not checked whether the testcases still emit the expected code.

Thanks,
Andrew


>
> > Thanks,
> > Richard
> >
> >
> > >>
> > >>> gcc/ChangeLog:
> > >>>
> > >>>          * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> > >>> pattern.
> > >>>          * config/aarch64/iterators.md (FRINTNZ): New iterator.
> > >>>          (frintnz_mode): New int attribute.
> > >>>          (VSFDF): Make iterator conditional.
> > >>>          * internal-fn.def (FTRUNC_INT): New IFN.
> > >>>          * internal-fn.cc (ftrunc_int_direct): New define.
> > >>>          (expand_ftrunc_int_optab_fn): New custom expander.
> > >>>          (direct_ftrunc_int_optab_supported_p): New supported_p.
> > >>>          * internal-fn.h (direct_internal_fn_info): Add new member
> > >>>          type1_is_scalar_p.
> > >>>          * match.pd: Add to the existing TRUNC pattern match.
> > >>>          * optabs.def (ftrunc_int): New entry.
> > >>>          * stor-layout.h (element_precision): Moved from here...
> > >>>          * tree.h (element_precision): ... to here.
> > >>>          (element_type): New declaration.
> > >>>          * tree.cc (element_type): New function.
> > >>>          (element_precision): Changed to use element_type.
> > >>>          * tree-vect-stmts.cc (vectorizable_internal_function): Add
> > >>> support for
> > >>>          IFNs with different input types.
> > >>>          (vect_get_scalar_oprnds): New function.
> > >>>          (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> > >>>          * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> > >>>          (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> > >>>          (vect_get_slp_defs): Ensure vec_oprnds has enough slots to 
> > >>> push.
> > >>>          * doc/md.texi: New entry for ftrunc pattern name.
> > >>>          * doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.
> > >>>
> > >>> gcc/testsuite/ChangeLog:
> > >>>
> > >>>          * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> > >>> instructions are available.
> > >>>          * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> > >>> aarch64_frintz options.
> > >>>          * gcc.target/aarch64/frintnz.c: New test.
> > >>>          * gcc.target/aarch64/frintnz_vec.c: New test.
> > >>>          * gcc.target/aarch64/frintnz_slp.c: New test.
> > >>>
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
