On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:

> 
> On 07/11/2022 11:05, Richard Biener wrote:
> > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >
> >> Sorry for the delay, just been reminded I still had this patch outstanding
> >> from last stage 1. Hopefully since it has been mostly reviewed it could go
> >> in for this stage 1?
> >>
> >> I addressed the comments and gave the slp-part of vectorizable_call some
> >> TLC to make it work.
> >>
> >> I also changed vect_get_slp_defs as I noticed that the call from
> >> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >> than the number of children in the slp_node
> > How so?  Please fix that in the caller.  It looks like it probably
> > should use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the variant
> it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
> *vec_oprnds, unsigned n)
> 
> Is actually creating a vector of vectors of slp defs. So for each child of
> slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> 
> Which returns a vector of vectorized defs. So vect_nargs would be the right
> size for the inner vec<tree> of vec_defs, but the outer should have the same
> number of elements as the original slp_node has children.

No, each inner vector holds the vector defs for a single argument; the
outer vector should have one entry per argument.

That said, the number of SLP children of a call node should eventually
be the number of arguments of the call (plus masks, etc.).  So it
looks about correct besides the vect_nargs issue?
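
To make the shape concrete, here is a rough standalone sketch of what I
mean.  It is not the GCC code: slp_node_model and get_slp_defs_model are
made-up names, std::vector stands in for vec<>, and strings stand in for
trees.  The outer vector gets one entry per SLP child (call argument),
each inner vector holds that child's vector defs, and reserving up front
means the pushes do not depend on how the caller sized the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an SLP node: every child already carries its
// vectorized defs (modelled as strings rather than trees).
struct slp_node_model
{
  std::vector<std::vector<std::string>> children;
};

// Collect the defs of the first N children.  The outer vector receives one
// entry per child; each inner vector holds that child's vector defs.
static void
get_slp_defs_model (const slp_node_model &node,
                    std::vector<std::vector<std::string>> &vec_oprnds,
                    std::size_t n)
{
  if (n > node.children.size ())
    n = node.children.size ();
  // Reserve up front so the pushes below never rely on the capacity the
  // caller happened to allocate.
  vec_oprnds.reserve (vec_oprnds.size () + n);
  for (std::size_t i = 0; i < n; ++i)
    vec_oprnds.push_back (node.children[i]);
}

// Example: a call with two arguments, each vectorized into two defs.
static std::vector<std::vector<std::string>>
example_defs ()
{
  slp_node_model call;
  call.children = { { "a_v0", "a_v1" }, { "b_v0", "b_v1" } };
  std::vector<std::vector<std::string>> defs;
  get_slp_defs_model (call, defs, call.children.size ());
  return defs;
}
```

Here example_defs () returns an outer vector of size 2 (one entry per
argument), and each inner vector has size 2 (the vector defs of that
argument).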

> 
> However, at the call site (vectorizable_call), the operand we pass to
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is
> specialized for slp_node. I'll go see if I can change the call site to not
> have to do that, given the continue at the end of the if (slp_node) BB I don't
> think it needs to use vec_defs after it, but it may require some massaging to
> be able to define it separately for each code-path.
> 
> >
> >> , so that quick_push might not be
> >> safe as is, so I added the reserve (n) to ensure it's safe to push. I
> >> didn't actually come across any failure because of it though. Happy to
> >> split this into a separate patch if needed.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> >>
> >> OK for trunk?
> > I'll leave final approval to Richard but
> >
> > -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> > +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> >        layout.  */
> >     unsigned int vectorizable : 16;
> >
> > you don't actually change the width of the bitfield.  I would find
> > it more natural to have
> >
> >    signed int type0 : 7;
> >    signed int type0_vtrans : 1;
> >    signed int type1 : 7;
> >    signed int type1_vtrans : 1;
> >
> > with typeN_vtrans specifying how the types transform when vectorized.
> > I would imagine another variant we could need is narrow/widen
> > according to either result or other argument type?  That said,
> > just your flag would then be
> >
> >    signed int type0 : 7;
> >    signed int pad   : 1;
> >    signed int type1 : 7;
> >    signed int type1_vect_as_scalar : 1;
> >
> > ?
> That's a cool idea! I'll leave it as a single bit for now like that, if we
> want to re-use it for multiple transformations we will obviously need to
> rename & give it more bits.
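
For reference, a small standalone sketch of the single-flag layout
discussed above.  The struct name fn_info_model and the helper
make_info are hypothetical, and I have made the flag unsigned here (an
assumption on my side, not what was written above) because a signed
1-bit field can only hold 0 and -1.  Bit-field packing is
implementation-defined, so the 32-bit total is what I would expect on
the usual ABIs rather than something the language guarantees.

```cpp
#include <cassert>

// Hypothetical model of the layout: 7 + 1 + 7 + 1 + 16 = 32 bits.
struct fn_info_model
{
  signed int type0 : 7;                  // type selector for operand 0
  signed int pad : 1;                    // unused, keeps type1 byte-aligned
  signed int type1 : 7;                  // type selector for operand 1
  unsigned int type1_vect_as_scalar : 1; // type1 stays scalar when vectorized
  unsigned int vectorizable : 16;        // needs 1 bit, padded for layout
};

// Build an example entry; the values are illustrative only.
static fn_info_model
make_info ()
{
  fn_info_model info = {};
  info.type0 = -1;
  info.type1 = 1;
  info.type1_vect_as_scalar = 1;
  info.vectorizable = 1;
  return info;
}
```

The 7-bit signed fields still cover -64..63, which leaves room for the
small negative selectors and argument indices used for type0/type1.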
> >
> >> gcc/ChangeLog:
> >>
> >>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> >>         pattern.
> >>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>         (frintnz_mode): New int attribute.
> >>         (VSFDF): Make iterator conditional.
> >>         * internal-fn.def (FTRUNC_INT): New IFN.
> >>         * internal-fn.cc (ftrunc_int_direct): New define.
> >>         (expand_ftrunc_int_optab_fn): New custom expander.
> >>         (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>         * internal-fn.h (direct_internal_fn_info): Add new member
> >>         type1_is_scalar_p.
> >>         * match.pd: Add to the existing TRUNC pattern match.
> >>         * optabs.def (ftrunc_int): New entry.
> >>         * stor-layout.h (element_precision): Moved from here...
> >>         * tree.h (element_precision): ... to here.
> >>         (element_type): New declaration.
> >>         * tree.cc (element_type): New function.
> >>         (element_precision): Changed to use element_type.
> >>         * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >>         support for IFNs with different input types.
> >>         (vect_get_scalar_oprnds): New function.
> >>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>         * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
> >>         * doc/md.texi: New entry for ftrunc pattern name.
> >>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> >>         instructions available.
> >>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> >>         aarch64_frintz options.
> >>         * gcc.target/aarch64/frintnz.c: New test.
> >>         * gcc.target/aarch64/frintnz_vec.c: New test.
> >>         * gcc.target/aarch64/frintnz_slp.c: New test.
> >>
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
