[SLP] SLP vectorization: vectorize vector constructors

2019-10-30 Thread Joel Hutton
On 15/10/2019 13:11, Richard Biener wrote:
 >>  > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
 >>  > (we can have omitted trailing zeros).
 >
 > ^^^
 >
 > I don't see this being handled?  You give up on non-SSA names
 > but not on the omitted trailing zeros.

I had thought checking the number of vectors produced would work. I've 
added that check.
I'm slightly confused about what should be done for non-SSA names. 
There's no scalar stmt to gather for a constant in a vector constructor.
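To make the question concrete, a small example of the kind of constructor I mean (the function name and type are only illustrative):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
mixed_ctor (int a, int b, int c)
{
  /* The literal 5 is not an SSA name, so there is no defining stmt in this
     basic block to push into scalar_stmts; the current analysis simply
     gives up on such constructors.  */
  return (v4si) { a, b, c, 5 };
}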

 >
 > You build a CONSTRUCTOR of vectors, thus
 >
 > _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... };
I've added code to do this, and a testcase which triggers it.

 >
 > +
 > + if (constructor)
 > +   {
 > + SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
 > +   }
 > + else
 > +   SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
 > +
 >
 > too much vertical space, no {} around single-stmt if clauses
Fixed.

 >
 >
 > @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
 > stmt_vec_info use_stmt_info = vinfo->lookup_stmt
 > (use_stmt);
 > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
 >   {
 > +   /* Check this is not a constructor that will be
 > vectorized
 > +  away.  */
 > +   if (BB_VINFO_GROUPED_STORES (vinfo).contains
 > (use_stmt_info))
 > +   continue;
 >
 > hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead?
 > In theory the stmt should be part of the SLP tree itself but that's
 > probably too awkward to be made work at the moment ;)
I did try this, but it was indeed too awkward to make work at the moment.

 >
 > vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new
 > functions which need comments.
Fixed. I've also taken the 'vectorize the root' out into a separate 
function.

 >
 > +  /* For vector constructors, the same SSA name must be used to 
maintain
 > data
 > + flow into other basic blocks.  */
 > +  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
 > +  && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
 > +  && SLP_TREE_VEC_STMTS (node).exists ())
 > +    {
 >
 > it should read
 >
 >   /* Vectorize the instance root.  */
 >
 > and be in vect_schedule_slp after the vect_schedule_slp_instance.
 > As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1,
 > you also cannot simply do "nothing" here, "failing" vectorization
 > (well, you probably can - DCE will remove the vectorized code - but
 > at least if there were calls in the SLP tree they will be mangled
 > by vectorization so the scalar code is wrecked).  SO it should be
 >
 >  if (SLP_INSTANCE_ROOT_STMT (instance))
 >    .. you may not fail to replace the scalar root stmt here ..
 >
So what should be done in the case that CONSTRUCTOR_NELTS < 
TYPE_VECTOR_SUBPARTS?
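For reference, this is how I understand the omitted-trailing-zeros case (whether the front end really leaves the zero lanes implicit here is an assumption on my part, names are illustrative only):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
partial_ctor (int a, int b)
{
  /* Only two elements are written explicitly; the remaining lanes are zero,
     so CONSTRUCTOR_NELTS (2) would be smaller than TYPE_VECTOR_SUBPARTS (4)
     if the zeros are kept implicit in the GIMPLE CONSTRUCTOR.  */
  return (v4si) { a, b };
}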

 > + if (CONSTRUCTOR_NELTS (rhs) == 0)
 > +   vectorizable = false;
 > +
 >
 > if you use continue; you can elide the 'vectorizable' variable.
Done.

 >
 > + if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
 > +   vectorizable = false;
 > +
 >
 > why's that?  no comments that clarify ;)  The vector may be
 > used as argument to a call or as source of a store.  So I'd simply
 > remove this check (and the function).

Done. The thinking was that if the vector was used as a source of a 
store the SLP tree would be built from the grouped store instead. Will 
it not cause problems if both the grouped store and the vector 
constructor are used to build SLP trees?
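The situation I had in mind is sketched below (illustrative names only); the constructor's value is used only as the source of a store, and my assumption was that the store side would drive the SLP build:

typedef int v4si __attribute__ ((vector_size (16)));

void
store_ctor (v4si *out, int a, int b, int c, int d)
{
  /* The constructor's only use is as the source of this store.  */
  *out = (v4si) { a, b, c, d };
}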



2019-10-10  Joel Hutton  

 * expr.c (store_constructor): Add case for constructor of vectors.
 * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
 (vect_bb_slp_scalar_cost): Likewise.
 (vect_slp_check_for_constructors): New function.
 (vect_slp_analyze_bb_1): Add check for vector constructors.
 (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
 (vectorize_slp_instance_root_stmt): New function.
 * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-10-10  Joel Hutton  

 * gcc.dg/vect/bb-slp-40.c: New test.
 * gcc.dg/vect/bb-slp-41.c: New test.

From 902510bd498acfc9e30636f8267b57027bc63254 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 22 Oct 2019 10:05:19 +0100
Subject: [PATCH] SLP Vectorization: Vectorize vector constructors

---
 gcc/expr.c|   5 +-
 gcc/tests

Re: [SLP] SLP vectorization: vectorize vector constructors

2019-10-30 Thread Joel Hutton
On 30/10/2019 13:49, Richard Biener wrote:
> Why do you need this?  The vectorizer already creates such CTORs.  Any
> testcase that you can show?

typedef long v2di __attribute__((vector_size(16)));
v2di v;
void
foo()
{
   v = (v2di){v[1], v[0]};
}


>>   * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector 
>> constructors.
>>   (vect_bb_slp_scalar_cost): Likewise.
>>   (vect_slp_check_for_constructors): New function.
>>   (vect_slp_analyze_bb_1): Add check for vector constructors.
>>   (vect_schedule_slp_instance): Add case to fixup vector constructor 
>> stmt.
>>   (vectorize_slp_instance_root_stmt): New function
>>   * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
> stmt_info->stmt\
> + : NULL;
>
> SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
>
> @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
>  stmt_vec_info use_stmt_info = vinfo->lookup_stmt
> (use_stmt);
>  if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
>{
> +   /* Check this is not a constructor that will be
> vectorized
> +  away.  */
> +   if (BB_VINFO_GROUPED_STORES (vinfo).contains
> (use_stmt_info))
> +   continue;
>  (*life)[i] = true;
>
> ... which you then simply mark as PURE_SLP_STMT where we call
> vect_mark_slp_stmts in vect_slp_analyze_bb_1.
>
> I see you have the TYPE_VECTOR_SUBPARTS check still at transform
> stage in vectorize_slp_instance_root_stmt, please simply move
> it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
> where you have the other rejects (non-SSA names in the ctor).
If the check is in vect_slp_check_for_constructors, which vector is used
as the input to TYPE_VECTOR_SUBPARTS, the lhs of the constructor?
> That is, vectorize_slp_instance_root_stmt may not fail.
>
> +bool
> +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> +{
> +
>
> extra newline
>
> +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
>
> the stmt replacing can be commonized between == 1 and > 1 cases.
>
> FOR_EACH_VEC_ELT (slp_instances, i, instance)
>   {
> +  slp_tree node = SLP_INSTANCE_TREE (instance);
> /* Schedule the tree of INSTANCE.  */
> -  vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
> +  vect_schedule_slp_instance (node,
>instance, bst_map);
> +
> +  /* Vectorize the instance root.  */
> +  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
> + && SLP_TREE_VEC_STMTS (node).exists ())
> +   if (!vectorize_slp_instance_root_stmt (node, instance))
> + {
>
> instance->root == node is always true.  Likewise
> SLP_TREE_VEC_STMTS (node).exists ().
>
> @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
> if (is_a  (vinfo))
>
> the ChangeLog doesn't mention this.  I guess this isn't necessary
> unless you fail vectorizing late which you shouldn't.
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index
> 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -151,6 +151,10 @@ public:
> /* The root of SLP tree.  */
> slp_tree root;
>
> +  /* For vector constructors, the constructor stmt that the SLP tree is
> built
> + from, NULL otherwise.  */
> +  gimple *root_stmt;
>
> as said, make this a stmt_vec_info
>
> Thanks,
> Richard.
>
>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-10  Joel Hutton  
>>
>>   * gcc.dg/vect/bb-slp-40.c: New test.
>>   * gcc.dg/vect/bb-slp-41.c: New test.
>>
>>


Re: [SLP] SLP vectorization: vectorize vector constructors

2019-10-30 Thread Joel Hutton

On 30/10/2019 14:51, Richard Biener wrote:
> On Wed, 30 Oct 2019, Joel Hutton wrote:
>
>> On 30/10/2019 13:49, Richard Biener wrote:
>>> Why do you need this?  The vectorizer already creates such CTORs.  Any
>>> testcase that you can show?
>> typedef long v2di __attribute__((vector_size(16)));
>> v2di v;
>> void
>> foo()
>> {
>>     v = (v2di){v[1], v[0]};
>> }
> Huh.  On what architecture?  Is that for a V2DI constructor of
> V1DI vectors maybe?  On x86 the code doesn't trigger.

On aarch64.

emode = E_DImode

eltmode = E_DImode


> The condition was indeed to check for vector mode elements so
> maybe it should simply read
>
>if (VECTOR_MODE_P (emode))
>
> ?  eltmode is always scalar.
I'll try that, thanks.


Re: [SLP] SLP vectorization: vectorize vector constructors

2019-11-01 Thread Joel Hutton
On 30/10/2019 13:49, Richard Biener wrote:
 >>
 >>  * expr.c (store_constructor): Add case for constructor of 
vectors.
 > Why do you need this?  The vectorizer already creates such CTORs.  Any
 > testcase that you can show?
 >
 > typedef long v2di __attribute__((vector_size(16)));
 > v2di v;
 > void
 > foo()
 > {
 >    v = (v2di){v[1], v[0]};
 > }
There is a special case for single-element vectors, where the vector mode
and the element mode are the same.
I've changed this slightly, as your suggestion of using VECTOR_MODE_P
didn't work for my case on aarch64, where:
   emode = E_DImode
   eltmode = E_DImode
and E_DImode is not a vector mode.
Based on some checks I found in verify_gimple, I set the type of the
replacement constructor to the same as that of the original constructor,
as verify_gimple seems to expect flattened types, i.e. a vector of vectors
of ints should have type vector of ints.


 > What could be done is "compact" the
 > operands to vectorize to only contain SSA names, try to SLP
 > match and vectorize them and then combine the vectorized result
 > with the constant elements.
 >
 > Not worth doing I guess unless it's sth regular like
 >
 >  { a, b, c, d, 0, 0, 0, 0 }
 >
 > or so.  But this can be handled as followup if necessary.
 >
I agree, it should be possible to support this in future patches.
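For the record, a sketch of the kind of regular, partially-constant constructor such a follow-up could handle (illustrative type and names only):

typedef char v8qi __attribute__ ((vector_size (8)));

v8qi
half_ctor (char a, char b, char c, char d)
{
  /* Only the first four elements are SSA values; a follow-up could
     vectorize { a, b, c, d } and then merge in the constant lanes.  */
  return (v8qi) { a, b, c, d, 0, 0, 0, 0 };
}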


 > + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
 > stmt_info->stmt\
 > + : NULL;
 >
 > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
 >
 > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
 > stmt_vec_info use_stmt_info = vinfo->lookup_stmt
 > (use_stmt);
 > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
 >   {
 > +   /* Check this is not a constructor that will be
 > vectorized
 > +  away.  */
 > +   if (BB_VINFO_GROUPED_STORES (vinfo).contains
 > (use_stmt_info))
 > +   continue;
 > (*life)[i] = true;
 >
 > ... which you then simply mark as PURE_SLP_STMT where we call
 > vect_mark_slp_stmts in vect_slp_analyze_bb_1.
Done.


 > I see you have the TYPE_VECTOR_SUBPARTS check still at transform
 > stage in vectorize_slp_instance_root_stmt, please simply move
 > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
 > where you have the other rejects (non-SSA names in the ctor).
 >
Done.


 >
 > +bool
 > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
 > +{
 > +
 >
 > extra newline
Fixed.


 > +  /* For vector constructors, the constructor stmt that the SLP tree is
 > built
 > + from, NULL otherwise.  */
 > +  gimple *root_stmt;
 >
 > as said, make this a stmt_vec_info
Done.


 > +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
 >
 > the stmt replacing can be commonized between == 1 and > 1 cases.
Done.


 > +
 > +  /* Vectorize the instance root.  */
 > +  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
 > + && SLP_TREE_VEC_STMTS (node).exists ())
 > +   if (!vectorize_slp_instance_root_stmt (node, instance))
 > + {
 >
 > instance->root == node is always true.  Likewise
 > SLP_TREE_VEC_STMTS (node).exists ().
Done.


 > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
 >    if (is_a  (vinfo))
 >
 > the ChangeLog doesn't mention this.  I guess this isn't necessary
 > unless you fail vectorizing late which you shouldn't.
 >
It's necessary to avoid:

     - removing stmts twice for constructors of the form {_1,_1,_1,_1}
     - removing stmts that define ssa names used elsewhere (which previously
       wasn't a consideration, because the scalar_stmts were stores to
       memory, not assignments to ssa names)

Updated changelog (and patch)

2019-10-31  Joel Hutton  

     * expr.c (store_constructor): Modify to handle single element vectors.
     * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
     (vect_slp_check_for_constructors): New function.
     (vect_slp_analyze_bb_1): Call new function to check for vector constructors.
     (vectorize_slp_instance_root_stmt): New function.
     (vect_schedule_slp): Call new function to vectorize root stmt of vector constructors.
     * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-10-31  Joel Hutton  

     * gcc.dg/vect/bb-slp-40.c: New test.
     * gcc.dg/vect/bb-slp-41.c: New test.

From 35942aebea1e93497280e11a17af0ca393539e2f Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 22 O

Re: [SLP] SLP vectorization: vectorize vector constructors

2019-11-04 Thread Joel Hutton
First copy bounced from mailing list.
 > > On 30/10/2019 13:49, Richard Biener wrote:
 > >  >>
 > >  >>  * expr.c (store_constructor): Add case for constructor of
 > > vectors.
 > >  > Why do you need this?  The vectorizer already creates such 
CTORs.  Any
 > >  > testcase that you can show?
 > >  >
 > >  > typedef long v2di __attribute__((vector_size(16)));
 > >  > v2di v;
 > >  > void
 > >  > foo()
 > >  > {
 > >  >    v = (v2di){v[1], v[0]};
 > >  > }
 > > There is a special case for single element vectors, where the vector
 > > mode and
 > > the element mode are the same.
 > > I've changed this slightly, as your solution of using VECTOR_MODE_P 
didn't
 > > work for my case where:
 > >    emode = E_DImode
 > >    eltmode = E_DImode
 > > On aarch64. As E_DImode is not a vector mode.
 > > Based on some checks I found in verify_gimple, I set the type of the
 > > replacement
 > > constructor to the same as the original constructor, as verify_gimple
 > > seems to
 > > expect flattened types. i.e. a vector of vectors of ints should have
 > > type vector
 > > of ints.
 >
 > Huh.  On aarch64 for the above testcase I see mode V2DImode and
 > emode = eltmode = DImode.  That's just a regular CTOR with
 > non-vector elements so not sure why you need any adjustment at all
 > here?
 >
 > It looks like your vectorization of the CTOR introduces a
 > V2DImode CTOR of vector(1) long elements which incidentially
 > have DImode.  That's because somehow V2DI vectorization isn't
 > profitable but V1DI one is.  In the end it's a noop transform
 > but the testcase shows that for V2DI vectorization we fail
 > to cost the CTOR in the scalar cost computation (you have to
 > include the pattern root there I guess).
 >
Yes, sorry, I was unclear about this: it's the new constructor introduced
by the vectorization that the change is needed for.

 > Technically we feed valid GIMPLE to the expander so we indeed
 > shouldn't ICE.  So I'd like to have the earlier keying on
 > vec_vec_init_p match the later assert we run into, thus
 > sth like
 >
 > Index: gcc/expr.c
 > ===
 > --- gcc/expr.c  (revision 277765)
 > +++ gcc/expr.c  (working copy)
 > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
 > && n_elts.is_constant (&const_n_elts))
 >   {
 > machine_mode emode = eltmode;
 > +   bool vector_typed_elts_p = false;
 >
 > if (CONSTRUCTOR_NELTS (exp)
 > && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp,
 > 0)->value))
 > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
 >   * TYPE_VECTOR_SUBPARTS (etype),
 >   n_elts));
 > emode = TYPE_MODE (etype);
 > +   vector_typed_elts_p = true;
 >   }
 > icode = convert_optab_handler (vec_init_optab, mode, emode);
 > if (icode != CODE_FOR_nothing)
 >   {
 > unsigned int n = const_n_elts;
 >
 > -   if (emode != eltmode)
 > +   if (vector_typed_elts_p)
 >   {
 > n = CONSTRUCTOR_NELTS (exp);
 > vec_vec_init_p = true;
 >
 >
I've used this, thank you.

 > >  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
 > >  >    if (is_a  (vinfo))
 > >  >
 > >  > the ChangeLog doesn't mention this.  I guess this isn't necessary
 > >  > unless you fail vectorizing late which you shouldn't.
 > >  >
 > > It's necessary to avoid:
 > >
 > >  removing stmts twice for constructors of the form 
{_1,_1,_1,_1}
 > >  removing stmts that define ssa names used elsewhere (which
 > > previously wasn't a consideration because the scalar_stmts were stores
 > > to memory, not assignments to ssa names)
 >
 > OK, but the code you are patching is just supposed to remove stores
 > from the final scalar stmt seeds - it doesn't expect any loads there
 > which is probably what happens.  I'd instead add a
 >
 >    /* Do not CSE the stmts feeding a CTOR, they might have uses
 >   outside of the vectorized stmts.  */
 >    if (SLP_INSTANCE_ROOT_STMT (instance))
 >  continue;
 >
 > before the loop over SLP_TREE_SCALAR_STMTS.
Done.

 > + if (!subparts.is_constant () || !(subparts.to_constant ()
 > +   

[GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-21 Thread Joel Hutton
Hi all,

A previous change to simplify LRA, introduced in 11b8091fb (From-SVN:
r279550), disabled hard register splitting for -O0. This causes a problem
on aarch64 in cases where parameters are passed in multiple registers
(in the bug report, an OI passed in 2 V4SI registers, which is mandated
by the AAPCS).

Vlad, Eric, do you have a preferred alternate solution to reverting the 
patch?

Previously discussed here: 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01414.html
Bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

Bootstrapped and regression tested on aarch64

Changelog:

2020-01-21  Joel Hutton  

     * ira.c (ira): Revert use of simplified LRA algorithm.

gcc/testsuite/ChangeLog:

2020-01-21  Joel Hutton  

     PR bug/93221
     * gcc.target/aarch64/pr93221.c: New test.

From 0d9980d2327c61eb99d041a217d6ea5c5b34c754 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 21 Jan 2020 09:37:48 +
Subject: [PATCH] [IRA] Fix bug 93221 by reverting 11b8091fb

11b8091fb introduced a simplified LRA algorithm for -O0 that turned off
hard register splitting; this causes a problem for parameters passed in
multiple registers on aarch64. This fixes bug 93221.
---
 gcc/ira.c  | 38 +-
 gcc/testsuite/gcc.target/aarch64/pr93221.c | 10 ++
 2 files changed, 25 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr93221.c

diff --git a/gcc/ira.c b/gcc/ira.c
index 46091adf8109263c72343dccfe4913857b5c74ae..c8b5f869da121506f0414901271eae9810689316 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5205,35 +5205,27 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  if (optimize)
-{
-  ira_conflicts_p = true;
-
-  /* Determine the number of pseudos actually requiring coloring.  */
-  unsigned int num_used_regs = 0;
-  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
-	  num_used_regs++;
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-	 use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-	= ira_use_lra_p
-	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
-}
-  else
-{
-  ira_conflicts_p = false;
-  lra_simple_p = ira_use_lra_p;
-}
+  ira_conflicts_p = optimize > 0;
+
+  /* Determine the number of pseudos actually requiring coloring.  */
+  unsigned int num_used_regs = 0;
+  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+  num_used_regs++;
+
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+ pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+ use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+= ira_use_lra_p
+  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
 
   if (lra_simple_p)
 {
   /* It permits to skip live range splitting in LRA.  */
   flag_caller_saves = false;
   /* There is no sense to do regional allocation when we use
-	 simplified LRA.  */
+	simplified LRA.  */
   flag_ira_region = IRA_REGION_ONE;
   ira_conflicts_p = false;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93221.c b/gcc/testsuite/gcc.target/aarch64/pr93221.c
new file mode 100644
index ..517135a889de8a7e379c79222f8a8b2efcc7b422
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93221.c
@@ -0,0 +1,10 @@
+/* PR bug/93221 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-omit-leaf-frame-pointer" } */
+
+struct S { __Int32x4_t b[2]; };
+
+void __attribute__((optimize (0)))
+foo (struct S x)
+{
+}
-- 
2.17.1



Re: [GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-21 Thread Joel Hutton
On 21/01/2020 19:26, Jakub Jelinek wrote:
> Not a review, just nitpicking.  Please avoid the backslash before >.
> The PR number should go to both gcc/ and gcc/testsuite/ ChangeLog
> entry and we don't have bug/ category; the bug is target category,
> so it should be PR target/93221, or could be reclassified first in
> bugzilla e.g. to middle-end and then written as PR middle-end/93221.
>
Done
>> @@ -0,0 +1,10 @@
>> +/* PR bug/93221 */
> Likewise here.
Done
> Not sure if I understand why do you need optimize (0) attribute when the
> whole test is compiled with -O0.  Doesn't it ICE without the attribute
> too?
Done. It's not really necessary, belt and braces.

Updated patch attached

From 1a2980ef6eeb76dbf0556f806a85a4f49ad3ebdd Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 21 Jan 2020 09:37:48 +
Subject: [PATCH] [IRA] Fix bug 93221 by reverting 11b8091fb

11b8091fb introduced a simplified LRA algorithm for -O0 that turned off
hard register splitting; this causes a problem for parameters passed in
multiple registers on aarch64. This fixes bug 93221.
---
 gcc/ira.c  | 38 +-
 gcc/testsuite/gcc.target/aarch64/pr93221.c | 10 ++
 2 files changed, 25 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr93221.c

diff --git a/gcc/ira.c b/gcc/ira.c
index 46091adf8109263c72343dccfe4913857b5c74ae..c8b5f869da121506f0414901271eae9810689316 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5205,35 +5205,27 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  if (optimize)
-{
-  ira_conflicts_p = true;
-
-  /* Determine the number of pseudos actually requiring coloring.  */
-  unsigned int num_used_regs = 0;
-  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
-	  num_used_regs++;
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-	 use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-	= ira_use_lra_p
-	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
-}
-  else
-{
-  ira_conflicts_p = false;
-  lra_simple_p = ira_use_lra_p;
-}
+  ira_conflicts_p = optimize > 0;
+
+  /* Determine the number of pseudos actually requiring coloring.  */
+  unsigned int num_used_regs = 0;
+  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+  num_used_regs++;
+
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+ pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+ use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+= ira_use_lra_p
+  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
 
   if (lra_simple_p)
 {
   /* It permits to skip live range splitting in LRA.  */
   flag_caller_saves = false;
   /* There is no sense to do regional allocation when we use
-	 simplified LRA.  */
+	simplified LRA.  */
   flag_ira_region = IRA_REGION_ONE;
   ira_conflicts_p = false;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93221.c b/gcc/testsuite/gcc.target/aarch64/pr93221.c
new file mode 100644
index ..4dc2c3d0149423dd3d666f7428277ffa9eb765c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93221.c
@@ -0,0 +1,10 @@
+/* PR target/93221 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-omit-leaf-frame-pointer" } */
+
+struct S { __Int32x4_t b[2]; };
+
+void
+foo (struct S x)
+{
+}
-- 
2.17.1



Re: [GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-21 Thread Joel Hutton
Updated changelog:

Changelog:

2020-01-21  Joel Hutton  <mailto:joel.hut...@arm.com>

PR target/93221
* ira.c (ira): Revert use of simplified LRA algorithm.

gcc/testsuite/ChangeLog:

2020-01-21  Joel Hutton  <mailto:joel.hut...@arm.com>

PR target/93221
* gcc.target/aarch64/pr93221.c: New test.


Re: [GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-21 Thread Joel Hutton
Changelog was mangled by thunderbird, updated changelog:

Changelog:

2020-01-21  Joel Hutton  

 PR target/93221
 * ira.c (ira): Revert use of simplified LRA algorithm.

gcc/testsuite/ChangeLog:

2020-01-21  Joel Hutton  

 PR target/93221
 * gcc.target/aarch64/pr93221.c: New test.



[Ping][GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-27 Thread Joel Hutton
Ping! Eric, do you have any objections to reverting?

On 21/01/2020 19:16, Vladimir Makarov wrote:
> I am in favour of reverting the patch now.  But may be Eric can provide
> another version of the patch not causing the arm problem.  I am ready to
> reconsider this too.  So I guess the decision is upto Eric.

Eric did previously say "Feel free to eventually revert it.", but I
hoped he would reply on this thread.

--- Comment #7 from Eric Botcazou  ---
Probably missing live range splitting or somesuch, as envisioned by
Vladimir in
its approval message.  Feel free to eventually revert it.


Changelog:

2020-01-21  Joel Hutton  

 PR target/93221
 * ira.c (ira): Revert use of simplified LRA algorithm.

gcc/testsuite/ChangeLog:

2020-01-21  Joel Hutton  

 PR target/93221
 * gcc.target/aarch64/pr93221.c: New test.
From 1a2980ef6eeb76dbf0556f806a85a4f49ad3ebdd Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 21 Jan 2020 09:37:48 +
Subject: [PATCH] [IRA] Fix bug 93221 by reverting 11b8091fb

11b8091fb introduced a simplified LRA algorithm for -O0 that turned off
hard register splitting; this causes a problem for parameters passed in
multiple registers on aarch64. This fixes bug 93221.
---
 gcc/ira.c  | 38 +-
 gcc/testsuite/gcc.target/aarch64/pr93221.c | 10 ++
 2 files changed, 25 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr93221.c

diff --git a/gcc/ira.c b/gcc/ira.c
index 46091adf8109263c72343dccfe4913857b5c74ae..c8b5f869da121506f0414901271eae9810689316 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5205,35 +5205,27 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  if (optimize)
-{
-  ira_conflicts_p = true;
-
-  /* Determine the number of pseudos actually requiring coloring.  */
-  unsigned int num_used_regs = 0;
-  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
-	  num_used_regs++;
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-	 use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-	= ira_use_lra_p
-	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
-}
-  else
-{
-  ira_conflicts_p = false;
-  lra_simple_p = ira_use_lra_p;
-}
+  ira_conflicts_p = optimize > 0;
+
+  /* Determine the number of pseudos actually requiring coloring.  */
+  unsigned int num_used_regs = 0;
+  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+  num_used_regs++;
+
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+ pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+ use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+= ira_use_lra_p
+  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
 
   if (lra_simple_p)
 {
   /* It permits to skip live range splitting in LRA.  */
   flag_caller_saves = false;
   /* There is no sense to do regional allocation when we use
-	 simplified LRA.  */
+	simplified LRA.  */
   flag_ira_region = IRA_REGION_ONE;
   ira_conflicts_p = false;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93221.c b/gcc/testsuite/gcc.target/aarch64/pr93221.c
new file mode 100644
index ..4dc2c3d0149423dd3d666f7428277ffa9eb765c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93221.c
@@ -0,0 +1,10 @@
+/* PR target/93221 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-omit-leaf-frame-pointer" } */
+
+struct S { __Int32x4_t b[2]; };
+
+void
+foo (struct S x)
+{
+}
-- 
2.17.1



Re: [Ping][GCC][IRA] Revert 11b8091fb to fix Bug 93221

2020-01-28 Thread Joel Hutton
On 28/01/2020 09:07, Eric Botcazou wrote:
>> Ping! Eric, do you have any objections to reverting?
>
> See my comment posted in the audit trail of the TN on 01/20...

> Probably missing live range splitting or somesuch, as envisioned by
> Vladimir in its approval message.  Feel free to eventually revert it.

Great. Vladimir, Ok for trunk?

Changelog:

2020-01-21  Joel Hutton  

 PR target/93221
 * ira.c (ira): Revert use of simplified LRA algorithm.

gcc/testsuite/ChangeLog:

2020-01-21  Joel Hutton  

 PR target/93221
 * gcc.target/aarch64/pr93221.c: New test.
From 1a2980ef6eeb76dbf0556f806a85a4f49ad3ebdd Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 21 Jan 2020 09:37:48 +
Subject: [PATCH] [IRA] Fix bug 93221 by reverting 11b8091fb

11b8091fb introduced a simplified LRA algorithm for -O0 that turned off
hard register splitting; this causes a problem for parameters passed in
multiple registers on aarch64. This fixes bug 93221.
---
 gcc/ira.c  | 38 +-
 gcc/testsuite/gcc.target/aarch64/pr93221.c | 10 ++
 2 files changed, 25 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr93221.c

diff --git a/gcc/ira.c b/gcc/ira.c
index 46091adf8109263c72343dccfe4913857b5c74ae..c8b5f869da121506f0414901271eae9810689316 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5205,35 +5205,27 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  if (optimize)
-{
-  ira_conflicts_p = true;
-
-  /* Determine the number of pseudos actually requiring coloring.  */
-  unsigned int num_used_regs = 0;
-  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
-	  num_used_regs++;
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-	 use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-	= ira_use_lra_p
-	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
-}
-  else
-{
-  ira_conflicts_p = false;
-  lra_simple_p = ira_use_lra_p;
-}
+  ira_conflicts_p = optimize > 0;
+
+  /* Determine the number of pseudos actually requiring coloring.  */
+  unsigned int num_used_regs = 0;
+  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+  num_used_regs++;
+
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+ pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+ use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+= ira_use_lra_p
+  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
 
   if (lra_simple_p)
 {
   /* It permits to skip live range splitting in LRA.  */
   flag_caller_saves = false;
   /* There is no sense to do regional allocation when we use
-	 simplified LRA.  */
+	simplified LRA.  */
   flag_ira_region = IRA_REGION_ONE;
   ira_conflicts_p = false;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93221.c b/gcc/testsuite/gcc.target/aarch64/pr93221.c
new file mode 100644
index ..4dc2c3d0149423dd3d666f7428277ffa9eb765c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93221.c
@@ -0,0 +1,10 @@
+/* PR target/93221 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-omit-leaf-frame-pointer" } */
+
+struct S { __Int32x4_t b[2]; };
+
+void
+foo (struct S x)
+{
+}
-- 
2.17.1



SPEC 521.wrf_r failing due to new fortran checks

2019-09-27 Thread Joel Hutton
Hi Thomas,

I've noticed that SPEC has been failing to build on trunk since the commit
below. Do you have access to SPEC? Do you know whether this is due to a bug
in the patch or a bug in SPEC?

>  commit ddeea2110539a432b302401a617ceb5bde191094 (HEAD, refs/bisect/bad)
>  Author: tkoenig 
>  Date:   Thu Aug 15 22:52:40 2019 +
> 2019-08-15  Thomas Koenig  
>
> PR fortran/91443
> * frontend-passes.c (check_externals_expr): New function.
> (check_externals_code): New function.
> (gfc_check_externals): New function.
> * gfortran.h (debug): Add prototypes for gfc_symbol * and
> gfc_expr *.
> (gfc_check_externals): Add prototype.
> * interface.c (compare_actual_formal): Do not complain about
> alternate returns if the formal argument is optional.
> (gfc_procedure_use): Handle cases when an error has been issued
> previously.  Break long line.
> * parse.c (gfc_parse_file): Call gfc_check_externals for all
> external procedures.
> * resolve.c (resolve_global_procedure): Remove checking of
> argument list.
>
> 2019-08-15  Thomas Koenig  
>
> PR fortran/91443
> * gfortran.dg/argument_checking_19.f90: New test.
> * gfortran.dg/altreturn_10.f90: Change dg-warning to dg-error.
> * gfortran.dg/dec_union_11.f90: Add -std=legacy.
> * gfortran.dg/hollerith8.f90: Likewise. Remove warning for
> Hollerith constant.
> * gfortran.dg/integer_exponentiation_2.f90: New subroutine gee_i8;
> use it to avoid type mismatches.
> * gfortran.dg/pr41011.f: Add -std=legacy.
> * gfortran.dg/whole_file_1.f90: Change warnings to errors.
> * gfortran.dg/whole_file_2.f90: Likewise.
>
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@274551 
> 138bc75d-0d04-0410-961f-82ee72b054a4

Joel



[RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Joel Hutton
Hi all,


Currently SLP vectorization can build SLP trees starting from reductions or 
from group stores. This patch adds a third starting point: vector constructors.


For the following aarch64 test case (compiled with -O3 -fno-vect-cost-model):


char g_d[1024], g_s1[1024], g_s2[1024];
void test_loop(void)
{
char *d = g_d, *s1 = g_s1, *s2 = g_s2;

for ( int y = 0; y < 128; y++ )
{
  for ( int x = 0; x < 16; x++ )
    d[x] = s1[x] + s2[x];
  d += 16;
}
}

before patch: test_loop: .LFB0: .cfi_startproc adrp x0, g_s1 adrp x2, g_s2 add 
x3, x0, :lo12:g_s1 add x4, x2, :lo12:g_s2 ldrb w7, [x2, #:lo12:g_s2] ldrb w1, 
[x0, #:lo12:g_s1] adrp x0, g_d ldrb w6, [x4, 1] add x0, x0, :lo12:g_d ldrb w5, 
[x3, 1] add w1, w1, w7 fmov s0, w1 ldrb w7, [x4, 2] add w5, w5, w6 ldrb w1, 
[x3, 2] ldrb w6, [x4, 3] add x2, x0, 2048 ins v0.b[1], w5 add w1, w1, w7 ldrb 
w7, [x3, 3] ldrb w5, [x4, 4] add w7, w7, w6 ldrb w6, [x3, 4] ins v0.b[2], w1 
ldrb w8, [x4, 5] add w6, w6, w5 ldrb w5, [x3, 5] ldrb w9, [x4, 6] add w5, w5, 
w8 ldrb w1, [x3, 6] ins v0.b[3], w7 ldrb w8, [x4, 7] add w1, w1, w9 ldrb w11, 
[x3, 7] ldrb w7, [x4, 8] add w11, w11, w8 ldrb w10, [x3, 8] ins v0.b[4], w6 
ldrb w8, [x4, 9] add w10, w10, w7 ldrb w9, [x3, 9] ldrb w7, [x4, 10] add w9, 
w9, w8 ldrb w8, [x3, 10] ins v0.b[5], w5 ldrb w6, [x4, 11] add w8, w8, w7 ldrb 
w7, [x3, 11] ldrb w5, [x4, 12] add w7, w7, w6 ldrb w6, [x3, 12] ins v0.b[6], w1 
ldrb w12, [x4, 13] add w6, w6, w5 ldrb w5, [x3, 13] ldrb w1, [x3, 14] add w5, 
w5, w12 ldrb w13, [x4, 14] ins v0.b[7], w11 ldrb w12, [x4, 15] add w4, w1, w13 
ldrb w1, [x3, 15] add w1, w1, w12 ins v0.b[8], w10 ins v0.b[9], w9 ins 
v0.b[10], w8 ins v0.b[11], w7 ins v0.b[12], w6 ins v0.b[13], w5 ins v0.b[14], 
w4 ins v0.b[15], w1 .p2align 3,,7 .L2: str q0, [x0], 16 cmp x2, x0 bne .L2 ret 
.cfi_endproc .LFE0:

After patch:

test_loop: .LFB0: .cfi_startproc adrp x3, g_s1 adrp x2, g_s2 add x3, x3, 
:lo12:g_s1 add x2, x2, :lo12:g_s2 adrp x0, g_d add x0, x0, :lo12:g_d add x1, 
x0, 2048 ldr q1, [x2] ldr q0, [x3] add v0.16b, v0.16b, v1.16b .p2align 3,,7 
.L2: str q0, [x0], 16 cmp x0, x1 bne .L2 ret .cfi_endproc .LFE0:

bootstrapped and tested on aarch64-none-linux-gnu


[RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Joel Hutton
On 01/10/2019 12:07, Joel wrote:
>
> SLP vectorization: vectorize vector constructors
>
>
> Currently SLP vectorization can build SLP trees starting from 
> reductions or from group stores. This patch adds a third starting 
> point: vector constructors.
>
>
> For the following test case (compiled with -O3 -fno-vect-cost-model):
>
>
> char g_d[1024], g_s1[1024], g_s2[1024]; void test_loop(void) { char *d 
> = g_d, *s1 = g_s1, *s2 = g_s2;
>
>
> for ( int y = 0; y < 128; y++ )
> {
>for ( int x = 0; x < 16; x++ )
>  d[x] = s1[x] + s2[x];
>d += 16;
> }
>
> }
>
>
> before patch: test_loop: .LFB0: .cfi_startproc adrp x0, g_s1 adrp x2, 
> g_s2 add x3, x0, :lo12:g_s1 add x4, x2, :lo12:g_s2 ldrb w7, [x2, 
> #:lo12:g_s2] ldrb w1, [x0, #:lo12:g_s1] adrp x0, g_d ldrb w6, [x4, 1] 
> add x0, x0, :lo12:g_d ldrb w5, [x3, 1] add w1, w1, w7 fmov s0, w1 ldrb 
> w7, [x4, 2] add w5, w5, w6 ldrb w1, [x3, 2] ldrb w6, [x4, 3] add x2, 
> x0, 2048 ins v0.b[1], w5 add w1, w1, w7 ldrb w7, [x3, 3] ldrb w5, [x4, 
> 4] add w7, w7, w6 ldrb w6, [x3, 4] ins v0.b[2], w1 ldrb w8, [x4, 5] 
> add w6, w6, w5 ldrb w5, [x3, 5] ldrb w9, [x4, 6] add w5, w5, w8 ldrb 
> w1, [x3, 6] ins v0.b[3], w7 ldrb w8, [x4, 7] add w1, w1, w9 ldrb w11, 
> [x3, 7] ldrb w7, [x4, 8] add w11, w11, w8 ldrb w10, [x3, 8] ins 
> v0.b[4], w6 ldrb w8, [x4, 9] add w10, w10, w7 ldrb w9, [x3, 9] ldrb 
> w7, [x4, 10] add w9, w9, w8 ldrb w8, [x3, 10] ins v0.b[5], w5 ldrb w6, 
> [x4, 11] add w8, w8, w7 ldrb w7, [x3, 11] ldrb w5, [x4, 12] add w7, 
> w7, w6 ldrb w6, [x3, 12] ins v0.b[6], w1 ldrb w12, [x4, 13] add w6, 
> w6, w5 ldrb w5, [x3, 13] ldrb w1, [x3, 14] add w5, w5, w12 ldrb w13, 
> [x4, 14] ins v0.b[7], w11 ldrb w12, [x4, 15] add w4, w1, w13 ldrb w1, 
> [x3, 15] add w1, w1, w12 ins v0.b[8], w10 ins v0.b[9], w9 ins 
> v0.b[10], w8 ins v0.b[11], w7 ins v0.b[12], w6 ins v0.b[13], w5 ins 
> v0.b[14], w4 ins v0.b[15], w1 .p2align 3,,7 .L2: str q0, [x0], 16 cmp 
> x2, x0 bne .L2 ret .cfi_endproc .LFE0:
>
>
> After patch:
>
>
> test_loop: .LFB0: .cfi_startproc adrp x3, g_s1 adrp x2, g_s2 add x3, 
> x3, :lo12:g_s1 add x2, x2, :lo12:g_s2 adrp x0, g_d add x0, x0, 
> :lo12:g_d add x1, x0, 2048 ldr q1, [x2] ldr q0, [x3] add v0.16b, 
> v0.16b, v1.16b .p2align 3,,7 .L2: str q0, [x0], 16 cmp x0, x1 bne .L2 
> ret .cfi_endproc .LFE0:
>
>
>
>
> bootstrapped and tested on aarch64-none-linux-gnu
>
Patch attached:
From 7b9e6d02017ffe6f7ab17cbdd48da41ccc5f6db0 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Fri, 27 Sep 2019 10:26:00 +0100
Subject: [PATCH] SLP vectorization: vectorize vector constructors

---
 gcc/tree-vect-slp.c   | 98 ---
 gcc/tree-vect-stmts.c | 20 +
 2 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9b86b67734ad3e3506e9cee6a532b68decf24ae6..4c715ebe34dbdb8072e15dc9053f53a1949a070d 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1923,6 +1923,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
  vec<stmt_vec_info> scalar_stmts;
 
+  bool constructor = false;
+
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
 {
   scalar_type = TREE_TYPE (DR_REF (dr));
@@ -1935,6 +1937,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   group_size = REDUC_GROUP_SIZE (stmt_info);
 }
+  else if (is_gimple_assign (stmt_info->stmt)
+  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt_info->stmt)))
+	== VECTOR_TYPE
+  && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR)
+{
+  vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt));
+  group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt));
+  constructor = true;
+  if (TREE_CODE (vectype) != VECTOR_TYPE)
+	vectype = NULL;
+}
   else
 {
   gcc_assert (is_a  (vinfo));
@@ -1981,6 +1994,32 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
 	= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
+  else if (constructor)
+{
+  tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+	{
+	  tree prev_val;
+	  int j;
+	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, prev_val)
+	{
+	  if (val == prev_val && i!=j)
+		return false;
+	}
+	  if (TREE_CODE (val) == SSA_NAME)
+	{
+	  gimple* def = SSA_NAME_DEF_STMT (val);
+	  stmt_vec_info def_info = vinfo->lookup_stmt (def);
+	  /* Value is defined in another basic block.  */
+	  if (!def_info)
+		return false;
+	  scalar_stmts.safe_push (def_info);
+	}
+	else
+	 

[DOC] Replace reference to removed macro

2019-10-10 Thread Joel Hutton
Hi all,

I noticed when reading the documentation that BREAK_FROM_IMM_USE_SAFE 
was removed at some point in 2006 and replaced with BREAK_FROM_IMM_USE_STMT.


2019-10-10 Joel Hutton joel.hut...@arm.com

     * doc/tree-ssa.texi: Update renamed macro name.

From a9569599ee885842eabc6cc22e718486ceb7034d Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 10 Oct 2019 11:12:17 +0100
Subject: [PATCH] Doc: FOR_EACH_IMM_USE_SAFE removed

---
 gcc/doc/tree-ssa.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi
index 9714c5addfaad5d8418e88ded309a713439184b0..a7da6f9ef04264465d06cc55eded4a810642eaa5 100644
--- a/gcc/doc/tree-ssa.texi
+++ b/gcc/doc/tree-ssa.texi
@@ -385,14 +385,14 @@ optimization can manipulate the stmt when all the uses have been
 processed.  This is a little slower than the FAST version since it adds a
 placeholder element and must sort through the list a bit for each statement.
 This placeholder element must be also be removed if the loop is
-terminated early.  The macro @code{BREAK_FROM_IMM_USE_SAFE} is provided
+terminated early.  The macro @code{BREAK_FROM_IMM_USE_STMT} is provided
 to do this :
 
 @smallexample
   FOR_EACH_IMM_USE_STMT (stmt, iterator, ssa_var)
 @{
   if (stmt == last_stmt)
-BREAK_FROM_SAFE_IMM_USE (iter);
+BREAK_FROM_IMM_USE_STMT (iter);
 
   FOR_EACH_IMM_USE_ON_STMT (imm_use_p, iterator)
 SET_USE (imm_use_p, ssa_var_2);
-- 
2.17.1



[SLP] SLP vectorization: vectorize vector constructors

2019-10-11 Thread Joel Hutton
    .L2
     ret
     .cfi_endproc
.LFE0:

After patch:

test_loop:
.LFB0:
     .cfi_startproc
     adrp    x3, g_s1
     adrp    x2, g_s2
     add x3, x3, :lo12:g_s1
     add x2, x2, :lo12:g_s2
     adrp    x0, g_d
     add x0, x0, :lo12:g_d
     add x1, x0, 2048
     ldr q1, [x2]
     ldr q0, [x3]
     add v0.16b, v0.16b, v1.16b
     .p2align 3,,7
.L2:
     str q0, [x0], 16
     cmp x0, x1
     bne .L2
     ret
     .cfi_endproc
.LFE0:


2019-10-11  Joel Hutton  joel.hut...@arm.com

     * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
     (vect_bb_slp_scalar_cost): Likewise.
     (vect_ssa_use_outside_bb): New function.
     (vect_slp_check_for_constructors): New function.
     (vect_slp_analyze_bb_1): Add check for vector constructors.
     (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
     * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.


gcc/testsuite/ChangeLog:

2019-10-11  Joel Hutton  joel.hut...@arm.com

     * gcc.dg/vect/bb-slp-40.c: New test.

bootstrapped and regression tested on aarch64-none-linux-gnu
From 2bc57c17faa1dd494ed3898298e9fbe91f8a8675 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 2 Oct 2019 17:38:53 +0100
Subject: [PATCH] SLP Vectorization: Vectorize Vector Constructors

---
 gcc/testsuite/gcc.dg/vect/bb-slp-40.c |  33 +++
 gcc/tree-vect-slp.c   | 127 ++
 gcc/tree-vectorizer.h |   5 +
 3 files changed, 165 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
new file mode 100644
index ..51566b716bcda2fe82f50c50e9e9685cb3eb10ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp-all" } */
+/* { dg-require-effective-target vect_int } */
+
+char g_d[1024], g_s1[1024], g_s2[1024];
+void foo(void)
+{
+char *d = g_d, *s1 = g_s1, *s2 = g_s2;
+
+for ( int y = 0; y < 128; y++ )
+{
+  d[0 ] = s1[0 ] + s2[0 ];
+  d[1 ] = s1[1 ] + s2[1 ];
+  d[2 ] = s1[2 ] + s2[2 ];
+  d[3 ] = s1[3 ] + s2[3 ];
+  d[4 ] = s1[4 ] + s2[4 ];
+  d[5 ] = s1[5 ] + s2[5 ];
+  d[6 ] = s1[6 ] + s2[6 ];
+  d[7 ] = s1[7 ] + s2[7 ];
+  d[8 ] = s1[8 ] + s2[8 ];
+  d[9 ] = s1[9 ] + s2[9 ];
+  d[10] = s1[10] + s2[10];
+  d[11] = s1[11] + s2[11];
+  d[12] = s1[12] + s2[12];
+  d[13] = s1[13] + s2[13];
+  d[14] = s1[14] + s2[14];
+  d[15] = s1[15] + s2[15];
+  d += 16;
+}
+}
+
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9b86b67734ad3e3506e9cee6a532b68decf24ae6..c4d452e3dfd46acdaa94dc047e4c6114d8295458 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1922,6 +1922,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
   unsigned int i;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
  vec<stmt_vec_info> scalar_stmts;
+  bool constructor = false;
 
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
 {
@@ -1935,6 +1936,13 @@ vect_analyze_slp_instance (vec_info *vinfo,
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   group_size = REDUC_GROUP_SIZE (stmt_info);
 }
+  else if (is_gimple_assign (stmt_info->stmt)
+	&& gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR)
+{
+  vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt));
+  group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt));
+  constructor = true;
+}
   else
 {
   gcc_assert (is_a  (vinfo));
@@ -1981,6 +1989,25 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
 	= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
+  else if (constructor)
+{
+  tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+	{
+	  if (TREE_CODE (val) == SSA_NAME)
+	{
+	  gimple* def = SSA_NAME_DEF_STMT (val);
+	  stmt_vec_info def_info = vinfo->lookup_stmt (def);
+	  /* Value is defined in another basic block.  */
+	  if (!def_info)
+		return false;
+	  scalar_stmts.safe_push (def_info);
+	}
+	  else
+	return false;
+	}
+}
   else
 {
   /* Collect reduction statements.  */
@@ -2038,6 +2065,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
 	  SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
 	  SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
 	  SLP_INSTANCE_LOADS (new_instance) = vNULL;
+
+	  if (constructor)
+	{
+	  SLP_INS

[RFC][GCC][AArch64] Add minmax phi-reduction pattern

2019-11-15 Thread Joel Hutton
Hi all,

Just looking for some feedback on the approach.

Currently the loop vectorizer can't vectorize the following typical loop 
for getting max value and index from an array:

void test_vec(int *data, int n) {
     int best_i, best = 0;

     for (int i = 0; i < n; i++) {
     if (data[i] > best) {
     best = data[i];
     best_i = i;
     }
     }

     data[best_i] = data[0];
     data[0] = best;
}

This patch adds some support for this pattern.

This patch addresses Bug 88259.

Regression testing is still in progress.
This patch does not work correctly with vect-epilogues-nomask, and the
reason for that is still being investigated.

gcc/ChangeLog:


2019-11-15  Joel Hutton  
     Tamar Christina  

     PR tree-optimization/88259
     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New function.
     (vect_recog_minmax_index_pattern): New function.
     (vect_is_simple_reduction): Add check for minmax pattern.
     (vect_model_reduction_cost): Add case for minmax pattern.
     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
     * tree-vectorizer.h (enum vect_reduction_type): Add INDEX_MINMAX_REDUCTION reduction type.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b600d3157457c3180d0456c4f66cbc57012e3c71..dc97dea38a504e8f9391e6d138aad0a2e3872b50 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -387,6 +387,83 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
   return opt_result::success ();
 }
 
+static bool
+vect_reassociating_reduction_simple_p (stmt_vec_info stmt_info, tree_code code,
+   tree *op0_out, tree *op1_out)
+{
+  loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
+  if (!loop_info)
+return false;
+
+  gassign *assign = dyn_cast  (stmt_info->stmt);
+  if (!assign || gimple_assign_rhs_code (assign) != code)
+return false;
+
+  /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization.  */
+  class loop *loop = LOOP_VINFO_LOOP (loop_info);
+  if (loop && nested_in_vect_loop_p (loop, stmt_info))
+return false;
+
+  *op0_out = gimple_assign_rhs1 (assign);
+  *op1_out = gimple_assign_rhs2 (assign);
+  return true;
+}
+
+
+static bool
+vect_recog_minmax_index_pattern (stmt_vec_info stmt_vinfo, loop_vec_info loop_info)
+{
+  tree oprnd0, oprnd1;
+  gimple *last_stmt = stmt_vinfo->stmt;
+  vec_info *vinfo = stmt_vinfo->vinfo;
+  gimple *use_stmt;
+  use_operand_p use_p;
+  imm_use_iterator iter;
+
+  /* Starting from LAST_STMT, follow the defs of its uses in search
+ of the above pattern.  */
+
+  if (!vect_reassociating_reduction_simple_p (stmt_vinfo, MAX_EXPR,
+	  &oprnd0, &oprnd1))
+return NULL;
+
+  if (!is_a  (SSA_NAME_DEF_STMT (oprnd1)))
+return NULL;
+
+  stmt_vec_info phy_vinfo = vinfo->lookup_def (oprnd1);
+  if (!phy_vinfo)
+return NULL;
+
+  basic_block top_bb = gimple_bb (last_stmt);
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, oprnd1)
+{
+  use_stmt = USE_STMT (use_p);
+  if (is_gimple_assign (use_stmt)
+	  && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
+	{
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb
+	  && gimple_uid (use_stmt) < gimple_uid (last_stmt))
+	{
+	  tree cond = gimple_assign_rhs1 (use_stmt);
+	  if (TREE_CODE (cond) != LE_EXPR)
+		continue;
+
+	  stmt_vec_info ind_stmt = loop_info->lookup_stmt (use_stmt);
+	  ind_stmt->reduc_related_stmt = stmt_vinfo;
+	  stmt_vinfo->reduc_related_stmt = ind_stmt;
+	  return true;
+	}
+	}
+}
+
+  return false;
+}
+
+
 
 /* Function vect_is_simple_iv_evolution.
 
@@ -2771,20 +2848,6 @@ pop:
 	  fail = true;
 	  break;
 	}
-  /* Check there's only a single stmt the op is used on inside
- of the loop.  */
-  imm_use_iterator imm_iter;
-  gimple *op_use_stmt;
-  unsigned cnt = 0;
-  FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
-	if (!is_gimple_debug (op_use_stmt)
-	&& flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
-	  cnt++;
-  if (cnt != 1)
-	{
-	  fail = true;
-	  break;
-	}
   tree_code use_code = gimple_assign_rhs_code (use_stmt);
   if (use_code == MINUS_EXPR)
 	{
@@ -2963,10 +3026,22 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
   return def_stmt_info;
 }
 
+  /* Detect if this is a reduction that we have special code to handle the use
+ of the reduction multiple times.  */
+  bool supported_multi_use_reduction
+= vect_recog_minmax_index_pattern (def_stmt_info, loop_info);
+
+  /* Update the reduction type for multi use reduction.  */
+  if (supported_multi_use_reduction)
+STMT_VINFO_REDUC_TYPE (phi_info) = INDEX_MINMAX_REDUCTION;
+
+  def_stmt_info->reduc_phi_node = phi_info;
+
   /* If 

Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern

2019-11-15 Thread Joel Hutton
Forgot to CC maintainer.
On 15/11/2019 18:03, Joel wrote:
> Hi all,
>
> Just looking for some feedback on the approach.
>
> Currently the loop vectorizer can't vectorize the following typical 
> loop for getting max value and index from an array:
>
> void test_vec(int *data, int n) {
>     int best_i, best = 0;
>
>     for (int i = 0; i < n; i++) {
>     if (data[i] > best) {
>     best = data[i];
>     best_i = i;
>     }
>     }
>
>     data[best_i] = data[0];
>     data[0] = best;
> }
>
> This patch adds some support for this pattern.
>
> This patch addresses Bug 88259.
>
> Regression testing is still in progress,
> This patch does not work correctly with vect-epilogues-nomask, and the 
> reason for that is still being investigated.
>
> gcc/ChangeLog:
>
>
> 2019-11-15  Joel Hutton  
>     Tamar Christina  
>
>     PR tree-optimization/88259
>     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New 
> function.
>     (vect_recog_minmax_index_pattern): New function.
>     (vect_is_simple_reduction): Add check for minmax pattern.
>     (vect_model_reduction_cost): Add case for minmax pattern.
>     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
>     * tree-vectorizer.h (enum vect_reduction_type): Add 
> INDEX_MINMAX_REDUCTION reduction type.


[vect][testsuite] Update vect_char_add target selector to use its own cache

2019-11-26 Thread Joel Hutton
Hi all,

This patch updates the vect_char_add target selector to use its own 
cache instead of the vect_int cache.

Sharing the vect_int cache was causing a situation where bb-slp-40.c would
fail on sparc when run after other tests that use the vect_int target
selector, but pass when run on its own.

This fixes bug 92391.

Tested on x86 and sparc64.

2019-11-26  Joel Hutton  

     * lib/target-supports.exp: Update vect_char_add target selector to use its own cache.

From 7ed08950f4440f8605b9df1114edcc8ee834 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 26 Nov 2019 17:09:12 +
Subject: [PATCH] Update vect_char_add target selector to use its own cache

---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5fe1e83492cae97adf82ed53e12d8f5f7ea5a951..a4418a3151677a2a45275463b9828db7e7755b28 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5753,7 +5753,7 @@ proc check_effective_target_vect_bswap { } {
 # one vector length.
 
 proc check_effective_target_vect_char_add { } {
-return [check_cached_effective_target_indexed vect_int {
+return [check_cached_effective_target_indexed vect_char_add {
   expr {
  [istarget i?86-*-*] || [istarget x86_64-*-*]
  || ([istarget powerpc*-*-*]
-- 
2.17.1



[AArch64] Use scvtf fbits option where appropriate

2019-06-13 Thread Joel Hutton
Hi all,

There was previously no backend pattern to utilise the scvtf fbits option.
Where a fixed-point value is converted to a float and divided by a power of 2
(or multiplied by the reciprocal of a power of 2), this can be combined into a
single scvtf with fbits operation. This patch adds a pattern to combine these
instructions, and adds a helper function.

For the following test case:

float f(int a) { return ((float) a) / 65536.0; }
double g(int a) { return ((double) a) / 4096.0; }

the output generated is currently:

f:
scvtf   s1, w0  // 6[c=8 l=4]  floatsisf2/1
mov w1, 931135488   // 17   [c=4 l=4]  *movsi_aarch64/3
fmovs0, w1  // 18   [c=4 l=4]  *movsf_aarch64/1
fmuls0, s1, s0  // 13   [c=8 l=4]  mulsf3
ret // 24   [c=0 l=4]  *do_return
g:
scvtf   d1, w0  // 6[c=8 l=4]  floatsidf2
mov x1, 4553139223271571456 // 17   [c=4 l=4] *movdi_aarch64/3
fmovd0, x1  // 18   [c=4 l=4]  *movdf_aarch64/1
fmuld0, d1, d0  // 13   [c=8 l=4]  muldf3
ret // 24   [c=0 l=4]  *do_return

The output with this patch applied is:

f:
scvtf   s0, w0, #16 // 13   [c=24 l=4]  *combine_scvtf_SI_sf3/1
ret // 22   [c=0 l=4]  *do_return
g:
scvtf   d0, w0, #12 // 13   [c=24 l=4]  *combine_scvtf_SI_df3
ret // 22   [c=0 l=4]  *do_return

gcc/ChangeLog:

2019-06-12  Joel Hutton  

* config/aarch64/aarch64-protos.h (aarch64_fpconst_pow2_recip): New prototype.
* config/aarch64/aarch64.c (aarch64_fpconst_pow2_recip): New function.
* config/aarch64/aarch64.md (*aarch64_cvtf__2_mult): New pattern.
(aarch64_cvtf__2_mult): New pattern.
* config/aarch64/constraints.md (Dt): New constraint.
* config/aarch64/predicates.md (aarch64_fpconst_pow2_recip): New predicate.

gcc/testsuite/ChangeLog:

2019-06-12  Joel Hutton  

* gcc.target/aarch64/fmul_scvtf.c: New test.

Bootstrapped and regression tested on aarch64-linux-none target.

From 6aac0d56dc7d34e7a6fcabc1e8b0c7c291c0d51a Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 13 Jun 2019 11:08:56 +0100
Subject: [PATCH] SCVTF fbits

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  30 
 gcc/config/aarch64/aarch64.md |  34 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
 6 files changed, 216 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1..ad1ba458a3f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8..49a7d9256cf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,36 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return log2 (n). Otherwise return 0.  */
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return 0;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+  if (exact_real_truncate (DFmode, &r0))
+	{
+	  HOST_WIDE_INT value = real_to_integer (&r0);
+	  value = value & 0x;
+	  if ((value != 0) && ( (value & (value - 1)) == 0))
+	{
+	  int ret = exact_log2 (value);
+	  gcc_assert (IN_RANGE (ret, 0, 31));
+	  return ret;
+	}
+	}
+}
+  return 0;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dab..60bcf1bc8d9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,40 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FL

[AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Joel Hutton
Hi,

On 13/06/2019 18:26, Wilco Dijkstra wrote:
> Wouldn't it be easier to just do exact_log2 (real_to_integer (&r0))
> and then check the range is in 1..31?
I've revised this section.
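
In scalar terms the revised check amounts to the following (a standalone
sketch for illustration; the GCC code works on REAL_VALUE_TYPE rather than
double):

  #include <math.h>

  /* Return n if x == 1.0/2^n for 1 <= n <= 31, otherwise -1.  */
  static int
  recip_pow2_bits (double x)
  {
    int e;
    double m = frexp (x, &e);   /* x == m * 2^e with 0.5 <= m < 1 (x > 0)  */
    if (x <= 0.0 || m != 0.5)   /* not the reciprocal of a power of two    */
      return -1;
    int n = 1 - e;              /* x == 2^(e-1) == 2^-n                    */
    return (n >= 1 && n <= 31) ? n : -1;
  }
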
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6016,6 +6016,40 @@
> [(set_attr "type" "f_cvtf2i")]
>   )
>   
> +(define_insn "*aarch64_cvtf__2_mult"
> +  [(set (match_operand:GPF 0 "register_operand" "=w,w")
> + (mult:GPF (FLOATUORS:GPF
> +(match_operand: 1 "register_operand" "w,?r"))
> +(match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
>
> We should add a comment before both define_insn similar to the other
> conversions, explaining what they do and why there are 2 separate patterns
> (the default versions of the conversions appear to be missing a comment too).
I've added comments to the new and existing patterns.
From 5a9dfa6c6eb1c5b9c8c464780b7098058989d472 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 13 Jun 2019 11:08:56 +0100
Subject: [PATCH] SCVTF fbits

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  28 
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
 6 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1..ad1ba458a3f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8..424ca6c9932 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,34 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n. Otherwise return -1.  */
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 31)
+	  {
+	return ret;
+	  }
+	else
+	  {
+	return -1;
+	  }
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dab..d9812aa238e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; equal width integer to fp combine
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; inequal width integer to fp combine
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; equal width integer to fp conversion
 (define_insn "2

[AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Joel Hutton

On 18/06/2019 11:37, Richard Earnshaw (lists) wrote:
> Start sentences with a capital letter.  End them with a full stop.
> "inequal" isn't a word: you probably mean "unequal".

I've fixed this; the iterator is, however, defined as 'fcvt_iesize'
and described in the adjacent comment in iterators.md as 'inequal'.
I've addressed your other comments.

On 18/06/2019 13:30, Richard Sandiford wrote:
> Wilco Dijkstra  writes:
>>   > +/* If X is a positive CONST_DOUBLE with a value that is the 
>> reciprocal of a
>>   > +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. 
>> for x==(1/2^n)
>>   > +   return n. Otherwise return -1.  */
>>   > +int
>>   > +aarch64_fpconst_pow2_recip (rtx x)
>>   > +{
>>   > +  REAL_VALUE_TYPE r0;
>>   > +
>>   > +  if (!CONST_DOUBLE_P (x))
>>   > +    return -1;
>>> CONST_DOUBLE can be used for things other than floating point.  You
>>> should really check that the mode on the double in is in class 
>>> MODE_FLOAT.
>>   Several other functions (eg aarch64_fpconst_pow_of_2) do the same 
>> since
>> this function is only called with HF/SF/DF mode. We could add an 
>> assert for
>> SCALAR_FLOAT_MODE_P (but then aarch64_fpconst_pow_of_2 should do
>> the same).
> IMO we should leave it as-is.  aarch64.h has:
I've gone with the majority and left it as-is, but I don't have strong 
feelings on it.
From 1e44ef7e999527a0b03316cf0ea002f8d4437052 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 13 Jun 2019 11:08:56 +0100
Subject: [PATCH] SCVTF fbits

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  23 +++
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
 6 files changed, 214 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1..ad1ba458a3f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8..028da32174d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,29 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n. Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 31)
+	return ret;
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dab..c7c6a18b0ff 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; Equal width integer to fp combine.
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; Unequal width integer to fp combine.
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set 

[PING][AArch64] Use scvtf fbits option where appropriate

2019-06-26 Thread Joel Hutton
Ping, plus a minor rework (mostly non-functional changes).

gcc/ChangeLog:

2019-06-12  Joel Hutton  

 * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow2_recip): New 
prototype
 * config/aarch64/aarch64.c (aarch64_fpconst_pow2_recip): New function
 * config/aarch64/aarch64.md 
(*aarch64_cvtf2_mult): New pattern
 (*aarch64_cvtf2_mult): New pattern
 * config/aarch64/constraints.md (Dt): New constraint
 * config/aarch64/predicates.md (aarch64_fpconst_pow2_recip): New 
predicate

gcc/testsuite/ChangeLog:

2019-06-12  Joel Hutton  

 * gcc.target/aarch64/fmul_scvtf_1.c: New test.

Bootstrapped and regression tested on aarch64-linux-none target.

From e866ce55c9febd92ab8e6314bf79b067085b2d1b Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 19 Jun 2019 17:24:38 +0100
Subject: [PATCH] SCVTF

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  23 +++
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 .../gcc.target/aarch64/fmul_scvtf_1.c | 140 ++
 6 files changed, 214 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf_1.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1026a44f32b144a6e97398c0659feb..ad1ba458a3fa081d83acf806776e911aa789b5d0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8665274249581f8c404d18ae72e873..d88716576850eedd1070de108da152838c127c36 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,29 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n.  Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 32)
+	return ret;
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dabc540065d77d4a7922aeca16a402aa..0ccd5de3d807f079614b0076ac439c1cb8e56ab8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; Equal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; Unequal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; Equal width integer to fp conversion.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w,w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "

Re: [PING][AArch64] Use scvtf fbits option where appropriate

2019-07-08 Thread Joel Hutton
On 01/07/2019 18:03, James Greenhalgh wrote:

>> gcc/testsuite/ChangeLog:
>>
>> 2019-06-12  Joel Hutton  
>>
>>   * gcc.target/aarch64/fmul_scvtf_1.c: New test.
> This testcase will fail on ILP32 targets where unsigned long will still
> live in a 'w' register.
Updated to use long long and unsigned long long.
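
For illustration, the signed double-precision case in the test now has
roughly this shape (a sketch, not the exact test source): with a long long
operand the input stays in a 64-bit 'x' register even on ILP32, so the
single scvtf-with-fbits form is generated on all ABIs.

  /* Expected to compile to something like "scvtf d0, x0, #12".  */
  double
  fmul_scvtf_ll (long long a)
  {
    return (double) a / 4096.0;
  }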

Joel

From e10d5fdb9430799cd2050b8a2f567d1b4e43cde1 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 8 Jul 2019 11:59:50 +0100
Subject: [PATCH] SCVTF

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  23 +++
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 .../gcc.target/aarch64/fmul_scvtf_1.c | 140 ++
 6 files changed, 214 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf_1.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e2f4cc19e68a79368f939cb8a83cf1f6d0412264..568c2d5846c6501c60de85cfd2fa07e0a9e5831a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a18fbd0f0aa8acc000fd57af5d060961ef0a4e13..0dfcef454a1594497a6bc493d92f7b2b7335a244 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18750,6 +18750,29 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n. Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 32)
+	return ret;
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4d559c4c928e5949d0494bf384a9ea044cf6fc7c..1b03c1fe71630a72fd00221eb1bbde7f0ba2ac1a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6021,6 +6021,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; Equal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; Unequal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; Equal width integer to fp conversion.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w,w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "w,?r")))]
@@ -6032,6 +6070,7 @@
(set_attr "arch" "simd,fp")]
 )
 
+;; Unequal width integer to fp conversions.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "r")))]
diff --git a/gcc/config/aarch64

[Arm][CMSE]Add warn_unused_return attribute to cmse functions

2019-07-17 Thread Joel Hutton
At present it is possible to call the CMSE functions for checking
addresses (such as cmse_check_address_range) and forget to check or use
the return value. This patch makes the interfaces more robust against
programmer error by marking these functions with the warn_unused_result
attribute. With this set, any call to these functions that ignores the
result produces a warning at the default warning levels.

For the following function:
void foo()
{
     int *data;
     cmse_check_address_range((int*)data, 0, 0);
}
The following warning is emitted:
warning: ignoring return value of 'cmse_check_address_range' declared 
with attribute 'warn_unused_result' [-Wunused-result]
     6 |  cmse_check_address_range((int*)data, 0, 0);
    |  ^~
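
For comparison, a correct use consumes the checked pointer before touching
the range and therefore does not warn. A minimal sketch (the function name
is illustrative; CMSE_MPU_READ is one of the flags defined in arm_cmse.h):

  #include <arm_cmse.h>
  #include <stddef.h>

  int
  read_checked (int *data, size_t n)
  {
    int *p = (int *) cmse_check_address_range (data, n * sizeof (int),
                                               CMSE_MPU_READ);
    if (p == NULL)   /* range fails the permission checks */
      return -1;
    return p[0];
  }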

gcc/ChangeLog:

2019-07-10  Joel Hutton  

     * config/arm/arm_cmse.h (cmse_nonsecure_caller): Add 
warn_unused_result attribute.
     (cmse_check_address_range): Add warn_unused_result attribute.

libgcc/ChangeLog:

2019-07-10  Joel Hutton  

     * config/arm/cmse.c (cmse_check_address_range): Add 
warn_unused_result attribute.

2019-07-10  Joel Hutton  

     * gcc.target/arm/cmse/cmse-17.c: New test.

From 628070faaf157934e6b4c8d7d2d288244467bea6 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 10 Jul 2019 09:59:58 +0100
Subject: [PATCH] CMSE warn unused result

---
 gcc/config/arm/arm_cmse.h   |  2 ++
 gcc/testsuite/gcc.target/arm/cmse/cmse-17.c | 10 ++
 libgcc/config/arm/cmse.c|  1 +
 3 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/cmse/cmse-17.c

diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h
index b543cbfe455ae57487be199f7c918eb87db30bf2..a72c46f1a954bd3ba4aefcdbb7a31085d0f583c0 100644
--- a/gcc/config/arm/arm_cmse.h
+++ b/gcc/config/arm/arm_cmse.h
@@ -164,6 +164,7 @@ __CMSE_TT_ASM (at)
 
 /* FIXME: diagnose use outside cmse_nonsecure_entry functions.  */
 __extension__ static __inline int __attribute__ ((__always_inline__))
+__attribute__ ((warn_unused_result))
 cmse_nonsecure_caller (void)
 {
   return __builtin_arm_cmse_nonsecure_caller ();
@@ -184,6 +185,7 @@ cmse_nonsecure_caller (void)
 #define CMSE_MPU_READ		8
 
 __extension__ void *
+__attribute__ ((warn_unused_result))
 cmse_check_address_range (void *, size_t, int);
 
 #define cmse_check_pointed_object(p, f) \
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-17.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-17.c
new file mode 100644
index ..a2cce09afae590461b86397e73e9b98649bed95a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-17.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-mcmse"}  */
+
+#include 
+
+void foo()
+{
+	int *data;
+	cmse_check_address_range((int*)data, 0, 0); /* { dg-warning "ignoring return value" } */
+}
diff --git a/libgcc/config/arm/cmse.c b/libgcc/config/arm/cmse.c
index 34a46fde2d2fcd9dc181bf5a74dd698de2ebc9bd..0c5a3eaefab49ae07e67b82481fdd0d8dd100227 100644
--- a/libgcc/config/arm/cmse.c
+++ b/libgcc/config/arm/cmse.c
@@ -30,6 +30,7 @@
address range.  See ACLE changes for ARMv8-M.  */
 
 void *
+__attribute__ ((warn_unused_result))
 cmse_check_address_range (void *p, size_t size, int flags)
 {
   cmse_address_info_t permb, perme;
-- 
2.17.1



RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-30 Thread Joel Hutton via Gcc-patches
> We can go with a private vect_gimple_build function until we sort out the API
> issue to unblock Tamar (I'll reply to Richard's reply with further thoughts on
> this)
> 

Done.

> > Similarly are you ok with the use of gimple_extract_op? I would lean
> towards using it as it is cleaner, but I don't have strong feelings.
> 
> I don't like using gimple_extract_op here, I think I outlined a variant that 
> is
> even shorter.
> 

Done.

Updated patches attached, bootstrapped and regression tested on aarch64.

Tomorrow is my last working day at Arm, so it will likely be Andre who commits
this and addresses any further comments.



0001-Refactor-to-allow-internal_fn-s.patch
Description: 0001-Refactor-to-allow-internal_fn-s.patch


0002-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Joel Hutton via Gcc-patches
> I think we should try to fix the PR instead.  The widening operations
> can only be used for SLP if the group_size is at least double the
> number of elements in the vectype, so one idea (not worked through)
> is to make the vect_build_slp_tree family of routines undo pattern
> recognition for widening operations if the group size is less than that. >

This seems reasonable; how do I go about 'undoing' the pattern recognition?

Ideally the patterns wouldn't be substituted in the first place, but the
group size is chosen after pattern substitution.

From: Richard Sandiford 
Sent: 21 January 2021 13:40
To: Richard Biener 
Cc: Joel Hutton via Gcc-patches ; Joel Hutton 

Subject: Re: [AArch64] Remove backend support for widen-sub

Richard Biener  writes:
> On Thu, 21 Jan 2021, Richard Sandiford wrote:
>
>> Joel Hutton via Gcc-patches  writes:
>> > Hi all,
>> >
>> > This patch removes support for the widening subtract operation in the 
>> > aarch64 backend as it is causing a performance regression.
>> >
>> > In the following example:
>> >
>> > #include 
>> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t 
>> > *restrict pix2)
>> > {
>> >for( int y = 0; y < 4; y++ )
>> >   {
>> > for( int x = 0; x < 4; x++ )
>> >   d[x + y*4] = pix1[x] - pix2[x];
>> > pix1 += 16;
>> > pix2 += 16;
>> >  }
>> >
>> > The widening minus pattern is recognized and substituted, but cannot be 
>> > used due to the input vector type chosen in slp vectorization. This 
>> > results in an attempt to do an 8 byte->8 short widening subtract 
>> > operation, which is not supported.
>> >
>> > The issue is documented in PR 98772.
>>
>> IMO removing just the sub patterns is too arbitrary.  Like you say,
>> the PR affects all widening instructions. but it happens that the
>> first encountered regression used subtraction.
>>
>> I think we should try to fix the PR instead.  The widening operations
>> can only be used for SLP if the group_size is at least double the
>> number of elements in the vectype, so one idea (not worked through)
>> is to make the vect_build_slp_tree family of routines undo pattern
>> recognition for widening operations if the group size is less than that.
>>
>> Richi would know better than me though.
>
> Why should the widen ops be only usable with such constraints?
> As I read md.texi they for example do v8qi -> v8hi operations
> (where the v8qi is either lo or hi part of a v16qi vector).

They're v16qi->v8hi operations rather than v8qi->v8hi.  The lo
operations operate on one half of the v16qi and the hi operations
operate on the other half.  They're supposed to be used together
to produce v16qi->v8hi+v8hi, so for BB SLP we need a group size
of 16 to feed them.  (Loop-aware SLP is fine as-is because we can
just increase the unroll factor.)

In the testcase, the store end of the SLP graph is operating on
8 shorts, which further up the graph are converted from 8 chars.
To use WIDEN_MINUS_EXPR at v8hi we'd need 16 shorts and 16 chars.
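
A scalar model of that pairing (illustrative only, not vectorizer code): the
lo and hi operations each consume half of one v16qi input and together
produce 16 widened results, which is why a BB SLP group of 8 cannot feed
them.

  /* vec_widen_?subl_lo handles lanes 0..7, vec_widen_?subl_hi lanes 8..15
     (lane split assumed little-endian for this sketch).  */
  void
  widen_sub_lo_hi_model (const unsigned char *pix1, const unsigned char *pix2,
                         short *out /* 16 results */)
  {
    for (int i = 0; i < 8; i++)
      out[i] = (short) pix1[i] - (short) pix2[i];   /* "lo" half */
    for (int i = 8; i < 16; i++)
      out[i] = (short) pix1[i] - (short) pix2[i];   /* "hi" half */
  }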

> The dumps show we use a VF of 4 which makes us have two v8hi
> vectors and one v16qi which vectorizable_conversion should
> be able to handle by emitting hi/lo widened subtracts.

AIUI the problem is with slp1.  I think the loop side is fine.

Thanks,
Richard


[RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-01 Thread Joel Hutton via Gcc-patches
Hi Richard(s),

I'm just looking to see if I'm going about this the right way, based on the
discussion we had on IRC. I've managed to hack something together and have
attached a (very) WIP patch which gives the correct codegen for the testcase
in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would
obviously need to support other widening patterns and differentiate between
big and little endian, among other things.

I added a backend pattern because I wasn't quite clear which changes to make
in order to allow the existing backend patterns to be used with a V8QI, or
how to represent a V16QI where we don't care about the top/bottom 8 elements.
I made some attempt in optabs.c, which is included in the patch but commented
out, but I'm not sure if I'm going about this the right way.

Joel
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index be2a5a865172bdd7848be4082abb0fdfb0b35937..c66b8a367623c8daf4423677d292e292feee3606 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3498,6 +3498,14 @@
   DONE;
 })
 
+(define_insn "vec_widen_usubl_half_v8qi"
+  [(match_operand:V8HI 0 "register_operand")
+(match_operand:V8QI 1 "register_operand")
+(match_operand:V8QI 2 "register_operand")]
+  "TARGET_SIMD"
+  "usubl\t%0., %1., %2."
+)
+
 (define_expand "vec_widen_subl_hi_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
diff --git a/gcc/expr.c b/gcc/expr.c
index 04ef5ad114d0662948c896cdbf58e67737b39c7e..0939a156deef63f1cf2fa7e29c2c94925820f2ba 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9785,6 +9785,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
 case VEC_WIDEN_PLUS_HI_EXPR:
 case VEC_WIDEN_PLUS_LO_EXPR:
+case VEC_WIDEN_MINUS_HALF_EXPR:
 case VEC_WIDEN_MINUS_HI_EXPR:
 case VEC_WIDEN_MINUS_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 876a3a6f348de122e5a52e6dd70d7946bc810162..10aa21d07595325fd8ef3057444853fc946385de 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -186,6 +186,9 @@ bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
 enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
 		 machine_mode,
 		 machine_mode *);
+enum insn_code find_half_mode_optab_and_mode (optab, machine_mode,
+		 machine_mode,
+		 machine_mode *);
 int can_mult_highpart_p (machine_mode, bool);
 bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 opt_machine_mode get_len_load_store_mode (machine_mode, bool);
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 3248ce2c06e65c9c0366757907ab057407f7c594..7abfc04aa18b7ee5b734a1b1f4378b4615ee31fd 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -462,6 +462,17 @@ can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
   return false;
 }
 
+enum insn_code
+find_half_mode_optab_and_mode (optab op, machine_mode to_mode,
+  machine_mode from_mode,
+  machine_mode *found_mode)
+{
+insn_code icode = CODE_FOR_nothing;
+if (GET_MODE_2XWIDER_MODE(from_mode).exists(found_mode))
+  icode = optab_handler (op, *found_mode);
+return icode;
+}
+
 /* Find a widening optab even if it doesn't widen as much as we want.
E.g. if from_mode is HImode, and to_mode is DImode, and there is no
direct HI->SI insn, then return SI->DI, if that exists.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..eb52dc15f8094594c4aa22d5fc1c442886e4ebf6 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -185,6 +185,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 case VEC_WIDEN_MINUS_HI_EXPR:
   return (TYPE_UNSIGNED (type)
 	  ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
+case VEC_WIDEN_MINUS_HALF_EXPR:
+  return vec_widen_usubl_half_optab;
 
 case VEC_UNPACK_HI_EXPR:
   return (TYPE_UNSIGNED (type)
@@ -308,6 +311,16 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
 return false;
 
+  /* The case where vectype_in is half the vector width, as opposed to the
+ normal case for widening patterns of vector width input, with output in
+ multiple registers. */
+  if (code == WIDEN_MINUS_EXPR &&
+  known_eq(TYPE_VECTOR_SUBPARTS(vectype_in),TYPE_VECTOR_SUBPARTS(vectype_out)) )
+  {
+*code1 = VEC_WIDEN_MINUS_HALF_EXPR;
+return true;
+  }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
&& can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..1252097be9893d7d65ea844fc0eda9bad70b9256 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -293,6 +293,13 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>>> In practice this will only affect targets that choose to use mixed
>>> vector sizes, and I think it's reasonable to optimise only for the
>>> case in which such targets support widening conversions.  So what
>>> do you think about the idea of emitting separate conversions and
>>> a normal subtract?  We'd be relying on RTL to fuse them together,
>>> but at least there would be no redundancy to eliminate.
>>
>> So in vectorizable_conversion for the widen-minus you'd check
>> whether you can do a v4qi -> v4hi and then emit a conversion
>> and a wide minus?
>
>Yeah.

This seems reasonable; as I recall, we decided against adding internal
functions for the time being, as all the existing vec patterns code would
have to be refactored. So emit a v4qi->v8qi gimple conversion and then a
regular widen_lo/hi using the existing backend patterns/optabs?


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>> So emit a v4qi->v8qi gimple conversion
>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>
>I was thinking of using a v8qi->v8hi convert on each operand followed
>by a normal v8hi subtraction.  That's what we'd generate if the target
>didn't define the widening patterns.

Is there a reason that conversion is preferred? If we use a widening subtract
then we don't need to rely on RTL fusing later.


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
> Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
> subtract?  

I mean the latter; that seemed to be what richi was suggesting previously.

> The problem with the latter is that we need to fill the
> extra unused elements with something and remove them later.

That's fair enough; fake/don't-care elements are a bit of a hack. I'll try
something out with the conversions and a regular subtract.
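
For reference, the lowering I have in mind looks roughly like this, written
with GCC vector extensions rather than the vectorizer's internal
representation (names illustrative):

  typedef unsigned char v8qi __attribute__ ((vector_size (8)));
  typedef short v8hi __attribute__ ((vector_size (16)));

  /* Widen each v8qi operand with a plain conversion, then use an ordinary
     v8hi subtraction; combining into usubl is left to the RTL passes.  */
  v8hi
  widen_minus_lowering (v8qi a, v8qi b)
  {
    v8hi wa = __builtin_convertvector (a, v8hi);
    v8hi wb = __builtin_convertvector (b, v8hi);
    return wa - wb;
  }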

thanks for the help,
Joel


[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-09 Thread Joel Hutton via Gcc-patches
Hi Richards,

This patch adds support for the V8QI->V8HI case of the widening vect
patterns, as discussed, to address PR 98772.

Bootstrapped and regression tested on aarch64.


[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

In the case where 8 out of every 16 elements are widened using a
widening pattern and the next 8 are skipped, the patterns are not
recognized. This is because they are normally used in a pair, such as
VEC_WIDEN_MINUS_HI/LO, to achieve a v16qi->v16hi conversion for example.
This patch adds support for V8QI->V8HI patterns.

gcc/ChangeLog:
        PR tree-optimisation/98772
        * optabs-tree.c (supportable_convert_operation): Add case for V8QI->V8HI
        * tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New 
function to generate promotion stmts for V8QI->V8HI
        (vectorizable_conversion): Add case for V8QI->V8HI

gcc/testsuite/ChangeLog:
        PR tree-optimisation/98772
        * gcc.target/aarch64/pr98772.c: New test.

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..b91ce3af6f0d4b3a62110bdb38f68ecc53765cad 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -308,6 +308,40 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
 return false;
 
+  /* The case where a widening operation is not making use of the full width
+ of the input vector, but using the full width of the output vector.
+ Return the non-widened code, which will be used after the inputs are
+ converted to the wide type.  */
+  if ((code == WIDEN_MINUS_EXPR
+  || code == WIDEN_PLUS_EXPR
+  || code == WIDEN_MULT_EXPR
+  || code == WIDEN_LSHIFT_EXPR)
+  && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
+		  TYPE_VECTOR_SUBPARTS (vectype_out)))
+  {
+switch (code)
+{
+  case WIDEN_LSHIFT_EXPR:
+	*code1 = LSHIFT_EXPR;
+	return true;
+	break;
+  case WIDEN_MINUS_EXPR:
+	*code1 = MINUS_EXPR;
+	return true;
+	break;
+  case WIDEN_PLUS_EXPR:
+	*code1 = PLUS_EXPR;
+	return true;
+	break;
+  case WIDEN_MULT_EXPR:
+	*code1 = MULT_EXPR;
+	return true;
+	break;
+  default:
+	gcc_unreachable ();
+}
+  }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
&& can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr98772.c b/gcc/testsuite/gcc.target/aarch64/pr98772.c
new file mode 100644
index ..35568a9f9d60c44aa01a6afc5f7e6a0935009aaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr98772.c
@@ -0,0 +1,155 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include 
+#include 
+
+#define DSIZE 16
+#define PIXSIZE 64
+
+extern void
+wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] + pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wplus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] + pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wminus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] - pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wminus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] - pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wmult (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] * pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wmult_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] * pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wlshift (uint16_t *d, uint8_t *restrict pix1)
+
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] << 8;
+	pix1 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wlshift_no_opt (uint16_t *d, uint8_t *restrict pix1)
+
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] << 8;
+	pix1 += 16;
+}
+}
+
+void __attribute__((optimize (0)))
+init_arrays(uint16_t *d_a, uint16_t *d_b, uint8_t *pix1, uint8_t *pix2)
+{
+  for(int i = 0; i < DSIZE; i++)
+  {
+d_a[i] = (1074 * i)%17;
+d_b[i] = (1074 * i)%17;
+  }
+  for(int i = 0; i < PIXSIZE; i++)
+  {
+pix1[i] = (

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-10 Thread Joel Hutton via Gcc-patches
units_in))
>> -modifier = NONE;
>> +if (code == WIDEN_MINUS_EXPR
>> + || code == WIDEN_PLUS_EXPR
>> + || code == WIDEN_LSHIFT_EXPR
>> + || code == WIDEN_MULT_EXPR)
>> + modifier = WIDEN;
>
>Formatting nit: the last line should be indented by 6 spaces rather than 8.
Done.

>> @@ -4743,9 +4815,21 @@ vectorizable_conversion (vec_info *vinfo,
>>return false;
>> 
>>  case WIDEN:
>> -  if (supportable_widening_operation (vinfo, code, stmt_info, 
>> vectype_out,
>> -   vectype_in, &code1, &code2,
>> -   &multi_step_cvt, &interm_types))
>> +  if (known_eq (nunits_out, nunits_in)
>> +   && (code == WIDEN_MINUS_EXPR
>> +   || code == WIDEN_LSHIFT_EXPR
>> +   || code == WIDEN_PLUS_EXPR
>> +   || code == WIDEN_MULT_EXPR)
>> +   && supportable_convert_operation (code, vectype_out, vectype_in,
>> + &code1))
>
>Guess this is personal taste, sorry, since it's clearly right both ways,
>but IMO it'd be better to drop the code test.  We can only get here
>with nunits_out==nunits_in if we're converting a widening operation into
>a non-widening operation.  If we do end up calling a separate function
>(as per the comment above), then it would abort in a meaningful place
>if something unexpected slips through.
Done.

>> + {
>> +   gcc_assert (!(multi_step_cvt && op_type == binary_op));
>> +   break;
>> + }
>> +  else if (supportable_widening_operation (vinfo, code, stmt_info,
>> +vectype_out, vectype_in, &code1,
>> +&code2, &multi_step_cvt,
>> +&interm_types))
>>{
>>  /* Binary widening operation can only be supported directly by the
>> architecture.  */
>> @@ -4981,10 +5065,20 @@ vectorizable_conversion (vec_info *vinfo,
>>  c1 = codecvt1;
>>  c2 = codecvt2;
>>}
>> -   vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
>> -   &vec_oprnds1, stmt_info,
>> -   this_dest, gsi,
>> -   c1, c2, op_type);
>> +   if ((code == WIDEN_MINUS_EXPR
>> +|| code == WIDEN_PLUS_EXPR
>> +|| code == WIDEN_LSHIFT_EXPR
>> +|| code == WIDEN_MULT_EXPR)
>> +   && known_eq (nunits_in, nunits_out))
>
>Same comment here about dropping the code tests.
Done.
From 2a530703c3b7b893c83a13cc8e9e211bddb428b0 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Fri, 5 Feb 2021 12:52:12 +
Subject: [PATCH] [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

In the case where 8 out of every 16 elements are widened using a
widening pattern and the next 8 are skipped the patterns are not
recognized. This is because they are normally used in a pair, such  as
VEC_WIDEN_MINUS_HI/LO, to achieve a v16qi->v16hi conversion for example.
This patch adds support for V8QI->V8HI patterns.

gcc/ChangeLog:

PR tree-optimisation/98772
* optabs-tree.c (supportable_half_widening_operation): New function
to check for supportable V8QI->V8HI widening patterns.
* optabs-tree.h (supportable_half_widening_operation): New function.
* tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New
function to create promotion stmts for V8QI->V8HI widening patterns.
(vectorizable_conversion): Add case for V8QI->V8HI.

gcc/testsuite/ChangeLog:

PR tree-optimisation/98772
* gcc.target/aarch64/pr98772.c: New test.
---
 gcc/optabs-tree.c  |  75 ++
 gcc/optabs-tree.h  |   3 +
 gcc/testsuite/gcc.target/aarch64/pr98772.c | 155 +
 gcc/tree-vect-stmts.c  |  93 +++--
 4 files changed, 318 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr98772.c

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed9..6cd12a807f9 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 }
 }
 
+/* Function supportable_half_widening_operation
+
+   Check whether an operation represented by code is a 'half' widening operation
+   using only half of the full vector width for the inputs. This allows the
+   output to be stored in a single vector, as opposed to using the full width
+   for i

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-11 Thread Joel Hutton via Gcc-patches
Hi Richard,

I've revised the patch; sorry about the sloppy formatting in the previous one.

Full bootstrap/regression tests are still running, but the changes are pretty 
trivial.

Ok for trunk assuming tests finish clean?

>Joel Hutton  writes:
>> @@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree 
>> type,
>>  }
>>  }
>>
>> +/* Function supportable_half_widening_operation
>> +
>
>I realise existing (related) functions do have a “Function foo” line,
>but it's not generally the GCC style, so I think it would be better
>to leave it out.
Done.

>> +   Check whether an operation represented by code is a 'half' widening 
>> operation
>
>Nit: CODE should be in caps since it's a parameter name.
Done.

>> +   using only half of the full vector width for the inputs. This allows the
>
>I think this could be misread as saying that we only look at half the
>input vector.  Maybe it would be clearer to say something like “in which
>the input vectors contain half as many bits as the output vectors”
>instead of “using…”.
Done.

>> +   output to be stored in a single vector, as opposed to using the full 
>> width
>> +   for input, where half the elements must be processed at a time into
>> +   respective full-width vector outputs i.e. a 'hi'/'lo' pair using codes 
>> such
>> +   as VEC_WIDEN_MINUS_HI/LO. This is handled by widening the inputs and 
>> using
>> +   non-widening instructions. RTL fusing converts these to the widening 
>> hardware
>> +   instructions if supported.
>
>And here the contrast is with cases in which the input vectors have
>the same number of bits as the output vectors, meaning that we need
>a lo/hi pair to generate twice the amount of output data.
Done.

> Might be clearer to make “This is handled…” a new paragraph.
Done.

>> +
>> +   Supported widening operations:
>> +WIDEN_MINUS_EXPR
>> +WIDEN_PLUS_EXPR
>> +WIDEN_MULT_EXPR
>> +WIDEN_LSHIFT_EXPR
>> +
>> +   Output:
>> +   - CODE1 - The non-widened code, which will be used after the inputs are
>> + converted to the wide type.
>> +   */
>
>Formatting nit, */ should be on the previous line, i.e. “.” + two spaces
>+ “*/”.  (Sorry for all the formatting comments.)
Done (sorry for the formatting errors).

>> +bool
>> +supportable_half_widening_operation (enum tree_code code,
>> +tree vectype_out, tree vectype_in,
>> +enum tree_code *code1)
>> +{
>> +  machine_mode m1,m2;
>> +  bool truncp;
>> +  enum tree_code dummy_code;
>> +  optab op;
>> +
>> +  gcc_assert (VECTOR_TYPE_P (vectype_out) && VECTOR_TYPE_P (vectype_in));
>> +
>> +  m1 = TYPE_MODE (vectype_out);
>> +  m2 = TYPE_MODE (vectype_in);
>> +
>> +  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
>> +return false;
>> +
>> +  if(!known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
>> +   TYPE_VECTOR_SUBPARTS (vectype_out)))
>> +  return false;
>
>Formatting nit: should be a space between “if” and “(”.  IMO
>maybe_ne is more readable than !known_eq.  The return statement
>should only be indented by 4 columns.
Done.

>> +  if(!(code == WIDEN_MINUS_EXPR
>> +   || code == WIDEN_PLUS_EXPR
>> +   || code == WIDEN_MULT_EXPR
>> +   || code == WIDEN_LSHIFT_EXPR))
>> +return false;
>> +
>> +  switch (code)
>> +  {
>> +case WIDEN_LSHIFT_EXPR:
>> +  *code1 = LSHIFT_EXPR;
>> +  break;
>> +case WIDEN_MINUS_EXPR:
>> +  *code1 = MINUS_EXPR;
>> +  break;
>> +case WIDEN_PLUS_EXPR:
>> +  *code1 = PLUS_EXPR;
>> +  break;
>> +case WIDEN_MULT_EXPR:
>> +  *code1 = MULT_EXPR;
>> +  break;
>> +default:
>> +  gcc_unreachable ();
>
>I think it would be better to return false in the default case and remove
>the “if” above, so that we only have one copy of the list of codes.
Done.

>> +  }
>
>Formatting nit: the { and } lines should be indented two spaces more.
>(The contents of the switch are OK as-is.)
Done.

>> +
>> +  if (!supportable_convert_operation(NOP_EXPR, vectype_out, vectype_in,
>> +  &dummy_code))
>
>Nit: should be a space between the function name and ”(”.
Done.

>> +return false;
>> +
>> +  op = optab_for_tree_code (*code1, vectype_out, optab_vector);
>> +  return (optab_handler (op, TYPE_MODE (vectype_out)) != CODE_FOR_nothing);
>> +}
>> +
>> 

[ping][vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-08-17 Thread Joel Hutton via Gcc-patches
Ping. Is there still interest in refactoring vect-patterns to internal_fn's? 

> -Original Message-
> From: Joel Hutton
> Sent: 07 June 2021 14:30
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Richard Sandiford
> 
> Subject: [vect-patterns][RFC] Refactor widening patterns to allow
> internal_fn's
> 
> Hi all,
> 
> This refactor allows widening patterns (such as widen_plus/widen_minus) to
> be represented as either internal_fns or tree_codes. The widening patterns
> were originally added as tree codes with the expectation that they would be
> refactored later.
> 
> [vect-patterns] Refactor as internal_fn's
> 
> Refactor vect-patterns to allow patterns to be internal_fns starting with
> widening_plus/minus patterns.
> 
> 
> gcc/ChangeLog:
> 
> * gimple-match.h (class code_helper): Move code_helper class to more
> visible header.
> * internal-fn.h (internal_fn_name): Add internal_fn range check.
> * optabs-tree.h (supportable_convert_operation): Change function
> prototypes to use code_helper.
> * tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to use
> code_helper.
> * tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use
> code_helper, build internal_fns.
> (vect_create_vectorized_promotion_stmts): Refactor to use
> code_helper.
> (vectorizable_conversion): Refactor to use code_helper.
> (supportable_widening_operation): Refactor to use code_helper.
> (supportable_narrowing_operation): Refactor to use code_helper.
> * tree-vectorizer.h (supportable_widening_operation): Refactor to use
> code_helper.
> (supportable_narrowing_operation): Refactor to use code_helper.
> * tree.h (class code_helper): Refactor to use code_helper.


[SLP][VECT] Add check to fix 96837

2020-09-29 Thread Joel Hutton via Gcc-patches
 Hi All,

The following patch adds a simple check to prevent the SLP stmts of vector
constructors from being rearranged. vect_attempt_slp_rearrange_stmts tries
to rearrange stmts so as to avoid a load permutation.

This fixes PR target/96837 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827).
gcc/ChangeLog:

2020-09-29  Joel Hutton  

PR target/96837
* tree-vect-slp.c (vect_analyze_slp): Do not call 
vect_attempt_slp_rearrange_stmts for vector constructors.

gcc/testsuite/ChangeLog:

2020-09-29  Joel Hutton  

PR target/96837
* gcc.dg/vect/bb-slp-49.c: New test.

From 2c738e2c0eddbc4fcdbf8ff2443bb809b36c7e28 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 29 Sep 2020 15:46:44 +0100
Subject: [PATCH] [SLP][VECT] Add check to fix 96827

Do not call vect_attempt_slp_rearrange_stmts if an SLP instance has an
SLP_INSTANCE_ROOT_STMT, i.e. if the tree is built from a constructor
rather than a grouped store. This function is intended to rearrange
stmts in a reduction chain so they do not require load permutation.
Rearranging causes the resulting constructor to be in the wrong order.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-49.c | 28 +++
 gcc/tree-vect-slp.c   |  3 ++-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-49.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-49.c b/gcc/testsuite/gcc.dg/vect/bb-slp-49.c
new file mode 100644
index ..e7101fcff4627bb545549bdfefd33c2ed58aee7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-49.c
@@ -0,0 +1,28 @@
+/* This checks that vectorized constructors have the correct ordering. */
+/* { dg-require-effective-target vect_int } */
+
+typedef int V __attribute__((__vector_size__(16)));
+
+__attribute__((__noipa__)) void
+foo (unsigned int x, V *y)
+{
+  unsigned int a[4] = { x + 0, x + 2, x + 4, x + 6 };
+  for (unsigned int i = 0; i < 3; ++i)
+if (a[i] == 1234)
+  a[i]--;
+  *y = (V) { a[3], a[2], a[1], a[0] };
+}
+
+int
+main ()
+{
+  V b;
+  foo (0, &b);
+  if (b[0] != 6 || b[1] != 4 || b[2] != 2 || b[3] != 0)
+__builtin_abort ();
+  return 0;
+}
+
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump "Analyzing vectorizable constructor" "slp1" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c44fd396bf0b69a4153e46026c545bebb3797551..7ba24e241deb76c0fd884ccfff04675d1b050ef7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2495,7 +2495,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
   /* Reduction (there are no data-refs in the root).
 	 In reduction chain the order of the loads is not important.  */
   if (!STMT_VINFO_DATA_REF (stmt_info)
-	  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info))
+	  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
+	  && !SLP_INSTANCE_ROOT_STMT (instance))
 	vect_attempt_slp_rearrange_stmts (instance);
 }
 
-- 
2.17.1



[vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-04-13 Thread Joel Hutton via Gcc-patches
Hi all,

These patches refactor the widening patterns in vect-patterns to use 
internal_fn instead of tree_codes.

Sorry about the delay; some changes to master made it a bit messier.

Bootstrapped and regression tested on aarch64.

Joel

> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> > 854cbcff390..4a8ea67e62f 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1245,7 +1245,7 @@ vect_recog_sad_pattern (vec_info *vinfo,  static
> > gimple *  vect_recog_widen_op_pattern (vec_info *vinfo,
> >  stmt_vec_info last_stmt_info, tree *type_out,
> > -tree_code orig_code, tree_code wide_code,
> > +tree_code orig_code, code_helper
> wide_code_or_ifn,
> 
> I think it'd be better to keep the original “wide_code” name and try to
> remove as many places as possible in which switching based on tree_code
> or internal_fn is necessary.  The recent gimple-match.h patches should
> help with that, but more routines might be needed.

Done.

> > @@ -1309,8 +1310,16 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> >2, oprnd, half_type, unprom, vectype);
> >
> >tree var = vect_recog_temp_ssa_var (itype, NULL);
> > -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > - oprnd[0], oprnd[1]);
> > +  gimple *pattern_stmt;
> > +  if (wide_code_or_ifn.is_tree_code ())
> > +pattern_stmt = gimple_build_assign (var, wide_code_or_ifn,
> > +   oprnd[0], oprnd[1]);
> > +  else
> > +{
> > +  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);
> > +  pattern_stmt = gimple_build_call_internal (fn, 2, oprnd[0], 
> > oprnd[1]);
> > +  gimple_call_set_lhs (pattern_stmt, var);
> > +}
> 
> For example, I think we should hide this inside a new:
> 
>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> 
> that works directly on code_helper, similarly to the new code_helper
> gimple_build interfaces.

Done.

> > @@ -4513,14 +4513,16 @@ vect_gen_widened_results_half (vec_info
> *vinfo, enum tree_code code,
> >tree new_temp;
> >
> >/* Generate half of the widened result:  */
> > -  gcc_assert (op_type == TREE_CODE_LENGTH (code));
> >if (op_type != binary_op)
> >  vec_oprnd1 = NULL;
> > -  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0,
> vec_oprnd1);
> > +  if (ch.is_tree_code ())
> > +new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0,
> vec_oprnd1);
> > +  else
> > +new_stmt = gimple_build_call_internal (as_internal_fn ((combined_fn)
> ch),
> > +  2, vec_oprnd0, vec_oprnd1);
> 
> Similarly here.  I guess the combined_fn/internal_fn path will also need
> to cope with null trailing operands, for consistency with the tree_code one.
> 

Done.

> > @@ -4744,31 +4747,28 @@ vectorizable_conversion (vec_info *vinfo,
> >&& ! vec_stmt)
> >  return false;
> >
> > -  gassign *stmt = dyn_cast  (stmt_info->stmt);
> > -  if (!stmt)
> > +  gimple* stmt = stmt_info->stmt;
> > +  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
> >  return false;
> >
> > -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> > -return false;
> > +  if (is_gimple_assign (stmt))
> > +  {
> > +code_or_ifn = gimple_assign_rhs_code (stmt);
> > +  }
> > +  else
> > +code_or_ifn = gimple_call_combined_fn (stmt);
> 
> It might be possible to use gimple_extract_op here (only recently added).
> This would also provide the number of operands directly, instead of
> needing “op_type”.  It would also provide an array of operands.
> 

Done.

> > -  code = gimple_assign_rhs_code (stmt);
> > -  if (!CONVERT_EXPR_CODE_P (code)
> > -  && code != FIX_TRUNC_EXPR
> > -  && code != FLOAT_EXPR
> > -  && code != WIDEN_PLUS_EXPR
> > -  && code != WIDEN_MINUS_EXPR
> > -  && code != WIDEN_MULT_EXPR
> > -  && code != WIDEN_LSHIFT_EXPR)
> 
> Is it safe to drop this check independently of parts 2 and 3?  (Genuine
> question, haven't checked in detail.)

It requires parts 2 and 3. I've moved that change into this first patch.

> > @@ -4784,7 +4784,8 @@ vectorizable_conversion (vec_info *vinfo,
> >  }
> >
> >rhs_type = TREE_TYPE (op0);
> > -  if ((code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> > +  if ((code_or_ifn.is_tree_code () && code_or_ifn != FIX_TRUNC_EXPR
> > +   && code_or_ifn != FLOAT_EXPR)
> 
> I don't think we want the is_tree_code condition here.  The existing
> != should work.
> 

Done.

> > @@ -11856,13 +11888,13 @@ supportable_widening_operation (vec_info
> *vinfo,
> >if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
> >  std::swap (c1, c2);
> >
> > -  if (code == FIX_TRUNC_EXPR)
> > +  if (code_or_ifn == FIX_TRUNC_EXPR)
> >  {
> >/* The signedness is determined from output operand.  */
> >optab1 = optab_for_tree_code (c1, vec

[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-25 Thread Joel Hutton via Gcc-patches
Just a quick ping to check this hasn't been forgotten.

> -Original Message-
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> > please use #define INCLUDE_MAP before the system.h include instead.
> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build a auto_vec > and .sort () and .bsearch () 
> > it.
> Ok, I'll rework this part. In the meantime, to address your other comment.
> 
> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
> 
> I'll attempt to clarify, if this makes things clearer I can include this in 
> the
> commit message of the respun patch:
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a hi/lo
> split, like widening and narrowing operations.  Each definition for 
> will require an optab named  and two other optabs that you specify
> for signed and unsigned. The hi/lo pair is necessary because the widening
> operations take n narrow elements as inputs and return n/2 wide elements
> as outputs. The 'lo' operation operates on the first n/2 elements of input.
> The 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to 
> be
> returned from a vect_recog function that will later be expanded to hi/lo.
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn. It is defined differently in different places and 
> internal-
> fn.def is sourced from those places so the parameters given can be reused.
>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the  'expand_' functions for the hi/lo versions of the fn.
>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
> 
>  For example:
>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI,
> IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_addl_hi_
> -> (u/s)addl2
>IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_addl_lo_
> -> (u/s)addl
> 
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> Let me know if I'm not expressing this clearly.
> 
> Thanks,
> Joel


pr103523: Check for PLUS/MINUS support

2021-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This is to address pr103523.

bootstrapped and regression tested on aarch64.

Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
PR103523 is an ICE on valid code:

void d(float *a, float b, int c) {
    float e;
    for (; c; c--, e += b)
      a[c] = e;
}

This is due to not checking for PLUS_EXPR support, which is missing in
VNx2sf mode. This causes an ICE at expand time. This patch adds a check
for support in vectorizable_induction.

gcc/ChangeLog:
        
PR tree-optimization/PR103523
* tree-vect-loop.c (vectorizable_induction): Check for
    PLUS_EXPR/MINUS_EXPR support.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103523.c b/gcc/testsuite/gcc.target/aarch64/pr103523.c
new file mode 100644
index ..736e8936c5f6768bdf098ddc37b2c21ab74ee0df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr103523.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -mtune=neoverse-v1 -Ofast" } */
+
+void d(float *a, float b, int c) {
+float e;
+for (; c; c--, e += b)
+  a[c] = e;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 7f544ba1fd5198dd32cda05e62382ab2e1e9bb50..f700d5e7ac2c05402407a46113320f79359906fa 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8065,6 +8065,15 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return false;
 }
 
+  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+  gcc_assert (step_expr != NULL_TREE);
+  tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
+
+  /* Check for backend support of PLUS/MINUS_EXPR. */
+  if (!directly_supported_p (PLUS_EXPR, step_vectype)
+  || !directly_supported_p (MINUS_EXPR, step_vectype))
+return false;
+
   if (!vec_stmt) /* transformation not required.  */
 {
   unsigned inside_cost = 0, prologue_cost = 0;
@@ -8124,10 +8133,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform induction phi.\n");
 
-  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
-  gcc_assert (step_expr != NULL_TREE);
-  tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
-
   pe = loop_preheader_edge (iv_loop);
   /* Find the first insertion point in the BB.  */
   basic_block bb = gimple_bb (phi);


Re: pr103523: Check for PLUS/MINUS support

2021-12-10 Thread Joel Hutton via Gcc-patches
ok for backport to 11?

From: Richard Sandiford 
Sent: 10 December 2021 10:22
To: Joel Hutton 
Cc: GCC Patches ; Richard Biener 
Subject: Re: pr103523: Check for PLUS/MINUS support

Joel Hutton  writes:
> Hi all,
>
> This is to address pr103523.
>
> bootstrapped and regression tested on aarch64.
>
> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
> PR103523 is an ICE on valid code:
>
> void d(float *a, float b, int c) {
> float e;
> for (; c; c--, e += b)
>   a[c] = e;
> }
>
> This is due to not checking for PLUS_EXPR support, which is missing in
> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
> for support in vectorizable_induction.
>
> gcc/ChangeLog:
>
> PR tree-optimization/PR103523

The bugzilla hook expects: PR tree-optimization/103523

> * tree-vect-loop.c (vectorizable_induction): Check for
> PLUS_EXPR/MINUS_EXPR support.

OK, thanks.

Richard


[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-05-25 Thread Joel Hutton via Gcc-patches
Ping!

Just checking there is still interest in this. I'm assuming you've been busy 
with release.

Joel

> -Original Message-
> From: Joel Hutton
> Sent: 13 April 2022 16:53
> To: Richard Sandiford 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns
> 
> Hi all,
> 
> These patches refactor the widening patterns in vect-patterns to use
> internal_fn instead of tree_codes.
> 
> Sorry about the delay, some changes to master made it a bit messier.
> 
> Bootstrapped and regression tested on aarch64.
> 
> Joel
> 
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > index 854cbcff390..4a8ea67e62f 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -1245,7 +1245,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > > static gimple *  vect_recog_widen_op_pattern (vec_info *vinfo,
> > >stmt_vec_info last_stmt_info, tree *type_out,
> > > -  tree_code orig_code, tree_code wide_code,
> > > +  tree_code orig_code, code_helper
> > wide_code_or_ifn,
> >
> > I think it'd be better to keep the original “wide_code” name and try
> > to remove as many places as possible in which switching based on
> > tree_code or internal_fn is necessary.  The recent gimple-match.h
> > patches should help with that, but more routines might be needed.
> 
> Done.
> 
> > > @@ -1309,8 +1310,16 @@ vect_recog_widen_op_pattern (vec_info
> *vinfo,
> > >  2, oprnd, half_type, unprom, vectype);
> > >
> > >tree var = vect_recog_temp_ssa_var (itype, NULL);
> > > -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > > -   oprnd[0], oprnd[1]);
> > > +  gimple *pattern_stmt;
> > > +  if (wide_code_or_ifn.is_tree_code ())
> > > +pattern_stmt = gimple_build_assign (var, wide_code_or_ifn,
> > > + oprnd[0], oprnd[1]);
> > > +  else
> > > +{
> > > +  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);
> > > +  pattern_stmt = gimple_build_call_internal (fn, 2, oprnd[0], 
> > > oprnd[1]);
> > > +  gimple_call_set_lhs (pattern_stmt, var);
> > > +}
> >
> > For example, I think we should hide this inside a new:
> >
> >   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >
> > that works directly on code_helper, similarly to the new code_helper
> > gimple_build interfaces.
> 
> Done.
> 
> > > @@ -4513,14 +4513,16 @@ vect_gen_widened_results_half (vec_info
> > *vinfo, enum tree_code code,
> > >tree new_temp;
> > >
> > >/* Generate half of the widened result:  */
> > > -  gcc_assert (op_type == TREE_CODE_LENGTH (code));
> > >if (op_type != binary_op)
> > >  vec_oprnd1 = NULL;
> > > -  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0,
> > vec_oprnd1);
> > > +  if (ch.is_tree_code ())
> > > +new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0,
> > vec_oprnd1);
> > > +  else
> > > +new_stmt = gimple_build_call_internal (as_internal_fn
> > > + ((combined_fn)
> > ch),
> > > +2, vec_oprnd0, vec_oprnd1);
> >
> > Similarly here.  I guess the combined_fn/internal_fn path will also
> > need to cope with null trailing operands, for consistency with the tree_code
> one.
> >
> 
> Done.
> 
> > > @@ -4744,31 +4747,28 @@ vectorizable_conversion (vec_info *vinfo,
> > >&& ! vec_stmt)
> > >  return false;
> > >
> > > -  gassign *stmt = dyn_cast  (stmt_info->stmt);
> > > -  if (!stmt)
> > > +  gimple* stmt = stmt_info->stmt;
> > > +  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
> > >  return false;
> > >
> > > -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> > > -return false;
> > > +  if (is_gimple_assign (stmt))
> > > +  {
> > > +code_or_ifn = gimple_assign_rhs_code (stmt);  }  else
> > > +code_or_ifn = gimple_call_combined_fn (stmt);
> >
> > It might be possible to use gimple_extract_op here (only recently added).
> > This would also provide the number of operands directly, instead of
> > needing “op_type”.  It would also provide an array of operands.
>

[2/3][vect] Add widening add, subtract vect patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds widening add and widening subtract patterns to 
tree-vect-patterns.

All 3 patches together bootstrapped and regression tested on aarch64.

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
        * optabs-tree.c (optab_for_tree_code): optabs for widening 
adds,subtracts
        * optabs.def (OPTAB_D): define vectorized widen add, subtracts
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, 
subtracts
        * tree-inline.c (estimate_operator_cost): Add case for widening adds, 
subtracts
        * tree-vect-generic.c (expand_vector_operations_1): Add case for 
widening adds, subtracts
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
        (vect_recog_widen_sub_pattern): New recog pattern
        (vect_recog_average_pattern): Update widened add code
        (vect_recog_average_pattern): Update widened add code
        * tree-vect-stmts.c (vectorizable_conversion): Add case for widened 
add, subtract
        (supportable_widening_operation): Add case for widened add, subtract
        * tree.def (WIDEN_ADD_EXPR): New tree code
        (WIDEN_SUB_EXPR): New tree code
        (VEC_WIDEN_ADD_HI_EXPR): New tree code
        (VEC_WIDEN_ADD_LO_EXPR): New tree code
        (VEC_WIDEN_SUB_HI_EXPR): New tree code
        (VEC_WIDEN_SUB_LO_EXPR): New tree code

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.


Ok for trunk?
From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:44:18 +
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.
Add aarch64 tests for patterns.

fix sad
---
 gcc/expr.c|  6 ++
 gcc/optabs-tree.c | 17 
 gcc/optabs.def|  8 ++
 .../gcc.target/aarch64/vect-widen-add.c   | 90 +++
 .../gcc.target/aarch64/vect-widen-sub.c   | 90 +++
 gcc/tree-cfg.c|  8 ++
 gcc/tree-inline.c |  6 ++
 gcc/tree-vect-generic.c   |  4 +
 gcc/tree-vect-patterns.c  | 32 +--
 gcc/tree-vect-stmts.c | 15 +++-
 gcc/tree.def  |  6 ++
 11 files changed, 276 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f07775870792729e3805436d7f2debafb6ca..ffc8aed5296174066849d9e0d73b1c352c20fd9e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  target, unsignedp);
   return target;
 
+case WIDEN_ADD_EXPR:
+case WIDEN_SUB_EXPR:
 case WIDEN_MULT_EXPR:
   /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
   }
 
+case VEC_WIDEN_ADD_HI_EXPR:
+case VEC_WIDEN_ADD_LO_EXPR:
+case VEC_WIDEN_SUB_HI_EXPR:
+case VEC_WIDEN_SUB_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return (TYPE_UNSIGNED (type)
 	  ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+case VEC_WIDEN_ADD_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
+
+case VEC_WIDEN_ADD_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
+
+case VEC_WIDEN_SUB_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
+
+case VEC_WIDEN_SUB_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
+
+
 case VEC_UNPACK_HI_EXPR:
   return (TYPE_UNSIGNED (type)
 	  ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_

[1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds backend patterns for vec_widen_add, vec_widen_sub on aarch64.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: New patterns 
vec_widen_saddl_lo/hi_
From 3e47bc562b83417a048e780bcde52fb2c9617df3 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend.
---
 gcc/config/aarch64/aarch64-simd.md | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a2ee1b21ad9e8e2a6109805022be7f..b4f56a2295926f027bd53e7456eec729af0cd6df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,100 @@
   [(set_attr "type" "neon__long")]
 )
 
+(define_expand "vec_widen_saddl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_saddl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_ssubl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_saddl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_saddl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_ssubl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_uaddl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_uaddl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_usubl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_uaddl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_uaddl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_usubl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
 
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
-- 
2.17.1



[3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds support in the aarch64 backend for the vec_widen_shift 
vect-pattern and makes a minor mid-end fix to support it.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: vec_widen_lshift_hi/lo patterns
        * tree-vect-stmts.c 
        (vectorizable_conversion): Fix for widen_lshift case

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-lshift.c: New test.
From 97af35b2d2a505dcefd8474cbd4bc3441b83ab02 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end.
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 60 +
 gcc/tree-vect-stmts.c |  9 ++-
 3 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b4f56a2295926f027bd53e7456eec729af0cd6df..2bb39c530a1a861cb9bd3df0c2943f62bd6153d7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4711,8 +4711,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index ..23ed93d1dcbc3ca559efa6708b4ed5855fb6a050
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include 
+#include 
+
+#define ARR_SIZE 1024
+
+/* Should produce an shll,shll2 pair*/
+void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+__attribute__((optimize (0)))
+void sshll_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=

Re: [1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed the comment.

> There are ways in which we could reduce the amount of cut-&-paste here,
> but I guess everything is a trade-off between clarity and compactness.
> One extreme is to write them all out explicitly, another extreme would
> be to have one define_expand and various iterators and attributes.
>
> I think the vec_widen_mult_*_ patterns strike a good balance:
> the use ANY_EXTEND to hide the sign difference while still having
> separate hi and lo patterns:

Done

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: New patterns
  vec_widen_saddl_lo/hi_.
From c52fd11f5d471200c1292fad3bc04056e7721f06 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend. These allow taking vectors of N elements of size S
and performing an add/subtract on the high or low half,
widening the resulting elements and storing N/2 elements of size 2*S.
These correspond to the addl,addl2,subl,subl2 instructions.
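
(Illustration only, not part of the patch: a scalar loop of the shape below is
what these patterns let the vectorizer handle.  With 128-bit vectors the low
half of the int16_t lanes is widened and subtracted with ssubl and the high
half with ssubl2; saddl/saddl2 cover an add the same way.)

#include <stdint.h>

/* Sketch only: int16_t inputs, int32_t results, so one vector of 8 narrow
   elements yields two vectors of 4 wide results (a lo and a hi half).  */
void
widen_sub (int32_t *restrict out, const int16_t *restrict a,
           const int16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] - b[i];
}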
---
 gcc/config/aarch64/aarch64-simd.md | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a..30299610635 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,53 @@
   [(set_attr "type" "neon__long")]
 )
 
+(define_expand "vec_widen_addl_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_addl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_addl_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_addl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_subl_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_subl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_subl_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_subl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
 
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
-- 
2.17.1



Re: [2/3][vect] Add widening add, subtract vect patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  
> >
> > * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> 
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
> 
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> > * optabs-tree.c (optab_for_tree_code): optabs for widening 
> > adds,subtracts
> 
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so “Handle …” or “Add case for …” or something.

Fixed.

> > * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog 
> > ptatern
> 
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
> 
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made all 
the commit messages more descriptive.

> > +
> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +  return (TYPE_UNSIGNED (type)
> > +   ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > +
> > +
> 
> Nits: excess blank line at the end and excess space before the “:”s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> 
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need singed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants because they correspond to signed and
unsigned instructions (uaddl/uaddl2, saddl/saddl2).

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +case VEC_WIDEN_SUB_LO_EXPR:
> > +case VEC_WIDEN_ADD_HI_EXPR:
> > +case VEC_WIDEN_ADD_LO_EXPR:
> > +  return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >   of the above pattern.  */
> >
> >tree plus_oprnd0, plus_oprnd1;
> > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > -&plus_oprnd0, &plus_oprnd1))
> > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > +&plus_oprnd0, &plus_oprnd1)
> > + || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > +&plus_oprnd0, &plus_oprnd1)))
> >  return NULL;
> >
> > tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>  with a successful match, the result of the ABS(U) is known to fit
>  within the nonnegative range of the result type.  (It cannot be the
>  negative of the minimum signed value due to the range of the widening
>  MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>   &unprom_abs);
>
> specific to 

Re: [3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> > +#include 
> > +
> 
> SVE targets will need a:
> 
> #pragma GCC target "+nosve"
> 
> here, since we'll generate different code for SVE.

Fixed.

> > +/* { dg-final { scan-assembler-times "shll\t" 1} } */
> > +/* { dg-final { scan-assembler-times "shll2\t" 1} } */
> 
> Very minor nit, sorry, but I think:
> 
> /* { dg-final { scan-assembler-times {\tshll\t} 1 } } */
> 
> would be better.  Using "…\t" works, but IIRC it shows up as a tab
> character in the testsuite result summary too.

Fixed. Minor nits welcome. :)


> OK for the aarch64 bits with the testsuite changes above.
ok?

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo
  patterns.
* tree-vect-stmts.c
(vectorizable_conversion): Fix for widen_lshift case.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  

* gcc.target/aarch64/vect-widen-lshift.c: New test.
From e8d3ed6fa739850eb649b97c250f1f2c650c34c1 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end. This pattern takes one vector with N elements of size S, shifts
each element left by the element width and stores the results as N
elements of size 2*S (in 2 result vectors). The aarch64 backend
implements this with the shll,shll2 instruction pair.
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 62 +
 gcc/tree-vect-stmts.c |  5 +-
 3 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 30299610635..4ba799a27c9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4664,8 +4664,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index 000..

[vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-11 Thread Joel Hutton via Gcc-patches
Hi all,

This refactor allows widening vect patterns (such as widen_plus/widen_minus) to 
be represented as
either internal_fns or tree_codes and replaces the current 
widen_plus/widen_minus with internal_fn versions. This refactor is split into 3 
patches.

Bootstrapped and regression tested on aarch64.

Ok for stage 3?




0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch
Description: 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch


0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


RE: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-12 Thread Joel Hutton via Gcc-patches
> please use #define INCLUDE_MAP before the system.h include instead.
> Is it really necessary to build a new std::map for each optab lookup?!
> That looks quite ugly and inefficient.  We'd usually - if necessary at all - 
> build
> a auto_vec > and .sort () and .bsearch () it.
Ok, I'll rework this part. In the meantime, to address your other comment.

> I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> cover letter nor the patch ChangeLog explains anything.

I'll attempt to clarify, if this makes things clearer I can include this in the 
commit message of the respun patch:

DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it provides 
convenience wrappers for defining conversions that require a hi/lo split, like 
widening and narrowing operations.  Each definition
for  will require an optab named  and two other optabs that you 
specify for signed and unsigned. The hi/lo pair is necessary because the 
widening operations take n narrow elements as inputs and return n/2 wide 
elements as outputs. The 'lo' operation operates on the first n/2 elements of 
input. The 'hi' operation operates on the second n/2 elements of input. 
Defining an internal_fn along with hi/lo variations allows a single internal 
function to be returned from a vect_recog function that will later be expanded 
to hi/lo.

DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a widening 
internal_fn. It is defined differently in different places and internal-fn.def 
is sourced from those places so the parameters given can be reused.
  internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later 
defined to generate the  'expand_' functions for the hi/lo versions of the fn.
  internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original and 
hi/lo variants of the internal_fn

 For example:
 IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_addl_hi_ -> 
(u/s)addl2
   IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_addl_lo_ 
-> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree 
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
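
To make the hi/lo split concrete, here is a small example of mine (not taken
from the patch): for a loop like the one below the recog function would return
a single IFN_VEC_WIDEN_PLUS, which is later expanded into IFN_VEC_WIDEN_PLUS_LO
for the first half of the uint16_t lanes and IFN_VEC_WIDEN_PLUS_HI for the
second half, i.e. uaddl and uaddl2 on aarch64.

#include <stdint.h>

/* Illustration only: uint16_t inputs widened to uint32_t results, so each
   input vector is consumed by a lo operation (first n/2 lanes) and a hi
   operation (second n/2 lanes).  */
void
widen_plus (uint32_t *restrict out, const uint16_t *restrict a,
            const uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] + b[i];
}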

Let me know if I'm not expressing this clearly.

Thanks,
Joel


RE: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-16 Thread Joel Hutton via Gcc-patches
Updated patch 2 with explanation included in commit message and changes 
requested.

Bootstrapped and regression tested on aarch64
> -Original Message-
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> > please use #define INCLUDE_MAP before the system.h include instead.
Done.

> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build a auto_vec > and .sort () and .bsearch () 
> > it.
> Ok, I'll rework this part. In the meantime, to address your other comment.
Done.

> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
> 
> I'll attempt to clarify, if this makes things clearer I can include this in 
> the
> commit message of the respun patch:
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a hi/lo
> split, like widening and narrowing operations.  Each definition for 
> will require an optab named  and two other optabs that you specify
> for signed and unsigned. The hi/lo pair is necessary because the widening
> operations take n narrow elements as inputs and return n/2 wide elements
> as outputs. The 'lo' operation operates on the first n/2 elements of input.
> The 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to 
> be
> returned from a vect_recog function that will later be expanded to hi/lo.
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn. It is defined differently in different places and 
> internal-
> fn.def is sourced from those places so the parameters given can be reused.
>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the  'expand_' functions for the hi/lo versions of the fn.
>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
> 
>  For example:
>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI,
> IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_addl_hi_
> -> (u/s)addl2
>IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_addl_lo_
> -> (u/s)addl
> 
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> Let me know if I'm not expressing this clearly.
> 
> Thanks,
> Joel


0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch
Description: 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch


0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


[vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-06-07 Thread Joel Hutton via Gcc-patches
Hi all,

This refactor allows widening patterns (such as widen_plus/widen_minus) to be 
represented as
either internal_fns or tree_codes. The widening patterns were originally added 
as tree codes with the expectation that they would be refactored later.

[vect-patterns] Refactor as internal_fn's

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns.


gcc/ChangeLog:

* gimple-match.h (class code_helper): Move code_helper class to more 
visible header.
* internal-fn.h (internal_fn_name): Add internal_fn range check.
* optabs-tree.h (supportable_convert_operation): Change function 
prototypes to use code_helper.
* tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to use 
code_helper.
* tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use 
code_helper, build internal_fns.
(vect_create_vectorized_promotion_stmts): Refactor to use code_helper.
(vectorizable_conversion): Refactor to use code_helper.
(supportable_widening_operation): Refactor to use code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree-vectorizer.h (supportable_widening_operation): Refactor to use 
code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree.h (class code_helper): Refactor to use code_helper.


rb14487.patch
Description: rb14487.patch


[vect] Support min/max + index pattern

2021-05-05 Thread Joel Hutton via Gcc-patches
Hi all,

I'm looking for some feedback on this. One thing I would like to draw attention
to is the fact that this pattern requires two separate, dependent reductions in
the epilogue.
The accumulator vector for the maximum/minimum elements can be reduced to a
scalar result trivially with a min/max, but getting the index from the
accumulator vector of indices is more complex: it requires using the position
of the maximum/minimum scalar result within the value accumulator vector to
create a mask (a scalar sketch follows the example loop below).

The given solution works but it's slightly messy.
vect_create_epilog_for_reduction creates the epilogue for one vectorized
scalar stmt at a time. This modification makes one invocation create the
epilogue for both related stmts and marks the other as 'done'. Alternate
suggestions are welcome.

Joel

[vect] Support min/max + index pattern

Add the capability to vect-loop to support the following pattern.

for (int i = 0; i < n; i ++)
{
if (data[i] < best)
{
best = data[i];
best_i = i;
}
}
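
For reference, here is a rough scalar model (my own sketch, not code from the
patch) of the two dependent epilogue reductions described above, acting on the
per-lane value/index accumulators built for the loop:

#include <limits.h>

/* Sketch only: val_acc/idx_acc model the vectorized accumulators, vf lanes
   each.  The value accumulator reduces with a plain min; the index
   accumulator is then reduced through the lanes holding that minimum.  */
static void
minmax_index_epilogue (const int *val_acc, const int *idx_acc, int vf,
                       int *best, int *best_i)
{
  int b = val_acc[0];
  for (int i = 1; i < vf; i++)          /* trivial min reduction */
    if (val_acc[i] < b)
      b = val_acc[i];

  int bi = INT_MAX;
  for (int i = 0; i < vf; i++)          /* masked min over the index lanes */
    if (val_acc[i] == b && idx_acc[i] < bi)
      bi = idx_acc[i];

  *best = b;
  *best_i = bi;
}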

gcc/ChangeLog:

* tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
function.
(vect_recog_minmax_index_pattern): New function.
(vect_is_simple_reduction): Add multi_use_reduction case.
(vect_create_epilog_for_reduction): Add minmax+index epilogue handling.


minmax.patch
Description: minmax.patch


[AArch64] Remove backend support for widen-sub

2021-01-21 Thread Joel Hutton via Gcc-patches
Hi all, 

This patch removes support for the widening subtract operation in the aarch64 
backend as it is causing a performance regression.

In the following example:

#include <stdint.h>
extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict 
pix2)
{
   for( int y = 0; y < 4; y++ )
   {
     for( int x = 0; x < 4; x++ )
       d[x + y*4] = pix1[x] - pix2[x];
     pix1 += 16;
     pix2 += 16;
   }
}

The widening minus pattern is recognized and substituted, but cannot be used 
due to the input vector type chosen in slp vectorization. This results in an 
attempt to do an 8 byte->8 short widening subtract operation, which is not 
supported. 

The issue is documented in PR 98772.


[AArch64] Remove backend support for widen-sub

This patch removes support for the widening subtract operation in the aarch64 
backend as it is causing a performance regression.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md    
        (vec_widen_subl_lo_): Removed.
        (vec_widen_subl_hi_): Removed.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-widen-sub.c: Removed.diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 41071b668fd0982f55f9e48510403b9f50fe0f60..c685c512e06917f9cf6bdcffcc41dd091dabfb4e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3478,30 +3478,6 @@
   DONE;
 })
 
-(define_expand "vec_widen_subl_lo_"
-  [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
-   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
-  "TARGET_SIMD"
-{
-  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-  emit_insn (gen_aarch64_subl_lo_internal (operands[0], operands[1],
-		 operands[2], p));
-  DONE;
-})
-
-(define_expand "vec_widen_subl_hi_"
-  [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
-   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
-  "TARGET_SIMD"
-{
-  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-  emit_insn (gen_aarch64_subl_hi_internal (operands[0], operands[1],
-		 operands[2], p));
-  DONE;
-})
-
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
(match_operand:VQW 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
deleted file mode 100644
index a2bed63affbd091977df95a126da1f5b8c1d41d2..
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
-#include 
-#include 
-
-#pragma GCC target "+nosve"
-
-#define ARR_SIZE 1024
-
-/* Should produce an usubl */
-void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-__attribute__((optimize (0)))
-void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-/* Should produce an ssubl */
-void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-__attribute__((optimize (0)))
-void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-
-void __attribute__((optimize (0)))
-init(uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE;i++)
-{
-  a[i] = i;
-  b[i] = 2*i;
-}
-}
-
-int __attribute__((optimize (0)))
-main()
-{
-uint32_t foo_arr[ARR_SIZE];
-uint32_t bar_arr[ARR_SIZE];
-uint16_t a[ARR_SIZE];
-uint16_t b[ARR_SIZE];
-
-init(a, b);
-usub_opt(foo_arr, a, b);
-usub_nonopt(bar_arr, a, b);
-if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
-  return 1;
-ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
-ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
-if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
-  return 1;
-return 0;
-}
-
-/* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
-/* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
-/* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
-/* { dg-final { scan-assembler-times {\tssubl2\t} 1} } */


[1/2][TREE] Add WIDEN_PLUS, WIDEN_MINUS pretty print

2020-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This adds missing pretty print for WIDEN_PLUS/MINUS and 
VEC_WIDEN_PLUS/MINUS_HI/LO

Bootstrapped and regression tested all together on aarch64.

Ok for trunk?

Add WIDEN_PLUS, WIDEN_MINUS pretty print

Add 'w+'/'w-' as WIDEN_PLUS/WIDEN_MINUS respectively.
Add 'VEC_WIDEN_PLUS/MINUS_HI/LO<...>' for   VEC_WIDEN_PLUS/MINUS_HI/LO

gcc/ChangeLog:

 * tree-pretty-print.c (dump_generic_node): Add case for
 VEC_WIDEN_(PLUS/MINUS)_(HI/LO)_EXPR and WIDEN_(PLUS/MINUS)_EXPR.
From 6ec022ddc86e97fd7779bdf075619d2e273c77d0 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 8 Dec 2020 19:33:36 +
Subject: [PATCH 1/2] Add WIDEN_PLUS, WIDEN_MINUS pretty print

Add 'w+'/'w-' as WIDEN_PLUS/WIDEN_MINUS respectively.
Add VEC_WIDEN_PLUS/MINUS_HI/LO<...> for VEC_WIDEN_PLUS/MINUS_HI/LO

gcc/ChangeLog:

* tree-pretty-print.c (dump_generic_node): Add case for
VEC_WIDEN_(PLUS/MINUS)_(HI/LO)_EXPR and WIDEN_(PLUS/MINUS)_EXPR.
(op_symbol_code): Add case for WIDEN_PLUS/MINUS.
---
 gcc/tree-pretty-print.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 5a93c4d8b9e..ae4b898782a 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2649,6 +2649,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
   break;
 
   /* Binary arithmetic and logic expressions.  */
+case WIDEN_PLUS_EXPR:
+case WIDEN_MINUS_EXPR:
 case WIDEN_SUM_EXPR:
 case WIDEN_MULT_EXPR:
 case MULT_EXPR:
@@ -3580,6 +3582,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 case VEC_SERIES_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_PLUS_HI_EXPR:
+case VEC_WIDEN_PLUS_LO_EXPR:
+case VEC_WIDEN_MINUS_HI_EXPR:
+case VEC_WIDEN_MINUS_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
 case VEC_WIDEN_MULT_ODD_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
@@ -4097,6 +4103,12 @@ op_symbol_code (enum tree_code code)
 case WIDEN_LSHIFT_EXPR:
   return "w<<";
 
+case WIDEN_PLUS_EXPR:
+  return "w+";
+
+case WIDEN_MINUS_EXPR:
+  return "w-";
+
 case POINTER_PLUS_EXPR:
   return "+";
 
-- 
2.17.1



[2/2][VECT] pr97929 fix

2020-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This patch addresses PR97929 by adding a missing case for WIDEN_PLUS/MINUS in 
vect_get_smallest_scalar_type. It also introduces a test to check for 
regression. 

One thing to note is that I saw a failure on c-c++-common/builtins.c which 
disappeared when I ran the test again. I assume this is due to the test being 
unreliable.

Bootstrapped and regression tested all together.

Ok for trunk?

[VECT] pr97929 fix

This addresses pr97929. The cases for WIDEN_PLUS and WIDEN_MINUS were
missing in vect_get_smallest_scalar_type.

gcc/ChangeLog:

PR tree-optimization/97929
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Add
WIDEN_PLUS/WIDEN_MINUS case.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97929.c: New test.From 1d7855aab9ea099f244e50baede24c50897f6eb5 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 9 Dec 2020 16:51:17 +
Subject: [PATCH 2/2] [VECT] pr97929 fix

This addresses pr97929. The cases for WIDEN_PLUS and WIDEN_MINUS were
missing in vect_get_smallest_scalar_type.

gcc/ChangeLog:

PR tree-optimization/97929
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Add
WIDEN_PLUS/WIDEN_MINUS case.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97929.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr97929.c | 10 ++
 gcc/tree-vect-data-refs.c   |  2 ++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97929.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97929.c b/gcc/testsuite/gcc.dg/vect/pr97929.c
new file mode 100644
index 000..a027b317151
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97929.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+#include <stdint.h>
+#define ARR_SIZE 1024
+extern void foo (int32_t *bar, int16_t a)
+{
+for( int i = 0; i < ARR_SIZE;i++)
+{
+bar[i]  = a + 1;
+}
+}
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 18e36c89d14..137017091e8 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -136,6 +136,8 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info,
 	  || gimple_assign_rhs_code (assign) == WIDEN_SUM_EXPR
 	  || gimple_assign_rhs_code (assign) == WIDEN_MULT_EXPR
 	  || gimple_assign_rhs_code (assign) == WIDEN_LSHIFT_EXPR
+	  || gimple_assign_rhs_code (assign) == WIDEN_PLUS_EXPR
+	  || gimple_assign_rhs_code (assign) == WIDEN_MINUS_EXPR
 	  || gimple_assign_rhs_code (assign) == FLOAT_EXPR))
 {
   tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (assign));
-- 
2.17.1



RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-05-31 Thread Joel Hutton via Gcc-patches
> Can you post an updated patch (after the .cc renaming, and code_helper
> now already moved to tree.h).
> 
> Thanks,
> Richard.

Patches attached. They already incorporated the .cc rename, now rebased to be 
after the change to tree.h

Joel


0001-Refactor-to-allow-internal_fn-s.patch
Description: 0001-Refactor-to-allow-internal_fn-s.patch


0002-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-06 Thread Joel Hutton via Gcc-patches
> > Patches attached. They already incorporated the .cc rename, now
> > rebased to be after the change to tree.h
> 
> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
>2, oprnd, half_type, unprom, vectype);
> 
>tree var = vect_recog_temp_ssa_var (itype, NULL);
> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> - oprnd[0], oprnd[1]);
> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> oprnd[1]);
> 
> 
> you should be able to do without the new gimple_build overload
> by using
> 
>gimple_seq stmts = NULL;
>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> 
> because 'gimple_build' is an existing API.

Done.

The gimple_build overload was added at the request of Richard Sandiford; I 
assume removing it is OK with you, Richard S?
>From Richard Sandiford:
> For example, I think we should hide this inside a new:
> 
>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> 
> that works directly on code_helper, similarly to the new code_helper 
> gimple_build interfaces.




> 
> 
> -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> +  if (gimple_get_lhs (stmt) == NULL_TREE ||
> +  TREE_CODE(gimple_get_lhs (stmt)) != SSA_NAME)
>  return false;
> 
> || go to the next line, space after TREE_CODE
> 

Done.

> +  bool widen_arith = false;
> +  gimple_match_op res_op;
> +  if (!gimple_extract_op (stmt, &res_op))
> +return false;
> +  code = res_op.code;
> +  op_type = res_op.num_ops;
> +
> +  if (is_gimple_assign (stmt))
> +  {
> +  widen_arith = (code == WIDEN_PLUS_EXPR
> +|| code == WIDEN_MINUS_EXPR
> +|| code == WIDEN_MULT_EXPR
> +|| code == WIDEN_LSHIFT_EXPR);
> + }
> +  else
> +  widen_arith = gimple_call_flags (stmt) & ECF_WIDEN;
> 
> there seem to be formatting issues.  Also shouldn't you check
> if (res_op.code.is_tree_code ()) instead if is_gimple_assign?
> I also don't like the ECF_WIDEN "trick", just do as with the
> tree codes and explicitely enumerate widening ifns here.
> 

Done. I've set widen_arith to false for the first patch, as the second patch
introduces the widening ifns.

> gimple_extract_op is a bit heavy-weight as well, so maybe
> instead simply do
> 
>   if (is_gimple_assign (stmt))
> {
>   code = gimple_assign_rhs_code (stmt);
> ...
> }
>   else if (gimple_call_internal_p (stmt))
> {
>   code = gimple_call_internal_fn (stmt);
> ...
> }
>   else
> return false;

The patch was originally written as above; it was changed to use
gimple_extract_op at the request of Richard Sandiford. I prefer
gimple_extract_op as it's more compact, but I don't have strong feelings. If
the Richards can agree on either version, I'm happy.


From Richard Sandiford:
> > +  if (is_gimple_assign (stmt))
> > +  {
> > +code_or_ifn = gimple_assign_rhs_code (stmt);  }  else
> > +code_or_ifn = gimple_call_combined_fn (stmt);
> 
> It might be possible to use gimple_extract_op here (only recently added).
> This would also provide the number of operands directly, instead of 
> needing "op_type".  It would also provide an array of operands.


> 
> +  code_helper c1=MAX_TREE_CODES, c2=MAX_TREE_CODES;
> 
> spaces before/after '='
> 

Done.

> @@ -12061,13 +12105,16 @@ supportable_widening_operation (vec_info
> *vinfo,
>if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
>  std::swap (c1, c2);
> 
> +
>if (code == FIX_TRUNC_EXPR)
>  {
> 
> unnecessary whitespace change.
> 
Fixed.

> diff --git a/gcc/tree.h b/gcc/tree.h
> index
> f84958933d51144bb6ce7cc41eca5f7f06814550..e51e34c051d9b91d1c02a4b2
> fefdb2b15606a36f
> 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -92,6 +92,10 @@ public:
>bool is_fn_code () const { return rep < 0; }
>bool is_internal_fn () const;
>bool is_builtin_fn () const;
> +  enum tree_code as_tree_code () const { return is_tree_code () ?
> +(tree_code)* this : MAX_TREE_CODES; }
> +  combined_fn as_fn_code () const { return is_fn_code () ? (combined_fn)
> *this
> +: CFN_LAST;}
> 
> hmm, the other as_* functions we have are not member functions.
> Also this semantically differs from the tree_code () conversion
> operator (that one was supposed to be "cheap").  The existing
> as_internal_fn for example is documented as being valid only if
> the code is actually an internal fn.  I see you are introducing
> the new function as convenience to get a "safe" not-a-X value,
> so maybe they should be called safe_as_tree_code () instead?
> 
SGTM. Done

> 
>int get_rep () const { return rep; }
>bool operator== (const code_helper &other) { return rep == other.rep; }
>bool operator!= (const code_helper &other) { return rep != other.rep; }
> @@ -6657,6 +6661,54 @@ extern unsigned fndecl_dealloc_argno (tree);
> if nonnull, set the second argument to the r

RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-07 Thread Joel Hutton via Gcc-patches
Thanks Richard,

> I thought the potential problem with the above is that gimple_build is a
> folding interface, so in principle it's allowed to return an existing SSA_NAME
> set by an existing statement (or even a constant).
> I think in this context we do need to force a new statement to be created.

Before I make any changes, I'd like to check we're all on the same page.

richi, are you ok with the gimple_build function, perhaps with a different name 
if you are concerned with overloading? we could use gimple_ch_build or 
gimple_code_helper_build?

Similarly are you ok with the use of gimple_extract_op? I would lean towards 
using it as it is cleaner, but I don't have strong feelings.

Joel

> -Original Message-
> From: Richard Sandiford 
> Sent: 07 June 2022 09:18
> To: Joel Hutton 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> Joel Hutton  writes:
> >> > Patches attached. They already incorporated the .cc rename, now
> >> > rebased to be after the change to tree.h
> >>
> >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> >>2, oprnd, half_type, unprom, vectype);
> >>
> >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> >> - oprnd[0], oprnd[1]);
> >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> >> oprnd[1]);
> >>
> >>
> >> you should be able to do without the new gimple_build overload by
> >> using
> >>
> >>gimple_seq stmts = NULL;
> >>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> >>
> >> because 'gimple_build' is an existing API.
> >
> > Done.
> >
> > The gimple_build overload was at the request of Richard Sandiford, I
> assume removing it is ok with you Richard S?
> > From Richard Sandiford:
> >> For example, I think we should hide this inside a new:
> >>
> >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >>
> >> that works directly on code_helper, similarly to the new code_helper
> >> gimple_build interfaces.
> 
> I thought the potential problem with the above is that gimple_build is a
> folding interface, so in principle it's allowed to return an existing SSA_NAME
> set by an existing statement (or even a constant).
> I think in this context we do need to force a new statement to be created.
> 
> Of course, the hope is that there wouldn't still be such folding opportunities
> at this stage, but I don't think it's guaranteed (especially with options
> fuzzing).
> 
> Since I was mentioned :-) ...
> 
> Could you run the patch through contrib/check_GNU_style.py?
> There seem to be a few long lines.
> 
> > +  if (res_op.code.is_tree_code ())
> 
> Do you need this is_tree_code ()?  These comparisons…
> 
> > +  {
> > +  widen_arith = (code == WIDEN_PLUS_EXPR
> > +|| code == WIDEN_MINUS_EXPR
> > +|| code == WIDEN_MULT_EXPR
> > +|| code == WIDEN_LSHIFT_EXPR);
> 
> …ought to be safe unconditionally.
> 
> > + }
> > +  else
> > +  widen_arith = false;
> > +
> > +  if (!widen_arith
> > +  && !CONVERT_EXPR_CODE_P (code)
> > +  && code != FIX_TRUNC_EXPR
> > +  && code != FLOAT_EXPR)
> > +return false;
> >
> >/* Check types of lhs and rhs.  */
> > -  scalar_dest = gimple_assign_lhs (stmt);
> > +  scalar_dest = gimple_get_lhs (stmt);
> >lhs_type = TREE_TYPE (scalar_dest);
> >vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >
> > @@ -4938,10 +4951,14 @@ vectorizable_conversion (vec_info *vinfo,
> >
> >if (op_type == binary_op)
> >  {
> > -  gcc_assert (code == WIDEN_MULT_EXPR || code ==
> WIDEN_LSHIFT_EXPR
> > - || code == WIDEN_PLUS_EXPR || code ==
> WIDEN_MINUS_EXPR);
> > +  gcc_assert (code == WIDEN_MULT_EXPR
> > + || code == WIDEN_LSHIFT_EXPR
> > + || code == WIDEN_PLUS_EXPR
> > + || code == WIDEN_MINUS_EXPR);
> >
> > -  op1 = gimple_assign_rhs2 (stmt);
> > +
> > +  op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
> > +gimple_call_arg (stmt, 0);
> > 

[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-09 Thread Joel Hutton via Gcc-patches
> Before I make any changes, I'd like to check we're all on the same page.
> 
> richi, are you ok with the gimple_build function, perhaps with a different
> name if you are concerned with overloading? we could use gimple_ch_build
> or gimple_code_helper_build?
> 
> Similarly are you ok with the use of gimple_extract_op? I would lean towards
> using it as it is cleaner, but I don't have strong feelings.
> 
> Joel

Ping. Just looking for some confirmation before I rework this patch. It would
be good to get some agreement on this, as Tamar is blocked on it.

Joel



> -Original Message-
> From: Joel Hutton
> Sent: 07 June 2022 10:02
> To: Richard Sandiford 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> Thanks Richard,
> 
> > I thought the potential problem with the above is that gimple_build is
> > a folding interface, so in principle it's allowed to return an
> > existing SSA_NAME set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> 
> Before I make any changes, I'd like to check we're all on the same page.
> 
> richi, are you ok with the gimple_build function, perhaps with a different
> name if you are concerned with overloading? we could use gimple_ch_build
> or gimple_code_helper_build?
> 
> Similarly are you ok with the use of gimple_extract_op? I would lean towards
> using it as it is cleaner, but I don't have strong feelings.
> 
> Joel
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: 07 June 2022 09:18
> > To: Joel Hutton 
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> > internal_fns
> >
> > Joel Hutton  writes:
> > >> > Patches attached. They already incorporated the .cc rename, now
> > >> > rebased to be after the change to tree.h
> > >>
> > >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info
> *vinfo,
> > >>2, oprnd, half_type, unprom, vectype);
> > >>
> > >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> > >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > >> - oprnd[0], oprnd[1]);
> > >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> > >> oprnd[1]);
> > >>
> > >>
> > >> you should be able to do without the new gimple_build overload by
> > >> using
> > >>
> > >>gimple_seq stmts = NULL;
> > >>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> > >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> > >>
> > >> because 'gimple_build' is an existing API.
> > >
> > > Done.
> > >
> > > The gimple_build overload was at the request of Richard Sandiford, I
> > assume removing it is ok with you Richard S?
> > > From Richard Sandiford:
> > >> For example, I think we should hide this inside a new:
> > >>
> > >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> > >>
> > >> that works directly on code_helper, similarly to the new
> > >> code_helper gimple_build interfaces.
> >
> > I thought the potential problem with the above is that gimple_build is
> > a folding interface, so in principle it's allowed to return an
> > existing SSA_NAME set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> >
> > Of course, the hope is that there wouldn't still be such folding
> > opportunities at this stage, but I don't think it's guaranteed
> > (especially with options fuzzing).
> >
> > Since I was mentioned :-) ...
> >
> > Could you run the patch through contrib/check_GNU_style.py?
> > There seem to be a few long lines.
> >
> > > +  if (res_op.code.is_tree_code ())
> >
> > Do you need this is_tree_code ()?  These comparisons…
> >
> > > +  {
> > > +  widen_arith = (code == WIDEN_PLUS_EXPR
> > > +  || code == WIDEN_MINUS_EXPR
> > > +  || code == WIDEN_MULT_EXPR
> > > +  || code == WIDEN_LSHIFT_EXPR);
> >
> > …ought to be safe unconditionally.
> >
> > > 

[Vect] Fix mask check on Scatter loads/stores

2021-03-10 Thread Joel Hutton via Gcc-patches
Hi all,

This patch fixes PR99102. For masked gather loads/scatter stores, the
'loop_masks' variable was checked for being non-null, but 'final_mask' is the
mask actually used.

bootstrapped and regression tested on aarch64. Regression tested on aarch64_sve
under qemu.

 [Vect] Fix mask check on Scatter loads/stores

Previously, IFN_MASK_SCATTER_STORE was used if 'loop_masks' was
non-null, but the mask actually used is 'final_mask'. This caused a bug where
a 'MASK_STORE' was vectorized into a 'SCATTER_STORE' instead of a
'MASK_SCATTER_STORE'. This fixes PR target/99102.
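
For illustration, a minimal sketch of the shape the decision takes after the
fix, keyed on 'final_mask' rather than 'loop_masks' (variable names and
argument order follow the surrounding vectorizable_store code but are
illustrative here, not a verbatim excerpt):

  gcall *call;
  if (final_mask)
    call = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, 5,
				       dataref_ptr, vec_offset, scale,
				       vec_oprnd, final_mask);
  else
    call = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
				       dataref_ptr, vec_offset, scale,
				       vec_oprnd);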

gcc/ChangeLog:

PR target/99102
* tree-vect-stmts.c (vectorizable_store): Fix scatter store mask
check condition.
(vectorizable_load): Fix gather load mask check condition.

gcc/testsuite/ChangeLog:

PR target/99102
* gcc.target/aarch64/sve/pr99102.c: New test.


diff
Description: diff


RE: [Vect] Fix mask check on Scatter loads/stores

2021-03-10 Thread Joel Hutton via Gcc-patches
>> gcc/testsuite/ChangeLog:
>>
>>   PR target/99102
>>   * gcc.target/aarch64/sve/pr99102.c: New test.
>
>(The filename is out of date, but the git hook would catch that.)
Fixed and committed.

>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c 
>> b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> new file mode 100644
>> index 
>> ..9d321b97a68c05ad08646e8e2d69
>> 79c2030c65e6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-options "-O2 -ftree-vectorize" } */
>> +/* { dg-additional-options "-msve-vector-bits=256" { target 
>> +aarch64_sve } } */
>
>I should have realised this earlier, sorry, but I think we want this to be
>conditional on aarch64_sve256_hw instead.  Same for the scan-tree-dump below.
>
>When running the testsuite on SVE hardware, the above would force a VL of 256
>even if the actual VL is different, so the test would fail at runtime for
>128-bit SVE, 512-bit SVE, etc.
>
>Thanks,
>Richard.
>

Patch attached to add this. OK?

[testsuite] Fix target selector for pr99102.c

The target selector should explicitly choose 256-bit hardware, as explicit
256-bit compiler options are used to trigger the bug.
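
A sketch of the directive change (the scanned string and dump name are
illustrative, not the committed test):

/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
/* ... test body unchanged ... */
/* { dg-final { scan-tree-dump "MASK_SCATTER_STORE" "vect" { target aarch64_sve256_hw } } } */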

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr99102.c: Fix target selector.


0002-testsuite-Fix-target-selector-for-pr99102.c.patch
Description: 0002-testsuite-Fix-target-selector-for-pr99102.c.patch


Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-13 Thread Joel Hutton via Gcc-patches
My mistake, reworked patch. Tests are still running.

From: Richard Biener 
Sent: 13 December 2021 14:47
To: gcc-patches@gcc.gnu.org ; Tobias Burnus 
; Joel Hutton ; Richard Sandiford 

Cc: GCC Patches ; Richard Biener 
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

On December 13, 2021 3:27:50 PM GMT+01:00, Tobias Burnus 
 wrote:
>Hi Joel,
>
>your patch fails here with:
>
>../../repos/gcc-11-commit/gcc/tree-vect-loop.c:8000:8: error:
>‘directly_supported_p’ was not declared in this scope
>  8000 |   if (!directly_supported_p (PLUS_EXPR, step_vectype)
>   |^~~~
>
>And "git grep" shows that this is only present in:
>
>gcc/tree-vect-loop.c:  if (!directly_supported_p (PLUS_EXPR, step_vectype)
>gcc/tree-vect-loop.c:  || !directly_supported_p (MINUS_EXPR,
>step_vectype))
>
>That's different on mainline, which offers that function.

Just as a reminder, backports need regular bootstrap and regtest validation on 
the respective branches.

Richard.

>Tobias
>
>On 10.12.21 14:24, Joel Hutton via Gcc-patches wrote:
>> ok for backport to 11?
>> ________
>> From: Richard Sandiford 
>> Sent: 10 December 2021 10:22
>> To: Joel Hutton 
>> Cc: GCC Patches ; Richard Biener 
>> Subject: Re: pr103523: Check for PLUS/MINUS support
>>
>> Joel Hutton  writes:
>>> Hi all,
>>>
>>> This is to address pr103523.
>>>
>>> bootstrapped and regression tested on aarch64.
>>>
>>> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
>>> PR103523 is an ICE on valid code:
>>>
>>> void d(float *a, float b, int c) {
>>>  float e;
>>>  for (; c; c--, e += b)
>>>a[c] = e;
>>> }
>>>
>>> This is due to not checking for PLUS_EXPR support, which is missing in
>>> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
>>> for support in vectorizable_induction.
>>>
>>> gcc/ChangeLog:
>>>
>>>  PR tree-optimization/PR103523
>> The bugzilla hook expects: PR tree-optimization/103523
>>
>>>  * tree-vect-loop.c (vectorizable_induction): Check for
>>>  PLUS_EXPR/MINUS_EXPR support.
>> OK, thanks.
>>
>> Richard

From d6ac8751dfdf3520f90b2ad8d09ebc452cc42831 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 13 Dec 2021 14:49:55 +
Subject: [PATCH] vect-loop: fix build

The previous commit broke the build because it relied on directly_supported_p,
which is not in GCC 11. This reworks the check to avoid using
directly_supported_p.

gcc/ChangeLog:

	* tree-vect-loop.c (vectorizable_induction): Rework to avoid
	directly_supported_p.
---
 gcc/tree-vect-loop.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 54320681119d880ec2cbe641fe16ee73e552f57d..8f464e61780dbb2df5836022d77307c5c90a4b6f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7997,8 +7997,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
 
   /* Check for backend support of PLUS/MINUS_EXPR. */
-  if (!directly_supported_p (PLUS_EXPR, step_vectype)
-  || !directly_supported_p (MINUS_EXPR, step_vectype))
+  direct_optab ot_plus = optab_for_tree_code (tree_code (PLUS_EXPR),
+		 step_vectype, optab_default);
+  direct_optab ot_minus = optab_for_tree_code (tree_code (MINUS_EXPR),
+		 step_vectype, optab_default);
+  if (ot_plus == unknown_optab
+  || ot_minus == unknown_optab
+  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing
+  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing)
 return false;
 
   if (!vec_stmt) /* transformation not required.  */
-- 
2.17.1



RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
Bootstrapped and regression tested on releases/gcc-11 on aarch64.

Ok for 11?

The previous commit broke the build because it relied on directly_supported_p,
which is not in GCC 11. This reworks the check to avoid using
directly_supported_p.

gcc/ChangeLog:

  PR bootstrap/103688
  * tree-vect-loop.c (vectorizable_induction): Rework to avoid
  directly_supported_p.

From: Joel Hutton 
Sent: 13 December 2021 15:02
To: Richard Biener ; gcc-patches@gcc.gnu.org; 
Tobias Burnus ; Richard Sandiford 

Cc: Richard Biener 
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

My mistake, reworked patch. Tests are still running.

From: Richard Biener <richard.guent...@gmail.com>
Sent: 13 December 2021 14:47
To: gcc-patches@gcc.gnu.org; Tobias Burnus <tob...@codesourcery.com>;
Joel Hutton <joel.hut...@arm.com>; Richard Sandiford <richard.sandif...@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Biener <rguent...@suse.de>
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

On December 13, 2021 3:27:50 PM GMT+01:00, Tobias Burnus
<tob...@codesourcery.com> wrote:
>Hi Joel,
>
>your patch fails here with:
>
>../../repos/gcc-11-commit/gcc/tree-vect-loop.c:8000:8: error:
>'directly_supported_p' was not declared in this scope
>  8000 |   if (!directly_supported_p (PLUS_EXPR, step_vectype)
>   |^~~~
>
>And "git grep" shows that this is only present in:
>
>gcc/tree-vect-loop.c:  if (!directly_supported_p (PLUS_EXPR, step_vectype)
>gcc/tree-vect-loop.c:  || !directly_supported_p (MINUS_EXPR,
>step_vectype))
>
>That's different on mainline, which offers that function.

Just as a reminder, backports need regular bootstrap and regtest validation on 
the respective branches.

Richard.

>Tobias
>
>On 10.12.21 14:24, Joel Hutton via Gcc-patches wrote:
>> ok for backport to 11?
>> 
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: 10 December 2021 10:22
>> To: Joel Hutton <joel.hut...@arm.com>
>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Biener <rguent...@suse.de>
>> Subject: Re: pr103523: Check for PLUS/MINUS support
>>
>> Joel Hutton <joel.hut...@arm.com> writes:
>>> Hi all,
>>>
>>> This is to address pr103523.
>>>
>>> bootstrapped and regression tested on aarch64.
>>>
>>> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
>>> PR103523 is an ICE on valid code:
>>>
>>> void d(float *a, float b, int c) {
>>>  float e;
>>>  for (; c; c--, e += b)
>>>a[c] = e;
>>> }
>>>
>>> This is due to not checking for PLUS_EXPR support, which is missing in
>>> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
>>> for support in vectorizable_induction.
>>>
>>> gcc/ChangeLog:
>>>
>>>  PR tree-optimization/PR103523
>> The bugzilla hook expects: PR tree-optimization/103523
>>
>>>  * tree-vect-loop.c (vectorizable_induction): Check for
>>>  PLUS_EXPR/MINUS_EXPR support.
>> OK, thanks.
>>
>> Richard


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch


RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
> +  if (ot_plus == unknown_optab
> +  || ot_minus == unknown_optab
> +  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) ==
> CODE_FOR_nothing
> +  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) ==
> + CODE_FOR_nothing)
>  return false;
> 
> Won't optab_handler just return CODE_FOR_nothing for unknown_optab?

I was taking the check used in directly_supported_p:

return (optab != unknown_optab
        && optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing);

> Anyway, I think best would be to write it as:
>   if (!target_supports_op_p (step_vectype, PLUS_EXPR, optab_default)
>   || !target_supports_op_p (step_vectype, MINUS_EXPR, optab_default))
> return false;
Looks good to me.

Patch attached.

Tests running on gcc-11 on aarch64. 

Ok for 11 once tests come back?


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch