Re: [PATCH] testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, 10 Nov 2021, Martin Sebor wrote:

> On 11/10/21 3:09 AM, Richard Biener via Gcc-patches wrote:
> > This XFAILs the bogus diagnostic test and rectifies the expectation
> > on the optimization.
> > 
> > Tested on x86_64-unknown-linux-gnu, pushed.
> > 
> > 2021-11-10  Richard Biener  
> > 
> >  PR testsuite/102690
> >  * g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part
> >  and optimization.
> > ---
> >   gcc/testsuite/g++.dg/warn/Warray-bounds-16.C | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C
> > b/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C
> > index 17b4d0d194e..89cbadb91c7 100644
> > --- a/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C
> > +++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C
> > @@ -19,11 +19,11 @@ struct S
> >   p = (int*) new unsigned char [sizeof (int) * m];
> >   
> >   for (int i = 0; i < m; i++)
> > -  new (p + i) int ();
> > +  new (p + i) int (); /* { dg-bogus "bounds" "pr102690" { xfail *-*-* }
> > } */
> > }
> >   };
> >   
> >   S a (0);
> >   
> > -/* Verify the loop has been eliminated.
> > -   { dg-final { scan-tree-dump-not "goto" "optimized" } } */
> > +/* The loop cannot be eliminated since the global 'new' can change 'm'.  */
> 
> I don't understand this comment.  Can you please explain how
> the global operator new (i.e., the one outside the loop below)
> can change the member of the class whose ctor calls the new?
> 
> The member, or more precisely the enclosing object, doesn't
> yet exist at the time the global new is called because its
> ctor hasn't finished, so nothing outside the ctor can access
> it.  A pointer to the S under construction can be used (and
> could be accessed by a replacement new) but it cannot be
> dereferenced to access its members because the object it
> points to doesn't exist until after the ctor completes.

Yes, that's the C++ legalese - which is why I XFAILed that
part of the test rather than just removing it.  The middle-end
sees the object *this as existing and being global, and thus
accessible and mutable by '::new', which, when replaced by
the user, could access and alter *this.  Like maybe for

S s;

void *operator new (__SIZE_TYPE__ n) { s.m = 0; return __builtin_malloc (n); }

int
main ()
{
  new (&s) S (1);
}

that may be invalid C++, but this detail of C++ is not
reflected in the GIMPLE IL.  Before the change that regressed
this, if S::S() had called a global function foo() instead
of new to do the allocation, the behavior would have been the
same as after the change.  Isn't the call to new or foo part
of the construction, and as such obviously allowed to access
and alter the in-construction object?

> I copy the test below:
> 
> inline void* operator new (__SIZE_TYPE__, void * v)
> {
>   return v;
> }
> 
> struct S
> {
>   int* p;
>   int m;
> 
>   S (int i)
>   {
> m = i;
> p = (int*) new unsigned char [sizeof (int) * m];
> 
> for (int i = 0; i < m; i++)
>   new (p + i) int (); /* { dg-bogus "bounds" "pr102690" { xfail *-*-* } }
> */
>   }
> };
> 
> S a (0);
> 
> Thanks
> Martin
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] dwarf2out, v2: Fix up field_byte_offset [PR101378]

2021-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

Bootstrapped/regtested now successfully on x86_64-linux and i686-linux,
verified the
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0x;
  t.c = 0x55;
  t.d = 0x;
  s.e++;
}
testcase is compiled the same way as before again, ok for trunk?

> 2021-11-10  Jakub Jelinek  
> 
>   PR debug/101378
>   * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
>   handling only for DECL_BIT_FIELD_TYPE decls.
> 
>   * g++.dg/debug/dwarf2/pr101378.C: New test.
> 
> --- gcc/dwarf2out.c.jj2021-11-05 10:19:46.339457342 +0100
> +++ gcc/dwarf2out.c   2021-11-09 15:01:51.425437717 +0100
> @@ -19646,6 +19646,7 @@ field_byte_offset (const_tree decl, stru
>   properly dynamic byte offsets only when PCC bitfield type doesn't
>   matter.  */
>if (PCC_BITFIELD_TYPE_MATTERS
> +  && DECL_BIT_FIELD_TYPE (decl)
>&& TREE_CODE (DECL_FIELD_OFFSET (decl)) == INTEGER_CST)
>  {
>offset_int object_offset_in_bits;
> --- gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C.jj   2021-11-09 
> 15:17:39.504975396 +0100
> +++ gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C  2021-11-09 
> 15:17:28.067137556 +0100
> @@ -0,0 +1,13 @@
> +// PR debug/101378
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-gdwarf-5 -dA" }
> +// { dg-final { scan-assembler-times "0\[^0-9x\\r\\n\]* 
> DW_AT_data_member_location" 1 } }
> +// { dg-final { scan-assembler-times "1\[^0-9x\\r\\n\]* 
> DW_AT_data_member_location" 1 } }
> +// { dg-final { scan-assembler-times "2\[^0-9x\\r\\n\]* 
> DW_AT_data_member_location" 1 } }
> +// { dg-final { scan-assembler-not "-1\[^0-9x\\r\\n\]* 
> DW_AT_data_member_location" } }
> +
> +struct E {};
> +struct S
> +{
> +  [[no_unique_address]] E e, f, g;
> +} s;

Jakub



[PATCH] Remove find_pdom and find_dom

2021-11-11 Thread Richard Biener via Gcc-patches
This removes now useless wrappers around get_immediate_dominator.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-11  Richard Biener  

* cfganal.c (find_pdom): Remove.
(control_dependences::find_control_dependence): Remove
special-casing of entry block, call get_immediate_dominator
directly.
* gimple-predicate-analysis.cc (find_pdom): Remove.
(find_dom): Likewise.
(find_control_equiv_block): Call get_immediate_dominator
directly.
(compute_control_dep_chain): Likewise.
(predicate::init_from_phi_def): Likewise.
---
 gcc/cfganal.c| 28 -
 gcc/gimple-predicate-analysis.cc | 36 +++-
 2 files changed, 7 insertions(+), 57 deletions(-)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 11ab23623ae..0cba612738d 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -372,25 +372,6 @@ control_dependences::clear_control_dependence_bitmap 
(basic_block bb)
   bitmap_clear (&control_dependence_map[bb->index]);
 }
 
-/* Find the immediate postdominator PDOM of the specified basic block BLOCK.
-   This function is necessary because some blocks have negative numbers.  */
-
-static inline basic_block
-find_pdom (basic_block block)
-{
-  gcc_assert (block != ENTRY_BLOCK_PTR_FOR_FN (cfun));
-
-  if (block == EXIT_BLOCK_PTR_FOR_FN (cfun))
-return EXIT_BLOCK_PTR_FOR_FN (cfun);
-  else
-{
-  basic_block bb = get_immediate_dominator (CDI_POST_DOMINATORS, block);
-  if (! bb)
-   return EXIT_BLOCK_PTR_FOR_FN (cfun);
-  return bb;
-}
-}
-
 /* Determine all blocks' control dependences on the given edge with edge_list
EL index EDGE_INDEX, ala Morgan, Section 3.6.  */
 
@@ -402,15 +383,14 @@ control_dependences::find_control_dependence (int 
edge_index)
 
   gcc_assert (get_edge_src (edge_index) != EXIT_BLOCK_PTR_FOR_FN (cfun));
 
-  if (get_edge_src (edge_index) == ENTRY_BLOCK_PTR_FOR_FN (cfun))
-ending_block = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
-  else
-ending_block = find_pdom (get_edge_src (edge_index));
+  ending_block = get_immediate_dominator (CDI_POST_DOMINATORS,
+ get_edge_src (edge_index));
 
   for (current_block = get_edge_dest (edge_index);
current_block != ending_block
&& current_block != EXIT_BLOCK_PTR_FOR_FN (cfun);
-   current_block = find_pdom (current_block))
+   current_block = get_immediate_dominator (CDI_POST_DOMINATORS,
+   current_block))
 set_control_dependence_map_bit (current_block, edge_index);
 }
 
diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
index f0c84446194..454113d532e 100644
--- a/gcc/gimple-predicate-analysis.cc
+++ b/gcc/gimple-predicate-analysis.cc
@@ -45,36 +45,6 @@
 
 #define DEBUG_PREDICATE_ANALYZER 1
 
-/* Find the immediate postdominator of the specified basic block BB.  */
-
-static inline basic_block
-find_pdom (basic_block bb)
-{
-  basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (cfun);
-  if (bb == exit_bb)
-return exit_bb;
-
-  if (basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb))
-return pdom;
-
-  return exit_bb;
-}
-
-/* Find the immediate dominator of the specified basic block BB.  */
-
-static inline basic_block
-find_dom (basic_block bb)
-{
-  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
-  if (bb == entry_bb)
-return entry_bb;
-
-  if (basic_block dom = get_immediate_dominator (CDI_DOMINATORS, bb))
-return dom;
-
-  return entry_bb;
-}
-
 /* Return true if BB1 is postdominating BB2 and BB1 is not a loop exit
bb.  The loop exit bb check is simple and does not cover all cases.  */
 
@@ -96,7 +66,7 @@ is_non_loop_exit_postdominating (basic_block bb1, basic_block 
bb2)
 static inline basic_block
 find_control_equiv_block (basic_block bb)
 {
-  basic_block pdom = find_pdom (bb);
+  basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb);
 
   /* Skip the postdominating bb that is also a loop exit.  */
   if (!is_non_loop_exit_postdominating (pdom, bb))
@@ -1167,7 +1137,7 @@ compute_control_dep_chain (basic_block dom_bb, 
const_basic_block dep_bb,
  break;
}
 
- cd_bb = find_pdom (cd_bb);
+ cd_bb = get_immediate_dominator (CDI_POST_DOMINATORS, cd_bb);
  post_dom_check++;
  if (cd_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
  || post_dom_check > MAX_POSTDOM_CHECK)
@@ -1788,7 +1758,7 @@ predicate::init_from_phi_def (gphi *phi)
 
   basic_block phi_bb = gimple_bb (phi);
   /* Find the closest dominating bb to be the control dependence root.  */
-  basic_block cd_root = find_dom (phi_bb);
+  basic_block cd_root = get_immediate_dominator (CDI_DOMINATORS, phi_bb);
   if (!cd_root)
 return false;
 
-- 
2.31.1


Re: [PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-11 Thread Martin Liška

On 11/10/21 18:18, Richard Sandiford wrote:

Martin Liška  writes:

On 11/8/21 11:43, Richard Sandiford via Gcc-patches wrote:

Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?


I think the patch causes the following on x86_64-linux-gnu:
FAIL: gfortran.dg/inline_matmul_17.f90   -O   scan-tree-dump-times optimized 
"matmul_r4" 2


I get that failure even with d70ef65692f (from before the patches
I committed today).


Sorry, you are right, it's one revision before:
d70ef65692fced7ab72e0aceeff7407e5a34d96d

Honza, can you please take a look?

Cheers,
Martin



Thanks,
Richard





[PATCH v3] c-family: Add __builtin_assoc_barrier

2021-11-11 Thread Matthias Kretz
On Wednesday, 8 September 2021 15:49:27 CET Matthias Kretz wrote:
> On Wednesday, 8 September 2021 15:44:28 CEST Jason Merrill wrote:
> > On 9/8/21 5:37 AM, Matthias Kretz wrote:
> > > On Tuesday, 7 September 2021 19:36:22 CEST Jason Merrill wrote:
> > >>> case PAREN_EXPR:
> > >>> -  RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t,
> > >>> 0;
> > >>> +  if (REF_PARENTHESIZED_P (t))
> > >>> +   RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t,
> > >>> 0;
> > >>> +  else
> > >>> +   RETURN (RECUR (TREE_OPERAND (t, 0)));
> > >> 
> > >> I think you need to build a new PAREN_EXPR in the assoc barrier case as
> > >> well, for it to have any effect in templates.
> > > 
> > > My intent was to ignore __builtin_assoc_barrier in templates / constexpr
> > > evaluation since it's not affected by -fassociative-math anyway. Or do
> > > you
> > > mean something else?
> > 
> > I agree about constexpr, but why wouldn't template instantiations be
> > affected by -fassociative-math like any other function?
> 
> Oh, that seems like a major misunderstanding on my part. I assumed
> tsubst_copy_and_build would evaluate the expressions in template arguments
> 🤦. I'll expand the test and will fix.

Sorry for the long delay. New patch is attached. OK for trunk?
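
For reference, a minimal usage sketch (illustrative only, not taken from
the patch or its testsuite; the function names are made up):

/* __builtin_assoc_barrier returns its argument but keeps
   -fassociative-math from reassociating across it, so (a + b) stays
   grouped.  In a template the barrier only matters once the function
   is instantiated, e.g. as sum<float> below, which is why it has to
   survive template substitution.  */
template <typename T>
T
sum (T a, T b, T c)
{
  return __builtin_assoc_barrier (a + b) + c;
}

float
use (float x, float y, float z)
{
  return sum (x, y, z);
}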


New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

* c-c++-common/builtin-assoc-barrier-1.c: New test.

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_constant_expression): Handle PAREN_EXPR
via cxx_eval_constant_expression.
* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_ASSOC_BARRIER.
* cp-tree.h: Adjust TREE_LANG_FLAG documentation to include
PAREN_EXPR in REF_PARENTHESIZED_P.
(REF_PARENTHESIZED_P): Add PAREN_EXPR.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_ASSOC_BARRIER.
* pt.c (tsubst_copy_and_build): If the PAREN_EXPR is not a
parenthesized initializer, build a new PAREN_EXPR.
* semantics.c (force_paren_expr): Simplify conditionals. Set
REF_PARENTHESIZED_P on PAREN_EXPR.
(maybe_undo_parenthesized_ref): Test PAREN_EXPR for
REF_PARENTHESIZED_P.

gcc/c-family/ChangeLog:

* c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
* c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.

gcc/c/ChangeLog:

* c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document __builtin_assoc_barrier.
---
 gcc/c-family/c-common.c   |  1 +
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-decl.c|  1 +
 gcc/c/c-parser.c  | 20 ++
 gcc/cp/constexpr.c|  8 +++
 gcc/cp/cp-objcp-common.c  |  1 +
 gcc/cp/cp-tree.h  | 12 ++--
 gcc/cp/parser.c   | 14 
 gcc/cp/pt.c   | 10 ++-
 gcc/cp/semantics.c| 23 ++
 gcc/doc/extend.texi   | 18 +
 .../c-c++-common/builtin-assoc-barrier-1.c| 71 +++
 12 files changed, 158 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 436df45df68..dd2a3d5da9e 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
+  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d5dad99ff97..c089fda12e4 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -108,7 +108,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,	 RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
   R

Re: [PATCH] dwarf2out, v2: Fix up field_byte_offset [PR101378]

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, 11 Nov 2021, Jakub Jelinek wrote:

> Hi!
> 
> Bootstrapped/regtested now successfully on x86_64-linux and i686-linux,
> verified the
> struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
> struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;
> 
> int
> main ()
> {
>   s.c = 0x55;
>   s.d = 0x;
>   t.c = 0x55;
>   t.d = 0x;
>   s.e++;
> }
> testcase is compiled the same way as before again, ok for trunk?

OK, also for affected branches.

Thanks,
Richard.

> > 2021-11-10  Jakub Jelinek  
> > 
> > PR debug/101378
> > * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
> > handling only for DECL_BIT_FIELD_TYPE decls.
> > 
> > * g++.dg/debug/dwarf2/pr101378.C: New test.
> > 
> > --- gcc/dwarf2out.c.jj  2021-11-05 10:19:46.339457342 +0100
> > +++ gcc/dwarf2out.c 2021-11-09 15:01:51.425437717 +0100
> > @@ -19646,6 +19646,7 @@ field_byte_offset (const_tree decl, stru
> >   properly dynamic byte offsets only when PCC bitfield type doesn't
> >   matter.  */
> >if (PCC_BITFIELD_TYPE_MATTERS
> > +  && DECL_BIT_FIELD_TYPE (decl)
> >&& TREE_CODE (DECL_FIELD_OFFSET (decl)) == INTEGER_CST)
> >  {
> >offset_int object_offset_in_bits;
> > --- gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C.jj 2021-11-09 
> > 15:17:39.504975396 +0100
> > +++ gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C2021-11-09 
> > 15:17:28.067137556 +0100
> > @@ -0,0 +1,13 @@
> > +// PR debug/101378
> > +// { dg-do compile { target c++11 } }
> > +// { dg-options "-gdwarf-5 -dA" }
> > +// { dg-final { scan-assembler-times "0\[^0-9x\\r\\n\]* 
> > DW_AT_data_member_location" 1 } }
> > +// { dg-final { scan-assembler-times "1\[^0-9x\\r\\n\]* 
> > DW_AT_data_member_location" 1 } }
> > +// { dg-final { scan-assembler-times "2\[^0-9x\\r\\n\]* 
> > DW_AT_data_member_location" 1 } }
> > +// { dg-final { scan-assembler-not "-1\[^0-9x\\r\\n\]* 
> > DW_AT_data_member_location" } }
> > +
> > +struct E {};
> > +struct S
> > +{
> > +  [[no_unique_address]] E e, f, g;
> > +} s;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] Adjust CPP_FOR_BUILD

2021-11-11 Thread Pekka Seppänen

Hi.

CPP/CPPFLAGS were changed by commit 
84401ce5fb4ecab55decb472b168100e7593e01f.  That commit uses CPP as a 
default for CPP_FOR_BUILD.  Unless CPP is defined, GNU make defaults CPP 
as `$(CC) -E'.  Given the context, this is now incorrect, since 
CC_FOR_BUILD should be used.


Fixes PR103011.

-- Pekka


gcc/Changelog:

  * configure: Regenerate.
  * configure.ac: For CPP_FOR_BUILD use $(CC_FOR_BUILD) -E instead of 
$(CPP).


---
 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 58979d6e3b1..a5eca91fb2a 100755
--- a/configure
+++ b/configure
@@ -4092,7 +4092,7 @@ if test "${build}" != "${host}" ; then
   AR_FOR_BUILD=${AR_FOR_BUILD-ar}
   AS_FOR_BUILD=${AS_FOR_BUILD-as}
   CC_FOR_BUILD=${CC_FOR_BUILD-gcc}
-  CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CPP)}"
+  CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CC_FOR_BUILD) -E}"
   CXX_FOR_BUILD=${CXX_FOR_BUILD-g++}
   DSYMUTIL_FOR_BUILD=${DSYMUTIL_FOR_BUILD-dsymutil}
   GFORTRAN_FOR_BUILD=${GFORTRAN_FOR_BUILD-gfortran}
diff --git a/configure.ac b/configure.ac
index 550e6993b59..b8055dad573 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1334,7 +1334,7 @@ if test "${build}" != "${host}" ; then
   AR_FOR_BUILD=${AR_FOR_BUILD-ar}
   AS_FOR_BUILD=${AS_FOR_BUILD-as}
   CC_FOR_BUILD=${CC_FOR_BUILD-gcc}
-  CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CPP)}"
+  CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CC_FOR_BUILD) -E}"
   CXX_FOR_BUILD=${CXX_FOR_BUILD-g++}
   DSYMUTIL_FOR_BUILD=${DSYMUTIL_FOR_BUILD-dsymutil}
   GFORTRAN_FOR_BUILD=${GFORTRAN_FOR_BUILD-gfortran}


[committed] openmp: Add support for 2 argument num_teams clause

2021-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

In OpenMP 5.1, the num_teams clause can accept either one expression as before,
but in that case it changed meaning: rather than creating <= expression
teams it now creates == expression teams.  Or it accepts two expressions
separated by :, where the first is a lower bound and the second an upper
bound on how many teams should be created.  The other ways to set the number
of teams are upper bounds with a lower bound of 1.
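
As a quick illustration of the two accepted forms (a sketch only; the
function and variable names are made up):

void
use_teams (int lo, int hi)
{
  /* OpenMP 5.1 single-expression form: request exactly 'hi' teams.  */
  #pragma omp teams num_teams (hi)
  { }

  /* New two-expression form: between 'lo' and 'hi' teams.  */
  #pragma omp teams num_teams (lo : hi)
  { }
}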

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now: we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that, though, and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
#pragma omp teams firstprivate (b) num_teams (baz ())
{
  bar (a, b);
}
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
unsigned int retval.3;
struct S * D.2549;
struct S b;

retval.3 = (unsigned int) retval.1;
D.2549 = .omp_data_i->b;
S::S (&b, D.2549);
#pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
__builtin_GOMP_teams (retval.3, 0);
{
  bar (&a, &b);
}
S::~S (&b);
#pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs
the equivalent of exit (at least on NVPTX), which seems bad because that
means the destructors of e.g. private variables on the target aren't
invoked, and at the current placement neither are the destructors of the
already constructed privatized variables in teams.
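
A rough sketch of the proposed GOMP_teams3 entry point, for reference (a
proposal only - it does not exist in libgomp yet, and the parameter names
are made up):

/* Returns whether another team should still execute the teams body,
   so the caller can loop on it as in the while example above.  */
extern bool GOMP_teams3 (unsigned int num_teams_low,
                         unsigned int num_teams_high,
                         unsigned int thread_limit);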
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x.  I think it would be ideal to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that: are .shared variables zero initialized by default,
or do they have a random value at start?  PTX docs say they aren't
initializable.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-11-11  Jakub Jelinek  

gcc/
* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
* tree.c (omp_clause_num_ops): Increase num ops for
OMP_CLAUSE_NUM_TEAMS to 2.
* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
for OMP_CLAUSE_NUM_TEAMS.
* gimplify.c (gimplify_scan_omp_clauses): Gimplify
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
* semantics.c (finish_omp_clauses): Handle
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
* pt.c (tsubst_omp_clauses): Likewise.
(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use
OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
to half of the num_teams clauses.
* c-c++-common/gomp/num-teams-1.c: New test.
* c-c++-common/gomp/num-teams-2.c: New test.
* g++.dg/gomp/

Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-11-11 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 9 Nov 2021 at 20:27, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 4 Nov 2021 at 14:19, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Wed, 20 Oct 2021 at 15:05, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford
> >> >> >  wrote:
> >> >> >>
> >> >> >> Prathamesh Kulkarni  writes:
> >> >> >> > Hi,
> >> >> >> > The attached patch emits a more verbose diagnostic for target 
> >> >> >> > attribute that
> >> >> >> > is an architecture extension needing a leading '+'.
> >> >> >> >
> >> >> >> > For the following test,
> >> >> >> > void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >> >
> >> >> >> > With patch, the compiler now emits:
> >> >> >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with 
> >> >> >> > ‘+’
> >> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >> >   | ^~~~
> >> >> >> >
> >> >> >> > instead of:
> >> >> >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not 
> >> >> >> > valid
> >> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >> >   | ^~~~
> >> >> >>
> >> >> >> Nice :-)
> >> >> >>
> >> >> >> > (This isn't specific to sve though).
> >> >> >> > OK to commit after bootstrap+test ?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Prathamesh
> >> >> >> >
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.c 
> >> >> >> > b/gcc/config/aarch64/aarch64.c
> >> >> >> > index a9a1800af53..975f7faf968 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64.c
> >> >> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> >> >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
> >> >> >> >num_attrs++;
> >> >> >> >if (!aarch64_process_one_target_attr (token))
> >> >> >> >   {
> >> >> >> > -   error ("pragma or attribute % is not 
> >> >> >> > valid", token);
> >> >> >> > +   /* Check if token is possibly an arch extension without
> >> >> >> > +  leading '+'.  */
> >> >> >> > +   char *str = (char *) xmalloc (strlen (token) + 2);
> >> >> >> > +   str[0] = '+';
> >> >> >> > +   strcpy(str + 1, token);
> >> >> >>
> >> >> >> I think std::string would be better here, e.g.:
> >> >> >>
> >> >> >>   auto with_plus = std::string ("+") + token;
> >> >> >>
> >> >> >> > +   if (aarch64_handle_attr_isa_flags (str))
> >> >> >> > + error("arch extension %<%s%> should be prepended with 
> >> >> >> > %<+%>", token);
> >> >> >>
> >> >> >> Nit: should be a space before the “(”.
> >> >> >>
> >> >> >> In principle, a fixit hint would have been nice here, but I don't 
> >> >> >> think
> >> >> >> we have enough information to provide one.  (Just saying for the 
> >> >> >> record.)
> >> >> > Thanks for the suggestions.
> >> >> > Does the attached patch look OK ?
> >> >>
> >> >> Looks good apart from a couple of formatting nits.
> >> >> >
> >> >> > Thanks,
> >> >> > Prathamesh
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Richard
> >> >> >>
> >> >> >> > +   else
> >> >> >> > + error ("pragma or attribute % is not 
> >> >> >> > valid", token);
> >> >> >> > +   free (str);
> >> >> >> > return false;
> >> >> >> >   }
> >> >> >> >
> >> >> >
> >> >> > [aarch64] PR102376 - Emit better diagnostics for arch extension in 
> >> >> > target attribute.
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >   PR target/102376
> >> >> >   * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): 
> >> >> > Change str's
> >> >> >   type to const char *.
> >> >> >   (aarch64_process_target_attr): Check if token is possibly an 
> >> >> > arch extension
> >> >> >   without leading '+' and emit diagnostic accordingly.
> >> >> >
> >> >> > gcc/testsuite/ChangeLog:
> >> >> >   PR target/102376
> >> >> >   * gcc.target/aarch64/pr102376.c: New test.
> >> >> > diff --git a/gcc/config/aarch64/aarch64.c 
> >> >> > b/gcc/config/aarch64/aarch64.c
> >> >> > index a9a1800af53..b72079bc466 100644
> >> >> > --- a/gcc/config/aarch64/aarch64.c
> >> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> >> > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
> >> >> > modified.  */
> >> >> >
> >> >> >  static bool
> >> >> > -aarch64_handle_attr_isa_flags (char *str)
> >> >> > +aarch64_handle_attr_isa_flags (const char *str)
> >> >> >  {
> >> >> >enum aarch64_parse_opt_result parse_res;
> >> >> >uint64_t isa_flags = aarch64_isa_flags;
> >> >> > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
> >> >> >num_attrs++;
> >> >> >if (!aarch64_process_one_target_attr (token))
> >> >> >   {
> >> >> > -   error ("pragma or attribute % is not valid", 
> >> >> > token);
> >> >> > +   /* Check if token is possibly an arch extension without
> >> >> > +  leading '+'.  */
> >> >> > +   auto with_plus = std::string("+") + token;
> >> 

[PATCH] middle-end/103181 - fix operation_could_trap_p for vector division

2021-11-11 Thread Richard Biener via Gcc-patches
For integer vector division we only checked for all-zero vector
constants rather than checking whether any element of the constant
vector is zero.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-11  Richard Biener  

PR middle-end/103181
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division.  Separate floating point and integer division code.

* gcc.dg/torture/pr103181.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr103181.c | 24 +++
 gcc/tree-eh.c   | 26 -
 2 files changed, 45 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr103181.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr103181.c 
b/gcc/testsuite/gcc.dg/torture/pr103181.c
new file mode 100644
index 000..6bc705ab52e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr103181.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+typedef unsigned char __attribute__((__vector_size__ (2))) U;
+typedef unsigned short S;
+typedef unsigned int __attribute__((__vector_size__ (64))) V;
+
+V v;
+U a, b, c;
+
+U
+foo (S s)
+{
+  v += __builtin_bswap16 (s) || (S) (a / ((U){3, 0}));
+  return b + c;
+}
+
+int
+main (void)
+{
+  U x = foo (4);
+  if (x[0] || x[1])
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 3a09de95025..3eff07fc8fe 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -2454,15 +2454,31 @@ operation_could_trap_helper_p (enum tree_code op,
 case FLOOR_MOD_EXPR:
 case ROUND_MOD_EXPR:
 case TRUNC_MOD_EXPR:
-case RDIV_EXPR:
-  if (honor_snans)
-   return true;
-  if (fp_operation)
-   return flag_trapping_math;
   if (!TREE_CONSTANT (divisor) || integer_zerop (divisor))
 return true;
+  if (TREE_CODE (divisor) == VECTOR_CST)
+   {
+ /* Inspired by initializer_each_zero_or_onep.  */
+ unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (divisor);
+ if (VECTOR_CST_STEPPED_P (divisor)
+ && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (divisor))
+   .is_constant (&nelts))
+   return true;
+ for (unsigned int i = 0; i < nelts; ++i)
+   {
+ tree elt = vector_cst_elt (divisor, i);
+ if (integer_zerop (elt))
+   return true;
+   }
+   }
   return false;
 
+case RDIV_EXPR:
+  if (honor_snans)
+   return true;
+  gcc_assert (fp_operation);
+  return flag_trapping_math;
+
 case LT_EXPR:
 case LE_EXPR:
 case GT_EXPR:
-- 
2.31.1


Re: [PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-11 Thread Jan Hubicka via Gcc-patches
> > > 
> > > I think the patch causes the following on x86_64-linux-gnu:
> > > FAIL: gfortran.dg/inline_matmul_17.f90   -O   scan-tree-dump-times 
> > > optimized "matmul_r4" 2
> > 
> > I get that failure even with d70ef65692f (from before the patches
> > I committed today).
> 
> Sorry, you are right, it's one revision before:
> d70ef65692fced7ab72e0aceeff7407e5a34d96d
> 
> Honza, can you please take a look?
The test looks for matmul_r4 calls, which we now optimize out in fre1.
This is because alias info is better now:

  afunc (&__var_5_mma); 
  _188 = __var_5_mma.dim[0].ubound; 
  _189 = __var_5_mma.dim[0].lbound; 
  _190 = _188 - _189;   
  _191 = _190 + 1;  
  _192 = MAX_EXPR <_191, 0>;
  _193 = (real(kind=4)) _192;   
  _194 = __var_5_mma.dim[1].ubound; 
  _195 = __var_5_mma.dim[1].lbound; 
  _196 = _194 - _195;   
  _197 = _196 + 1;  
  _198 = MAX_EXPR <_197, 0>;
  _199 = (real(kind=4)) _198;   
  _200 = _193 * _199;   
  _201 = _200 * 3.0e+0; 
  if (_201 <= 1.0e+9)   
goto ; [INV] 
  else  
goto ; [INV] 
   : 
  c = {};   


  afunc (&__var_5_mma); 
  c = {};   

Now afunc writes to __var_5_mma only indirectly, so I think it is correct
that we optimize the conditional out.

An easy fix would be to add -fno-ipa-modref, but perhaps someone with a
better understanding of Fortran could help me improve the testcase so that
the calls to matmul_r4 remain reachable?

Honza


[PATCH] aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares type-qualified builtins and uses them for MLA/MLS
Neon intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-08  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtin generators for unsigned MLA/MLS intrinsics.
* config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
builtin.
(vmla_n_u32): Likewise.
(vmla_u8): Likewise.
(vmla_u16): Likewise.
(vmla_u32): Likewise.
(vmlaq_n_u16): Likewise.
(vmlaq_n_u32): Likewise.
(vmlaq_u8): Likewise.
(vmlaq_u16): Likewise.
(vmlaq_u32): Likewise.
(vmls_n_u16): Likewise.
(vmls_n_u32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_n_u16): Likewise.
(vmlsq_n_u32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.


rb15027.patch
Description: rb15027.patch


Re: [PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, Nov 11, 2021 at 10:45 AM Jan Hubicka via Gcc-patches
 wrote:
>
> > > >
> > > > I think the patch causes the following on x86_64-linux-gnu:
> > > > FAIL: gfortran.dg/inline_matmul_17.f90   -O   scan-tree-dump-times 
> > > > optimized "matmul_r4" 2
> > >
> > > I get that failure even with d70ef65692f (from before the patches
> > > I committed today).
> >
> > Sorry, you are right, it's one revision before:
> > d70ef65692fced7ab72e0aceeff7407e5a34d96d
> >
> > Honza, can you please take a look?
> The test looks for matmul_r4 calls which we now optimize out in fre1.
> This is because alias info is better now
>
>   afunc (&__var_5_mma);
>   _188 = __var_5_mma.dim[0].ubound;
>   _189 = __var_5_mma.dim[0].lbound;
>   _190 = _188 - _189;
>   _191 = _190 + 1;
>   _192 = MAX_EXPR <_191, 0>;
>   _193 = (real(kind=4)) _192;
>   _194 = __var_5_mma.dim[1].ubound;
>   _195 = __var_5_mma.dim[1].lbound;
>   _196 = _194 - _195;
>   _197 = _196 + 1;
>   _198 = MAX_EXPR <_197, 0>;
>   _199 = (real(kind=4)) _198;
>   _200 = _193 * _199;
>   _201 = _200 * 3.0e+0;
>   if (_201 <= 1.0e+9)
> goto ; [INV]
>   else
> goto ; [INV]
>:
>   c = {};
>
>
>   afunc (&__var_5_mma);
>   c = {};
>
> Now afunc writes to __var_5_mma only indirectly so I think it is correct that
> we optimize the conditional out.
>
> Easy fix would be to add -fno-ipa-modref, but perhaps someone with
> better understanding of Fortran would help me to improve the testcase so
> the calls to matmul_r4 remains reachable?

I think the two matmul_r4 cases were missed optimizations before, so just
changing the expected number of calls to zero is the correct fix here.  Indeed,
we can now statically determine that the matrices are not large and so only
keep the inline copy.

Richard.

>
> Honza


[PATCH] aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares poly type-qualified builtins and uses them for
PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-08  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use poly type
qualifier in builtin generator macros.
* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
builtin and remove casts.
(vmulq_p8): Likewise.
(vmull_high_p8): Likewise.
(vmull_p8): Likewise.


rb15030.patch
Description: rb15030.patch


[PATCH] aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them for
XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-08  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
type-qualified builtins for XTN[2].
* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
qualified builtin and remove casts.
(vmovn_high_u32): Likewise.
(vmovn_high_u64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.


rb15031.patch
Description: rb15031.patch


Re: [PATCH] aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares type-qualified builtins and uses them for MLA/MLS
> Neon intrinsics that operate on unsigned types. This eliminates lots of
> casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-08  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare type-
> qualified builtin generators for unsigned MLA/MLS intrinsics.
> * config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
> builtin.
> (vmla_n_u32): Likewise.
> (vmla_u8): Likewise.
> (vmla_u16): Likewise.
> (vmla_u32): Likewise.
> (vmlaq_n_u16): Likewise.
> (vmlaq_n_u32): Likewise.
> (vmlaq_u8): Likewise.
> (vmlaq_u16): Likewise.
> (vmlaq_u32): Likewise.
> (vmls_n_u16): Likewise.
> (vmls_n_u32): Likewise.
> (vmls_u8): Likewise.
> (vmls_u16): Likewise.
> (vmls_u32): Likewise.
> (vmlsq_n_u16): Likewise.
> (vmlsq_n_u32): Likewise.
> (vmlsq_u8): Likewise.
> (vmlsq_u16): Likewise.
> (vmlsq_u32): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 4a7e2cf4125fe674dbb31c8f068b3b9970e9ea80..cdc44f0a22fd29715472e5b2dfe6a19ad0c729dd
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -238,13 +238,17 @@
>  
>/* Implemented by aarch64_mla.  */
>BUILTIN_VDQ_BHSI (TERNOP, mla, 0, NONE)
> +  BUILTIN_VDQ_BHSI (TERNOPU, mla, 0, NONE)
>/* Implemented by aarch64_mla_n.  */
>BUILTIN_VDQHS (TERNOP, mla_n, 0, NONE)
> +  BUILTIN_VDQHS (TERNOPU, mla_n, 0, NONE)
>  
>/* Implemented by aarch64_mls.  */
>BUILTIN_VDQ_BHSI (TERNOP, mls, 0, NONE)
> +  BUILTIN_VDQ_BHSI (TERNOPU, mls, 0, NONE)
>/* Implemented by aarch64_mls_n.  */
>BUILTIN_VDQHS (TERNOP, mls_n, 0, NONE)
> +  BUILTIN_VDQHS (TERNOPU, mls_n, 0, NONE)
>  
>/* Implemented by aarch64_shrn".  */
>BUILTIN_VQN (SHIFTIMM, shrn, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 398a2e3a021fc488519acf6b54ff114805340e8a..de29b3b7da9a2ab16f6c5bdc832907df5deb7d61
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -6608,18 +6608,14 @@ __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c)
>  {
> -  return (uint16x4_t) __builtin_aarch64_mla_nv4hi ((int16x4_t) __a,
> -   (int16x4_t) __b,
> -   (int16_t) __c);
> +  return __builtin_aarch64_mla_nv4hi_ (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c)
>  {
> -  return (uint32x2_t) __builtin_aarch64_mla_nv2si ((int32x2_t) __a,
> -   (int32x2_t) __b,
> -   (int32_t) __c);
> +  return __builtin_aarch64_mla_nv2si_ (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline int8x8_t
> @@ -6647,27 +6643,21 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
>  {
> -  return (uint8x8_t) __builtin_aarch64_mlav8qi ((int8x8_t) __a,
> -(int8x8_t) __b,
> -(int8x8_t) __c);
> +  return __builtin_aarch64_mlav8qi_ (__a, __b,  __c);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
>  {
> -  return (uint16x4_t) __builtin_aarch64_mlav4hi ((int16x4_t) __a,
> - (int16x4_t) __b,
> - (int16x4_t) __c);
> +  return __builtin_aarch64_mlav4hi_ (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
>  {
> -  return (uint32x2_t) __builtin_aarch64_mlav2si ((int32x2_t) __a,
> - (int32x2_t) __b,
> - (int32x2_t) __c);
> +  return __builtin_aarch64_mlav2si_ (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline int32x4_t
> @@ -6955,18 +6945,14 @@ __extension__ extern __inline uint16x8_t
>  __attribute__ 

Re: [PATCH] aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares poly type-qualified builtins and uses them for
> PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-08  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Use poly type
> qualifier in builtin generator macros.
> * config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
> builtin and remove casts.
> (vmulq_p8): Likewise.
> (vmull_high_p8): Likewise.
> (vmull_p8): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> cdc44f0a22fd29715472e5b2dfe6a19ad0c729dd..35e065fe938e6a6d488dc1b0f084f6ddf2d3618f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -45,9 +45,9 @@
>  
>BUILTIN_VDC (COMBINE, combine, 0, AUTO_FP)
>VAR1 (COMBINEP, combine, 0, NONE, di)
> -  BUILTIN_VB (BINOP, pmul, 0, NONE)
> -  VAR1 (BINOP, pmull, 0, NONE, v8qi)
> -  VAR1 (BINOP, pmull_hi, 0, NONE, v16qi)
> +  BUILTIN_VB (BINOPP, pmul, 0, NONE)
> +  VAR1 (BINOPP, pmull, 0, NONE, v8qi)
> +  VAR1 (BINOPP, pmull_hi, 0, NONE, v16qi)
>BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0, FP)
>BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP)
>BUILTIN_VDQ_I (BINOP, addp, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> de29b3b7da9a2ab16f6c5bdc832907df5deb7d61..b4a8ec3e328b138c0f368f60bf2534fb10126bd5
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -1007,8 +1007,7 @@ __extension__ extern __inline poly8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmul_p8 (poly8x8_t __a, poly8x8_t __b)
>  {
> -  return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a,
> -  (int8x8_t) __b);
> +  return __builtin_aarch64_pmulv8qi_ppp (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x16_t
> @@ -1071,8 +1070,7 @@ __extension__ extern __inline poly8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmulq_p8 (poly8x16_t __a, poly8x16_t __b)
>  {
> -  return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a,
> -(int8x16_t) __b);
> +  return __builtin_aarch64_pmulv16qi_ppp (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x8_t
> @@ -7716,8 +7714,7 @@ __extension__ extern __inline poly16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmull_high_p8 (poly8x16_t __a, poly8x16_t __b)
>  {
> -  return (poly16x8_t) __builtin_aarch64_pmull_hiv16qi ((int8x16_t) __a,
> -(int8x16_t) __b);
> +  return __builtin_aarch64_pmull_hiv16qi_ppp (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x8_t
> @@ -7850,8 +7847,7 @@ __extension__ extern __inline poly16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmull_p8 (poly8x8_t __a, poly8x8_t __b)
>  {
> -  return (poly16x8_t) __builtin_aarch64_pmullv8qi ((int8x8_t) __a,
> -(int8x8_t) __b);
> +  return __builtin_aarch64_pmullv8qi_ppp (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x8_t


[PATCH] aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them for
[R]SHRN[2] Neon intrinsics. This removes the need for casts in
arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-08  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for [R]SHRN[2].
* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
builtin and remove casts.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.


rb15032.patch
Description: rb15032.patch


Re: [PATCH] aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them for
> XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-08  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned
> type-qualified builtins for XTN[2].
> * config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
> qualified builtin and remove casts.
> (vmovn_high_u32): Likewise.
> (vmovn_high_u64): Likewise.
> (vmovn_u16): Likewise.
> (vmovn_u32): Likewise.
> (vmovn_u64): Likewise.


OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 35e065fe938e6a6d488dc1b0f084f6ddf2d3618f..5e6df6abe3f5b42710a266d0b2a7a1e4597975a6
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -235,6 +235,7 @@
>  
>/* Implemented by aarch64_xtn.  */
>BUILTIN_VQN (UNOP, xtn, 0, NONE)
> +  BUILTIN_VQN (UNOPU, xtn, 0, NONE)
>  
>/* Implemented by aarch64_mla.  */
>BUILTIN_VDQ_BHSI (TERNOP, mla, 0, NONE)
> @@ -489,7 +490,8 @@
>BUILTIN_VSDQ_I (USHIFTIMM, uqshl_n, 0, NONE)
>  
>/* Implemented by aarch64_xtn2.  */
> -  BUILTIN_VQN (UNOP, xtn2, 0, NONE)
> +  BUILTIN_VQN (BINOP, xtn2, 0, NONE)
> +  BUILTIN_VQN (BINOPU, xtn2, 0, NONE)
>  
>/* Implemented by vec_unpack_hi_.  */
>BUILTIN_VQW (UNOP, vec_unpacks_hi_, 10, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> b4a8ec3e328b138c0f368f60bf2534fb10126bd5..51cedab19d8d1c261fbcf9a6d3202c2e1b513183
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7522,24 +7522,21 @@ __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_high_u16 (uint8x8_t __a, uint16x8_t __b)
>  {
> -  return (uint8x16_t)
> -__builtin_aarch64_xtn2v8hi ((int8x8_t) __a, (int16x8_t) __b);
> +  return __builtin_aarch64_xtn2v8hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_high_u32 (uint16x4_t __a, uint32x4_t __b)
>  {
> -  return (uint16x8_t)
> -__builtin_aarch64_xtn2v4si ((int16x4_t) __a, (int32x4_t) __b);
> +  return __builtin_aarch64_xtn2v4si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_high_u64 (uint32x2_t __a, uint64x2_t __b)
>  {
> -  return (uint32x4_t)
> -__builtin_aarch64_xtn2v2di ((int32x2_t) __a, (int64x2_t) __b);
> +  return __builtin_aarch64_xtn2v2di_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x8_t
> @@ -7567,21 +7564,21 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_u16 (uint16x8_t __a)
>  {
> -  return (uint8x8_t)__builtin_aarch64_xtnv8hi ((int16x8_t) __a);
> +  return __builtin_aarch64_xtnv8hi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_u32 (uint32x4_t __a)
>  {
> -  return (uint16x4_t) __builtin_aarch64_xtnv4si ((int32x4_t )__a);
> +  return __builtin_aarch64_xtnv4si_uu (__a);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmovn_u64 (uint64x2_t __a)
>  {
> -  return (uint32x2_t) __builtin_aarch64_xtnv2di ((int64x2_t) __a);
> +  return __builtin_aarch64_xtnv2di_uu (__a);
>  }
>  
>  __extension__ extern __inline int8x8_t


[PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement widening-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uadd[lw][2] builtins.
* config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
cast.
(vaddl_s16): Likewise.
(vaddl_s32): Likewise.
(vaddl_u8): Use type-qualified builtin and remove casts.
(vaddl_u16): Likewise.
(vaddl_u32): Likewise.
(vaddl_high_s8): Remove unnecessary cast.
(vaddl_high_s16): Likewise.
(vaddl_high_s32): Likewise.
(vaddl_high_u8): Use type-qualified builtin and remove casts.
(vaddl_high_u16): Likewise.
(vaddl_high_u32): Likewise.
(vaddw_s8): Remove unnecessary cast.
(vaddw_s16): Likewise.
(vaddw_s32): Likewise.
(vaddw_u8): Use type-qualified builtin and remove casts.
(vaddw_u16): Likewise.
(vaddw_u32): Likewise.
(vaddw_high_s8): Remove unnecessary cast.
(vaddw_high_s16): Likewise.
(vaddw_high_s32): Likewise.
(vaddw_high_u8): Use type-qualified builtin and remove casts.
(vaddw_high_u16): Likewise.
(vaddw_high_u32): Likewise.


rb15033.patch
Description: rb15033.patch


[PATCH] aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement widening-subtract Neon intrinsics. This removes the need
for many casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for usub[lw][2] builtins.
* config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
cast.
(vsubl_s16): Likewise.
(vsubl_s32): Likewise.
(vsubl_u8): Use type-qualified builtin and remove casts.
(vsubl_u16): Likewise.
(vsubl_u32): Likewise.
(vsubl_high_s8): Remove unnecessary cast.
(vsubl_high_s16): Likewise.
(vsubl_high_s32): Likewise.
(vsubl_high_u8): Use type-qualified builtin and remove casts.
(vsubl_high_u16): Likewise.
(vsubl_high_u32): Likewise.
(vsubw_s8): Remove unnecessary casts.
(vsubw_s16): Likewise.
(vsubw_s32): Likewise.
(vsubw_u8): Use type-qualified builtin and remove casts.
(vsubw_u16): Likewise.
(vsubw_u32): Likewise.
(vsubw_high_s8): Remove unnecessary cast.
(vsubw_high_s16): Likewise.
(vsubw_high_s32): Likewise.
(vsubw_high_u8): Use type-qualified builtin and remove casts.
(vsubw_high_u16): Likewise.
(vsubw_high_u32): Likewise.


rb15034.patch
Description: rb15034.patch


Re: [PATCH] aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Thus patch declares unsigned type-qualified builtins and uses them for
> [R]SHRN[2] Neon intrinsics. This removes the need for casts in
> arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-08  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare type-
> qualified builtins for [R]SHRN[2].
> * config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
> builtin and remove casts.
> (vshrn_n_u32): Likewise.
> (vshrn_n_u64): Likewise.
> (vrshrn_high_n_u16): Likewise.
> (vrshrn_high_n_u32): Likewise.
> (vrshrn_high_n_u64): Likewise.
> (vrshrn_n_u16): Likewise.
> (vrshrn_n_u32): Likewise.
> (vrshrn_n_u64): Likewise.
> (vshrn_high_n_u16): Likewise.
> (vshrn_high_n_u32): Likewise.
> (vshrn_high_n_u64): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 5e6df6abe3f5b42710a266d0b2a7a1e4597975a6..46ec2f9bfc509e5e460334d4c5324ddf18703639
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -253,15 +253,19 @@
>  
>/* Implemented by aarch64_shrn".  */
>BUILTIN_VQN (SHIFTIMM, shrn, 0, NONE)
> +  BUILTIN_VQN (USHIFTIMM, shrn, 0, NONE)
>  
>/* Implemented by aarch64_shrn2.  */
> -  BUILTIN_VQN (SHIFTACC, shrn2, 0, NONE)
> +  BUILTIN_VQN (SHIFT2IMM, shrn2, 0, NONE)
> +  BUILTIN_VQN (USHIFT2IMM, shrn2, 0, NONE)
>  
>/* Implemented by aarch64_rshrn".  */
>BUILTIN_VQN (SHIFTIMM, rshrn, 0, NONE)
> +  BUILTIN_VQN (USHIFTIMM, rshrn, 0, NONE)
>  
>/* Implemented by aarch64_rshrn2.  */
> -  BUILTIN_VQN (SHIFTACC, rshrn2, 0, NONE)
> +  BUILTIN_VQN (SHIFT2IMM, rshrn2, 0, NONE)
> +  BUILTIN_VQN (USHIFT2IMM, rshrn2, 0, NONE)
>  
>/* Implemented by aarch64_mlsl.  */
>BUILTIN_VD_BHSI (TERNOP, smlsl, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 51cedab19d8d1c261fbcf9a6d3202c2e1b513183..37f02e2a24fbc85f23ea73e2fd0e06deac7db87e
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7606,21 +7606,21 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vshrn_n_u16 (uint16x8_t __a, const int __b)
>  {
> -  return (uint8x8_t)__builtin_aarch64_shrnv8hi ((int16x8_t)__a, __b);
> +  return __builtin_aarch64_shrnv8hi_uus (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vshrn_n_u32 (uint32x4_t __a, const int __b)
>  {
> -  return (uint16x4_t)__builtin_aarch64_shrnv4si ((int32x4_t)__a, __b);
> +  return __builtin_aarch64_shrnv4si_uus (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vshrn_n_u64 (uint64x2_t __a, const int __b)
>  {
> -  return (uint32x2_t)__builtin_aarch64_shrnv2di ((int64x2_t)__a, __b);
> +  return __builtin_aarch64_shrnv2di_uus (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x4_t
> @@ -8387,24 +8387,21 @@ __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrshrn_high_n_u16 (uint8x8_t __a, uint16x8_t __b, const int __c)
>  {
> -  return (uint8x16_t) __builtin_aarch64_rshrn2v8hi ((int8x8_t) __a,
> - (int16x8_t) __b, __c);
> +  return __builtin_aarch64_rshrn2v8hi_uuus (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrshrn_high_n_u32 (uint16x4_t __a, uint32x4_t __b, const int __c)
>  {
> -  return (uint16x8_t) __builtin_aarch64_rshrn2v4si ((int16x4_t) __a,
> - (int32x4_t) __b, __c);
> +  return __builtin_aarch64_rshrn2v4si_uuus (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrshrn_high_n_u64 (uint32x2_t __a, uint64x2_t __b, const int __c)
>  {
> -  return (uint32x4_t) __builtin_aarch64_rshrn2v2di ((int32x2_t)__a,
> - (int64x2_t)__b, __c);
> +  return __builtin_aarch64_rshrn2v2di_uuus (__a, __b, __c);
>  }
>  
>  __extension__ extern __inline int8x8_t
> @@ -8432,21 +8429,21 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrshrn_n_u16 (uint16x8_t __a, const int __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_rshrnv8hi ((int16x8_t) __a, __b);
> +  return __builtin_aarch64_rshrnv8hi_uus (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x

[PATCH] aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.
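
As a concrete illustration (abridged from the attached patch), the
vhadd_u8 implementation goes from a cast-heavy call of the existing
builtin to a direct call of the new unsigned-qualified builtin:

  /* Before: unqualified builtin, casts on arguments and result.  */
  vhadd_u8 (uint8x8_t __a, uint8x8_t __b)
  {
    return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
                                                    (int8x8_t) __b);
  }

  /* After: type-qualified builtin, no casts.  */
  vhadd_u8 (uint8x8_t __a, uint8x8_t __b)
  {
    return __builtin_aarch64_uhaddv8qi_uuu (__a, __b);
  }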

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for u[r]hadd builtins.
* config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
cast.
(vhadd_s16): Likewise.
(vhadd_s32): Likewise.
(vhadd_u8): Use type-qualified builtin and remove casts.
(vhadd_u16): Likewise.
(vhadd_u32): Likewise.
(vhaddq_s8): Remove unnecessary cast.
(vhaddq_s16): Likewise.
(vhaddq_s32): Likewise.
(vhaddq_u8): Use type-qualified builtin and remove casts.
(vhaddq_u16): Likewise.
(vhaddq_u32): Likewise.
(vrhadd_s8): Remove unnecessary cast.
(vrhadd_s16): Likewise.
(vrhadd_s32): Likewise.
(vrhadd_u8): Use type-qualified builtin and remove casts.
(vrhadd_u16): Likewise.
(vrhadd_u32): Likewise.
(vrhaddq_s8): Remove unnecessary cast.
(vrhaddq_s16): Likewise.
(vrhaddq_s32): Likewise.
(vrhaddq_u8): Use type-qualified builtin and remove casts.
(vrhaddq_u16): Likewise.
(vrhaddq_u32): Likewise.


rb15035.patch
Description: rb15035.patch


[PATCH] aarch64: Use type-qualified builtins for UHSUB Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement halving-subtract Neon intrinsics. This removes the need for
many casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uhsub builtins.
* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
cast.
(vhsub_s16): Likewise.
(vhsub_s32): Likewise.
(vhsub_u8): Use type-qualified builtin and remove casts.
(vhsub_u16): Likewise.
(vhsub_u32): Likewise.
(vhsubq_s8): Remove unnecessary cast.
(vhsubq_s16): Likewise.
(vhsubq_s32): Likewise.
(vhsubq_u8): Use type-qualified builtin and remove casts.
(vhsubq_u16): Likewise.
(vhsubq_u32): Likewise.


rb15036.patch
Description: rb15036.patch


[PATCH] aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-add Neon intrinsics. This
removes the need for many casts in arm_neon.h.
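
For reference, a scalar model of what one lane of the u16 variants
computes (illustrative sketch only, not part of the patch):

  /* vaddhn_u16: high half of the per-lane sum, truncated to 8 bits.  */
  static inline unsigned char
  addhn_lane (unsigned short a, unsigned short b)
  {
    return (unsigned char) (((unsigned) a + b) >> 8);
  }

  /* vraddhn_u16: same, but rounded before narrowing.  */
  static inline unsigned char
  raddhn_lane (unsigned short a, unsigned short b)
  {
    return (unsigned char) (((unsigned) a + b + 0x80) >> 8);
  }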

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]addhn[2].
* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
cast.
(vaddhn_s32): Likewise.
(vaddhn_s64): Likewise.
(vaddhn_u16): Use type-qualified builtin and remove casts.
(vaddhn_u32): Likewise.
(vaddhn_u64): Likewise.
(vraddhn_s16): Remove unnecessary cast.
(vraddhn_s32): Likewise.
(vraddhn_s64): Likewise.
(vraddhn_u16): Use type-qualified builtin and remove casts.
(vraddhn_u32): Likewise.
(vraddhn_u64): Likewise.
(vaddhn_high_s16): Remove unnecessary cast.
(vaddhn_high_s32): Likewise.
(vaddhn_high_s64): Likewise.
(vaddhn_high_u16): Use type-qualified builtin and remove
casts.
(vaddhn_high_u32): Likewise.
(vaddhn_high_u64): Likewise.
(vraddhn_high_s16): Remove unnecessary cast.
(vraddhn_high_s32): Likewise.
(vraddhn_high_s64): Likewise.
(vraddhn_high_u16): Use type-qualified builtin and remove
casts.
(vraddhn_high_u32): Likewise.
(vraddhn_high_u64): Likewise.


rb15037.patch
Description: rb15037.patch


Re: [PATCH] aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement widening-subtract Neon intrinsics. This removes the need
> for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
> qualifiers in generator macros for usub[lw][2] builtins.
> * config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
> cast.
> (vsubl_s16): Likewise.
> (vsubl_s32): Likewise.
> (vsubl_u8): Use type-qualified builtin and remove casts.
> (vsubl_u16): Likewise.
> (vsubl_u32): Likewise.
> (vsubl_high_s8): Remove unnecessary cast.
> (vsubl_high_s16): Likewise.
> (vsubl_high_s32): Likewise.
> (vsubl_high_u8): Use type-qualified builtin and remove casts.
> (vsubl_high_u16): Likewise.
> (vsubl_high_u32): Likewise.
> (vsubw_s8): Remove unnecessary casts.
> (vsubw_s16): Likewise.
> (vsubw_s32): Likewise.
> (vsubw_u8): Use type-qualified builtin and remove casts.
> (vsubw_u16): Likewise.
> (vsubw_u32): Likewise.
> (vsubw_high_s8): Remove unnecessary cast.
> (vsubw_high_s16): Likewise.
> (vsubw_high_s32): Likewise.
> (vsubw_high_u8): Use type-qualified builtin and remove casts.
> (vsubw_high_u16): Likewise.
> (vsubw_high_u32): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> ccd194978f948201698aec16d74baa82c187cad4..be06a80cea379b8b78c798dbec47fb95eec68db1
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -160,21 +160,21 @@
>BUILTIN_VQW (BINOP, saddl2, 0, NONE)
>BUILTIN_VQW (BINOPU, uaddl2, 0, NONE)
>BUILTIN_VQW (BINOP, ssubl2, 0, NONE)
> -  BUILTIN_VQW (BINOP, usubl2, 0, NONE)
> +  BUILTIN_VQW (BINOPU, usubl2, 0, NONE)
>BUILTIN_VQW (BINOP, saddw2, 0, NONE)
>BUILTIN_VQW (BINOPU, uaddw2, 0, NONE)
>BUILTIN_VQW (BINOP, ssubw2, 0, NONE)
> -  BUILTIN_VQW (BINOP, usubw2, 0, NONE)
> +  BUILTIN_VQW (BINOPU, usubw2, 0, NONE)
>/* Implemented by aarch64_l.  */
>BUILTIN_VD_BHSI (BINOP, saddl, 0, NONE)
>BUILTIN_VD_BHSI (BINOPU, uaddl, 0, NONE)
>BUILTIN_VD_BHSI (BINOP, ssubl, 0, NONE)
> -  BUILTIN_VD_BHSI (BINOP, usubl, 0, NONE)
> +  BUILTIN_VD_BHSI (BINOPU, usubl, 0, NONE)
>/* Implemented by aarch64_w.  */
>BUILTIN_VD_BHSI (BINOP, saddw, 0, NONE)
>BUILTIN_VD_BHSI (BINOPU, uaddw, 0, NONE)
>BUILTIN_VD_BHSI (BINOP, ssubw, 0, NONE)
> -  BUILTIN_VD_BHSI (BINOP, usubw, 0, NONE)
> +  BUILTIN_VD_BHSI (BINOPU, usubw, 0, NONE)
>/* Implemented by aarch64_h.  */
>BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> a3d742f25a896f8e736a5fb01535d372cd4b20db..58b3dddb2c4ebf856de0e9cf0399e42d322beff9
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -1765,180 +1765,168 @@ __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubl_s8 (int8x8_t __a, int8x8_t __b)
>  {
> -  return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b);
> +  return __builtin_aarch64_ssublv8qi (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubl_s16 (int16x4_t __a, int16x4_t __b)
>  {
> -  return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b);
> +  return __builtin_aarch64_ssublv4hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubl_s32 (int32x2_t __a, int32x2_t __b)
>  {
> -  return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b);
> +  return __builtin_aarch64_ssublv2si (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubl_u8 (uint8x8_t __a, uint8x8_t __b)
>  {
> -  return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a,
> -(int8x8_t) __b);
> +  return __builtin_aarch64_usublv8qi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubl_u16 (uint16x4_t __a, uint16x4_t __b)
>  {
> -  return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a,
> -(int16x4_t) __b);
> +  return __builtin_aarch64_usublv4hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint64x2_t
>  __attrib

Re: [PATCH] aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement (rounding) halving-add Neon intrinsics. This removes the
> need for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
> qualifiers in generator macros for u[r]hadd builtins.
> * config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
> cast.
> (vhadd_s16): Likewise.
> (vhadd_s32): Likewise.
> (vhadd_u8): Use type-qualified builtin and remove casts.
> (vhadd_u16): Likewise.
> (vhadd_u32): Likewise.
> (vhaddq_s8): Remove unnecessary cast.
> (vhaddq_s16): Likewise.
> (vhaddq_s32): Likewise.
> (vhaddq_u8): Use type-qualified builtin and remove casts.
> (vhaddq_u16): Likewise.
> (vhaddq_u32): Likewise.
> (vrhadd_s8): Remove unnecessary cast.
> (vrhadd_s16): Likewise.
> (vrhadd_s32): Likewise.
> (vrhadd_u8): Use type-qualified builtin and remove casts.
> (vrhadd_u16): Likewise.
> (vrhadd_u32): Likewise.
> (vrhaddq_s8): Remove unnecessary cast.
> (vrhaddq_s16): Likewise.
> (vrhaddq_s32): Likewise.
> (vrhaddq_u8): Use type-qualified builtin and remove casts.
> (vrhaddq_u16): Likewise.
> (vrhaddq_u32): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> be06a80cea379b8b78c798dbec47fb95eec68db1..8f9a8d1707dfdf6111d740da53275e79500e8cde
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -178,10 +178,10 @@
>/* Implemented by aarch64_h.  */
>BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE)
> -  BUILTIN_VDQ_BHSI (BINOP, uhadd, 0, NONE)
> +  BUILTIN_VDQ_BHSI (BINOPU, uhadd, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, uhsub, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, srhadd, 0, NONE)
> -  BUILTIN_VDQ_BHSI (BINOP, urhadd, 0, NONE)
> +  BUILTIN_VDQ_BHSI (BINOPU, urhadd, 0, NONE)
>  
>/* Implemented by aarch64_addlp.  */
>BUILTIN_VDQV_L (UNOP, saddlp, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 58b3dddb2c4ebf856de0e9cf0399e42d322beff9..73eea7c261f49155d616a2ddf1d96d4be9bca53f
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -545,180 +545,168 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_s8 (int8x8_t __a, int8x8_t __b)
>  {
> -  return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b);
> +  return __builtin_aarch64_shaddv8qi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_s16 (int16x4_t __a, int16x4_t __b)
>  {
> -  return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b);
> +  return __builtin_aarch64_shaddv4hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_s32 (int32x2_t __a, int32x2_t __b)
>  {
> -  return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b);
> +  return __builtin_aarch64_shaddv2si (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_u8 (uint8x8_t __a, uint8x8_t __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
> -   (int8x8_t) __b);
> +  return __builtin_aarch64_uhaddv8qi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_u16 (uint16x4_t __a, uint16x4_t __b)
>  {
> -  return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a,
> -(int16x4_t) __b);
> +  return __builtin_aarch64_uhaddv4hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhadd_u32 (uint32x2_t __a, uint32x2_t __b)
>  {
> -  return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a,
> -(int32x2_t) __b);
> +  return __builtin_aarch64_uhaddv2si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhaddq_s8 (int8x16_t __a, int8x16_t __b)
>  {
> -  return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b);
> +  return __builtin_aarch64_shaddv16qi (__a, __b);
>  }
>  
>  __extension__ ex

[PATCH] aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-subtract Neon intrinsics. This
removes the need for many casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]subhn[2].
* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
cast.
(vsubhn_s32): Likewise.
(vsubhn_s64): Likewise.
(vsubhn_u16): Use type-qualified builtin and remove casts.
(vsubhn_u32): Likewise.
(vsubhn_u64): Likewise.
(vrsubhn_s16): Remove unnecessary cast.
(vrsubhn_s32): Likewise.
(vrsubhn_s64): Likewise.
(vrsubhn_u16): Use type-qualified builtin and remove casts.
(vrsubhn_u32): Likewise.
(vrsubhn_u64): Likewise.
(vrsubhn_high_s16): Remove unnecessary cast.
(vrsubhn_high_s32): Likewise.
(vrsubhn_high_s64): Likewise.
(vrsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vrsubhn_high_u32): Likewise.
(vrsubhn_high_u64): Likewise.
(vsubhn_high_s16): Remove unnecessary cast.
(vsubhn_high_s32): Likewise.
(vsubhn_high_s64): Likewise.
(vsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vsubhn_high_u32): Likewise.
(vsubhn_high_u64): Likewise.


rb15038.patch
Description: rb15038.patch


Re: [PATCH] aarch64: Use type-qualified builtins for UHSUB Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement halving-subtract Neon intrinsics. This removes the need for
> many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
> qualifiers in generator macros for uhsub builtins.
> * config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
> cast.
> (vhsub_s16): Likewise.
> (vhsub_s32): Likewise.
> (vhsub_u8): Use type-qualified builtin and remove casts.
> (vhsub_u16): Likewise.
> (vhsub_u32): Likewise.
> (vhsubq_s8): Remove unnecessary cast.
> (vhsubq_s16): Likewise.
> (vhsubq_s32): Likewise.
> (vhsubq_u8): Use type-qualified builtin and remove casts.
> (vhsubq_u16): Likewise.
> (vhsubq_u32): Likewise.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 8f9a8d1707dfdf6111d740da53275e79500e8cde..af04b732227439dcaaa2f3751097050d988eb729
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -179,7 +179,7 @@
>BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOPU, uhadd, 0, NONE)
> -  BUILTIN_VDQ_BHSI (BINOP, uhsub, 0, NONE)
> +  BUILTIN_VDQ_BHSI (BINOPU, uhsub, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOP, srhadd, 0, NONE)
>BUILTIN_VDQ_BHSI (BINOPU, urhadd, 0, NONE)
>  
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 73eea7c261f49155d616a2ddf1d96d4be9bca53f..b2781f680d142b848f622d2f4965b42985885502
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -1956,90 +1956,84 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_s8 (int8x8_t __a, int8x8_t __b)
>  {
> -  return (int8x8_t)__builtin_aarch64_shsubv8qi (__a, __b);
> +  return __builtin_aarch64_shsubv8qi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_s16 (int16x4_t __a, int16x4_t __b)
>  {
> -  return (int16x4_t) __builtin_aarch64_shsubv4hi (__a, __b);
> +  return __builtin_aarch64_shsubv4hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_s32 (int32x2_t __a, int32x2_t __b)
>  {
> -  return (int32x2_t) __builtin_aarch64_shsubv2si (__a, __b);
> +  return __builtin_aarch64_shsubv2si (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_u8 (uint8x8_t __a, uint8x8_t __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_uhsubv8qi ((int8x8_t) __a,
> -   (int8x8_t) __b);
> +  return __builtin_aarch64_uhsubv8qi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_u16 (uint16x4_t __a, uint16x4_t __b)
>  {
> -  return (uint16x4_t) __builtin_aarch64_uhsubv4hi ((int16x4_t) __a,
> -(int16x4_t) __b);
> +  return __builtin_aarch64_uhsubv4hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsub_u32 (uint32x2_t __a, uint32x2_t __b)
>  {
> -  return (uint32x2_t) __builtin_aarch64_uhsubv2si ((int32x2_t) __a,
> -(int32x2_t) __b);
> +  return __builtin_aarch64_uhsubv2si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsubq_s8 (int8x16_t __a, int8x16_t __b)
>  {
> -  return (int8x16_t) __builtin_aarch64_shsubv16qi (__a, __b);
> +  return __builtin_aarch64_shsubv16qi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsubq_s16 (int16x8_t __a, int16x8_t __b)
>  {
> -  return (int16x8_t) __builtin_aarch64_shsubv8hi (__a, __b);
> +  return __builtin_aarch64_shsubv8hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsubq_s32 (int32x4_t __a, int32x4_t __b)
>  {
> -  return (int32x4_t) __builtin_aarch64_shsubv4si (__a, __b);
> +  return __builtin_aarch64_shsubv4si (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vhsubq_u8 (uint8x

[PATCH] aarch64: Use type-qualified builtins for ADDP Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement the pairwise addition Neon intrinsics. This removes the need
for many casts in arm_neon.h.
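
For context, a minimal usage sketch (assuming an AArch64 toolchain
providing arm_neon.h); pairwise addition sums adjacent lanes of each
input and concatenates the results:

  #include <arm_neon.h>

  uint8x8_t
  pairwise_sum (uint8x8_t a, uint8x8_t b)
  {
    /* Result is {a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7},
       each sum taken modulo 256.  */
    return vpadd_u8 (a, b);
  }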

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for addp.
* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
builtin and remove casts.
(vpaddq_u16): Likewise.
(vpaddq_u32): Likewise.
(vpaddq_u64): Likewise.
(vpadd_u8): Likewise.
(vpadd_u16): Likewise.
(vpadd_u32): Likewise.
(vpaddd_u64): Likewise.


rb15039.patch
Description: rb15039.patch


Re: [PATCH] aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement (rounding) halving-narrowing-add Neon intrinsics. This
> removes the need for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned
> builtins for [r]addhn[2].
> * config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
> cast.
> (vaddhn_s32): Likewise.
> (vaddhn_s64): Likewise.
> (vaddhn_u16): Use type-qualified builtin and remove casts.
> (vaddhn_u32): Likewise.
> (vaddhn_u64): Likewise.
> (vraddhn_s16): Remove unnecessary cast.
> (vraddhn_s32): Likewise.
> (vraddhn_s64): Likewise.
> (vraddhn_u16): Use type-qualified builtin and remove casts.
> (vraddhn_u32): Likewise.
> (vraddhn_u64): Likewise.
> (vaddhn_high_s16): Remove unnecessary cast.
> (vaddhn_high_s32): Likewise.
> (vaddhn_high_s64): Likewise.
> (vaddhn_high_u16): Use type-qualified builtin and remove
> casts.
> (vaddhn_high_u32): Likewise.
> (vaddhn_high_u64): Likewise.
> (vraddhn_high_s16): Remove unnecessary cast.
> (vraddhn_high_s32): Likewise.
> (vraddhn_high_s64): Likewise.
> (vraddhn_high_u16): Use type-qualified builtin and remove
> casts.
> (vraddhn_high_u32): Likewise.
> (vraddhn_high_u64): Likewise.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> af04b732227439dcaaa2f3751097050d988eb729..6372da80be33c40cb27e5811bfb4f4f672f28a35
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -220,13 +220,17 @@
>  
>/* Implemented by aarch64_hn.  */
>BUILTIN_VQN (BINOP, addhn, 0, NONE)
> +  BUILTIN_VQN (BINOPU, addhn, 0, NONE)
>BUILTIN_VQN (BINOP, subhn, 0, NONE)
>BUILTIN_VQN (BINOP, raddhn, 0, NONE)
> +  BUILTIN_VQN (BINOPU, raddhn, 0, NONE)
>BUILTIN_VQN (BINOP, rsubhn, 0, NONE)
>/* Implemented by aarch64_hn2.  */
>BUILTIN_VQN (TERNOP, addhn2, 0, NONE)
> +  BUILTIN_VQN (TERNOPU, addhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, subhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, raddhn2, 0, NONE)
> +  BUILTIN_VQN (TERNOPU, raddhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, rsubhn2, 0, NONE)
>  
>/* Implemented by aarch64_xtl.  */
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> b2781f680d142b848f622d2f4965b42985885502..cb481542ba0d6ffb7cc8ffe7c1a098930fc5e746
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -713,186 +713,168 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_s16 (int16x8_t __a, int16x8_t __b)
>  {
> -  return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b);
> +  return __builtin_aarch64_addhnv8hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_s32 (int32x4_t __a, int32x4_t __b)
>  {
> -  return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b);
> +  return __builtin_aarch64_addhnv4si (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_s64 (int64x2_t __a, int64x2_t __b)
>  {
> -  return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b);
> +  return __builtin_aarch64_addhnv2di (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a,
> -   (int16x8_t) __b);
> +  return __builtin_aarch64_addhnv8hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a,
> -(int32x4_t) __b);
> +  return __builtin_aarch64_addhnv4si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddhn_u64 (uint64x2_t __a, uint64x2_t __b)
>  {
> -  return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a,
> -(int64x2_t) __b);
> +  return __builtin_aarch64_addhnv2di_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inli

Re: [PATCH 1/5] Add IFN_COND_FMIN/FMAX functions

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:44 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch adds conditional forms of FMAX and FMIN, following
> the pattern for existing conditional binary functions.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> * doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document.
> * optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs.
> * internal-fn.def (COND_FMIN, COND_FMAX): New functions.
> * internal-fn.c (first_commutative_argument): Handle them.
> (FOR_EACH_COND_FN_PAIR): Likewise.
> * match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
> * config/aarch64/aarch64-sve.md (cond_): New
> pattern.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test.
> * gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise.
> * gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise.
> * gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve.md | 19 +++-
>  gcc/doc/md.texi   |  4 +++
>  gcc/internal-fn.c |  4 +++
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  |  2 ++
>  gcc/optabs.def|  2 ++
>  .../gcc.target/aarch64/sve/cond_fmaxnm_5.c| 28 ++
>  .../aarch64/sve/cond_fmaxnm_5_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fmaxnm_6.c| 22 ++
>  .../aarch64/sve/cond_fmaxnm_6_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fmaxnm_7.c| 27 +
>  .../aarch64/sve/cond_fmaxnm_7_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fmaxnm_8.c| 26 +
>  .../aarch64/sve/cond_fmaxnm_8_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fminnm_5.c| 29 +++
>  .../aarch64/sve/cond_fminnm_5_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fminnm_6.c| 23 +++
>  .../aarch64/sve/cond_fminnm_6_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fminnm_7.c| 28 ++
>  .../aarch64/sve/cond_fminnm_7_run.c   |  4 +++
>  .../gcc.target/aarch64/sve/cond_fminnm_8.c| 27 +
>  .../aarch64/sve/cond_fminnm_8_run.c   |  4 +++
>  22 files changed, 274 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_5.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_5_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_6.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_6_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_7_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_8.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_8_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_5.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_5_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_6.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_6_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_7_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_8.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_8_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 5de479e141a..0f5bf5ea8cb 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -6287,7 +6287,7 @@ (define_expand "xorsign3"
>  ;; -
>
>  ;; Unpredicated fmax/fmin (the libm functions).  The optabs for the
> -;; smin/smax rtx codes are handled in the generic section above.
> +;; sma

[PATCH] aarch64: Use type-qualified builtins for ADDV Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned type-qualified builtins and uses them to
implement the vector reduction Neon intrinsics. This removes the need
for many casts in arm_neon.h.
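
For context, a minimal usage sketch (assuming an AArch64 toolchain
providing arm_neon.h); the across-vector reduction sums all lanes
into one scalar:

  #include <arm_neon.h>

  uint8_t
  sum_lanes (uint8x8_t v)
  {
    /* E.g. {1,2,3,4,5,6,7,8} -> 36; the sum wraps modulo 256.  */
    return vaddv_u8 (v);
  }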

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-09  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for vector reduction.
* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
builtin and remove casts.
(vaddv_u16): Likewise.
(vaddv_u32): Likewise.
(vaddvq_u8): Likewise.
(vaddvq_u16): Likewise.
(vaddvq_u32): Likewise.
(vaddvq_u64): Likewise.


rb15057.patch
Description: rb15057.patch


Re: [PATCH] aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement (rounding) halving-narrowing-subtract Neon intrinsics. This
> removes the need for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned
> builtins for [r]subhn[2].
> * config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
> cast.
> (vsubhn_s32): Likewise.
> (vsubhn_s64): Likewise.
> (vsubhn_u16): Use type-qualified builtin and remove casts.
> (vsubhn_u32): Likewise.
> (vsubhn_u64): Likewise.
> (vrsubhn_s16): Remove unnecessary cast.
> (vrsubhn_s32): Likewise.
> (vrsubhn_s64): Likewise.
> (vrsubhn_u16): Use type-qualified builtin and remove casts.
> (vrsubhn_u32): Likewise.
> (vrsubhn_u64): Likewise.
> (vrsubhn_high_s16): Remove unnecessary cast.
> (vrsubhn_high_s32): Likewise.
> (vrsubhn_high_s64): Likewise.
> (vrsubhn_high_u16): Use type-qualified builtin and remove
> casts.
> (vrsubhn_high_u32): Likewise.
> (vrsubhn_high_u64): Likewise.
> (vsubhn_high_s16): Remove unnecessary cast.
> (vsubhn_high_s32): Likewise.
> (vsubhn_high_s64): Likewise.
> (vsubhn_high_u16): Use type-qualified builtin and remove
> casts.
> (vsubhn_high_u32): Likewise.
> (vsubhn_high_u64): Likewise.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 6372da80be33c40cb27e5811bfb4f4f672f28a35..035bddcb660e34146b709fdae244571cdeb06272
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -222,16 +222,20 @@
>BUILTIN_VQN (BINOP, addhn, 0, NONE)
>BUILTIN_VQN (BINOPU, addhn, 0, NONE)
>BUILTIN_VQN (BINOP, subhn, 0, NONE)
> +  BUILTIN_VQN (BINOPU, subhn, 0, NONE)
>BUILTIN_VQN (BINOP, raddhn, 0, NONE)
>BUILTIN_VQN (BINOPU, raddhn, 0, NONE)
>BUILTIN_VQN (BINOP, rsubhn, 0, NONE)
> +  BUILTIN_VQN (BINOPU, rsubhn, 0, NONE)
>/* Implemented by aarch64_hn2.  */
>BUILTIN_VQN (TERNOP, addhn2, 0, NONE)
>BUILTIN_VQN (TERNOPU, addhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, subhn2, 0, NONE)
> +  BUILTIN_VQN (TERNOPU, subhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, raddhn2, 0, NONE)
>BUILTIN_VQN (TERNOPU, raddhn2, 0, NONE)
>BUILTIN_VQN (TERNOP, rsubhn2, 0, NONE)
> +  BUILTIN_VQN (TERNOPU, rsubhn2, 0, NONE)
>  
>/* Implemented by aarch64_xtl.  */
>BUILTIN_VQN (UNOP, sxtl, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> cb481542ba0d6ffb7cc8ffe7c1a098930fc5e746..ac871d4e503c634b453cd1f1d3e61182ce4a5a88
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -2022,186 +2022,168 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_s16 (int16x8_t __a, int16x8_t __b)
>  {
> -  return (int8x8_t) __builtin_aarch64_subhnv8hi (__a, __b);
> +  return __builtin_aarch64_subhnv8hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_s32 (int32x4_t __a, int32x4_t __b)
>  {
> -  return (int16x4_t) __builtin_aarch64_subhnv4si (__a, __b);
> +  return __builtin_aarch64_subhnv4si (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_s64 (int64x2_t __a, int64x2_t __b)
>  {
> -  return (int32x2_t) __builtin_aarch64_subhnv2di (__a, __b);
> +  return __builtin_aarch64_subhnv2di (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_subhnv8hi ((int16x8_t) __a,
> -   (int16x8_t) __b);
> +  return __builtin_aarch64_subhnv8hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return (uint16x4_t) __builtin_aarch64_subhnv4si ((int32x4_t) __a,
> -(int32x4_t) __b);
> +  return __builtin_aarch64_subhnv4si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vsubhn_u64 (uint64x2_t __a, uint64x2_t __b)
>  {
> -  return (uint32x2_t) __builtin_aarch64_subhnv2di ((int64x2_t) __a,
> -

Re: [PATCH] Allow loop header copying when first iteration condition is known.

2021-11-11 Thread Aldy Hernandez via Gcc-patches
On Thu, Nov 11, 2021 at 8:30 AM Richard Biener
 wrote:
>
> On Wed, Nov 10, 2021 at 9:42 PM Jeff Law  wrote:
> >
> >
> >
> > On 11/10/2021 11:20 AM, Aldy Hernandez via Gcc-patches wrote:
> > > As discussed in the PR, the loop header copying pass avoids doing so
> > > when optimizing for size.  However, sometimes we can determine the
> > > loop entry conditional statically for the first iteration of the loop.
> > >
> > > This patch uses the path solver to determine the outgoing edge
> > > out of preheader->header->xx.  If so, it allows header copying.  Doing
> > > this in the loop optimizer saves us from doing gymnastics in the
> > > threader which doesn't have the context to determine if a loop
> > > transformation is profitable.
> > >
> > > I am only returning true in entry_loop_condition_is_static for
> > > a true conditional.  Technically a false conditional is also
> > > provably static, but allowing any boolean value causes a regression
> > > in gfortran.dg/vector_subscript_1.f90.
> > >
> > > I would have preferred not passing around the query object, but the
> > > layout of pass_ch and should_duplicate_loop_header_p make it a bit
> > > awkward to get it right without an outright refactor to the
> > > pass.
> > >
> > > Tested on x86-64 Linux.
> > >
> > > OK?
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/102906
> > >   * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
> > >   (should_duplicate_loop_header_p): Call 
> > > entry_loop_condition_is_static.
> > >   (class ch_base): Add m_ranger and m_query.
> > >   (ch_base::copy_headers): Pass m_query to
> > >   entry_loop_condition_is_static.
> > >   (pass_ch::execute): Allocate and deallocate m_ranger and
> > >   m_query.
> > >   (pass_ch_vect::execute): Same.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/tree-ssa/pr102906.c: New test.
> > OK.  It also makes a nice little example of how to use a Ranger within
> > an existing pass.
>
> Note if you just test for the condition to be true it will only catch 50%
> of the desired cases since we have no idea whether the 'true' edge
> is the edge exiting the loop or the edge remaining in the loop.
> For loop header copying we like to resolve statically to the edge
> remaining in the loop, so you want

Ahh, I figured there was some block shuffling needed.

I was cautious not to touch much because of the
gfortran.dg/vector_subscript_1.f90 regression, but now I see that the
test fails for all optimization levels except -Os.  With this fix we
properly fail for all levels.  I assume this is expected ;-).

>
> extract_true_false_edges_from_block (gimple_bb (last), &true_e, &false_e);
>
> /* If neither edge is the exit edge this is not a case we'd like to
> special-case.  */
> if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e))
>  return false;
>
> tree desired_static_value;
> if (loop_exit_edge_p (l, true_e))
>  desired_static_value = boolean_false_node;
> else
>   desired_static_value = boolean_true_node;
>
> and test for desired_static_value.

Thanks for the code!

OK pending tests?
From 9609cff278d3ddea9f74b805b395d5c0293a126c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 11 Nov 2021 11:27:07 +0100
Subject: [PATCH] Resolve entry loop condition for the edge remaining in the
 loop.

There is a known failure for gfortran.dg/vector_subscript_1.f90.  It
was previously failing for all optimization levels except -Os.
Getting the loop header copying right now makes it fail for all
levels :-).

Co-authored-by: Richard Biener 

gcc/ChangeLog:

	* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
	statically to the edge remaining in the loop.
---
 gcc/tree-ssa-loop-ch.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index c7d86d751d4..af3401f112c 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -57,10 +57,24 @@ entry_loop_condition_is_static (class loop *l, path_range_query *query)
   || !irange::supports_type_p (TREE_TYPE (gimple_cond_lhs (last
 return false;
 
+  edge true_e, false_e;
+  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
+
+  /* If neither edge is the exit edge, this is not a case we'd like to
+ special-case.  */
+  if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e))
+return false;
+
+  tree desired_static_value;
+  if (loop_exit_edge_p (l, true_e))
+desired_static_value = boolean_false_node;
+  else
+desired_static_value = boolean_true_node;
+
   int_range<2> r;
   query->compute_ranges (e);
   query->range_of_stmt (r, last);
-  return r == int_range<2> (boolean_true_node, boolean_true_node);
+  return r == int_range<2> (desired_static_value, desired_static_value);
 }
 
 /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
-- 
2.31.1



[PATCH] aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned and polynomial type-qualified builtins and
uses them to implement the LD1/ST1 Neon intrinsics. This removes the
need for many casts in arm_neon.h.

The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.
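
Roughly speaking (an illustrative sketch, not the exact internal dump
format), the gimple lowering turns a plain vld1 call into an ordinary
vector load that the middle end can optimise like any other memory
access:

  uint8x8_t v = vld1_u8 (p);
  /* ... is folded at gimple time into, roughly:
     v = MEM <uint8x8_t> [(const uint8_t *) p];  */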

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-10  Jonathan Wright  

* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
(TYPES_LOAD1_P): Define.
(TYPES_STORE1_U): Define.
(TYPES_STORE1P): Rename to...
(TYPES_STORE1_P): This.
(get_mem_type_for_load_store): Add unsigned and poly types.
(aarch64_general_gimple_fold_builtin): Add unsigned and poly
type-qualified builtin declarations.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for LD1/ST1.
* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
builtin and remove cast.
(vld1_p16): Likewise.
(vld1_u8): Likewise.
(vld1_u16): Likewise.
(vld1_u32): Likewise.
(vld1q_p8): Likewise.
(vld1q_p16): Likewise.
(vld1q_p64): Likewise.
(vld1q_u8): Likewise.
(vld1q_u16): Likewise.
(vld1q_u32): Likewise.
(vld1q_u64): Likewise.
(vst1_p8): Likewise.
(vst1_p16): Likewise.
(vst1_u8): Likewise.
(vst1_u16): Likewise.
(vst1_u32): Likewise.
(vst1q_p8): Likewise.
(vst1q_p16): Likewise.
(vst1q_p64): Likewise.
(vst1q_u8): Likewise.
(vst1q_u16): Likewise.
(vst1q_u32): Likewise.
(vst1q_u64): Likewise.
* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.


rb15058.patch
Description: rb15058.patch


[PATCH] aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
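
For context, a minimal usage sketch (assuming an AArch64 toolchain
providing arm_neon.h); vcombine concatenates two 64-bit vectors into
one 128-bit vector, low half from the first argument:

  #include <arm_neon.h>

  uint8x16_t
  combine_halves (uint8x8_t lo, uint8x8_t hi)
  {
    return vcombine_u8 (lo, hi);
  }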

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-10  Jonathan Wright  

* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
(TYPES_COMBINEP): Delete.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vcombine_* intrinsics.
* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
cast.
(vcombine_s16): Likewise.
(vcombine_s32): Likewise.
(vcombine_f32): Likewise.
(vcombine_u8): Use type-qualified builtin and remove casts.
(vcombine_u16): Likewise.
(vcombine_u32): Likewise.
(vcombine_u64): Likewise.
(vcombine_p8): Likewise.
(vcombine_p16): Likewise.
(vcombine_p64): Likewise.
(vcombine_bf16): Remove unnecessary cast.
* config/aarch64/iterators.md (VDC_I): New mode iterator.
(VDC_P): New mode iterator.


rb15059.patch
Description: rb15059.patch


Re: [PATCH] aarch64: Use type-qualified builtins for ADDP Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement the pairwise addition Neon intrinsics. This removes the need
> for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned
> builtins for addp.
> * config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
> builtin and remove casts.
> (vpaddq_u16): Likewise.
> (vpaddq_u32): Likewise.
> (vpaddq_u64): Likewise.
> (vpadd_u8): Likewise.
> (vpadd_u16): Likewise.
> (vpadd_u32): Likewise.
> (vpaddd_u64): Likewise.

OK, thanks.  I was initially caught out by vpaddd_u64 not previously
having a return cast, but of course that's because it's scalar,
and so an implicit cast was allowed.  So the patch is still
avoiding two casts there.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 035bddcb660e34146b709fdae244571cdeb06272..7d6de6728cf7c63872e09850a394101f7abf21d4
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -51,7 +51,9 @@
>BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0, FP)
>BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP)
>BUILTIN_VDQ_I (BINOP, addp, 0, NONE)
> +  BUILTIN_VDQ_I (BINOPU, addp, 0, NONE)
>VAR1 (UNOP, addp, 0, NONE, di)
> +  VAR1 (UNOPU, addp, 0, NONE, di)
>BUILTIN_VDQ_BHSI (UNOP, clrsb, 2, NONE)
>BUILTIN_VDQ_BHSI (UNOP, clz, 2, NONE)
>BUILTIN_VS (UNOP, ctz, 2, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> ac871d4e503c634b453cd1f1d3e61182ce4a5a88..ab46897d784b81bec9654d87557640ca4c1e5681
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -8011,32 +8011,28 @@ __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddq_u8 (uint8x16_t __a, uint8x16_t __b)
>  {
> -  return (uint8x16_t) __builtin_aarch64_addpv16qi ((int8x16_t) __a,
> -(int8x16_t) __b);
> +  return __builtin_aarch64_addpv16qi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddq_u16 (uint16x8_t __a, uint16x8_t __b)
>  {
> -  return (uint16x8_t) __builtin_aarch64_addpv8hi ((int16x8_t) __a,
> -   (int16x8_t) __b);
> +  return __builtin_aarch64_addpv8hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddq_u32 (uint32x4_t __a, uint32x4_t __b)
>  {
> -  return (uint32x4_t) __builtin_aarch64_addpv4si ((int32x4_t) __a,
> -   (int32x4_t) __b);
> +  return __builtin_aarch64_addpv4si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddq_u64 (uint64x2_t __a, uint64x2_t __b)
>  {
> -  return (uint64x2_t) __builtin_aarch64_addpv2di ((int64x2_t) __a,
> -   (int64x2_t) __b);
> +  return __builtin_aarch64_addpv2di_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x4_t
> @@ -20293,24 +20289,21 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpadd_u8 (uint8x8_t __a, uint8x8_t __b)
>  {
> -  return (uint8x8_t) __builtin_aarch64_addpv8qi ((int8x8_t) __a,
> -  (int8x8_t) __b);
> +  return __builtin_aarch64_addpv8qi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpadd_u16 (uint16x4_t __a, uint16x4_t __b)
>  {
> -  return (uint16x4_t) __builtin_aarch64_addpv4hi ((int16x4_t) __a,
> -   (int16x4_t) __b);
> +  return __builtin_aarch64_addpv4hi_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpadd_u32 (uint32x2_t __a, uint32x2_t __b)
>  {
> -  return (uint32x2_t) __builtin_aarch64_addpv2si ((int32x2_t) __a,
> -   (int32x2_t) __b);
> +  return __builtin_aarch64_addpv2si_uuu (__a, __b);
>  }
>  
>  __extension__ extern __inline float32_t
> @@ -20338,7 +20331,7 @@ __extension__ extern __inline uint64_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddd_u64 (uint64x2_t __a)
>  {
> -  return __builtin_aarch64_addpdi ((int64x2_t) __a);
> +  return __builtin_aarch64_addpdi_uu (__a);
>  }
>  
>  /* vqabs */


[PATCH] aarch64: Use type-qualified builtins for vget_low/high intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi,

This patch declares unsigned and polynomial type-qualified builtins for
vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes
the need for many casts in arm_neon.h.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-11-10  Jonathan Wright  

* config/aarch64/aarch64-builtins.c (TYPES_UNOPP): Define.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vget_low/high.
* config/aarch64/arm_neon.h (vget_low_p8): Use type-qualified
builtin and remove casts.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.
(vget_high_p8): Likewise.
(vget_high_p16): Likewise.
(vget_high_p64): Likewise.
(vget_high_u8): Likewise.
(vget_high_u16): Likewise.
(vget_high_u32): Likewise.
(vget_high_u64): Likewise.
* config/aarch64/iterators.md (VQ_P): New mode iterator.


rb15060.patch
Description: rb15060.patch


Re: [PATCH] aarch64: Use type-qualified builtins for ADDV Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned type-qualified builtins and uses them to
> implement the vector reduction Neon intrinsics. This removes the need
> for many casts in arm_neon.h.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-09  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned
> builtins for vector reduction.
> * config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
> builtin and remove casts.
> (vaddv_u16): Likewise.
> (vaddv_u32): Likewise.
> (vaddvq_u8): Likewise.
> (vaddvq_u16): Likewise.
> (vaddvq_u32): Likewise.
> (vaddvq_u64): Likewise.

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 7d6de6728cf7c63872e09850a394101f7abf21d4..35a099e1fb8dd1acb9e35583d1267df257d961b0
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -513,6 +513,7 @@
>  
>/* Implemented by aarch64_reduc_plus_.  */
>BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, NONE)
> +  BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, NONE)
>  
>/* Implemented by reduc__scal_ (producing scalar).  */
>BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> ab46897d784b81bec9654d87557640ca4c1e5681..3c03432b5b6c6cd0f349671366615925d38121e5
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -9695,21 +9695,21 @@ __extension__ extern __inline uint8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddv_u8 (uint8x8_t __a)
>  {
> -  return (uint8_t) __builtin_aarch64_reduc_plus_scal_v8qi ((int8x8_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v8qi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddv_u16 (uint16x4_t __a)
>  {
> -  return (uint16_t) __builtin_aarch64_reduc_plus_scal_v4hi ((int16x4_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v4hi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddv_u32 (uint32x2_t __a)
>  {
> -  return (int32_t) __builtin_aarch64_reduc_plus_scal_v2si ((int32x2_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v2si_uu (__a);
>  }
>  
>  __extension__ extern __inline int8_t
> @@ -9744,28 +9744,28 @@ __extension__ extern __inline uint8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddvq_u8 (uint8x16_t __a)
>  {
> -  return (uint8_t) __builtin_aarch64_reduc_plus_scal_v16qi ((int8x16_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v16qi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddvq_u16 (uint16x8_t __a)
>  {
> -  return (uint16_t) __builtin_aarch64_reduc_plus_scal_v8hi ((int16x8_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v8hi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddvq_u32 (uint32x4_t __a)
>  {
> -  return (uint32_t) __builtin_aarch64_reduc_plus_scal_v4si ((int32x4_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v4si_uu (__a);
>  }
>  
>  __extension__ extern __inline uint64_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vaddvq_u64 (uint64x2_t __a)
>  {
> -  return (uint64_t) __builtin_aarch64_reduc_plus_scal_v2di ((int64x2_t) __a);
> +  return __builtin_aarch64_reduc_plus_scal_v2di_uu (__a);
>  }
>  
>  __extension__ extern __inline float32_t


Re: [PATCH] aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned and polynomial type-qualified builtins and
> uses them to implement the LD1/ST1 Neon intrinsics. This removes the
> need for many casts in arm_neon.h.
>
> The new type-qualified builtins are also lowered to gimple - as the
> unqualified builtins are already.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-10  Jonathan Wright  
>
> * config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
> (TYPES_LOAD1_P): Define.
> (TYPES_STORE1_U): Define.
> (TYPES_STORE1P): Rename to...
> (TYPES_STORE1_P): This.
> (get_mem_type_for_load_store): Add unsigned and poly types.
> (aarch64_general_gimple_fold_builtin): Add unsigned and poly
> type-qualified builtin declarations.
> * config/aarch64/aarch64-simd-builtins.def: Declare type-
> qualified builtins for LD1/ST1.
> * config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
> builtin and remove cast.
> (vld1_p16): Likewise.
> (vld1_u8): Likewise.
> (vld1_u16): Likewise.
> (vld1_u32): Likewise.
> (vld1q_p8): Likewise.
> (vld1q_p16): Likewise.
> (vld1q_p64): Likewise.
> (vld1q_u8): Likewise.
> (vld1q_u16): Likewise.
> (vld1q_u32): Likewise.
> (vld1q_u64): Likewise.
> (vst1_p8): Likewise.
> (vst1_p16): Likewise.
> (vst1_u8): Likewise.
> (vst1_u16): Likewise.
> (vst1_u32): Likewise.
> (vst1q_p8): Likewise.
> (vst1q_p16): Likewise.
> (vst1q_p64): Likewise.
> (vst1q_u8): Likewise.
> (vst1q_u16): Likewise.
> (vst1q_u32): Likewise.
> (vst1q_u64): Likewise.
> * config/aarch64/iterators.md (VALLP_NO_DI): New iterator.
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> 5053bf0f8fd6638bf84a6df06c0987a0216b69e7..f286401ff3ab01dd860ae22858ca07e364247414
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -372,10 +372,12 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum aarch64_type_qualifiers
>  aarch64_types_load1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_const_pointer_map_mode };
> +#define TYPES_LOAD1_U (aarch64_types_load1_u_qualifiers)
>  #define TYPES_LOADSTRUCT_U (aarch64_types_load1_u_qualifiers)
>  static enum aarch64_type_qualifiers
>  aarch64_types_load1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_poly, qualifier_const_pointer_map_mode };
> +#define TYPES_LOAD1_P (aarch64_types_load1_p_qualifiers)
>  #define TYPES_LOADSTRUCT_P (aarch64_types_load1_p_qualifiers)
>  
>  static enum aarch64_type_qualifiers
> @@ -423,11 +425,12 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum aarch64_type_qualifiers
>  aarch64_types_store1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_void, qualifier_pointer_map_mode, qualifier_unsigned };
> +#define TYPES_STORE1_U (aarch64_types_store1_u_qualifiers)
>  #define TYPES_STORESTRUCT_U (aarch64_types_store1_u_qualifiers)
>  static enum aarch64_type_qualifiers
>  aarch64_types_store1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_void, qualifier_pointer_map_mode, qualifier_poly };
> -#define TYPES_STORE1P (aarch64_types_store1_p_qualifiers)
> +#define TYPES_STORE1_P (aarch64_types_store1_p_qualifiers)
>  #define TYPES_STORESTRUCT_P (aarch64_types_store1_p_qualifiers)
>  
>  static enum aarch64_type_qualifiers
> @@ -2590,47 +2593,83 @@ get_mem_type_for_load_store (unsigned int fcode)
>  {
>switch (fcode)
>{
> -VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
> -VAR1 (STORE1, st1 , 0, STORE, v8qi)
> +VAR1 (LOAD1, ld1, 0, LOAD, v8qi)
> +VAR1 (STORE1, st1, 0, STORE, v8qi)
>return Int8x8_t;
> -VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
> -VAR1 (STORE1, st1 , 0, STORE, v16qi)
> +VAR1 (LOAD1, ld1, 0, LOAD, v16qi)
> +VAR1 (STORE1, st1, 0, STORE, v16qi)
>return Int8x16_t;
> -VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
> -VAR1 (STORE1, st1 , 0, STORE, v4hi)
> +VAR1 (LOAD1, ld1, 0, LOAD, v4hi)
> +VAR1 (STORE1, st1, 0, STORE, v4hi)
>return Int16x4_t;
> -VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
> -VAR1 (STORE1, st1 , 0, STORE, v8hi)
> +VAR1 (LOAD1, ld1, 0, LOAD, v8hi)
> +VAR1 (STORE1, st1, 0, STORE, v8hi)
>return Int16x8_t;
> -VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
> -VAR1 (STORE1, st1 , 0, STORE, v2si)
> +VAR1 (LOAD1, ld1, 0, LOAD, v2si)
> +VAR1 (STORE1, st1, 0, STORE, v2si)
>return Int32x2_t;
> -VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
> -VAR1 (STORE1, st1 , 0, STORE, v4si)
> +VAR1 (LOAD1, ld1, 0, LOAD, v4si)
> +VAR1 (STORE1, st1, 0, STORE, v4si)
>return Int32x4_t;
> -VAR1 (LOAD1, ld

Re: [PATCH 2/5] gimple-match: Add a gimple_extract_op function

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:46 PM Richard Sandiford via Gcc-patches
 wrote:
>
> code_helper and gimple_match_op seem like generally useful ways
> of summing up a gimple_assign or gimple_call (or gimple_cond).
> This patch adds a gimple_extract_op function that can be used
> for that.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * gimple-match.h (gimple_extract_op): Declare.
> * gimple-match.c (gimple_extract): New function, extracted from...
> (gimple_simplify): ...here.
> (gimple_extract_op): New function.
> ---
>  gcc/gimple-match-head.c | 261 +++-
>  gcc/gimple-match.h  |   1 +
>  2 files changed, 149 insertions(+), 113 deletions(-)
>
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 9d88b2f8551..4c6e0883ba4 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -890,12 +890,29 @@ try_conditional_simplification (internal_fn ifn, 
> gimple_match_op *res_op,
>return true;
>  }
>
> -/* The main STMT based simplification entry.  It is used by the fold_stmt
> -   and the fold_stmt_to_constant APIs.  */
> +/* Common subroutine of gimple_extract_op and gimple_simplify.  Try to
> +   describe STMT in RES_OP.  Return:
>
> -bool
> -gimple_simplify (gimple *stmt, gimple_match_op *res_op, gimple_seq *seq,
> -tree (*valueize)(tree), tree (*top_valueize)(tree))
> +   - -1 if extraction failed
> +   - otherwise, 0 if no simplification should take place
> +   - otherwise, the number of operands for a GIMPLE_ASSIGN or GIMPLE_COND
> +   - otherwise, -2 for a GIMPLE_CALL
> +
> +   Before recording an operand, call:
> +
> +   - VALUEIZE_CONDITION for a COND_EXPR condition
> +   - VALUEIZE_NAME if the rhs of a GIMPLE_ASSIGN is an SSA_NAME

I think at least VALUEIZE_NAME is unnecessary, see below

> +   - VALUEIZE_OP for every other top-level operand
> +
> +   Each routine takes a tree argument and returns a tree.  */
> +
> +template <typename ValueizeOp, typename ValueizeCondition,
> +	  typename ValueizeName>
> +inline int
> +gimple_extract (gimple *stmt, gimple_match_op *res_op,
> +   ValueizeOp valueize_op,
> +   ValueizeCondition valueize_condition,
> +   ValueizeName valueize_name)
>  {
>switch (gimple_code (stmt))
>  {
> @@ -911,100 +928,53 @@ gimple_simplify (gimple *stmt, gimple_match_op 
> *res_op, gimple_seq *seq,
> || code == VIEW_CONVERT_EXPR)
>   {
> tree op0 = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> -   bool valueized = false;
> -   op0 = do_valueize (op0, top_valueize, valueized);
> -   res_op->set_op (code, type, op0);
> -   return (gimple_resimplify1 (seq, res_op, valueize)
> -   || valueized);
> +   res_op->set_op (code, type, valueize_op (op0));
> +   return 1;
>   }
> else if (code == BIT_FIELD_REF)
>   {
> tree rhs1 = gimple_assign_rhs1 (stmt);
> -   tree op0 = TREE_OPERAND (rhs1, 0);
> -   bool valueized = false;
> -   op0 = do_valueize (op0, top_valueize, valueized);
> +   tree op0 = valueize_op (TREE_OPERAND (rhs1, 0));
> res_op->set_op (code, type, op0,
> TREE_OPERAND (rhs1, 1),
> TREE_OPERAND (rhs1, 2),
> REF_REVERSE_STORAGE_ORDER (rhs1));
> -   if (res_op->reverse)
> - return valueized;
> -   return (gimple_resimplify3 (seq, res_op, valueize)
> -   || valueized);
> +   return res_op->reverse ? 0 : 3;
>   }
> -   else if (code == SSA_NAME
> -&& top_valueize)
> +   else if (code == SSA_NAME)
>   {
> tree op0 = gimple_assign_rhs1 (stmt);
> -   tree valueized = top_valueize (op0);
> +   tree valueized = valueize_name (op0);
> if (!valueized || op0 == valueized)
> - return false;
> + return -1;
> res_op->set_op (TREE_CODE (op0), type, valueized);
> -   return true;
> +   return 0;

So the old code, in an obfuscated way, just knew that nothing simplifies
on the plain non-valueized name but returned true when valueization
changed the stmt.  So I'd expect

 tree valueized = valueize_op (op0);
 res_op->set_op (TREE_CODE (op0), type, valueized);
 return 0;

here and the gimple_simplify caller returning 'valueized'.  I think
that the old code treating a NULL top_valueize () as "fail" is just
a premature optimization without any effect.

>   }
> break;
>   case GIMPLE_UNARY_RHS:
> {
>   tree rhs1 = gimple_assign_rhs1 (stmt

Re: [PATCH 3/5] gimple-match: Make code_helper conversions explicit

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:47 PM Richard Sandiford via Gcc-patches
 wrote:
>
> code_helper provides conversions to tree_code and combined_fn.
> Now that the codebase is C++11, we can mark these conversions as
> explicit.  This avoids accidentally using code_helpers with
> functions that take tree_codes, which would previously entail
> a hidden unchecked conversion.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> * gimple-match.h (code_helper): Provide == and != overloads.
> (code_helper::operator tree_code): Make explicit.
> (code_helper::operator combined_fn): Likewise.
> * gimple-match-head.c (convert_conditional_op): Use explicit
> conversions where necessary.
> (gimple_resimplify1, gimple_resimplify2, gimple_resimplify3): 
> Likewise.
> (maybe_push_res_to_seq, gimple_simplify): Likewise.
> * gimple-fold.c (replace_stmt_with_simplification): Likewise.
> ---
>  gcc/gimple-fold.c   | 18 ---
>  gcc/gimple-match-head.c | 51 ++---
>  gcc/gimple-match.h  |  9 ++--
>  3 files changed, 45 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 6e25a7c05db..9daf2cc590c 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -5828,18 +5828,19 @@ replace_stmt_with_simplification 
> (gimple_stmt_iterator *gsi,
>if (gcond *cond_stmt = dyn_cast  (stmt))
>  {
>gcc_assert (res_op->code.is_tree_code ());
> -  if (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison
> +  auto code = tree_code (res_op->code);
> +  if (TREE_CODE_CLASS (code) == tcc_comparison
>   /* GIMPLE_CONDs condition may not throw.  */
>   && (!flag_exceptions
>   || !cfun->can_throw_non_call_exceptions
> - || !operation_could_trap_p (res_op->code,
> + || !operation_could_trap_p (code,
>   FLOAT_TYPE_P (TREE_TYPE (ops[0])),
>   false, NULL_TREE)))
> -   gimple_cond_set_condition (cond_stmt, res_op->code, ops[0], ops[1]);
> -  else if (res_op->code == SSA_NAME)
> +   gimple_cond_set_condition (cond_stmt, code, ops[0], ops[1]);
> +  else if (code == SSA_NAME)
> gimple_cond_set_condition (cond_stmt, NE_EXPR, ops[0],
>build_zero_cst (TREE_TYPE (ops[0])));
> -  else if (res_op->code == INTEGER_CST)
> +  else if (code == INTEGER_CST)
> {
>   if (integer_zerop (ops[0]))
> gimple_cond_make_false (cond_stmt);
> @@ -5870,11 +5871,12 @@ replace_stmt_with_simplification 
> (gimple_stmt_iterator *gsi,
>else if (is_gimple_assign (stmt)
>&& res_op->code.is_tree_code ())
>  {
> +  auto code = tree_code (res_op->code);
>if (!inplace
> - || gimple_num_ops (stmt) > get_gimple_rhs_num_ops (res_op->code))
> + || gimple_num_ops (stmt) > get_gimple_rhs_num_ops (code))
> {
>   maybe_build_generic_op (res_op);
> - gimple_assign_set_rhs_with_ops (gsi, res_op->code,
> + gimple_assign_set_rhs_with_ops (gsi, code,
>   res_op->op_or_null (0),
>   res_op->op_or_null (1),
>   res_op->op_or_null (2));
> @@ -5891,7 +5893,7 @@ replace_stmt_with_simplification (gimple_stmt_iterator 
> *gsi,
> }
>  }
>else if (res_op->code.is_fn_code ()
> -  && gimple_call_combined_fn (stmt) == res_op->code)
> +  && gimple_call_combined_fn (stmt) == combined_fn (res_op->code))
>  {
>gcc_assert (num_ops == gimple_call_num_args (stmt));
>for (unsigned int i = 0; i < num_ops; ++i)
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 4c6e0883ba4..d4d7d767075 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -96,7 +96,7 @@ convert_conditional_op (gimple_match_op *orig_op,
>  ifn = get_conditional_internal_fn ((tree_code) orig_op->code);
>else
>  {
> -  combined_fn cfn = orig_op->code;
> +  auto cfn = combined_fn (orig_op->code);
>if (!internal_fn_p (cfn))
> return false;
>ifn = get_conditional_internal_fn (as_internal_fn (cfn));
> @@ -206,10 +206,10 @@ gimple_resimplify1 (gimple_seq *seq, gimple_match_op 
> *res_op,
>tree tem = NULL_TREE;
>if (res_op->code.is_tree_code ())
> {
> - tree_code code = res_op->code;
> + auto code = tree_code (res_op->code);
>   if (IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (code))
>   && TREE_CODE_LENGTH (code) == 1)
> -   tem = const_unop (res_op->code, res_op->type, res_op->ops[0]);
> +   tem = const_unop (code, res_op->type, res_op->ops[0]);
> }
>else
>

Re: [PATCH] aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned and polynomial type-qualified builtins for
> vcombine_* Neon intrinsics. Using these builtins removes the need for
> many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-10  Jonathan Wright  
>
> * config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
> (TYPES_COMBINEP): Delete.
> * config/aarch64/aarch64-simd-builtins.def: Declare type-
> qualified builtins for vcombine_* intrinsics.
> * config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
> cast.
> (vcombine_s16): Likewise.
> (vcombine_s32): Likewise.
> (vcombine_f32): Likewise.
> (vcombine_u8): Use type-qualified builtin and remove casts.
> (vcombine_u16): Likewise.
> (vcombine_u32): Likewise.
> (vcombine_u64): Likewise.
> (vcombine_p8): Likewise.
> (vcombine_p16): Likewise.
> (vcombine_p64): Likewise.
> (vcombine_bf16): Remove unnecessary cast.
> * config/aarch64/iterators.md (VDC_I): New mode iterator.
> (VDC_P): New mode iterator.
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> f286401ff3ab01dd860ae22858ca07e364247414..7abf8747b69591815068709af42598c47d73269e
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -353,17 +353,6 @@ 
> aarch64_types_unsigned_shiftacc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>qualifier_immediate };
>  #define TYPES_USHIFTACC (aarch64_types_unsigned_shiftacc_qualifiers)
>  
> -
> -static enum aarch64_type_qualifiers
> -aarch64_types_combine_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> -  = { qualifier_none, qualifier_none, qualifier_none };
> -#define TYPES_COMBINE (aarch64_types_combine_qualifiers)
> -
> -static enum aarch64_type_qualifiers
> -aarch64_types_combine_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> -  = { qualifier_poly, qualifier_poly, qualifier_poly };
> -#define TYPES_COMBINEP (aarch64_types_combine_p_qualifiers)
> -
>  static enum aarch64_type_qualifiers
>  aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_none, qualifier_const_pointer_map_mode };
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 404696a71e0c1fc37cdf53fc42439a28bc9a745a..ab5f3a098f2047d0f1ba933f4418609678102c3d
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -43,8 +43,9 @@
> help describe the attributes (for example, pure) for the intrinsic
> function.  */
>  
> -  BUILTIN_VDC (COMBINE, combine, 0, AUTO_FP)
> -  VAR1 (COMBINEP, combine, 0, NONE, di)
> +  BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
> +  BUILTIN_VDC_I (BINOPU, combine, 0, NONE)
> +  BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
>BUILTIN_VB (BINOPP, pmul, 0, NONE)
>VAR1 (BINOPP, pmull, 0, NONE, v8qi)
>VAR1 (BINOPP, pmull_hi, 0, NONE, v16qi)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 7abd1821840f84a79c37c40a33214294b06edbc6..c374e90f31546886a519ba270113ccedd4ca7abf
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -5975,21 +5975,21 @@ __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vcombine_s8 (int8x8_t __a, int8x8_t __b)
>  {
> -  return (int8x16_t) __builtin_aarch64_combinev8qi (__a, __b);
> +  return __builtin_aarch64_combinev8qi (__a, __b);
>  }
>  
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vcombine_s16 (int16x4_t __a, int16x4_t __b)
>  {
> -  return (int16x8_t) __builtin_aarch64_combinev4hi (__a, __b);
> +  return __builtin_aarch64_combinev4hi (__a, __b);
>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vcombine_s32 (int32x2_t __a, int32x2_t __b)
>  {
> -  return (int32x4_t) __builtin_aarch64_combinev2si (__a, __b);
> +  return __builtin_aarch64_combinev2si (__a, __b);
>  }
>  
>  __extension__ extern __inline int64x2_t
> @@ -6010,38 +6010,35 @@ __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vcombine_f32 (float32x2_t __a, float32x2_t __b)
>  {
> -  return (float32x4_t) __builtin_aarch64_combinev2sf (__a, __b);
> +  return __builtin_aarch64_combinev2sf (__a, __b);
>  }
>  
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
>  {
> -  return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a,
> -  (int8x8_t) __b);
> + 

Re: [PATCH 2/4] Mark IFN_COMPLEX_MUL as commutative

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:51 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Mark IFN_COMPLEX_MUL as commutative.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK

> Richard
>
>
> gcc/
> * internal-fn.c (commutative_binary_fn_p): Handle IFN_COMPLEX_MUL.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/complex_mul_1.c: New test.
> ---
>  gcc/internal-fn.c|  1 +
>  .../gcc.target/aarch64/sve/complex_mul_1.c   | 16 
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 7b13db6dfe3..ff7d43f1801 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -3829,6 +3829,7 @@ commutative_binary_fn_p (internal_fn fn)
>  case IFN_MULHRS:
>  case IFN_FMIN:
>  case IFN_FMAX:
> +case IFN_COMPLEX_MUL:
>return true;
>
>  default:
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c
> new file mode 100644
> index 000..d197e7d0d8e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c
> @@ -0,0 +1,16 @@
> +/* { dg-options "-O2 -fgimple -fdump-tree-optimized" } */
> +
> +void __GIMPLE
> +foo (__SVFloat64_t x, __SVFloat64_t y, __SVFloat64_t *res1,
> + __SVFloat64_t *res2)
> +{
> +  __SVFloat64_t a1;
> +  __SVFloat64_t a2;
> +
> +  a1 = .COMPLEX_MUL (x, y);
> +  a2 = .COMPLEX_MUL (y, x);
> +  __MEM<__SVFloat64_t> (res1) = a1;
> +  __MEM<__SVFloat64_t> (res2) = a2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\.COMPLEX_MUL} 1 "optimized" } } */
> --
> 2.25.1
>


Re: [PATCH] aarch64: Use type-qualified builtins for vget_low/high intrinsics

2021-11-11 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> This patch declares unsigned and polynomial type-qualified builtins for
> vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes
> the need for many casts in arm_neon.h.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-11-10  Jonathan Wright  
>
> * config/aarch64/aarch64-builtins.c (TYPES_UNOPP): Define.
> * config/aarch64/aarch64-simd-builtins.def: Declare type-
> qualified builtins for vget_low/high.
> * config/aarch64/arm_neon.h (vget_low_p8): Use type-qualified
> builtin and remove casts.
> (vget_low_p16): Likewise.
> (vget_low_p64): Likewise.
> (vget_low_u8): Likewise.
> (vget_low_u16): Likewise.
> (vget_low_u32): Likewise.
> (vget_low_u64): Likewise.
> (vget_high_p8): Likewise.
> (vget_high_p16): Likewise.
> (vget_high_p64): Likewise.
> (vget_high_u8): Likewise.
> (vget_high_u16): Likewise.
> (vget_high_u32): Likewise.
> (vget_high_u64): Likewise.
> * config/aarch64/iterators.md (VQ_P): New mode iterator.
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> 7abf8747b69591815068709af42598c47d73269e..3edc2f55e571c1a34a24add842c47b130d900cf6
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -204,6 +204,10 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned };
>  #define TYPES_UNOPU (aarch64_types_unopu_qualifiers)
>  static enum aarch64_type_qualifiers
> +aarch64_types_unopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_poly, qualifier_poly };
> +#define TYPES_UNOPP (aarch64_types_unopp_qualifiers)
> +static enum aarch64_type_qualifiers
>  aarch64_types_unopus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_none };
>  #define TYPES_UNOPUS (aarch64_types_unopus_qualifiers)
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> ab5f3a098f2047d0f1ba933f4418609678102c3d..08d6bbe635424217687a429709c696c3282feea0
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -62,8 +62,12 @@
>  
>/* Implemented by aarch64_get_low.  */
>BUILTIN_VQMOV (UNOP, get_low, 0, AUTO_FP)
> +  BUILTIN_VQ_I (UNOPU, get_low, 0, NONE)
> +  BUILTIN_VQ_P (UNOPP, get_low, 0, NONE)
>/* Implemented by aarch64_get_high.  */
>BUILTIN_VQMOV (UNOP, get_high, 0, AUTO_FP)
> +  BUILTIN_VQ_I (UNOPU, get_high, 0, NONE)
> +  BUILTIN_VQ_P (UNOPP, get_high, 0, NONE)
>  
>/* Implemented by aarch64_qshl.  */
>BUILTIN_VSDQ_I (BINOP, sqshl, 0, NONE)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> c374e90f31546886a519ba270113ccedd4ca7abf..6137d53297863aaad0cad31c7eb6eef24bc4316a
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -5799,21 +5799,21 @@ __extension__ extern __inline poly8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vget_low_p8 (poly8x16_t __a)
>  {
> -  return (poly8x8_t) __builtin_aarch64_get_lowv16qi ((int8x16_t) __a);
> +  return __builtin_aarch64_get_lowv16qi_pp (__a);
>  }
>  
>  __extension__ extern __inline poly16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vget_low_p16 (poly16x8_t __a)
>  {
> -  return (poly16x4_t) __builtin_aarch64_get_lowv8hi ((int16x8_t) __a);
> +  return __builtin_aarch64_get_lowv8hi_pp (__a);
>  }
>  
>  __extension__ extern __inline poly64x1_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vget_low_p64 (poly64x2_t __a)
>  {
> -  return (poly64x1_t) __builtin_aarch64_get_lowv2di ((int64x2_t) __a);
> +  return (poly64x1_t) __builtin_aarch64_get_lowv2di_pp (__a);

I think we could define the intrinsics such that the return cast
isn't needed either.  poly64x1_t has the same mode (DI) as the
scalar type, so it should “just” be a case of using qualifiers
to pick the x1 vector type instead of the scalar type.

Thanks,
Richard

>  }
>  
>  __extension__ extern __inline int8x8_t
> @@ -5848,28 +5848,28 @@ __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vget_low_u8 (uint8x16_t __a)
>  {
> -  return (uint8x8_t) __builtin_aarch64_get_lowv16qi ((int8x16_t) __a);
> +  return __builtin_aarch64_get_lowv16qi_uu (__a);
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vget_low_u16 (uint16x8_t __a)
>  {
> -  return (uint16x4_t) __builtin_aarch64_get_lowv8hi ((int16x8_t) __a);
> +  return __builtin_aarch64_get_lowv8hi_uu (__a);
>  }
>  
>  __extension__ extern __inlin

Re: [PATCH 1/4] Canonicalize argument order for commutative functions

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:50 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This patch uses information about internal functions to canonicalize
> the argument order of calls.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.  Note the gimple_resimplifyN functions also canonicalize operand
order, currently for is_tree_code only:

  /* Canonicalize operand order.  */
  bool canonicalized = false;
  if (res_op->code.is_tree_code ()
  && (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison
  || commutative_tree_code (res_op->code))
  && tree_swap_operands_p (res_op->ops[0], res_op->ops[1]))
{
  std::swap (res_op->ops[0], res_op->ops[1]);
  if (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison)
res_op->code = swap_tree_comparison (res_op->code);
  canonicalized = true;
}

that's maybe not the best place.  The function assumes the operands are
already valueized, so maybe it should be valueization that does the
canonicalization - but I think doing it elsewhere made operand order
unreliable (we do end up with non-canonical order in the IL sometimes).

So maybe you should amend the code in resimplifyN as well.

Richard.


> Richard
>
>
> gcc/
> * gimple-fold.c: Include internal-fn.h.
> (fold_stmt_1): If a function maps to an internal one, use
> first_commutative_argument to canonicalize the order of
> commutative arguments.
>
> gcc/testsuite/
> * gcc.dg/fmax-fmin-1.c: New test.
> ---
>  gcc/gimple-fold.c  | 25 ++---
>  gcc/testsuite/gcc.dg/fmax-fmin-1.c | 18 ++
>  2 files changed, 40 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/fmax-fmin-1.c
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index a937f130815..6a7d4507c89 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "varasm.h"
>  #include "memmodel.h"
>  #include "optabs.h"
> +#include "internal-fn.h"
>
>  enum strlen_range_kind {
>/* Compute the exact constant string length.  */
> @@ -6140,18 +6141,36 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
> tree (*valueize) (tree))
>break;
>  case GIMPLE_CALL:
>{
> -   for (i = 0; i < gimple_call_num_args (stmt); ++i)
> +   gcall *call = as_a <gcall *> (stmt);
> +   for (i = 0; i < gimple_call_num_args (call); ++i)
>   {
> -   tree *arg = gimple_call_arg_ptr (stmt, i);
> +   tree *arg = gimple_call_arg_ptr (call, i);
> if (REFERENCE_CLASS_P (*arg)
> && maybe_canonicalize_mem_ref_addr (arg))
>   changed = true;
>   }
> -   tree *lhs = gimple_call_lhs_ptr (stmt);
> +   tree *lhs = gimple_call_lhs_ptr (call);
> if (*lhs
> && REFERENCE_CLASS_P (*lhs)
> && maybe_canonicalize_mem_ref_addr (lhs))
>   changed = true;
> +   if (*lhs)
> + {
> +   combined_fn cfn = gimple_call_combined_fn (call);
> +   internal_fn ifn = associated_internal_fn (cfn, TREE_TYPE (*lhs));
> +   int opno = first_commutative_argument (ifn);
> +   if (opno >= 0)
> + {
> +   tree arg1 = gimple_call_arg (call, opno);
> +   tree arg2 = gimple_call_arg (call, opno + 1);
> +   if (tree_swap_operands_p (arg1, arg2))
> + {
> +   gimple_call_set_arg (call, opno, arg2);
> +   gimple_call_set_arg (call, opno + 1, arg1);
> +   changed = true;
> + }
> + }
> + }
> break;
>}
>  case GIMPLE_ASM:
> diff --git a/gcc/testsuite/gcc.dg/fmax-fmin-1.c 
> b/gcc/testsuite/gcc.dg/fmax-fmin-1.c
> new file mode 100644
> index 000..e7e0518d8bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/fmax-fmin-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +void
> +f1 (double *res, double x, double y)
> +{
> +  res[0] = __builtin_fmax (x, y);
> +  res[1] = __builtin_fmax (y, x);
> +}
> +
> +void
> +f2 (double *res, double x, double y)
> +{
> +  res[0] = __builtin_fmin (x, y);
> +  res[1] = __builtin_fmin (y, x);
> +}
> +
> +/* { dg-final { scan-tree-dump-times {__builtin_fmax} 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times {__builtin_fmin} 1 "optimized" } } */
> --
> 2.25.1
>


Re: [PATCH] Allow loop header copying when first iteration condition is known.

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, Nov 11, 2021 at 11:33 AM Aldy Hernandez  wrote:
>
> On Thu, Nov 11, 2021 at 8:30 AM Richard Biener
>  wrote:
> >
> > On Wed, Nov 10, 2021 at 9:42 PM Jeff Law  wrote:
> > >
> > >
> > >
> > > On 11/10/2021 11:20 AM, Aldy Hernandez via Gcc-patches wrote:
> > > > As discussed in the PR, the loop header copying pass avoids doing so
> > > > when optimizing for size.  However, sometimes we can determine the
> > > > loop entry conditional statically for the first iteration of the loop.
> > > >
> > > > This patch uses the path solver to determine the outgoing edge
> > > > out of preheader->header->xx.  If so, it allows header copying.  Doing
> > > > this in the loop optimizer saves us from doing gymnastics in the
> > > > threader which doesn't have the context to determine if a loop
> > > > transformation is profitable.
> > > >
> > > > I am only returning true in entry_loop_condition_is_static for
> > > > a true conditional.  Technically a false conditional is also
> > > > provably static, but allowing any boolean value causes a regression
> > > > in gfortran.dg/vector_subscript_1.f90.
> > > >
> > > > I would have preferred not passing around the query object, but the
> > > > layout of pass_ch and should_duplicate_loop_header_p make it a bit
> > > > awkward to get it right without an outright refactor to the
> > > > pass.
> > > >
> > > > Tested on x86-64 Linux.
> > > >
> > > > OK?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   PR tree-optimization/102906
> > > >   * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
> > > >   (should_duplicate_loop_header_p): Call 
> > > > entry_loop_condition_is_static.
> > > >   (class ch_base): Add m_ranger and m_query.
> > > >   (ch_base::copy_headers): Pass m_query to
> > > >   entry_loop_condition_is_static.
> > > >   (pass_ch::execute): Allocate and deallocate m_ranger and
> > > >   m_query.
> > > >   (pass_ch_vect::execute): Same.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.dg/tree-ssa/pr102906.c: New test.
> > > OK.  It also makes a nice little example of how to use a Ranger within
> > > an existing pass.
> >
> > Note if you just test for the condition to be true it will only catch 50%
> > of the desired cases since we have no idea whether the 'true' edge
> > is the edge exiting the loop or the edge remaining in the loop.
> > For loop header copying we like to resolve statically to the edge
> > remaining in the loop, so you want
>
> Ahh, I figured there was some block shuffling needed.
>
> I was cautious not to touch much because of the
> gfortran.dg/vector_subscript_1.f90 regression, but now I see that the
> test fails for all optimization levels except -Os.  With this fix we
> properly fail for all levels.  I assume this is expected ;-).
>
> >
> > extract_true_false_edges_from_block (gimple_bb (last), &true_e, &false_e);
> >
> > /* If neither edge is the exit edge this is not a case we'd like to
> >special-case.  */
> > if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e))
> >  return false;
> >
> > tree desired_static_value;
> > if (loop_exit_edge_p (l, true_e))
> >  desired_static_value = boolean_false_node;
> > else
> >   desired_static_value = boolean_true_node;
> >
> > and test for desired_static_value.
>
> Thanks for the code!
>
> OK pending tests?

OK, thanks!
Richard.


Re: [PATCH] rs6000/doc: Rename future cpu with power10

2021-11-11 Thread Kewen.Lin via Gcc-patches
on 2021/11/10 6:03 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 10, 2021 at 05:39:27PM +0800, Kewen.Lin wrote:
>> @@ -27779,10 +27779,10 @@ Enable/disable the @var{__float128} keyword for 
>> IEEE 128-bit floating point
>>  and use either software emulation for IEEE 128-bit floating point or
>>  hardware instructions.
>>
>> -The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7},
>> -@option{-mcpu=power8}), or @option{-mcpu=power9} must be enabled to
>> -use the IEEE 128-bit floating point support.  The IEEE 128-bit
>> -floating point support only works on PowerPC Linux systems.
>> +The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7} (or later
>> +@var{cpu_type})) must be enabled to use the IEEE 128-bit floating point
>> +support.  The IEEE 128-bit floating point support only works on PowerPC
>> +Linux systems.
> 
> I'd just say -mvsx.  This is default on for -mcpu=power7 and later, and
> cannot be enabled elsewhere, but that is beside the point.
> 
> If you say more than the essentials here it becomes harder to read
> (simply because there is more to read then), harder to find what you
> are looking for, and harder to keep it updated if things change (like
> what this patch is for :-) )
> 
> The part about "works only on Linux" isn't quite true.  "Is only
> supported on Linux" is a bit better.
> 
>>  Generate (do not generate) addressing modes using prefixed load and
>> -store instructions when the option @option{-mcpu=future} is used.
>> +store instructions.  The @option{-mprefixed} option requires that
>> +the option @option{-mcpu=power10} (or later @var{cpu_type}) is enabled.
> 
> Just "or later" please.  The "CPU_TYPE" thing is local to the -mcpu=
> description, let's not refer to it from elsewhere.
> 
>>  @item -mmma
>>  @itemx -mno-mma
>>  @opindex mmma
>>  @opindex mno-mma
>> -Generate (do not generate) the MMA instructions when the option
>> -@option{-mcpu=future} is used.
>> +Generate (do not generate) the MMA instructions.  The @option{-mma}
>> +option requires that the option @option{-mcpu=power10} (or later
>> +@var{cpu_type}) is enabled.
> 
> (once more)
> 
> Okay for trunk with those changes.  Thanks!
> 
> 

Thanks!  All comments are addressed and committed as r12-5143.

BR,
Kewen

> Segher
> 





Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 10, 2021 at 1:43 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this patch implements DSE using modref summaries: if a function has no
> side effects besides storing to memory pointed to by its argument, and
> if we can prove those stores to be dead, we can optimize the call out.
> So we handle for example:
>
> volatile int *ptr;
> struct a {
> int a,b,c;
> } a;
> __attribute__((noinline))
> static int init (struct a*a)
> {
> a->a=0;
> a->b=1;
> }
> __attribute__((noinline))
> static int use (struct a*a)
> {
> if (a->c != 3)
> *ptr=5;
> }
>
> void
> main(void)
> {
> struct a a;
> init (&a);
> a.c=3;
> use (&a);
> }
>
> And optimize out the call to init (&a).
>
> We work quite hard to inline such constructors and this patch is only
> effective if inlining did not happen (for whatever reason).  Still, we
> optimize about 26 calls building tramp3d and about 70 calls during
> bootstrap (mostly ctors of poly_int). During bootstrap most removal
> happens early and we would inline the ctors unless we decide to optimize
> for size. 1 call per cc1* binary is removed late during LTO build.
>
> This is more frequent in codebases with higher abstraction penalty, with
> -Os or with profile feedback in sections optimized for size. I also hope
> we will be able to CSE such calls and that would make DSE more
> important.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> gcc/ChangeLog:
>
> * tree-ssa-alias.c (ao_ref_alias_ptr_type): Export.

ao_ref_init_from_ptr_and_range it is

> * tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
> * tree-ssa-dse.c (dse_optimize_stmt): Rename to ...
> (dse_optimize_store): ... this;
> (dse_optimize_call): New function.
> (pass_dse::execute): Use dse_optimize_call and update
> call to dse_optimize_store.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/modref-dse-1.c: New test.
> * gcc.dg/tree-ssa/modref-dse-2.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
> new file mode 100644
> index 000..e78693b349a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1"  } */
> +volatile int *ptr;
> +struct a {
> +   int a,b,c;
> +} a;
> +__attribute__((noinline))
> +static int init (struct a*a)
> +{
> +   a->a=0;
> +   a->b=1;
> +}
> +__attribute__((noinline))
> +static int use (struct a*a)
> +{
> +   if (a->c != 3)
> +   *ptr=5;
> +}
> +
> +void
> +main(void)
> +{
> +   struct a a;
> +   init (&a);
> +   a.c=3;
> +   use (&a);
> +}
> +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
> new file mode 100644
> index 000..99c8ceb8127
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse2 -fno-ipa-sra -fno-ipa-cp"  } */
> +volatile int *ptr;
> +struct a {
> +   int a,b,c;
> +} a;
> +__attribute__((noinline))
> +static int init (struct a*a)
> +{
> +   a->a=0;
> +   a->b=1;
> +   a->c=1;
> +}
> +__attribute__((noinline))
> +static int use (struct a*a)
> +{
> +   if (a->c != 3)
> +   *ptr=5;
> +}
> +
> +void
> +main(void)
> +{
> +   struct a a;
> +   init (&a);
> +   a.c=3;
> +   use (&a);
> +}
> +/* Only DSE2 is tracking live bytes needed to figure out that store to c is
> +   also dead above.  */
> +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index eabf6805f2b..affb5d40d4b 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -782,7 +782,7 @@ ao_ref_alias_ptr_type (ao_ref *ref)
> The access is assumed to be only to or after of the pointer target 
> adjusted
> by the offset, not before it (even in the case RANGE_KNOWN is false).  */
>
> -static void
> +void
>  ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
> bool range_known,
> poly_int64 offset,
> diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
> index 275dea10397..c2e28a74999 100644
> --- a/gcc/tree-ssa-alias.h
> +++ b/gcc/tree-ssa-alias.h
> @@ -111,6 +111,8 @@ ao_ref::max_size_known_p () const
>  /* In tree-ssa-alias.c  */
>  extern void ao_ref_init (ao_ref *, tree);
>  extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
> +void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool,
> +poly_int64, poly_int64, poly_int64);
>  extern tree ao_ref_base (ao_ref *);
>  extern alias_set_type ao_ref_alias_set (ao_ref *);
>  extern alias_set_type ao_

[PATCH 01/15] frv: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/frv/frv.md (*abssi2_internal, *minmax_si_signed,
*minmax_si_unsigned, *minmax_sf, *minmax_df): Fix split condition.
---
 gcc/config/frv/frv.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/frv/frv.md b/gcc/config/frv/frv.md
index a2aa1b2d2ac..fea6dedc53d 100644
--- a/gcc/config/frv/frv.md
+++ b/gcc/config/frv/frv.md
@@ -4676,7 +4676,7 @@ (define_insn_and_split "*abssi2_internal"
(clobber (match_operand:CC_CCR 3 "icr_operand" "=v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 4)]
   "operands[4] = frv_split_abs (operands);"
   [(set_attr "length" "12,16")
@@ -4717,7 +4717,7 @@ (define_insn_and_split "*minmax_si_signed"
(clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4758,7 +4758,7 @@ (define_insn_and_split "*minmax_si_unsigned"
(clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4799,7 +4799,7 @@ (define_insn_and_split "*minmax_sf"
(clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))]
   "TARGET_COND_MOVE && TARGET_HARD_FLOAT"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4840,7 +4840,7 @@ (define_insn_and_split "*minmax_df"
(clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))]
   "TARGET_COND_MOVE && TARGET_HARD_FLOAT && TARGET_DOUBLE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
-- 
2.27.0



[PATCH 00/15] Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
Hi,

This trivial patch series is a secondary product of the earlier
investigation into how many define_insn_and_split cases have a
split condition that is neither applied on top of the define_insn
condition nor contains it, done when there were discussions on
whether we should warn about empty split conditions or join both
conditions implicitly etc.  (See the threads[1][2].)

For some of the investigated define_insn_and_splits, the corresponding
split condition is suspected to be non-robust, especially where the
split condition is only reload_completed.  Lacking a good understanding
of the related ports and the context of the code, I could be wrong.
But I think it is a good idea to raise them and get them either
fixed or clarified.  It would also be good preparation for the
possible condition joining in the future.  For some ports, with the
proposed fixes applied, the split conditions in all
define_insn_and_splits will either have the explicit leading "&&" or
fully contain the condition of the define_insn part.  In other words,
the implicit condition joining would be a nop for this kind of
port, and we would not need any other checks/fixes for it.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571647.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572120.html
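
To make the issue concrete, here is a made-up example (the pattern and
TARGET_SOME_FEATURE are invented for illustration and are not taken
from any of the ports touched here).  A define_insn_and_split expands
into a define_insn plus a define_split, and the split condition is only
combined with the insn condition when it starts with "&&"; a bare
"reload_completed" therefore lets the splitter fire for any insn with
this RTL shape after reload, even if the insn condition does not hold:

  (define_insn_and_split "*hypothetical_addsi3"
    [(set (match_operand:SI 0 "register_operand" "=r")
          (plus:SI (match_operand:SI 1 "register_operand" "r")
                   (match_operand:SI 2 "register_operand" "r")))]
    "TARGET_SOME_FEATURE"          ; insn condition
    "#"
    ;; Non-robust form: "reload_completed" alone ignores TARGET_SOME_FEATURE.
    ;; Robust form: the leading "&&" joins the insn condition, giving the
    ;; effective split condition "TARGET_SOME_FEATURE && reload_completed".
    "&& reload_completed"
    [(set (match_dup 0) (match_dup 1))
     (set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2)))]
    "")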

BR,
Kewen
-

Kewen Lin (15):
  frv: Fix non-robust split condition in define_insn_and_split
  m32c: Fix non-robust split condition in define_insn_and_split
  rx: Fix non-robust split condition in define_insn_and_split
  s390: Fix non-robust split condition in define_insn_and_split
  v850: Fix non-robust split condition in define_insn_and_split
  visium: Fix non-robust split condition in define_insn_and_split
  xtensa: Fix non-robust split condition in define_insn_and_split
  alpha: Fix non-robust split condition in define_insn_and_split
  arm: Fix non-robust split condition in define_insn_and_split
  bfin: Fix non-robust split condition in define_insn_and_split
  csky: Fix non-robust split condition in define_insn_and_split
  i386: Fix non-robust split condition in define_insn_and_split
  ia64: Fix non-robust split condition in define_insn_and_split
  mips: Fix non-robust split condition in define_insn_and_split
  sh: Fix non-robust split condition in define_insn_and_split

 gcc/config/alpha/alpha.md   |  4 +--
 gcc/config/arm/arm.md   |  2 +-
 gcc/config/bfin/bfin.md |  4 +--
 gcc/config/csky/csky.md | 28 ++---
 gcc/config/frv/frv.md   | 10 
 gcc/config/i386/i386.md | 20 +++
 gcc/config/ia64/vect.md |  4 +--
 gcc/config/m32c/cond.md |  4 +--
 gcc/config/mips/mips.md |  4 +--
 gcc/config/rx/rx.md |  2 +-
 gcc/config/s390/s390.md |  2 +-
 gcc/config/s390/vector.md   |  4 +--
 gcc/config/sh/sh.md |  8 +++---
 gcc/config/v850/v850.md |  8 +++---
 gcc/config/visium/visium.md | 50 ++---
 gcc/config/xtensa/xtensa.md |  4 +--
 16 files changed, 79 insertions(+), 79 deletions(-)

-- 
2.27.0



[PATCH 03/15] rx: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes one non-robust split condition so that it is
applied on top of the corresponding define_insn condition;
otherwise the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/rx/rx.md (cstoresf4): Fix split condition.
---
 gcc/config/rx/rx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index b76fce97bdc..c5297685a38 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -714,7 +714,7 @@ (define_insn_and_split "cstoresf4"
  (match_operand:SF 3 "rx_source_operand" "rFQ")]))]
   "ALLOW_RX_FPU_INSNS"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx flags, x;
-- 
2.27.0



[PATCH 02/15] m32c: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/m32c/cond.md (stzx_reversed_, movhicc__):
Fix split condition.
---
 gcc/config/m32c/cond.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/m32c/cond.md b/gcc/config/m32c/cond.md
index b80b10320fb..ce6493fc9f6 100644
--- a/gcc/config/m32c/cond.md
+++ b/gcc/config/m32c/cond.md
@@ -106,7 +106,7 @@ (define_insn_and_split "stzx_reversed_"
 (match_operand:QHI 2 "const_int_operand" "")))]
   "(TARGET_A24 || GET_MODE (operands[0]) == QImode) && reload_completed"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(if_then_else:QHI (eq (reg:CC FLG_REGNO) (const_int 0))
  (match_dup 2)
@@ -230,7 +230,7 @@ (define_insn_and_split "movhicc__"
  (match_operand:HI 4 "const_int_operand" "")))]
   "TARGET_A24"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC FLG_REGNO)
(compare (match_dup 1)
 (match_dup 2)))
-- 
2.27.0



[PATCH 04/15] s390: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/s390/s390.md (*cstorecc_z13): Fix split condition.
* config/s390/vector.md (fprx2_to_tf, tf_to_fprx2): Likewise.
---
 gcc/config/s390/s390.md   | 2 +-
 gcc/config/s390/vector.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 4debdcd1247..1d66c30b9d5 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6941,7 +6941,7 @@ (define_insn_and_split "*cstorecc_z13"
 (match_operand 3 "const_int_operand"  "")]))]
   "TARGET_Z13"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (const_int 0))
(set (match_dup 0)
(if_then_else:GPR
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 1ed1d0665d4..8aa4e82c28d 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -641,7 +641,7 @@ (define_insn_and_split "fprx2_to_tf"
   "@
vmrhg\t%v0,%1,%N1
#"
-  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
+  "&& !(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
@@ -916,7 +916,7 @@ (define_insn_and_split "tf_to_fprx2"
(subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
   "TARGET_VXE"
   "#"
-  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
+  "&& !(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
-- 
2.27.0



[PATCH 05/15] v850: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/v850/v850.md (cbranchsf4, cbranchdf4, *movsicc_normal,
*movsicc_reversed): Fix split condition.
---
 gcc/config/v850/v850.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/v850/v850.md b/gcc/config/v850/v850.md
index 872f17913de..d4a953c6bdb 100644
--- a/gcc/config/v850/v850.md
+++ b/gcc/config/v850/v850.md
@@ -374,7 +374,7 @@ (define_insn_and_split "cbranchsf4"
  (pc)))]
   "TARGET_USE_FPU"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 4) (match_dup 5))
(set (pc)
 (if_then_else (match_dup 6)
@@ -428,7 +428,7 @@ (define_insn_and_split "cbranchdf4"
  (pc)))]
   "TARGET_USE_FPU"
   "#"
-  "reload_completed"
+  "&& reload_completed"
 ;; How to get the mode here?
   [(set (match_dup 4) (match_dup 5))
(set (pc)
@@ -1210,7 +1210,7 @@ (define_insn_and_split "*movsicc_normal"
 (match_operand:SI 3 "reg_or_0_operand" "rI")))]
   "(TARGET_V850E_UP)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 4) (match_dup 5)))
(set (match_dup 0)
@@ -1229,7 +1229,7 @@ (define_insn_and_split "*movsicc_reversed"
 (match_operand:SI 3 "reg_or_0_operand" "rJ")))]
   "(TARGET_V850E_UP)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 4) (match_dup 5)))
(set (match_dup 0)
-- 
2.27.0



[PATCH 07/15] xtensa: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/xtensa/xtensa.md (movdi_internal, movdf_internal): Fix split
condition.
---
 gcc/config/xtensa/xtensa.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index cdf22f14b94..e0bf720d6e0 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -779,7 +779,7 @@ (define_insn_and_split "movdi_internal"
   "register_operand (operands[0], DImode)
|| register_operand (operands[1], DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
 {
@@ -1053,7 +1053,7 @@ (define_insn_and_split "movdf_internal"
   "register_operand (operands[0], DFmode)
|| register_operand (operands[1], DFmode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
 {
-- 
2.27.0



[PATCH 06/15] visium: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/visium/visium.md (*add3_insn, *addsi3_insn, *addi3_insn,
*sub3_insn, *subsi3_insn, *subdi3_insn, *neg2_insn,
*negdi2_insn, *and3_insn, *ior3_insn, *xor3_insn,
*one_cmpl2_insn, *ashl3_insn, *ashr3_insn,
*lshr3_insn, *trunchiqi2_insn, *truncsihi2_insn,
*truncdisi2_insn, *extendqihi2_insn, *extendqisi2_insn,
*extendhisi2_insn, *extendsidi2_insn, *zero_extendqihi2_insn,
*zero_extendqisi2_insn, *zero_extendsidi2_insn): Fix split condition.
---
 gcc/config/visium/visium.md | 50 ++---
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/gcc/config/visium/visium.md b/gcc/config/visium/visium.md
index 83ccf088124..ca2234bf253 100644
--- a/gcc/config/visium/visium.md
+++ b/gcc/config/visium/visium.md
@@ -792,7 +792,7 @@ (define_insn_and_split "*add3_insn"
  (match_operand:QHI 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (plus:QHI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -850,7 +850,7 @@ (define_insn_and_split "*addsi3_insn"
 (match_operand:SI 2 "add_operand"  " L,r,J")))]
   "ok_for_simple_arith_logic_operands (operands, SImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (plus:SI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -912,7 +912,7 @@ (define_insn_and_split "*addi3_insn"
 (match_operand:DI 2 "add_operand"  " L,J, r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (PLUS, operands[0], operands[1], operands[2]);
@@ -1007,7 +1007,7 @@ (define_insn_and_split "*sub3_insn"
   (match_operand:QHI 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (minus:QHI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1064,7 +1064,7 @@ (define_insn_and_split "*subsi3_insn"
  (match_operand:SI 2 "add_operand"  " L,r, J")))]
   "ok_for_simple_arith_logic_operands (operands, SImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (minus:SI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1125,7 +1125,7 @@ (define_insn_and_split "*subdi3_insn"
  (match_operand:DI 2 "add_operand"  " L,J, r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (MINUS, operands[0], operands[1], operands[2]);
@@ -1209,7 +1209,7 @@ (define_insn_and_split "*neg2_insn"
(neg:I (match_operand:I 1 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0) (neg:I (match_dup 1)))
  (clobber (reg:CC R_FLAGS))])]
   ""
@@ -1253,7 +1253,7 @@ (define_insn_and_split "*negdi2_insn"
(neg:DI (match_operand:DI 1 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (MINUS, operands[0], const0_rtx, operands[1]);
@@ -1415,7 +1415,7 @@ (define_insn_and_split "*and3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (and:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1453,7 +1453,7 @@ (define_insn_and_split "*ior3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (ior:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1491,7 +1491,7 @@ (define_insn_and_split "*xor3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (xor:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]

[PATCH 08/15] alpha: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/alpha/alpha.md (*movtf_internal, *movti_internal): Fix split
condition.
---
 gcc/config/alpha/alpha.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 98d09d43721..87617afd0c6 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3830,7 +3830,7 @@ (define_insn_and_split "*movtf_internal"
   "register_operand (operands[0], TFmode)
|| reg_or_0_operand (operands[1], TFmode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
   "alpha_split_tmode_pair (operands, TFmode, true);")
@@ -4091,7 +4091,7 @@ (define_insn_and_split "*movti_internal"
 && ! CONSTANT_P (operands[1]))
|| reg_or_0_operand (operands[1], TImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
   "alpha_split_tmode_pair (operands, TImode, true);")
-- 
2.27.0



[PATCH 09/15] arm: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes one non-robust split condition so that it is
applied on top of the corresponding define_insn condition;
otherwise the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/arm/arm.md (*minmax_arithsi_non_canon): Fix split condition.
---
 gcc/config/arm/arm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4adc976b8b6..9a27d421484 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4198,7 +4198,7 @@ (define_insn_and_split "*minmax_arithsi_non_canon"
   "TARGET_32BIT && !arm_eliminable_register (operands[1])
&& !(arm_restrict_it && CONST_INT_P (operands[3]))"
   "#"
-  "TARGET_32BIT && !arm_eliminable_register (operands[1]) && reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 2) (match_dup 3)))
 
-- 
2.27.0



[PATCH 10/15] bfin: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/bfin/bfin.md (movdi_insn, movdf_insn): Fix split condition.
---
 gcc/config/bfin/bfin.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md
index fd65f4d9e63..41a50974136 100644
--- a/gcc/config/bfin/bfin.md
+++ b/gcc/config/bfin/bfin.md
@@ -506,7 +506,7 @@ (define_insn_and_split "movdi_insn"
(match_operand:DI 1 "general_operand" "iFx,r,mx"))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
@@ -718,7 +718,7 @@ (define_insn_and_split "movdf_insn"
(match_operand:DF 1 "general_operand" "iFx,r,mx"))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
-- 
2.27.0



[PATCH 13/15] ia64: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits so that each of them is applied on top of
the corresponding define_insn condition; otherwise the splitting
could be performed unexpectedly.

gcc/ChangeLog:

* config/ia64/vect.md (*vec_extractv2sf_0_le, *vec_extractv2sf_0_be):
Fix split condition.
---
 gcc/config/ia64/vect.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md
index 1a2452289b7..0f3a406d620 100644
--- a/gcc/config/ia64/vect.md
+++ b/gcc/config/ia64/vect.md
@@ -1422,7 +1422,7 @@ (define_insn_and_split "*vec_extractv2sf_0_le"
   UNSPEC_VECT_EXTR))]
   "!TARGET_BIG_ENDIAN"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
 {
   if (REG_P (operands[1]) && FR_REGNO_P (REGNO (operands[1])))
@@ -1440,7 +1440,7 @@ (define_insn_and_split "*vec_extractv2sf_0_be"
   UNSPEC_VECT_EXTR))]
   "TARGET_BIG_ENDIAN"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
 {
   if (MEM_P (operands[1]))
-- 
2.27.0



[PATCH 12/15] i386: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch is to fix some non-robust split conditions in some
define_insn_and_splits, so that each of them is applied on top of
the corresponding condition for the define_insn part; otherwise the
splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/i386/i386.md (*add3_doubleword, *addv4_doubleword,
*addv4_doubleword_1, *sub3_doubleword,
*subv4_doubleword, *subv4_doubleword_1,
*add3_doubleword_cc_overflow_1, *divmodsi4_const,
*neg2_doubleword, *tls_dynamic_gnu2_combine_64_): Fix split
condition.
---
 gcc/config/i386/i386.md | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6eb9de81921..2bd09e502ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5491,7 +5491,7 @@ (define_insn_and_split "*add3_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6300,7 +6300,7 @@ (define_insn_and_split "*addv4_doubleword"
(plus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6347,7 +6347,7 @@ (define_insn_and_split "*addv4_doubleword_1"
&& CONST_SCALAR_INT_P (operands[2])
&& rtx_equal_p (operands[2], operands[3])"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6641,7 +6641,7 @@ (define_insn_and_split "*sub3_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (MINUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -6817,7 +6817,7 @@ (define_insn_and_split "*subv4_doubleword"
(minus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (MINUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -6862,7 +6862,7 @@ (define_insn_and_split "*subv4_doubleword_1"
&& CONST_SCALAR_INT_P (operands[2])
&& rtx_equal_p (operands[2], operands[3])"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -7542,7 +7542,7 @@ (define_insn_and_split 
"*add3_doubleword_cc_overflow_1"
(plus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -9000,7 +9000,7 @@ (define_insn_and_split "*divmodsi4_const"
(clobber (reg:CC FLAGS_REG))]
   "!optimize_function_for_size_p (cfun)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 4))
(parallel [(set (match_dup 0)
@@ -10515,7 +10515,7 @@ (define_insn_and_split "*neg2_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_unary_operator_ok (NEG, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel
 [(set (reg:CCC FLAGS_REG)
  (ne:CCC (match_dup 1) (const_int 0)))
@@ -16898,7 +16898,7 @@ (define_insn_and_split 
"*tls_dynamic_gnu2_combine_64_"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_GNU2_TLS"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (match_dup 4))]
 {
   operands[4] = can_create_pseudo_p () ? gen_reg_rtx (ptr_mode) : operands[0];
-- 
2.27.0



[PATCH 14/15] mips: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch is to fix some non-robust split conditions in some
define_insn_and_splits, so that each of them is applied on top of
the corresponding condition for the define_insn part; otherwise the
splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/mips/mips.md (*udivmod4, udivmod4_mips16): Fix
split condition.
---
 gcc/config/mips/mips.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 455b9b802f6..4efb7503df3 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2961,7 +2961,7 @@ (define_insn_and_split "*udivmod4"
  (match_dup 2)))]
   "ISA_HAS_DIV && !TARGET_MIPS16"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2]));
@@ -2982,7 +2982,7 @@ (define_insn_and_split "udivmod4_mips16"
(clobber (match_operand:GPR 4 "lo_operand" "=l"))]
   "ISA_HAS_DIV && TARGET_MIPS16"
   "#"
-  "cse_not_expected"
+  "&& cse_not_expected"
   [(const_int 0)]
 {
   emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2]));
-- 
2.27.0



[PATCH 15/15] sh: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch is to fix some non-robust split conditions in some
define_insn_and_splits, so that each of them is applied on top of
the corresponding condition for the define_insn part; otherwise the
splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/sh/sh.md (call_pcrel, call_value_pcrel, sibcall_pcrel,
sibcall_value_pcrel): Fix split condition.
---
 gcc/config/sh/sh.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 93ee7c9a7de..1bb325c7044 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -6566,7 +6566,7 @@ (define_insn_and_split "call_pcrel"
(clobber (match_scratch:SI 2 "=&r"))]
   "TARGET_SH2"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -6678,7 +6678,7 @@ (define_insn_and_split "call_value_pcrel"
(clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH2"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -6877,7 +6877,7 @@ (define_insn_and_split "sibcall_pcrel"
(return)]
   "TARGET_SH2 && !TARGET_FDPIC"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -7043,7 +7043,7 @@ (define_insn_and_split "sibcall_value_pcrel"
(return)]
   "TARGET_SH2 && !TARGET_FDPIC"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
-- 
2.27.0



[PATCH 11/15] csky: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch is to fix some non-robust split conditions in some
define_insn_and_splits, so that each of them is applied on top of
the corresponding condition for the define_insn part; otherwise the
splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/csky/csky.md (*cskyv2_adddi3, *ck801_adddi3, *cskyv2_adddi1_1,
*cskyv2_subdi3, *ck801_subdi3, *cskyv2_subdi1_1, cskyv2_addcc,
cskyv2_addcc_invert, *cskyv2_anddi3, *ck801_anddi3, *cskyv2_iordi3,
*ck801_iordi3, *cskyv2_xordi3, *ck801_xordi3): Fix split condition.
---
 gcc/config/csky/csky.md | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/config/csky/csky.md b/gcc/config/csky/csky.md
index f91d851cb2c..54143a0efea 100644
--- a/gcc/config/csky/csky.md
+++ b/gcc/config/csky/csky.md
@@ -850,7 +850,7 @@ (define_insn_and_split "*cskyv2_adddi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -877,7 +877,7 @@ (define_insn_and_split "*ck801_adddi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -906,7 +906,7 @@ (define_insn_and_split "*cskyv2_adddi1_1"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1048,7 +1048,7 @@ (define_insn_and_split "*cskyv2_subdi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1075,7 +1075,7 @@ (define_insn_and_split "*ck801_subdi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1104,7 +1104,7 @@ (define_insn_and_split "*cskyv2_subdi1_1"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1276,7 +1276,7 @@ (define_insn_and_split "cskyv2_addcc"
dect\t%0, %1, %M2
#
#"
-  "reload_completed && !rtx_equal_p (operands[0], operands[1])"
+  "&& reload_completed && !rtx_equal_p (operands[0], operands[1])"
   [(set (match_dup 0)
(if_then_else:SI (ne (reg:CC CSKY_CC_REGNUM) (const_int 0))
 (plus:SI (match_dup 0) (match_dup 2]
@@ -1302,7 +1302,7 @@ (define_insn_and_split "cskyv2_addcc_invert"
decf\t%0, %1, %M2
#
#"
-  "reload_completed && !rtx_equal_p (operands[0], operands[1])"
+  "&& reload_completed && !rtx_equal_p (operands[0], operands[1])"
   [(set (match_dup 0)
(if_then_else:SI (eq (reg:CC CSKY_CC_REGNUM) (const_int 0))
 (plus:SI (match_dup 0) (match_dup 2]
@@ -1691,7 +1691,7 @@ (define_insn_and_split "*cskyv2_anddi3"
(match_operand:DI 2 "register_operand" "b,r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1716,7 +1716,7 @@ (define_insn_and_split "*ck801_anddi3"
   (match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1801,7 +1801,7 @@ (define_insn_and_split "*cskyv2_iordi3"
(match_operand:DI 2 "register_operand" "b,  r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1826,7 +1826,7 @@ (define_insn_and_split "*ck801_iordi3"
(match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1911,7 +1911,7 @@ (define_insn_and_split "*cskyv2_xordi3"
(match_operand:DI 2 "register_operand" "b,  r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1936,7 +1936,7 @@ (define_insn_and_split "*ck801_xordi3"
(match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
-- 
2.27.0



Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Jan Hubicka via Gcc-patches
> > +  /* Unlike alias oracle we can not skip subtrees based on TBAA check.
> > + Count the size of the whole tree to verify that we will not need too 
> > many
> > + tests.  */
> > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > +  FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +   if (num_tests++ > max_tests)
> > + return false;
> 
> at least the innermost loop can be done as
> 
>   if (num_tests += ref_node->accesses.length () > max_tests)
> 
> no?

Yep that was stupid, sorry for that ;))
> 
> > +
> > +  /* Walk all memory writes and verify that they are dead.  */
> > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > +  FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +   {
> > + /* ??? if offset is unkonwn it may be negative.  Not sure
> > +how to construct ref here.  */
> 
> I think you can't, you could use -poly_int64_max or so.

I need a ref to give to dse_classify_store.  It needs a base to track live
bytes etc. which is not very useful if I do not know the range.  However
DSE is still useful since I can hit a free or the end of lifetime of the decl.
I was wondering if I should simply implement a lightweight version of
dse_classify_store that handles this case?
> 
> > + if (!access_node->parm_offset_known)
> > +   return false;
> 
> But you could do this check in the loop computing num_tests ...
> (we could also cache the count and whether any of the refs have unknown offset
> in the summary?)

Yep, I plan to add a cache for bits like this (and the check for accessing
global memory).  Just want to push a bit more of the cleanups I have in my
local tree.
> 
> > + tree arg;
> > + if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
> > +   arg = gimple_call_chain (stmt);
> > + else
> > +   arg = gimple_call_arg (stmt, access_node->parm_index);
> > +
> > + ao_ref ref;
> > + poly_offset_int off = (poly_offset_int)access_node->offset
> > +   + ((poly_offset_int)access_node->parm_offset
> > +  << LOG2_BITS_PER_UNIT);
> > + poly_int64 off2;
> > + if (!off.to_shwi (&off2))
> > +   return false;
> > + ao_ref_init_from_ptr_and_range
> > +(&ref, arg, true, off2, access_node->size,
> > + access_node->max_size);
> > + ref.ref_alias_set = ref_node->ref;
> > + ref.base_alias_set = base_node->base;
> > +
> > + bool byte_tracking_enabled
> > + = setup_live_bytes_from_ref (&ref, live_bytes);
> > + enum dse_store_status store_status;
> > +
> > + store_status = dse_classify_store (&ref, stmt,
> > +byte_tracking_enabled,
> > +live_bytes, &by_clobber_p);
> > + if (store_status != DSE_STORE_DEAD)
> > +   return false;
> > +   }
> > +  /* Check also value stored by the call.  */
> > +  if (gimple_store_p (stmt))
> > +{
> > +  ao_ref ref;
> > +
> > +  if (!initialize_ao_ref_for_dse (stmt, &ref))
> > +   gcc_unreachable ();
> > +  bool byte_tracking_enabled
> > + = setup_live_bytes_from_ref (&ref, live_bytes);
> > +  enum dse_store_status store_status;
> > +
> > +  store_status = dse_classify_store (&ref, stmt,
> > +byte_tracking_enabled,
> > +live_bytes, &by_clobber_p);
> > +  if (store_status != DSE_STORE_DEAD)
> > +   return false;
> > +}
> > +  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
> > +  return true;
> > +}
> > +
> >  namespace {
> >
> >  const pass_data pass_data_dse =
> > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun)
> >   gimple *stmt = gsi_stmt (gsi);
> >
> >   if (gimple_vdef (stmt))
> > -   dse_optimize_stmt (fun, &gsi, live_bytes);
> > +   {
> > + gcall *call = dyn_cast  (stmt);
> > +
> > + if (call && dse_optimize_call (&gsi, live_bytes))
> > +   /* We removed a dead call.  */;
> > + else
> > +   dse_optimize_store (fun, &gsi, live_bytes);
> 
> I think we want to refactor both functions, dse_optimize_stmt has some
> early outs that apply generally, and it handles some builtin calls
> that we don't want to re-handle with dse_optimize_call.
> 
> So I wonder if it is either possible to call the new function from
> inside dse_optimize_stmt instead, after we handled the return
> value of call for example or different refactoring can make the flow
> more obvious.

It was my initial plan. However I was not sure how much I would get from
that.

The function starts with:

  /* Don't return early on *this_2(D) 

Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, Nov 11, 2021 at 1:07 PM Jan Hubicka  wrote:
>
> > > +  /* Unlike alias oracle we can not skip subtrees based on TBAA check.
> > > + Count the size of the whole tree to verify that we will not need 
> > > too many
> > > + tests.  */
> > > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > > +  FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +   if (num_tests++ > max_tests)
> > > + return false;
> >
> > at least the innermost loop can be done as
> >
> >   if (num_tests += ref_node->accesses.length () > max_tests)
> >
> > no?
>
> Yep that was stupid, sorry for that ;))
> >
> > > +
> > > +  /* Walk all memory writes and verify that they are dead.  */
> > > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > > +  FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +   {
> > > + /* ??? if offset is unkonwn it may be negative.  Not sure
> > > +how to construct ref here.  */
> >
> > I think you can't, you could use -poly_int64_max or so.
>
> I need a ref to give to dse_classify_store. It needs base to track live
> bytes etc which is not very useful if I do not know the range.  However
> DSE is still useful since I can hit free or end of lifetime of the decl.
> I was wondering if I should simply implement a lightweight version of
> dse_clasify_store that handles this case?

No, I think if it turns out useful then we want a way to have such ref
represented by an ao_ref.  Note that when we come from a
ref tree we know handled-components only will increase offset,
only the base MEM_REF can contain a pointer subtraction (but
the result of that is the base then).

In what cases does parm_offset_known end up false?  Is that
when seeing a POINTER_PLUS_EXPR with unknown offset?
So yes, that's a case we cannot capture right now - the only
thing that remains is a pointer with a known points-to-set - a
similar problem as with the pure call PRE.  You could in theory
allocate a scratch SSA name and attach points-to-info
to it.  And when the call argument is &decl based then you could set
offset to zero.

> >
> > > + if (!access_node->parm_offset_known)
> > > +   return false;
> >
> > But you could do this check in the loop computing num_tests ...
> > (we could also cache the count and whether any of the refs have unknown 
> > offset
> > in the summary?)
>
> Yep, I plan to add cache for bits like this (and the check for accessing
> global memory).  Just want to push bit more of the cleanups I have in my
> local tree.
> >
> > > + tree arg;
> > > + if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
> > > +   arg = gimple_call_chain (stmt);
> > > + else
> > > +   arg = gimple_call_arg (stmt, access_node->parm_index);
> > > +
> > > + ao_ref ref;
> > > + poly_offset_int off = (poly_offset_int)access_node->offset
> > > +   + ((poly_offset_int)access_node->parm_offset
> > > +  << LOG2_BITS_PER_UNIT);
> > > + poly_int64 off2;
> > > + if (!off.to_shwi (&off2))
> > > +   return false;
> > > + ao_ref_init_from_ptr_and_range
> > > +(&ref, arg, true, off2, access_node->size,
> > > + access_node->max_size);
> > > + ref.ref_alias_set = ref_node->ref;
> > > + ref.base_alias_set = base_node->base;
> > > +
> > > + bool byte_tracking_enabled
> > > + = setup_live_bytes_from_ref (&ref, live_bytes);
> > > + enum dse_store_status store_status;
> > > +
> > > + store_status = dse_classify_store (&ref, stmt,
> > > +byte_tracking_enabled,
> > > +live_bytes, &by_clobber_p);
> > > + if (store_status != DSE_STORE_DEAD)
> > > +   return false;
> > > +   }
> > > +  /* Check also value stored by the call.  */
> > > +  if (gimple_store_p (stmt))
> > > +{
> > > +  ao_ref ref;
> > > +
> > > +  if (!initialize_ao_ref_for_dse (stmt, &ref))
> > > +   gcc_unreachable ();
> > > +  bool byte_tracking_enabled
> > > + = setup_live_bytes_from_ref (&ref, live_bytes);
> > > +  enum dse_store_status store_status;
> > > +
> > > +  store_status = dse_classify_store (&ref, stmt,
> > > +byte_tracking_enabled,
> > > +live_bytes, &by_clobber_p);
> > > +  if (store_status != DSE_STORE_DEAD)
> > > +   return false;
> > > +}
> > > +  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
> > > +  return true;
> > > +}
> > > +
> > >  namespace {
> > >
> > >  const pass_data pass_data_dse =
> > > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun)
> > >   gi

Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
> 
> No, I think if it turns out useful then we want a way to have such ref
> represented by an ao_ref.  Note that when we come from a
> ref tree we know handled-components only will increase offset,
> only the base MEM_REF can contain a pointer subtraction (but
> the result of that is the base then).

Yep, that is why I introduced the parm_offset in the first place - it can be
negative or unknown...
> 
> In what cases does parm_offset_known end up false?  Is that
> when seeing a POINTER_PLUS_EXPR with unknown offset?

Yep, a typical example is a loop with a pointer walking an array.
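
For instance (an illustrative, made-up example):

  void
  clear_all (char *p, int n)
  {
    for (int i = 0; i < n; i++)
      *p++ = 0;   /* The store offset relative to the incoming parameter
                     is not a compile-time constant, so parm_offset_known
                     ends up false for this access.  */
  }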

> So yes, that's a case we cannot capture right now - the only
> thing that remains is a pointer with a known points-to-set - a
> similar problem as with the pure call PRE.  You could in theory
> allocate a scratch SSA name and attach points-to-info
> to it.  And when the call argument is &decl based then you could set
> offset to zero.

Hmm, I could try to do this, but possibly incrementally?

Basically I want to have

foo (&decl)
decl = {}

To be matched since even if I do not know the offset I know it is dead
after end of lifetime of the decl.  I am not quite sure PTA will give me
that?
> > It was my initial plan. However I was not sure how much I would get from
> > that.
> >
> > The function starts with:
> >
> >   /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
> >   if (gimple_has_volatile_ops (stmt)
> >   && (!gimple_clobber_p (stmt)
> >   || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
> > return;
> >
> >   ao_ref ref;
> >   if (!initialize_ao_ref_for_dse (stmt, &ref))
> > return;
> >
> > The check about clobber does not apply to calls and then it gives up on
> > functions not returning aggregates (that is a common case).
> >
> > For functions returing aggregates it tries to prove that retval is dead
> > and replace it.
> >
> > I guess I can simply call my analysis from the second return above and
> > from the code removing dead LHS call instead of doing it from the main
> > walker and drop the LHS handling?
> 
> Yeah, something like that.
OK, I will prepare updated patch, thanks!

Honza
> 
> Richard.
> 
> > Thank you,
> > Honza
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > +   }
> > > >   else if (def_operand_p
> > > >  def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
> > > > {


Basic kill analysis for modref

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
This patch enables optimization of stores that are killed by calls.
The modref summary is extended by an array containing a list of access
ranges, relative to function parameters, that are known to be killed by
the function.  This array is collected during local analysis and optimized
(so separate stores are glued together).
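
For illustration, a made-up example of the kind of store that gets
recorded as a kill:

  void
  set_flag (int *p)
  {
    *p = 1;   /* Always executed on entry, so it is recorded as killing
                 sizeof (int) bytes at offset 0 relative to parameter 0;
                 a store to *p right before a call to set_flag can then
                 be proved dead.  */
  }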

Kill analysis in ipa-modref.c is quite simplistic.  In particular, no WPA
propagation is done and we also take a very simple approach to prove that a
given store is executed on each invocation of the function.  I simply
require it to be in the first basic block and before anything that can
throw externally.  I have fancier code for that, but with this patch I want
to primarily discuss the interface to tree-ssa-alias.c.  I wonder if there
are some helpers I can re-use?

From GCC linktime I get 814 functions with a non-empty kill vector.

Modref stats:
  modref kill: 39 kills, 7162 queries
  modref use: 25169 disambiguations, 697722 queries
  modref clobber: 2290122 disambiguations, 22750147 queries
  5240008 tbaa queries (0.230329 per modref query)
  806190 base compares (0.035437 per modref query)

(note that more kills happen at early optimization where we have not
inlined that much yet).

For tramp3d (non-lto -O3 build):

Modref stats:
  modref kill: 45 kills, 630 queries
  modref use: 750 disambiguations, 10061 queries
  modref clobber: 35253 disambiguations, 543262 queries
  85347 tbaa queries (0.157101 per modref query)
  18727 base compares (0.034471 per modref query)

So it is not that high, but it gets better after improving the analysis side
and also with -Os and/or PGO (where we offline cdtors) and also wiring in
same_addr_size_stores_p, which I want to discuss incrementally.

But at least there are not that many queries to slow down compile times
noticeably :)

Honza

gcc/ChangeLog:

* ipa-modref-tree.h (struct modref_access_node): New member function
* ipa-modref.c (modref_summary::useful_p): Kills are not useful when
we can not analyze loads.
(struct modref_summary_lto): Add kills.
(modref_summary::dump): Dump kills.
(record_access): Take access node as parameter.
(record_access_lto): Likewise.
(add_kill): New function.
(merge_call_side_effects): Merge kills.
(analyze_call): Pass around always_executed.
(struct summary_ptrs): Add always_executed flag.
(analyze_load): Update.
(analyze_store): Handle kills.
(analyze_stmt): Pass around always_executed flag; handle kills from
clobbers.
(analyze_function): Compute always_executed.
(modref_summaries::duplicate): Copy kills.
(update_signature): Release kills.
* ipa-modref.h (struct modref_summary): Add kills.
* tree-ssa-alias.c (dump_alias_stats): Dump kills.
(stmt_kills_ref_p): Handle modref kills.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/modref-dse-2.c: New test.

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 17ff6bb582c..6f8caa331a6 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -120,6 +120,8 @@ static struct {
   unsigned HOST_WIDE_INT modref_use_no_alias;
   unsigned HOST_WIDE_INT modref_clobber_may_alias;
   unsigned HOST_WIDE_INT modref_clobber_no_alias;
+  unsigned HOST_WIDE_INT modref_kill_no;
+  unsigned HOST_WIDE_INT modref_kill_yes;
   unsigned HOST_WIDE_INT modref_tests;
   unsigned HOST_WIDE_INT modref_baseptr_tests;
 } alias_stats;
@@ -169,6 +171,12 @@ dump_alias_stats (FILE *s)
   + alias_stats.aliasing_component_refs_p_may_alias);
   dump_alias_stats_in_alias_c (s);
   fprintf (s, "\nModref stats:\n");
+  fprintf (s, "  modref kill: "
+  HOST_WIDE_INT_PRINT_DEC" kills, "
+  HOST_WIDE_INT_PRINT_DEC" queries\n",
+  alias_stats.modref_kill_yes,
+  alias_stats.modref_kill_yes
+  + alias_stats.modref_kill_no);
   fprintf (s, "  modref use: "
   HOST_WIDE_INT_PRINT_DEC" disambiguations, "
   HOST_WIDE_INT_PRINT_DEC" queries\n",
@@ -3373,6 +3381,107 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *ref)
   if (is_gimple_call (stmt))
 {
   tree callee = gimple_call_fndecl (stmt);
+  struct cgraph_node *node;
+  modref_summary *summary;
+
+  /* Try to disambiguate using modref summary.  Modref records a vector
+of stores with known offsets relative to function parameters that must
+happen every execution of function.  Find if we have a matching
+store and verify that function can not use the value.  */
+  if (callee != NULL_TREE
+ && (node = cgraph_node::get (callee)) != NULL
+ && node->binds_to_current_def_p ()
+ && (summary = get_modref_function_summary (node)) != NULL
+ && summary->kills.length ())
+   {
+ tree base = ao_ref_base (ref);
+ for (unsigned int i = 0; i < summary->kills.length (); i++)
+   {
+ modref_access_node &a = summary->kills[i];
+  

Re: [PATCH] fixincludes: don't assume getcwd() can handle NULL argument

2021-11-11 Thread Eric Gallager via Gcc-patches
On Tue, Nov 9, 2021 at 8:50 AM Xi Ruoyao via Gcc-patches
 wrote:
>
> POSIX says:
>
> On some implementations, if buf is a null pointer, getcwd() may obtain
> size bytes of memory using malloc(). In this case, the pointer returned
> by getcwd() may be used as the argument in a subsequent call to free().
> Invoking getcwd() with buf as a null pointer is not recommended in
> conforming applications.
>
> This produces an error building GCC with --enable-werror-always:
>
> ../../../fixincludes/fixincl.c: In function ‘process’:
> ../../../fixincludes/fixincl.c:1356:7: error: argument 1 is null but
> the corresponding size argument 2 value is 4096 [-Werror=nonnull]
>
> And, at least we've been leaking memory even if getcwd() supports this
> non-standard extension.
>
> fixincludes/ChangeLog:
>
> * fixincl.c (process): Allocate and deallocate the buffer for
>   getcwd() explicitly.
> ---
>  fixincludes/fixincl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
> index 6dba2f6e830..b4b1e38ede7 100644
> --- a/fixincludes/fixincl.c
> +++ b/fixincludes/fixincl.c
> @@ -1353,9 +1353,11 @@ process (void)
>if (access (pz_curr_file, R_OK) != 0)
>  {
>int erno = errno;
> +  char *buf = xmalloc (MAXPATHLEN);
>fprintf (stderr, "Cannot access %s from %s\n\terror %d (%s)\n",
> -   pz_curr_file, getcwd ((char *) NULL, MAXPATHLEN),
> +   pz_curr_file, getcwd (buf, MAXPATHLEN),
> erno, xstrerror (erno));
> +  free (buf);
>return;
>  }
>
> --
> 2.33.1

This seems to contradict bug 21823:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21823
It would fix bug 80047, though:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80047


Re: [PATCH] rs6000: Fix a handful of 32-bit built-in function problems in the new support

2021-11-11 Thread Segher Boessenkool
On Wed, Nov 10, 2021 at 03:28:18PM -0600, Bill Schmidt wrote:
> On 11/10/21 2:33 AM, Segher Boessenkool wrote:
> > On Tue, Nov 09, 2021 at 03:46:54PM -0600, Bill Schmidt wrote:
> >>* config/rs6000/rs6000-builtin-new.def (CMPB): Flag as no32bit.
> >>(BPERMD): Flag as 32bit.

So, change this to something like "flag this as needing special handling
on 32 bit" or something?

> >> -  void __builtin_set_texasr (unsigned long long);
> >> +  void __builtin_set_texasr (unsigned long);
> >>  SET_TEXASR nothing {htm,htmspr}
> >>  
> >> -  void __builtin_set_texasru (unsigned long long);
> >> +  void __builtin_set_texasru (unsigned long);
> >>  SET_TEXASRU nothing {htm,htmspr}
> >>  
> >> -  void __builtin_set_tfhar (unsigned long long);
> >> +  void __builtin_set_tfhar (unsigned long);
> >>  SET_TFHAR nothing {htm,htmspr}
> >>  
> >> -  void __builtin_set_tfiar (unsigned long long);
> >> +  void __builtin_set_tfiar (unsigned long);
> >>  SET_TFIAR nothing {htm,htmspr}
> > This does not seem to be what the exiting code does, either?  Try with
> > -m32 -mpowerpc64 (it extends to 64 bit there, so the builtin does not
> > have long int as parameter, it has long long int).
> 
> This uses a tfiar_t, which is a typedef for uintptr_t, so long int is 
> appropriate.
> This is necessary to make the HTM tests pass on 32-bit powerpc64.

void f(long x) { __builtin_set_texasr(x); }

built with -m32 -mpowerpc64 gives (in the expand dump):

void f (long int x)
{
  long long unsigned int _1;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = (long long unsigned int) x_2(D);
  __builtin_set_texasr (_1); [tail call]
  return;
;;succ:   EXIT

}

The builtins have a "long long" argument in the existing code, in this
configuration.  And this is not the same as "long" here.

> >> --- a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
> >> @@ -8,7 +8,7 @@ void abort ();
> >>  long long int
> >>  do_compare (long long int a, long long int b)
> >>  {
> >> -  return __builtin_cmpb (a, b);   /* { dg-error "'__builtin_cmpb' is not 
> >> supported in this compiler configuration" } */
> >> +  return __builtin_cmpb (a, b);   /* { dg-error "'__builtin_p6_cmpb' is 
> >> not supported in 32-bit mode" } */
> >>  }
> > The original spelling is the correct one?
> 
> This is something I have on my to-do list for the future, to see whether I
> can improve it.  The overloaded function __builtin_cmpb gets translated to
> the underlying non-overloaded builtin __builtin_p6_cmpb, and that's the only
> name that's still around by the time we get to the error processing.  I want
> to see whether I can add some infrastructure to recover the overloaded
> function name in such cases.  Is it okay to defer this for now?

It is fine to defer it.  It is not fine to change the testcase like
this.  The user did not write __builtin_p6_cmpb (which is not even
documented btw), so the compiler should not talk about that.  It is
fine to leave the test failing for now.


Segher


Re: [committed] openmp: Fix handling of numa_domains(1)

2021-11-11 Thread Thomas Schwinge
Hi!

On 2021-10-18T15:03:08+0200, Jakub Jelinek via Gcc-patches 
 wrote:
> On Fri, Oct 15, 2021 at 12:26:34PM -0700, sunil.k.pandey wrote:
>> 4764049dd620affcd3e2658dc7f03a6616370a29 is the first bad commit
>> commit 4764049dd620affcd3e2658dc7f03a6616370a29
>> Author: Jakub Jelinek 
>> Date:   Fri Oct 15 16:25:25 2021 +0200
>>
>> openmp: Fix up handling of OMP_PLACES=threads(1)
>>
>> caused
>>
>> FAIL: libgomp.c/places-10.c execution test
>
> Reproduced on gcc112 in CompileFarm (my ws isn't NUMA).
> If numa-domains is used with num-places count, sometimes the function
> could create more places than requested and crash.  This depended on the
> content of /sys/devices/system/node/online file, e.g. if the file
> contains
> 0-1,16-17
> and all NUMA nodes contain at least one CPU in the cpuset of the program,
> then numa_domains(2) or numa_domains(4) (or 5+) work fine while
> numa_domains(1) or numa_domains(3) misbehave.  I.e. the function was able
> to stop after reaching limit on the , separators (or trivially at the end),
> but not within in the ranges.
>
> Fixed thusly, tested on powerpc64le-linux, committed to trunk.

There appears to be yet another issue: there still are quite a number of
'FAIL: libgomp.c/places-10.c execution test' reports on
.  Also in my testing, on a system
where '/sys/devices/system/node/online' contains '0-1', I get a FAIL:

[...]
OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'FALSE'
  OMP_NUM_THREADS = '8'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30},{FAIL: 
libgomp.c/places-10.c execution test


Grüße
 Thomas


> 2021-10-18  Jakub Jelinek  
>
>   * config/linux/affinity.c (gomp_affinity_init_numa_domains): Add
>   && gomp_places_list_len < count after nfirst <= nlast loop condition.
>
> --- libgomp/config/linux/affinity.c.jj2021-10-15 16:28:30.374460522 
> +0200
> +++ libgomp/config/linux/affinity.c   2021-10-18 14:44:51.559667127 +0200
> @@ -401,7 +401,7 @@ gomp_affinity_init_numa_domains (unsigne
>   break;
> q = end;
>   }
> -  for (; nfirst <= nlast; nfirst++)
> +  for (; nfirst <= nlast && gomp_places_list_len < count; nfirst++)
>   {
> sprintf (name + prefix_len, "node%lu/cpulist", nfirst);
> f = fopen (name, "r");


RE: [PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics

2021-11-11 Thread Kyrylo Tkachov via Gcc-patches
Hi Jonathan,

> -Original Message-
> From: Jonathan Wright 
> Sent: Thursday, November 11, 2021 10:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon
> intrinsics
> 
> Hi,
> 
> This patch declares unsigned type-qualified builtins and uses them to
> implement widening-add Neon intrinsics. This removes the need for
> many casts in arm_neon.h.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?
> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-11-09  Jonathan Wright  
> 
>   * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
>   qualifiers in generator macros for uadd[lw][2] builtins.
>   * config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
>   cast.
>   (vaddl_s16): Likewise.
>   (vaddl_s32): Likewise.
>   (vaddl_u8): Use type-qualified builtin and remove casts.
>   (vaddl_u16): Likewise.
>   (vaddl_u32): Likewise.
>   (vaddl_high_s8): Remove unnecessary cast.
>   (vaddl_high_s16): Likewise.
>   (vaddl_high_s32): Likewise.
>   (vaddl_high_u8): Use type-qualified builtin and remove casts.
>   (vaddl_high_u16): Likewise.
>   (vaddl_high_u32): Likewise.
>   (vaddw_s8): Remove unnecessary cast.
>   (vaddw_s16): Likewise.
>   (vaddw_s32): Likewise.
>   (vaddw_u8): Use type-qualified builtin and remove casts.
>   (vaddw_u16): Likewise.
>   (vaddw_u32): Likewise.
>   (vaddw_high_s8): Remove unnecessary cast.
>   (vaddw_high_s16): Likewise.
>   (vaddw_high_s32): Likewise.
>   (vaddw_high_u8): Use type-qualified builtin and remove casts.
>   (vaddw_high_u16): Likewise.
>   (vaddw_high_u32): Likewise.

Ok.
Thanks,
Kyrill




[committed] libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values

2021-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our current, admittedly dumb, way of implementing host
teams.  There are two problems.  One is if host teams is used from multiple
pthread_create created threads - the spec says that host teams can't be
nested inside of explicit parallel or other teams constructs, but with
pthread_create the standard obviously says nothing about it.  Another, more
important, issue is host fallback: right now we don't do anything for
omp_get_num_teams() or omp_get_team_num(), which was fine before host teams
was introduced and the 5.1 requirement that the num_teams clause specifies a
minimum number of teams, but with the global vars it means that inside of
target teams num_teams (2) we happily return omp_get_num_teams () == 4 if
the target teams is inside of host teams with num_teams (4).  With target
fallback being invoked from parallel regions, global vars simply can't work
right on the host - both with nowait target and with synchronous target, as
while doing host fallback from one thread a different thread could see
wrong values.
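
A rough sketch of the situation (illustrative only, not meant as a
conforming testcase):

  #include <omp.h>
  #include <stdio.h>

  int
  main ()
  {
    int n = 0;
  #pragma omp teams num_teams (4)              /* host teams */
  #pragma omp target teams num_teams (2) map(from: n)
    n = omp_get_num_teams ();
    /* With the old global variables a host-fallback target region here
       could report n == 4 instead of a value consistent with
       num_teams (2).  */
    printf ("%d\n", n);
    return 0;
  }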

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

Regtested on x86_64-linux, committed to trunk.

2021-11-11  Jakub Jelinek  

* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num.  Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.

--- libgomp/libgomp.h.jj2021-10-20 09:34:47.004331626 +0200
+++ libgomp/libgomp.h   2021-11-11 12:44:47.710092897 +0100
@@ -768,6 +768,14 @@ struct gomp_thread
   /* User pthread thread pool */
   struct gomp_thread_pool *thread_pool;
 
+#ifdef LIBGOMP_USE_PTHREADS
+  /* omp_get_num_teams () - 1.  */
+  unsigned int num_teams;
+
+  /* omp_get_team_num ().  */
+  unsigned int team_num;
+#endif
+
 #if defined(LIBGOMP_USE_PTHREADS) \
 && (!defined(HAVE_TLS) \
|| !defined(__GLIBC__) \
--- libgomp/team.c.jj   2021-09-28 11:34:29.380146749 +0200
+++ libgomp/team.c  2021-11-11 12:55:22.524952564 +0100
@@ -56,6 +56,8 @@ struct gomp_thread_start_data
   struct gomp_task *task;
   struct gomp_thread_pool *thread_pool;
   unsigned int place;
+  unsigned int num_teams;
+  unsigned int team_num;
   bool nested;
   pthread_t handle;
 };
@@ -88,6 +90,8 @@ gomp_thread_start (void *xdata)
   thr->ts = data->ts;
   thr->task = data->task;
   thr->place = data->place;
+  thr->num_teams = data->num_teams;
+  thr->team_num = data->team_num;
 #ifdef GOMP_NEEDS_THREAD_HANDLE
   thr->handle = data->handle;
 #endif
@@ -645,6 +649,8 @@ gomp_team_start (void (*fn) (void *), vo
  nthr->ts.single_count = 0;
 #endif
  nthr->ts.static_trip = 0;
+ nthr->num_teams = thr->num_teams;
+ nthr->team_num = thr->team_num;
  nthr->task = &team->implicit_task[i];
  nthr->place = place;
  gomp_init_task (nthr->task, task, icv);
@@ -833,6 +839,8 @@ gomp_team_start (void (*fn) (void *), vo
   start_data->ts.single_count = 0;
 #endif
   start_data->ts.static_trip = 0;
+  start_data->num_teams = thr->num_teams;
+  start_data->team_num = thr->team_num;
   start_data->task = &team->implicit_task[i];
   gomp_init_task (start_data->task, task, icv);
   team->implicit_task[i].icv.nthreads_var = nthreads_var;
--- libgomp/teams.c.jj  2021-10-11 12:20:21.927063104 +0200
+++ libgomp/teams.c 2021-11-11 12:43:58.769797557 +0100
@@ -28,14 +28,12 @@
 #include "libgomp.h"
 #include 
 
-static unsigned gomp_num_teams = 1, gomp_team_num = 0;
-
 void
 GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
unsigned int thread_limit, unsigned int flags)
 {
+  struct gomp_thread *thr = gomp_thread ();
   (void) flags;
-  (void) num_teams;
   unsigned old_thread_limit_var = 0;
   if (thread_limit == 0)
 thread_limit = gomp_teams_thread_limit_var;
@@ -48,11 +46,11 @@ GOMP_teams_reg (void (*fn) (void *), voi
 }
   if (num_teams == 0)
 num_teams 

Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, Nov 11, 2021 at 1:42 PM Jan Hubicka  wrote:
>
> Hi,
> >
> > No, I think if it turns out useful then we want a way to have such ref
> > represented by an ao_ref.  Note that when we come from a
> > ref tree we know handled-components only will increase offset,
> > only the base MEM_REF can contain a pointer subtraction (but
> > the result of that is the base then).
>
> Yep, that is why I introduced the parm_offset at first place - it can be
> negative or unknown...
> >
> > In what cases does parm_offset_known end up false?  Is that
> > when seeing a POINTER_PLUS_EXPR with unknown offset?
>
> Yep, a typical example is a loop with pointer walking an array .
>
> > So yes, that's a case we cannot capture right now - the only
> > thing that remains is a pointer with a known points-to-set - a
> > similar problem as with the pure call PRE.  You could in theory
> > allocate a scratch SSA name and attach points-to-info
> > to it.  And when the call argument is &decl based then you could set
> > offset to zero.
>
> Hmm, I could try to do this, but possibly incrementally?

You mean handle a &decl argument specially for unknown param offset?
Yeah, I guess so.

> Basically I want to have
>
> foo (&decl)
> decl = {}
>
> To be matched since even if I do not know the offset I know it is dead
> after end of lifetime of the decl.  I am not quite sure PTA will give me
> that?

for this case PTA should tell you the alias is to 'decl' only but then I'm
not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}',
from a quick look it doesn't.  So indeed the only interesting case will
be a &decl based parameter which we can special-case.

> > > It was my initial plan. However I was not sure how much I would get from
> > > that.
> > >
> > > The function starts with:
> > >
> > >   /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
> > >   if (gimple_has_volatile_ops (stmt)
> > >   && (!gimple_clobber_p (stmt)
> > >   || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
> > > return;
> > >
> > >   ao_ref ref;
> > >   if (!initialize_ao_ref_for_dse (stmt, &ref))
> > > return;
> > >
> > > The check about clobber does not apply to calls and then it gives up on
> > > functions not returning aggregates (that is a common case).
> > >
> > > For functions returing aggregates it tries to prove that retval is dead
> > > and replace it.
> > >
> > > I guess I can simply call my analysis from the second return above and
> > > from the code removing dead LHS call instead of doing it from the main
> > > walker and drop the LHS handling?
> >
> > Yeah, something like that.
> OK, I will prepare updated patch, thanks!
>
> Honza
> >
> > Richard.
> >
> > > Thank you,
> > > Honza
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > +   }
> > > > >   else if (def_operand_p
> > > > >  def_p = single_ssa_def_operand (stmt, 
> > > > > SSA_OP_DEF))
> > > > > {


Re: [PATCH][committed]middle-end: Fix signbit tests when ran on ISA with support for masks.

2021-11-11 Thread Tamar Christina via Gcc-patches
Ah yes that particular test checks the vector code.

I see that the function wasn't vectorized but that the scalar replacement was 
done.


_15 = _4 > 0;



So the test is checking whether (-x >> bitsize-1) gets optimized to -(x > 0).

I see that the replacement was made on the scalar correctly, so I will modify
the test to check either for the vectorized form (if vect_int) or for the
scalar replacement if not.



Cheers,

Tamar


From: Sandra Loosemore 
Sent: Wednesday, November 10, 2021 8:03 PM
To: Tamar Christina ; gcc-patches@gcc.gnu.org 

Cc: nd ; rguent...@suse.de 
Subject: Re: [PATCH][committed]middle-end: Fix signbit tests when ran on ISA 
with support for masks.

On 11/10/21 11:53 AM, Tamar Christina wrote:
> FAIL: gcc.dg/signbit-2.c scan-tree-dump-times optimized
> "\\s+>\\s+{ 0, 0, 0, 0 }" 1
>
> That's the old test which this patch has changed. Does it still fail
> with the new patch?

My test results are indeed from a couple days ago.  But, I looked at
your new modifications to this test, and still don't see anything like
the pattern it's looking for, or understand what output you expect to be
happening here.  Is the whole test specific to vector ISAs, and not just
your recent changes to it?  I've attached the .optimized dump I got on
nios2-elf.

-Sandra


Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Jan Hubicka via Gcc-patches
> > Hmm, I could try to do this, but possibly incrementally?
> 
> You mean handle a &decl argument specially for unknown param offset?
> Yeah, I guess so.

I think it is also a pointer that was allocated and is going to be
freed...
> 
> > Basically I want to have
> >
> > foo (&decl)
> > decl = {}
> >
> > To be matched since even if I do not know the offset I know it is dead
> > after end of lifetime of the decl.  I am not quite sure PTA will give me
> > that?
> 
> for this case PTA should tell you the alias is to 'decl' only but then I'm
> not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}',
> from a quick look it doesn't.  So indeed the only interesting case will
> be a &decl based parameter which we can special-case.

Yep, I do not think it understands this.  I will look into it - I guess
it is common enough to care about.

Honza


Fix noreturn discovery

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes the ipa-pure-const handling of the noreturn flag.  It is
not safe to set it for interposable symbols, and we should also set it for
aliases (just like we do for other flags).  This patch merely copies the
other flag handling and implements it here.
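
For example (an illustrative sketch; the function names are made up):

  static void __attribute__ ((noinline))
  die (void)
  {
    __builtin_abort ();        /* local pure-const discovers noreturn here */
  }

  static void die_alias (void) __attribute__ ((alias ("die")));

  int
  main (int argc, char **argv)
  {
    (void) argv;
    if (argc > 1)
      die_alias ();   /* the discovered flag must be copied to the alias,
                         while an interposable definition must not get it */
    return 0;
  }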

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function.
* cgraph.h (cgraph_node::set_noreturn_flag): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c67d300e7a4..466b66d5ba5 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2614,6 +2614,53 @@ cgraph_node::set_malloc_flag (bool malloc_p)
   return changed;
 }
 
+/* Worker to set the noreturn flag.  */
+static void
+set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
+{
+  if (noreturn_p && !TREE_THIS_VOLATILE (node->decl))
+{
+  TREE_THIS_VOLATILE (node->decl) = true;
+  *changed = true;
+}
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+{
+  cgraph_node *alias = dyn_cast (ref->referring);
+  if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+   set_noreturn_flag_1 (alias, noreturn_p, changed);
+}
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+if (e->caller->thunk
+   && (!noreturn_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+  set_noreturn_flag_1 (e->caller, noreturn_p, changed);
+}
+
+/* Set TREE_THIS_VOLATILE on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_noreturn_flag (bool noreturn_p)
+{
+  bool changed = false;
+
+  if (!noreturn_p || get_availability () > AVAIL_INTERPOSABLE)
+set_noreturn_flag_1 (this, noreturn_p, &changed);
+  else
+{
+  ipa_ref *ref;
+
+  FOR_EACH_ALIAS (this, ref)
+   {
+ cgraph_node *alias = dyn_cast (ref->referring);
+ if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+   set_noreturn_flag_1 (alias, noreturn_p, &changed);
+   }
+}
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0a1f7c8960e..e42e305cdb6 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1167,6 +1167,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
  if any.  */
   bool set_malloc_flag (bool malloc_p);
 
+  /* SET TREE_THIS_VOLATILE on cgraph_node's decl and on aliases of the node
+ if any.  */
+  bool set_noreturn_flag (bool noreturn_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
 If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 505ed4f8a3b..84a028bcf8e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -2132,11 +2132,10 @@ pass_local_pure_const::execute (function *fun)
 current_function_name ());
 
   /* Update declaration and reduce profile to executed once.  */
-  TREE_THIS_VOLATILE (current_function_decl) = 1;
+  if (cgraph_node::get (current_function_decl)->set_noreturn_flag (true))
+   changed = true;
   if (node->frequency > NODE_FREQUENCY_EXECUTED_ONCE)
node->frequency = NODE_FREQUENCY_EXECUTED_ONCE;
-
-  changed = true;
 }
 
   switch (l->pure_const_state)


Fix recursion discovery in ipa-pure-const

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
We mark self-recursive functions as looping for fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in the
callgraph, but for trivial SCCs we miss the flag.
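
For example (illustrative):

  int
  fact (int n)
  {
    /* Discovered as const, but since the function is self-recursive
       (a trivial SCC in the callgraph) it must also be marked as
       looping, as the recursion need not terminate.  */
    return n <= 1 ? 1 : n * fact (n - 1);
  }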

I think it is a bad decision since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch fixes the current behaviour to be consistent.

Bootstrapped/regtested on x86_64-linux, committed.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* ipa-pure-const.c (propagate_pure_const): Self-recursion is
a side effect.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 505ed4f8a3b..64777cd2d91 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1513,6 +1611,9 @@ propagate_pure_const (void)
  enum pure_const_state_e edge_state = IPA_CONST;
  bool edge_looping = false;
 
+ if (e->recursive_p ())
+   looping = true;
+
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Call to %s",


Re: [PATCH] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]

2021-11-11 Thread Segher Boessenkool
Hi!

On Wed, Nov 10, 2021 at 06:59:23PM -0300, Raphael Moreira Zinsly wrote:
> At the end of the backtrace stream _Unwind_Find_FDE() may not be able
> to find the frame unwind info and will later call the backtrace fallback
> instead of finishing. This occurs when using an old libc on ppc64 due to
> dl_iterate_phdr() not being able to set the fde in the last trace.
> When this occurs the cfa of the trace will be behind of context's cfa.
> Also, libgo’s probestackmaps() calls the backtrace with a null pointer
> and can get to the backchain fallback with the same problem, in this case
> we are only interested in find a stack map, we don't need nor can do a
> backchain.
> _Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
> uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.
> 
> libgcc/ChangeLog:
> 
>  * config/rs6000/linux-unwind.h (ppc_backchain_fallback): turn into
>static to fix -Wmissing-prototypes. Check if it's called with a null
>argument or at the end of the backtrace and return.
>  * unwind.inc (_Unwind_ForcedUnwind_Phase2): treat _URC_NORMAL_STOP.

Formatting is messed up.  Lines start with a capital.  Two spaces after
full stop, while you're at it.

> -void ppc_backchain_fallback (struct _Unwind_Context *context, void *a)
> +static void
> +ppc_backchain_fallback (struct _Unwind_Context *context, void *a)

This was already fixed in 75ef0353a2d3.

>  {
>struct frame_layout *current;
>struct trace_arg *arg = a;
>int count;
>  
> -  /* Get the last address computed and start with the next.  */
> +  /* Get the last address computed.  */
>current = context->cfa;

Empty line after here please.  Most of the time if you have a full-line
comment it means a new paragraph is starting.

> +  /* If the trace CFA is not the context CFA the backtrace is done.  */
> +  if (arg == NULL || arg->cfa != current)
> + return;
> +
> +  /* Start with next address.  */
>current = current->backchain;

Like you did here :-)

Do you have a testcase (that failed without this, but now doesn't)?

Looks okay, but please update and resend.


Segher


[PATCH] tree-optimization/103188 - avoid running ranger on not-up-to-date SSA

2021-11-11 Thread Richard Biener via Gcc-patches
The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without it, to avoid
running ranger on IL whose SSA form is not up to date.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-11  Richard Biener  

PR tree-optimization/103188
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
Remove query parameter, split out check for size
optimization.
(ch_base::m_ranger, cb_base::m_query): Remove.
(ch_base::copy_headers): Split processing loop into
analysis around which we allocate and use ranger and
transform where we do not.
(pass_ch::execute): Do not allocate/free ranger here.
(pass_ch_vect::execute): Likewise.

* gcc.dg/torture/pr103188.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr103188.c | 38 +
 gcc/tree-ssa-loop-ch.c  | 72 ++---
 2 files changed, 78 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr103188.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr103188.c 
b/gcc/testsuite/gcc.dg/torture/pr103188.c
new file mode 100644
index 000..0412f6f9b79
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr103188.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+
+int a, b, c, d = 10, e = 1, f, g, h, i;
+int main()
+{
+  int j = -1;
+k:
+  h = c;
+l:
+  c = ~c;
+  if (e)
+  m:
+a = 0;
+  if (j > 1)
+goto m;
+  if (!e)
+goto l;
+  if (c)
+goto p;
+n:
+  goto m;
+o:
+  if (f) {
+if (g)
+  goto k;
+j = 0;
+  p:
+if (d)
+  goto o;
+goto n;
+  }
+  if (i)
+goto l;
+  for (; a < 1; a++)
+while (a > d)
+  b++;
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index c7d86d751d4..0cee38159fb 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -69,26 +69,12 @@ entry_loop_condition_is_static (class loop *l, 
path_range_query *query)
 
 static bool
 should_duplicate_loop_header_p (basic_block header, class loop *loop,
-   int *limit, path_range_query *query)
+   int *limit)
 {
   gimple_stmt_iterator bsi;
 
   gcc_assert (!header->aux);
 
-  /* Avoid loop header copying when optimizing for size unless we can
- determine that the loop condition is static in the first
- iteration.  */
-  if (optimize_loop_for_size_p (loop)
-  && !loop->force_vectorize
-  && !entry_loop_condition_is_static (loop, query))
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file,
-"  Not duplicating bb %i: optimizing for size.\n",
-header->index);
-  return false;
-}
-
   gcc_assert (EDGE_COUNT (header->succs) > 0);
   if (single_succ_p (header))
 {
@@ -223,8 +209,6 @@ should_duplicate_loop_header_p (basic_block header, class 
loop *loop,
   return false;
 }
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-fprintf (dump_file, "Will duplicate bb %i\n", header->index); 
   return true;
 }
 
@@ -289,9 +273,6 @@ class ch_base : public gimple_opt_pass
 
   /* Return true to copy headers of LOOP or false to skip.  */
   virtual bool process_loop_p (class loop *loop) = 0;
-
-  gimple_ranger *m_ranger = NULL;
-  path_range_query *m_query = NULL;
 };
 
 const pass_data pass_data_ch =
@@ -386,8 +367,11 @@ ch_base::copy_headers (function *fun)
   copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
   bbs_size = n_basic_blocks_for_fn (fun);
 
+  auto_vec candidates;
   auto_vec > copied;
 
+  gimple_ranger *ranger = new gimple_ranger;
+  path_range_query *query = new path_range_query (*ranger, /*resolve=*/true);
   for (auto loop : loops_list (cfun, 0))
 {
   int initial_limit = param_max_loop_header_insns;
@@ -406,6 +390,37 @@ ch_base::copy_headers (function *fun)
  || !process_loop_p (loop))
continue;
 
+  /* Avoid loop header copying when optimizing for size unless we can
+determine that the loop condition is static in the first
+iteration.  */
+  if (optimize_loop_for_size_p (loop)
+ && !loop->force_vectorize
+ && !entry_loop_condition_is_static (loop, query))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"  Not duplicating bb %i: optimizing for size.\n",
+header->index);
+ continue;
+   }
+
+  if (should_duplicate_loop_header_p (header, loop, &remaining_limit))
+   candidates.safe_push (loop);
+}
+  /* Do not use ranger after we change the IL and not have updated SSA.  */
+  delete query;
+  delete ranger;
+
+  for (auto loop : candidates)
+{
+  int initial_limit = param_max_loop_header_insns;
+  int remaining_limit = initial_limit;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Copying headers of loop 

[PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode

2021-11-11 Thread Philipp Tomsich
The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
for rv64) bswap instruction (rev8).  While, with the current master,
SImode is synthesized correctly from DImode, HImode is not.

This change adds an appropriate expansion for a HImode bswap, if a
wider bswap is available.

Without this change, the following rv64gc_zbb code is generated for
__builtin_bswap16():
slliw   a5,a0,8
zext.h  a0,a0
srliw   a0,a0,8
or  a0,a5,a0
sext.h  a0,a0  // this is a 16bit sign-extension following
   // the byteswap (e.g. on a 'short' function
   // return).

After this change, a bswap (rev8) is used and any extensions are
combined into the shift-right:
rev8a0,a0
sraia0,a0,48   // the sign-extension is combined into the
   // shift; a srli is emitted otherwise...
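
For illustration, the synthesized HImode expansion corresponds to the
following C sketch (not part of the patch; the helper name is made up):

static inline unsigned short
bswap16_via_bswap64 (unsigned short x)
{
  /* Swap in the wider mode, then shift the two interesting bytes back
     down; the shift is logical because the operand is unsigned.  */
  return (unsigned short) (__builtin_bswap64 ((unsigned long long) x) >> 48);
}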

gcc/ChangeLog:

* optabs.c (expand_unop): Support expanding a HImode bswap
  using SImode or DImode, followed by a shift.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-bswap.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/optabs.c   |  6 ++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 019bbb62882..7a3ffbe4525 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, 
rtx target,
return temp;
}
 
+ /* If we are missing a HImode BSWAP, but have one for SImode or
+DImode, use a BSWAP followed by a SHIFT.  */
+ temp = widen_bswap (as_a  (mode), op0, target);
+ if (temp)
+   return temp;
+
  last = get_last_insn ();
 
  temp1 = expand_binop (mode, ashl_optab, op0,
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c 
b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
new file mode 100644
index 000..6ee27d9f47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+
+unsigned long
+func64 (unsigned long i)
+{
+  return __builtin_bswap64(i);
+}
+
+unsigned int
+func32 (unsigned int i)
+{
+  return __builtin_bswap32(i);
+}
+
+unsigned short
+func16 (unsigned short i)
+{
+  return __builtin_bswap16(i);
+}
+
+/* { dg-final { scan-assembler-times "rev8" 3 } } */
-- 
2.32.0



[PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support

2021-11-11 Thread Philipp Tomsich


This series provides assorted improvements for the RISC-V Zb[abcs]
support collected over the last year and a half and forward-ported to
the recently merged upstream support for the Zb[abcs] extensions.

Improvements include:
 - synthesis of HImode bswap from SImode/DImode rev8
 - cost-model change to support shift-and-add (sh[123]add) in the
   strength-reduction of multiplication operations
 - support for constant-loading of (1ULL << 31) on RV64 using bseti
 - generating a polarity-reversed mask from a bit-test
 - adds orc.b as UNSPEC
 - improves min/minu/max/maxu patterns to suppress redundant extensions


Philipp Tomsich (8):
  bswap: synthesize HImode bswap from SImode or DImode
  RISC-V: costs: handle BSWAP
  RISC-V: costs: support shift-and-add in strength-reduction
  RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode
  RISC-V: bitmanip: improvements to rotate instructions
  RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ?
0 : -1"
  RISC-V: bitmanip: add orc.b as an unspec
  RISC-V: bitmanip: relax minmax to operate on GPR

 gcc/config/riscv/bitmanip.md | 74 +---
 gcc/config/riscv/riscv.c | 31 
 gcc/config/riscv/riscv.h | 11 ++-
 gcc/config/riscv/riscv.md|  3 +
 gcc/optabs.c |  6 ++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c   | 22 ++
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +-
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c   | 14 
 8 files changed, 162 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

-- 
2.32.0



[PATCH v1 2/8] RISC-V: costs: handle BSWAP

2021-11-11 Thread Philipp Tomsich
The BSWAP operation is not handled in rtx_costs. Add it.

gcc/ChangeLog:

* config/riscv/riscv.c (rtx_costs): Add BSWAP.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index c77b0322869..8480cf09294 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
   return false;
 
+case BSWAP:
+  if (TARGET_ZBB)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+  return false;
+
 case FLOAT:
 case UNSIGNED_FLOAT:
 case FIX:
-- 
2.32.0



[PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction

2021-11-11 Thread Philipp Tomsich
The strength-reduction implementation in expmed.c will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost, thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
li  a5,200
mulwa0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
sh2add  a0,a0,a0   // *5 = a + 4 * a
sh2add  a0,a0,a0   // *5 = a + 4 * a
sllia0,a0,3// *8
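
Written out in plain C, the decomposition the cost model now allows
expmed.c to pick looks like this (a sketch, not part of the patch):

unsigned long
mul200 (unsigned long a)
{
  unsigned long t = a + (a << 2);  /* sh2add: t = a * 5   */
  t = t + (t << 2);                /* sh2add: t = a * 25  */
  return t << 3;                   /* slli:   t * 8 = a * 200 */
}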

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
if expressed as a plus and multiplication with a power-of-2.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 8480cf09294..dff4e370471 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2020,6 +2020,20 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  /* Before strength-reduction, the shNadd can be expressed as the addition
+of a multiplication with a power-of-two.  If this case is not handled,
+the strength-reduction in expmed.c will calculate an inflated cost. */
+  if (TARGET_ZBA
+ && ((!TARGET_64BIT && (mode == SImode)) ||
+ (TARGET_64BIT && (mode == DImode)))
+ && (GET_CODE (XEXP (x, 0)) == MULT)
+ && REG_P (XEXP (XEXP (x, 0), 0))
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+ && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
   /* shNadd.uw pattern for zba.
 [(set (match_operand:DI 0 "register_operand" "=r")
   (plus:DI
-- 
2.32.0



[PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode

2021-11-11 Thread Philipp Tomsich
The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000).  This moves the special handling of this SImode
value (i.e. the check for -2147483648) to riscv.c and relaxes the
SINGLE_BIT_MASK_OPERAND() test.

This changes the code-generation for loading (1ULL << 31) from:
li  a0,1
sllia0,a0,31
to:
bseti   a0,zero,31
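
For reference, a minimal C example that exercises this case (my sketch,
not part of the patch):

unsigned long long
bit31 (void)
{
  /* On RV64 with Zbs this DImode constant can now be materialized with
     a single bseti instead of li + slli.  */
  return 1ULL << 31;
}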

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_build_integer_1): Rewrite value as
-2147483648 for the single-bit case, when operating on 0x80000000
in SImode.
* gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
any single-bit value, moving the special case for 0x80000000 to
riscv_build_integer_1 (in riscv.c).

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c |  9 +
 gcc/config/riscv/riscv.h | 11 ---
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index dff4e370471..4c30d4e521d 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -415,6 +415,15 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   /* Simply BSETI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+
+  /* RISC-V sign-extends all 32bit values that live in a 32bit
+register.  To avoid paradoxes, we thus need to use the
+sign-extended (negative) representation for the value, if we
+want to build 0x80000000 in SImode.  This will then expand
+to an ADDI/LI instruction.  */
+  if (mode == SImode && value == 0x80000000)
+   codes[0].value = -2147483648;
+
   return 1;
 }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 64287124735..abb121ddbea 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -526,13 +526,10 @@ enum reg_class
   (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH)  \
|| ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
 
-/* If this is a single bit mask, then we can load it with bseti.  But this
-   is not useful for any of the low 31 bits because we can use addi or lui
-   to load them.  It is wrong for loading SImode 0x80000000 on rv64 because it
-   needs to be sign-extended.  So we restrict this to the upper 32-bits
-   only.  */
-#define SINGLE_BIT_MASK_OPERAND(VALUE) \
-  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
+/* If this is a single bit mask, then we can load it with bseti.  Special
+   handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
+#define SINGLE_BIT_MASK_OPERAND(VALUE) \
+  (pow2p_hwi (VALUE))
 
 /* Stack layout; function entry, exit and calling.  */
 
-- 
2.32.0



[PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions

2021-11-11 Thread Philipp Tomsich
This change improves rotate instructions (motivated by a review of the
code generated for OpenSSL): a rotate-left by a constant is synthesized
using a rotate-right-immediate to avoid putting the shift-amount into
a temporary; to do so, we allow either a register or an immediate for
the expansion of rotl3 and then check if the shift-amount is a
constant.

Without these changes, the function
unsigned int f(unsigned int a)
{
  return (a << 2) | (a >> 30);
}
turns into
li  a5,2
rolwa0,a0,a5
while these changes give us:
roriw   a0,a0,30

gcc/ChangeLog:

* config/riscv/bitmanip.md (rotlsi3, rotldi3, rotlsi3_sext):
Synthesize rotate-left-by-immediate from a rotate-right insn.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 39 ++--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 59779b48f27..178d1ca0e4b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -204,25 +204,52 @@ (define_insn "rotrsi3_sext"
 (define_insn "rotlsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_ZBB"
-  { return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2"; }
+  {
+/* If the rotate-amount is constant, let's synthesize using a
+   rotate-right-immediate instead of using a temporary. */
+
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+  return TARGET_64BIT ? "roriw\t%0,%1,%2" : "rori\t%0,%1,%2";
+}
+
+return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "rotldi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
(rotate:DI (match_operand:DI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rol\t%0,%1,%2"
+  {
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(64 - INTVAL(operands[2]));
+  return "rori\t%0,%1,%2";
+}
+
+return "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
+;; Until we have improved REE to understand that sign-extending the result of
+;; an implicitly sign-extending operation is redundant, we need an additional
+;; pattern to gobble up the redundant sign-extension.
 (define_insn "rotlsi3_sext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(sign_extend:DI (rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r"))))]
+  (match_operand:QI 2 "arith_operand" "rI"))))]
+  (match_operand:QI 2 "arith_operand" "rI"]
   "TARGET_64BIT && TARGET_ZBB"
-  "rolw\t%0,%1,%2"
+  {
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+  return "roriw\t%0,%1,%2";
+}
+
+return "rolw\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "bswap2"
-- 
2.32.0



[PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"

2021-11-11 Thread Philipp Tomsich
Consider creating a polarity-reversed mask from a set-bit (i.e., if
the bit is set, produce all-zeros; otherwise: all-ones).  Using Zbs,
this can be expressed as bexti, followed by an addi of minus-one.  To
enable the combiner to discover this opportunity, we need to split the
canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form
combinable into bexti.

Consider the function:
long f(long a)
{
  return (a & (1 << BIT_NO)) ? 0 : -1;
}
This produces the following sequence prior to this change:
andia0,a0,16
seqza0,a0
neg a0,a0
ret
Following this change, it results in:
bexti   a0,a0,4
addia0,a0,-1
ret

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add a splitter to generate
  polarity-reversed masks from a set bit using bexti + addi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 178d1ca0e4b..9e10280e306 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -367,3 +367,16 @@ (define_insn "*bexti"
   "TARGET_ZBS"
   "bexti\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
+
+;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 
})
+;; using a bext(i) followed by an addi instruction.
+;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 
"register_operand")
+  (const_int 1)
+  (match_operand 2))
+(const_int 0]
+  "TARGET_ZBB"
+  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
new file mode 100644
index 000..d02c3f7a98d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+
+/* bexti */
+#define BIT_NO  27
+
+long
+foo0 (long a)
+{
+  return (a & (1 << BIT_NO)) ? 0 : -1;
+}
+
+/* { dg-final { scan-assembler "bexti" } } */
+/* { dg-final { scan-assembler "addi" } } */
-- 
2.32.0



[PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec

2021-11-11 Thread Philipp Tomsich
As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.
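
As a sketch of why orc.b matters for string functions (not part of this
patch, which only adds the insn): orc.b maps every nonzero byte to 0xff
and every zero byte to 0x00, so a NUL byte inside a word can be detected
with a single compare, e.g.

static inline unsigned long
orc_b (unsigned long w)
{
  unsigned long r;
  __asm__ ("orc.b\t%0,%1" : "=r" (r) : "r" (w));
  return r;
}

static inline int
has_zero_byte (unsigned long w)
{
  return orc_b (w) != ~0UL;  /* some byte of w was zero */
}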

gcc/ChangeLog:

* config/riscv/bitmanip.md (orcb2): Add orc.b as an unspec.
* config/riscv/riscv.md: Add UNSPEC_ORC_B.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 8 
 gcc/config/riscv/riscv.md| 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 9e10280e306..000deb48b16 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -267,6 +267,14 @@ (define_insn "3"
   "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; orc.b (or-combine) is added as an unspec for the benefit of the support
+;; for optimized string functions (such as strcmp).
+(define_insn "orcb2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (unspec:X [(match_operand:X 1 "register_operand")] UNSPEC_ORC_B))]
+  "TARGET_ZBB"
+  "orc.b\t%0,%1")
+
 ;; ZBS extension.
 
 (define_insn "*bset"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 225e5b259c1..7a2501ec7a9 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -45,6 +45,9 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+
+  ;; Zbb OR-combine instruction
+  UNSPEC_ORC_B
 ])
 
 (define_c_enum "unspecv" [
-- 
2.32.0



[PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR

2021-11-11 Thread Philipp Tomsich
While min/minu/max/maxu instructions are provided for XLEN only, these
can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
always sign-extended, which ensures that the XLEN-wide instructions
can be used for signed and unsigned comparisons on SImode yielding a
correct ordering of values.

This commit
 - relaxes the minmax pattern to operate on GPR (instead of X only),
   providing both a si3 and di3 expansion on RV64
 - adds a sign-extending form for the si3 pattern for RV64 to allow REE
   to eliminate redundant extensions
 - adds test-cases for both

gcc/ChangeLog:

* config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
  DImode) on RV64.
* config/riscv/bitmanip.md (si3_sext): Add
  pattern for REE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
  operands checking that no redundant sign- or zero-extensions
  are emitted.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 14 +++---
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +---
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 000deb48b16..2a28f78f5f6 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -260,13 +260,21 @@ (define_insn "bswap2"
   [(set_attr "type" "bitmanip")])
 
 (define_insn "3"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "register_operand" "r")))]
   "TARGET_ZBB"
   "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn "si3_sext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 
"register_operand" "r")
+(match_operand:SI 2 "register_operand" "r"]
+  "TARGET_64BIT && TARGET_ZBB"
+  "\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
 (define_insn "orcb2"
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
index f44c398ea08..7169e873551 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
 
 long
 foo1 (long i, long j)
@@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
   return i > j ? i : j;
 }
 
+unsigned int
+foo5(unsigned int a, unsigned int b)
+{
+  return a > b ? a : b;
+}
+
+int
+foo6(int a, int b)
+{
+  return a > b ? a : b;
+}
+
 /* { dg-final { scan-assembler-times "min" 3 } } */
-/* { dg-final { scan-assembler-times "max" 3 } } */
+/* { dg-final { scan-assembler-times "max" 4 } } */
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "maxu" 1 } } */
+/* { dg-final { scan-assembler-times "maxu" 3 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+/* { dg-final { scan-assembler-not "sext.w" } } */
-- 
2.32.0



Re: [PATCH] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]

2021-11-11 Thread Raphael M Zinsly via Gcc-patches

Hi Segher,

On 11/11/2021 10:43, Segher Boessenkool wrote:

Hi!

On Wed, Nov 10, 2021 at 06:59:23PM -0300, Raphael Moreira Zinsly wrote:

At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs the cfa of the trace will be behind of context's cfa.
Also, libgo’s probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem, in this case
we are only interested in find a stack map, we don't need nor can do a
backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.

libgcc/ChangeLog:

  * config/rs6000/linux-unwind.h (ppc_backchain_fallback): turn into
 static to fix -Wmissing-prototypes. Check if it's called with a null
 argument or at the end of the backtrace and return.
  * unwind.inc (_Unwind_ForcedUnwind_Phase2): treat _URC_NORMAL_STOP.


Formatting is messed up.  Lines start with a capital.  Two spaces after
full stop, while you're at it.



Ok.


-void ppc_backchain_fallback (struct _Unwind_Context *context, void *a)
+static void
+ppc_backchain_fallback (struct _Unwind_Context *context, void *a)


This was already fixed in 75ef0353a2d3.


Ops, missed that.




  {
struct frame_layout *current;
struct trace_arg *arg = a;
int count;
  
-  /* Get the last address computed and start with the next.  */

+  /* Get the last address computed.  */
current = context->cfa;


Empty line after here please.  Most of the time if you have a full-line
comment it means a new paragraph is starting.



Ok.


+  /* If the trace CFA is not the context CFA the backtrace is done.  */
+  if (arg == NULL || arg->cfa != current)
+   return;
+
+  /* Start with next address.  */
current = current->backchain;


Like you did here :-)

Do you have a testcase (that failed without this, but now doesn't)?



I don't have a simple testcase for that, but many of the asan and go 
tests catch that.



Looks okay, but please update and resend.


Segher



Thanks,
--
Raphael Moreira Zinsly


Re: Fix recursion discovery in ipa-pure-const

2021-11-11 Thread Richard Biener via Gcc-patches
On Thu, Nov 11, 2021 at 2:41 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> We make self recursive functions as looping of fear of endless recursion.
> This is done correctly for local pure/const and for non-trivial SCCs in
> callgraph, but for trivial SCCs we miss the flag.
>
> I think it is bad decision since infinite recursion will run out of stack,

Note it might not always run out of stack, in case we can eliminate the tail recursion or avoid
stack use by the recursion by other means.  So I think it is conservatively
correct.

Richard.

> but changing it upsets some testcases and should be done independently.
> So this patch is fixing current behaviour to be consistent.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
> 2021-11-11  Jan Hubicka  
>
> * ipa-pure-const.c (propagate_pure_const): Self recursion is
> a side effects.
>
> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
> index 505ed4f8a3b..64777cd2d91 100644
> --- a/gcc/ipa-pure-const.c
> +++ b/gcc/ipa-pure-const.c
> @@ -1513,6 +1611,9 @@ propagate_pure_const (void)
>   enum pure_const_state_e edge_state = IPA_CONST;
>   bool edge_looping = false;
>
> + if (e->recursive_p ())
> +   looping = true;
> +
>   if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   fprintf (dump_file, "Call to %s",


[COMMITTED] Move import population from threader to path solver.

2021-11-11 Thread Aldy Hernandez via Gcc-patches
Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block.  For example,
in the following snippet:

 :
_1 = b_10 == block_11;
_2 = b_10 != -1;
_3 = _1 & _2;
if (_3 != 0)
  goto ; [INV]
else
  goto ; [INV]

...the imports to the block are b_10 and block_11 since they are both
needed to calculate _3.
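
(A rough C equivalent of that block, just for illustration: the branch
condition below needs the values of both 'b' and 'block', which is what
makes them imports.)

int
pick (int b, int block)
{
  if (b == block && b != -1)
    return 1;  /* edge to bb 6 */
  return 0;    /* edge to bb 7 */
}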

The path solver takes a bitmap of imports in addition to the path
itself.  This sets up the number of SSA names to be on the lookout
for, while resolving the final conditional.

Calculating these imports was initially done in the threader, since it
was the only user of the path solver.  With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.

This patch moves the import code to the solver, making both the solver
and the threader simpler in the process.  This is because intent is
clearer and some duplicate code was removed.

This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125).  This was unexpected, but
welcome nevertheless.  There is no performance difference in callgrind
over the same suite.

Regstrapped on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::add_copies_to_imports):
Rename to...
(path_range_query::compute_imports): ...this.  Adapt it so it can
be passed the imports bitmap instead of working on m_imports.
(path_range_query::compute_ranges): Call compute_imports in all
cases unless an imports bitmap is passed.
* gimple-range-path.h (path_range_query::compute_imports): New.
(path_range_query::add_copies_to_imports): Remove.
* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
(back_threader::find_paths_to_names): Inline resolve_def.
(back_threader::find_paths): Call compute_imports.
(back_threader::resolve_phi): Adjust comment.
---
 gcc/gimple-range-path.cc  | 45 -
 gcc/gimple-range-path.h   |  2 +-
 gcc/tree-ssa-threadbackward.c | 47 ++-
 3 files changed, 30 insertions(+), 64 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 6da01c7067f..4843c133e62 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -439,26 +439,32 @@ path_range_query::add_to_imports (tree name, bitmap 
imports)
   return false;
 }
 
-// Add the copies of any SSA names in IMPORTS to IMPORTS.
+// Compute the imports to the path ending in EXIT.  These are
+// essentially the SSA names used to calculate the final conditional
+// along the path.
 //
-// These are hints for the solver.  Adding more elements (within
-// reason) doesn't slow us down, because we don't solve anything that
-// doesn't appear in the path.  On the other hand, not having enough
-// imports will limit what we can solve.
+// They are hints for the solver.  Adding more elements doesn't slow
+// us down, because we don't solve anything that doesn't appear in the
+// path.  On the other hand, not having enough imports will limit what
+// we can solve.
 
 void
-path_range_query::add_copies_to_imports ()
+path_range_query::compute_imports (bitmap imports, basic_block exit)
 {
-  auto_vec worklist (bitmap_count_bits (m_imports));
+  // Start with the imports from the exit block...
+  bitmap r_imports = m_ranger.gori ().imports (exit);
+  bitmap_copy (imports, r_imports);
+
+  auto_vec worklist (bitmap_count_bits (imports));
   bitmap_iterator bi;
   unsigned i;
-
-  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (imports, 0, i, bi)
 {
   tree name = ssa_name (i);
   worklist.quick_push (name);
 }
 
+  // ...and add any operands used to define these imports.
   while (!worklist.is_empty ())
 {
   tree name = worklist.pop ();
@@ -466,15 +472,12 @@ path_range_query::add_copies_to_imports ()
 
   if (is_gimple_assign (def_stmt))
{
- // ?? Adding assignment copies doesn't get us much.  At the
- // time of writing, we got 63 more threaded paths across the
- // .ii files from a bootstrap.
- add_to_imports (gimple_assign_rhs1 (def_stmt), m_imports);
+ add_to_imports (gimple_assign_rhs1 (def_stmt), imports);
  tree rhs = gimple_assign_rhs2 (def_stmt);
- if (rhs && add_to_imports (rhs, m_imports))
+ if (rhs && add_to_imports (rhs, imports))
worklist.safe_push (rhs);
  rhs = gimple_assign_rhs3 (def_stmt);
- if (rhs && add_to_imports (rhs, m_imports))
+ if (rhs && add_to_imports (rhs, imports))
worklist.safe_push (rhs);
}
   else if (gphi *phi = dyn_cast  (def_stmt))
@@ -486,7 +489,7 @@ path_range_query::add_copies_to_imports ()
 
  if (TREE_CODE (arg) == SSA_NAME
  && m_pat

[PATCH v2] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]

2021-11-11 Thread Raphael Moreira Zinsly via Gcc-patches
Changes since v1:
- Removed -Wmissing-prototypes fix.
- Fixed formatting of Changelog and patch.

--->8---

At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs the cfa of the trace will be behind the context's cfa.
Also, libgo’s probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem; in this case
we are only interested in finding a stack map, and we neither need to nor
can do a backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to handle _URC_NORMAL_STOP.

libgcc/ChangeLog:

 * config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
 called with a null argument or at the end of the backtrace and return.
 * unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.
---
 libgcc/config/rs6000/linux-unwind.h | 8 +++-
 libgcc/unwind.inc   | 5 +++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/rs6000/linux-unwind.h 
b/libgcc/config/rs6000/linux-unwind.h
index 8deccc1d650..ad1ab286a2f 100644
--- a/libgcc/config/rs6000/linux-unwind.h
+++ b/libgcc/config/rs6000/linux-unwind.h
@@ -401,8 +401,14 @@ void ppc_backchain_fallback (struct _Unwind_Context 
*context, void *a)
   struct trace_arg *arg = a;
   int count;
 
-  /* Get the last address computed and start with the next.  */
+  /* Get the last address computed.  */
   current = context->cfa;
+
+  /* If the trace CFA is not the context CFA the backtrace is done.  */
+  if (arg == NULL || arg->cfa != current)
+   return;
+
+  /* Start with next address.  */
   current = current->backchain;
 
   for (count = arg->count; current != NULL; current = current->backchain)
diff --git a/libgcc/unwind.inc b/libgcc/unwind.inc
index 456a5ee682f..dc2f9c13e97 100644
--- a/libgcc/unwind.inc
+++ b/libgcc/unwind.inc
@@ -160,12 +160,13 @@ _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception 
*exc,
 
   /* Set up fs to describe the FDE for the caller of cur_context.  */
   code = uw_frame_state_for (context, &fs);
-  if (code != _URC_NO_REASON && code != _URC_END_OF_STACK)
+  if (code != _URC_NO_REASON && code != _URC_END_OF_STACK
+ && code != _URC_NORMAL_STOP)
return _URC_FATAL_PHASE2_ERROR;
 
   /* Unwind successful.  */
   action = _UA_FORCE_UNWIND | _UA_CLEANUP_PHASE;
-  if (code == _URC_END_OF_STACK)
+  if (code == _URC_END_OF_STACK || code == _URC_NORMAL_STOP)
action |= _UA_END_OF_STACK;
   stop_code = (*stop) (1, action, exc->exception_class, exc,
   context, stop_argument);
-- 
2.31.1



Re: Fix recursion discovery in ipa-pure-const

2021-11-11 Thread Jan Hubicka via Gcc-patches
> On Thu, Nov 11, 2021 at 2:41 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > We make self recursive functions as looping of fear of endless recursion.
> > This is done correctly for local pure/const and for non-trivial SCCs in
> > callgraph, but for trivial SCCs we miss the flag.
> >
> > I think it is bad decision since infinite recursion will run out of stack,
> 
> Note it might not always in case we can eliminate the tail-recursion or avoid
> stack use by the recursion by other means.  So I think it is conservatively
> correct.

I don't know.  If a function is pure and has infinite recursion in it, it
means that it can only run forever without side effects if it gets lucky
and we tail-recurse it.  There are no other means to avoid the stack use
from growing.
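
A sketch of the kind of function in question (my example, not from the
testsuite): pure, self-recursive and with no way to terminate.  If the
tail call is optimized it spins forever, otherwise it exhausts the
stack -- either way it should be treated as "looping" pure.

__attribute__ ((pure)) int
f (int x)
{
  return f (x);
}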

First, I think code relying on tail-recursion optimization to not run out
of stack is not strictly valid in C/C++ or the other languages we care
about.  Also, in C++ there is the forward-progress guarantee which makes
even the tail-optimized code invalid.

I think in high level code such recursive accessors used for no good
reason are not that infrequent.  Also we had this bug in tree probably
forever since LOOPING_PURE_CONST was added and no one complained ;)

Relaxing this rule breaks some testcases, but odd ones - they are
infinitely self-recursive builtin implementations where we then both
prove the function noreturn & later optimize the builtin to a constant,
so the assembly matching does not see the expected thing.

Honza


[committed] Testsuite: Various fixes for nios2.

2021-11-11 Thread Sandra Loosemore
I've pushed the attached patch to clean up some test failures I've seen 
on nios2-elf.  This target defaults to -fno-delete-null-pointer-checks 
so any optimization tests that depend on assumptions that valid pointers 
are non-zero have to be marked explicitly.  The others ought to be 
obvious, except perhaps struct-by-value-1.c which was giving a link 
error about overflowing the small data region without -G0.


My last set of test results were pretty messy but I think almost all of 
the problems are not nios2-specific (e.g., PR103166, PR103163).  I think 
it is better to wait until we're into stage 3 and the churn settles down 
some before I make another pass to triage remaining nios2-specific 
problems, but I might as well check in what I have now instead of 
sitting on it.


-Sandra
commit eb43f1a95d1d7a0f88a8107d860e5343507554dd
Author: Sandra Loosemore 
Date:   Thu Nov 11 06:31:02 2021 -0800

Testsuite:  Various fixes for nios2.

2021-11-11  Sandra Loosemore  

	gcc/testsuite/
	* g++.dg/warn/Wmismatched-new-delete-5.C: Add
	-fdelete-null-pointer-checks.
	* gcc.dg/attr-returns-nonnull.c: Likewise.
	* gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2.
	* gcc.dg/ifcvt-4.c: Skip on nios2.
	* gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.

diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C
index 92c75df..bac2b68 100644
--- a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C
+++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C
@@ -1,7 +1,7 @@
 /* PR c++/100876 - -Wmismatched-new-delete should either look through
or ignore placement new
{ dg-do compile }
-   { dg-options "-O2 -Wall" } */
+   { dg-options "-O2 -Wall -fdelete-null-pointer-checks" } */
 
 extern "C" {
   void* malloc (__SIZE_TYPE__);
diff --git a/gcc/testsuite/gcc.dg/attr-returns-nonnull.c b/gcc/testsuite/gcc.dg/attr-returns-nonnull.c
index 22ee30a..e4e20b8 100644
--- a/gcc/testsuite/gcc.dg/attr-returns-nonnull.c
+++ b/gcc/testsuite/gcc.dg/attr-returns-nonnull.c
@@ -1,7 +1,7 @@
 /* Verify that attribute returns_nonnull on global and local function
declarations is merged.
{ dg-do compile }
-   { dg-options "-Wall -fdump-tree-optimized" } */
+   { dg-options "-Wall -fdump-tree-optimized -fdelete-null-pointer-checks" } */
 
 void foo (void);
 
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
index f809d93..dbb236b 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
@@ -12,6 +12,7 @@
 /* { dg-do compile )  */
 /* { dg-options "-O0 -gbtf -dA" } */
 /* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && ilp32 } } } */
+/* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */
 
 /* Check for two DATASEC entries with vlen 3, and one with vlen 1.  */
 /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index e74e449..0525102 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
 /* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* } }  */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* nios2-*-*} }  */
 /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */
 
 typedef int word __attribute__((mode(word)));
diff --git a/gcc/testsuite/gcc.dg/struct-by-value-1.c b/gcc/testsuite/gcc.dg/struct-by-value-1.c
index addf253..ae7adb5 100644
--- a/gcc/testsuite/gcc.dg/struct-by-value-1.c
+++ b/gcc/testsuite/gcc.dg/struct-by-value-1.c
@@ -1,6 +1,7 @@
 /* Test structure passing by value.  */
 /* { dg-do run } */
 /* { dg-options "-O2" } */
+/* { dg-options "-O2 -G0" { target { nios2-*-* } } } */
 
 #define T(N)	\
 struct S##N { unsigned char i[N]; };		\


Fix some side cases of side effects analysis

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
I wrote a script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends.  I fixed ipa-pure-const in the previous two
patches.

This plugs the case where modref was too optimistic in handling looping
pure consts which were previously missed due to early exits on ECF_CONST
| ECF_PURE.  Those early exits are a bit annoying, and I think as a cleanup
I may just drop some of them as premature optimizations coming from the time
modref was very simplistic in what it propagates.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc):

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index f8b7b900527..45b391a565e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -331,11 +331,11 @@ modref_summary::useful_p (int ecf_flags, bool check_flags)
   && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false))
 return true;
   if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
   if (ecf_flags & ECF_PURE)
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   return stores && !stores->every_base;
 }
 
@@ -416,11 +416,11 @@ modref_summary_lto::useful_p (int ecf_flags, bool 
check_flags)
   && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false))
 return true;
   if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
   if (ecf_flags & ECF_PURE)
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   return stores && !stores->every_base;
 }
 
@@ -925,6 +925,18 @@ merge_call_side_effects (modref_summary *cur_summary,
   auto_vec  parm_map;
   modref_parm_map chain_map;
   bool changed = false;
+  int flags = gimple_call_flags (stmt);
+
+  if (!cur_summary->side_effects && callee_summary->side_effects)
+{
+  if (dump_file)
+   fprintf (dump_file, " - merging side effects.\n");
+  cur_summary->side_effects = true;
+  changed = true;
+}
+
+  if (flags & (ECF_CONST | ECF_NOVOPS))
+return changed;
 
   /* We can not safely optimize based on summary of callee if it does
  not always bind to current def: it is possible that memory load
@@ -988,12 +1000,6 @@ merge_call_side_effects (modref_summary *cur_summary,
  changed = true;
}
 }
-  if (!cur_summary->side_effects
-  && callee_summary->side_effects)
-{
-  cur_summary->side_effects = true;
-  changed = true;
-}
   return changed;
 }
 
@@ -1091,7 +1097,7 @@ process_fnspec (modref_summary *cur_summary,
   attr_fnspec fnspec = gimple_call_fnspec (call);
   int flags = gimple_call_flags (call);
 
-  if (!(flags & (ECF_CONST | ECF_NOVOPS))
+  if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE))
   || (flags & ECF_LOOPING_CONST_OR_PURE)
   || (cfun->can_throw_non_call_exceptions
  && stmt_could_throw_p (cfun, call)))
@@ -1101,6 +1107,8 @@ process_fnspec (modref_summary *cur_summary,
   if (cur_summary_lto)
cur_summary_lto->side_effects = true;
 }
+  if (flags & (ECF_CONST | ECF_NOVOPS))
+return true;
   if (!fnspec.known_p ())
 {
   if (dump_file && gimple_call_builtin_p (call, BUILT_IN_NORMAL))
@@ -1203,7 +1211,8 @@ analyze_call (modref_summary *cur_summary, 
modref_summary_lto *cur_summary_lto,
   /* Check flags on the function call.  In certain cases, analysis can be
  simplified.  */
   int flags = gimple_call_flags (stmt);
-  if (flags & (ECF_CONST | ECF_NOVOPS))
+  if ((flags & (ECF_CONST | ECF_NOVOPS))
+  && !(flags & ECF_LOOPING_CONST_OR_PURE))
 {
   if (dump_file)
fprintf (dump_file,
@@ -3963,7 +3972,8 @@ static bool
 propagate_unknown_call (cgraph_node *node,
cgraph_edge *e, int ecf_flags,
modref_summary *cur_summary,
-   modref_summary_lto *cur_summary_lto)
+   modref_summary_lto *cur_summary_lto,
+   bool nontrivial_scc)
 {
   bool changed = false;
   class fnspec_summary *fnspec_sum = fnspec_summaries->get (e);
@@ -3973,12 +3983,12 @@ propagate_unknown_call (cgraph_node *node,
   if (e->callee
   && builtin_safe_for_const_function_p (&looping, e->callee->decl))
 {
-  if (cur_summary && !cur_

[Patch] Fortran/openmp: Add support for 2 argument num_teams clause

2021-11-11 Thread Tobias Burnus

Just the Fortran FE work + Fortranized version for the C tests.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran/openmp: Add support for 2 argument num_teams clause

Fortran part to commit r12-5146-g48d7327f2aaf65
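
For reference, the clause now takes an optional lower bound in addition
to the upper bound; a minimal C sketch of the OpenMP 5.1 form this
mirrors (the Fortran spelling is num_teams(4:8) on !$omp teams):

void
f (void)
{
  #pragma omp teams num_teams(4 : 8)
  {
    /* executed by at least 4 and at most 8 teams */
  }
}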

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
	num_teams_upper, add num_teams_lower.
	* dump-parse-tree.c (show_omp_clauses): Update to handle
	lower-bound num_teams clause.
	* frontend-passes.c (gfc_code_walker): Likewise
	* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
	resolve_omp_clauses): Likewise.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
	gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/teams-1.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/num-teams-1.f90: New test.
	* gfortran.dg/gomp/num-teams-2.f90: New test.

 gcc/fortran/dump-parse-tree.c  |  9 -
 gcc/fortran/frontend-passes.c  |  3 +-
 gcc/fortran/gfortran.h |  3 +-
 gcc/fortran/openmp.c   | 32 +---
 gcc/fortran/trans-openmp.c | 35 -
 gcc/testsuite/gfortran.dg/gomp/num-teams-1.f90 | 53 ++
 gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90 | 37 ++
 libgomp/testsuite/libgomp.fortran/teams-1.f90  | 22 +++
 8 files changed, 175 insertions(+), 19 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 14a307856fc..04660d5074a 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1741,10 +1741,15 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	}
   fprintf (dumpfile, " BIND(%s)", type);
 }
-  if (omp_clauses->num_teams)
+  if (omp_clauses->num_teams_upper)
 {
   fputs (" NUM_TEAMS(", dumpfile);
-  show_expr (omp_clauses->num_teams);
+  if (omp_clauses->num_teams_lower)
+	{
+	  show_expr (omp_clauses->num_teams_lower);
+	  fputc (':', dumpfile);
+	}
+  show_expr (omp_clauses->num_teams_upper);
   fputc (')', dumpfile);
 }
   if (omp_clauses->device)
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 145bff50f3e..f5ba7cecd54 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5634,7 +5634,8 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->chunk_size);
 		  WALK_SUBEXPR (co->ext.omp_clauses->safelen_expr);
 		  WALK_SUBEXPR (co->ext.omp_clauses->simdlen_expr);
-		  WALK_SUBEXPR (co->ext.omp_clauses->num_teams);
+		  WALK_SUBEXPR (co->ext.omp_clauses->num_teams_lower);
+		  WALK_SUBEXPR (co->ext.omp_clauses->num_teams_upper);
 		  WALK_SUBEXPR (co->ext.omp_clauses->device);
 		  WALK_SUBEXPR (co->ext.omp_clauses->thread_limit);
 		  WALK_SUBEXPR (co->ext.omp_clauses->dist_chunk_size);
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 9378b4b8a24..1ad2f0df702 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1502,7 +1502,8 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *chunk_size;
   struct gfc_expr *safelen_expr;
   struct gfc_expr *simdlen_expr;
-  struct gfc_expr *num_teams;
+  struct gfc_expr *num_teams_lower;
+  struct gfc_expr *num_teams_upper;
   struct gfc_expr *device;
   struct gfc_expr *thread_limit;
   struct gfc_expr *grainsize;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index dcf22ac2c2f..7b2df0d0be3 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -85,7 +85,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->chunk_size);
   gfc_free_expr (c->safelen_expr);
   gfc_free_expr (c->simdlen_expr);
-  gfc_free_expr (c->num_teams);
+  gfc_free_expr (c->num_teams_lower);
+  gfc_free_expr (c->num_teams_upper);
   gfc_free_expr (c->device);
   gfc_free_expr (c->thread_limit);
   gfc_free_expr (c->dist_chunk_size);
@@ -2420,11 +2421,22 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  continue;
 	}
 	  if ((mask & OMP_CLAUSE_NUM_TEAMS)
-	  && (m = gfc_match_dupl_check (!c->num_teams, "num_teams", true,
-	&c->num_teams)) != MATCH_NO)
+	  && (m = gfc_match_dupl_check (!c->num_teams_upper, "num_teams",
+	true)) != MATCH_NO)
 	{
 	  if (m == MATCH_ERROR)
 		goto error;
+	  if (gfc_match ("%e ", &c->num_teams_upper) != MATCH_YES)
+		goto error;
+	  if (gfc_peek_ascii_char () == ':')
+		{
+		  c->num_teams_lower = c->num_teams_upper;
+		  c->num_teams_upper = NULL;
+		  if (gfc_match (": %e ", &c->num_teams_upper) != MATCH_YES)
+		goto error;
+		}
+	  if (gfc_match (") ") != MATCH_YES)
+		goto error;
 	  continue;
 	}
 	

[PATCH] tree-optimization/103190 - fix assert in reassoc stmt placement with asm

2021-11-11 Thread Richard Biener via Gcc-patches
This makes sure to only assert we don't run into an asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p.  We can handle EH edges from an asm just like
EH edges from any other stmt.
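
For illustration (my example, not the PR testcase): the insertion point
we refuse is an 'asm goto', which has labels and therefore a nonzero
gimple_asm_nlabels (), unlike a plain asm that can at most throw with
-fnon-call-exceptions.

void
g (int *p)
{
  asm goto ("" : : : : out);  /* asm goto: has a label */
  *p = 1;
 out:;
}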

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-11  Richard Biener  

PR tree-optimization/103190
* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.
---
 gcc/tree-ssa-reassoc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 6a555e7c553..65316223047 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1515,7 +1515,8 @@ insert_stmt_after (gimple *stmt, gimple *insert_point)
   gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
   return;
 }
-  else if (gimple_code (insert_point) == GIMPLE_ASM)
+  else if (gimple_code (insert_point) == GIMPLE_ASM
+  && gimple_asm_nlabels (as_a  (insert_point)) != 0)
 /* We have no idea where to insert - it depends on where the
uses will be placed.  */
 gcc_unreachable ();
-- 
2.31.1

