[PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-05 Thread pan2 . li
From: Pan Li 

This patch adds auto-vectorization support for the FP APIs below, where
the source and destination types have different sizes:

+---------+----------+----------+
| API     | RV64     | RV32     |
+---------+----------+----------+
| irint   | DF => SI | DF => SI |
| irintf  | -        | -        |
| lrint   | -        | DF => SI |
| lrintf  | SF => DI | -        |
| llrint  | -        | -        |
| llrintf | SF => DI | SF => DI |
+---------+----------+----------+

Given the following code:
void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}

Before this patch:
test_lrintf:
  beq      a2,zero,.L8
  slli     a5,a2,32
  srli     a2,a5,30
  add      a4,a1,a2
.L3:
  flw      fa5,0(a1)
  addi     a1,a1,4
  addi     a0,a0,8
  fcvt.l.s a5,fa5,dyn
  sd       a5,-8(a0)
  bne      a1,a4,.L3

After this patch:
test_lrintf:
  beq      a2,zero,.L8
  slli     a2,a2,32
  srli     a2,a2,32
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle32.v  v2,0(a1)
  slli     a3,a5,2
  slli     a4,a5,3
  vfwcvt.x.f.v  v1,v2
  sub      a2,a2,a5
  vse64.v  v1,0(a0)
  add      a1,a1,a3
  add      a0,a0,a4
  bne      a2,zero,.L3
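
The other rows of the table follow the same pattern.  For instance, a loop
using the DF => SI conversion would now be vectorized with a narrowing
convert; a hand-written sketch (not one of the new testsuite files, and
assuming the __builtin_irint spelling of the builtin):

void
test_irint (int *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_irint (in[i]);
}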

Unfortunately, the HF mode is not included because it requires
additional middle-end support in internal-fn.def.

gcc/ChangeLog:

* config/riscv/autovec.md: Remove the size check of lrint.
* config/riscv/riscv-v.cc (emit_vec_narrow_cvt_x_f): New helper to
emit the narrowing conversion.
(emit_vec_widden_cvt_x_f): New helper to emit the widening conversion.
(emit_vec_rounding_to_integer): New function to emit the rounding
from FP to integer.
(expand_vec_lrint): Leverage emit_vec_rounding_to_integer.
* config/riscv/vector.md: Take V_VLSF for vfncvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c:
* gcc.target/riscv/rvv/autovec/unop/math-irint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-irintf-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llrintf-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llrintf-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-rv32-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrintf-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrintf-rv64-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-irint-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llrintf-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrintf-rv64-0.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  6 +-
 gcc/config/riscv/riscv-v.cc   | 46 +-
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/autovec/unop/math-irint-1.c | 13 +++
 .../riscv/rvv/autovec/unop/math-irint-run-0.c | 92 +--
 .../rvv/autovec/unop/math-irintf-run-0.c  | 63 +
 .../riscv/rvv/autovec/unop/math-llrintf-0.c   | 13 +++
 .../rvv/autovec/unop/math-llrintf-run-0.c | 63 +
 .../rvv/autovec/unop/math-lrint-rv32-0.c  | 13 +++
 .../rvv/autovec/unop/math-lrint-rv32-run-0.c  | 63 +
 .../rvv/autovec/unop/math-lrintf-rv64-0.c | 13 +++
 .../rvv/autovec/unop/math-lrintf-rv64-run-0.c | 63 +
 .../riscv/rvv/autovec/vls/math-irint-1.c  | 30 ++
 .../riscv/rvv/autovec/vls/math-llrintf-0.c| 30 ++
 .../riscv/rvv/autovec/vls/math-lrint-rv32-0.c | 30 ++
 .../rvv/autovec/vls/math-lrintf-rv64-0.c  | 30 ++
 16 files changed, 514 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irintf-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrintf-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llrintf-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-rv32-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-rv32-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrintf-rv64-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrintf-rv64-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-irint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llrintf-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-rv32-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrintf-rv64-0.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cc4c9596bbf..f1f0523d1de 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/a

Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-05 Thread juzhe.zhong
lgtm


RE: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-05 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zhong 
Sent: Sunday, November 5, 2023 5:40 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Li, Pan2 ; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

lgtm



[PATCH v4 1/2] c++: Initial support for P0847R7 (Deducing this) [PR102609]

2023-11-05 Thread waffl3x
Bootstrapped and tested on x86_64-linux with no regressions.

I originally threw this e-mail together last night, but threw in the
towel when I thought I saw tests failing and went to sleep. I did a
proper bootstrap and comparison and whatnot and found that there were
thankfully no regressions.

Anyhow, the first patch feels ready for trunk, the second needs at
least one review, I'll write more on that in the second e-mail though.
I put quite a lot into the commit message, in hindsight I think I may
have gone overboard, but that isn't something I'm going to rewrite at
the moment. I really want to get these patches up for review so they
can be finalized.

I'm also including my usual musings on things that came up as I was
polishing off the patches. I reckon some of them aren't all that
important right now but I would rather toss them in here than forget
about them.

I'm starting to think that we should have a general macro that
indicates whether an implicit object argument should be passed in the
call. It might be more clear than what is currently present. I've also
noticed that there's a fair amount of places where instead of using
DECL_NONSTATIC_MEMBER_FUNCTION_P the code checks if tree_code of the
type is a METHOD_TYPE, which is exactly what the aforementioned macro
does.

In build_min_non_dep_op_overload I reversed the branches of a condition
because it made more sense with METHOD_TYPE first so it doesn't have to
take xobj member functions into account on both branches. I am slightly
concerned that flipping the branch around might have consequences,
hence why I am mentioning it. Realistically I think it's probably fine
though.

I have a test prepared for diagnosing virtual specifiers on xobj member
functions, but it's got some issues so I won't be including it with the
following diagnostic patch. Diagnostics for virtual specifiers are
still implemented, it's just the test that is having trouble. I mostly
had a hard time working out edge cases, and the standard doesn't
actually properly specify what the criteria for overriding a function
is so I've been stumped on what behavior I want it to have. So for the
time being, it only diagnoses uses of virtual on xobj member functions,
while errors for final and override are handled by code that is already
present. This can result in multiple errors, but again, I don't know
how I want to handle it yet, especially since the standard doesn't
specify this stuff very well.

BTW let me know if there's anything you would prefer to be done
differently in the changelog, I am still having trouble writing them
and I'm usually uncertain if I'm writing them properly.

Alex
From e730dcba51503446cc362909fcab19361970b448 Mon Sep 17 00:00:00 2001
From: waffl3x 
Date: Sat, 4 Nov 2023 05:35:10 -0600
Subject: [PATCH 1/2] c++: Initial support for C++23 P0847R7 (Deducing this)
 [PR102609]

This patch implements initial support for P0847R7 without diagnostics.  My goal
was to minimize changes to the existing code.  To achieve this I chose to treat
xobj member functions as static member functions, while opting into member
function handling when necessary.  This seemed to be the better choice since
most of the time they are more like static member functions.

This is achieved by inhibiting conversion of the declaration's type from
FUNCTION_TYPE to METHOD_TYPE.  Most, if not all, code differentiates
between member functions and static member functions by inspecting the
FUNCTION_DECL's type, so forcing this is sufficient.  An xobj member function
is any member function that is declared with an xobj parameter as its first
parameter.  This information is passed through the declarator's parameter list,
stored in the purpose of the parameter's tree_list node.  Normally this is used
to store default arguments, but as default arguments are not allowed for xobj
parameters it is fine for us to hijack it.  By utilizing this we can pass this
information from cp_parser_parameter_declaration over to grokdeclarator without
adding anything new to the tree structs.
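
For readers unfamiliar with the feature, here is a minimal sketch of what an
xobj (explicit object) member function looks like; this example is mine and
is not taken from the patch or its testsuite:

  struct widget
  {
    int value = 0;

    // The first parameter, declared with "this", is the explicit object
    // (xobj) parameter; Self deduces the value category and constness of
    // the object the function is called on.
    template <typename Self>
    auto && get (this Self && self)
    {
      return static_cast<Self &&> (self).value;
    }
  };

  int read (widget const &w) { return w.get (); }  // Self = widget const &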

We still need to differentiate this new function variety from static member
functions and regular functions, and since this information needs to stick to
the declaration we should select a more concrete place to store it.  Unlike the
previous hack for parameters, we instead add a flag to lang_decl_fn, the only
modification this patch makes to any tree data structures.  We could probably
try to stick the information in the decl's parameters somewhere, but I think a
proper flag is justified.  The new flag can be set and cleared through
DECL_FUNCTION_XOBJ_FLAG, it is invalid to use this with anything other than
FUNCTION_DECL nodes.  For inspecting the value of this flag
DECL_XOBJ_MEMBER_FUNC_P should be used; this macro is safe to use with any node
type and will harmlessly evaluate to false for invalid node types.

It needs to be noted that we can not add checking for xobj member functions to
DECL_NONSTATIC_MEMBER_FUNCTION_P as it is

[PATCH v4 2/2] c++: Diagnostics for P0847R7 (Deducing this) [PR102609]

2023-11-05 Thread waffl3x
Bootstrapped and tested on x86_64-linux with no regressions.

Finally, the fabled diagnostics patch. I would like to note really
quickly that there was never a v2 and v3 of this patch, only the first
of these 2 had those versions. Originally I had planned to revise this
patch alongside the first but it just didn't happen. Anyhow, I decided
to match the version of this second patch to the current first patch to
avoid any confusion. 

With that out of the way, I feel mostly okay about the code in this
patch, but I have a feeling it will need a revision, especially with
the large amounts of comments I left in. At the very least I expect to
need to pull those out before the patch can be accepted.

I had wanted to write about some of my frustrations with trying to
write a test for virtual specifiers and errors/warnings for
shadowing/overloading virtual functions, but I am a bit too tired at
the moment and I don't want to delay getting this up for another night.
In short, the standard does not properly specify the criteria for
overriding functions, which leaves a lot of ambiguity in how exactly we
should be handling these cases. The standard also really poorly
specifies things related to the implicit object parameter and implicit
object argument which also causes some trouble. Anyhow, for the time
being I am not including my test for diagnostics related to a virtual
specifier on xobj member functions. I can't get it to a point I am
happy with it and I think there will need to be some discussion on how
exactly we want to handle that.

I was fairly lazy with the changelog and commit message in this patch
as I expect to need to do another round on this patch before it can be
accepted. One specific question I have is whether I should be listing
out all the diagnostics that were added to a function. For the cases
where there were only one diagnostic added I stated it, but for
grokdeclarator which has the majority of the diagnostics I did not. I
welcome input here, really I request it, because the changelogs are
still fairly difficult for me to write. Hell, the commit messages are
hard to write, I feel I went overboard on the first patch but I guess
it's a fairly large patch so maybe it's alright? Again, I am looking
for feedback here if anyone is willing to provide it.

I've written more than I want here, so I'll wrap this e-mail up and go
to bed. I am very happy to be getting close to a final product here.
Hopefully if all goes well I'll be able to fit in the final missing
features before feature lock hits.

Alex
From c8e8155a635fab7f326d0ad32326da352d7c323e Mon Sep 17 00:00:00 2001
From: waffl3x 
Date: Sun, 5 Nov 2023 05:17:18 -0700
Subject: [PATCH 2/2] c++: Diagnostics for C++23 P0847R7 (Deducing this)
 [PR102609]

This patch adds diagnostics for various ill-formed code related to xobj member
functions. Some of the code in here leaves something to be desired, but the
majority of cases should be handled.  I opted to add a new TFF flag despite only
using it in a single place; other solutions seemed non-ideal and there are
plenty of bits left.  Some of the diagnostics are more scattered around than I
would like, perhaps this could be refactored in the future, especially those in
grokfndecl.
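
As a rough illustration (these snippets are mine, not from the patch or its
testsuite), the kinds of intentionally ill-formed declarations the new
diagnostics are aimed at look like:

  struct S
  {
    int x;
    void f (this S self) const;            // cv/ref-qualifier on an xobj function
    void g (this S self = S{});            // default argument on the xobj parameter
    void h (this S self) { (void) this; }  // "this" in the body of an xobj function
  };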

	PR c++/102609

gcc/cp/ChangeLog:

	PR c++/102609
	Diagnostics for C++23 P0847R7 - Deducing this.
	* cp-tree.h (TFF_XOBJ_FUNC): Define.
	* decl.cc (grokfndecl): Diagnose cvref-qualifiers on an xobj member
	function.
	(grokdeclarator): Diagnostics
	* error.cc (dump_function_decl): For xobj member function add
	TFF_XOBJ_FUNC bit to dump_parameters flags argument.
	(dump_parameters): When printing xobj member function's params add
	"this" to the first param.
	(function_category): Say so when in an xobj member function.
	* parser.cc (cp_parser_decl_specifier_seq): Diagnose incorrectly
	positioned "this" specifier.
	(cp_parser_parameter_declaration): Diagnose default argument on
	xobj params.
	* semantics.cc (finish_this_expr): Diagnose uses of "this" in body
	of xobj member function.

gcc/testsuite/ChangeLog:

	PR c++/102609
	Diagnostics for C++23 P0847R7 - Deducing this.
	* g++.dg/cpp23/explicit-obj-cxx-dialect-A.C: New test.
	* g++.dg/cpp23/explicit-obj-cxx-dialect-B.C: New test.
	* g++.dg/cpp23/explicit-obj-cxx-dialect-C.C: New test.
	* g++.dg/cpp23/explicit-obj-cxx-dialect-D.C: New test.
	* g++.dg/cpp23/explicit-obj-cxx-dialect-E.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics1.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics2.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics3.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics4.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics5.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics6.C: New test.
	* g++.dg/cpp23/explicit-obj-diagnostics7.C: New test.

Signed-off-by: waffl3x 
---
 gcc/cp/cp-tree.h  |   5 +-
 gcc/cp/decl.cc| 133 ++---
 gcc/cp/error.cc   |   8 +-
 gcc

[PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Cassio Neri
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..c979a5d05dd 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1800,8 +1800,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
   const auto __m = static_cast<unsigned>(month());

- // Excluding February, the last day of month __m is either 30 or 31 or,
- // in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+ // Assume 1 <= __m <= 12, otherwise month().ok() == false and the result
+ // of day() is unspecified. Excluding February, the last day of month __m
+ // is either 30 or 31 or, in another words, it is 30 | b, where b is in
+ // {0, 1}.

  // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
  // Hence, b = __m & 1 = (__m ^ 0) & 1.
@@ -1812,10 +1814,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
  // __m >= 8, that is, c = __m >> 3.

+ // Since 30 = (11110)_2 and __m <= 31 = (11111)_2, we have:
+ // 30 | ((__m ^ c) & 1) == 30 | (__m ^ c), that is, the "& 1" is
+ // unnecessary.
+
  // The above mathematically justifies this implementation whose
  // performance does not depend on look-up tables being on the L1 cache.
- return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-: _M_y.is_leap() ? 29 : 28};
+ return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+ : _M_y.is_leap() ? 29 : 28};
   }

   constexpr
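
A quick brute-force check of the equivalence (my own sketch, not part of the
patch): for every month value the non-February branch handles, dropping the
"& 1" does not change the result.

  #include <cassert>

  int main ()
  {
    for (unsigned m = 1; m <= 12; ++m)
      {
        unsigned b = (m ^ (m >> 3)) & 1;              // old: keep only the low bit
        assert ((30 | b) == (30 | (m ^ (m >> 3))));   // new: the "& 1" is redundant
      }
  }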


Re: [PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Marc Glisse

On Sun, 5 Nov 2023, Cassio Neri wrote:


When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.


Is there an entry in gcc's bugzilla about having the optimizer handle this 
kind of optimization?


unsigned f(unsigned x){
  if(x>=32)__builtin_unreachable();
  return 30|(x&1); // --> 30|x
}

(that optimization would come in addition to your patch, doing the 
optimization by hand is still a good idea)


It looks like the criterion would be a|(b&c) when the possible 1 bits of b 
are included in the certainly 1 bits of a|c.


--
Marc Glisse


Re: [PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Cassio Neri
I could not find any entry in gcc's bugzilla for that. Perhaps my search
wasn't good enough.


On Sun, 5 Nov 2023 at 15:58, Marc Glisse  wrote:

> On Sun, 5 Nov 2023, Cassio Neri wrote:
>
> > When year_month_day_last::day() was implemented, Dr. Matthias Kretz
> realised
> > that the operation "& 1" wasn't necessary but we did not patch it at that
> > time. This patch removes the unnecessary operation.
>
> Is there an entry in gcc's bugzilla about having the optimizer handle this
> kind of optimization?
>
> unsigned f(unsigned x){
>if(x>=32)__builtin_unreachable();
>return 30|(x&1); // --> 30|x
> }
>
> (that optimization would come in addition to your patch, doing the
> optimization by hand is still a good idea)
>
> It looks like the criterion would be a|(b&c) when the possible 1 bits of b
> are included in the certainly 1 bits of a|c.
>
> --
> Marc Glisse
>


[committed] openmp: Adjust handling of __has_attribute (omp::directive)/sequence and add omp::decl

2023-11-05 Thread Jakub Jelinek
Hi!

I forgot to tweak c_common_has_attribute for the C++ omp::decl addition and now
also for the C omp::{directive,sequence,decl} addition.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-11-05  Jakub Jelinek  

* c-lex.cc (c_common_has_attribute): Return 1 for omp::directive
and omp::sequence with -fopenmp or -fopenmp-simd also for C, not
just for C++.  Return 1 for omp::decl with -fopenmp or -fopenmp-simd
for both C and C++.

* c-c++-common/gomp/attrs-1.c: Adjust for omp::directive and
omp::sequence being supported also in C and add tests for omp::decl.
* c-c++-common/gomp/attrs-2.c: Likewise.
* c-c++-common/gomp/attrs-3.c: Add tests for omp::decl.

--- gcc/c-family/c-lex.cc.jj2023-10-08 16:37:31.301279702 +0200
+++ gcc/c-family/c-lex.cc   2023-11-04 09:19:58.739016364 +0100
@@ -367,15 +367,13 @@ c_common_has_attribute (cpp_reader *pfil
= get_identifier ((const char *)
  cpp_token_as_text (pfile, nxt_token));
  attr_id = canonicalize_attr_name (attr_id);
- if (c_dialect_cxx ())
-   {
- /* OpenMP attributes need special handling.  */
- if ((flag_openmp || flag_openmp_simd)
- && is_attribute_p ("omp", attr_ns)
- && (is_attribute_p ("directive", attr_id)
- || is_attribute_p ("sequence", attr_id)))
-   result = 1;
-   }
+ /* OpenMP attributes need special handling.  */
+ if ((flag_openmp || flag_openmp_simd)
+ && is_attribute_p ("omp", attr_ns)
+ && (is_attribute_p ("directive", attr_id)
+ || is_attribute_p ("sequence", attr_id)
+ || is_attribute_p ("decl", attr_id)))
+   result = 1;
  if (result)
attr_name = NULL_TREE;
  else
--- gcc/testsuite/c-c++-common/gomp/attrs-1.c.jj	2021-07-23 09:50:02.429080908 +0200
+++ gcc/testsuite/c-c++-common/gomp/attrs-1.c	2023-11-04 09:39:37.770503402 +0100
@@ -1,144 +1,96 @@
 /* { dg-do compile } */
 /* { dg-options "-fopenmp" } */
 
-#if __has_attribute(omp::directive)
-#ifndef __cplusplus
-#error omp::directive supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error omp::directive not supported in C++
-#endif
+#if !__has_attribute(omp::directive)
+#error omp::directive not supported in C/C++
 #endif
 
-#if __has_attribute(omp::sequence)
-#ifndef __cplusplus
-#error omp::sequence supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error omp::sequence not supported in C++
+#if !__has_attribute(omp::sequence)
+#error omp::sequence not supported in C/C++
 #endif
+
+#if !__has_attribute(omp::decl)
+#error omp::decl not supported in C/C++
 #endif
 
 #if __has_attribute(omp::unknown)
 #error omp::unknown supported
 #endif
 
-#if __has_cpp_attribute(omp::directive)
-#ifndef __cplusplus
-#error omp::directive supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error omp::directive not supported in C++
-#endif
+#if !__has_cpp_attribute(omp::directive)
+#error omp::directive not supported in C/C++
 #endif
 
-#if __has_cpp_attribute(omp::sequence)
-#ifndef __cplusplus
-#error omp::sequence supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error omp::sequence not supported in C++
+#if !__has_cpp_attribute(omp::sequence)
+#error omp::sequence not supported in C/C++
 #endif
+
+#if !__has_cpp_attribute(omp::decl)
+#error omp::sequence not supported in C/C++
 #endif
 
 #if __has_cpp_attribute(omp::unknown)
 #error omp::unknown supported
 #endif
 
-#if __has_attribute(__omp__::__directive__)
-#ifndef __cplusplus
-#error __omp__::__directive__ supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error __omp__::__directive__ not supported in C++
-#endif
+#if !__has_attribute(__omp__::__directive__)
+#error __omp__::__directive__ not supported in C/C++
 #endif
 
-#if __has_attribute(__omp__::__sequence__)
-#ifndef __cplusplus
-#error __omp__::__sequence__ supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error __omp__::__sequence__ not supported in C++
+#if !__has_attribute(__omp__::__sequence__)
+#error __omp__::__sequence__ not supported in C/C++
 #endif
+
+#if !__has_attribute(__omp__::__decl__)
+#error __omp__::__decl__ not supported in C/C++
 #endif
 
 #if __has_attribute(__omp__::__unknown__)
 #error __omp__::__unknown__ supported
 #endif
 
-#if __has_cpp_attribute(__omp__::__directive__)
-#ifndef __cplusplus
-#error __omp__::__directive__ supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error __omp__::__directive__ not supported in C++
-#endif
+#if !__has_cpp_attribute(__omp__::__directive__)
+#error __omp__::__directive__ not supported in C/C++
 #endif
 
-#if __has_cpp_attribute(__omp__::__sequence__)
-#ifndef __cplusplus
-#error __omp__::__sequence__ supported in C
-#endif
-#else
-#ifdef __cplusplus
-#error __o

[committed] openmp: Mention C attribute syntax in documentation

2023-11-05 Thread Jakub Jelinek
Hi!

This patch mentions the C attribute syntax support in the libgomp documentation.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-11-05  Jakub Jelinek  

* libgomp.texi (Enabling OpenMP): Adjust wording for attribute syntax
supported also in C.

--- libgomp/libgomp.texi.jj 2023-10-16 14:24:46.408203789 +0200
+++ libgomp/libgomp.texi2023-11-04 09:46:47.200437631 +0100
@@ -138,7 +138,7 @@ changed to GNU Offloading and Multi Proc
 
 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
 flag @option{-fopenmp} must be specified.  For C and C++, this enables
-the handling of the OpenMP directives using @code{#pragma omp} and, for C++, the
+the handling of the OpenMP directives using @code{#pragma omp} and the
 @code{[[omp::directive(...)]]}, @code{[[omp::sequence(...)]]} and
 @code{[[omp::decl(...)]]} attributes.  For Fortran, it enables for
 free source form the @code{!$omp} sentinel for directives and the

Jakub
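
For illustration, the attribute-syntax spelling that is now documented as
accepted in C as well as C++ looks like this (a hand-written sketch, not taken
from the patch or its testsuite; it assumes -fopenmp and a compiler accepting
the [[...]] attribute syntax):

  void
  scale (int n, double *x)
  {
    [[omp::directive (parallel for)]]
    for (int i = 0; i < n; i++)
      x[i] *= 2.0;
  }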



[PATCH] c++: Fix error recovery ICE [PR112365]

2023-11-05 Thread Jakub Jelinek
Hi!

check_field_decls for DECL_C_BIT_FIELD FIELD_DECLs with error_mark_node
TREE_TYPE continues early and doesn't call check_bitfield_decl which would
either set DECL_BIT_FIELD, or clear DECL_C_BIT_FIELD.  So, the following
testcase ICEs after emitting tons of errors, because
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD asserts DECL_BIT_FIELD.

The patch skips that for FIELD_DECLs with error_mark_node type; another
option would be to check DECL_BIT_FIELD in addition to DECL_C_BIT_FIELD.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-05  Jakub Jelinek  

PR c++/112365
* class.cc (layout_class_type): Don't
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD on FIELD_DECLs with
error_mark_node type.

* g++.dg/cpp0x/pr112365.C: New test.

--- gcc/cp/class.cc.jj  2023-11-04 09:02:35.380001476 +0100
+++ gcc/cp/class.cc 2023-11-04 10:03:34.974075429 +0100
@@ -6962,7 +6962,8 @@ layout_class_type (tree t, tree *virtual
 check_bitfield_decl eventually sets DECL_SIZE (field)
 to that width.  */
  && (DECL_SIZE (field) == NULL_TREE
- || integer_zerop (DECL_SIZE (field
+ || integer_zerop (DECL_SIZE (field)))
+ && TREE_TYPE (field) != error_mark_node)
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field, 1);
   check_non_pod_aggregate (field);
 }
--- gcc/testsuite/g++.dg/cpp0x/pr112365.C.jj	2023-11-04 10:05:58.285013791 +0100
+++ gcc/testsuite/g++.dg/cpp0x/pr112365.C	2023-11-04 10:05:14.879638217 +0100
@@ -0,0 +1,8 @@
+// PR c++/112365
+// { dg-do compile { target c++11 } }
+// { dg-excess-errors "" }
+
+template  struct A;
+template  A  foo (T;
+template  struct A { constexpr A : T {} }
+struct { bar ( { foo (this)

Jakub



[PATCH] Simplify year::is_leap().

2023-11-05 Thread Cassio Neri
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In the leap-year calculation, it is mathematically correct to replace the
divisibility check by 100 with the one by 25. It turns out that
_M_y % 25 == 0 also saves the ror instruction [2]. Therefore, the
obfuscation is not required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..a34b3977d59 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -835,29 +835,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   is_leap() const noexcept
   {
- // Testing divisibility by 100 first gives better performance, that is,
- // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
-
- // It gets even faster if _M_y is in [-536870800, 536870999]
- // (which is the case here) and _M_y % 100 is replaced by
- // __is_multiple_of_100 below.
+ // Testing divisibility by 100 first gives better performance [1], i.e.,
+ // return y % 100 == 0 ? y % 400 == 0 : y % 4 == 0;
+ // Furthermore, if y % 100 == 0, then y % 400 == 0 is equivalent to
+ // y % 16 == 0, so we can simplify it to
+ // return y % 100 == 0 ? y % 16 == 0 : y % 4 == 0. // #1
+ // Similarly, we can replace 100 with 25 (which is good since y % 25 == 0
+ // requires one fewer instruction than y % 100 == 0 [2]):
+ // return y % 25 == 0 ? y % 16 == 0 : y % 4 == 0. // #2
+ // Indeed, first assume y % 4 != 0. Then y % 16 != 0 and hence, y % 4 == 0
+ // and y % 16 == 0 are both false. Therefore, #2 returns false as it
+ // should (regardless of y % 25.) Now assume y % 4 == 0. In this case,
+ // y % 25 == 0 if, and only if, y % 100 == 0, that is, #1 and #2 are
+ // equivalent. Finally, #2 is equivalent to
+ // return (y & (y % 25 == 0 ? 15 : 3)) == 0.

  // References:
  // [1] https://github.com/cassioneri/calendar
- // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
-
- // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
- // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
- // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
- // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
-
- constexpr uint32_t __multiplier   = 42949673;
- constexpr uint32_t __bound= 42949669;
- constexpr uint32_t __max_dividend = 1073741799;
- constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
- const bool __is_multiple_of_100
-  = __multiplier * (_M_y + __offset) < __bound;
- return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
+ // [2] https://godbolt.org/z/55G8rn77e
+ // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
+ return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0;
   }

   explicit constexpr
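
A quick exhaustive check of the claim (my own sketch, not part of the patch)
over the full chrono::year value range, comparing the plain Gregorian rule
with the 25-based test:

  #include <cassert>

  int main ()
  {
    for (int y = -32767; y <= 32767; ++y)
      {
        bool ref  = (y % 100 != 0 || y % 400 == 0) && y % 4 == 0;
        bool fast = (y & (y % 25 == 0 ? 15 : 3)) == 0;
        assert (ref == fast);
      }
  }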


[pushed] read-rtl: Fix infinite loop while parsing [...]

2023-11-05 Thread Richard Sandiford
read_rtx_operand would spin endlessly for:

   (unspec [(...))] UNSPEC_FOO)

because read_nested_rtx does nothing if the next character is not '('.

Pushed after testing on aarch64-linux-gnu & x86_64-linux-gnu.

Richard


gcc/
* read-rtl.cc (read_rtx_operand): Avoid spinning endlessly for
invalid [...] operands.
---
 gcc/read-rtl.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc
index 292f8b72d43..f3b5613dfdb 100644
--- a/gcc/read-rtl.cc
+++ b/gcc/read-rtl.cc
@@ -1896,8 +1896,10 @@ rtx_reader::read_rtx_operand (rtx return_rtx, int idx)
repeat_count--;
value = saved_rtx;
  }
-   else
+   else if (c == '(')
  value = read_nested_rtx ();
+   else
+ fatal_with_file_and_line ("unexpected character in vector");
 
for (; repeat_count > 0; repeat_count--)
  {
-- 
2.25.1



[pushed] mode-switching: Remove unused bbnum field

2023-11-05 Thread Richard Sandiford
seginfo had an unused bbnum field, presumably dating from before
BB information was attached directly to insns.

Pushed as obvious after testing on aarch64-linux-gnu &
x86_64-linux-gnu.

Richard


gcc/
* mode-switching.cc: Remove unused forward references.
(seginfo): Remove bbnum.
(new_seginfo): Remove associated argument.
(optimize_mode_switching): Update calls accordingly.
---
 gcc/mode-switching.cc | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index f483c831c35..c3e4d24de9b 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -65,13 +65,11 @@ along with GCC; see the file COPYING3.  If not see
MODE is the mode this insn must be executed in.
INSN_PTR is the insn to be executed (may be the note that marks the
beginning of a basic block).
-   BBNUM is the flow graph basic block this insn occurs in.
NEXT is the next insn in the same basic block.  */
 struct seginfo
 {
   int mode;
   rtx_insn *insn_ptr;
-  int bbnum;
   struct seginfo *next;
   HARD_REG_SET regs_live;
 };
@@ -84,11 +82,6 @@ struct bb_info
   int mode_in;
 };
 
-static struct seginfo * new_seginfo (int, rtx_insn *, int, HARD_REG_SET);
-static void add_seginfo (struct bb_info *, struct seginfo *);
-static void reg_dies (rtx, HARD_REG_SET *);
-static void reg_becomes_live (rtx, const_rtx, void *);
-
 /* Clear mode I from entity J in bitmap B.  */
 #define clear_mode_bit(b, j, i) \
bitmap_clear_bit (b, (j * max_num_modes) + i)
@@ -148,13 +141,13 @@ commit_mode_sets (struct edge_list *edge_list, int e, 
struct bb_info *info)
 }
 
 /* Allocate a new BBINFO structure, initialized with the MODE, INSN,
-   and basic block BB parameters.
+   and REGS_LIVE parameters.
INSN may not be a NOTE_INSN_BASIC_BLOCK, unless it is an empty
basic block; that allows us later to insert instructions in a FIFO-like
manner.  */
 
 static struct seginfo *
-new_seginfo (int mode, rtx_insn *insn, int bb, HARD_REG_SET regs_live)
+new_seginfo (int mode, rtx_insn *insn, const HARD_REG_SET ®s_live)
 {
   struct seginfo *ptr;
 
@@ -163,7 +156,6 @@ new_seginfo (int mode, rtx_insn *insn, int bb, HARD_REG_SET 
regs_live)
   ptr = XNEW (struct seginfo);
   ptr->mode = mode;
   ptr->insn_ptr = insn;
-  ptr->bbnum = bb;
   ptr->next = NULL;
   ptr->regs_live = regs_live;
   return ptr;
@@ -605,7 +597,7 @@ optimize_mode_switching (void)
gcc_assert (NOTE_INSN_BASIC_BLOCK_P (ins_pos));
if (ins_pos != BB_END (bb))
  ins_pos = NEXT_INSN (ins_pos);
-   ptr = new_seginfo (no_mode, ins_pos, bb->index, live_now);
+   ptr = new_seginfo (no_mode, ins_pos, live_now);
add_seginfo (info + bb->index, ptr);
for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);
@@ -623,7 +615,7 @@ optimize_mode_switching (void)
{
  any_set_required = true;
  last_mode = mode;
- ptr = new_seginfo (mode, insn, bb->index, live_now);
+ ptr = new_seginfo (mode, insn, live_now);
  add_seginfo (info + bb->index, ptr);
  for (i = 0; i < no_mode; i++)
clear_mode_bit (transp[bb->index], j, i);
@@ -652,7 +644,7 @@ optimize_mode_switching (void)
 mark the block as nontransparent.  */
  if (!any_set_required)
{
- ptr = new_seginfo (no_mode, BB_END (bb), bb->index, live_now);
+ ptr = new_seginfo (no_mode, BB_END (bb), live_now);
  add_seginfo (info + bb->index, ptr);
  if (last_mode != no_mode)
for (i = 0; i < no_mode; i++)
-- 
2.25.1



[PATCH] explow: Allow dynamic allocations after vregs

2023-11-05 Thread Richard Sandiford
This patch allows allocate_dynamic_stack_space to be called before
or after virtual registers have been instantiated.  It uses the
same approach as assign_stack_local, which already supported this.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* function.h (get_stack_dynamic_offset): Declare.
* function.cc (get_stack_dynamic_offset): New function,
split out from...
(instantiate_virtual_regs): ...here.
* explow.cc (allocate_dynamic_stack_space): Handle calls made
after virtual registers have been instantiated.
---
 gcc/explow.cc   | 10 +++---
 gcc/function.cc | 12 +++-
 gcc/function.h  |  1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index 0c03ac350bb..aa64d5e906c 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1375,12 +1375,16 @@ allocate_dynamic_stack_space (rtx size, unsigned 
size_align,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
+  rtx addr = (virtuals_instantiated
+ ? plus_constant (Pmode, stack_pointer_rtx,
+  get_stack_dynamic_offset ())
+ : virtual_stack_dynamic_rtx);
 
   /* If we're asking for zero bytes, it doesn't matter what we point
  to since we can't dereference it.  But return a reasonable
  address anyway.  */
   if (size == const0_rtx)
-return virtual_stack_dynamic_rtx;
+return addr;
 
   /* Otherwise, show we're calling alloca or equivalent.  */
   cfun->calls_alloca = 1;
@@ -1532,7 +1536,7 @@ allocate_dynamic_stack_space (rtx size, unsigned 
size_align,
   poly_int64 saved_stack_pointer_delta;
 
   if (!STACK_GROWS_DOWNWARD)
-   emit_move_insn (target, virtual_stack_dynamic_rtx);
+   emit_move_insn (target, force_operand (addr, target));
 
   /* Check stack bounds if necessary.  */
   if (crtl->limit_stack)
@@ -1575,7 +1579,7 @@ allocate_dynamic_stack_space (rtx size, unsigned 
size_align,
   stack_pointer_delta = saved_stack_pointer_delta;
 
   if (STACK_GROWS_DOWNWARD)
-   emit_move_insn (target, virtual_stack_dynamic_rtx);
+   emit_move_insn (target, force_operand (addr, target));
 }
 
   suppress_reg_args_size = false;
diff --git a/gcc/function.cc b/gcc/function.cc
index afb0b33da9e..527ea4807b0 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -1943,6 +1943,16 @@ instantiate_decls (tree fndecl)
   vec_free (cfun->local_decls);
 }
 
+/* Return the value of STACK_DYNAMIC_OFFSET for the current function.
+   This is done through a function wrapper so that the macro sees a
+   predictable set of included files.  */
+
+poly_int64
+get_stack_dynamic_offset ()
+{
+  return STACK_DYNAMIC_OFFSET (current_function_decl);
+}
+
 /* Pass through the INSNS of function FNDECL and convert virtual register
references to hard register references.  */
 
@@ -1954,7 +1964,7 @@ instantiate_virtual_regs (void)
   /* Compute the offsets to use for this function.  */
   in_arg_offset = FIRST_PARM_OFFSET (current_function_decl);
   var_offset = targetm.starting_frame_offset ();
-  dynamic_offset = STACK_DYNAMIC_OFFSET (current_function_decl);
+  dynamic_offset = get_stack_dynamic_offset ();
   out_arg_offset = STACK_POINTER_OFFSET;
 #ifdef FRAME_POINTER_CFA_OFFSET
   cfa_offset = FRAME_POINTER_CFA_OFFSET (current_function_decl);
diff --git a/gcc/function.h b/gcc/function.h
index 5caf1e153ea..29846564bc6 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -715,6 +715,7 @@ extern vec convert_jumps_to_returns (basic_block 
last_bb, bool simple_p,
 extern basic_block emit_return_for_exit (edge exit_fallthru_edge,
 bool simple_p);
 extern void reposition_prologue_and_epilogue_notes (void);
+extern poly_int64 get_stack_dynamic_offset ();
 
 /* Returns the name of the current function.  */
 extern const char *fndecl_name (tree);
-- 
2.25.1



[PATCH] explow: Avoid unnecessary alignment operations

2023-11-05 Thread Richard Sandiford
align_dynamic_address would output alignment operations even
for a required alignment of 1 byte.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* explow.cc (align_dynamic_address): Do nothing if the required
alignment is a byte.
---
 gcc/explow.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index aa64d5e906c..0be6d2629c9 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1201,6 +1201,9 @@ record_new_stack_level (void)
 rtx
 align_dynamic_address (rtx target, unsigned required_align)
 {
+  if (required_align == BITS_PER_UNIT)
+return target;
+
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
  but we know it can't.  So add ourselves and then do
  TRUNC_DIV_EXPR.  */
-- 
2.25.1



[PATCH 00/12] Tweaks and extensions to the mode-switching pass

2023-11-05 Thread Richard Sandiford
This series of patches extends the mode-switching pass so that it
can be used for AArch64's SME.  I wondered about including a detailed
description of how the SME mode changes work, but it'd probably be
a distraction.  The system is quite complex and target-specific, and
hopefully the details aren't necessary to understand the motivation.

One of the main requirements for one of the mode-switched "entities" is
that the current mode must always be known at compile time.  It would be
too cumbersome to work out the current mode at runtime and make a dynamic
choice about what to do.  The entity therefore wants the usual LCM
placement where possible, but would rather have redundant mode
transitions than transitions from unknown modes.

In many cases, the modified pass seems to generate optimal or near-optimal
mode-switching code, even with these additional requirements.  Tests are
included with the SME work.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu, although only the latter is useful since
AArch64 doesn't yet use the pass.  Also tested by building crosses
for epiphany-elf, riscv64-elf and sh-linux-gnu, to pick one triplet
per other target that uses mode switching.

OK to install?

Thanks,
Richard

Richard Sandiford (12):
  mode-switching: Tweak the macro/hook documentation
  mode-switching: Add note problem
  mode-switching: Avoid quadratic list operation
  mode-switching: Fix the mode passed to the emit hook
  mode-switching: Simplify recording of transparency
  mode-switching: Tweak entry/exit handling
  mode-switching: Allow targets to set the mode for EH handlers
  mode-switching: Pass set of live registers to the needed hook
  mode-switching: Pass the set of live registers to the after hook
  mode-switching: Use 1-based edge aux fields
  mode-switching: Add a target-configurable confluence operator
  mode-switching: Add a backprop hook

 gcc/config/epiphany/epiphany-protos.h  |   7 +-
 gcc/config/epiphany/epiphany.cc|   7 +-
 gcc/config/epiphany/mode-switch-use.cc |   2 +-
 gcc/config/i386/i386.cc|   4 +-
 gcc/config/riscv/riscv.cc  |   4 +-
 gcc/config/sh/sh.cc|   9 +-
 gcc/doc/tm.texi| 126 --
 gcc/doc/tm.texi.in |  32 +-
 gcc/mode-switching.cc  | 582 +
 gcc/target.def | 103 -
 10 files changed, 714 insertions(+), 162 deletions(-)

-- 
2.25.1



[PATCH 01/12] mode-switching: Tweak the macro/hook documentation

2023-11-05 Thread Richard Sandiford
I found the documentation for the mode-switching macros/hooks
a bit hard to follow at first.  This patch tries to add the
information that I think would have made it easier to understand.

Of course, documentation preferences are personal, and so I could
be changing something that others understood to something that
seems impenetrable.

Some notes on specific changes:

- "in an optimizing compilation" didn't seem accurate; the pass
  is run even at -O0, and often needs to be for correctness.

- "at run time" meant when the compiler was run, rather than when
  the compiled code was run.

- Removing the list of optional macros isn't a clarification,
  but it means that upcoming patches don't create an absurdly
  long list.

- I don't really understand the purpose of TARGET_MODE_PRIORITY,
  so I mostly left that alone.

gcc/
* target.def: Tweak documentation of mode-switching hooks.
* doc/tm.texi.in (OPTIMIZE_MODE_SWITCHING): Tweak documentation.
(NUM_MODES_FOR_MODE_SWITCHING): Likewise.
* doc/tm.texi: Regenerate.
---
 gcc/doc/tm.texi| 69 --
 gcc/doc/tm.texi.in | 26 +
 gcc/target.def | 43 ++---
 3 files changed, 84 insertions(+), 54 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f7ac806ff15..759331a2c96 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10368,7 +10368,7 @@ The following macros control mode switching 
optimizations:
 
 @defmac OPTIMIZE_MODE_SWITCHING (@var{entity})
 Define this macro if the port needs extra instructions inserted for mode
-switching in an optimizing compilation.
+switching.
 
 For an example, the SH4 can perform both single and double precision
 floating point operations, but to perform a single precision operation,
@@ -10378,73 +10378,88 @@ purpose register as a scratch register, hence these 
FPSCR sets have to
 be inserted before reload, i.e.@: you cannot put this into instruction emitting
 or @code{TARGET_MACHINE_DEPENDENT_REORG}.
 
-You can have multiple entities that are mode-switched, and select at run time
-which entities actually need it.  @code{OPTIMIZE_MODE_SWITCHING} should
-return nonzero for any @var{entity} that needs mode-switching.
+You can have multiple entities that are mode-switched, some of which might
+only be needed conditionally.  The entities are identified by their index
+into the @code{NUM_MODES_FOR_MODE_SWITCHING} initializer, with the length
+of the initializer determining the number of entities.
+
+@code{OPTIMIZE_MODE_SWITCHING} should return nonzero for any @var{entity}
+that needs mode-switching.
+
 If you define this macro, you also have to define
 @code{NUM_MODES_FOR_MODE_SWITCHING}, @code{TARGET_MODE_NEEDED},
 @code{TARGET_MODE_PRIORITY} and @code{TARGET_MODE_EMIT}.
-@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, and @code{TARGET_MODE_EXIT}
-are optional.
+The other macros in this section are optional.
 @end defmac
 
 @defmac NUM_MODES_FOR_MODE_SWITCHING
 If you define @code{OPTIMIZE_MODE_SWITCHING}, you have to define this as
 initializer for an array of integers.  Each initializer element
 N refers to an entity that needs mode switching, and specifies the number
-of different modes that might need to be set for this entity.
-The position of the initializer in the initializer---starting counting at
+of different modes that are defined for that entity.
+The position of the element in the initializer---starting counting at
 zero---determines the integer that is used to refer to the mode-switched
 entity in question.
-In macros that take mode arguments / yield a mode result, modes are
-represented as numbers 0 @dots{} N @minus{} 1.  N is used to specify that no 
mode
-switch is needed / supplied.
+Modes are represented as numbers 0 @dots{} N @minus{} 1.
+In mode arguments and return values, N either represents an unknown
+mode or ``no mode'', depending on context.
 @end defmac
 
 @deftypefn {Target Hook} void TARGET_MODE_EMIT (int @var{entity}, int 
@var{mode}, int @var{prev_mode}, HARD_REG_SET @var{regs_live})
 Generate one or more insns to set @var{entity} to @var{mode}.
 @var{hard_reg_live} is the set of hard registers live at the point where
 the insn(s) are to be inserted. @var{prev_moxde} indicates the mode
-to switch from. Sets of a lower numbered entity will be emitted before
+to switch from, or is the number of modes if the previous mode is not
+known.  Sets of a lower numbered entity will be emitted before
 sets of a higher numbered entity to a mode of the same or lower priority.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_MODE_NEEDED (int @var{entity}, rtx_insn 
*@var{insn})
 @var{entity} is an integer specifying a mode-switched entity.
-If @code{OPTIMIZE_MODE_SWITCHING} is defined, you must define this macro
-to return an integer value not larger than the corresponding element
-in @code{NUM_MODES_FOR_MODE_SWITCHING}, to denote the mode that @var{entity}
-must be switched in

[PATCH 02/12] mode-switching: Add note problem

2023-11-05 Thread Richard Sandiford
optimize_mode_switching uses REG_DEAD notes to track register
liveness, but it failed to tell DF to calculate up-to-date notes.

Noticed by inspection.  I don't have a testcase that fails
because of this.

gcc/
* mode-switching.cc (optimize_mode_switching): Call
df_note_add_problem.
---

I was tempted to apply this as obvious, but wasn't sure if I was
missing something.

 gcc/mode-switching.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index c3e4d24de9b..8577069bde1 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -541,6 +541,7 @@ optimize_mode_switching (void)
   pre_exit = create_pre_exit (n_entities, entity_map, num_modes);
 }
 
+  df_note_add_problem ();
   df_analyze ();
 
   /* Create the bitmap vectors.  */
-- 
2.25.1



[PATCH 03/12] mode-switching: Avoid quadratic list operation

2023-11-05 Thread Richard Sandiford
add_seginfo chained insn information to the end of a list
by starting at the head of the list.  This patch avoids the
quadraticness by keeping track of the tail pointer.

gcc/
* mode-switching.cc (add_seginfo): Replace head pointer with
a pointer to the tail pointer.
(optimize_mode_switching): Update calls accordingly.
---
 gcc/mode-switching.cc | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 8577069bde1..bebe89d5fd2 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -162,23 +162,14 @@ new_seginfo (int mode, rtx_insn *insn, const HARD_REG_SET 
®s_live)
 }
 
 /* Add a seginfo element to the end of a list.
-   HEAD is a pointer to the list beginning.
+   TAIL is a pointer to the list's null terminator.
INFO is the structure to be linked in.  */
 
 static void
-add_seginfo (struct bb_info *head, struct seginfo *info)
+add_seginfo (struct seginfo ***tail_ptr, struct seginfo *info)
 {
-  struct seginfo *ptr;
-
-  if (head->seginfo == NULL)
-head->seginfo = info;
-  else
-{
-  ptr = head->seginfo;
-  while (ptr->next != NULL)
-   ptr = ptr->next;
-  ptr->next = info;
-}
+  **tail_ptr = info;
+  *tail_ptr = &info->next;
 }
 
 /* Record in LIVE that register REG died.  */
@@ -574,6 +565,7 @@ optimize_mode_switching (void)
 Also compute the initial transparency settings.  */
   FOR_EACH_BB_FN (bb, cfun)
{
+ struct seginfo **tail_ptr = &info[bb->index].seginfo;
  struct seginfo *ptr;
  int last_mode = no_mode;
  bool any_set_required = false;
@@ -599,7 +591,7 @@ optimize_mode_switching (void)
if (ins_pos != BB_END (bb))
  ins_pos = NEXT_INSN (ins_pos);
ptr = new_seginfo (no_mode, ins_pos, live_now);
-   add_seginfo (info + bb->index, ptr);
+   add_seginfo (&tail_ptr, ptr);
for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);
  }
@@ -617,7 +609,7 @@ optimize_mode_switching (void)
  any_set_required = true;
  last_mode = mode;
  ptr = new_seginfo (mode, insn, live_now);
- add_seginfo (info + bb->index, ptr);
+ add_seginfo (&tail_ptr, ptr);
  for (i = 0; i < no_mode; i++)
clear_mode_bit (transp[bb->index], j, i);
}
@@ -646,7 +638,7 @@ optimize_mode_switching (void)
  if (!any_set_required)
{
  ptr = new_seginfo (no_mode, BB_END (bb), live_now);
- add_seginfo (info + bb->index, ptr);
+ add_seginfo (&tail_ptr, ptr);
  if (last_mode != no_mode)
for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);
-- 
2.25.1



[PATCH 04/12] mode-switching: Fix the mode passed to the emit hook

2023-11-05 Thread Richard Sandiford
optimize_mode_switching passes an entity's current mode (if known)
to the emit hook.  However, the mode that it passed ignored the
effect of the after hook.  Instead, the mode for the first emit
call in a block was taken from the incoming mode, whereas the
mode for each subsequent emit call was taken from the result
of the previous call.

The previous pass through the insns already calculated the
correct mode, so this patch records it in the seginfo structure.
(There was a 32-bit hole on 64-bit hosts, so this doesn't increase
the size of the structure for them.)

gcc/
* mode-switching.cc (seginfo): Add a prev_mode field.
(new_seginfo): Take and initialize the prev_mode.
(optimize_mode_switching): Update calls accordingly.
Use the recorded modes during the emit phase, rather than
computing one on the fly.
---
 gcc/mode-switching.cc | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index bebe89d5fd2..12ddbd6adfa 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
NEXT is the next insn in the same basic block.  */
 struct seginfo
 {
+  int prev_mode;
   int mode;
   rtx_insn *insn_ptr;
   struct seginfo *next;
@@ -140,20 +141,22 @@ commit_mode_sets (struct edge_list *edge_list, int e, 
struct bb_info *info)
   return need_commit;
 }
 
-/* Allocate a new BBINFO structure, initialized with the MODE, INSN,
-   and REGS_LIVE parameters.
+/* Allocate a new BBINFO structure, initialized with the PREV_MODE, MODE,
+   INSN, and REGS_LIVE parameters.
INSN may not be a NOTE_INSN_BASIC_BLOCK, unless it is an empty
basic block; that allows us later to insert instructions in a FIFO-like
manner.  */
 
 static struct seginfo *
-new_seginfo (int mode, rtx_insn *insn, const HARD_REG_SET ®s_live)
+new_seginfo (int prev_mode, int mode, rtx_insn *insn,
+const HARD_REG_SET ®s_live)
 {
   struct seginfo *ptr;
 
   gcc_assert (!NOTE_INSN_BASIC_BLOCK_P (insn)
  || insn == BB_END (NOTE_BASIC_BLOCK (insn)));
   ptr = XNEW (struct seginfo);
+  ptr->prev_mode = prev_mode;
   ptr->mode = mode;
   ptr->insn_ptr = insn;
   ptr->next = NULL;
@@ -590,7 +593,7 @@ optimize_mode_switching (void)
gcc_assert (NOTE_INSN_BASIC_BLOCK_P (ins_pos));
if (ins_pos != BB_END (bb))
  ins_pos = NEXT_INSN (ins_pos);
-   ptr = new_seginfo (no_mode, ins_pos, live_now);
+   ptr = new_seginfo (no_mode, no_mode, ins_pos, live_now);
add_seginfo (&tail_ptr, ptr);
for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);
@@ -606,12 +609,12 @@ optimize_mode_switching (void)
 
  if (mode != no_mode && mode != last_mode)
{
- any_set_required = true;
- last_mode = mode;
- ptr = new_seginfo (mode, insn, live_now);
+ ptr = new_seginfo (last_mode, mode, insn, live_now);
  add_seginfo (&tail_ptr, ptr);
  for (i = 0; i < no_mode; i++)
clear_mode_bit (transp[bb->index], j, i);
+ any_set_required = true;
+ last_mode = mode;
}
 
  if (targetm.mode_switching.after)
@@ -637,7 +640,7 @@ optimize_mode_switching (void)
 mark the block as nontransparent.  */
  if (!any_set_required)
{
- ptr = new_seginfo (no_mode, BB_END (bb), live_now);
+ ptr = new_seginfo (last_mode, no_mode, BB_END (bb), live_now);
  add_seginfo (&tail_ptr, ptr);
  if (last_mode != no_mode)
for (i = 0; i < no_mode; i++)
@@ -778,9 +781,9 @@ optimize_mode_switching (void)
   FOR_EACH_BB_FN (bb, cfun)
{
  struct seginfo *ptr, *next;
- int cur_mode = bb_info[j][bb->index].mode_in;
+ struct seginfo *first = bb_info[j][bb->index].seginfo;
 
- for (ptr = bb_info[j][bb->index].seginfo; ptr; ptr = next)
+ for (ptr = first; ptr; ptr = next)
{
  next = ptr->next;
  if (ptr->mode != no_mode)
@@ -790,14 +793,15 @@ optimize_mode_switching (void)
  rtl_profile_for_bb (bb);
  start_sequence ();
 
+ int cur_mode = (ptr == first && ptr->prev_mode == no_mode
+ ? bb_info[j][bb->index].mode_in
+ : ptr->prev_mode);
+
  targetm.mode_switching.emit (entity_map[j], ptr->mode,
   cur_mode, ptr->regs_live);
  mode_set = get_insns ();
  end_sequence ();
 
- /* modes kill each other inside a basic block.  */
-

[PATCH 05/12] mode-switching: Simplify recording of transparency

2023-11-05 Thread Richard Sandiford
For a given block, an entity is either transparent for
all modes or for none.  Each update to the transparency set
therefore used a loop like:

for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);

This patch instead starts out with a bit-per-block bitmap
and updates the main bitmap at the end.

This isn't much of a simplification on its own.  The main
purpose is to simplify later patches.

gcc/
* mode-switching.cc (optimize_mode_switching): Initially
compute transparency in a bit-per-block bitmap.
---
 gcc/mode-switching.cc | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 12ddbd6adfa..03dd4c1ebe4 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -556,6 +556,8 @@ optimize_mode_switching (void)
   bitmap_vector_clear (antic, last_basic_block_for_fn (cfun));
   bitmap_vector_clear (comp, last_basic_block_for_fn (cfun));
 
+  auto_sbitmap transp_all (last_basic_block_for_fn (cfun));
+
   for (j = n_entities - 1; j >= 0; j--)
 {
   int e = entity_map[j];
@@ -563,6 +565,8 @@ optimize_mode_switching (void)
   struct bb_info *info = bb_info[j];
   rtx_insn *insn;
 
+  bitmap_ones (transp_all);
+
   /* Determine what the first use (if any) need for a mode of entity E is.
 This will be the mode that is anticipatable for this block.
 Also compute the initial transparency settings.  */
@@ -595,8 +599,7 @@ optimize_mode_switching (void)
  ins_pos = NEXT_INSN (ins_pos);
ptr = new_seginfo (no_mode, no_mode, ins_pos, live_now);
add_seginfo (&tail_ptr, ptr);
-   for (i = 0; i < no_mode; i++)
- clear_mode_bit (transp[bb->index], j, i);
+   bitmap_clear_bit (transp_all, bb->index);
  }
  }
 
@@ -611,8 +614,7 @@ optimize_mode_switching (void)
{
  ptr = new_seginfo (last_mode, mode, insn, live_now);
  add_seginfo (&tail_ptr, ptr);
- for (i = 0; i < no_mode; i++)
-   clear_mode_bit (transp[bb->index], j, i);
+ bitmap_clear_bit (transp_all, bb->index);
  any_set_required = true;
  last_mode = mode;
}
@@ -643,8 +645,7 @@ optimize_mode_switching (void)
  ptr = new_seginfo (last_mode, no_mode, BB_END (bb), live_now);
  add_seginfo (&tail_ptr, ptr);
  if (last_mode != no_mode)
-   for (i = 0; i < no_mode; i++)
- clear_mode_bit (transp[bb->index], j, i);
+   bitmap_clear_bit (transp_all, bb->index);
}
}
   if (targetm.mode_switching.entry && targetm.mode_switching.exit)
@@ -667,8 +668,7 @@ optimize_mode_switching (void)
 an extra check in make_preds_opaque.  We also
 need this to avoid confusing pre_edge_lcm when
 antic is cleared but transp and comp are set.  */
- for (i = 0; i < no_mode; i++)
-   clear_mode_bit (transp[bb->index], j, i);
+ bitmap_clear_bit (transp_all, bb->index);
 
  /* Insert a fake computing definition of MODE into entry
 blocks which compute no mode. This represents the mode on
@@ -688,6 +688,9 @@ optimize_mode_switching (void)
 
  FOR_EACH_BB_FN (bb, cfun)
{
+ if (!bitmap_bit_p (transp_all, bb->index))
+   clear_mode_bit (transp[bb->index], j, m);
+
  if (info[bb->index].seginfo->mode == m)
set_mode_bit (antic[bb->index], j, m);
 
-- 
2.25.1



[PATCH 06/12] mode-switching: Tweak entry/exit handling

2023-11-05 Thread Richard Sandiford
An entity isn't transparent in a block that requires a specific mode.
optimize_mode_switching took that into account for normal insns,
but didn't for the exit block.  Later patches misbehaved because
of this.

In contrast, an entity was correctly marked as non-transparent
in the entry block, but the reasoning seemed a bit convoluted.
It also referred to a function that no longer exists.
Since KILL = ~TRANSP, the entity is by definition not transparent
in a block that defines the entity, so I think we can make it so
without comment.

Finally, the exit handling was nested in the entry handling,
but that doesn't seem necessary.  A target could say that an
entity is undefined on entry but must be defined on return,
on a "be liberal in what you accept, be conservative in what
you do" principle.

gcc/
* mode-switching.cc (optimize_mode_switching): Mark the exit
block as nontransparent if it requires a specific mode.
Handle the entry and exit mode as sibling rather than nested
concepts.  Remove outdated comment.
---
 gcc/mode-switching.cc | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 03dd4c1ebe4..1145350ca26 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -650,34 +650,30 @@ optimize_mode_switching (void)
}
   if (targetm.mode_switching.entry && targetm.mode_switching.exit)
{
- int mode = targetm.mode_switching.entry (e);
-
  info[post_entry->index].mode_out =
info[post_entry->index].mode_in = no_mode;
- if (pre_exit)
-   {
- info[pre_exit->index].mode_out =
-   info[pre_exit->index].mode_in = no_mode;
-   }
 
+ int mode = targetm.mode_switching.entry (e);
  if (mode != no_mode)
{
- bb = post_entry;
-
- /* By always making this nontransparent, we save
-an extra check in make_preds_opaque.  We also
-need this to avoid confusing pre_edge_lcm when
-antic is cleared but transp and comp are set.  */
- bitmap_clear_bit (transp_all, bb->index);
-
  /* Insert a fake computing definition of MODE into entry
 blocks which compute no mode. This represents the mode on
 entry.  */
- info[bb->index].computing = mode;
+ info[post_entry->index].computing = mode;
+ bitmap_clear_bit (transp_all, post_entry->index);
+   }
 
- if (pre_exit)
-   info[pre_exit->index].seginfo->mode =
- targetm.mode_switching.exit (e);
+ if (pre_exit)
+   {
+ info[pre_exit->index].mode_out =
+   info[pre_exit->index].mode_in = no_mode;
+
+ int mode = targetm.mode_switching.exit (e);
+ if (mode != no_mode)
+   {
+ info[pre_exit->index].seginfo->mode = mode;
+ bitmap_clear_bit (transp_all, pre_exit->index);
+   }
}
}
 
-- 
2.25.1



[PATCH 07/12] mode-switching: Allow targets to set the mode for EH handlers

2023-11-05 Thread Richard Sandiford
The mode-switching pass already had hooks to say what mode
an entity is in on entry to a function and what mode it must
be in on return.  For SME, we also want to say what mode an
entity is guaranteed to be in on entry to an exception handler.
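
As a sketch only (the entity and mode names below are hypothetical and
not taken from any port), a target might implement the new hook like
this:

  /* Return the mode that ENTITY is guaranteed to have on entry to an
     exception handler, or the number of modes if there is no
     guarantee.  */
  static int
  example_mode_eh_handler (int entity)
  {
    if (entity == EXAMPLE_FP_ROUND_ENTITY)  /* hypothetical entity id */
      return EXAMPLE_ROUND_NEAREST;         /* hypothetical known mode */
    return EXAMPLE_NUM_MODES;               /* no guarantee */
  }
  #undef TARGET_MODE_EH_HANDLER
  #define TARGET_MODE_EH_HANDLER example_mode_eh_handler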

gcc/
* target.def (mode_switching.eh_handler): New hook.
* doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (optimize_mode_switching): Use eh_handler
to get the mode on entry to an exception handler.
---
 gcc/doc/tm.texi   | 6 ++
 gcc/doc/tm.texi.in| 2 ++
 gcc/mode-switching.cc | 5 -
 gcc/target.def| 7 +++
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 759331a2c96..1a825c5004e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10455,6 +10455,12 @@ If @code{TARGET_MODE_EXIT} is defined then 
@code{TARGET_MODE_ENTRY}
 must be defined.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_MODE_EH_HANDLER (int @var{entity})
+If this hook is defined, it should return the mode that @var{entity} is
+guaranteed to be in on entry to an exception handler, or the number of modes
+if there is no such guarantee.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_MODE_PRIORITY (int @var{entity}, int 
@var{n})
 This hook specifies the order in which modes for @var{entity}
 are processed. 0 is the highest priority,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index a7b7aa289d8..5360c1bb2d8 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6979,6 +6979,8 @@ mode or ``no mode'', depending on context.
 
 @hook TARGET_MODE_EXIT
 
+@hook TARGET_MODE_EH_HANDLER
+
 @hook TARGET_MODE_PRIORITY
 
 @node Target Attributes
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 1145350ca26..b8a887d81f7 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -597,7 +597,10 @@ optimize_mode_switching (void)
gcc_assert (NOTE_INSN_BASIC_BLOCK_P (ins_pos));
if (ins_pos != BB_END (bb))
  ins_pos = NEXT_INSN (ins_pos);
-   ptr = new_seginfo (no_mode, no_mode, ins_pos, live_now);
+   if (bb_has_eh_pred (bb)
+   && targetm.mode_switching.eh_handler)
+ last_mode = targetm.mode_switching.eh_handler (e);
+   ptr = new_seginfo (no_mode, last_mode, ins_pos, live_now);
add_seginfo (&tail_ptr, ptr);
bitmap_clear_bit (transp_all, bb->index);
  }
diff --git a/gcc/target.def b/gcc/target.def
index 3dae33522f1..a70275b8abd 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7070,6 +7070,13 @@ If @code{TARGET_MODE_EXIT} is defined then 
@code{TARGET_MODE_ENTRY}\n\
 must be defined.",
  int, (int entity), NULL)
 
+DEFHOOK
+(eh_handler,
+ "If this hook is defined, it should return the mode that @var{entity} is\n\
+guaranteed to be in on entry to an exception handler, or the number of modes\n\
+if there is no such guarantee.",
+ int, (int entity), NULL)
+
 DEFHOOK
 (priority,
  "This hook specifies the order in which modes for @var{entity}\n\
-- 
2.25.1



[PATCH 08/12] mode-switching: Pass set of live registers to the needed hook

2023-11-05 Thread Richard Sandiford
The emit hook already takes the set of live hard registers as input.
This patch passes it to the needed hook too.  SME uses this to
optimise the mode choice based on whether state is live or dead.

The main caller already had access to the required info, but the
special handling of return values did not.
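
As a hedged illustration of what the new parameter enables (all names
below are invented for the sketch, not taken from SME or any existing
port), a needed hook can now relax its answer when the backing state is
dead:

  static int
  example_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET regs_live)
  {
    /* Hypothetical helper that scans INSN for a mode requirement.  */
    int mode = example_insn_mode_requirement (entity, insn);
    if (mode != EXAMPLE_NUM_MODES
        && !TEST_HARD_REG_BIT (regs_live, EXAMPLE_STATE_REGNUM))
      /* The register holding the entity's state is dead here, so no
         particular mode is really needed.  */
      mode = EXAMPLE_NUM_MODES;
    return mode;
  }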

gcc/
* target.def (mode_switching.needed): Add a regs_live parameter.
* doc/tm.texi: Regenerate.
* config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update
accordingly.
* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
* config/epiphany/mode-switch-use.cc (insert_uses): Likewise.
* config/i386/i386.cc (ix86_mode_needed): Likewise.
* config/riscv/riscv.cc (riscv_mode_needed): Likewise.
* config/sh/sh.cc (sh_mode_needed): Likewise.
* mode-switching.cc (optimize_mode_switching): Likewise.
(create_pre_exit): Likewise, using the DF simulate functions
to calculate the required information.
---
 gcc/config/epiphany/epiphany-protos.h  |  4 +++-
 gcc/config/epiphany/epiphany.cc|  2 +-
 gcc/config/epiphany/mode-switch-use.cc |  2 +-
 gcc/config/i386/i386.cc|  2 +-
 gcc/config/riscv/riscv.cc  |  2 +-
 gcc/config/sh/sh.cc|  4 ++--
 gcc/doc/tm.texi|  5 +++--
 gcc/mode-switching.cc  | 14 --
 gcc/target.def |  5 +++--
 9 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/config/epiphany/epiphany-protos.h 
b/gcc/config/epiphany/epiphany-protos.h
index 72c141c1a6d..ef49a1e06a4 100644
--- a/gcc/config/epiphany/epiphany-protos.h
+++ b/gcc/config/epiphany/epiphany-protos.h
@@ -44,7 +44,9 @@ extern void emit_set_fp_mode (int entity, int mode, int 
prev_mode,
 #endif
 extern void epiphany_insert_mode_switch_use (rtx_insn *insn, int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
-extern int epiphany_mode_needed (int entity, rtx_insn *insn);
+#ifdef HARD_CONST
+extern int epiphany_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET);
+#endif
 extern int epiphany_mode_after (int entity, int last_mode, rtx_insn *insn);
 extern bool epiphany_epilogue_uses (int regno);
 extern bool epiphany_optimize_mode_switching (int entity);
diff --git a/gcc/config/epiphany/epiphany.cc b/gcc/config/epiphany/epiphany.cc
index a5460dbf97f..60a9b49d8a4 100644
--- a/gcc/config/epiphany/epiphany.cc
+++ b/gcc/config/epiphany/epiphany.cc
@@ -2400,7 +2400,7 @@ epiphany_mode_priority (int entity, int priority)
 }
 
 int
-epiphany_mode_needed (int entity, rtx_insn *insn)
+epiphany_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET)
 {
   enum attr_fp_mode mode;
 
diff --git a/gcc/config/epiphany/mode-switch-use.cc 
b/gcc/config/epiphany/mode-switch-use.cc
index 71530612658..183b9b7a394 100644
--- a/gcc/config/epiphany/mode-switch-use.cc
+++ b/gcc/config/epiphany/mode-switch-use.cc
@@ -58,7 +58,7 @@ insert_uses (void)
{
  if (!INSN_P (insn))
continue;
- mode = epiphany_mode_needed (e, insn);
+ mode = epiphany_mode_needed (e, insn, {});
  if (mode == no_mode)
continue;
  if (target_insert_mode_switch_use)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fdc9362cf5b..7a5a9a966e8 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15061,7 +15061,7 @@ ix86_i387_mode_needed (int entity, rtx_insn *insn)
prior to the execution of insn.  */
 
 static int
-ix86_mode_needed (int entity, rtx_insn *insn)
+ix86_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET)
 {
   switch (entity)
 {
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 08ff05dcc3f..f915de7ed56 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9413,7 +9413,7 @@ riscv_frm_mode_needed (rtx_insn *cur_insn, int code)
prior to the execution of insn.  */
 
 static int
-riscv_mode_needed (int entity, rtx_insn *insn)
+riscv_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET)
 {
   int code = recog_memoized (insn);
 
diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 294faf7c0c3..c363490e852 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -195,7 +195,7 @@ static int calc_live_regs (HARD_REG_SET *);
 static HOST_WIDE_INT rounded_frame_size (int);
 static bool sh_frame_pointer_required (void);
 static void sh_emit_mode_set (int, int, int, HARD_REG_SET);
-static int sh_mode_needed (int, rtx_insn *);
+static int sh_mode_needed (int, rtx_insn *, HARD_REG_SET);
 static int sh_mode_after (int, int, rtx_insn *);
 static int sh_mode_entry (int);
 static int sh_mode_exit (int);
@@ -12531,7 +12531,7 @@ sh_emit_mode_set (int entity ATTRIBUTE_UNUSED, int mode,
 }
 
 static int
-sh_mode_needed (int entity ATTRIBUTE_UNUSED, rtx_insn *insn)
+sh_mode_needed (int entity ATTRIBUTE_UNUSED, rtx_insn *insn, HARD_REG_SET)
 {
   return recog_memoized (insn

[PATCH 09/12] mode-switching: Pass the set of live registers to the after hook

2023-11-05 Thread Richard Sandiford
This patch passes the set of live hard registers to the after hook,
like the previous one did for the needed hook.

gcc/
* target.def (mode_switching.after): Add a regs_live parameter.
* doc/tm.texi: Regenerate.
* config/epiphany/epiphany-protos.h (epiphany_mode_after): Update
accordingly.
* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
(epiphany_mode_after): Likewise.
* config/i386/i386.cc (ix86_mode_after): Likewise.
* config/riscv/riscv.cc (riscv_mode_after): Likewise.
* config/sh/sh.cc (sh_mode_after): Likewise.
* mode-switching.cc (optimize_mode_switching): Likewise.
---
 gcc/config/epiphany/epiphany-protos.h | 3 ++-
 gcc/config/epiphany/epiphany.cc   | 5 +++--
 gcc/config/i386/i386.cc   | 2 +-
 gcc/config/riscv/riscv.cc | 2 +-
 gcc/config/sh/sh.cc   | 5 +++--
 gcc/doc/tm.texi   | 4 +++-
 gcc/mode-switching.cc | 8 
 gcc/target.def| 4 +++-
 8 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/gcc/config/epiphany/epiphany-protos.h 
b/gcc/config/epiphany/epiphany-protos.h
index ef49a1e06a4..ff8987ea99e 100644
--- a/gcc/config/epiphany/epiphany-protos.h
+++ b/gcc/config/epiphany/epiphany-protos.h
@@ -46,8 +46,9 @@ extern void epiphany_insert_mode_switch_use (rtx_insn *insn, 
int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
 #ifdef HARD_CONST
 extern int epiphany_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET);
+extern int epiphany_mode_after (int entity, int last_mode, rtx_insn *insn,
+   HARD_REG_SET);
 #endif
-extern int epiphany_mode_after (int entity, int last_mode, rtx_insn *insn);
 extern bool epiphany_epilogue_uses (int regno);
 extern bool epiphany_optimize_mode_switching (int entity);
 extern bool epiphany_is_interrupt_p (tree);
diff --git a/gcc/config/epiphany/epiphany.cc b/gcc/config/epiphany/epiphany.cc
index 60a9b49d8a4..68e748c688e 100644
--- a/gcc/config/epiphany/epiphany.cc
+++ b/gcc/config/epiphany/epiphany.cc
@@ -2437,7 +2437,7 @@ epiphany_mode_needed (int entity, rtx_insn *insn, 
HARD_REG_SET)
 return 2;
   case EPIPHANY_MSW_ENTITY_ROUND_KNOWN:
 if (recog_memoized (insn) == CODE_FOR_set_fp_mode)
-  mode = (enum attr_fp_mode) epiphany_mode_after (entity, mode, insn);
+  mode = (enum attr_fp_mode) epiphany_mode_after (entity, mode, insn, {});
 /* Fall through.  */
   case EPIPHANY_MSW_ENTITY_NEAREST:
   case EPIPHANY_MSW_ENTITY_TRUNC:
@@ -2498,7 +2498,8 @@ epiphany_mode_entry_exit (int entity, bool exit)
 }
 
 int
-epiphany_mode_after (int entity, int last_mode, rtx_insn *insn)
+epiphany_mode_after (int entity, int last_mode, rtx_insn *insn,
+HARD_REG_SET)
 {
   /* We have too few call-saved registers to hope to keep the masks across
  calls.  */
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7a5a9a966e8..7b72aabf0da 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15110,7 +15110,7 @@ ix86_avx_u128_mode_after (int mode, rtx_insn *insn)
 /* Return the mode that an insn results in.  */
 
 static int
-ix86_mode_after (int entity, int mode, rtx_insn *insn)
+ix86_mode_after (int entity, int mode, rtx_insn *insn, HARD_REG_SET)
 {
   switch (entity)
 {
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f915de7ed56..e36b5fb9bd0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9514,7 +9514,7 @@ riscv_frm_mode_after (rtx_insn *insn, int mode)
 /* Return the mode that an insn results in.  */
 
 static int
-riscv_mode_after (int entity, int mode, rtx_insn *insn)
+riscv_mode_after (int entity, int mode, rtx_insn *insn, HARD_REG_SET)
 {
   switch (entity)
 {
diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index c363490e852..6ec2eecf754 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -196,7 +196,7 @@ static HOST_WIDE_INT rounded_frame_size (int);
 static bool sh_frame_pointer_required (void);
 static void sh_emit_mode_set (int, int, int, HARD_REG_SET);
 static int sh_mode_needed (int, rtx_insn *, HARD_REG_SET);
-static int sh_mode_after (int, int, rtx_insn *);
+static int sh_mode_after (int, int, rtx_insn *, HARD_REG_SET);
 static int sh_mode_entry (int);
 static int sh_mode_exit (int);
 static int sh_mode_priority (int entity, int n);
@@ -12537,7 +12537,8 @@ sh_mode_needed (int entity ATTRIBUTE_UNUSED, rtx_insn 
*insn, HARD_REG_SET)
 }
 
 static int
-sh_mode_after (int entity ATTRIBUTE_UNUSED, int mode, rtx_insn *insn)
+sh_mode_after (int entity ATTRIBUTE_UNUSED, int mode, rtx_insn *insn,
+  HARD_REG_SET)
 {
   if (TARGET_HITACHI && recog_memoized (insn) >= 0 &&
   get_attr_fp_set (insn) != FP_SET_NONE)
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 144b3f88c37..b730b5bf658 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10423,12 +10423,14 @@

[PATCH 10/12] mode-switching: Use 1-based edge aux fields

2023-11-05 Thread Richard Sandiford
The pass used the edge aux field to record which mode change
should happen on the edge, with -1 meaning "none".  It's more
convenient for later patches to leave aux zero for "none",
and use numbers based at 1 to record a change.

gcc/
* mode-switching.cc (commit_mode_sets): Use 1-based edge aux values.
---
 gcc/mode-switching.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 7a5c4993d65..1815b397dd0 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -106,10 +106,10 @@ commit_mode_sets (struct edge_list *edge_list, int e, 
struct bb_info *info)
   for (int ed = NUM_EDGES (edge_list) - 1; ed >= 0; ed--)
 {
   edge eg = INDEX_EDGE (edge_list, ed);
-  int mode;
 
-  if ((mode = (int)(intptr_t)(eg->aux)) != -1)
+  if (eg->aux)
{
+ int mode = (int) (intptr_t) eg->aux - 1;
  HARD_REG_SET live_at_edge;
  basic_block src_bb = eg->src;
  int cur_mode = info[src_bb->index].mode_out;
@@ -728,14 +728,12 @@ optimize_mode_switching (void)
{
  edge eg = INDEX_EDGE (edge_list, ed);
 
- eg->aux = (void *)(intptr_t)-1;
-
  for (i = 0; i < no_mode; i++)
{
  int m = targetm.mode_switching.priority (entity_map[j], i);
  if (mode_bit_p (insert[ed], j, m))
{
- eg->aux = (void *)(intptr_t)m;
+ eg->aux = (void *) (intptr_t) (m + 1);
  break;
}
}
-- 
2.25.1



[PATCH 11/12] mode-switching: Add a target-configurable confluence operator

2023-11-05 Thread Richard Sandiford
The mode-switching pass assumed that all of an entity's modes
were mutually exclusive.  However, the upcoming SME changes
have an entity with some overlapping modes, so that there is
sometimes a "superunion" mode that contains two given modes.
We can use this relationship to pass something more helpful than
"don't know" to the emit hook.

This patch adds a new hook that targets can use to specify
a mode confluence operator.
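
For illustration only, a target whose entity has modes 0 and 1 plus a
"superunion" mode 2 that includes both might define the hook as below
(the entity id and mode numbering are assumptions made for the sketch):

  static int
  example_mode_confluence (int entity, int mode1, int mode2)
  {
    if (entity == EXAMPLE_OVERLAPPING_ENTITY
        && mode1 <= 2 && mode2 <= 2)
      /* Mode 2 includes modes 0 and 1 (and itself) as possibilities.  */
      return 2;
    return EXAMPLE_NUM_MODES;  /* no suitable mode exists */
  }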

With mutually exclusive modes, it's possible to compute a block's
incoming and outgoing modes by looking at its availability sets.
With the confluence operator, we instead need to solve a full
dataflow problem.

However, when emitting a mode transition, the upcoming SME use of
mode-switching benefits from having as much information as possible
about the starting mode.  Calculating this information is definitely
worth the compile time.

The dataflow problem is written to work before and after the LCM
problem has been solved.  A later patch makes use of this.

While there (since git blame would ping me for the reindented code),
I used a lambda to avoid the cut-&-pasted loops.

gcc/
* target.def (mode_switching.confluence): New hook.
* doc/tm.texi (TARGET_MODE_CONFLUENCE): New @hook.
* doc/tm.texi.in: Regenerate.
* mode-switching.cc (confluence_info): New variable.
(mode_confluence, forward_confluence_n, forward_transfer): New
functions.
(optimize_mode_switching): Use them to calculate mode_in when
TARGET_MODE_CONFLUENCE is defined.
---
 gcc/doc/tm.texi   |  16 
 gcc/doc/tm.texi.in|   2 +
 gcc/mode-switching.cc | 179 +++---
 gcc/target.def|  17 
 4 files changed, 186 insertions(+), 28 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b730b5bf658..cd346538fe2 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10440,6 +10440,22 @@ the number of modes if it does not know what mode 
@var{entity} has after
 Not defining the hook is equivalent to returning @var{mode}.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_MODE_CONFLUENCE (int @var{entity}, int 
@var{mode1}, int @var{mode2})
+By default, the mode-switching pass assumes that a given entity's modes
+are mutually exclusive.  This means that the pass can only tell
+@code{TARGET_MODE_EMIT} about an entity's previous mode if all
+incoming paths of execution leave the entity in the same state.
+
+However, some entities might have overlapping, non-exclusive modes,
+so that it is sometimes possible to represent ``mode @var{mode1} or mode
+@var{mode2}'' with something more specific than ``mode not known''.
+If this is true for at least one entity, you should define this hook
+and make it return a mode that includes @var{mode1} and @var{mode2}
+as possibilities.  (The mode can include other possibilities too.)
+The hook should return the number of modes if no suitable mode exists
+for the given arguments.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_MODE_ENTRY (int @var{entity})
 If this hook is defined, it is evaluated for every @var{entity} that
 needs mode switching.  It should return the mode that @var{entity} is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 5360c1bb2d8..ae23241ea1c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6975,6 +6975,8 @@ mode or ``no mode'', depending on context.
 
 @hook TARGET_MODE_AFTER
 
+@hook TARGET_MODE_CONFLUENCE
+
 @hook TARGET_MODE_ENTRY
 
 @hook TARGET_MODE_EXIT
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 1815b397dd0..87b23d2c050 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -485,6 +485,101 @@ create_pre_exit (int n_entities, int *entity_map, const 
int *num_modes)
   return pre_exit;
 }
 
+/* Return the confluence of modes MODE1 and MODE2 for entity ENTITY,
+   using NO_MODE to represent an unknown mode if nothing more precise
+   is available.  */
+
+int
+mode_confluence (int entity, int mode1, int mode2, int no_mode)
+{
+  if (mode1 == mode2)
+return mode1;
+
+  if (mode1 != no_mode
+  && mode2 != no_mode
+  && targetm.mode_switching.confluence)
+return targetm.mode_switching.confluence (entity, mode1, mode2);
+
+  return no_mode;
+}
+
+/* Information for the dataflow problems below.  */
+struct
+{
+  /* Information about each basic block, indexed by block id.  */
+  struct bb_info *bb_info;
+
+  /* The entity that we're processing.  */
+  int entity;
+
+  /* The number of modes defined for the entity, and thus the identifier
+ of the "don't know" mode.  */
+  int no_mode;
+} confluence_info;
+
+/* Propagate information about any mode change on edge E to the
+   destination block's mode_in.  Return true if something changed.
+
+   The mode_in and mode_out fields use no_mode + 1 to mean "not yet set".  */
+
+static bool
+forward_confluence_n (edge e)
+{
+  /* The entry and exit blocks have no useful mode information.  */
+  if (e->src->index == ENTRY_BLOCK ||

[PATCH 12/12] mode-switching: Add a backprop hook

2023-11-05 Thread Richard Sandiford
This patch adds a way for targets to ask that selected mode changes
be brought forward, through a combination of:

(1) requiring a mode in blocks where the entity was previously
transparent

(2) pushing the transition at the head of a block onto incoming edges


SME has two uses for this:

- A "one-shot" entity that, for any given path of execution,
  either stays off or makes exactly one transition from off to on.
  This relies only on (1) above; see the hook description for more info.

  The main purpose of using mode-switching for this entity is to
  shrink-wrap the code that requires it.

- A second entity for which all transitions must be from known
  modes, which is enforced using a combination of (1) and (2).
  More specifically, (1) looks for edges B1->B2 for which:

  - B2 requires a specific mode and
  - B1 does not guarantee a specific starting mode

  In this system, such an edge is only possible if the entity is
  transparent in B1.  (1) then forces B1 to require some safe common
  mode.  Applying this inductively means that all incoming edges are
  from known modes.  If different edges give different starting modes,
  (2) pushes the transitions onto the edges themselves; this only
  happens if the entity is not transparent in some predecessor block.

The patch also uses the back-propagation as an excuse to do a simple
on-the-fly optimisation.

Hopefully the comments in the patch explain things a bit better.
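
A minimal sketch of the one-shot case, using the 0 = off, 1 = on,
2 = don't-know numbering from the hook documentation in the patch (the
entity id is hypothetical):

  static int
  example_mode_backprop (int entity, int mode1, int mode2)
  {
    /* Force the predecessor block to switch the entity on when a
       successor needs it on but the mode on exit is unknown.  */
    if (entity == EXAMPLE_ONE_SHOT_ENTITY && mode1 == 2 && mode2 == 1)
      return 1;
    return 3;  /* the number of modes: no requirement */
  }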

gcc/
* target.def (mode_switching.backprop): New hook.
* doc/tm.texi.in (TARGET_MODE_BACKPROP): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (struct bb_info): Add single_succ.
(confluence_info): Add transp field.
(single_succ_confluence_n, single_succ_transfer): New functions.
(backprop_confluence_n, backprop_transfer): Likewise.
(optimize_mode_switching): Use them.  Push mode transitions onto
a block's incoming edges, if the backprop hook requires it.
---
 gcc/doc/tm.texi   |  28 +
 gcc/doc/tm.texi.in|   2 +
 gcc/mode-switching.cc | 272 ++
 gcc/target.def|  29 +
 4 files changed, 331 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd346538fe2..d83ca73b1af 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10456,6 +10456,34 @@ The hook should return the number of modes if no 
suitable mode exists
 for the given arguments.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_MODE_BACKPROP (int @var{entity}, int 
@var{mode1}, int @var{mode2})
+If defined, the mode-switching pass uses this hook to back-propagate mode
+requirements through blocks that have no mode requirements of their own.
+Specifically, @var{mode1} is the mode that @var{entity} has on exit
+from a block B1 (say) and @var{mode2} is the mode that the next block
+requires @var{entity} to have.  B1 does not have any mode requirements
+of its own.
+
+The hook should return the mode that it prefers or requires @var{entity}
+to have in B1, or the number of modes if there is no such requirement.
+If the hook returns a required mode for more than one of B1's outgoing
+edges, those modes are combined as for @code{TARGET_MODE_CONFLUENCE}.
+
+For example, suppose there is a ``one-shot'' entity that,
+for a given execution of a function, either stays off or makes exactly
+one transition from off to on.  It is safe to make the transition at any
+time, but it is better not to do so unnecessarily.  This hook allows the
+function to manage such an entity without having to track its state at
+runtime.  Specifically, the entity would have two modes, 0 for off and
+1 for on, with 2 representing ``don't know''.  The system is forbidden from
+transitioning from 2 to 1, since 2 represents the possibility that the
+entity is already on (and the aim is to avoid having to emit code to
+check for that case).  This hook would therefore return 1 when @var{mode1}
+is 2 and @var{mode2} is 1, which would force the entity to be on in the
+source block.  Applying this inductively would remove all transitions
+in which the previous state is unknown.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_MODE_ENTRY (int @var{entity})
 If this hook is defined, it is evaluated for every @var{entity} that
 needs mode switching.  It should return the mode that @var{entity} is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index ae23241ea1c..3d3ae12cc2f 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6977,6 +6977,8 @@ mode or ``no mode'', depending on context.
 
 @hook TARGET_MODE_CONFLUENCE
 
+@hook TARGET_MODE_BACKPROP
+
 @hook TARGET_MODE_ENTRY
 
 @hook TARGET_MODE_EXIT
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 87b23d2c050..720c30df72d 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -81,6 +81,7 @@ struct bb_info
   int computing;
   int mode_out;
   int mode_in;
+  int single_succ;
 };
 
 /* Clear ode I from entity J in bitmap

Re: [PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Andrew Pinski
On Sun, Nov 5, 2023 at 9:13 AM Cassio Neri  wrote:
>
> I could not find any entry in gcc's bugzilla for that. Perhaps my search 
> wasn't good enough.

I filed https://gcc.gnu.org/PR112395 with a first attempt at the patch
(will double check it soon).

Thanks,
Andrew

>
>
> On Sun, 5 Nov 2023 at 15:58, Marc Glisse  wrote:
>>
>> On Sun, 5 Nov 2023, Cassio Neri wrote:
>>
>> > When year_month_day_last::day() was implemented, Dr. Matthias Kretz 
>> > realised
>> > that the operation "& 1" wasn't necessary but we did not patch it at that
>> > time. This patch removes the unnecessary operation.
>>
>> Is there an entry in gcc's bugzilla about having the optimizer handle this
>> kind of optimization?
>>
>> unsigned f(unsigned x){
>>if(x>=32)__builtin_unreachable();
>>return 30|(x&1); // --> 30|x
>> }
>>
>> (that optimization would come in addition to your patch, doing the
>> optimization by hand is still a good idea)
>>
>> It looks like the criterion would be a|(b&c) when the possible 1 bits of b
>> are included in the certainly 1 bits of a|c.
>>
>> --
>> Marc Glisse


[committed] i386: Add LEGACY_INDEX_REG register class.

2023-11-05 Thread Uros Bizjak
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS.

gcc/ChangeLog:

* config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS.
Rename LEGACY_REGS to LEGACY_GENERAL_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* config/i386/constraints.md ("R"): Update for rename.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index dc91bd94b27..f6275740eb2 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -23,7 +23,7 @@
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
-(define_register_constraint "R" "LEGACY_REGS"
+(define_register_constraint "R" "LEGACY_GENERAL_REGS"
  "Legacy register---the eight integer registers available on all
   i386 processors (@code{a}, @code{b}, @code{c}, @code{d},
   @code{si}, @code{di}, @code{bp}, @code{sp}).")
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index a56367a947b..3367197a633 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1307,12 +1307,16 @@ enum reg_class
   Q_REGS,  /* %eax %ebx %ecx %edx */
   NON_Q_REGS,  /* %esi %edi %ebp %esp */
   TLS_GOTBASE_REGS,/* %ebx %ecx %edx %esi %edi %ebp */
-  INDEX_REGS,  /* %eax %ebx %ecx %edx %esi %edi %ebp */
-  LEGACY_REGS, /* %eax %ebx %ecx %edx %esi %edi %ebp %esp */
+  LEGACY_GENERAL_REGS, /* %eax %ebx %ecx %edx %esi %edi %ebp %esp */
+  LEGACY_INDEX_REGS,   /* %eax %ebx %ecx %edx %esi %edi %ebp */
   GENERAL_REGS,/* %eax %ebx %ecx %edx %esi %edi %ebp 
%esp
   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
+  INDEX_REGS,  /* %eax %ebx %ecx %edx %esi %edi %ebp
+  %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
+  %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
+  %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
   GENERAL_GPR16,   /* %eax %ebx %ecx %edx %esi %edi %ebp %esp
   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   INDEX_GPR16, /* %eax %ebx %ecx %edx %esi %edi %ebp
@@ -1376,9 +1380,10 @@ enum reg_class
"CLOBBERED_REGS",   \
"Q_REGS", "NON_Q_REGS", \
"TLS_GOTBASE_REGS", \
-   "INDEX_REGS",   \
-   "LEGACY_REGS",  \
+   "LEGACY_GENERAL_REGS",  \
+   "LEGACY_INDEX_REGS",\
"GENERAL_REGS", \
+   "INDEX_REGS",   \
"GENERAL_GPR16",\
"INDEX_GPR16",  \
"FP_TOP_REG", "FP_SECOND_REG",  \
@@ -1416,11 +1421,12 @@ enum reg_class
   { 0x0f,0x0,   0x0 }, /* Q_REGS */\
{ 0x900f0,0x0,   0x0 }, /* NON_Q_REGS */\
   { 0x7e,  0xff0,   0x0 }, /* TLS_GOTBASE_REGS */  \
-  { 0x7f,  0xff0,   0x000 },   /* INDEX_REGS */
\
-   { 0x900ff,0x0,   0x0 }, /* LEGACY_REGS */   \
-   { 0x900ff,  0xff0,   0x000 },   /* GENERAL_REGS */  
\
+   { 0x900ff,0x0,   0x0 }, /* LEGACY_GENERAL_REGS */   \
+  { 0x7f,0x0,   0x0 }, /* LEGACY_INDEX_REGS */ \
+   { 0x900ff,  0xff0,   0x000 }, /* GENERAL_REGS */\
+  { 0x7f,  0xff0,   0x000 }, /* INDEX_REGS */  \
{ 0x900ff,  0xff0,   0x0 }, /* GENERAL_GPR16 */ \
-   { 0x0007f,  0xff0,   0x0 }, /* INDEX_GPR16 */   \
+  { 0x7f,  0xff0,   0x0 }, /* INDEX_GPR16 */   \
  { 0x100,0x0,   0x0 }, /* FP_TOP_REG */\
  { 0x200,0x0,   0x0 }, /* FP_SECOND_REG */ \
 { 0xff00,0x0,   0x0 }, /* FLOAT_REGS */\


Re: [1/3] Add support for target_version attribute

2023-11-05 Thread Richard Sandiford
Andrew Carlotti  writes:
> On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>> > This patch adds support for the "target_version" attribute to the middle
>> > end and the C++ frontend, which will be used to implement function
>> > multiversioning in the aarch64 backend.
>> >
>> > Note that C++ is currently the only frontend which supports
>> > multiversioning using the "target" attribute, whereas the
>> > "target_clones" attribute is additionally supported in C, D and Ada.
>> > Support for the target_version attribute will be extended to C at a
>> > later date.
>> >
>> > Targets that currently use the "target" attribute for function
>> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
>> >
>> >
>> > I could have implemented the target hooks slightly differently, by reusing 
>> > the
>> > valid_attribute_p hook and adding attribute name checks to each backend
>> > implementation (c.f. the aarch64 implementation in patch 2/3).  Would this 
>> > be
>> > preferable?
>> 
>> Having as much as possible in target-independent code seems better
>> to me FWIW.  On that basis:
>> 
>> >
>> > Otherwise, is this ok for master?
>> >
>> >
>> > gcc/c-family/ChangeLog:
>> >
>> >* c-attribs.cc (handle_target_version_attribute): New.
>> >(c_common_attribute_table): Add target_version.
>> >(handle_target_clones_attribute): Add conflict with
>> >target_version attribute.
>> >
>> > gcc/ChangeLog:
>> >
>> >* attribs.cc (is_function_default_version): Update comment to
>> >specify incompatibility with target_version attributes.
>> >* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
>> >Call valid_version_attribute_p for target_version attributes.
>> >* target.def (valid_version_attribute_p): New hook.
>> >(expanded_clones_attribute): New hook.
>> >* doc/tm.texi.in: Add new hooks.
>> >* doc/tm.texi: Regenerate.
>> >* multiple_target.cc (create_dispatcher_calls): Remove redundant
>> >is_function_default_version check.
>> >(expand_target_clones): Use target hook for attribute name.
>> >* targhooks.cc (default_target_option_valid_version_attribute_p):
>> >New.
>> >* targhooks.h (default_target_option_valid_version_attribute_p):
>> >New.
>> >* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
>> >target_version attributes.
>> >
>> > gcc/cp/ChangeLog:
>> >
>> >* decl2.cc (check_classfn): Update comment to include
>> >target_version attributes.
>> >
>> >
>> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
>> > index 
>> > b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6
>> >  100644
>> > --- a/gcc/attribs.cc
>> > +++ b/gcc/attribs.cc
>> > @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
>> >return func_decl;  
>> >  }
>> >  
>> > -/* Returns true if decl is multi-versioned and DECL is the default 
>> > function,
>> > -   that is it is not tagged with target specific optimization.  */
>> > +/* Returns true if DECL is multi-versioned using the target attribute, 
>> > and this
>> > +   is the default version.  This function can only be used for targets 
>> > that do
>> > +   not support the "target_version" attribute.  */
>> >  
>> >  bool
>> >  is_function_default_version (const tree decl)
>> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>> > index 
>> > 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a
>> >  100644
>> > --- a/gcc/c-family/c-attribs.cc
>> > +++ b/gcc/c-family/c-attribs.cc
>> > @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, 
>> > tree, tree, int, bool *);
>> >  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, 
>> > bool *);
>> >  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
>> >  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
>> > +static tree handle_target_version_attribute (tree *, tree, tree, int, 
>> > bool *);
>> >  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool 
>> > *);
>> >  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
>> >  static tree ignore_attribute (tree *, tree, tree, int, bool *);
>> > @@ -480,6 +481,8 @@ const struct attribute_spec c_common_attribute_table[] 
>> > =
>> >  handle_error_attribute, NULL },
>> >{ "target", 1, -1, true, false, false, false,
>> >  handle_target_attribute, NULL },
>> > +  { "target_version", 1, -1, true, false, false, false,
>> > +handle_target_version_attribute, NULL },
>> >{ "target_clones",  1, -1, true, false, false, false,
>> >  handle_target_clones_attribute, NULL },
>> >{ "optimize",   1, -1, true, false, false, false,
>> > @@ -5569,6 +5572,45 @@ handle_target_attribute (tree *node, tree name, 
>> 

Re: [PATCH] testsuite: Force use of -c when precompiling headers

2023-11-05 Thread Mike Stump
On Oct 27, 2023, at 8:11 AM, Christophe Lyon  wrote:
> 
> In some configurations of our validation setup, we always call the
> compiler with -Wl,-rpath=XXX, which instructs the driver to invoke the
> linker if none of -c, -S or -E is used.
> 
> This happens to be the case in the PCH tests, where dg-flags-pch sets
> dg-do-what-default to precompile.
> 
> This works most of the time, in absence of any linker option, the
> compiler defaults to generating a precompiled header (otherwise the
> linker complains because it cannot find 'main').
> 
> This small patch forces the use of '-c' when generating the .gch file,
> which is sufficient not to invoke the linker.
> 
> Arguably, this could be seen as a dejagnu bug: in gcc-dg-test-1 (in
> gcc-dg.exp), we set compile_type to "precompiled_header", which is not
> one of the supported values in dejagnu's default_target_compile (in
> target.exp).
> 
> 2023-10-27  Christophe Lyon  
> 
>   gcc/testsuite/
>   * lib/dg-pch.exp (dg-flags-pch): Add -c when generating the
>   precompiled header.

Ok.



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-05 Thread Richard Sandiford
Robin Dapp  writes:
>> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
>> operands, even if that makes things inconsistent with vcond_mask.
>> vcond_mask isn't really a good example to follow, since the operand
>> order is not only inconsistent with the IFN, it's also inconsistent
>> with the natural if_then_else order.
>
> v4 attached with that changed,  match.pd patterns interleaved as well
> as scratch-handling added and VLS modes removed.  Lehua has since pushed
> another patch that extends gimple_match_op to 6/7 operands already so
> that could be removed as well making the patch even smaller now.
>
> Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> still running on aarch64 and x86.  OK if those pass?
>
> Regards
>  Robin
>
> Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (CONSTM1_RTX) into just an OP in the presence of length masking this
> patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
>
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternary operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
>
> gcc/ChangeLog:
>
>   PR tree-optimization/111760
>
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type): Add.
>   * config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Always
>   initialize len and bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (OPTAB_D): Add vcond_mask_len optab.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
>   riscv_v.
> ---
>  gcc/config/riscv/autovec.md   | 26 ++
>  gcc/config/riscv/riscv-protos.h   |  3 ++
>  gcc/config/riscv/riscv-v.cc   |  3 +-
>  gcc/doc/md.texi   |  9 
>  gcc/gimple-match-exports.cc   | 13 +++--
>  gcc/gimple-match.h|  6 ++-
>  gcc/internal-fn.cc|  5 ++
>  gcc/internal-fn.def   |  2 +
>  gcc/match.pd  | 51 +++
>  gcc/optabs.def|  1 +
>  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
>  11 files changed, 114 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cc4c9596bbf..0a5e4ccb54e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_"
>[(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_"
> +  [(match_operand:V 0 "register_operand")
> +(match_operand: 1 "nonmemory_operand")
> +(match_operand:V 2 "nonmemory_operand")
> +(match_operand:V 3 "autovec_else_operand")
> +(match_operand 4 "autovec_length_operand")
> +(match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +if (satisfies_constraint_Wc1 (operands[1]))
> +  riscv_vector::expand_cond_len_unop (code_for_pred_mov (mode),
> +   operands);
> +else
> +  {
> + /* The order of then and else is opposite to pred_merge.  */
> + rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> +  operands[1]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> +   riscv_vector::MERGE_OP_TU,
> +   ops, operands[4]);
> +  }
> +DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -
>  ;;  [BOOL] Select based on masks
>  ;; -
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a1be731c28e..0d0ee5effea 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -359,6 +359,9 @@ enum insn_type : unsigned int
>/* For vmerge, no mask operand, no mask policy operand.  */
>MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with TU policy.  */
> +  MERGE_OP

Re: [PATCH] Testsuite, i386: Mark test as requiring dfp

2023-11-05 Thread FX Coudert
kind ping for this easy patch


> Le 30 oct. 2023 à 15:19, FX Coudert  a écrit :
> 
> Hi,
> 
> The test is currently failing on x86_64-apple-darwin with "decimal 
> floating-point not supported for this target”.
> Marking the test as requiring dfp fixes the issue.
> 
> OK to push?
> 
> FX
> 


0001-Testsuite-i386-Mark-test-as-requiring-dfp.patch


Re: [PATCH] Testsuite, i386: Mark test as requiring dfp

2023-11-05 Thread Iain Sandoe
Hi FX

> On 5 Nov 2023, at 10:33, FX Coudert  wrote:
> 
> kind ping for this easy patch

IMO adding feature tests for features required by a test falls into the 
“obvious”
category,

Iain

> 
> 
>> Le 30 oct. 2023 à 15:19, FX Coudert  a écrit :
>> 
>> Hi,
>> 
>> The test is currently failing on x86_64-apple-darwin with "decimal 
>> floating-point not supported for this target”.
>> Marking the test as requiring dfp fixes the issue.
>> 
>> OK to push?



Re: testsuite: introduce hostedlib effective target

2023-11-05 Thread Mike Stump
On Nov 1, 2023, at 6:11 PM, Alexandre Oliva  wrote:
> 
> Several C++ tests fail with --disable-hosted-libstdcxx, whether
> because stdc++ext gets linked in despite not being built, because
> standard headers are included that are unavailable in this mode,
> or because headers are (mistakenly?) expected to introduce
> declarations such as for ::abort, but in this mode they don't.
> 
> This patch introduces an effective target for GCC test, equivalent to
> one that's available in the libstdc++-v3 testsuite, and arranges for
> all such tests to be skipped when libstdc++-v3 is not hosted.
> 
> This patch was tested with arm-eabi, with libstdc++-v3 configured with
> --disable-hosted-libstdcxx, on gcc-13, and with x86_64-linux-gnu
> likewise on trunk.  In the latter, there are a number of additional
> fails that appear to be related, and that I'm yet to investigate, but
> this is big enough already, so I figured I'd post this and see whether
> the approach is regarded as sound and acceptable before proceeding any
> further.  WDYT?  Ok to install, to deal with other targets
> incrementally?

Ick.  I wish there were fewer changed lines and not 1 per test case. It feels 
like we've painted ourselves into a corner.

That said, I'd rather have a nice solid game plan that is better and suggest it 
over this approach, but the best I can think of is something that can notice 
after the fact during an error, and during error processing, trim or expect, 
which is awfully vague.

So, instead of commenting more, I'd ask if anyone has a nice, good concrete 
idea and say I want to withdraw from the vague.

If someone comes up with something you think is better, easy, smaller and or 
other goodness and you want to go that direction, I'd encourage that, 
otherwise, I'll approve this version.



Re: [PATCH 1/2] testsuite: Add and use thread_fence effective-target

2023-11-05 Thread Mike Stump
On Oct 2, 2023, at 1:24 AM, Christophe Lyon  wrote:
> 
> ping?
> 
> On Sun, 10 Sept 2023 at 21:31, Christophe Lyon  
> wrote:
> Some targets like arm-eabi with newlib and default settings rely on
> __sync_synchronize() to ensure synchronization.  Newlib does not
> implement it by default, to make users aware they have to take special
> care.
> 
> This makes a few tests fail to link.
> 
> This patch adds a new thread_fence effective target (similar to the
> corresponding one in libstdc++ testsuite), and uses it in the tests
> that need it, making them UNSUPPORTED instead of FAIL and UNRESOLVED.
> 
> 2023-09-10  Christophe Lyon  
> 
> gcc/
> * doc/sourcebuild.texi (Other attributes): Document thread_fence
> effective-target.
> 
> gcc/testsuite/
> * g++.dg/init/array54.C: Require thread_fence.
> * gcc.dg/c2x-nullptr-1.c: Likewise.
> * gcc.dg/pr103721-2.c: Likewise.
> * lib/target-supports.exp (check_effective_target_thread_fence):
> New.

Ok.



Re: [PATCH] testsuite: check for and use -mno-strict-align where needed

2023-11-05 Thread Mike Stump
On Oct 19, 2023, at 8:16 PM, Alexandre Oliva  wrote:
> 
> On Mar 10, 2021, Alexandre Oliva  wrote:
> 
>> ppc configurations that have -mstrict-align enabled by default fail
>> gcc.dg/strlenopt-80.c, because some memcpy calls don't get turned into
>> MEM_REFs, which defeats the tested-for strlen optimization.
> 
> I've combined this patch with other patches that added -mno-strict-align
> to tests that needed it on targets configured with -mstrict-align
> enabled by default, and conditioned the use of the flag to targets that
> support it.
> 
> Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, also tested on a
> ppc-vx7r2 configured with -mstrict-align.  Ok to install?

Ok.



Re: [PATCH] Testsuite, i386: Mark test as requiring dfp

2023-11-05 Thread Mike Stump
On Nov 5, 2023, at 12:33 PM, FX Coudert  wrote:
> 
> kind ping for this easy patch
> 
> 
>> Le 30 oct. 2023 à 15:19, FX Coudert  a écrit :
>> 
>> Hi,
>> 
>> The test is currently failing on x86_64-apple-darwin with "decimal 
>> floating-point not supported for this target”.
>> Marking the test as requiring dfp fixes the issue.
>> 
>> OK to push?

Ok.

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-11-05 Thread Richard Sandiford
Iain Sandoe  writes:
> Hi Richard,
>
>> On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
>
>>> On 26 Oct 2023, at 20:49, Richard Sandiford 
> wrote:
>>>
>>> Iain Sandoe  writes:
 This was written before Thomas' modification to the ELF-handling to allow
 a config-based change for target details.  I did consider updating this
 to try and use that scheme, but I think that it would sit a little
 awkwardly, since there are some differences in the start-up scanning for
 Mach-O.  I would say that in all probability we could improve things but
 I'd like to put this forward as a well-tested initial implementation.
>>>
>>> Sorry, I would prefer to extend the existing function instead.
>>> E.g. there's already some divergence between the Mach-O version
>>> and the default version, in that the Mach-O version doesn't print
>>> verbose messages.  I also don't think that the current default code
>>> is so watertight that it'll never need to be updated in future.
>>
>> Fair enough, will explore what can be done (as I recall last I looked the
>> primary difference was in the initial start-up scan).
>
> I’ve done this as attached.
>
> For the record, when doing it, it gave rise to the same misgivings that led
> to the separate implementation before.
>
>  * as we add formats and uncover asm oddities, they all need to be handled
>in one set of code; IMO it could become quite convoluted.
>
>  * now making a change to the MACH-O code means I have to check I did not
>inadvertently break ELF (and likewise, in theory, an ELF change should 
> check
>MACH-O, but many folks do/can not do that).
>
> Maybe there’s some half-way-house where code can usefully be shared without
> those down-sides.
>
> Anyway, to make progress, is the revised version OK for trunk? (tested on
> aarch64-linux and aarch64-darwin).

Sorry for the slow reply.  I was hoping we'd be able to share a bit more
code than that, and avoid an isMACHO toggle.  Does something like the
attached adaption of your patch work?  Only spot-checked on
aarch64-linux-gnu so far.

(The patch tries to avoid capturing the user label prefix, hopefully
avoiding the needsULP thing.)

Thanks,
Richard


diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 5df80325dff..2434550f0c3 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -785,23 +785,34 @@ proc configure_check-function-bodies { config } {
 
 # Regexp for the start of a function definition (name in \1).
 if { [istarget nvptx*-*-*] } {
-   set up_config(start) {^// BEGIN(?: GLOBAL|) FUNCTION DEF: 
([a-zA-Z_]\S+)$}
+   set up_config(start) {
+   {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
+   }
+} elseif { [istarget *-*-darwin*] } {
+   set up_config(start) {
+   {^_([a-zA-Z_]\S+):$}
+   {^LFB[0-9]+:}
+   }
 } else {
-   set up_config(start) {^([a-zA-Z_]\S+):$}
+   set up_config(start) {{^([a-zA-Z_]\S+):$}}
 }
 
 # Regexp for the end of a function definition.
 if { [istarget nvptx*-*-*] } {
set up_config(end) {^\}$}
+} elseif { [istarget *-*-darwin*] } {
+   set up_config(end) {^LFE[0-9]+}
 } else {
set up_config(end) {^\s*\.size}
 }
- 
+
 # Regexp for lines that aren't interesting.
 if { [istarget nvptx*-*-*] } {
# Skip lines beginning with '//' comments ('-fverbose-asm', for
# example).
set up_config(fluff) {^\s*(?://)}
+} elseif { [istarget *-*-darwin*] } {
+   set up_config(fluff) {^\s*(?:\.|//|@)|^L[0-9ACESV]}
 } else {
# Skip lines beginning with labels ('.L[...]:') or other directives
# ('.align', '.cfi_startproc', '.quad [...]', '.text', etc.), '//' or
@@ -833,9 +844,19 @@ proc parse_function_bodies { config filename result } {
 set fd [open $filename r]
 set in_function 0
 while { [gets $fd line] >= 0 } {
-   if { [regexp $up_config(start) $line dummy function_name] } {
-   set in_function 1
-   set function_body ""
+   if { $in_function == 0 } {
+   if { [regexp [lindex $up_config(start) 0] \
+$line dummy function_name] } {
+   set in_function 1
+   set function_body ""
+   }
+   } elseif { $in_function < [llength $up_config(start)] } {
+   if { [regexp [lindex $up_config(start) $in_function] $line] } {
+   incr in_function
+   } else {
+   verbose "parse_function_bodies: skipped $function_name"
+   set in_function 0
+   }
} elseif { $in_function } {
if { [regexp $up_config(end) $line] } {
verbose "parse_function_bodies: $function_name:\n$function_body"
-- 
2.25.1



[PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi,
  This patch enables vector mode for by-pieces equality compares. It
adds a new expand pattern - cbranchv16qi4 - and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The
compare relies on both move and compare instructions, so both macros
are changed.  The vector load/store might be unaligned, so the 16-byte
move and compare are only enabled when P8 vector is enabled (TARGET_VSX
+ TARGET_EFFICIENT_UNALIGNED_VSX).
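
For reference, the kind of source that exercises the by-pieces equality
compare path is a small fixed-size comparison like the sketch below (my
own illustration; the real coverage is in pr111449-1.c):

  /* With P8 vector enabled, the 16-byte compare is expected to become
     a pair of vector loads plus a vcmpequb. compare instead of a GPR
     sequence.  */
  int
  compare16 (const char *a, const char *b)
  {
    return __builtin_memcmp (a, b, 16) == 0;
  }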

  This patch also enables the 16-byte by-pieces move. As vector modes
are not enabled for by-pieces moves, TImode is used for the move, which
caused some regressions. I drafted the third patch to fix them.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern - cbranchv16qi4 - to enable vector
mode by-pieces equality compares on rs6000.  The macro MOVE_MAX_PIECES
(COMPARE_MAX_PIECES) is set to 16 bytes when P8 vector is enabled and
otherwise kept unchanged.  The macro STORE_MAX_PIECES defaults to the
same value as MOVE_MAX_PIECES, so it is now defined explicitly in order
to keep its previous value.

gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
insn sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.

gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..d0937f192d6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,45 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+/* The cbranch_optabs doesn't allow FAIL, so altivec load/store
+   instructions are disabled as the cost is high for unaligned
+   load/store.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+   [(match_operand:V16QI 1 "reg_or_mem_operand")
+(match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  if (!TARGET_P9_VECTOR
+  && !BYTES_BIG_ENDIAN
+  && MEM_P (operands[1])
+  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+  && MEM_P (operands[2])
+  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+{
+  /* Use direct move for P8 little endian to skip bswap, as the byte
+order doesn't matter for equality compare.  */
+  rtx reg_op1 = gen_reg_rtx (V16QImode);
+  rtx reg_op2 = gen_reg_rtx (V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+  operands[1] = reg_op1;
+  operands[2] = reg_op2;
+}
+  else
+{
+  operands[1] = force_reg (V16QImode, operands[1]);
+  operands[2] = force_reg (V16QImode, operands[2]);
+}
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
  else
emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
}
+  else if (mode == V16QImode)
+   {
+ gcc_assert (code == EQ || code == NE);
+
+ rtx result_vector = gen_reg_rtx (V16QImode);
+ rtx cc_bit = gen_reg_rtx (SImode);
+ emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+ emit_insn (gen_cr6_test_for_lt (cc_bit));
+ emit_insn (gen_rtx_SET (compare_result,
+ gen_rtx_COMPARE (comp_mode, cc_bit,
+  const1_rtx)));
+   }
   else
emit_insn (gen_rtx_SET (compare_result,
gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..51441825e20 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (TARGET_P8_VECTOR ? 16 : (TARGET_POWERPC64 ? 8 : 4))
+#define STORE_MAX_PIECES (TARGET_POWERPC64 ? 8 : 4)

 /* Nonzero if acces

[PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi,
  The patch 2 enables 16-byte by pieces move on rs6000. This patch fixes
the regression cases caused by previous patch. For sra-17/18, the long
array with 4 elements can be loaded by one 16-byte by pieces move on 32-bit
platform. So the array is not be constructed in LC0 and SRA optimization
is unable to be taken. "no-vsx" option is added for 32-bit platform, as
it sets the MOVE_MAX_PIECES to 4-byte on 32-bit platform and the array
can't be loaded by one by pieces move.

  Another regression is on P8 LE.  The 16-byte memory-to-memory move is
implemented by a TImode load/store pair.  The TImode load/store is finally
split into two pairs of DImode load/store on P8 LE as it doesn't have unaligned
vector load/store instructions.  Actually, a 16-byte memory-to-memory move
can be implemented by an element-reversed V2DI load/store pair on P8 LE.  The
patch creates an insn_and_split pattern for this optimization.
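
  As an illustration (my example, not part of the patch), a 16-byte aggregate
copy like the following is the kind of memory-to-memory move the new pattern
targets; with it, P8 LE can use one element-reversed V2DI load plus one
element-reversed V2DI store instead of splitting the TImode access into
DImode load/store:

/* 16-byte copy done by pieces once MOVE_MAX_PIECES allows 16 bytes.  */
struct payload { long long a; long long b; };   /* 16 bytes */

void
copy_payload (struct payload *dst, const struct payload *src)
{
  *dst = *src;   /* 16-byte memory-to-memory move */
}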

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables 16-byte by pieces move.  The 16-byte move is generated
with TImode and finally implemented by vector instructions.  There are
several regression cases after the enablement.  16-byte TImode memory to
memory move is originally implemented by two pairs of DImode load/store on
P8 LE as there is no unaligned vsx load/store on it.  The patch fixes
the problem by creating an insn_and_split pattern and converting it to one
pair of reversed load/store.  Two SRA cases lost the SRA optimization as
the array can be loaded by one 16-byte move and so is no longer initialized in
LC0 on the 32-bit platform.  They are fixed by adding the no-vsx option.

gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
PR target/111449
* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
* gcc.dg/tree-ssa/sra-18.c: Likewise.
* gcc.target/powerpc/pr111449-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")

 ;; VSX moves

+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+   (match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src =  adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..36d72c9256b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* 
powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param 
sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..3682a9a8c29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* 
powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param 
sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */

 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/

[PATCH] RISC-V: Enhance AVL propagation for complicate reduction auto-vectorization

2023-11-05 Thread Juzhe-Zhong
I noticed that we fail to propagate the AVL for reductions in more complicated
situations:

double foo (double *__restrict a, 
double *__restrict b, 
double *__restrict c,
int n)
{
  double result = 0;
  for (int i = 0; i < n; i++)
result += a[i] * b[i] * c[i];
  return result;
}

vsetvli a5,a3,e8,mf8,ta,ma   -> should be fused into e64m1,TU
slli    a4,a5,3
vle64.v v3,0(a0)
vle64.v v1,0(a1)
vsetvli a6,zero,e64,m1,ta,ma -> redundant
vfmul.vv    v1,v1,v3
vsetvli zero,a5,e64,m1,tu,ma -> redundant
vle64.v v3,0(a2)
vfmacc.vv   v2,v1,v3
add a0,a0,a4
add a1,a1,a4
add a2,a2,a4
sub a3,a3,a5
bne a3,zero,.L3

The failed AVL propagation causes redundant AVL/VL toggling.
The root cause is as follows:

vsetvl a5, zero
vadd.vv def r136
vsetvl zero, a3, ... TU
vsub.vv (use r136)

We propagate AVL (r136) from 'vsub.vv' into 'vadd.vv' only when 'vsub.vv' uses
the TA policy.  However, that is too restrictive, so we missed an optimization
here.  We enhance AVL propagation for the TU policy in the following situation:

vsetvl a5, zero
vadd.vv def r136
vsetvl zero, a3, ... TU
vsub.vv (use r136, merge != r136)

Note that we should only propagate the AVL when merge != r136, so that 'vsub.vv'
does not depend on the tail elements.
After this patch:

vsetvli a5,a3,e64,m1,tu,ma
slli    a4,a5,3
vle64.v v3,0(a0)
vle64.v v1,0(a1)
vfmul.vv    v1,v1,v3
vle64.v v3,0(a2)
vfmacc.vv   v2,v3,v1
add a0,a0,a4
add a1,a1,a4
add a2,a2,a4
sub a3,a3,a5
bne a3,zero,.L3


PR target/112399

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc 
(pass_avlprop::get_vlmax_ta_preferred_avl): Enhance AVL propagation.
* config/riscv/t-riscv: Add new include.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/imm_switch-2.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr112399.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc | 17 --
 gcc/config/riscv/t-riscv  |  3 +-
 .../gcc.target/riscv/rvv/autovec/pr112399.c   | 31 +++
 .../riscv/rvv/vsetvl/imm_switch-2.c   |  3 +-
 4 files changed, 49 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index 1dfaa8742da..1f6ba405342 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-ssa.h"
 #include "cfgcleanup.h"
 #include "insn-attr.h"
+#include "tm-constrs.h"
 
 using namespace rtl_ssa;
 using namespace riscv_vector;
@@ -285,8 +286,20 @@ pass_avlprop::get_vlmax_ta_preferred_avl (insn_info *insn) 
const
  if (!use_insn->can_be_optimized () || use_insn->is_asm ()
  || use_insn->is_call () || use_insn->has_volatile_refs ()
  || use_insn->has_pre_post_modify ()
- || !has_vl_op (use_insn->rtl ())
- || !tail_agnostic_p (use_insn->rtl ()))
+ || !has_vl_op (use_insn->rtl ()))
+   return NULL_RTX;
+
+ /* We should only propagate a non-VLMAX AVL into a VLMAX insn when
+the potential tail elements of such an insn (after propagation) are
+not used.  So, we should make sure the outcome of the VLMAX insn
+does not depend on them.  */
+ extract_insn_cached (use_insn->rtl ());
+ int merge_op_idx = get_attr_merge_op_idx (use_insn->rtl ());
+ if (merge_op_idx != INVALID_ATTRIBUTE
+ && !satisfies_constraint_vu (recog_data.operand[merge_op_idx])
+ && refers_to_regno_p (set->regno (),
+   recog_data.operand[merge_op_idx])
+ && !tail_agnostic_p (use_insn->rtl ()))
return NULL_RTX;
 
  int new_sew = get_sew (use_insn->rtl ());
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index f8ca3f4ac57..95becfc819b 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -80,7 +80,8 @@ riscv-vector-costs.o: 
$(srcdir)/config/riscv/riscv-vector-costs.cc \
 
 riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
-  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h 
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h \
+  tm-constrs.h
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-avlprop.cc
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c
new file mode 100644
index 000..948e12b8474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c
@@ -0,0 +1,31 @@
+

[PATCH V2] VECT: Support mask_len_strided_load/mask_len_strided_store in loop vectorize

2023-11-05 Thread Juzhe-Zhong
This patch adds strided load/store support to the loop vectorizer, depending on
STMT_VINFO_STRIDED_P.
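
As an illustration only (my example, not from this patch), a loop with a
run-time stride like the one below is the kind of access that
STMT_VINFO_STRIDED_P describes; with this change the vectorizer can emit
MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE for it when the target provides
the new optabs:

/* Strided access with a stride only known at run time.  */
void
scale_strided (float *restrict out, float *restrict in,
               float a, int n, int stride)
{
  for (int i = 0; i < n; i++)
    out[i * stride] = a * in[i * stride];
}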

Bootstrap and regression on X86 passed.

Ok for trunk ?

gcc/ChangeLog:

* internal-fn.cc (strided_load_direct): New function.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_scatter_store_optab_fn): Add strided store.
(expand_strided_load_optab_fn): New function.
(expand_gather_load_optab_fn): Add strided load.
(direct_strided_load_optab_supported_p): New function.
(direct_strided_store_optab_supported_p): Ditto.
(internal_load_fn_p): Add strided load.
(internal_strided_fn_p): New function.
(internal_fn_len_index): Add strided load/store.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Add strided store.
(internal_strided_fn_supported_p): New function.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): New IFN.
(MASK_LEN_STRIDED_STORE): Ditto.
* internal-fn.h (internal_strided_fn_p): New function.
(internal_strided_fn_supported_p): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Add strided load.
(supports_vec_scatter_store_p): Add strided store.
* optabs-query.h (supports_vec_gather_load_p): Add strided load.
(supports_vec_scatter_store_p): Add strided store.
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Add 
strided load/store.
(vect_gather_scatter_fn_p): Ditto.
(vect_check_gather_scatter): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_truncate_gather_scatter_offset): Ditto.
(vect_use_strided_gather_scatters_p): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Ditto.

---
 gcc/internal-fn.cc | 101 -
 gcc/internal-fn.def|   4 ++
 gcc/internal-fn.h  |   2 +
 gcc/optabs-query.cc|  25 ++---
 gcc/optabs-query.h |   4 +-
 gcc/tree-vect-data-refs.cc |  45 +
 gcc/tree-vect-stmts.cc |  65 ++--
 gcc/tree-vectorizer.h  |   2 +-
 8 files changed, 199 insertions(+), 49 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c7d3564faef..a31a65755c7 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -172,6 +173,7 @@ init_internal_fns ()
 #define vec_cond_mask_direct { 1, 0, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
@@ -3561,62 +3563,87 @@ expand_LAUNDER (internal_fn, gcall *call)
   expand_assignment (lhs, gimple_call_arg (call, 0), false);
 }
 
+#define expand_strided_store_optab_fn expand_scatter_store_optab_fn
+
 /* Expand {MASK_,}SCATTER_STORE{S,U} call CALL using optab OPTAB.  */
 
 static void
 expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 {
+  insn_code icode;
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int rhs_index = internal_fn_stored_value_index (ifn);
   tree base = gimple_call_arg (stmt, 0);
   tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
   tree rhs = gimple_call_arg (stmt, rhs_index);
 
   rtx base_rtx = expand_normal (base);
   rtx offset_rtx = expand_normal (offset);
-  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
   class expand_operand ops[8];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
-  create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], scale_int);
+  if (internal_strided_fn_p (ifn))
+{
+  create_address_operand (&ops[i++], offset_rtx);
+  icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+}
+  else
+{
+  tree scale = gimple_call_arg (stmt, 2);
+  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
+  create_input_operand (&ops[i++], offset_rtx,
+   TYPE_MODE (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], scale_int);
+  icode = convert_optab_handler (optab, TYPE_MODE (TREE_TY

Re: [PATCH V2] VECT: Support mask_len_strided_load/mask_len_strided_store in loop vectorize

2023-11-05 Thread juzhe.zh...@rivai.ai
Sorry. 
This is middle-end patch, sending to wrong  CC lists.
Forget about this patch.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-06 14:52
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] VECT: Support mask_len_strided_load/mask_len_strided_store 
in loop vectorize
This patch adds strided load/store support on loop vectorizer depending on 
STMT_VINFO_STRIDED_P.
 
Bootstrap and regression on X86 passed.
 
Ok for trunk ?
 
gcc/ChangeLog:
 
* internal-fn.cc (strided_load_direct): New function.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_scatter_store_optab_fn): Add strided store.
(expand_strided_load_optab_fn): New function.
(expand_gather_load_optab_fn): Add strided load.
(direct_strided_load_optab_supported_p): New function.
(direct_strided_store_optab_supported_p): Ditto.
(internal_load_fn_p): Add strided load.
(internal_strided_fn_p): New function.
(internal_fn_len_index): Add strided load/store.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Add strided store.
(internal_strided_fn_supported_p): New function.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): New IFN.
(MASK_LEN_STRIDED_STORE): Ditto.
* internal-fn.h (internal_strided_fn_p): New function.
(internal_strided_fn_supported_p): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Add strided load.
(supports_vec_scatter_store_p): Add strided store.
* optabs-query.h (supports_vec_gather_load_p): Add strided load.
(supports_vec_scatter_store_p): Add strided store.
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Add strided 
load/store.
(vect_gather_scatter_fn_p): Ditto.
(vect_check_gather_scatter): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_truncate_gather_scatter_offset): Ditto.
(vect_use_strided_gather_scatters_p): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Ditto.
 
---
gcc/internal-fn.cc | 101 -
gcc/internal-fn.def|   4 ++
gcc/internal-fn.h  |   2 +
gcc/optabs-query.cc|  25 ++---
gcc/optabs-query.h |   4 +-
gcc/tree-vect-data-refs.cc |  45 +
gcc/tree-vect-stmts.cc |  65 ++--
gcc/tree-vectorizer.h  |   2 +-
8 files changed, 199 insertions(+), 49 deletions(-)
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c7d3564faef..a31a65755c7 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
#define load_lanes_direct { -1, -1, false }
#define mask_load_lanes_direct { -1, -1, false }
#define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
#define len_load_direct { -1, -1, false }
#define mask_len_load_direct { -1, 4, false }
#define mask_store_direct { 3, 2, false }
@@ -172,6 +173,7 @@ init_internal_fns ()
#define vec_cond_mask_direct { 1, 0, false }
#define vec_cond_direct { 2, 0, false }
#define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
#define len_store_direct { 3, 3, false }
#define mask_len_store_direct { 4, 5, false }
#define vec_set_direct { 3, 3, false }
@@ -3561,62 +3563,87 @@ expand_LAUNDER (internal_fn, gcall *call)
   expand_assignment (lhs, gimple_call_arg (call, 0), false);
}
+#define expand_strided_store_optab_fn expand_scatter_store_optab_fn
+
/* Expand {MASK_,}SCATTER_STORE{S,U} call CALL using optab OPTAB.  */
static void
expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
{
+  insn_code icode;
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int rhs_index = internal_fn_stored_value_index (ifn);
   tree base = gimple_call_arg (stmt, 0);
   tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
   tree rhs = gimple_call_arg (stmt, rhs_index);
   rtx base_rtx = expand_normal (base);
   rtx offset_rtx = expand_normal (offset);
-  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
   class expand_operand ops[8];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
-  create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
-  create_integer_operand (&ops[i++], scale_int);
+  if (internal_strided_fn_p (ifn))
+{
+  create_address_operand (&ops[i++], offset_rtx);
+  icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+}
+  else
+{
+  tree scale = gimple_call_arg (stmt, 2);
+  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
+  create_input_operand (&ops[i++], offset_rtx,
+ TYPE_MODE (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], scale_int);
+  icode = convert_optab_handler 

[PATCH] rs6000, testcase: Add require-effective-target has_arch_ppc64 to pr106550_1.c

2023-11-05 Thread Jiufu Guo
Hi,

With the latest trunk, the case pr106550_1.c fails on ppc under -m32,
while the case is testing 64-bit constant building.  So "has_arch_ppc64"
is required.

Tests pass on ppc64{,le}.
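
For context, here is a sketch (not the actual testcase) of the kind of 64-bit
constant building the test exercises; materializing such a constant only makes
sense when the target has 64-bit registers:

unsigned long long
build_const (void)
{
  return 0x1234567887654321ULL;   /* needs a multi-instruction sequence */
}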

BR,
Jeff (Jiufu Guo)

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106550_1.c: Add has_arch_ppc64 target require.

---
 gcc/testsuite/gcc.target/powerpc/pr106550_1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c 
b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
index 7e709fcf9d8..5ab40d71a56 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
@@ -1,5 +1,6 @@
 /* PR target/106550 */
 /* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target has_arch_ppc64 } */
 /* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */
 /* force the constant splitter run after RA: -fdisable-rtl-split1.  */
 
-- 
2.25.1



Re: [PATCH] explow: Allow dynamic allocations after vregs

2023-11-05 Thread Richard Biener
On Sun, Nov 5, 2023 at 7:32 PM Richard Sandiford
 wrote:
>
> This patch allows allocate_dynamic_stack_space to be called before
> or after virtual registers have been instantiated.  It uses the
> same approach as allocate_stack_local, which already supported this.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK

> Richard
>
>
> gcc/
> * function.h (get_stack_dynamic_offset): Declare.
> * function.cc (get_stack_dynamic_offset): New function,
> split out from...
> (get_stack_dynamic_offset): ...here.
> * explow.cc (allocate_dynamic_stack_space): Handle calls made
> after virtual registers have been instantiated.
> ---
>  gcc/explow.cc   | 10 +++---
>  gcc/function.cc | 12 +++-
>  gcc/function.h  |  1 +
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/explow.cc b/gcc/explow.cc
> index 0c03ac350bb..aa64d5e906c 100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -1375,12 +1375,16 @@ allocate_dynamic_stack_space (rtx size, unsigned 
> size_align,
>HOST_WIDE_INT stack_usage_size = -1;
>rtx_code_label *final_label;
>rtx final_target, target;
> +  rtx addr = (virtuals_instantiated
> + ? plus_constant (Pmode, stack_pointer_rtx,
> +  get_stack_dynamic_offset ())
> + : virtual_stack_dynamic_rtx);
>
>/* If we're asking for zero bytes, it doesn't matter what we point
>   to since we can't dereference it.  But return a reasonable
>   address anyway.  */
>if (size == const0_rtx)
> -return virtual_stack_dynamic_rtx;
> +return addr;
>
>/* Otherwise, show we're calling alloca or equivalent.  */
>cfun->calls_alloca = 1;
> @@ -1532,7 +1536,7 @@ allocate_dynamic_stack_space (rtx size, unsigned 
> size_align,
>poly_int64 saved_stack_pointer_delta;
>
>if (!STACK_GROWS_DOWNWARD)
> -   emit_move_insn (target, virtual_stack_dynamic_rtx);
> +   emit_move_insn (target, force_operand (addr, target));
>
>/* Check stack bounds if necessary.  */
>if (crtl->limit_stack)
> @@ -1575,7 +1579,7 @@ allocate_dynamic_stack_space (rtx size, unsigned 
> size_align,
>stack_pointer_delta = saved_stack_pointer_delta;
>
>if (STACK_GROWS_DOWNWARD)
> -   emit_move_insn (target, virtual_stack_dynamic_rtx);
> +   emit_move_insn (target, force_operand (addr, target));
>  }
>
>suppress_reg_args_size = false;
> diff --git a/gcc/function.cc b/gcc/function.cc
> index afb0b33da9e..527ea4807b0 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -1943,6 +1943,16 @@ instantiate_decls (tree fndecl)
>vec_free (cfun->local_decls);
>  }
>
> +/* Return the value of STACK_DYNAMIC_OFFSET for the current function.
> +   This is done through a function wrapper so that the macro sees a
> +   predictable set of included files.  */
> +
> +poly_int64
> +get_stack_dynamic_offset ()
> +{
> +  return STACK_DYNAMIC_OFFSET (current_function_decl);
> +}
> +
>  /* Pass through the INSNS of function FNDECL and convert virtual register
> references to hard register references.  */
>
> @@ -1954,7 +1964,7 @@ instantiate_virtual_regs (void)
>/* Compute the offsets to use for this function.  */
>in_arg_offset = FIRST_PARM_OFFSET (current_function_decl);
>var_offset = targetm.starting_frame_offset ();
> -  dynamic_offset = STACK_DYNAMIC_OFFSET (current_function_decl);
> +  dynamic_offset = get_stack_dynamic_offset ();
>out_arg_offset = STACK_POINTER_OFFSET;
>  #ifdef FRAME_POINTER_CFA_OFFSET
>cfa_offset = FRAME_POINTER_CFA_OFFSET (current_function_decl);
> diff --git a/gcc/function.h b/gcc/function.h
> index 5caf1e153ea..29846564bc6 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -715,6 +715,7 @@ extern vec convert_jumps_to_returns (basic_block 
> last_bb, bool simple_p,
>  extern basic_block emit_return_for_exit (edge exit_fallthru_edge,
>  bool simple_p);
>  extern void reposition_prologue_and_epilogue_notes (void);
> +extern poly_int64 get_stack_dynamic_offset ();
>
>  /* Returns the name of the current function.  */
>  extern const char *fndecl_name (tree);
> --
> 2.25.1
>


Re: [PATCH] explow: Avoid unnecessary alignment operations

2023-11-05 Thread Richard Biener
On Sun, Nov 5, 2023 at 7:33 PM Richard Sandiford
 wrote:
>
> align_dynamic_address would output alignment operations even
> for a required alignment of 1 byte.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK

> Richard
>
>
> gcc/
> * explow.cc (align_dynamic_address): Do nothing if the required
> alignment is a byte.
> ---
>  gcc/explow.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/explow.cc b/gcc/explow.cc
> index aa64d5e906c..0be6d2629c9 100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -1201,6 +1201,9 @@ record_new_stack_level (void)
>  rtx
>  align_dynamic_address (rtx target, unsigned required_align)
>  {
> +  if (required_align == BITS_PER_UNIT)
> +return target;
> +
>/* CEIL_DIV_EXPR needs to worry about the addition overflowing,
>   but we know it can't.  So add ourselves and then do
>   TRUNC_DIV_EXPR.  */
> --
> 2.25.1
>


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-05 Thread Richard Biener
On Sun, 5 Nov 2023, Richard Sandiford wrote:

> Robin Dapp  writes:
> >> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> >> operands, even if that makes things inconsistent with vcond_mask.
> >> vcond_mask isn't really a good example to follow, since the operand
> >> order is not only inconsistent with the IFN, it's also inconsistent
> >> with the natural if_then_else order.
> >
> > v4 attached with that changed,  match.pd patterns interleaved as well
> > as scratch-handling added and VLS modes removed.  Lehua has since pushed
> > another patch that extends gimple_match_op to 6/7 operands already so
> > that could be removed as well making the patch even smaller now.
> >
> > Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> > still running on aarch64 and x86.  OK if those pass?
> >
> > Regards
> >  Robin
> >
> > Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
> >
> > In order to prevent simplification of a COND_OP with degenerate mask
> > (CONSTM1_RTX) into just an OP in the presence of length masking this
> > patch introduces a length-masked analog to VEC_COND_EXPR:
> > IFN_VCOND_MASK_LEN.
> >
> > It also adds new match patterns that allow the combination of
> > unconditional unary, binary and ternay operations with the
> > VCOND_MASK_LEN into a conditional operation if the target supports it.
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/111760
> >
> > * config/riscv/autovec.md (vcond_mask_len_): Add
> > expander.
> > * config/riscv/riscv-protos.h (enum insn_type): Add.
> > * config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
> > * doc/md.texi: Add vcond_mask_len.
> > * gimple-match-exports.cc (maybe_resimplify_conditional_op):
> > Create VCOND_MASK_LEN when length masking.
> > * gimple-match.h (gimple_match_op::gimple_match_op): Always
> > initialize len and bias.
> > * internal-fn.cc (vec_cond_mask_len_direct): Add.
> > (direct_vec_cond_mask_len_optab_supported_p): Add.
> > (internal_fn_len_index): Add VCOND_MASK_LEN.
> > (internal_fn_mask_index): Ditto.
> > * internal-fn.def (VCOND_MASK_LEN): New internal function.
> > * match.pd: Combine unconditional unary, binary and ternary
> > operations into the respective COND_LEN operations.
> > * optabs.def (OPTAB_D): Add vcond_mask_len optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
> > riscv_v.
> > ---
> >  gcc/config/riscv/autovec.md   | 26 ++
> >  gcc/config/riscv/riscv-protos.h   |  3 ++
> >  gcc/config/riscv/riscv-v.cc   |  3 +-
> >  gcc/doc/md.texi   |  9 
> >  gcc/gimple-match-exports.cc   | 13 +++--
> >  gcc/gimple-match.h|  6 ++-
> >  gcc/internal-fn.cc|  5 ++
> >  gcc/internal-fn.def   |  2 +
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  1 +
> >  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
> >  11 files changed, 114 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index cc4c9596bbf..0a5e4ccb54e 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_"
> >[(set_attr "type" "vector")]
> >  )
> >  
> > +(define_expand "vcond_mask_len_"
> > +  [(match_operand:V 0 "register_operand")
> > +(match_operand: 1 "nonmemory_operand")
> > +(match_operand:V 2 "nonmemory_operand")
> > +(match_operand:V 3 "autovec_else_operand")
> > +(match_operand 4 "autovec_length_operand")
> > +(match_operand 5 "const_0_operand")]
> > +  "TARGET_VECTOR"
> > +  {
> > +if (satisfies_constraint_Wc1 (operands[1]))
> > +  riscv_vector::expand_cond_len_unop (code_for_pred_mov (mode),
> > + operands);
> > +else
> > +  {
> > +   /* The order of then and else is opposite to pred_merge.  */
> > +   rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> > +operands[1]};
> > +   riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> > + riscv_vector::MERGE_OP_TU,
> > + ops, operands[4]);
> > +  }
> > +DONE;
> > +  }
> > +  [(set_attr "type" "vector")]
> > +)
> > +
> >  ;; 
> > -
> >  ;;  [BOOL] Select based on masks
> >  ;; 
> > -
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index a1be731c28e..0d0ee5effea 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-pro

[PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-05 Thread Tamar Christina
Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab;
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked when the
comparison would produce multiple statements.

Note: I am currently struggling to get patch 7 correct in all cases and could
use some feedback there.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.
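
As a concrete sketch of the early-return form (my own example, not taken from
the series), assuming a statically sized buffer and a known N as required by
the limitations below:

#define N 1024
int a[N];

int
find_first (int key)
{
  for (int i = 0; i < N; i++)
    if (a[i] == key)
      return i;   /* early exit taken as soon as an element matches */
  return -1;
}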

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross page
  iteraion checks. The result is likely to also not be beneficial. For that
  reason we punt support for variable buffers till we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Peeling code supports it but the code motion code cannot
  find instructions to make the move in the epilog.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations, which handles all the code in <statements2>.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.

That is to say, the first vector loop executes so long as the early exit isn't
needed.  Once the exit is taken, the scalar code will perform at most VF extra
iterations.  The exact number depends on peeling and iteration start and which
exit was taken (natural or early).   For this scalar loop, all early exits are
treated the same.

When we vectorize we move any statement not related to the early break itself
and that would be incorrect to execute before the break (i.e. has side effects)
to after the break.  If this is not possible we decline to vectorize.

This means that we check at the start of iterations whether we are going to exit
or not.  During the analysis phase we check whether we are allowed to do this
moving of statements.  Also note that we only move the scalar statements, and
only do so after peeling, just before we start transforming statements.

Codegen:

for e.g.

#define N 803
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
 break;
   vect_a[i] = x;

 }
 return ret;
}

We generate for Adv. SIMD:

test4:
adrp    x2, .LC0
adrp    x3, .LANCHOR0
dup v2.4s, w0
add x3, x3, :lo12:.LANCHOR0
movi    v4.4s, 0x4
add x4, x3, 3216
ldr q1, [x2, #:lo12:.LC0]
mov x1, 0
mov w2, 0
.p2align 3,,7
.L3:
ldr q0, [x3, x1]
add v3.4s, v1.4s, v2.4s
add v1.4s, v1.4s, v4.4s
cmhi    v0.4s, v0.4s, v2.4s
umaxp   v0.4s, v0.4s, v0.4s
fmov    x5, d0
cbnz    x5, .L6
add w2, w2, 1
str q3, [x1, x4]
str q2, [x3, x1]
add x1, x1, 16
cmp w2, 200
bne .L3
mov w7, 3
.L2:
lsl w2, w2, 2
add x5, x3, 3216
add w6, w2, w0
sxtw    x4, w2
ldr w1, [x3, x4, lsl 2]
str w6, [x5, x4, lsl 2]
cmp w0, w1
bcc .L4
   

[PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests

2023-11-05 Thread Tamar Christina
Hi All,

This adds pragma GCC novector to testcases that have shown up
since the last regression run, and to more cases that this series detects.

Is it OK if, when it comes time to commit, I just update any new cases
before committing, since this seems to be a cat and mouse game?
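
For reference, a minimal sketch (mine, not one of the testcases touched here)
of how the pragma is used: the result-checking loop is marked with
#pragma GCC novector so only the loop under test is vectorized and the scan
patterns keep matching.

extern void abort (void);
#define N 16
int a[N], b[N];

int
main (void)
{
  for (int i = 0; i < N; i++)      /* loop under test, may be vectorized */
    a[i] = b[i] + 1;

#pragma GCC novector
  for (int i = 0; i < N; i++)      /* check results, must stay scalar */
    if (a[i] != b[i] + 1)
      abort ();

  return 0;
}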

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-slp-30.c: Add pragma novector.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
* gcc.target/aarch64/vect-xorsign_exec.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index 
00d0eca56eeca6aee6f11567629dc955c0924c74..534bee4a1669a7cbd95cf6007f28dafd23bab8da
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,9 +24,9 @@ main1 ()
}
 
   /* check results:  */
-#pragma GCC novector
for (j = 0; j < N; j++)
{
+#pragma GCC novector
 for (i = 0; i < N; i++)
   {
 if (out[i*4] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index 
48b6a9b0681cf1fe410755c3e639b825b27895b0..22817a57ef81398cc018a78597755397d20e0eb9
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -27,6 +27,7 @@ main1 ()
 #pragma GCC novector
  for (i = 0; i < N; i++)
{
+#pragma GCC novector
 for (j = 0; j < N; j++) 
   {
 if (a[i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c 
b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index 
a0e53d5fef91868dfdbd542dd0a98dff92bd265b..0861d488e134d3f01a2fa83c56eff7174f36ddfb
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -83,9 +83,9 @@ int main1 ()
 }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N; i++)
 {
+#pragma GCC novector
   for (j = 0; j < N; j++)
{
   if (tmp1[2].e.n[1][i][j] != 8)
@@ -103,9 +103,9 @@ int main1 ()
 }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
 {
+#pragma GCC novector
   for (j = 0; j < N - NINTS; j++)
{
   if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c 
b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
index 
cfa22115831272cb1d4e1a38512f10c3a1c6ad77..84f33d3f6cce9b0017fd12ab961019041245ffae
 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
@@ -33,6 +33,7 @@ main (void)
 r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
 if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
   abort ();
@@ -41,6 +42,7 @@ main (void)
 rd[i] = ad[i] * __builtin_copysign (1.0d, bd[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
 if (rd[i] != ad[i] * __builtin_copysign (1.0d, bd[i]))
   abort ();





[PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks

2023-11-05 Thread Tamar Christina
Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
* values that are live must be reachable through all exits
* values that aren't single use and shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In cases where we have
multiple early exit statements this should be the one closest to the loop
latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two parts: during data ref analysis we determine
the validity of the operation and generate a worklist of actions to perform if
we vectorize.

After peeling and just before statement transformation we replay this worklist,
which moves the statements and updates bookkeeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.

At the moment we don't support this for epilog nomasks since the additional
vectorized epilog's stmt UIDs are not found.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
(vect_analyze_early_break_dependences): New.
(vect_analyze_data_ref_dependences): Use them.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
early_breaks.
(move_early_exit_stmts): New.
(vect_transform_loop): Use it.
* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
(class _loop_vec_info): Add early_breaks, early_break_conflict,
early_break_vuses.
(LOOP_VINFO_EARLY_BREAKS): New.
(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
(LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 
d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60
 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct 
data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True if
+   possible, otherwise False.
+
+   Requirements:
+ - Any memory access must be to a fixed size buffer.
+ - There must not be any loads and stores to the same object.
+ - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+ This implementation is very conservative.  Any overlapping loads/stores
+ that take place before the early break statement get rejected aside from
+ WAR dependencies.
+
+ i.e.:
+
+   a[i] = 8
+   c = a[i]
+   if (b[i])
+ ...
+
+   is not allowed, but
+
+   c = a[i]
+   a[i] = 8
+   if (b[i])
+ ...
+
+   is which is the common case.
+
+   Arguments:
+ - LOOP_VINFO: loop information for the current loop.
+ - CHAIN: Currently detected sequence of instructions that need to be moved
+ if we are to vectorize this early break.
+ - FIXED: Sequences of SSA_NAMEs that must not be moved, they are 
reachable from
+ one or more cond conditions.  If this set overlaps with CHAIN 
then FIXED
+ takes precedence.  This deals with non-single use cases.
+ - LOADS: List of all loads found during traversal.
+ - BASES: List of all load data references found during traversal.
+ - GSTMT: Current position to inspect for validity.  The sequence
+ will be moved upwards from this point.
+ - REACHING_VUSE: The dominating VUSE found so far.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set *chain,
+  hash_set *fixed, vec *loads,
+  vec *bases, tree *reaching_vuse,
+  gimple_stmt_iterator *gstmt)
+{
+  if (gsi_end_p (*gstmt))
+return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  /* ?? Do I need to move debug statements? not quite sure..  */
+  if (gimple_has_ops (stmt)
+  && !is_gimple_debug (stmt))
+{
+  tree dest = NULL_TREE;

[PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks

2023-11-05 Thread Tamar Christina
Hi All,

This splits the part of the function that does peeling for loops at exits to
a different function.  In this new function we also peel for early breaks.

Peeling for early breaks works by redirecting all early break exits to a
single "early break" block and combine them and the normal exit edge together
later in a different block which then goes into the epilog preheader.

This allows us to re-use all the existing code for IV updates.  Additionally, this
also enables correct linking for multiple vector epilogues.

flush_pending_stmts cannot be used in this scenario since it updates the PHI
nodes in the order that they are in the exit destination blocks.  This means
they are in CFG visit order.  With a single exit this doesn't matter but with
multiple exits with different live values through the different exits the order
usually does not line up.

Additionally the vectorizer helper functions expect to be able to iterate over
the nodes in the order that they occur in the loop header blocks.  This is an
invariant we must maintain.  To do this we just inline the work of
flush_pending_stmts but maintain the order by using the header blocks to guide
the work.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
(slpeel_tree_duplicate_loop_for_vectorization): New.
(slpeel_tree_duplicate_loop_to_edge_cfg): use it.
* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
(vect_is_loop_exit_latch_pred): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
43ca985c53ce58aa83fb9689a9ea9b20b207e0a8..6fbb5b80986fd657814b48eb009b52b094f331e6
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1444,6 +1444,151 @@ slpeel_duplicate_current_defs_from_edges (edge from, 
edge to)
 get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
+/* Determine if the exit chosen by the loop vectorizer differs from the
+   natural loop exit, i.e. if the exit leads to the loop latch or not.
+   When this happens we need to flip the understanding of main and other
+   exits by peeling and IV updates.  */
+
+bool
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
+/* Perform peeling for when the peeled loop is placed after the original loop.
+   This maintains LCSSA and creates the appropriate blocks for multiple exit
+   vectorization.   */
+
+void static
+slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
+ vec &loop_exits, edge e,
+ class loop *new_loop,
+ bool flow_loops,
+ basic_block new_preheader)
+{
+  bool multiple_exits_p = loop_exits.length () > 1;
+  basic_block main_loop_exit_block = new_preheader;
+  if (multiple_exits_p)
+{
+  edge loop_entry = single_succ_edge (new_preheader);
+  new_preheader = split_edge (loop_entry);
+}
+
+  /* First create the empty phi nodes so that when we flush the
+ statements they can be filled in.   However because there is no order
+ between the PHI nodes in the exits and the loop headers we need to
+ order them based on the order of the two headers.  First record the new
+ phi nodes. Then redirect the edges and flush the changes.  This writes 
out the new
+SSA names.  */
+  for (auto exit : loop_exits)
+{
+  basic_block dest
+   = exit == loop_exit ? main_loop_exit_block : new_preheader;
+  redirect_edge_and_branch (exit, dest);
+}
+
+  /* Copy the current loop LC PHI nodes between the original loop exit
+ block and the new loop header.  This allows us to later split the
+ preheader block and still find the right LC nodes.  */
+  edge loop_entry = single_succ_edge (new_preheader);
+  hash_set lcssa_vars;
+  if (flow_loops)
+for (auto gsi_from = gsi_start_phis (loop->header),
+gsi_to = gsi_start_phis (new_loop->header);
+!gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+gsi_next (&gsi_from), gsi_next (&gsi_to))
+  {
+   gimple *from_phi = gsi_stmt (gsi_from);
+   gimple *to_phi = gsi_stmt (gsi_to);
+   tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
+
+   /* In all cases, even in early break situations we're only
+  interested in the number of fully executed loop iters.  As such
+  we discard any partially done iteration.  So we simply propagate
+  the phi nodes from the latch to the merge block.  */
+   tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+   gphi *lcssa_phi = create_phi_node (new_res, main

[PATCH 6/21]middle-end: support multiple exits in loop versioning

2023-11-05 Thread Tamar Christina
Hi All,

This has loop versioning use the vectorizer's IV exit edge when it's available
since single_exit (..) fails with multiple exits.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_loop_versioning): Support multiple
exits.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
3d59119787d6afdc5a6465a547d1ea2d3d940373..58b4b9c11d8b844ee86156cdfcba7f838030a7c2
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -4180,12 +4180,24 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 If loop versioning wasn't done from loop, but scalar_loop instead,
 merge_bb will have already just a single successor.  */
 
-  merge_bb = single_exit (loop_to_version)->dest;
+  /* Due to the single_exit check above we should only get here when
+loop == loop_to_version, that means we can use loop_vinfo to get the
+exits.  */
+  edge exit_edge = single_exit (loop_to_version);
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+   {
+ /* In early exits the main exit will fall into the merge block of the
+alternative exits.  So we need the single successor of the main
+exit here to find the merge block.  */
+ exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
+   }
+  gcc_assert (exit_edge);
+  merge_bb = exit_edge->dest;
   if (EDGE_COUNT (merge_bb->preds) >= 2)
{
  gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
- new_exit_bb = split_edge (single_exit (loop_to_version));
- new_exit_e = single_exit (loop_to_version);
+ new_exit_bb = split_edge (exit_edge);
+ new_exit_e = exit_edge;
  e = EDGE_SUCC (new_exit_bb, 0);
 
  for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi);









[PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch

2023-11-05 Thread Tamar Christina
Hi All,

As requested, the vectorizer is now free to pick its own exit, which can be
different from what the loop CFG infrastructure uses.  The vectorizer makes use
of this to vectorize loops that it previously could not.

But this means that loop control must be materialized in the block that needs it
lest we corrupt the SSA chain.  This makes it so we use the vectorizer's main
IV block instead of the loop infra.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-ssa-loop-manip.cc (standard_iv_increment_position): Conditionally
take dest BB.
* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
(vect_set_loop_condition_partial_vectors_avx512): Likewise.
(vect_set_loop_condition_normal): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 
bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00301423df111cbe7bf7ba8
 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool = false);
 extern basic_block ip_end_pos (class loop *);
 extern basic_block ip_normal_pos (class loop *);
 extern void standard_iv_increment_position (class loop *,
-   gimple_stmt_iterator *, bool *);
+   gimple_stmt_iterator *, bool *,
+   basic_block = NULL);
 extern bool
 gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
   sbitmap, edge, vec<edge> *, int);
diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
index 
e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5dde6c29492ce4d4e4550
 100644
--- a/gcc/tree-ssa-loop-manip.cc
+++ b/gcc/tree-ssa-loop-manip.cc
@@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
 
 /* Stores the standard position for induction variable increment in LOOP
(just before the exit condition if it is available and latch block is empty,
-   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
-   the increment should be inserted after *BSI.  */
+   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
+   basic block is used as the destination instead of the loop latch source
+   block.  INSERT_AFTER is set to true if the increment should be inserted 
after
+   *BSI.  */
 
 void
 standard_iv_increment_position (class loop *loop, gimple_stmt_iterator *bsi,
-   bool *insert_after)
+   bool *insert_after, basic_block dest_bb)
 {
-  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
+  basic_block bb = dest_bb;
+  if (!bb)
+bb = ip_normal_pos (loop);
+  basic_block latch = ip_end_pos (loop);
   gimple *last = last_nondebug_stmt (latch);
 
   if (!bb
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a6465a547d1ea2d3d940373
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  standard_iv_increment_position (loop, &incr_gsi, &insert_after, exit_e->src);
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
 {
   /* Create an IV that counts down from niters_total and whose step
@@ -1017,7 +1018,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
+ exit_edge->src);
   create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
 &incr_gsi, insert_after, &index_before_incr,
 &index_after_incr);
@@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
class loop *loop, tree niters, tree step,
tree final_iv, bool niters_maybe_zero,
gimple_stmt_iterator loop_cond_gsi)
@@ -1278,7 +1280,8 @@ vect_set_loop_condition_normal (loop_vec_info /* 
loop_vinfo */, edge exit_e

[PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-05 Thread Tamar Christina
Hi All,

This changes the PHI node updates to support early breaks.
It has to support both the case where the loop's exit matches the normal loop
exit and one where the early exit is "inverted", i.e. it's an early exit edge.

In the latter case we must always restart the loop for VF iterations.  For an
early exit the reason is obvious, but there are cases where the "normal" exit
is located before the early one.  This exit then does a check on ivtmp resulting
in us leaving the loop since it thinks we're done.

In these cases we may still have side-effects to perform, so we also go to the
scalar loop.
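
A rough, hypothetical source-level shape where this happens (names made up):
the counting exit sits before the early break, so leaving through it mid
vector iteration can still leave this iteration's stores to b[] for the
scalar loop to finish:

  unsigned i = 0;
  while (i < n)           /* "normal" counting exit, located first */
    {
      if (a[i] == key)    /* early break */
        break;
      b[i] = a[i];        /* side-effect that may still be pending */
      i++;
    }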

For the "normal" exit niters has already been adjusted for peeling, for the
early exits we must find out how many iterations we actually did.  So we have
to recalculate the new position for each exit.

This works, however ./gcc/testsuite/gcc.dg/vect/vect-early-break_76.c is
currently giving me a runtime failure, but I cannot seem to tell why.

The generated control looks correct to me, See loop 1:
https://gist.github.com/Mistuke/78b439de05e303ac6de5438dd83f079b

Any help in pointing out the mistake is appreciated.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
(vect_is_loop_exit_latch_pred): Mark inline
(vect_update_ivs_after_vectorizer): Support early break.
(vect_do_peeling): Use it.
(find_guard_arg): Keep the same value.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
58b4b9c11d8b844ee86156cdfcba7f838030a7c2..abd905b78f3661f80168c3866d7c3e68a9c15521
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
class loop *loop, tree niters, tree step,
tree final_iv, bool niters_maybe_zero,
gimple_stmt_iterator loop_cond_gsi)
@@ -1452,7 +1452,7 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge 
to)
When this happens we need to flip the understanding of main and other
exits by peeling and IV updates.  */
 
-bool
+bool inline
 vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
 {
   return single_pred (loop->latch) == loop_exit->src;
@@ -2193,6 +2193,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  Input:
  - LOOP - a loop that is going to be vectorized. The last few iterations
   of LOOP were peeled.
+ - VF   - The chosen vectorization factor for LOOP.
  - NITERS - the number of iterations that LOOP executes (before it is
 vectorized). i.e, the number of times the ivs should be bumped.
  - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
@@ -2203,6 +2204,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
   The phi args associated with the edge UPDATE_E in the bb
   UPDATE_E->dest are updated accordingly.
 
+ - MAIN_EXIT_P - Indicates whether UPDATE_E is what the vectorizer
+considers the main loop exit.
+
  Assumption 1: Like the rest of the vectorizer, this function assumes
  a single loop exit that has a single predecessor.
 
@@ -2220,18 +2224,21 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
- tree niters, edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
+ tree niters, edge update_e, bool main_exit_p)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
+  bool inversed_iv
+   = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+LOOP_VINFO_LOOP (loop_vinfo));
 
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  gcond *cond = get_loop_exit_condition (loop_e);
+  basic_block exit_bb = loop_e->dest;
+  basic_block iv_block = NULL;
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
!gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2241,7 +2248,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info 
loop_vinfo,
   tree step_expr, off;
   tree type;
   tree var, ni, ni_name;
-  gimple_stmt_iterator last_gsi;
 
   gph

[PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-11-05 Thread Tamar Christina
Hi All,

This adds support to vectorizable_live_operation to handle multiple exits by
doing a search for which exit the live value should be materialized in.

Additionally, which value in the index we're after depends on whether the exit
it's materialized in is an early exit or whether the loop's main exit is
different from the loop's natural one (i.e. the one with the same src block as
the latch).

In those two cases we want the first rather than the last value as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and vector since there's only a way to get the last
active element and not the first.
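
A hypothetical example of such a live value (names made up): last is used
after the loop, and because the scalar loop re-runs the breaking iteration,
the vector code wants the value from the start of that iteration (the first
value) rather than the last one:

  int f (int *a, int x, int n)
  {
    int last = 0;
    for (int i = 0; i < n; i++)
      {
        last = a[i];
        if (a[i] > x)
          break;
      }
    return last;   /* live outside the loop */
  }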

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
* tree-vectorizer.h (perm_mask_for_reverse): Expose.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
c123398aad207082384a2079c5234033c3d825ea..55d6aee3d29151e6b528f6fdde15c693e5bdd847
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10503,12 +10503,56 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
   lhs' = new_tree;  */
 
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* A value can only be live in one exit.  So figure out which one.  */
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  /* Check if we have a loop where the chosen exit is not the main exit,
+in these cases for an early break we restart the iteration the vector 
code
+did.  For the live values we want the value at the start of the 
iteration
+rather than at the end.  */
+  bool inverted_ctrl_p = false;
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+   {
+ FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+   if (!is_gimple_debug (use_stmt)
+   && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+ {
+   basic_block use_bb = gimple_bb (use_stmt);
+   for (auto edge : get_loop_exit_edges (loop))
+ {
+   /* Alternative exits can have an intermediate BB in
+  between to update the IV.  In those cases we need to
+  look one block further.  */
+   if (use_bb == edge->dest
+   || (single_succ_p (edge->dest)
+   && use_bb == single_succ (edge->dest)))
+ {
+   exit_e = edge;
+   goto found;
+ }
+ }
+ }
+found:
+ /* If the edge isn't a single pred then split the edge so we have a
+location to place the live operations.  Perhaps we should always
+split during IV updating.  But this way the CFG is cleaner to
+follow.  */
+ inverted_ctrl_p = !vect_is_loop_exit_latch_pred (exit_e, loop);
+ if (!single_pred_p (exit_e->dest))
+   exit_e = single_pred_edge (split_edge (exit_e));
+
+ /* For early exit where the exit is not in the BB that leads to the
+latch then we're restarting the iteration in the scalar loop. So
+get the first live value.  */
+ if (inverted_ctrl_p)
+   bitstart = build_zero_cst (TREE_TYPE (bitstart));
+   }
+
+  basic_block exit_bb = exit_e->dest;
   gcc_assert (single_pred_p (exit_bb));
 
   tree vec_lhs_phi = copy_ssa_name (vec_lhs);
   gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, 
vec_lhs);
+  SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
 
   gimple_seq stmts = NULL;
   tree new_tree;
@@ -10539,6 +10583,12 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
  len, bias_minus_one);
 
+ /* This needs to implement extraction of the first index, but not sure
+how the LEN stuff works.  At the moment we shouldn't get here since
+there's no LEN support for early breaks.  But guard this so there's
+no incorrect codegen.  */
+ gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
  /* SCALAR_RES = VEC_EXTRACT .  */
  tree scalar_res
= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
@@ -10563,8 +10613,37 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
  &LOOP_VINFO_MASKS (loop_vinfo),
  1, vectype, 0);
  gimple_seq_add_seq (&stmts, tem);

[PATCH 10/21]middle-end: implement relevancy analysis support for control flow

2023-11-05 Thread Tamar Christina
Hi All,

This updates relevancy analysis to support marking gconds belonging to early
breaks as relevant for vectorization.
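
A hypothetical loop to show which gcond this marks as relevant: the early-break
condition, as opposed to the loop's own IV exit condition, which is handled as
before:

  for (int i = 0; i < n; i++)   /* IV gcond: not marked by this change */
    {
      if (a[i] == key)          /* early-break gcond: marked vect_used_in_scope */
        break;
      b[i] = a[i];
    }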

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_stmt_relevant_p,
vect_mark_stmts_to_be_vectorized, vect_analyze_stmt, vect_is_simple_use,
vect_get_vector_types_for_stmt): Support early breaks.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
4809b822632279493a843d402a833c9267bb315e..31474e923cc3feb2604ca2882ecfb300cd211679
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,9 +359,14 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-  && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
-*relevant = vect_used_in_scope;
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (is_ctrl_stmt (stmt) && is_a <gcond *> (stmt))
+{
+  gcond *cond = as_a <gcond *> (stmt);
+  if (LOOP_VINFO_LOOP_CONDS (loop_vinfo).contains (cond)
+ && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
+   *relevant = vect_used_in_scope;
+}
 
   /* changing memory.  */
   if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
@@ -374,6 +379,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
*relevant = vect_used_in_scope;
   }
 
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  auto_bitmap exit_bbs;
+  for (edge exit : exits)
+bitmap_set_bit (exit_bbs, exit->dest->index);
+
   /* uses outside the loop.  */
   FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
 {
@@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
  /* We expect all such uses to be in the loop exit phis
 (because of loop closed form)   */
  gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
- gcc_assert (bb == single_exit (loop)->dest);
 
   *live_p = true;
}
@@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info 
loop_vinfo, bool *fatal)
return res;
}
  }
+   }
+ else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+   {
+ enum tree_code rhs_code = gimple_cond_code (cond);
+ gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+ opt_result res
+   = process_use (stmt_vinfo, gimple_cond_lhs (cond),
+  loop_vinfo, relevant, &worklist, false);
+ if (!res)
+   return res;
+ res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+   loop_vinfo, relevant, &worklist, false);
+ if (!res)
+   return res;
 }
  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
{
@@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
 node_instance, cost_vec);
   if (!res)
return res;
-   }
+}
+
+  if (is_ctrl_stmt (stmt_info->stmt))
+STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
 {
   case vect_internal_def:
+  case vect_early_exit_def:
 break;
 
   case vect_reduction_def:
@@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
 {
   gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
   gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+ || gimple_code (stmt_info->stmt) == GIMPLE_COND
  || (call && gimple_call_lhs (call) == NULL_TREE));
   *need_to_vectorize = true;
 }
@@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info 
stmt, slp_tree slp_node,
  else
*op = gimple_op (ass, operand + 1);
}
+  else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+   {
+ gimple_match_op m_op;
+ if (!gimple_extract_op (cond, &m_op))
+   return false;
+ gcc_assert (m_op.code.is_tree_code ());
+ *op = m_op.ops[operand];
+   }
   else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
*op = gimple_call_arg (call, operand);
   else
@@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+  /* Allow vector conditionals through here.  */
+  && !is_ctrl_stmt (stmt)
   /* MASK_STORE has no lhs, but is ok.  */
   && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
 {
@@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
}
 
   return opt_result::failure_at (stmt,
-"not vector

[PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-11-05 Thread Tamar Christina
Hi All,

This implements vectorizable_early_exit, which is used as the codegen part of
vectorizing a gcond.

For the most part it shares the majority of the code with
vectorizable_comparison, with the addition that it needs to be able to reduce
multiple resulting statements into a single one for use in the gcond, and also
needs to be able to perform masking on the comparisons.
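
Roughly, as a hand-written sketch rather than actual vect dump output, the
scalar condition becomes a vector comparison whose mask is then tested against
the zero vector in the single gcond that feeds the branch:

  vect_cmp_1 = vect_a_1 > { x, x, x, x };   /* vector comparison */
  if (vect_cmp_1 != { 0, 0, 0, 0 })         /* single reduced gcond */
    goto early_break;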

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a843d402a833c9267bb315e
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree 
vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
 return false;
@@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree 
vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 rhs1, &vec_oprnds0, vectype,
@@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree 
vectype,
   gimple *new_stmt;
   vec_rhs2 = vec_oprnds1[i];
 
-  new_temp = make_ssa_name (mask);
+  if (lhs)
+   new_temp = make_ssa_name (mask);
+  else
+   new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
   if (bitop1 == NOP_EXPR)
{
  new_stmt = gimple_build_assign (new_temp, code,
@@ -12709,6 +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+gimple_stmt_iterator *gsi, gimple **vec_stmt,
+slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+  || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  tree var_op = op.ops[0];
+
+  /* When vectorizing things like pointer comparisons we will assume that
+ the VF of both operands are the same. e.g. a pointer must be compared
+ to a pointer.  We'll leave this up to vectorizable_comparison_1 to
+ check further.  */
+  tree vectype_op = vectype_out;
+  if (SSA_VAR_P (var_op))
+{
+  stmt_vec_info operand0_info
+   = loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));
+  if (!operand0_info)
+   return false;
+
+  /* If we're in a pattern get the type of the original statement.  */
+  if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+   operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+  vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+}
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+ncopies = 1;
+  else
+ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+{
+  if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target doesn't support flag setting vector "
+  "comparisons.\n");
+ return false;
+   }
+
+  if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+ 

[PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split

2023-11-05 Thread Tamar Christina
Hi All,

This wires through the final bits to support adding the guard block between
the loop and epilog.

For an "inverted loop", i.e. one where an early exit was chosen as the main
exit then we can never skip the scalar loop since we know we have side effects
to still perform.  For those cases we always restart the scalar loop regardless
of which exit is taken.

When we add the guard we also need to update the dominators calculated by the
peeling code as the bypass edge changes the dominators.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Add the guard between the loop
and the epilog for early breaks and update dominators after adding the
guard edge.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
abd905b78f3661f80168c3866d7c3e68a9c15521..eef2bb50c1505f5cf802d5d80300affc2cbe69f6
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3512,11 +3512,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
   edge epilog_e = vect_epilogues ? e : scalar_e;
   edge new_epilog_e = NULL;
-  epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
-  epilog_e, e,
-  &new_epilog_e);
+  auto_vec<basic_block> doms;
+  epilog
+   = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog, epilog_e, e,
+ &new_epilog_e, true, &doms);
+
   LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
   gcc_assert (epilog);
+  gcc_assert (new_epilog_e);
   epilog->force_vectorize = false;
   bb_before_epilog = loop_preheader_edge (epilog)->src;
 
@@ -3610,10 +3613,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
 
}
 
-  if (skip_epilog)
+  if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
{
- guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+ /* For the case where a different exit was chosen we must execute
+the scalar loop with the remaining iterations.  */
+ if (inversed_iv)
+   guard_cond = boolean_false_node;
+ else
+   guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
niters, niters_vector_mult_vf);
+
  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
  guard_to = split_edge (epilog_e);
@@ -3621,11 +3630,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   skip_vector ? anchor : guard_bb,
   prob_epilog.invert (),
   irred_flag);
+ doms.safe_push (guard_to);
  if (vect_epilogues)
epilogue_vinfo->skip_this_loop_edge = guard_e;
  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
  epilog_e);
+
  /* Only need to handle basic block before epilog loop if it's not
 the guard_bb, which is the case when skip_vector is true.  */
  if (guard_bb != bb_before_epilog)
@@ -3637,6 +3648,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  scale_loop_profile (epilog, prob_epilog, -1);
}
 
+  /* Recalculate the dominators after adding the guard edge.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+   iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+
   unsigned HOST_WIDE_INT bound;
   if (bound_scalar.is_constant (&bound))
{




[PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks

2023-11-05 Thread Tamar Christina
Hi All,

This finishes wiring that didn't fit in any of the other patches.
Essentially just adding related changes so peeling for early break works.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
vect_do_peeling): Support early breaks.
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
class loop *loop, tree niters, tree step,
tree final_iv, bool niters_maybe_zero,
gimple_stmt_iterator loop_cond_gsi)
@@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* 
loop_vinfo */, edge exit_edge,
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* Record the number of latch iterations.  */
-  if (limit == niters)
+  if (limit == niters
+  || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
 /* Case A: the loop iterates NITERS times.  Subtract one to get the
latch count.  */
 loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
@@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
 bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
 bound_epilog += 1;
+
+  /* For early breaks the scalar loop needs to execute at most VF times
+ to find the element that caused the break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+{
+  bound_epilog = vf;
+  /* Force a scalar epilogue as we can't vectorize the index finding.  */
+  vect_epilogues = false;
+}
+
   bool epilog_peeling = maybe_ne (bound_epilog, 0U);
   poly_uint64 bound_scalar = bound_epilog;
 
@@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  bound_prolog + bound_epilog)
  : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
 || vect_epilogues));
+
+  /* We only support early break vectorization on known bounds at this time.
+ This means that if the vector loop can't be entered then we won't generate
+ it at all.  So for now force skip_vector off because the additional 
control
+ flow messes with the BB exits and we've already analyzed them.  */
+ skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
+
   /* Epilog loop must be executed if the number of iterations for epilog
  loop is known at compile time, otherwise we need to add a check at
  the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
  || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  || !vf.is_constant ());
-  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
+ loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+  || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
 skip_epilog = false;
 
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info 
loop_vinfo)
 th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
  (loop_vinfo));
 
+  /* When we have multiple exits and VF is unknown, we must require partial
+ vectors because the loop bound is not a minimum but a maximum.  That is to
+ say we cannot unpredicate the main loop unless we peel or use partial
+ vectors in the epilogue.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+  && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+return true;
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
   && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
 {
@@ -3149,7 +3157,8 @@ start_over:
 
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-  || L

[PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-11-05 Thread Tamar Christina
Hi All,

This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
patches are self-contained.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
(vect_transform_loop): Use it.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost 
(loop_vec_info loop_vinfo)
   loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
-
 /* Function vect_analyze_loop_form.
 
Verify that certain CFG restrictions hold, including:
- the loop has a pre-header
-   - the loop has a single entry and exit
+   - the loop has a single entry
+   - nested loops can have only a single exit.
- the loop exit condition is simple enough
- the number of iterations can be analyzed, i.e, a countable loop.  The
  niter could be analyzed under some assumptions.  */
@@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  if (exit_e->flags & EDGE_ABNORMAL)
-return opt_result::failure_at (vect_location,
-  "not vectorized:"
-  " abnormal loop exit edge.\n");
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  for (edge e : exits)
+{
+  if (e->flags & EDGE_ABNORMAL)
+   return opt_result::failure_at (vect_location,
+  "not vectorized:"
+  " abnormal loop exit edge.\n");
+}
 
   info->conds
 = vect_get_loop_niters (loop, exit_e, &info->assumptions,
@@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, 
vec_info_shared *shared,
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
+  /* Check to see if we're vectorizing multiple exits.  */
+  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+= !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+
   if (info->inner_loop_cond)
 {
   stmt_vec_info inner_loop_cond_info
@@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
  versioning.   */
   edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-  if (! single_pred_p (e->dest))
+  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
 {
   split_loop_exit_edge (e, true);
   if (dump_enabled_p ())




[PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg

2023-11-05 Thread Tamar Christina
Hi All,

The vectorizer at the moment uses a num_bb check to check for control flow.
This rejects a number of loops for no good reason.  This patch changes it
to check the destinations of the exits instead.

This also allows early break to work, by dropping the single_exit check.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (slpeel_can_duplicate_loop_p): Drop the
loop->num_nodes check.
* tree-vect-loop.cc (vect_analyze_loop_form): Check the destinations of the
exits instead of the number of basic blocks and drop the single_exit check.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7..466cf4c47154099a33dc63e22d74eef42d282444
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1937,12 +1937,10 @@ slpeel_can_duplicate_loop_p (const class loop *loop, 
const_edge exit_e,
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
-  unsigned int num_bb = loop->inner? 5 : 2;
 
   /* All loops have an outer scope; the only case loop->outer is NULL is for
  the function itself.  */
   if (!loop_outer (loop)
-  || loop->num_nodes != num_bb
   || !empty_block_p (loop->latch)
   || !exit_e
   /* Verify that new loop exit condition can be trivially modified.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
ddb6cad60f2f2cfdc96732f3f256d86e315d7357..27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1727,6 +1727,17 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   "using as main loop exit: %d -> %d [AUX: %p]\n",
   exit_e->src->index, exit_e->dest->index, exit_e->aux);
 
+  /* Check if we have any control flow that doesn't leave the loop.  */
+  class loop *v_loop = loop->inner ? loop->inner : loop;
+  basic_block *bbs= get_loop_body (v_loop);
+  for (unsigned i = 0; i < v_loop->num_nodes; i++)
+if (!empty_block_p (bbs[i])
+   && !loop_exits_from_bb_p (v_loop, bbs[i])
+   && bbs[i]->loop_father == v_loop)
+  return opt_result::failure_at (vect_location,
+"not vectorized:"
+" unsupported control flow in loop.\n");
+
   /* Different restrictions apply when we are considering an inner-most loop,
  vs. an outer (nested) loop.
  (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -1746,11 +1757,6 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
|
 (exit-bb)  */
 
-  if (loop->num_nodes != 2)
-   return opt_result::failure_at (vect_location,
-  "not vectorized:"
-  " control flow in loop.\n");
-
   if (empty_block_p (loop->header))
return opt_result::failure_at (vect_location,
   "not vectorized: empty loop.\n");
@@ -1782,11 +1788,6 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   "not vectorized:"
   " multiple nested loops.\n");
 
-  if (loop->num_nodes != 5)
-   return opt_result::failure_at (vect_location,
-  "not vectorized:"
-  " control flow in loop.\n");
-
   entryedge = loop_preheader_edge (innerloop);
   if (entryedge->src != loop->header
  || !single_exit (innerloop)
@@ -1823,9 +1824,6 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   info->inner_loop_cond = inner.conds[0];
 }
 
-  if (!single_exit (loop))
-return opt_result::failure_at (vect_location,
-  "not vectorized: multiple exits.\n");
   if (EDGE_COUNT (loop->header->preds) != 2)
 return opt_result::failure_at (vect_location,
   "not vectorized:"




[PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging

2023-11-05 Thread Tamar Christina
Hi All,

What do people think about having the ability to force only the latch-connected
exit as the chosen exit, via a param?  I.e. what's in the patch, but as a param.

I found this useful when debugging large example failures as it tells me where
I should be looking.  No hard requirement but just figured I'd ask if we should.
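
A sketch of what the param form could look like, with a made-up param name
(not part of this patch), reusing the forced-exit logic from the patch below:

  /* In vec_init_loop_exit_info, hypothetically gated on a --param instead
     of the #if 0 / #else.  */
  if (param_vect_force_latch_exit /* made-up name */)
    {
      basic_block bb = ip_normal_pos (loop);
      if (!bb)
        return NULL;

      edge exit = EDGE_SUCC (bb, 0);
      if (exit->dest == loop->latch)
        return EDGE_SUCC (bb, 1);
      return exit;
    }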

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
   if (exits.length () == 1)
 return exits[0];
 
+#if 0
   /* If we have multiple exits we only support counting IV at the moment.  
Analyze
  all exits and return one */
   class tree_niter_desc niter_desc;
@@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
 }
 
   return candidate;
+#else
+  basic_block bb = ip_normal_pos (loop);
+  if (!bb)
+return NULL;
+
+  edge exit = EDGE_SUCC (bb, 0);
+  if (exit->dest == loop->latch)
+return EDGE_SUCC (bb, 1);
+  return exit;
+#endif
 }
 
 /* Function bb_in_loop_p




[PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization

2023-11-05 Thread Tamar Christina
Hi All,

I didn't want these to get lost in the noise of updates.

The following three tests now correctly work for targets that have an
implementation of cbranch for vectors, so the XFAILs are conditionally removed,
gated on vect_early_break support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
supported.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 
3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index 
bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break} } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 
c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */




Re: [PATCH] [doc] middle-end/112296 - __builtin_constant_p and side-effects

2023-11-05 Thread Richard Biener
On Fri, 3 Nov 2023, Joseph Myers wrote:

> On Fri, 3 Nov 2023, Richard Biener wrote:
> 
> > The following tries to clarify the __builtin_constant_p documentation,
> > stating that the argument expression is not evaluated and side-effects
> > are discarded.  I'm struggling to find the correct terms matching
> > what the C language standard would call things so I'd appreciate
> > some help here.
> > 
> > OK for trunk?
> 
> OK.

Pushed.

> > Shall we diagnose arguments with side-effects?  It seems to me
> > such use is usually unintended?  I think rather than dropping
> 
> The traditional use is definitely in macros to choose between code for 
> constant arguments (evaluating them more than once) and maybe-out-of-line 
> code for non-constant arguments (evaluating them exactly once), in which 
> case having a side effect is definitely OK.
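
For reference, the macro idiom in question looks roughly like the following
sketch (names made up):

  #define do_op(x) \
    (__builtin_constant_p (x) \
     ? (x) * (x) + 1          /* inline path, may evaluate x twice */ \
     : do_op_outofline (x))   /* out-of-line path, evaluates x exactly once */

Since __builtin_constant_p evaluates to false for anything with side-effects,
the multiple-evaluation path is never taken for such arguments.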

I was wondering about literally writing

 if (__builtin_constant_p (++x))
   {
...
   }
 else
   {
 ...
   }

which would have the surprising effects that a) it always evaluates
to false, b) the side-effect to 'x' is discarded.

> > side-effects as a side-effect of folding the frontend should
> > discard them at parsing time instead, no?
> 
> I suppose the original expression needs to remain around in some form 
> until the latest point at which optimizers might decide to evaluate 
> __builtin_constant_p to true.  Although cases with possible side effects 
> might well be optimized to false earlier; the interesting cases for 
> deciding later are e.g. __builtin_constant_p called on an argument to an 
> inline function (no side effects for __builtin_constant_p to discard, 
> whether or not there are side effects in the caller from evaluating the 
> expression passed to the function).

Yes, maybe we can improve here but as of now arguments with
side-effects will always result in a 'false' assessment as to
__builtin_constant_p, so the behavior is hardly useful apart from
having "correct" behavior for the traditional macro case.

Richard.


[PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-11-05 Thread Tamar Christina
Hi All,

This adds an implementation for conditional branch optab for AArch64.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

cmgt    v1.4s, v1.4s, #0
umaxp   v1.4s, v1.4s, v1.4s
fmov    x3, d1
cbnz    x3, .L8

and for 64-bit vectors we can omit the compression:

cmgt    v1.2s, v1.2s, #0
fmov    x2, d1
cbz     x2, .L13

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch4"
+  [(set (pc)
+(if_then_else
+  (match_operator 0 "aarch64_equality_operator"
+[(match_operand:VDQ_I 1 "register_operand")
+ (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+  (label_ref (match_operand 3 ""))
+  (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+ so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (mode))
+emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
+   operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+{
+  /* Always reduce using a V4SI.  */
+  rtx reduc = gen_lowpart (V4SImode, tmp);
+  rtx res = gen_reg_rtx (V4SImode);
+  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+  emit_move_insn (tmp, gen_lowpart (mode, res));
+}
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 
..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+** cmgtv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** cmgev[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** cmeqv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != 0)
+   break;
+}
+}
+
+/*
+** f5:
+** ...
+** cmltv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] < 0)
+   break;
+}
+}
+
+/*
+** f6:
+** ...
+** cmlev[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+

[PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD

2023-11-05 Thread Tamar Christina
Hi All,

Advanced SIMD lacks a vector != comparison, and unlike a compare with 0 we
can't rewrite it to a cmtst.

This operation is however fairly common, especially now that we support early
break vectorization.

As such this adds a pattern to recognize the negated any comparison and
transform it to an all.  i.e. any(~x) => all(x) and invert the branches.

For e.g.

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeq    v31.4s, v30.4s, v29.4s
not     v31.16b, v31.16b
umaxp   v31.4s, v31.4s, v31.4s
fmov    x5, d31
cbnz    x5, .L2

and after this patch:

cmeq    v31.4s, v30.4s, v29.4s
uminp   v31.4s, v31.4s, v31.4s
fmov    x5, d31
cbz     x5, .L2

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*cbranchnev4si): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+(if_then_else
+  (ne (subreg:DI
+   (unspec:V4SI
+ [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+  (not:V4SI (match_dup 0))]
+   UNSPEC_UMAXV) 0)
+  (const_int 0))
+   (label_ref (match_operand 1 ""))
+   (pc)))
+(clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+   (unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+(if_then_else
+  (eq (subreg:DI (match_dup 2) 0)
+ (const_int 0))
+   (label_ref (match_dup 1))
+   (pc)))]
+{
+  if (can_create_pseudo_p ())
+operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 
..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+   cmeqv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+   uminp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+   fmovx[0-9]+, d[0-9]+
+   cbz x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != x)
+   break;
+}
+}




[PATCH 21/21]Arm: Add MVE cbranch implementation

2023-11-05 Thread Tamar Christina
Hi All,

This adds an implementation for conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0; we are missing the
ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch against 0: when comparing to a zero
predicate we don't need to do an actual comparison, we only have to check
whether any bit is set within P0.

Because we can only do P0 comparisons with 0, the costing of the comparison was
reduced in order for the compiler not to try to push 0 to a register thinking
it's too expensive.  For the cbranch implementation to be safe we must see the
constant 0 vector.

There is no real way to work around the lack of logical OR on P0.  This means MVE
can't support cases where the sizes of operands in the comparison don't match,
i.e. when one operand has been unpacked.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcmp.s32        gt, q3, q1
vmrs    r3, p0  @ movhi
cbnz    r3, .L2
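
In scalar terms the whole test is just a nonzero check on the predicate bits
(a sketch with a made-up helper name; the real expander goes through
gen_lowpart to SImode and reuses the existing scalar cbranch):

#include <stdint.h>

/* VPR.P0 holds the per-lane predicate bits; comparing a whole predicate
   against the zero vector reduces to testing those bits for nonzero, which
   is what the vmrs + cbnz pair above implements.  */
static inline int any_lane_active (uint16_t p0_bits)
{
  return p0_bits != 0;
}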

MVE does not have 64-bit vector comparisons; as such, those are also not supported.

Bootstrapped arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
compares.
* config/arm/mve.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add MVE.
* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, 
enum rtx_code outer_code,
   || TARGET_HAVE_MVE)
  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
*cost = COSTS_N_INSNS (1);
+  else if (TARGET_HAVE_MVE
+  && outer_code == COMPARE
+  && VALID_MVE_PRED_MODE (mode))
+   /* MVE allows very limited instructions on VPT.P0; however, comparisons
+  to 0 do not require us to materialize this constant or require a
+  predicate comparison as we can go through SImode.  For that reason
+  allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+  registers as we can't compare two predicates.  */
+   *cost = COSTS_N_INSNS (1);
   else
*cost = COSTS_N_INSNS (4);
   return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 
74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f
 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:MVE_7 1 "register_operand")
+   (match_operand:MVE_7 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 
..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+** ...
+** vcmp.s32gt, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** vcmp.s32ge, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** vcmp.i32eq, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** ..

[PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and Advanced SIMD

2023-11-05 Thread Tamar Christina
Hi All,

Advanced SIMD lacks flag-setting vector comparisons which SVE adds.  Since
machines with SVE also support Advanced SIMD we can use the SVE comparisons to
perform the operation in cases where SVE codegen is allowed, but the vectorizer
has decided to generate Advanced SIMD because of loop costing.

e.g. for

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeq    v31.4s, v31.4s, v28.4s
uminp   v31.4s, v31.4s, v31.4s
fmov    x5, d31
cbz     x5, .L2

and after this patch:

ptrue   p7.b, vl16
...
cmpne   p15.s, p7/z, z31.s, z28.s
b.any   .L2

Because we need to lift the predicate creation outside of the loop we need to
expand the predicate early; however, in the cbranch expansion we don't see the
outer compare which we need to consume.

For this reason the expansion is twofold: when expanding the cbranch we emit an
SVE predicated comparison and later on during combine we match the SVE and NEON
comparison while also consuming the ptest.
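
The ptrue shown above corresponds to the fixed mask the expander builds with
rtx_vector_builder (16 set bits followed by clear bits); a rough scalar model
of that governing predicate, assuming the Advanced SIMD data lives in the low
128 bits of the SVE register:

/* Hypothetical model of the VL16 predicate: only the first 16 byte lanes
   participate in the SVE compare, whatever the real vector length is.  */
static void build_vl16_pred (unsigned char *pred, unsigned vl_bytes)
{
  for (unsigned i = 0; i < vl_bytes; i++)
    pred[i] = (i < 16) ? 1 : 0;
}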

Unfortunately *aarch64_pred_cmpne_neon_ptest is needed because
for some reason combine destroys the NOT and transforms it into a plus and -1.

For the straight SVE ones, we seem to fail to eliminate the ptest in these cases
but that's a separate optimization.

Tests show that I'm missing a few, but before I write the patterns for them, are
these OK?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch4): Update with SVE.
* config/aarch64/aarch64-sve.md
(*aarch64_pred_cmp_neon_ptest,
*aarch64_pred_cmpeq_neon_ptest,
*aarch64_pred_cmpne_neon_ptest): New.
(aarch64_ptest): Rename to...
(@aarch64_ptest): ... This.
* genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c06dd4fd6f85e07f0d4a77992b2bc06f04a1935b..33799dc35a1b90dd60d7e487ec41c5d84fb215a5
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3883,31 +3883,58 @@ (define_expand "cbranch4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
-
-  /* If comparing against a non-zero vector we have to do a comparison first
- so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (mode))
-emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
-   operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+  /* If SVE is available, let's borrow some instructions.  We will optimize
+ these further later in combine.  */
+  if (TARGET_SVE)
 {
-  /* Always reduce using a V4SI.  */
-  rtx reduc = gen_lowpart (V4SImode, tmp);
-  rtx res = gen_reg_rtx (V4SImode);
-  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-  emit_move_insn (tmp, gen_lowpart (mode, res));
+  machine_mode full_mode = aarch64_full_sve_mode (mode).require ();
+  rtx in1 = lowpart_subreg (full_mode, operands[1], mode);
+  rtx in2 = lowpart_subreg (full_mode, operands[2], mode);
+
+  machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+  rtx_vector_builder builder (VNx16BImode, 16, 2);
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST1_RTX (BImode));
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST0_RTX (BImode));
+  rtx ptrue = force_reg (VNx16BImode, builder.build ());
+  rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+  rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+  rtx tmp = gen_reg_rtx (pred_mode);
+  aarch64_expand_sve_vec_cmp_int (tmp, reverse_condition (code), in1, in2);
+  emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, 
tmp));
+  operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+  operands[2] = const0_rtx;
 }
+  else
+{
+  rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+  /* If comparing against a non-zero vector we have to do a comparison 
first
+so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (mode))
+   emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
+   operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE

[PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation

2023-11-05 Thread Tamar Christina
Hi All,

This adds an implementation for conditional branch optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcgt.s32        q8, q9, #0
vpmax.u32       d7, d16, d17
vpmax.u32       d7, d7, d7
vmov    r3, s14 @ int
cmp     r3, #0

and for 64-bit vectors we can omit one vpmax as we still need to compress to
32 bits.
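
A scalar model of the reduction being emitted (hypothetical helpers, not the
expander code): vpmax.u32 computes pairwise maxima, so one step folds the two
64-bit halves of a 128-bit mask and a second step folds the remaining pair,
while a 64-bit mask only needs that final step.

#include <stdint.h>

static uint32_t umax2 (uint32_t a, uint32_t b)
{
  return a > b ? a : b;
}

/* 128-bit mask: vget_low/vget_high feed one vpmax.u32, then a second
   vpmax.u32 collapses the remaining pair; nonzero means some lane matched.  */
static uint32_t reduce_any_v4 (const uint32_t m[4])
{
  uint32_t lo = umax2 (m[0], m[1]);
  uint32_t hi = umax2 (m[2], m[3]);
  return umax2 (lo, hi);
}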

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add AArch32.
* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract"
   [(set_attr "type" "neon_store1_one_lane,neon_to_gp")]
 )
 
+;; Patterns comparing two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is quite a common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+;;
+;; However, unlike the AArch64 version, we can't optimize this further: the
+;; chain is too long for combine because these are unspecs, so it doesn't fold
+;; the operation into something simpler.
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:VDQI 1 "register_operand")
+   (match_operand:VDQI 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+{
+  /* Compress the 128-bit mask to 64 bits with a pairwise max of its halves.  */
+  mask = gen_reg_rtx (V2SImode);
+  rtx low = gen_reg_rtx (V2SImode);
+  rtx high = gen_reg_rtx (V2SImode);
+  emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+  emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+  emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+}
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract" to
 ;; "neon_vec_extract" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 
..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/* f1:
+** ...
+** vcgt.s32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** vcge.s32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** vceq.i32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** vceq.i32q[0-9]+, q[0-9]+, #0
+** vmvnq[0-9]+, q[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...

Re: [RFC] vect: disable multiple calls of poly simdclones

2023-11-05 Thread Richard Biener
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:

> Hi,
> 
> The current codegen code to support VF's that are multiples of a simdclone
> simdlen rely on BIT_FIELD_REF to create multiple input vectors.  This does not
> work for non-constant simdclones, so we should disable using such clones when
> the VF is a multiple of the non-constant simdlen until we change the codegen
> to support those.
> 
> Enabling SVE simdclone support will cause ICEs if the vectorizer decides to
> use a SVE simdclone with a VF that is larger than the simdlen. I'll be away
> for the next two weeks, so can't really discuss this further.
> I initially tried to solve the problem, but the way
> vectorizable_simd_clone_call is structured doesn't make it easy to replace
> BIT_FIELD_REF with the poly-suitable solution right now of using
> unpack_{hi,lo}.

I think it should be straight-forward to use unpack_{even,odd} (it's
even/odd for VLA, right?  If lo/hi would be possible then doing
BIT_FIELD_REF would be, too?  Also you need to have multiple stages
of unpack/pack when the factor is more than 2).
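
For illustration, a scalar sketch of the even/odd de-interleave being suggested
(hypothetical helper, not vectorizer code): one application splits a 2*n-element
input into two n-element vectors, and applying it again to each half covers a
factor-of-4 VF-to-simdlen ratio.

/* Models an even/odd split; for factor 4, call it on the input and then on
   each of the two results.  */
static void split_even_odd (const int *in, int *even, int *odd, int n)
{
  for (int i = 0; i < n; i++)
    {
      even[i] = in[2 * i];
      odd[i]  = in[2 * i + 1];
    }
}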

There's plenty of time even during stage3 to address this.

At least your patch should have come with a testcase (or two).

Is there a bugreport tracking this issue?  It should affect GCN as well
I guess.

Richard.


> Unfortunately I only found this now as I was adding further
> tests for SVE :(
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones
>   with non-constant simdlen when VF is not exactly the same.


[PATCH] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2023-11-05 Thread Xi Ruoyao
As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.

To work around this issue, disable relaxation by default if the
assembler is detected incapable to perform conditional branch relaxation
at GCC build time.  We also need to pass -mno-relax to the assembler to
really disable relaxation.  But, if the assembler does not support
-mrelax option at all, we should not pass -mno-relax to the assembler or
it will immediately error out.  Also handle this with the build time
assembler capability probing, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.
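
The intended driver behaviour can be summarised with a small sketch
(hypothetical helper, not the actual spec string):

/* Only forward a relax option when -m[no-]pass-mrelax-to-as allows it; an
   assembler without -mrelax support must never see -mno-relax either, since
   it would error out immediately.  */
static const char *
relax_option_for_assembler (int pass_mrelax_to_as, int relax_enabled)
{
  if (!pass_mrelax_to_as)
    return "";
  return relax_enabled ? "-mrelax" : "-mno-relax";
}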

With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default.  So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.

gcc/ChangeLog:

PR target/112330
* config/loongarch/genopts/loongarch.opt.in: Add
-m[no-]pass-mrelax-to-as.  Change the default of -m[no-]relax to
account conditional branch relaxation support status.
* config/loongarch/loongarch.opt: Regenerate.
* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
the assembler supports conditional branch relaxation.
* configure: Regenerate.
* config.in: Regenerate.
* config/loongarch/loongarch-opts.h
(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
Define.
(ASM_MRELAX_SPEC): Define.
(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
* config/loongarch/loongarch.cc: Take the setting of
-m[no-]relax into account when determining the default of
-mexplicit-relocs=.
* doc/invoke.texi: Document -m[no-]relax and
-m[no-]pass-mrelax-to-as for LoongArch.  Update the default
value of -mexplicit-relocs=.
---

Bootstrapped and regtested on loongarch64-linux-gnu twice: once with
Binutils 2.41, another with Binutils 2.41.50.20231105.  With Binutils
2.41.50.20231105 there is a regression: the compilation of
c-c++-common/asan/pr59063-2.c times out.  My diagnosis has shown that
the timeout was caused by the linker (it seemed to run indefinitely),
so it's more likely a Binutils regression than a GCC regression,
and I'll leave this for Qinggang.

Ok for trunk?

 gcc/config.in |  6 +++
 gcc/config/loongarch/genopts/loongarch.opt.in |  6 ++-
 gcc/config/loongarch/loongarch-driver.h   | 16 +++-
 gcc/config/loongarch/loongarch-opts.h |  4 ++
 gcc/config/loongarch/loongarch.cc |  2 +-
 gcc/config/loongarch/loongarch.opt|  6 ++-
 gcc/configure | 39 ++-
 gcc/configure.ac  | 10 +
 gcc/doc/invoke.texi   | 36 +
 9 files changed, 111 insertions(+), 14 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 03faee1c6ac..7728e53ca1f 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -386,6 +386,12 @@
 #endif
 
 
+/* Define if your assembler supports conditional branch relaxation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_COND_BRANCH_RELAXATION
+#endif
+
+
 /* Define if your assembler supports the --debug-prefix-map option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_DEBUG_PREFIX_MAP
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index e1fe0c7086e..158701d327a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -223,10 +223,14 @@ Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
 
 mrelax
-Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION && 
HAVE_AS_COND_BRANCH_RELAXATION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
 
+mpass-mrelax-to-as
+Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
+Pass -mrelax or -mno-relax option to the assembler.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) 
IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-driver.h 
b/gcc/config/loongarch/loongarch-driver.h
index d859afcc9fe..20d233cc938 100644
--- a/gcc/config/loongarch/loongarch-driver.h
+++ b/gcc/config/loongarch/loo

[PATCH] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst

2023-11-05 Thread Xi Ruoyao
fld and fst have same address mode as ld.w and st.w, so the same
optimization as r14-4851 should be applied for them too.
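
For illustration, a single-use FP access through a pcrel symbol is the kind of
case the widened iterators now catch (a hypothetical test, not from the patch);
with -mexplicit-relocs=auto the address should now fold into the fld itself,
just as r14-4851 already arranged for ld.w.

/* Expected to become a pcalau12i followed directly by an fld.d using the
   low part of the symbol address, with no separate address addition.  */
static double g;

double
load_g (void)
{
  return g;
}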

gcc/ChangeLog:

* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
iterator.
(ST_ANY): New mode iterator.
(define_peephole2): Use LD_AT_LEAST_32_BIT instead of GPR and
ST_ANY instead of QHWD for applicable patterns.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 46 +--
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 4dd716e1941..9c247242215 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -400,6 +400,22 @@ (define_mode_iterator SPLITF
(DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
(TF "TARGET_64BIT && TARGET_DOUBLE_FLOAT")])
 
+;; A mode for anything with 32 bits or more, and able to be loaded with
+;; the same addressing mode as ld.w.
+(define_mode_iterator LD_AT_LEAST_32_BIT
+  [SI
+   (DI "TARGET_64BIT")
+   (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+   (DF "TARGET_DOUBLE_FLOAT")])
+
+;; A mode for anything able to be stored with the same addressing mode as
+;; st.w.
+(define_mode_iterator ST_ANY
+  [QI HI SI
+   (DI "TARGET_64BIT")
+   (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+   (DF "TARGET_DOUBLE_FLOAT")])
+
 ;; In GPR templates, a string like "mul." will expand to "mul.w" in the
 ;; 32-bit version and "mul.d" in the 64-bit version.
 (define_mode_attr d [(SI "w") (DI "d")])
@@ -3785,13 +3801,14 @@ (define_insn "loongarch_crcc_w__w"
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:GPR 2 "register_operand")
-   (mem:GPR (match_dup 0)))]
+   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
+   (mem:LD_AT_LEAST_32_BIT (match_dup 0)))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0]) \
|| REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1]
+  [(set (match_dup 2)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1]
   {
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
   })
@@ -3799,14 +3816,15 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:GPR 2 "register_operand")
-   (mem:GPR (plus (match_dup 0)
-  (match_operand 3 "const_int_operand"]
+   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
+   (mem:LD_AT_LEAST_32_BIT (plus (match_dup 0)
+   (match_operand 3 "const_int_operand"]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0]) \
|| REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1]
+  [(set (match_dup 2)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1]
   {
 operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
@@ -3850,13 +3868,13 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (mem:QHWD (match_dup 0))
-   (match_operand:QHWD 2 "register_operand"))]
+   (set (mem:ST_ANY (match_dup 0))
+   (match_operand:ST_ANY 2 "register_operand"))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0])) \
&& REGNO (operands[0]) != REGNO (operands[2])"
-  [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
+  [(set (mem:ST_ANY (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
   {
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
   })
@@ -3864,14 +3882,14 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (mem:QHWD (plus (match_dup 0)
-   (match_operand 3 "const_int_operand")))
-   (match_operand:QHWD 2 "register_operand"))]
+   (set (mem:ST_ANY (plus (match_dup 0)
+ (match_operand 3 "const_int_operand")))
+   (match_operand:ST_ANY 2 "register_operand"))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0])) \
&& REGNO (operands[0]) != REGNO (operands[2])"
-  [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (ma

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-11-05 Thread Iain Sandoe
Hi Richard,

> On 5 Nov 2023, at 12:11, Richard Sandiford  wrote:
> 
> Iain Sandoe  writes:

>>> On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
>> 
 On 26 Oct 2023, at 20:49, Richard Sandiford 
>> wrote:
 
 Iain Sandoe  writes:
> This was written before Thomas' modification to the ELF-handling to allow
> a config-based change for target details.  I did consider updating this
> to try and use that scheme, but I think that it would sit a little
> awkwardly, since there are some differences in the start-up scanning for
> Mach-O.  I would say that in all probability we could improve things but
> I'd like to put this forward as a well-tested initial implementation.
 
 Sorry, I would prefer to extend the existing function instead.
 E.g. there's already some divergence between the Mach-O version
 and the default version, in that the Mach-O version doesn't print
 verbose messages.  I also don't think that the current default code
 is so watertight that it'll never need to be updated in future.
>>> 
>>> Fair enough, will explore what can be done (as I recall last I looked the
>>> primary difference was in the initial start-up scan).
>> 
>> I’ve done this as attached.
>> 
>> For the record, when doing it, it gave rise to the same misgivings that led
>> to the separate implementation before.
>> 
>> * as we add formats and uncover asm oddities, they all need to be handled
>>   in one set of code, IMO it could become quite convoluted.
>> 
>> * now making a change to the MACH-O code, means I have to check I did not
>>   inadvertently break ELF (and likewise, in theory, an ELF change should 
>> check
>>   MACH-O, but many folks do/can not do that).
>> 
>> Maybe there’s some half-way-house where code can usefully be shared without
>> those down-sides.
>> 
>> Anyway, to make progress, is the revised version OK for trunk? (tested on
>> aarch64-linux and aarch64-darwin).
> 
> Sorry for the slow reply.  I was hoping we'd be able to share a bit more
> code than that, and avoid an isMACHO toggle.  Does something like the
> attached adaption of your patch work?  Only spot-checked on
> aarch64-linux-gnu so far.
> 
> (The patch tries to avoid capturing the user label prefix, hopefully
> avoiding the needsULP thing.)

Yes, this works for me too for Arm64 Darwin (and probably is fine for other
Darwin archs in case we implement body tests there).  If we decide to emit
some comment-based markers to delineate functions without unwind data,
we can just amend the start and end.

thanks,
Iain
(doing some wider testing, but for now the only mach-o cases are in the
 arm64 code, so the fact that those passed so far is a pretty good indication).

-

As an aside, what’s the intention for cases like this?

.data
foo:
. ….
.size foo, .-foo



> 
> Thanks,
> Richard
> 
> 
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 5df80325dff..2434550f0c3 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -785,23 +785,34 @@ proc configure_check-function-bodies { config } {
> 
> # Regexp for the start of a function definition (name in \1).
> if { [istarget nvptx*-*-*] } {
> - set up_config(start) {^// BEGIN(?: GLOBAL|) FUNCTION DEF: 
> ([a-zA-Z_]\S+)$}
> + set up_config(start) {
> + {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
> + }
> +} elseif { [istarget *-*-darwin*] } {
> + set up_config(start) {
> + {^_([a-zA-Z_]\S+):$}
> + {^LFB[0-9]+:}
> + }
> } else {
> - set up_config(start) {^([a-zA-Z_]\S+):$}
> + set up_config(start) {{^([a-zA-Z_]\S+):$}}
> }
> 
> # Regexp for the end of a function definition.
> if { [istarget nvptx*-*-*] } {
>   set up_config(end) {^\}$}
> +} elseif { [istarget *-*-darwin*] } {
> + set up_config(end) {^LFE[0-9]+}
> } else {
>   set up_config(end) {^\s*\.size}
> }
> - 
> +
> # Regexp for lines that aren't interesting.
> if { [istarget nvptx*-*-*] } {
>   # Skip lines beginning with '//' comments ('-fverbose-asm', for
>   # example).
>   set up_config(fluff) {^\s*(?://)}
> +} elseif { [istarget *-*-darwin*] } {
> + set up_config(fluff) {^\s*(?:\.|//|@)|^L[0-9ACESV]}
> } else {
>   # Skip lines beginning with labels ('.L[...]:') or other directives
>   # ('.align', '.cfi_startproc', '.quad [...]', '.text', etc.), '//' or
> @@ -833,9 +844,19 @@ proc parse_function_bodies { config filename result } {
> set fd [open $filename r]
> set in_function 0
> while { [gets $fd line] >= 0 } {
> - if { [regexp $up_config(start) $line dummy function_name] } {
> - set in_function 1
> - set function_body ""
> + if { $in_function == 0 } {
> + if { [regexp [lindex $up_config(start) 0] \
> +  $line dummy function_name] } {
> + set in_function 1
> + set functi