[PATCH] testsuite work-around compound-assignment-1.c C++ failures on various targets [PR111377]

2023-09-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 11, 2023 at 11:11:30PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Sep 11, 2023 at 07:27:57PM +0200, Benjamin Priour via Gcc-patches wrote:
> > Thanks for the report,
> > 
> > After investigation it seems the location of the new dejagnu directive for
> > C++ differs depending on the configuration.
> > The expected warning is still emitted, but its location differs slightly.
> > I expect this is not an issue with the analyzer per se, but a divergence in
> > the FE between the two configurations.
> 
> I think the divergence is whether called_by_test_5b returns the struct
> in registers or in memory.  If in memory (like in the x86_64 -m32 case), we have
>   [compound-assignment-1.c:71:21] D.3191 = called_by_test_5b (); [return slot optimization]
>   [compound-assignment-1.c:71:21 discrim 1] D.3191 ={v} {CLOBBER(eol)};
>   [compound-assignment-1.c:72:1] return;
> in the IL, while if in registers (like x86_64 -m64 case), just
>   [compound-assignment-1.c:71:21] D.3591 = called_by_test_5b ();
>   [compound-assignment-1.c:72:1] return;
> 
> If you just want to avoid the differences, putting } on the same line as the
> call might be a usable workaround for that.

Here is the workaround in patch form.  Tested on x86_64-linux -m32/-m64, ok
for trunk?

2023-09-12  Jakub Jelinek  

PR testsuite/111377
* c-c++-common/analyzer/compound-assignment-1.c (test_5b): Move
closing } to the same line as the call to work around differences in
the diagnostic line.

--- gcc/testsuite/c-c++-common/analyzer/compound-assignment-1.c.jj	2023-09-11 11:05:47.523727789 +0200
+++ gcc/testsuite/c-c++-common/analyzer/compound-assignment-1.c	2023-09-12 08:58:52.854231161 +0200
@@ -68,5 +68,8 @@ called_by_test_5b (void)
 
 void test_5b (void)
 {
-  called_by_test_5b ();
-} /* { dg-warning "leak of '.ptr_wrapper::ptr'" "" { target c++ } } */
+  called_by_test_5b (); }
+/* { dg-warning "leak of '.ptr_wrapper::ptr'" "" { target c++ } .-1 } */
+/* The closing } above is intentionally on the same line as the call, because
+   otherwise the exact line of the diagnostics depends on whether the
+   called_by_test_5b () call satisfies aggregate_value_p or not.  */


Jakub



Re: [Patch] OpenMP (C only): omp allocate - extend parsing support, improve diagnostic

2023-09-12 Thread Tobias Burnus

Hi Jakub,

thanks for the further suggestions; updated patch attached.

On 11.09.23 15:34, Jakub Jelinek wrote:

On Mon, Sep 11, 2023 at 03:21:54PM +0200, Tobias Burnus wrote:

+  if (TREE_STATIC (var))
+{
+  if (allocator == NULL_TREE && allocator_loc == UNKNOWN_LOCATION)
+error_at (loc, "% clause required for "
+   "static variable %qD", var);
+  else if (allocator
+   && (tree_int_cst_sgn (allocator) != 1
+   || tree_to_shwi (allocator) > 8))

Has anything checked that in this case allocator is actually INTEGER_CST
which fits into shwi?  Otherwise tree_to_shwi will ICE.
Consider say allocator argument of
329857234985743598347598437598347594835743895743wb
or (((unsigned __int128) 0x123456789abcdef0) << 64)
Either tree_fits_shwi_p (allocator) check would do it, or perhaps
else if (allocator
   && TREE_CODE (allocator) == INTEGER_CST
   && wi::to_widest (allocator) > 0
   && wi::to_widest (allocator) <= 8)
?

I have now used the latter. Using _BitInt and __int128 fails because the
type check fails. I have not added them to the testsuite, as not all
targets support the two. Using a casted -1 did ICE before I switched to
wi::to_widest.
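Why the range check has to be done at full precision can be shown outside GCC. The following is a minimal C++ sketch, not GCC code: GCC's `__int128` extension stands in for a wide integer constant, the helper names are hypothetical, and the accepted range 1..8 is taken from the check quoted above (presumably libgomp's predefined allocator handles, omp_default_mem_alloc through omp_thread_mem_alloc). Inside GCC, tree_to_shwi ICEs on a value that does not fit; the truncating variant shows the other hazard of narrowing before comparing, namely that an out-of-range value can land inside the accepted range.

```cpp
#include <cstdint>

// Hypothetical helpers, not GCC internals.  __int128 (a GCC extension on
// 64-bit targets) stands in for an arbitrarily wide integer constant.
typedef __int128 wide_int_t;

// Narrow first, then compare: the conversion drops the high bits (GCC
// defines it as modular), so a huge value can look like a small one.
constexpr bool
allocator_in_range_truncating (wide_int_t v)
{
  return static_cast<std::int64_t> (v) > 0
         && static_cast<std::int64_t> (v) <= 8;
}

// Compare at full precision, in the spirit of wi::to_widest: nothing
// outside 1..8 is accepted, whatever the width of the value.
constexpr bool
allocator_in_range_widest (wide_int_t v)
{
  return v > 0 && v <= 8;
}
```

For 2^64 + 3 the truncating check accepts the value (it narrows to 3), while the full-precision check rejects it.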

+  error_at (OMP_CLAUSE_LOCATION (nl),
+"allocator variable %qD must be declared before %qD",
+allocator, var);

...

+ error_at (EXPR_LOCATION (*l),
+   "allocator variable %qD, used in the "
+   "% directive for %qD, must not be "
+   "modified between declaration of %qD and its "
+   "% directive",
+   allocator, var, var);

BTW, it doesn't necessarily have to be just the simple case which you catch
here, namely that allocator is a VAR_DECL defined after var in current
scope.

...

I bet we can't catch everything, but perhaps e.g. doing the first
diagnostics from within walk_tree might be better.


Done now. What's not caught is, e.g., a value change by calling a
function which modifies its parameter:

omp_allocator_t a = ...; int v; foo(a); #pragma omp allocate(v) allocator(a)

as the current check is only whether 'a' is declared before 'v' or
whether 'a' is assigned to between v's declaration and the pragma.

Any additional comments or suggestions?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP (C only): omp allocate - extend parsing support, improve diagnostic

The 'allocate' directive can be used for both stack and static variables.
While the parser in C and C++ was pre-existing, it missed several
diagnostics, which this commit adds - for now only for C.

While the "sorry, unimplemented" for static variables is still issued
during parsing, the sorry for stack variables is now issued in the
middle end, preparing for the actual implementation. (Again: only for C.)

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_construct): Move call to
	c_parser_omp_allocate to ...
	(c_parser_pragma): ... here.
	(c_parser_omp_allocate): Avoid ICE if allocator could not be
	parsed; set 'omp allocate' attribute for stack/automatic variables
	and only reject static variables; add several additional
	restriction checks.
	* c-tree.h (c_mark_decl_jump_unsafe_in_current_scope): New prototype.
	* c-decl.cc (decl_jump_unsafe): Return true for omp-allocated decls.
	(c_mark_decl_jump_unsafe_in_current_scope): New.
	(warn_about_goto, c_check_switch_jump_warnings): Add error for
	omp-allocated decls.

gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Check for
	insertion after variable cleanup.  Convert 'omp allocate'
	var-decl attribute to GOMP_alloc/GOMP_free calls.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-5.c: Fix testcase; make some
	dg-messages for 'sorry' c++ only.
	* c-c++-common/gomp/directive-1.c: Make a 'sorry' c++ only.
	* c-c++-common/gomp/allocate-9.c: New test.
	* c-c++-common/gomp/allocate-11.c: New test.
	* c-c++-common/gomp/allocate-12.c: New test.
	* c-c++-common/gomp/allocate-14.c: New test.
	* c-c++-common/gomp/allocate-15.c: New test.
	* c-c++-common/gomp/allocate-16.c: New test.

 gcc/c/c-decl.cc   |  26 ++
 gcc/c/c-parser.cc | 115 +++---
 gcc/c/c-tree.h|   1 +
 gcc/gimplify.cc   |  40 +
 gcc/testsuite/c-c++-common/gomp/allocate-11.c |  40 +
 gcc/testsuite/c-c++-common/gomp/allocate-12.c |  49 +++
 gcc/testsuite/c-c++-common/gomp/allocate-14.c |  26 ++
 gcc/testsuite/c-c++-common/gomp/allocate-15.c |  28 +++
 gcc/testsuite/c-c++-common/gomp/allocate-16.c |  38 +
 gcc/testsuite/c-c++-common/gomp

Re: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions

2023-09-12 Thread Maxim Kuvyrkov via Gcc-patches
Hi Everyone,

Normally, notifications from Linaro TCWG precommit CI are sent only to patch 
author and patch submitter.  In this case the sender was rewritten to "Benjamin 
Priour via Gcc-patches ", which was detected by 
Patchwork [1] as patch submitter.

Hi Mark,

Is "From:" re-write on gcc-patches@ mailing list a side-effect of [2]?  I see 
that some, but not all messages to gcc-patches@ have their "From:" re-written.

Also, do you know if re-write of "From:" on gcc-patches@ is expected?

[1] https://patchwork.sourceware.org/project/gcc/list/
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=29713

Thanks!

--
Maxim Kuvyrkov
https://www.linaro.org

> On Sep 12, 2023, at 02:58, ci_not...@linaro.org wrote:
> 
> Dear contributor, our automatic CI has detected problems related to your 
> patch(es).  Please find some details below.  If you have any questions, 
> please follow up on linaro-toolch...@lists.linaro.org mailing list, Libera's 
> #linaro-tcwg channel, or ping your favourite Linaro toolchain developer on 
> the usual project channel.
> 
> In CI config tcwg_gcc_check/master-aarch64 after:
> 
>  | gcc patch https://patchwork.sourceware.org/patch/75674
>  | Author: benjamin priour 
>  | Date:   Mon Sep 11 19:44:34 2023 +0200
>  | 
>  | analyzer: Move gcc.dg/analyzer tests to c-c++-common (3) [PR96395]
>  | 
>  | Hi,
>  | 
>  | Patch below is mostly done, just have to check the formatting.
>  | However, I'd like your feedback on how to manage named_constants
>  | from enum in C++.
>  | ... 21 lines of the commit log omitted.
>  | ... applied on top of baseline commit:
>  | 048927ed856 i386: Handle CONST_WIDE_INT in output_pic_addr_const [PR111340]
> 
> FAIL: 68 regressions
> 
> regressions.sum:
> === g++ tests ===
> 
> Running g++:g++.dg/analyzer/analyzer.exp ...
> FAIL: c-c++-common/analyzer/call-summaries-2.c -std=c++14 (test for excess errors)
> FAIL: c-c++-common/analyzer/call-summaries-2.c -std=c++17 (test for excess errors)
> FAIL: c-c++-common/analyzer/call-summaries-2.c -std=c++20 (test for excess errors)
> FAIL: c-c++-common/analyzer/call-summaries-2.c -std=c++98 (test for excess errors)
> FAIL: c-c++-common/analyzer/memcpy-1.c -std=c++14  (test for warnings, line 25)
> FAIL: c-c++-common/analyzer/memcpy-1.c -std=c++14  (test for warnings, line 48)
> FAIL: c-c++-common/analyzer/memcpy-1.c -std=c++14 (test for excess errors)
> ... and 63 more entries
> 
> You can find the failure logs in *.log.1.xz files in
> - https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/2244/artifact/artifacts/artifacts.precommit/00-sumfiles/ .
> The full lists of regressions and progressions are in
> - https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/2244/artifact/artifacts/artifacts.precommit/notify/ .
> The list of [ignored] baseline and flaky failures is in
> - https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/2244/artifact/artifacts/artifacts.precommit/sumfiles/xfails.xfail .
> 
> 
> 
> -8<--8<--8<--
> The information below can be used to reproduce a debug environment:
> 
> Current build   : https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/2244/artifact/artifacts
> Reference build : https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/927/artifact/artifacts




Re: [Patch] OpenMP (C only): omp allocate - extend parsing support, improve diagnostic

2023-09-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 12, 2023 at 09:04:16AM +0200, Tobias Burnus wrote:
> Done now. What's not caught is, e.g., a value change by calling a
> function which modifies its parameter:
> 
> omp_allocator_t a = ...; int v; foo(a); #pragma omp allocate(v) allocator(a)
> 
> as the current check is only whether 'a' is declared before 'v' or
> whether 'a' is assigned to between v's declaration and the pragma.
> 
> Any additional comments or suggestions?

As I said, we can't catch all the mistakes, the unfortunate thing is that
the syntax allows them.  I'll try to get the omp::decl attribute working soon
and that will make that problem less severe when using that syntax.

Jakub



[PING ^0] [PATCH] rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

2023-09-12 Thread Ajit Agarwal via Gcc-patches


Ping!

 Forwarded Message 
Subject: [PATCH] rs6000: unnecessary clear after vctzlsbb in 
vec_first_match_or_eos_index
Date: Thu, 31 Aug 2023 16:14:46 +0530
From: Ajit Agarwal via Gcc-patches 
Reply-To: Ajit Agarwal 
To: gcc-patches 
CC: Peter Bergner , Segher Boessenkool 



This patch removes zero extension from vctzlsbb as it already zero extends.
Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

For the rs6000 target we don't need a zero_extend after vctzlsbb, as
vctzlsbb already zero-extends its result.

2023-08-31  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/vsx.md: Add new pattern.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/altivec-19.C: New testcase.
---
 gcc/config/rs6000/vsx.md  | 17 ++---
 gcc/testsuite/g++.target/powerpc/altivec-19.C | 11 +++
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/altivec-19.C

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 19abfeb565a..09d21a6d00a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5846,11 +5846,22 @@
   [(set_attr "type" "vecsimple")])
 
 ;; Vector Count Trailing Zero Least-Significant Bits Byte
-(define_insn "vctzlsbb_"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+(define_insn "vctzlsbbzext_"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
(unspec:SI
 [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
-UNSPEC_VCTZLSBB))]
+UNSPEC_VCTZLSBB)))]
+  "TARGET_P9_VECTOR"
+  "vctzlsbb %0,%1"
+  [(set_attr "type" "vecsimple")])
+
+;; Vector Count Trailing Zero Least-Significant Bits Byte
+(define_insn "vctzlsbb_"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI
+ [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
+ UNSPEC_VCTZLSBB))]
   "TARGET_P9_VECTOR"
   "vctzlsbb %0,%1"
   [(set_attr "type" "vecsimple")])
diff --git a/gcc/testsuite/g++.target/powerpc/altivec-19.C b/gcc/testsuite/g++.target/powerpc/altivec-19.C
new file mode 100644
index 000..2d630b2fc1f
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/altivec-19.C
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 " } */ 
+
+#include 
+
+unsigned int foo (vector unsigned char a, vector unsigned char b) {
+  return vec_first_match_or_eos_index (a, b);
+}
+/* { dg-final { scan-assembler-not "rldicl" } } */
-- 
2.39.3
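The patch relies on vec_first_match_or_eos_index producing a small non-negative count. A scalar C++ model, written from our reading of the intrinsic's documented semantics for 16 unsigned-char elements (an assumption, not the rs6000 implementation), makes the bound explicit: the index is at most 16, so every bit above bit 4 of the 64-bit GPR is already zero, and an extra clear such as the rldicl the new test scans for cannot change the value. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: index of the first element where the two
// inputs are equal or either one is zero (end of string); 16 if none.
inline unsigned
first_match_or_eos_index_model (const unsigned char (&a)[16],
                                const unsigned char (&b)[16])
{
  for (unsigned i = 0; i < 16; ++i)
    if (a[i] == b[i] || a[i] == 0 || b[i] == 0)
      return i;
  return 16;
}

// Hypothetical model of the value held in the 64-bit result register.
inline std::uint64_t
result_as_gpr (unsigned idx)
{
  // The index never exceeds the element count, so it fits in 5 bits.
  assert (idx <= 16);
  std::uint64_t gpr = idx;
  // Zero-extending from SImode (clearing the upper 32 bits) is a no-op.
  assert ((gpr & 0xFFFFFFFFu) == gpr);
  return gpr;
}
```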



[PING ^0][PATCH 3/4] Improve functionality of ree pass.

2023-09-12 Thread Ajit Agarwal via Gcc-patches


Ping!

 Forwarded Message 
Subject: [PATCH 3/4] Improve functionality of ree pass.
Date: Mon, 4 Sep 2023 13:27:42 +0530
From: Ajit Agarwal via Gcc-patches 
Reply-To: Ajit Agarwal 
To: Jeff Law , gcc-patches 
CC: Peter Bergner , Segher Boessenkool 



Hello Jeff:

This patch eliminates redundant zero and sign extensions in the ree pass for the rs6000
target.

Bootstrapped and regtested for powerpc64-linux-gnu.

Thanks & Regards
Ajit


ree: Improve ree pass

For the rs6000 target we see redundant zero and sign extensions, and the ree pass
is improved to eliminate them. Support is added for zero_extend, sign_extend
and AND.
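The new rtx_is_zext_p helper in this patch treats (and:M (reg) (const_int X)) as a zero extension only for a fixed list of masks. The property those masks share can be stated generally: if no bit of the mask lies at or above bit w, the AND result already fits in w bits, so zero-extending it from a w-bit mode changes nothing. The C++ below is a hedged, stand-alone illustration of that rule on 64-bit values, with hypothetical helper names; it is not what the patch implements.

```cpp
#include <cstdint>

// Hypothetical helper: true if (x & mask) can never have a bit set at or
// above position BITS, i.e. the AND already acts as a zero extension from
// a BITS-wide mode.  BITS is expected to be in 1..63.
constexpr bool
and_mask_implies_zext (std::uint64_t mask, unsigned bits)
{
  return (mask >> bits) == 0;
}

// Zero extension from the low BITS bits (1..63), modeled on a 64-bit value.
constexpr std::uint64_t
zext_low_bits (std::uint64_t x, unsigned bits)
{
  return x & ((std::uint64_t (1) << bits) - 1);
}
```

When and_mask_implies_zext (M, w) holds, zext_low_bits (x & M, w) equals x & M for every x, which is exactly why the later extension is redundant.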

2023-09-04  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(rtx_is_zext_p): New function.
(feasible_cfg): New function.
* rtl.h (reg_used_set_between_p): Add prototype.
* rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim.C: New testcase.
* g++.target/powerpc/zext-elim-1.C: New testcase.
* g++.target/powerpc/zext-elim-2.C: New testcase.
* g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/ree.cc| 487 --
 gcc/rtl.h |   1 +
 gcc/rtlanal.cc|  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  17 +
 .../g++.target/powerpc/zext-elim-1.C  |  19 +
 .../g++.target/powerpc/zext-elim-2.C  |  11 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
 7 files changed, 534 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..931b9b08821 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,77 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+{
+  rtx set = XEXP (insn, 0);
+  if (REG_P (set))
+   {
+ rtx src = XEXP (insn, 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 0x007F)
+ || (m_mode == HImode && INTVAL (src) == 0x7FFF)
+ || (m_mode == SImode && INTVAL (src) == 0x007F)))
+   return true;
+
+   }
+  else
+   return false;
+}
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+ rtx set = XEXP (SET_SRC (body), 0);
+
+ if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+   {
+ rtx src = XEXP (SET_SRC (body), 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 0x007F)
+ || (m_mode == HImode && INTVAL (src) == 0x7FFF)
+ || (m_mode == SImode && INTVAL (src) == 0x007F)))
+   return true;
+   }
+ else
+  return false;
+   }
+
+   return false;
+}
+
 /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */
 
 static bool
@@ -319,7 +390,7 @@ combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
   machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
-  rtx new_set;
+  rtx new_set = NULL_RTX;
   rtx cand_pat = single_set (cand->insn);
 
   /* I

[PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread Pan Li via Gcc-patches
From: Pan Li 

Update in v2:

* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.

Original log:

This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.

It mostly leverages the TARGET_RESOLVE_OVERLOADED_BUILTIN hook, with
the following steps.

* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.

We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in subsequent patches.
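The resolution step can be pictured as a lookup from an overloaded spelling plus the argument type to one concrete, non-overloaded intrinsic. The C++ below is only a hedged sketch of that idea: the enum, the function and the string-based lookup are hypothetical (the real function_resolver works on function codes and types, not names), and the intrinsic names merely follow the RVV naming pattern for illustration. An empty result plays the role of the NULL_TREE return in riscv_resolve_overloaded_builtin, meaning the call is left unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical tags for the RVV type of the first vector argument.
enum class rvv_type { i8m1, i16m1, i32m1, f32m1 };

// Sketch only: map an overloaded intrinsic name plus the argument type to
// a non-overloaded name; an empty string means "not resolved here".
inline std::string
resolve_overloaded_sketch (const std::string &name, rvv_type arg)
{
  static const char *const suffixes[] = { "i8m1", "i16m1", "i32m1", "f32m1" };
  const unsigned idx = static_cast<unsigned> (arg);
  assert (idx < sizeof suffixes / sizeof suffixes[0]);

  if (name == "__riscv_vmv_v")
    return "__riscv_vmv_v_v_" + std::string (suffixes[idx]);
  return std::string ();
}
```

In the actual hook, the resolved declaration is then used to rebuild the call with build_function_call_vec.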

gcc/ChangeLog:

* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook.
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_instance::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  36 
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
 gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
 gcc/config/riscv/riscv-vector-builtins.h  |  35 +++-
 .../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
 .../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
 .../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
 8 files changed, 287 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl,
   gcc_unreachable ();
 }
 
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+ void *uncast_arglist)
+{
+  vec empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec *arglist = (vec *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+    arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+{
+case RISCV_BUILTIN_GENERAL:
+  break;
+case RISCV_BUILTIN_VECTOR:
+  new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+arglist);
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (new_fndecl == NULL_TREE)
+return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+ fndecl);
+}
+
 /* Implement REGISTER_TARGET_PRAGMAS.  */
 
 void
 riscv_register_pragmas (void)
 {
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
 }
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..5d2492dd031 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, gimple_stmt_iterator *, gcall *);
 rtx expand_builtin (unsigned int, tree, rtx);
 bool check_builtin_call (location_t, vec, unsigned int,
   tree, unsigned int, tree *);
+

[PING^5] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-09-12 Thread Ajit Agarwal via Gcc-patches


Ping!

 Forwarded Message 
Subject: [PING^4] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Mon, 21 Aug 2023 12:16:44 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner , rashmi.srid...@ibm.com


Ping!

 Forwarded Message 
Subject: [PING^3] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Tue, 1 Aug 2023 13:48:58 +0530
From: Ajit Agarwal 
To: gcc-patches , Jeff Law , 
Richard Biener , Peter Bergner 
, Segher Boessenkool , 
rashmi.srid...@ibm.com

Ping!


 Forwarded Message 
Subject: [PING^2] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Tue, 18 Jul 2023 13:28:08 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner 


Ping^2.

Please review.

Thanks & Regards
Ajit


This new version of patch 4 improves the ree pass for the rs6000 target using
the defined ABI interfaces.
Bootstrapped and regtested on powerpc64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit

Improve ree pass for rs6000 target using defined abi interfaces

For the rs6000 target we see redundant zero and sign
extensions; this patch improves the ree pass to eliminate
such redundant zero and sign extensions using the defined
ABI interfaces.

2023-06-01  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use zero_extend and sign_extend
defined ABI interfaces.
(add_removable_extension): Use defined ABI interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs_without_defs_p): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C: New testcase.
---
 gcc/ree.cc| 199 +++---
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 183 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..2025a7c43da 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an return  registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses
+= get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use

[PATCH 0/2] Optimize is_member_object_pointer trait performance

2023-09-12 Thread Ken Matsui via Gcc-patches
This patch series optimizes the performance of the
is_member_object_pointer trait. The first patch implements the
__is_member_object_pointer built-in trait. The second patch optimizes
the is_member_object_pointer trait by using the
__is_member_object_pointer built-in trait if available.

The performance improvement is shown in the following benchmark:

https://github.com/ken-matsui/gsoc23/blob/main/is_member_object_pointer_v.md#tue-sep-12-122433-am-pdt-2023

Time: -48.6199% +/- 2.7422%
Peak Memory Usage: -39.3192% +/- 0.00457374%
Total Memory Usage: -41.8478% +/- 0%

Ken Matsui (2):
  c++: Implement __is_member_object_pointer built-in trait
  libstdc++: Optimize is_member_object_pointer trait performance

 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 .../g++.dg/ext/is_member_object_pointer.C | 30 +++
 libstdc++-v3/include/std/type_traits  | 18 ++-
 6 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_object_pointer.C

-- 
2.42.0



[PATCH 1/2] c++: Implement __is_member_object_pointer built-in trait

2023-09-12 Thread Ken Matsui via Gcc-patches
This patch implements a built-in trait for std::is_member_object_pointer.

gcc/cp/ChangeLog:

* cp-trait.def (IS_MEMBER_OBJECT_POINTER): Define
__is_member_object_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_OBJECT_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_object_pointer.
* g++.dg/ext/is_member_object_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 .../g++.dg/ext/is_member_object_pointer.C | 30 +++
 5 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_object_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c9e4e7043cd..e2d84aef5d9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3757,6 +3757,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_TRIVIALLY_COPYABLE:
   inform (loc, "  %qT is not trivially copyable", t1);
   break;
+case CPTK_IS_MEMBER_OBJECT_POINTER:
+  inform (loc, "  %qT is not a member object pointer", t1);
+  break;
 case CPTK_IS_ASSIGNABLE:
   inform (loc, "  %qT is not assignable from %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..af876b35178 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -69,6 +69,7 @@ DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
+DEFTRAIT_EXPR (IS_MEMBER_OBJECT_POINTER, "__is_member_object_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
 DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, "__is_nothrow_convertible", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0f7f4e87ae4..f0769e5f3f8 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12169,6 +12169,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_TRIVIAL:
   return trivial_type_p (type1);
 
+case CPTK_IS_MEMBER_OBJECT_POINTER:
+  return TYPE_PTRMEM_P (type1) && ! TYPE_PTRMEMFUNC_P (type1);
+
 case CPTK_IS_TRIVIALLY_ASSIGNABLE:
   return is_trivially_xible (MODIFY_EXPR, type1, type2);
 
@@ -12359,6 +12362,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_ENUM:
 case CPTK_IS_UNION:
 case CPTK_IS_SAME:
+case CPTK_IS_MEMBER_OBJECT_POINTER:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..0338f744259 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__is_member_object_pointer)
+# error "__has_builtin (__is_member_object_pointer) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C 
b/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C
new file mode 100644
index 000..835e48c8f8e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_NON_VOLATILE(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+// Positive tests.
+SA_TEST_CATEGORY(__is_member_object_pointer, int (ClassType::*), true);
+SA_TEST_CATEGORY(__is_member_object_pointer, ClassType (ClassType::*), true);
+
+// Negative tests.
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, int (ClassType::*) (int), 
false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, int (ClassType::*) (float, 
...), false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, ClassType (ClassType::*) 
(ClassType), false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, float (ClassType::*) (int, 
float, int[], int&), false);
+
+// Sanity check.
+SA_TEST_CATEGORY(__is_member_object_pointer, ClassType, false);
-- 
2.42.0
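
For reference, the property the new built-in reports is the one the standard 'std::is_member_object_pointer' trait specifies: a pointer to a non-static data member rather than to a member function, with top-level cv-qualifiers ignored. This sketch exercises only that standard trait, so it should compile with any C++11 compiler whether or not '__is_member_object_pointer' exists. 'S' and 'is_data_member_ptr' are hypothetical names, and 'S' stands in for the testsuite's 'ClassType'.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class playing the role of the testsuite's ClassType.
struct S
{
  int data;
  int method (int x) { return x + data; }
};

// Pointers to data members qualify, including cv-qualified pointer types.
static_assert (std::is_member_object_pointer<int S::*>::value, "");
static_assert (std::is_member_object_pointer<S S::*>::value, "");
static_assert (std::is_member_object_pointer<int S::* const volatile>::value, "");

// Pointers to member functions are also pointers to members, but they are
// excluded (compare the TYPE_PTRMEMFUNC_P check in trait_expr_value).
static_assert (!std::is_member_object_pointer<int (S::*) (int)>::value, "");
static_assert (!std::is_member_object_pointer<int (S::*) (float, ...)>::value, "");

// Class types and ordinary pointers do not qualify.
static_assert (!std::is_member_object_pointer<S>::value, "");
static_assert (!std::is_member_object_pointer<int *>::value, "");

// The same query applied to the type of an expression.
template <typename T>
constexpr bool
is_data_member_ptr (T)
{
  return std::is_member_object_pointer<T>::value;
}
```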



[PATCH 2/2] libstdc++: Optimize is_member_object_pointer trait performance

2023-09-12 Thread Ken Matsui via Gcc-patches
This patch optimizes the performance of the is_member_object_pointer trait
by dispatching to the new __is_member_object_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_member_object_pointer): Use
__is_member_object_pointer built-in trait.
(is_member_object_pointer_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 677cd934b94..7839ebebc3d 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -567,6 +567,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_rvalue_reference<_Tp&&>
 : public true_type { };
 
+  /// is_member_object_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_object_pointer)
+  template
+struct is_member_object_pointer
+: public __bool_constant<__is_member_object_pointer(_Tp)>
+{ };
+#else
   template
 struct __is_member_object_pointer_helper
 : public false_type { };
@@ -575,11 +582,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __is_member_object_pointer_helper<_Tp _Cp::*>
 : public __not_>::type { };
 
-  /// is_member_object_pointer
+
   template
 struct is_member_object_pointer
 : public __is_member_object_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
+#endif
 
   template
 struct __is_member_function_pointer_helper
@@ -3186,9 +3194,17 @@ template 
   inline constexpr bool is_rvalue_reference_v = false;
 template 
   inline constexpr bool is_rvalue_reference_v<_Tp&&> = true;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_object_pointer)
+template 
+  inline constexpr bool is_member_object_pointer_v =
+__is_member_object_pointer(_Tp);
+#else
 template 
   inline constexpr bool is_member_object_pointer_v =
 is_member_object_pointer<_Tp>::value;
+#endif
+
 template 
   inline constexpr bool is_member_function_pointer_v =
 is_member_function_pointer<_Tp>::value;
-- 
2.42.0
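
The dispatch in this patch can be sketched outside libstdc++. It uses the built-in when the compiler provides it. Otherwise it matches 'T C::*' by partial specialization and excludes function types. '_GLIBCXX_USE_BUILTIN_TRAIT' is internal to libstdc++, so this hypothetical version queries '__has_builtin' directly. 'my_is_member_object_pointer', 'mop_helper', 'MY_HAVE_MOP_BUILTIN' and 'Probe' are illustrative names, not library API.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class used to form member pointer types.
struct Probe
{
  int field;
  int method () { return field; }
};

// Fallback: match T C::* and exclude pointers to member functions, whose
// pointee type T is a function type.
template <typename T>
struct mop_helper : std::false_type {};

template <typename T, typename C>
struct mop_helper<T C::*>
  : std::integral_constant<bool, !std::is_function<T>::value> {};

#if defined(__has_builtin)
# if __has_builtin(__is_member_object_pointer)
#  define MY_HAVE_MOP_BUILTIN 1
# endif
#endif

#ifdef MY_HAVE_MOP_BUILTIN
// Built-in path: a single compiler query, no helper instantiations.
template <typename T>
struct my_is_member_object_pointer
  : std::integral_constant<bool, __is_member_object_pointer (T)> {};
#else
// Library path: strip top-level cv first so "int Probe::* const" matches.
template <typename T>
struct my_is_member_object_pointer
  : mop_helper<typename std::remove_cv<T>::type> {};
#endif
```

Both branches should give the same answers. The built-in branch avoids instantiating the helper templates, which is presumably where the patch's improvement comes from.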



[PING^3] [PATCH v8] tree-ssa-sink: Improve code sinking pass.

2023-09-12 Thread Ajit Agarwal via Gcc-patches



Ping!
 Forwarded Message 
Subject: [PING^2] [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Mon, 21 Aug 2023 12:14:03 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Segher Boessenkool , Peter 
Bergner , rashmi.srid...@ibm.com

Ping!


 Forwarded Message 
Subject: [PING^1] [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 1 Aug 2023 13:47:10 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Peter Bergner , Segher 
Boessenkool , rashmi.srid...@ibm.com

Ping! 


 Forwarded Message 
Subject: [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 18 Jul 2023 19:03:37 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Segher Boessenkool , Peter 
Bergner 

Hello All:

This patch improves the code sinking pass so that it sinks statements
before calls, to reduce register pressure.
Review comments have been incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

With this patch, code sinking transforms it into:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d + e + f;
      bar();
      j = l;
    }
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.
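
The register-pressure argument can be made concrete by writing the three possible placements of the sum by hand. This is only a source-level illustration of what the pass does on GIMPLE. 'bar' and 'j' mirror the example, and 'calls' and the three function names are hypothetical. The comments count which values must survive the call to 'bar' and so need callee-saved registers. All three versions compute the same result because 'a' through 'f' are plain parameters that 'bar' cannot modify.

```cpp
#include <cassert>

int j;            // written by the use, as in the example
static int calls; // hypothetical: gives bar () an observable effect

void bar () { ++calls; }

// Unsunk: the sum is computed on every path (even when a == 5), and the
// single value l must survive the call to bar ().
void foo_early (int a, int b, int c, int d, int e, int f)
{
  int l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar ();
      j = l;
    }
}

// Sunk next to the use, after the call: no wasted work when a == 5, but
// all six inputs a..f must now survive the call instead of just l.
void foo_after_call (int a, int b, int c, int d, int e, int f)
{
  if (a != 5)
    {
      bar ();
      int l = a + b + c + d + e + f;
      j = l;
    }
}

// Sunk into the use block but ahead of the call: no wasted work when
// a == 5, and only l must survive the call.
void foo_before_call (int a, int b, int c, int d, int e, int f)
{
  if (a != 5)
    {
      int l = a + b + c + d + e + f;
      bar ();
      j = l;
    }
}
```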

2023-07-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 +++
 gcc/tree-ssa-sink.cc| 59 -
 3 files changed, 67 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..e7190323abe 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -173,7 +173,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements. The best basic block should be an immediate dominator of
+   best basic block if the use stmt is after the call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -190,11 +191,22 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+

Re: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread juzhe.zh...@rivai.ai
I think it's better to move 'get_non_overloaded_instance' into function_base.

+  /* To avoid API conflicting, we use void return type and void argument
+ for the overloaded function register, like aarch64-sve.  */

Please rewrite the comment; don't mention aarch64 SVE.

Could you run your RVV intrinsic API CI with this patch?
I am worried that the resolve logic will break the existing API support.




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 15:20
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
Update in v2:
 
* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.
 
Original log:
 
This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.

It mostly leverages the TARGET_RESOLVE_OVERLOADED_BUILTIN hook,
in the following steps:

* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement the hook that maps the overloaded API to the non-overloaded API.

We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in follow-up patches.
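
Conceptually, resolution maps a call to an overloaded intrinsic name, plus the types of its arguments, onto one non-overloaded instance. It reports failure when no instance matches (in the hook, 'new_fndecl' stays NULL_TREE). The table-driven C++ sketch below mimics only that lookup step. It is not how 'function_resolver' is implemented, and the intrinsic and type names in the table are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Key: overloaded name plus the type of the argument that selects the
// instance.  Value: name of the non-overloaded intrinsic to call instead.
using ResolveKey = std::pair<std::string, std::string>;

const std::map<ResolveKey, std::string> &
instance_table ()
{
  // Illustrative entries only.
  static const std::map<ResolveKey, std::string> table = {
    {{"__riscv_vmv_v", "vint8m1_t"}, "__riscv_vmv_v_v_i8m1"},
    {{"__riscv_vmv_v", "vint32m1_t"}, "__riscv_vmv_v_v_i32m1"},
    {{"__riscv_vmv_v", "vfloat32m1_t"}, "__riscv_vmv_v_v_f32m1"},
  };
  return table;
}

// Return the non-overloaded name, or an empty string when no instance
// matches (the analogue of leaving new_fndecl as NULL_TREE).
std::string
resolve_overloaded (const std::string &name, const std::string &arg_type)
{
  const auto &table = instance_table ();
  auto it = table.find (ResolveKey (name, arg_type));
  return it == table.end () ? std::string () : it->second;
}
```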
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook.
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_instance::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
gcc/config/riscv/riscv-vector-builtins.h  |  35 +++-
.../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
8 files changed, 287 insertions(+), 3 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec 
arg_loc, tree fndecl,
   gcc_unreachable ();
}
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+   void *uncast_arglist)
+{
+  vec empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec *arglist = (vec *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+{
+case RISCV_BUILTIN_GENERAL:
+  break;
+case RISCV_BUILTIN_VECTOR:
+  new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+  arglist);
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (new_fndecl == NULL_TREE)
+return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+   fndecl);
+}
+
/* Implement REGISTER_TARGET_PRAGMAS.  */
void
riscv_register_pragmas (void)
{
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
}
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..5d2492dd031 100644
--- a/gcc/config/ri

RE: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread Li, Pan2 via Gcc-patches
>I think it's better to move 'get_non_overloaded_instance' into function_base.
Sure.
> Plz rewrite the comments, don't mention aarch64 sve.
Sure

>Could you run your rvv intrinsic api ci with this patch?
>I am worrying that the resolve stuff will destroy the existing APi support.

This patch only enables resolving for vmv_v; the test cases check the
correctness of both the existing API and the overloaded API of vmv_v.

I will send v3 with these changes.

Pan



Re: [PATCH 2/2] libstdc++: Add dg-require-thread-fence in several tests

2023-09-12 Thread Christophe Lyon via Gcc-patches
On Mon, 11 Sept 2023 at 18:11, Jonathan Wakely  wrote:

> On Mon, 11 Sept 2023 at 16:40, Christophe Lyon
>  wrote:
> >
> >
> >
> > On Mon, 11 Sept 2023 at 17:22, Jonathan Wakely wrote:
> >>
> >> On Mon, 11 Sept 2023 at 14:57, Christophe Lyon
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Mon, 11 Sept 2023 at 15:12, Jonathan Wakely wrote:
> >> >>
> >> >> On Mon, 11 Sept 2023 at 13:36, Christophe Lyon
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, 11 Sept 2023 at 12:59, Jonathan Wakely wrote:
> >> >> >>
> >> >> >> On Sun, 10 Sept 2023 at 20:31, Christophe Lyon
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > Some targets like arm-eabi with newlib and default settings rely on
> >> >> >> > __sync_synchronize() to ensure synchronization.  Newlib does not
> >> >> >> > implement it by default, to make users aware they have to take special
> >> >> >> > care.
> >> >> >> >
> >> >> >> > This makes a few tests fail to link.
> >> >> >>
> >> >> >> Does this mean those features are unusable on the target, or just that
> >> >> >> users need to provide their own __sync_synchronize to use them?
> >> >> >
> >> >> >
> >> >> > IIUC the user is expected to provide them.
> >> >> > Looks like we discussed this in the past :-)
> >> >> > In https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg01632.html,
> >> >> > see the pointer to Ramana's comment:
> >> >> > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html
> >> >>
> >> >> Oh yes, thanks for the reminder!
> >> >>
> >> >> >
> >> >> > The default arch for arm-eabi is armv4t which is very old.
> >> >> > When running the testsuite with something more recent (either as
> >> >> > default by configuring GCC --with-arch=XXX or by forcing -march/-mcpu
> >> >> > via dejagnu's target-board), the compiler generates barrier
> >> >> > instructions and there are no such errors.
> >> >>
> >> >> Ah yes, that's fine then.
> >> >>
> >> >> > For instance, here is a log with the defaults:
> >> >> >
> >> >> > https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-arm_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-arm_eabi
> >> >> > and a log when we target cortex-m0 which is still a very small cpu
> >> >> > but has barriers:
> >> >> >
> >> >> > https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi
> >> >> >
> >> >> > I somehow wanted to get rid of such errors with the default
> >> >> > configuration
> >> >>
> >> >> Yep, that makes sense, and we'll still be testing them for newer
> >> >> arches on the target, so it's not completely disabling those parts of
> >> >> the testsuite.
> >> >>
> >> >> But I'm still curious why some of those tests need this change. I
> >> >> think the ones I noted below are probably failing for some other
> >> >> reasons.
> >> >>
> >> > Just looked at 23_containers/span/back_assert_neg.cc, the linker says it needs
> >> > arm-eabi/libstdc++-v3/src/.libs/libstdc++.a(debug.o) to resolve
> >> > ./back_assert_neg-back_assert_neg.o (std::__glibcxx_assert_fail(char
> >> > const*, int, char const*, char const*))
> >> > and indeed debug.o has a reference to __sync_synchronize
> >>
> >> Aha, that's just because I put __glibcxx_assert_fail in debug.o, but
> >> there are no dependencies on anything else in that file, including the
> >> _M_detach member function that uses atomics.
> >
> > indeed
> >
> >
> >>
> >> This would also be solved by -Wl,--gc-sections :-)
> >
> > :-)
> >
> >>
> >> I think it would be better to move __glibcxx_assert_fail to a new
> >> file, so that it doesn't make every assertion unnecessarily depend on
> >> __sync_synchronize. I'll do that now.
> >
> > Sounds like a good idea, thanks.
>
> Done now at r14-3846-g4a2766ed00a479
> >
> >>
> >> We could also make the atomics in debug.o conditional, so that debug
> >> mode doesn't depend on __sync_synchronize for single-threaded targets.
> >> Does the arm4t arch have pthreads support in newlib?  I didn't bother
> >
> > No (grep _GLIBCXX_HAS_GTHREADS
> > $objdir/arm-eabi/libstdc++-v3/include/arm-eabi/bits/c++config returns:
> > /* #undef _GLIBCXX_HAS_GTHREADS */
> >
> >> making the use of atomics conditional, because performance is not
> >> really a priority for debug mode bookkeeping. But the problem here
> >> isn't just a slight performance overhead of atomics, it's that they
> >> aren't even supported for arm4t.
> >
> > OK thanks.
> >
> > So finally, this uncovered at least a "bug" that __glibcxx_assert_fail
> > should be in a dedicated object file :-)
> >
> > I'll revisit my patch once you have moved __glibcxx_assert_fail
>
> That's done (at r14-3845-gc7db9000fa7cac) and there should be no more
> __sync_synchronize from src/c++11/debug.o at all now (at
> r14-3846-g4a2766ed00a479). With that second change, it would have been
> OK for __glibcxx_assert_fail to stay in that file, but it's not really
> related so it's probably better for it to be in a separate file
> anyw

Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

> +max_number_of_live_regs (const basic_block bb,
> +  const hash_map &live_ranges,
> +  unsigned int max_point, machine_mode biggest_mode,
> +  int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i;
> +  unsigned int live_point = 0;
> +  auto_vec live_vars_vec;
> +  live_vars_vec.safe_grow (max_point + 1, true);
> +  for (i = 0; i < live_vars_vec.length (); ++i)
> +live_vars_vec[i] = 0;
> +  for (hash_map::iterator iter = live_ranges.begin ();
> +   iter != live_ranges.end (); ++iter)
> +{
> +  tree var = (*iter).first;
> +  pair live_range = (*iter).second;
> +  for (i = live_range.first; i <= live_range.second; i++)
> + {
> +   machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +   unsigned int nregs
> + = compute_nregs_for_mode (mode, biggest_mode, lmul);
> +   live_vars_vec[i] += nregs;
> +   if (live_vars_vec[i] > max_nregs)
> + max_nregs = live_vars_vec[i];
> + }
> +}

My concern is that we have O(nm) here, where n = number of live_ranges
and m = size of live range.  In large basic blocks (think calculix of
SPECfp 2006 which can reach up to 2000 instructions IIRC) this might
become prohibitive.

I'm going to do a quick benchmark with calculix and report back.  If
there is no noticeable difference we can ditch my idea.

For short live ranges (like < 10) the O(nm) could be better.  As of now,
we still calculate the nregs n*m times, though.  I have something like
the following in mind (it is definitely not shorter, though):

  struct range {
  unsigned int pt;
  bool start;
  unsigned int nregs;
  };

  auto_vec ranges (2 * live_ranges.elements ());
  for (hash_map::iterator iter = live_ranges.begin ();
   iter != live_ranges.end (); ++iter)
{
  tree var = (*iter).first;
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
  = compute_nregs_for_mode (mode, biggest_mode, lmul);
  ranges.quick_push ({(*iter).second.first, true, nregs});
  ranges.quick_push ({(*iter).second.second, false, nregs});
}

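  ranges.qsort ([] (const void *a, const void *b) -> int {
    const range *ra = (const range *) a;
    const range *rb = (const range *) b;
    if (ra->pt != rb->pt)
      return ra->pt < rb->pt ? -1 : 1;
    /* Live ranges are inclusive, so at equal points process starts
       before ends; otherwise an end could be subtracted first.  */
    return (int) rb->start - (int) ra->start;
  });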

  unsigned int cur = 0;
  max_nregs = ranges[0].nregs;

  for (auto r : ranges)
{
  if (r.start)
cur += r.nregs;
  else
cur -= r.nregs;
  max_nregs = MAX (max_nregs, cur);
}
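
A self-contained version of this event-sweep idea is sketched below with 'std::vector' and 'std::sort' instead of GCC's 'auto_vec' and 'hash_map'. It assumes inclusive live ranges [first, last], as in the quoted loop that runs while 'i <= live_range.second'. Under that assumption, a variable that starts at the point where another ends overlaps it, so start events must sort before end events at equal points. 'Range' and 'max_live_regs' are hypothetical names. The cost is O(n log n) in the number of ranges, independent of range length.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical input: one inclusive live range [first, last] per variable
// and the number of vector registers that variable occupies.
struct Range
{
  unsigned first, last, nregs;
};

// Maximum number of registers live at any single program point.
unsigned
max_live_regs (const std::vector<Range> &ranges)
{
  struct Event
  {
    unsigned pt;
    bool start;
    unsigned nregs;
  };

  std::vector<Event> events;
  events.reserve (2 * ranges.size ());
  for (const Range &r : ranges)
    {
      events.push_back ({r.first, true, r.nregs});
      events.push_back ({r.last, false, r.nregs});
    }

  // Sort by point; at equal points, starts come before ends because both
  // ends of a range are inclusive.
  std::sort (events.begin (), events.end (),
             [] (const Event &a, const Event &b) {
               if (a.pt != b.pt)
                 return a.pt < b.pt;
               return a.start && !b.start;
             });

  unsigned cur = 0, max_nregs = 0;
  for (const Event &e : events)
    {
      if (e.start)
        {
          cur += e.nregs;
          max_nregs = std::max (max_nregs, cur);
        }
      else
        cur -= e.nregs;
    }
  return max_nregs;
}
```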

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +{
> +  tree t = ssa_name (i);
> +  if (!t)
> +   continue;

Could likely be replaced by

  tree t;
  FOR_EACH_SSA_NAME (i, t, cfun)

> +static void
> +update_local_live_ranges (
> +  vec_info *vinfo,
> +  hash_map> &program_points_per_bb,
> +  hash_map> &live_ranges_per_bb)
> +{

I just realized (sorry) that this is nested rather deeply.  Can we still
have e.g.

> +  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> +{

this,

> +   if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
> +   != undef_vec_info_type)

this,

> +   if (live_range)
> + {

and this just "continue"?

Apart from that, LGTM.

Regards
 Robin



Re: [PATCH] Improve rewrite_to_defined_overflow for lhs already the correct type

2023-09-12 Thread Richard Biener via Gcc-patches
On Sun, Sep 3, 2023 at 6:19 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This improves rewrite_to_defined_overflow slightly if we already
> have an unsigned type. The only place where this seems to show up
> is ifcombine. It removes one extra statement which gets added and
> then later on removed.

What specific case is that?  It sounds like we call the function when
it isn't needed?  I also think that refactoring to special-case an LHS
type that is already OK will result in better code in the end.

Richard.

> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/111276
> * gimple-fold.cc (rewrite_to_defined_overflow): Don't
> add a new lhs if we already have the unsigned type.
> ---
>  gcc/gimple-fold.cc | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index fd01810581a..2fcafeada37 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8721,10 +8721,19 @@ rewrite_to_defined_overflow (gimple *stmt, bool 
> in_place /* = false */)
> op = gimple_convert (&stmts, type, op);
> gimple_set_op (stmt, i, op);
>}
> -  gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
> +  bool needs_cast_back = false;
> +  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
> +{
> +  gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
> +  needs_cast_back = true;
> +}
> +
>if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
>  gimple_assign_set_rhs_code (stmt, PLUS_EXPR);
> -  gimple_set_modified (stmt, true);
> +
> +  if (needs_cast_back || stmts)
> +gimple_set_modified (stmt, true);
> +
>if (in_place)
>  {
>gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> @@ -8734,6 +8743,10 @@ rewrite_to_defined_overflow (gimple *stmt, bool 
> in_place /* = false */)
>  }
>else
>  gimple_seq_add_stmt (&stmts, stmt);
> +
> +  if (!needs_cast_back)
> +return stmts;
> +
>gimple *cvt = gimple_build_assign (lhs, NOP_EXPR, gimple_assign_lhs 
> (stmt));
>if (in_place)
>  {
> --
> 2.31.1
>
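
For readers unfamiliar with the function: 'rewrite_to_defined_overflow' turns arithmetic whose signed overflow would be undefined into the same operation on the corresponding unsigned type. It converts the operands in and the result back out. Passes typically need this when a statement may now execute where it previously did not, for example after hoisting. The C++ sketch below is only a source-level analogy with hypothetical function names. The patch's point is that when the result already has the unsigned type, the final conversion back, and the extra SSA name it needs, are unnecessary.

```cpp
#include <cassert>
#include <climits>

// Original form: signed overflow here is undefined behavior.
int add_signed (int a, int b) { return a + b; }

// Rewritten form: do the arithmetic in unsigned, where wrap-around is
// defined, then convert back to the original type.  The out-of-range
// unsigned-to-int conversion is modulo 2^N in GCC (implementation-defined
// before C++20, defined since).
int add_defined (int a, int b)
{
  unsigned ua = static_cast<unsigned> (a);
  unsigned ub = static_cast<unsigned> (b);
  unsigned sum = ua + ub;          // well-defined wrap-around
  return static_cast<int> (sum);   // the "cast back" statement
}

// Result already unsigned: nothing to convert back, one operation remains.
unsigned add_unsigned (unsigned a, unsigned b) { return a + b; }
```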


testsuite: Port 'check-function-bodies' to nvptx

2023-09-12 Thread Thomas Schwinge
Hi!

On 2023-09-05T15:28:20+0100, Richard Sandiford via Gcc-patches 
 wrote:
> Thomas Schwinge  writes:
>> On 2023-09-04T23:05:05+0200, I wrote:
>>> On 2019-07-16T15:04:49+0100, Richard Sandiford  
>>> wrote:
 This patch therefore adds a new check-function-bodies dg-final test
>>
 The regexps in parse_function_bodies are fairly general, but might
 still need to be extended in future for targets like Darwin or AIX.
>>>
>>> ..., or nvptx.  [...]

>> Any comments before I push the attached
>> "testsuite: Port 'check-function-bodies' to nvptx"?

> LGTM.  Just a minor comment:

>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -3327,9 +3327,12 @@ The first line of the expected output for a function 
>> @var{fn} has the form:
>>  Subsequent lines of the expected output also start with @var{prefix}.
>>  In both cases, whitespace after @var{prefix} is not significant.
>>
>> -The test discards assembly directives such as @code{.cfi_startproc}
>> -and local label definitions such as @code{.LFB0} from the compiler's
>> -assembly output.  It then matches the result against the expected
>> +Depending on the configuration (see
>> +@code{gcc/testsuite/lib/scanasm.exp:configure_check-function-bodies}),
>
> I can imagine such a long string wouldn't format well in the output.
> How about: @code{configure_check-function-bodies} in
> @filename{gcc/testsuite/lib/scanasm.exp}?

Thanks, good suggestion.

Also, I've backed out the 'gcc.target/nvptx/abort.c' change to use
'check-function-bodies', leaving that for a later commit to translate
more of 'gcc.target/nvptx/[...]'.

Pushed to master branch commit 50410234a3d2e1b85203d97fe6f65fd9d1f0e100
"testsuite: Port 'check-function-bodies' to nvptx", see attached.


Regards
 Thomas


From 50410234a3d2e1b85203d97fe6f65fd9d1f0e100 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Sep 2023 22:28:12 +0200
Subject: [PATCH] testsuite: Port 'check-function-bodies' to nvptx

This extends commit 4d706ff86ea86868615558e92407674a4f4b4af9
"Add dg test for matching function bodies" for nvptx.

	gcc/testsuite/
	* lib/scanasm.exp (configure_check-function-bodies): New proc.
	(parse_function_bodies, check-function-bodies): Use it.
	gcc/
	* doc/sourcebuild.texi (check-function-bodies): Update.
---
 gcc/doc/sourcebuild.texi  |  9 +++--
 gcc/testsuite/lib/scanasm.exp | 76 +++
 2 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1a78b3c1abb..de1aa8c2dba 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3327,9 +3327,12 @@ The first line of the expected output for a function @var{fn} has the form:
 Subsequent lines of the expected output also start with @var{prefix}.
 In both cases, whitespace after @var{prefix} is not significant.
 
-The test discards assembly directives such as @code{.cfi_startproc}
-and local label definitions such as @code{.LFB0} from the compiler's
-assembly output.  It then matches the result against the expected
+Depending on the configuration (see
+@code{configure_check-function-bodies} in
+@file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
+compiler's assembly output directives such as @code{.cfi_startproc},
+local label definitions such as @code{.LFB0}, and more.
+It then matches the result against the expected
 output for a function as a single regular expression.  This means that
 later lines can use backslashes to refer back to @samp{(@dots{})}
 captures on earlier lines.  For example:
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 0685de1d641..5df80325dff 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -777,33 +777,73 @@ proc scan-lto-assembler { args } {
 dg-scan "scan-lto-assembler" 1 $testcase $output_file $args
 }
 
-# Read assembly file FILENAME and store a mapping from function names
-# to function bodies in array RESULT.  FILENAME has already been uploaded
-# locally where necessary and is known to exist.
 
-proc parse_function_bodies { filename result } {
-    upvar $result up_result
+# Set up CONFIG for check-function-bodies.
+
+proc configure_check-function-bodies { config } {
+    upvar $config up_config
 
     # Regexp for the start of a function definition (name in \1).
-    set label {^([a-zA-Z_]\S+):$}
+    if { [istarget nvptx*-*-*] } {
+	set up_config(start) {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
+    } else {
+	set up_config(start) {^([a-zA-Z_]\S+):$}
+    }
 
     # Regexp for the end of a function definition.
-    set terminator {^\s*\.size}
-
+    if { [istarget nvptx*-*-*] } {
+	set up_config(end) {^\}$}
+    } else {
+	
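To illustrate what the two per-target regexps set up by `configure_check-function-bodies` are used for, here is a minimal standalone C sketch (not part of the patch). It scans nvptx-style assembly line by line, opening a body at a `// BEGIN ... FUNCTION DEF: name` line and closing it at a lone `}`. The Tcl regexps are translated by hand into POSIX extended regular expressions, which lack `(?:...)` and `\S`; `split_functions`, `struct fn_body` and the sample input are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hand translations of the nvptx regexps into POSIX ERE: '(?: GLOBAL|)'
   becomes '( GLOBAL)?', '\S' becomes '[^[:space:]]', and '\}' becomes
   '[}]'.  The function name is capture group 2.  */
#define NVPTX_START_RE \
  "^// BEGIN( GLOBAL)? FUNCTION DEF: ([a-zA-Z_][^[:space:]]+)$"
#define NVPTX_END_RE "^[}]$"

struct fn_body
{
  char name[64];
  int nlines;  /* Non-empty lines between the start and end markers.  */
};

/* Split the newline-separated assembly TEXT (modified in place by strtok)
   into function bodies.  Returns the number of complete bodies stored in
   OUT, or -1 if a regexp fails to compile.  */
static int
split_functions (char *text, struct fn_body *out, int max_out)
{
  regex_t start_re, end_re;
  regmatch_t m[3];
  int count = 0, in_body = 0;

  assert (text != NULL && out != NULL);
  if (regcomp (&start_re, NVPTX_START_RE, REG_EXTENDED) != 0)
    return -1;
  if (regcomp (&end_re, NVPTX_END_RE, REG_EXTENDED | REG_NOSUB) != 0)
    {
      regfree (&start_re);
      return -1;
    }

  /* strtok skips empty lines, so they are never counted.  */
  for (char *line = strtok (text, "\n"); line; line = strtok (NULL, "\n"))
    {
      if (!in_body)
        {
          if (count < max_out && regexec (&start_re, line, 3, m, 0) == 0)
            {
              size_t len = (size_t) (m[2].rm_eo - m[2].rm_so);
              if (len >= sizeof out[count].name)
                len = sizeof out[count].name - 1;
              memcpy (out[count].name, line + m[2].rm_so, len);
              out[count].name[len] = '\0';
              out[count].nlines = 0;
              in_body = 1;
            }
        }
      else if (regexec (&end_re, line, 0, NULL, 0) == 0)
        {
          count++;   /* Lone '}' closes the current body.  */
          in_body = 0;
        }
      else
        out[count].nlines++;
    }

  regfree (&start_re);
  regfree (&end_re);
  return count;
}
```

The real implementation then matches each collected body against the expected output as a single regular expression; this sketch only shows the splitting step.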

[PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread Pan Li via Gcc-patches
From: Pan Li 

Update in v3:

* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.

Update in v2:

* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.

Original log:

This patch adds the framework to support the RVV overloaded intrinsic
API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.

It mostly relies on the hook TARGET_RESOLVE_OVERLOADED_BUILTIN, with
the following steps:

* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
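The last step, the hook, decodes a builtin's function code into a class (`code & RISCV_BUILTIN_CLASS`) and a per-class subcode (`code >> RISCV_BUILTIN_SHIFT`), as `riscv_resolve_overloaded_builtin` in this patch does, and passes only vector builtins to the resolver. Below is a standalone C sketch of that encoding and dispatch. The shift value of 1, the enum values, and `resolve_vector` (a fixed table standing in for the real argument-type based lookup) are assumptions for illustration, not GCC's definitions.

```c
#include <assert.h>

/* Assumed layout (illustrative values): the low RISCV_BUILTIN_SHIFT bits
   hold the builtin class, the remaining bits the per-class subcode.  */
enum riscv_builtin_class { RISCV_BUILTIN_GENERAL = 0, RISCV_BUILTIN_VECTOR = 1 };
#define RISCV_BUILTIN_SHIFT 1u
#define RISCV_BUILTIN_CLASS ((1u << RISCV_BUILTIN_SHIFT) - 1)

static unsigned int
encode_code (unsigned int subcode, enum riscv_builtin_class cls)
{
  assert ((unsigned int) cls <= RISCV_BUILTIN_CLASS);
  return (subcode << RISCV_BUILTIN_SHIFT) | cls;
}

/* Hypothetical stand-in for riscv_vector::resolve_overloaded_builtin:
   map an overloaded subcode to a non-overloaded one, or -1 if SUBCODE
   is not overloaded.  */
static int
resolve_vector (unsigned int subcode)
{
  switch (subcode)
    {
    case 10: return 20;
    case 11: return 21;
    default: return -1;
    }
}

/* Mirror of the hook's dispatch: general builtins are never overloaded,
   vector builtins go to the resolver.  -1 plays the role of NULL_TREE.  */
static int
resolve_overloaded (unsigned int code)
{
  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
  switch (code & RISCV_BUILTIN_CLASS)
    {
    case RISCV_BUILTIN_GENERAL:
      return -1;
    case RISCV_BUILTIN_VECTOR:
      return resolve_vector (subcode);
    default:
      assert (0 && "unreachable");
      return -1;
    }
}
```

For example, `resolve_overloaded (encode_code (10, RISCV_BUILTIN_VECTOR))` yields 20, while the same subcode in the general class yields -1.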

We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in follow-up patches.

gcc/ChangeLog:

* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_base::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  36 
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
 gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
 gcc/config/riscv/riscv-vector-builtins.h  |  36 +++-
 .../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
 .../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
 .../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
 8 files changed, 288 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl,
   gcc_unreachable ();
 }
 
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+				  void *uncast_arglist)
+{
+  vec empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec *arglist = (vec *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+    arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+    {
+    case RISCV_BUILTIN_GENERAL:
+      break;
+    case RISCV_BUILTIN_VECTOR:
+      new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+							      arglist);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (new_fndecl == NULL_TREE)
+    return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+				  fndecl);
+}
+
 /* Implement REGISTER_TARGET_PRAGMAS.  */
 
 void
 riscv_register_pragmas (void)
 {
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
 }
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..5d2492dd031 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, gimple_stmt_iterator *, gcall *);
 rtx expand_builtin (unsigned int, tree, rtx);
 b

Re: [Patch] OpenMP (C only): omp allocate - extend parsing support, improve diagnostic

2023-09-12 Thread Tobias Burnus

It seems I missed a 'git add -u' yesterday evening, and missed this again
when rechecking this morning.

Now included as a separate patch :-/
Unless there are comments, I intend to commit it very soon.

Namely, the actual c-parser.cc update was missing; only the updated
tests were included. In particular, this was missing:

On 12.09.23 09:04, Tobias Burnus wrote:

+  error_at (OMP_CLAUSE_LOCATION (nl),
+"allocator variable %qD must be declared before %qD",
+allocator, var);

...
...

I bet we can't catch everything, but perhaps e.g. doing the first
diagnostics from within walk_tree might be better.


Done now.


(Or rather, only in the attached patch.)

Tobias
OpenMP (C only): For 'omp allocate', really walk tree for 'allocator' check

Walk expression tree of the 'allocator' clause of 'omp allocate' to
detect more cases where the allocator expression depends on code between
a variable declaration and its associated '#pragma omp allocate'.
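The idea behind `c_check_omp_allocate_allocator_r` can be sketched outside GCC: visit every node of the allocator expression with a callback, and stop at the first variable declared after the variable being allocated (a non-NULL callback result ends the walk, as with `walk_tree`). The sketch below uses a made-up expression type and plain line numbers instead of GCC trees and `location_t`, and it omits both the current-scope test and the second check (modification between the declaration and the directive). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified expression node (not GCC's tree).  */
enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_ADD };

struct expr
{
  enum expr_kind kind;
  const char *name;               /* EXPR_VAR only.  */
  int decl_line;                  /* EXPR_VAR only: declaration line.  */
  const struct expr *op0, *op1;   /* EXPR_ADD only.  */
};

/* Callback in the spirit of walk_tree: a non-NULL return stops the walk
   and is propagated to the caller.  */
typedef const struct expr *(*walk_fn) (const struct expr *, void *);

static const struct expr *
walk_expr (const struct expr *e, walk_fn fn, void *data)
{
  assert (fn != NULL);
  if (e == NULL)
    return NULL;
  const struct expr *r = fn (e, data);
  if (r)
    return r;
  if (e->kind == EXPR_ADD)
    {
      r = walk_expr (e->op0, fn, data);
      if (r)
        return r;
      r = walk_expr (e->op1, fn, data);
    }
  return r;
}

/* Return a variable of the allocator expression that is declared after
   the to-be-allocated variable, whose declaration line is *DATA.  */
static const struct expr *
declared_too_late (const struct expr *e, void *data)
{
  int var_line = *(const int *) data;
  if (e->kind == EXPR_VAR && e->decl_line > var_line)
    return e;
  return NULL;
}
```

Walking the whole expression, rather than only checking whether the allocator itself is a variable (as the removed code did), is what lets an expression such as `a + b` be diagnosed when only `b` is declared too late.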

This commit was supposed to be part of
  r14-3863-g35f498d8dfc8e579eaba2ff2d2b96769c632fd58
  OpenMP (C only): omp allocate - extend parsing support, improve diagnostic
which also contains the associated testcase changes (oops!).

gcc/c/ChangeLog:

	* c-parser.cc (struct c_omp_loc_tree): New.
	(c_check_omp_allocate_allocator_r): New; checking moved from ...
	(c_parser_omp_allocate): ... here. Call it via walk_tree.

 gcc/c/c-parser.cc | 102 --
 1 file changed, 61 insertions(+), 41 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 643ec02706b..b9a1b75ca43 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -19343,6 +19343,61 @@ c_parser_oacc_wait (location_t loc, c_parser *parser, char *p_name)
   return stmt;
 }
 
+struct c_omp_loc_tree
+{
+  location_t loc;
+  tree var;
+};
+
+/* Check whether the expression used in the allocator clause is declared or
+   modified between the variable declaration and its allocate directive.  */
+static tree
+c_check_omp_allocate_allocator_r (tree *tp, int *, void *data)
+{
+  tree var = ((struct c_omp_loc_tree *) data)->var;
+  location_t loc = ((struct c_omp_loc_tree *) data)->loc;
+  if (TREE_CODE (*tp) == VAR_DECL && c_check_in_current_scope (*tp))
+    {
+      if (linemap_location_before_p (line_table, DECL_SOURCE_LOCATION (var),
+				     DECL_SOURCE_LOCATION (*tp)))
+	{
+	  error_at (loc, "variable %qD used in the % clause must "
+			 "be declared before %qD", *tp, var);
+	  inform (DECL_SOURCE_LOCATION (*tp), "declared here");
+	  inform (DECL_SOURCE_LOCATION (var),
+		  "to be allocated variable declared here");
+	  return *tp;
+	}
+      else
+	{
+	  gcc_assert (cur_stmt_list
+		      && TREE_CODE (cur_stmt_list) == STATEMENT_LIST);
+
+	  tree_stmt_iterator l = tsi_last (cur_stmt_list);
+	  while (!tsi_end_p (l))
+	    {
+	      if (linemap_location_before_p (line_table, EXPR_LOCATION (*l),
+					     DECL_SOURCE_LOCATION (var)))
+		break;
+	      if (TREE_CODE (*l) == MODIFY_EXPR
+		  && TREE_OPERAND (*l, 0) == *tp)
+		{
+		  error_at (loc,
+			    "variable %qD used in the % clause "
+			    "must not be modified between declaration of %qD "
+			    "and its % directive", *tp, var);
+		  inform (EXPR_LOCATION (*l), "modified here");
+		  inform (DECL_SOURCE_LOCATION (var),
+			  "to be allocated variable declared here");
+		  return *tp;
+		}
+	      --l;
+	    }
+	}
+    }
+  return NULL_TREE;
+}
+
 /* OpenMP 5.x:
# pragma omp allocate (list)  clauses
 
@@ -19465,8 +19520,8 @@ c_parser_omp_allocate (c_parser *parser)
 	error_at (loc, "% clause required for "
 			   "static variable %qD", var);
 	  else if (allocator
-		   && (tree_int_cst_sgn (allocator) != 1
-		   || tree_to_shwi (allocator) > 8))
+		   && (wi::to_widest (allocator) < 1
+		   || wi::to_widest (allocator) > 8))
 	/* 8 = largest predefined memory allocator. */
 	error_at (allocator_loc,
 		  "% clause requires a predefined allocator as "
@@ -19477,46 +19532,11 @@ c_parser_omp_allocate (c_parser *parser)
 		  "%qD not yet supported", var);
 	  continue;
 	}
-  if (allocator
-	  && TREE_CODE (allocator) == VAR_DECL
-	  && c_check_in_current_scope (var))
+  if (allocator)
 	{
-	  if (linemap_location_before_p (line_table, DECL_SOURCE_LOCATION (var),
-	 DECL_SOURCE_LOCATION (allocator)))
-	{
-	  error_at (OMP_CLAUSE_LOCATION (nl),
-			"allocator variable %qD must be declared before %qD",
-			allocator, var);
-	  inform (DECL_SOURCE_LOCATION (allocator),
-		  "allocator declared here");
-	  inform (DECL_SOURCE_LOCATION (var), "declared here");
-	}
-	  else
-	   {
-	 gcc_assert (cur_stmt_list
-			 && TREE_CODE (cur_stmt_list) == STAT

Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread juzhe.zh...@rivai.ai
It looks reasonable to me now.
But let's wait for more comments from Kito.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 16:46
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

[PATCH] RISC-V: Add missed cond autovec testcases

2023-09-12 Thread Lehua Ding
This patch adds all missing cond autovec testcases. The cond patterns
that are not yet supported will be fixed in follow-up patches.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vrem op.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-3.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-4.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-5.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-9.c: New test.
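The cond_arith tests check that loops applying an operation only to selected elements are vectorized with conditional (masked) instructions. As a rough illustration of the kind of loop involved for the newly added remainder ("vrem") case, here is a hypothetical scalar C loop; it is not copied from the test files, its name and signature are made up, and whether it vectorizes depends on the target and options. Since `a % 0` is undefined in C, the remainder may only be evaluated for elements whose predicate is set, which is why a masked operation is needed rather than computing every lane and selecting afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conditional remainder loop: element I of R becomes
   A[I] % B[I] where PRED[I] is nonzero, otherwise A[I] is kept.
   B[I] is never read as a divisor for inactive elements.  */
void
cond_rem (int32_t *restrict r, const int32_t *restrict a,
          const int32_t *restrict b, const int32_t *restrict pred, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    r[i] = pred[i] ? a[i] % b[i] : a[i];
}
```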

---
 .../riscv/rvv/autovec/cond/cond_arith-1.c | 13 +
 .../riscv/rvv/autovec/cond/cond_arith-2.c |  3 ++
 .../riscv/rvv/autovec/cond/cond_arith-3.c | 15 ++
 .../riscv/rvv/autovec/cond/cond_arith-4.c |  3 ++
 .../riscv/rvv/autovec/cond/cond_arith-5.c | 13 +
 .../riscv/rvv/autovec/cond/cond_arith-6.c |  3 ++
 .../riscv/rvv/autovec/cond/cond_arith-7.c |  9 
 .../riscv/rvv/autovec/cond/cond_arith-8.c | 17 ++-
 .../riscv/rvv/autovec/cond/cond_arith-9.c | 11 -
 .../riscv/rvv/autovec/cond/cond_logical-1.c   | 43 
 .../riscv/rvv/autovec/cond/cond_logical-2.c   | 43 
 .../riscv/rvv/autovec/cond/cond_logical-3.c   | 43 
 .../riscv/rvv/autovec/cond/cond_logical-4.c   | 43 
 .../riscv/rvv/autovec/cond/cond_logical-5.c   | 43 
 .../rvv/autovec/cond/cond_logical_min_max-1.c | 49 +++
 .../rvv/autovec/cond/cond_logical_min_max-2.c | 49 +++
 .../rvv/autovec/cond/cond_logical_min_max-3.c | 49 +++
 .../rvv/autovec/cond/cond_logical_min_max-4.c | 49 +++
 .../rvv/autovec/cond/cond_logical_min_max-5.c | 49 +++
 ...l_run-1.c => cond_logical_min_max_run-1.c} |  2 +-
 ...l_run-2.c => cond_logical_min_max_run-2.c} |  2 +-
 ...l_run-3.c => cond_logical_min_max_run-3.c} |  2 +-
 ...l_run-4.c => cond_logical_min_max_run-4.c} |  2 +-
 ...l_run-5.c => cond_logical_min_max_run-5.c} |  2 +-
 .../autovec/cond/cond_widen_complicate-1.c| 35 +
 .../autovec/cond/cond_widen_complicate-2.c| 35 +
 .../autovec/cond/cond_widen_complicate-3.c| 36 ++
 .../autovec/cond/cond_widen_complicate-4.c| 35 +
 .../autovec/cond/cond_widen_complicate-5.c| 37 ++
 .../autovec/cond/cond_widen_complicate-6.c| 32 
 .../autovec/cond/cond_widen_complicate-7.c 

Re: [PATCH] ssa_name_has_boolean_range vs signed-boolean:31 types

2023-09-12 Thread Richard Biener via Gcc-patches
On Sat, Sep 2, 2023 at 4:33 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This turns out to be a latent bug in ssa_name_has_boolean_range
> where it would return true for all boolean types, but all of the
> uses of ssa_name_has_boolean_range were expecting 0/1 as the range
> rather than [-1,0].
> So when I fixed vector lower to do all comparisons in boolean_type
> rather than still in the signed-boolean:31 type (to fix a different issue),
> the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
> was signed-boolean:31) had a range of [0,1] which broke down and sometimes
> gave us -1/-2 as values rather than what we were expecting of -1/0.
>
> This was the simplest patch I found while testing.
>
> We have another way of matching [0,1] range which we could use instead
> of ssa_name_has_boolean_range except that uses only the global ranges
> rather than the local range (during VRP).
> I tried to clean this up slightly by using gimple_match_zero_one_valuedp
> inside ssa_name_has_boolean_range, but that failed due to using
> only the global ranges. I then tried to change get_nonzero_bits to use
> the local ranges at optimization time, but that also failed because
> we would remove branches to __builtin_unreachable during evrp and lose
> information, as we don't set the global ranges during evrp.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

I guess the name of 'ssa_name_has_boolean_range' is unfortunate here.

We also lack documentation of BOOLEAN_TYPE with [-1,0]; neither tree.def
nor generic.texi elaborates on those.  build_nonstandard_boolean_type
simply calls fixup_signed_type which will end up setting MIN/MAX value
to [INT_MIN, INT_MAX].

Iff ssa_name_has_boolean_range really checks for zero_one we should
maybe rename it.

Iff _all_ signed BOOLEAN_TYPE have a true value of -1 (signed:8 can
very well represent [0, 1] as well) then we should document that.  (No,
I don't think we want TYPE_MIN/MAX_VALUE to specify this)

At some point the middle-end was very conservative and only considered
unsigned BOOLEAN_TYPE with 1 bit precision to have a [0,1] range.

I think that a more general 'boolean range' (not [0, 1]) query is only
possible if we hand in context.

The patch is definitely correct - not all BOOLEAN_TYPE types have a [0, 1]
range, thus OK.

Does Ada have signed booleans that are BOOLEAN_TYPE but do _not_
have [-1, 0] as range?  I think documenting [0, 1] for (single-bit precision?)
unsigned BOOLEAN_TYPE and [-1, 1] for signed BOOLEAN_TYPE would
be conservative.

Thanks,
Richard.
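The arithmetic behind the miscompile described above can be checked with a small standalone C program (illustrative only, not part of the patch). The rewrite `-(type)!A -> (type)A - 1` is valid only when A is 0 or 1; with a signed boolean whose true value is -1, a true input yields -2 instead of 0, matching the "-1/-2 instead of -1/0" symptom. GNU C vector comparisons produce exactly such -1/0 masks. The helper names and the `v4si` type are made up for this sketch.

```c
#include <assert.h>

/* Original form of the expression: negate the logical NOT of A.  */
static int
neg_not (int a)
{
  return -(int) !a;
}

/* Rewritten form; equal to neg_not only when A is 0 or 1.  */
static int
a_minus_one (int a)
{
  return a - 1;
}

/* GNU C vector extension: an element-wise comparison yields -1 (all
   bits set) for true and 0 for false.  */
typedef int v4si __attribute__ ((vector_size (16)));

static int
mask_lane (int x, int y, int lane)
{
  v4si vx = { x, x, x, x };
  v4si vy = { y, y, y, y };
  v4si m = vx > vy;
  assert (lane >= 0 && lane < 4);
  return m[lane];
}
```

With `t = mask_lane (2, 1, 0)` (that is, -1, a signed "true"), `neg_not (t)` is 0 but `a_minus_one (t)` is -2, while both forms agree for the inputs 0 and 1.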

> PR 110817
>
> gcc/ChangeLog:
>
> * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
> check for boolean type as they don't have "[0,1]" range.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr110817-1.c: New test.
> * gcc.c-torture/execute/pr110817-2.c: New test.
> * gcc.c-torture/execute/pr110817-3.c: New test.
> ---
>  gcc/testsuite/gcc.c-torture/execute/pr110817-1.c | 13 +
>  gcc/testsuite/gcc.c-torture/execute/pr110817-2.c | 16 
>  gcc/testsuite/gcc.c-torture/execute/pr110817-3.c | 14 ++
>  gcc/tree-ssanames.cc |  4 
>  4 files changed, 43 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c b/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
> new file mode 100644
> index 000..1d33fa9a207
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
> @@ -0,0 +1,13 @@
> +typedef unsigned long __attribute__((__vector_size__ (8))) V;
> +
> +
> +V c;
> +
> +int
> +main (void)
> +{
> +  V v = ~((V) { } <=0);
> +  if (v[0])
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c b/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
> new file mode 100644
> index 000..1f759178425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
> @@ -0,0 +1,16 @@
> +
> +typedef unsigned char u8;
> +typedef unsigned __attribute__((__vector_size__ (8))) V;
> +
> +V v;
> +unsigned char c;
> +
> +int
> +main (void)
> +{
> +  V x = (v > 0) > (v != c);
> + // V x = foo ();
> +  if (x[0] || x[1])
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c b/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
> new file mode 100644
> index 000..36f09c88dd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
> @@ -0,0 +1,14 @@
> +typedef unsigned __attribute__((__vector_size__ (1*sizeof(unsigned V;
> +
> +V v;
> +unsigned char c;
> +
> +int
> +main (void)
> +{
> +  V x = (v > 0) > (v != c);
> +  volatile signed int t = x[0];
> +  if (t)
> +__

nvptx 'TARGET_USE_LOCAL_THUNK_ALIAS_P', 'TARGET_SUPPORTS_ALIASES' (was: [committed][nvptx] Use .alias directive for mptx >= 6.3)

2023-09-12 Thread Thomas Schwinge
Hi!

On 2022-12-02T12:24:58+0100, I wrote:
> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
>  wrote:
>> Starting with ptx isa version 6.3, a ptx directive .alias is available.
>> Use this directive to support symbol aliases, as far as possible.
>>
>> The alias support is off by default.  It can be turned on using a switch
>> -malias.

>> This patch causes a regression:
>> ...
>> -PASS: gcc.dg/pr60797.c  (test for errors, line 4)
>> +FAIL: gcc.dg/pr60797.c  (test for errors, line 4)
>> ...
>> The test-case is skipped for effective target alias, and both without and
>> with this patch the nvptx target is considered to not support it, so the
>> test-case is executed.  The test-case expects an error message along the
>> lines of "alias definitions not supported in this configuration", but
>> instead we run into:
>> ...
>> gcc.dg/pr60797.c:4:12: error: foo aliased to undefined symbol
>> ...
>> This is probably due to the fact that the nvptx backend now defines macros
>> ASM_OUTPUT_DEF and ASM_OUTPUT_DEF_FROM_DECLS, so from the point of view of
>> the common part of the compiler, aliases are supported.
>
> Testing this with the new '-malias' flag not active (default), I'm seeing
> a number of more regressions:
>
> [...]: error: alias definitions not supported in this configuration
>
> [-PASS:-]{+FAIL:+} gcc.dg/pr56727-1.c (test for excess errors)
>
> PASS: gcc.dg/pr84739.c  (test for warnings, line 6)
> PASS: gcc.dg/pr84739.c  (test for warnings, line 21)
> [-PASS:-]{+FAIL:+} gcc.dg/pr84739.c (test for excess errors)
>
> [-PASS:-]{+FAIL:+} gcc.dg/ipa/pr81520.c (test for excess errors)
>
> ..., and then, in particular, a ton of regressions in GCC/C++ testing
> ('check-gcc-c++') due to approximately 4 new
> "error: alias definitions not supported in this configuration"
> diagnostics.  Whoops...  (I suppose you've not tested that at all?)
>
> If the new '-malias' flag is not active, we have to maintain (restore)
> the previous state of other macros, as I'm demonstrating in the attached
> "[WIP] nvptx 'TARGET_USE_LOCAL_THUNK_ALIAS_P', 'TARGET_SUPPORTS_ALIASES'".
> This completely addresses in particular the 'check-gcc-c++' regressions,
> and does not cause any changes if the new '-malias' flag in fact is
> active (as when testing with '-mptx=6.3 -malias', for example).
>
> However, this further regresses the 'gcc.dg/pr56727-1.c' and
> 'gcc.dg/ipa/pr81520.c' test cases mentioned above:
>
> during IPA pass: whole-program
> [...]: internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:647
>
> {+FAIL: gcc.dg/pr56727-1.c (internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:647)+}
> FAIL: gcc.dg/pr56727-1.c (test for excess errors)
>
> {+FAIL: gcc.dg/ipa/pr81520.c (internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:647)+}
> FAIL: gcc.dg/ipa/pr81520.c (test for excess errors)
>
> Such ICEs we're not yet currently seeing elsewhere; this remains to be
> analyzed -- after all, these test cases PASSed originally.

That'll be addressed by

"More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.":

[-FAIL: gcc.dg/pr56727-1.c (internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:647)-]
[-FAIL:-]{+PASS:+} gcc.dg/pr56727-1.c (test for excess errors)

[-FAIL:-]{+PASS:+} gcc.dg/pr60797.c  (test for errors, line 4)
[-FAIL:-]{+PASS:+} gcc.dg/pr60797.c (test for excess errors)

[-FAIL: gcc.dg/ipa/pr81520.c (internal compiler error: in function_and_variable_visibility, at ipa-visibility.cc:647)-]
[-FAIL:-]{+PASS:+} gcc.dg/ipa/pr81520.c (test for excess errors)

> And, obviously, the patch needs some "copy editing".

> --- a/gcc/config/nvptx/nvptx.h
> +++ b/gcc/config/nvptx/nvptx.h
> @@ -338,6 +338,12 @@ struct GTY(()) machine_function
>  #define ASM_OUTPUT_DEF_FROM_DECLS(STREAM, NAME, VALUE)   \
>nvptx_asm_output_def_from_decls (STREAM, NAME, VALUE)
>
> +//TODO
> +/* ..., but don't let that dummy ASM_OUTPUT_DEF definition influence other
> +   macros.  */
> +#define TARGET_USE_LOCAL_THUNK_ALIAS_P(DECL) (!((nvptx_alias == 0 || !TARGET_PTX_6_3)))
> +#define TARGET_SUPPORTS_ALIASES (!((nvptx_alias == 0 || !TARGET_PTX_6_3)))
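The double negation in the quoted macro bodies simplifies by De Morgan's law: `!((nvptx_alias == 0 || !TARGET_PTX_6_3))` is true exactly when `nvptx_alias != 0 && TARGET_PTX_6_3`, that is, aliases are presumably reported as supported only with `-malias` and PTX ISA 6.3 or later. A quick exhaustive check in C, modelling the two inputs as plain ints (an assumption; in GCC they come from option handling, and the function names here are hypothetical):

```c
#include <assert.h>

/* The quoted macro body, with the option state passed as parameters.  */
static int
supports_aliases_orig (int nvptx_alias, int target_ptx_6_3)
{
  return (!((nvptx_alias == 0 || !target_ptx_6_3)));
}

/* The De Morgan-simplified equivalent.  */
static int
supports_aliases_simple (int nvptx_alias, int target_ptx_6_3)
{
  return nvptx_alias != 0 && target_ptx_6_3;
}

/* Compare both forms over all four combinations of boolean inputs;
   returns 1 if they always agree.  */
static int
forms_agree (void)
{
  for (int a = 0; a <= 1; a++)
    for (int p = 0; p <= 1; p++)
      if (supports_aliases_orig (a, p) != supports_aliases_simple (a, p))
        return 0;
  return 1;
}
```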

Pushed to master branch commit 537e2cc30d0f8ba6433af52f2fef038d75d93174
"nvptx 'TARGET_USE_LOCAL_THUNK_ALIAS_P', 'TARGET_SUPPORTS_ALIASES'", see
attached.


Grüße
 Thomas


From 537e2cc30d0f8ba6433af52f2fef038d75d93174 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 14 Jul 2022 22:05:17 +0200
Subject: [PATCH] nvptx 'TARGET_USE_LOCA

Re: [PATCH] RISC-V: Add missed cond autovec testcases

2023-09-12 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-12 16:57
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Add missed cond autovec testcases
.../riscv/rvv/autovec/cond/cond_arith-2.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-3.c | 15 ++
.../riscv/rvv/autovec/cond/cond_arith-4.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-5.c | 13 +
.../riscv/rvv/autovec/cond/cond_arith-6.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-7.c |  9 
.../riscv/rvv/autovec/cond/cond_arith-8.c | 17 ++-
.../riscv/rvv/autovec/cond/cond_arith-9.c | 11 -
.../riscv/rvv/autovec/cond/cond_logical-1.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-2.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-3.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-4.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-5.c   | 43 
.../rvv/autovec/cond/cond_logical_min_max-1.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-2.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-3.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-4.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-5.c | 49 +++
...l_run-1.c => cond_logical_min_max_run-1.c} |  2 +-
...l_run-2.c => cond_logical_min_max_run-2.c} |  2 +-
...l_run-3.c => cond_logical_min_max_run-3.c} |  2 +-
...l_run-4.c => cond_logical_min_max_run-4.c} |  2 +-
...l_run-5.c => cond_logical_min_max_run-5.c} |  2 +-
.../autovec/cond/cond_widen_complicate-1.c| 35 +
.../autovec/cond/cond_widen_complicate-2.c| 35 +
.../autovec/cond/cond_widen_complicate-3.c| 36 ++
.../autovec/cond/cond_widen_complicate-4.c| 35 +
.../autovec/cond/cond_widen_complicate-5.c| 37 ++
.../autovec/cond/cond_widen_complicate-6.c| 32 
.../autovec/cond/cond_widen_complicate-7.c| 29 +++
.../autovec/cond/cond_widen_complicate-8.c| 28 +++
.../autovec/cond/cond_widen_complic

Re: [PATCH] RISC-V: Add missed cond autovec testcases

2023-09-12 Thread Kito Cheng via Gcc-patches
LGTM


Re: [PATCH] RISC-V: Add missed cond autovec testcases

2023-09-12 Thread Lehua Ding

Committed, thanks Kito and Juzhe.


Re: [PATCH 2/2] libstdc++: Add dg-require-thread-fence in several tests

2023-09-12 Thread Jonathan Wakely via Gcc-patches
On Tue, 12 Sept 2023 at 08:59, Christophe Lyon
 wrote:
>
>
>
> On Mon, 11 Sept 2023 at 18:11, Jonathan Wakely  wrote:
>>
>> On Mon, 11 Sept 2023 at 16:40, Christophe Lyon
>>  wrote:
>> >
>> >
>> >
>> > On Mon, 11 Sept 2023 at 17:22, Jonathan Wakely  wrote:
>> >>
>> >> On Mon, 11 Sept 2023 at 14:57, Christophe Lyon
>> >>  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Mon, 11 Sept 2023 at 15:12, Jonathan Wakely  wrote:
>> >> >>
>> >> >> On Mon, 11 Sept 2023 at 13:36, Christophe Lyon
>> >> >>  wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, 11 Sept 2023 at 12:59, Jonathan Wakely  wrote:
>> >> >> >>
>> >> >> >> On Sun, 10 Sept 2023 at 20:31, Christophe Lyon
>> >> >> >>  wrote:
>> >> >> >> >
>> >> >> >> > Some targets like arm-eabi with newlib and default settings rely on
>> >> >> >> > __sync_synchronize() to ensure synchronization.  Newlib does not
>> >> >> >> > implement it by default, to make users aware they have to take special
>> >> >> >> > care.
>> >> >> >> >
>> >> >> >> > This makes a few tests fail to link.
>> >> >> >>
>> >> >> >> Does this mean those features are unusable on the target, or just that
>> >> >> >> users need to provide their own __sync_synchronize to use them?
>> >> >> >
>> >> >> >
>> >> >> > IIUC the user is expected to provide them.
>> >> >> > Looks like we discussed this in the past :-)
>> >> >> > In  https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg01632.html,
>> >> >> > see the pointer to Ramana's comment: 
>> >> >> > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html
>> >> >>
>> >> >> Oh yes, thanks for the reminder!
>> >> >>
>> >> >> >
>> >> >> > The default arch for arm-eabi is armv4t which is very old.
>> >> >> > When running the testsuite with something more recent (either as 
>> >> >> > default by configuring GCC --with-arch=XXX or by forcing 
>> >> >> > -march/-mcpu via dejagnu's target-board), the compiler generates 
>> >> >> > barrier instructions and there are no such errors.
>> >> >>
>> >> >> Ah yes, that's fine then.
>> >> >>
>> >> >> > For instance, here is a log with the defaults:
>> >> >> > https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-arm_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-arm_eabi
>> >> >> > and a log when we target cortex-m0 which is still a very small cpu 
>> >> >> > but has barriers:
>> >> >> > https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi
>> >> >> >
>> >> >> > I somehow wanted to get rid of such errors with the default configuration
>> >> >>
>> >> >> Yep, that makes sense, and we'll still be testing them for newer
>> >> >> arches on the target, so it's not completely disabling those parts of
>> >> >> the testsuite.
>> >> >>
>> >> >> But I'm still curious why some of those tests need this change. I
>> >> >> think the ones I noted below are probably failing for some other
>> >> >> reasons.
>> >> >>
>> >> > Just looked at 23_containers/span/back_assert_neg.cc, the linker says it needs
>> >> > arm-eabi/libstdc++-v3/src/.libs/libstdc++.a(debug.o) to resolve
>> >> > ./back_assert_neg-back_assert_neg.o (std::__glibcxx_assert_fail(char const*, int, char const*, char const*))
>> >> > and indeed debug.o has a reference to __sync_synchronize
>> >>
>> >> Aha, that's just because I put __glibcxx_assert_fail in debug.o, but
>> >> there are no dependencies on anything else in that file, including the
>> >> _M_detach member function that uses atomics.
>> >
>> > indeed
>> >
>> >
>> >>
>> >> This would also be solved by -Wl,--gc-sections :-)
>> >
>> > :-)
>> >
>> >>
>> >> I think it would be better to move __glibcxx_assert_fail to a new
>> >> file, so that it doesn't make every assertion unnecessarily depend on
>> >> __sync_synchronize. I'll do that now.
>> >
>> > Sounds like a good idea, thanks.
>>
>> Done now at r14-3846-g4a2766ed00a479
>> >
>> >>
>> >> We could also make the atomics in debug.o conditional, so that debug
>> >> mode doesn't depend on __sync_synchronize for single-threaded targets.
>> >> Does the arm4t arch have pthreads support in newlib?  I didn't bother
>> >
>> > No ( grep _GLIBCXX_HAS_GTHREADS $objdir/arm-eabi/libstdc++-v3/include/arm-eabi/bits/c++config returns:
>> > /* #undef _GLIBCXX_HAS_GTHREADS */
>> >
>> >> making the use of atomics conditional, because performance is not
>> >> really a priority for debug mode bookkeeping. But the problem here
>> >> isn't just a slight performance overhead of atomics, it's that they
>> >> aren't even supported for arm4t.
>> >
>> > OK thanks.
>> >
>> > So finally, this uncovered at least a "bug" that __glibcxx_assert_fail should be in a dedicated object file :-)
>> >
>> > I'll revisit my patch once you have moved __glibcxx_assert_fail
>>
>> That's done (at r14-3845-gc7db9000fa7cac) an

Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Thanks Robin.

I have tried your code. It works fine and the tests pass.
Is your code O(n log n) in complexity?




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 16:19
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
> +max_number_of_live_regs (const basic_block bb,
> + const hash_map &live_ranges,
> + unsigned int max_point, machine_mode biggest_mode,
> + int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i;
> +  unsigned int live_point = 0;
> +  auto_vec live_vars_vec;
> +  live_vars_vec.safe_grow (max_point + 1, true);
> +  for (i = 0; i < live_vars_vec.length (); ++i)
> +live_vars_vec[i] = 0;
> +  for (hash_map::iterator iter = live_ranges.begin ();
> +   iter != live_ranges.end (); ++iter)
> +{
> +  tree var = (*iter).first;
> +  pair live_range = (*iter).second;
> +  for (i = live_range.first; i <= live_range.second; i++)
> + {
> +   machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +   unsigned int nregs
> + = compute_nregs_for_mode (mode, biggest_mode, lmul);
> +   live_vars_vec[i] += nregs;
> +   if (live_vars_vec[i] > max_nregs)
> + max_nregs = live_vars_vec[i];
> + }
> +}
 
My concern is that we have O(nm) here, where n = number of live_ranges
and m = size of live range.  In large basic blocks (think calculix of
SPECfp 2006 which can reach up to 2000 instructions IIRC) this might
become prohibitive.
 
I'm going to do a quick benchmark with calculix and report back.  If
there is no noticeable difference, we can ditch my idea.
 
For short live ranges (like < 10) the O(nm) could be better.  As of now,
we still calculate the nregs n*m times, though.  I have something like
the following in mind (it is definitely not shorter, though):
 
  struct range {
  unsigned int pt;
  bool start;
  unsigned int nregs;
  };
 
  auto_vec<range> ranges (2 * live_ranges.elements ());
  for (hash_map::iterator iter = live_ranges.begin ();
   iter != live_ranges.end (); ++iter)
{
  tree var = (*iter).first;
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
  = compute_nregs_for_mode (mode, biggest_mode, lmul);
  ranges.quick_push ({(*iter).second.first, true, nregs});
  ranges.quick_push ({(*iter).second.second, false, nregs});
}
 
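  ranges.qsort ([] (const void *a, const void *b) -> int {
    const range *ra = (const range *) a;
    const range *rb = (const range *) b;
    if (ra->pt != rb->pt)
      return ra->pt < rb->pt ? -1 : 1;
    /* Live ranges are inclusive, so at the same point process starts
       before ends; otherwise touching ranges are undercounted.  */
    return (int) rb->start - (int) ra->start;
  });

  unsigned int cur = 0;
  max_nregs = 0;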
 
  for (auto r : ranges)
{
  if (r.start)
cur += r.nregs;
  else
cur -= r.nregs;
  max_nregs = MAX (max_nregs, cur);
}
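The idea above is the classic sweep-line technique: turn every live range into a start event and an end event, sort the events by program point, and keep a running sum. The self-contained C++ sketch below uses plain std::vector and std::sort instead of GCC's auto_vec and hash_map; `LiveRange` and both function names are hypothetical. It compares the sweep with the per-point O(n*m) loop from the patch. Since live ranges are treated as inclusive, start events must sort before end events at the same point, otherwise two ranges that touch at one point would be undercounted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one variable's live range [first, last]
// (inclusive program points) and the number of registers it needs.
struct LiveRange {
  unsigned first;
  unsigned last;
  unsigned nregs;
};

// Naive O(n*m) version: add nregs at every point each range covers.
unsigned max_live_naive(const std::vector<LiveRange> &ranges,
                        unsigned max_point) {
  std::vector<unsigned> live(max_point + 1, 0);
  unsigned best = 0;
  for (const LiveRange &r : ranges) {
    assert(r.first <= r.last && r.last <= max_point);
    for (unsigned p = r.first; p <= r.last; ++p) {
      live[p] += r.nregs;
      best = std::max(best, live[p]);
    }
  }
  return best;
}

// Sweep-line O(n log n) version: one start and one end event per range.
unsigned max_live_sweep(const std::vector<LiveRange> &ranges) {
  struct Event {
    unsigned pt;
    bool start;
    unsigned nregs;
  };
  std::vector<Event> events;
  events.reserve(2 * ranges.size());
  for (const LiveRange &r : ranges) {
    assert(r.first <= r.last);
    events.push_back({r.first, true, r.nregs});
    events.push_back({r.last, false, r.nregs});
  }
  std::sort(events.begin(), events.end(), [](const Event &a, const Event &b) {
    if (a.pt != b.pt)
      return a.pt < b.pt;
    return a.start && !b.start;  // starts before ends at the same point
  });
  unsigned cur = 0, best = 0;
  for (const Event &e : events) {
    if (e.start) {
      cur += e.nregs;
      best = std::max(best, cur);
    } else {
      cur -= e.nregs;
    }
  }
  return best;
}
```

The sweep costs O(k log k) for k ranges regardless of their length, while the naive loop costs the total length of all ranges, which is why the naive form can still win when ranges are short.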
 
> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +{
> +  tree t = ssa_name (i);
> +  if (!t)
> +   continue;
 
Could likely be replaced by
 
  tree t;
  FOR_EACH_SSA_NAME (i, t, cfun)
 
> +static void
> +update_local_live_ranges (
> +  vec_info *vinfo,
> +  hash_map> &program_points_per_bb,
> +  hash_map> &live_ranges_per_bb)
> +{
 
I just realized (sorry) that this is "nested" a bit far.  Can we still
have e.g. 
 
> +  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> +{
 
this,
 
> +   if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
> +   != undef_vec_info_type)
 
this,
 
> +   if (live_range)
> + {
 
and this just "continue"?
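The flattening requested here is the usual guard-clause pattern: invert each condition and `continue` (or return early for the outermost `dyn_cast` check) instead of nesting another block. A hypothetical, self-contained C++ illustration follows; the `Stmt` type and both functions are made up for the example and are not taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical statement record: whether it is vectorized and an
// optional pointer to some per-statement data.
struct Stmt {
  bool vectorized;
  const int *live_range;
};

// Nested form: each condition adds one indentation level.
int count_nested(const std::vector<Stmt> &stmts) {
  int n = 0;
  for (const Stmt &s : stmts)
    if (s.vectorized)
      if (s.live_range)
        n += *s.live_range;
  return n;
}

// Flattened form: skip uninteresting statements early with `continue`.
int count_flat(const std::vector<Stmt> &stmts) {
  int n = 0;
  for (const Stmt &s : stmts) {
    if (!s.vectorized)
      continue;
    if (!s.live_range)
      continue;
    n += *s.live_range;
  }
  return n;
}
```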
 
Apart from that, LGTM.
 
Regards
Robin
 
 


Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Robin Dapp via Gcc-patches
I did some benchmarks and, at least for calculix, the differences are
minuscule.  I'd say we can stick with the current approach and improve
as needed.

However, I noticed ICEs here:

+  gcc_assert (biggest_size >= mode_size);

and here:

+  mode = TYPE_MODE (TREE_TYPE (lhs));

when compiling calculix.

Regards
 Robin


[PATCH v6] RISC-V:Optimize the MASK opt generation

2023-09-12 Thread Feng Wang
This new version adds some comments and updates the docs for this new usage.
---
Following Kito's advice, use "MASK(name) Var(other_flag_name)"
to generate the MASK and TARGET macros automatically.
This patch improves the generation of the MASK_* and TARGET_* macros.
Because more and more RISC-V extensions are being added, the default
target_flags variable is full.
Before this patch, if you wanted to add a new macro, you had to define it
in riscv-opts.h manually.
After this patch, you just need two steps:
1. Define the new TargetVariable.
2. Define "MASK(name) Var(new_target_flag)".
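Assuming GCC's usual .opt record spellings (`TargetVariable`, and a `Mask(...)` record carrying a `Var(...)` property, written `MASK(name)` above), the two steps might look like this for a hypothetical extension Zfoo; the extension and variable names are illustrative only:

```
; Step 1: a new variable to hold the extension's mask bits (hypothetical name).
TargetVariable
int riscv_zfoo_subext

; Step 2: allocate the ZFOO bit in that variable instead of target_flags.
Mask(ZFOO) Var(riscv_zfoo_subext)
```

With this, the option scripts are expected to emit MASK_ZFOO and TARGET_ZFOO, with TARGET_ZFOO testing the bit in riscv_zfoo_subext rather than in target_flags, so nothing has to be added to riscv-opts.h by hand.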

gcc/ChangeLog:

* config/riscv/riscv-opts.h (MASK_ZICSR):
(MASK_ZIFENCEI): Delete;
(MASK_ZIHINTNTL):Ditto;
(MASK_ZIHINTPAUSE):  Ditto;
(TARGET_ZICSR):  Ditto;
(TARGET_ZIFENCEI):   Ditto;
(TARGET_ZIHINTNTL):  Ditto;
(TARGET_ZIHINTPAUSE):Ditto;
(MASK_ZAWRS):Ditto;
(TARGET_ZAWRS):  Ditto;
(MASK_ZBA):  Ditto;
(MASK_ZBB):  Ditto;
(MASK_ZBC):  Ditto;
(MASK_ZBS):  Ditto;
(TARGET_ZBA):Ditto;
(TARGET_ZBB):Ditto;
(TARGET_ZBC):Ditto;
(TARGET_ZBS):Ditto;
(MASK_ZFINX):Ditto;
(MASK_ZDINX):Ditto;
(MASK_ZHINX):Ditto;
(MASK_ZHINXMIN): Ditto;
(TARGET_ZFINX):  Ditto;
(TARGET_ZDINX):  Ditto;
(TARGET_ZHINX):  Ditto;
(TARGET_ZHINXMIN):   Ditto;
(MASK_ZBKB): Ditto;
(MASK_ZBKC): Ditto;
(MASK_ZBKX): Ditto;
(MASK_ZKNE): Ditto;
(MASK_ZKND): Ditto;
(MASK_ZKNH): Ditto;
(MASK_ZKR):  Ditto;
(MASK_ZKSED):Ditto;
(MASK_ZKSH): Ditto;
(MASK_ZKT):  Ditto;
(TARGET_ZBKB):   Ditto;
(TARGET_ZBKC):   Ditto;
(TARGET_ZBKX):   Ditto;
(TARGET_ZKNE):   Ditto;
(TARGET_ZKND):   Ditto;
(TARGET_ZKNH):   Ditto;
(TARGET_ZKR):Ditto;
(TARGET_ZKSED):  Ditto;
(TARGET_ZKSH):   Ditto;
(TARGET_ZKT):Ditto;
(MASK_ZTSO): Ditto;
(TARGET_ZTSO):   Ditto;
(MASK_VECTOR_ELEN_32):   Ditto;
(MASK_VECTOR_ELEN_64):   Ditto;
(MASK_VECTOR_ELEN_FP_32):Ditto;
(MASK_VECTOR_ELEN_FP_64):Ditto;
(MASK_VECTOR_ELEN_FP_16):Ditto;
(TARGET_VECTOR_ELEN_32): Ditto;
(TARGET_VECTOR_ELEN_64): Ditto;
(TARGET_VECTOR_ELEN_FP_32):Ditto;
(TARGET_VECTOR_ELEN_FP_64):Ditto;
(TARGET_VECTOR_ELEN_FP_16):Ditto;
(MASK_ZVBB):   Ditto;
(MASK_ZVBC):   Ditto;
(TARGET_ZVBB): Ditto;
(TARGET_ZVBC): Ditto;
(MASK_ZVKG):   Ditto;
(MASK_ZVKNED): Ditto;
(MASK_ZVKNHA): Ditto;
(MASK_ZVKNHB): Ditto;
(MASK_ZVKSED): Ditto;
(MASK_ZVKSH):  Ditto;
(MASK_ZVKN):   Ditto;
(MASK_ZVKNC):  Ditto;
(MASK_ZVKNG):  Ditto;
(MASK_ZVKS):   Ditto;
(MASK_ZVKSC):  Ditto;
(MASK_ZVKSG):  Ditto;
(MASK_ZVKT):   Ditto;
(TARGET_ZVKG): Ditto;
(TARGET_ZVKNED):   Ditto;
(TARGET_ZVKNHA):   Ditto;
(TARGET_ZVKNHB):   Ditto;
(TARGET_ZVKSED):   Ditto;
(TARGET_ZVKSH):Ditto;
(TARGET_ZVKN): Ditto;
(TARGET_ZVKNC):Ditto;
(TARGET_ZVKNG):Ditto;
(TARGET_ZVKS): Ditto;
(TARGET_ZVKSC):Ditto;
(TARGET_ZVKSG):Ditto;
(TARGET_ZVKT): Ditto;
(MASK_ZVL32B): Ditto;
(MASK_ZVL64B): Ditto;
(MASK_ZVL128B):Ditto;
(MASK_ZVL256B):Ditto;
(MASK_ZVL512B):Ditto;
(MASK_ZVL1024B):   Ditto;
(MASK_ZVL2048B):   Ditto;
(MASK_ZVL4096B):   Ditto;
(MASK_ZVL8192B):   Ditto;
(MASK_ZVL16384B):  Ditto;
(MASK_ZVL32768B):  Ditto;
(MASK_ZVL65536B):  Ditto;
(TARGET_ZVL32B):   Ditto;
(TARGET_ZVL64B):   Ditto;
(TARGET_ZVL128B):  Ditto;
(TARGET_ZVL256B):  Ditto;
(TARGET_ZVL512B):  Ditto;
(TARGET_ZVL1024B): Ditto;
(TARGET_ZVL2048B): Ditto;
(TARGET_ZVL4096B): Ditto;
(TARGET_ZVL8192B): Ditto;
(TARGET_ZVL16384B):Ditto;
(TARGET_ZVL32768B):Ditto;
(TARGET_ZVL65536B):Ditto;
(MASK_ZICBOZ): Ditto;
(MASK_ZICBOM): Ditto;
(MASK_ZICBOP): Ditto;
(TARGET_ZICBOZ):   Ditto;
(TARGET_ZICBOM):   Ditto;
(TARGET_ZICBOP):   D

Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Is calculix big?

Could you give me the testcase to reproduce it?

For +  gcc_assert (biggest_size >= mode_size);
I currently don't have an idea to fix it.

But for +  mode = TYPE_MODE (TREE_TYPE (lhs));
I think I can fix it. 

if (!gimple_store_p (stmt))
{
  tree lhs = gimple_get_lhs (stmt);
  mode = TYPE_MODE (TREE_TYPE (lhs));

I assumed that a statement that is not a store always has an LHS. It
turns out that assumption was incorrect.
I think I know the fix.





juzhe.zh...@rivai.ai
 


Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Robin Dapp via Gcc-patches
> Is calculix big ?

It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops.  I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.

> Could you give me the testcase to reproduce it?

OK, I will try to reduce it; it will be Fortran, though.

Regards
 Robin


Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]

2023-09-12 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
>   This patch enables SImode in FP registers on P7. Instruction "fctiw"
> stores its integer output in an FP register. So SImode in FP registers
> needs to be enabled on P7 if we want to support "fctiw" on P7.
> 
>   The test case is in the second patch which implements 32bit inline
> lrint.
> 
>   Compared to the last version, the main change is to remove the disparaging
> on the alternatives of "fmr". Testing shows it doesn't cause regressions.

Ok, at least regression testing doesn't expose any need to do disparaging
for this.  Could you also test this patch with SPEC2017 for P7 and P8
separately at options like -O2 or -O3, to see if there is any assembly
change, and if so, filter out some typical cases to check whether they are
expected?  I think it can help us to better evaluate the impact.  Thanks!

BR,
Kewen

> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> 
> ChangeLog
> rs6000: enable SImode in FP register on P7
> 
> gcc/
>   PR target/88558
>   * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
>   Enable SImode in FP registers on P7.
>   * config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
>   move between FP registers.  Set attribute isa of stfiwx to "*"
>   and attribute of stxsiwx to "p7".
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 44b448d2ba6..99085c2cdd7 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
> if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
>   return 1;
> 
> -   if (TARGET_P8_VECTOR && (mode == SImode))
> +   if (TARGET_POPCNTD && mode == SImode)
>   return 1;
> 
> if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index cdab49fbb91..edf49bd74e3 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7566,7 +7566,7 @@ (define_split
> 
>  (define_insn "*movsi_internal1"
>[(set (match_operand:SI 0 "nonimmediate_operand"
> -   "=r, r,
> +   "=r, r,  d,
>  r,  d,  v,
>  m,  ?Z, ?Z,
>  r,  r,  r,  r,
> @@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1"
>  wa, r,
>  r,  *h, *h")
>   (match_operand:SI 1 "input_operand"
> -   "r,  U,
> +   "r,  U,  d,
>  m,  ?Z, ?Z,
>  r,  d,  v,
>  I,  L,  eI, n,
> @@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1"
>"@
> mr %0,%1
> la %0,%a1
> +   fmr %0,%1
> lwz%U1%X1 %0,%1
> lfiwzx %0,%y1
> lxsiwzx %x0,%y1
> @@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1"
> mt%0 %1
> nop"
>[(set_attr "type"
> -   "*,  *,
> +   "*,  *,  fpsimple,
>  load,   fpload, fpload,
>  store,  fpstore,fpstore,
>  *,  *,  *,  *,
> @@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1"
>  mtvsr,  mfvsr,
>  *,  *,  *")
> (set_attr "length"
> -   "*,  *,
> +   "*,  *,  *,
>  *,  *,  *,
>  *,  *,  *,
>  *,  *,  *,  8,
> @@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1"
>  *,  *,
>  *,  *,  *")
> (set_attr "isa"
> -   "*,  *,
> -*,  p8v,p8v,
> -*,  p8v,p8v,
> +   "*,  *,  *,
> +*,  p7, p8v,
> +*,  *,  p8v,
>  *,  *,  p10,*,
>  p8v,p9v,p9v,p8v,
>  p9v,p8v,p9v,
> 


Re: [PATCH-2v2, rs6000] Implement 32bit inline lrint [PR88558]

2023-09-12 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
>   This patch implements 32bit inline lrint by "fctiw". It depends on
> the patch1 to do SImode move from FP registers on P7.
> 
>   Compared to the last version, the main change is to add tests for "lrintf"
> and adjust the counts of the corresponding instructions.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628436.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: support 32bit inline lrint
> 
> gcc/
>   PR target/88558
>   * config/rs6000/rs6000.md (lrintdi2): Remove TARGET_FPRND
>   from insn condition.
>   (lrintsi2): New insn pattern for 32bit lrint.
> 
> gcc/testsuite/
>   PR target/106769
>   * gcc.target/powerpc/pr88558.h: New.
>   * gcc.target/powerpc/pr88558-p7.c: New.
>   * gcc.target/powerpc/pr88558-p8.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index edf49bd74e3..a41898e0e08 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6655,10 +6655,18 @@ (define_insn "lrintdi2"
>[(set (match_operand:DI 0 "gpc_reg_operand" "=d")
>   (unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
>  UNSPEC_FCTID))]
> -  "TARGET_HARD_FLOAT && TARGET_FPRND"
> +  "TARGET_HARD_FLOAT"
>"fctid %0,%1"
>[(set_attr "type" "fp")])
> 
> +(define_insn "lrintsi2"
> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
> + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
> +UNSPEC_FCTIW))]
> +  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
> +  "fctiw %0,%1"
> +  [(set_attr "type" "fp")])
> +
>  (define_insn "btrunc2"
>[(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
>   (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
> new file mode 100644
> index 000..f302491c4d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power7" } */
> +
> +/* -fno-math-errno is required to make {i,l,ll}rint inlined */

Nit: Comment is a bit out of date since now we have irintf.

> +
> +#include "pr88558.h"
> +
> +/* { dg-final { scan-assembler-times {\mfctid\M} 3 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mfctid\M} 1 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mfctiw\M} 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mfctiw\M} 3 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mstfiwx\M} 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mstfiwx\M} 3 { target ilp32 } } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c
> new file mode 100644
> index 000..33398aa74c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power8" } */
> +
> +/* -fno-math-errno is required to make {i,l,ll}rint inlined */

Ditto.

> +
> +#include "pr88558.h"
> +
> +/* { dg-final { scan-assembler-times {\mfctid\M} 3 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mfctid\M} 1 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mfctiw\M} 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mfctiw\M} 3 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target ilp32 } } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558.h b/gcc/testsuite/gcc.target/powerpc/pr88558.h
> new file mode 100644
> index 000..698640c0ef7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr88558.h
> @@ -0,0 +1,19 @@
> +long int test1 (double a)
> +{
> +  return __builtin_lrint (a);
> +}
> +
> +long long test2 (double a)
> +{
> +  return __builtin_llrint (a);
> +}
> +
> +int test3 (double a)
> +{
> +  return __builtin_irint (a);
> +}
> +
> +long int test4 (float a)
> +{
> +  return __builtin_lrintf (a);
> +}

As you added extra coverage for irint and llrint in addition to lrint,
I'd expect you could also add coverage for llrintf and irintf, to keep
them consistent.

The others look good to me.  Thanks!

BR,
Kewen


Pass 'SYSROOT_CFLAGS_FOR_TARGET' down to target libraries [PR109951] (was: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951])

2023-09-12 Thread Thomas Schwinge
Hi!

On 2023-06-02T11:52:04+0200, I wrote:
> On 2020-01-14T21:31:13+0800, Chung-Lin Tang  wrote:
>> I understand your situation with --with-build-sysroot/--without-sysroot, 
>> [...]
>>
>> Can you test if the attached patch works for you? The patch exports the 
>> build sysroot
>> setting from the toplevel to target library subdirs, and adds the --sysroot= 
>> option
>> when doing build-tree testing [...]
>
> Belatedly: thanks, I like that approach better indeed.
>
> This is, by the way, in line with what GCC compiler testing is doing;
> 'gcc/Makefile.in':
>
> # Set if the compiler was configured with --with-build-sysroot.
> SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
>
> # TEST_ALWAYS_FLAGS are flags that should be passed to every compilation.
> # They are passed first to allow individual tests to override them.
> @echo "set TEST_ALWAYS_FLAGS \"$(SYSROOT_CFLAGS_FOR_TARGET)\"" >> ./site.tmp
>
> That is, via 'site.exp', put 'SYSROOT_CFLAGS_FOR_TARGET' into
> 'TEST_ALWAYS_FLAGS', which is "passed to every compilation".
>
>> [...], if this does work, then other library testsuites (e.g. libatomic.exp) 
>> might
>> also need considering updating, I think.
>
> Correct.  (I'm offering to take care of that.)

First, regarding the top-level build system part:

>> 2020-01-14  Chung-Lin Tang  
>>
>>   * Makefile.tpl  (NORMAL_TARGET_EXPORTS): Add export of
>>   SYSROOT_CFLAGS_FOR_TARGET variable.
>>   * Makefile.in:  Regenerate.

>> --- Makefile.tpl  (revision 279954)
>> +++ Makefile.tpl  (working copy)
>> @@ -322,6 +322,7 @@ RAW_CXX_TARGET_EXPORTS = \
>>
>>  NORMAL_TARGET_EXPORTS = \
>>   $(BASE_TARGET_EXPORTS) \
>> + SYSROOT_CFLAGS_FOR_TARGET="$(SYSROOT_CFLAGS_FOR_TARGET)"; export SYSROOT_CFLAGS_FOR_TARGET; \
>>   CXX="$(CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export CXX;
>
> With that one moved into the generic 'BASE_TARGET_EXPORTS', [...]

Pushed to master branch commit d1bff1ba4d470f6723be83c0e3c4d5083e51877a
"Pass 'SYSROOT_CFLAGS_FOR_TARGET' down to target libraries [PR109951]",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From d1bff1ba4d470f6723be83c0e3c4d5083e51877a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 1 Jun 2023 23:07:37 +0200
Subject: [PATCH] Pass 'SYSROOT_CFLAGS_FOR_TARGET' down to target libraries
 [PR109951]

..., where we need to use it (separate commits) for build-tree testing, similar
to 'gcc/Makefile.in:site.exp':

# TEST_ALWAYS_FLAGS are flags that should be passed to every compilation.
# They are passed first to allow individual tests to override them.
	@echo "set TEST_ALWAYS_FLAGS \"$(SYSROOT_CFLAGS_FOR_TARGET)\"" >> ./site.tmp

	PR testsuite/109951
	* Makefile.tpl (BASE_TARGET_EXPORTS): Add
	'SYSROOT_CFLAGS_FOR_TARGET'.
	* Makefile.in: Regenerate.

Co-authored-by: Chung-Lin Tang 
---
 Makefile.in  | 1 +
 Makefile.tpl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/Makefile.in b/Makefile.in
index c97130a2338..2f136839c35 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -321,6 +321,7 @@ BASE_TARGET_EXPORTS = \
 	RANLIB="$(RANLIB_FOR_TARGET)"; export RANLIB; \
 	READELF="$(READELF_FOR_TARGET)"; export READELF; \
 	STRIP="$(STRIP_FOR_TARGET)"; export STRIP; \
+	SYSROOT_CFLAGS_FOR_TARGET="$(SYSROOT_CFLAGS_FOR_TARGET)"; export SYSROOT_CFLAGS_FOR_TARGET; \
 	WINDRES="$(WINDRES_FOR_TARGET)"; export WINDRES; \
 	WINDMC="$(WINDMC_FOR_TARGET)"; export WINDMC; \
 @if gcc-bootstrap
diff --git a/Makefile.tpl b/Makefile.tpl
index 36fa20950d4..5872dd03f2c 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -324,6 +324,7 @@ BASE_TARGET_EXPORTS = \
 	RANLIB="$(RANLIB_FOR_TARGET)"; export RANLIB; \
 	READELF="$(READELF_FOR_TARGET)"; export READELF; \
 	STRIP="$(STRIP_FOR_TARGET)"; export STRIP; \
+	SYSROOT_CFLAGS_FOR_TARGET="$(SYSROOT_CFLAGS_FOR_TARGET)"; export SYSROOT_CFLAGS_FOR_TARGET; \
 	WINDRES="$(WINDRES_FOR_TARGET)"; export WINDRES; \
 	WINDMC="$(WINDMC_FOR_TARGET)"; export WINDMC; \
 @if gcc-bootstrap
-- 
2.34.1



libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951] (was: Consider '--with-build-sysroot=[...]' for target lib

2023-09-12 Thread Thomas Schwinge
Hi!

On 2023-06-03T21:32:57+0100, "Maciej W. Rozycki"  wrote:
>> Will you, Maciej, please test that this doesn't break your setting?
>
>  Umm, this was implemented for my Western Digital development environment,
> which I don't have access to anymore.  I'll see what I can do, but it may
> be neither easy nor quick.  It was a long time ago and I don't have a setup
> with multilibs enabled anymore.  Nor do I remember the thorough
> problem analysis I went through that led me to those conclusions.

I see.  I've therefore myself now done a quick hack to replicate the
original requirement, to verify that given '--with-build-sysroot=',
build-tree testing must have '--sysroot=[...]' appear for every driver
invocation:

--- gcc/gcc.cc
+++ gcc/gcc.cc
@@ -8190,6 +8190,10 @@ driver::main (int argc, char **argv)
   if (!maybe_print_and_exit ())
 return 0;

+  if (!env.get ("SKIP_VERIFY_SYSROOT"))
+if (!(target_system_root && !strcmp (target_system_root, "/boot/..")))
+  internal_error ("MISSING SYSROOT");
+
   early_exit = prepare_infiles ();
   if (early_exit)
 return get_exit_code ();

With that, build GCC with '--with-build-sysroot=/boot/..' and with the
environment variable 'SKIP_VERIFY_SYSROOT' set (to ignore any build-time
issues, not relevant to this discussion here), and test without the
'SKIP_VERIFY_SYSROOT' environment variable set (meaning the checking is
active).  I observe that with current (un-altered) GCC sources,
'--sysroot=[...]' doesn't actually appear for all compiler test suites;
the following ones FAIL:

  - 'ada/acats/acats.sum'
  - 'testsuite/g++/g++.sum': 'g++.dg/plugin/plugin.exp' only
  - 'testsuite/gcc/gcc.sum': 'gcc.dg/plugin/plugin.exp' only
  - 'gm2/gm2.sum'
  - 'gnat/gnat.sum'
  - 'obj-c++/obj-c++.sum': 'obj-c++.dg/plugin/plugin.exp' only
  - 'objc/objc.sum': 'objc.dg/plugin/plugin.exp' only

Additionally, the following target library test suite also FAILs:

'libitm/testsuite/libitm.sum'

(Resolving these is not my objective right now; I suppose these were not
relevant in Maciej's original scenario.)

Otherwise, I observe that my proposed re-work does still achieve the
desired outcome re '--sysroot=[...]', and Iain has long ago confirmed
that it does resolve 
"libgomp, testsuite: non-native multilib c++ tests fail on Darwin", so
pushed to master branch commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
see attached.


Grüße
 Thomas


From fb5d27be272b71fb9026224535fc73f125ce3be7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 1 Jun 2023 23:07:37 +0200
Subject: [PATCH] libgomp: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884,
 PR109951]

This is commit c8e759b4215ba4b376c9d468aeffe163b3d520f0 (Subversion r279708)
"libgomp/test: Fix compilation for build sysroot" and follow-up
commit 749bd22ddc50b5112e5ed506ffef7249bf8e6fb3
"libgomp/test: Remove a build sysroot fix regression" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX', 'FC'.

	PR testsuite/91884
	PR testsuite/109951
	libgomp/
	* configure.ac: Revert earlier changes, instead
	'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libgomp.exp (libgomp_init): Remove
	"Fix up '-funconfigured-libstdc++-v3' in 'GXX_UNDER_TEST'" code.
	If '--with-build-sysroot=[...]' was specified, use it for
	build-tree testing.
	* testsuite/libgomp-site-extra.exp.in (GCC_UNDER_TEST)
	(GXX_UNDER_TEST, GFORTRAN_UNDER_TEST): Don't set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
	* testsuite/libgomp.c++/c++.exp (lang_source_re)
	(lang_include_flags): Set for build-tree testing.
	* testsuite/libgomp.oacc-c++/c++.exp (lang_source_re)
	(lang_include_flags): Likewise.

Co-authored-by: Chung-Lin Tang 
---
 libgomp/Makefile.in |  2 +-
 libgomp/configure   | 17 -
 libgomp/configure.ac| 15 +++
 libgomp/testsuite/Makefile.in   |  2 +-
 libgomp/testsuite/lib/libgomp.exp   | 18 +-
 libgomp/testsuite/libgomp-site-extra.exp.in |  4 +---
 libgomp/testsuite/libgomp.c++/c++.exp   |  6 ++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp  |  6 ++
 8 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 3ef05e6a3cb..431bc87b629 100644
--- a/libgomp/Makefile.

Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
This is the first version of dynamic LMUL.
I didn't test it with the full GCC testsuite.

My plan is to first pass the whole GCC testsuite (including vect.exp) with the
default LMUL = M1, then enable dynamic LMUL and test it.

Maybe we could tolerate this ICE for now. Then we can test it with the full
GCC testsuite (I believe we can reproduce it with some case in the GCC testsuite
in the future).

Is that reasonable? If yes, I will fix all your comments and send V5.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 17:31
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
> Is calculix big ?
 
It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops.  I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.
 
> Could you give me the testcase to reproduce it?
 
OK, I will try to reduce it, will be Fortran, though.
 
Regards
Robin
 


Re: [PATCH] rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

2023-09-12 Thread Kewen.Lin via Gcc-patches
Hi Ajit,

on 2023/8/31 18:44, Ajit Agarwal via Gcc-patches wrote:
> 
> This patch removes zero extension from vctzlsbb as it already zero extends.
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index
> 
> For the rs6000 target we don't need a zero_extend after vctzlsbb, as vctzlsbb
> already zero-extends its result.
> 
> 2023-08-31  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/vsx.md: Add new pattern.

Nit: we can offer the pattern name, such as:

* config/rs6000/vsx.md (vctzlsbb_zext_): New define_insn.

> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/altivec-19.C: New testcase.
> ---
>  gcc/config/rs6000/vsx.md  | 17 ++---
>  gcc/testsuite/g++.target/powerpc/altivec-19.C | 11 +++
>  2 files changed, 25 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/altivec-19.C
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 19abfeb565a..09d21a6d00a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5846,11 +5846,22 @@
>[(set_attr "type" "vecsimple")])
>  
>  ;; Vector Count Trailing Zero Least-Significant Bits Byte
> -(define_insn "vctzlsbb_"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> +(define_insn "vctzlsbbzext_"

Nit: s/vctzlsbbzext_/vctzlsbb_zext_/; it reads better to keep
the mnemonic (vctzlsbb) separate.

> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
>   (unspec:SI
>[(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
> -  UNSPEC_VCTZLSBB))]
> +  UNSPEC_VCTZLSBB)))]
> +  "TARGET_P9_VECTOR"
> +  "vctzlsbb %0,%1"
> +  [(set_attr "type" "vecsimple")])
> +
> +;; Vector Count Trailing Zero Least-Significant Bits Byte
> +(define_insn "vctzlsbb_"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(unspec:SI
> + [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
> + UNSPEC_VCTZLSBB))]
>"TARGET_P9_VECTOR"
>"vctzlsbb %0,%1"
>[(set_attr "type" "vecsimple")])
> diff --git a/gcc/testsuite/g++.target/powerpc/altivec-19.C b/gcc/testsuite/g++.target/powerpc/altivec-19.C
> new file mode 100644
> index 000..2d630b2fc1f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/altivec-19.C
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

Nit: Can be simpler with /* { dg-do compile } */
as the target requirement is always satisfied in powerpc
test suite.

> +/* { dg-require-effective-target lp64 } */

This line can be removed, as neither this test case nor its checks
require a 64-bit environment.

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mcpu=power9 -O2 " } */

Use -mdejagnu-cpu=power9 instead of -mcpu=power9, since it always
takes effect, even when some other -mcpu=* is mixed in from the RUN FLAGS.

> +
> +#include 
> +
> +unsigned int foo (vector unsigned char a, vector unsigned char b) {
> +  return vec_first_match_or_eos_index (a, b);
> +}
> +/* { dg-final { scan-assembler-not "rldicl" } } */

Nit: Maybe with \m and \M like:

/* { dg-final { scan-assembler-not {\mrldicl\M} } } */


BR,
Kewen
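The patch rests on the premise that vctzlsbb already produces a zero-extended result. The minimal C model below makes the range argument concrete. It is based on a reading of the ISA description, not on GCC code. The instruction counts how many trailing byte elements have a zero least-significant bit, starting from element 15 in big-endian element numbering. The result therefore always lies in [0, 16], so zero-extending it to 64 bits cannot change it. The names `model_vctzlsbb` and `zext_is_noop` are hypothetical, and the element ordering is an assumption of the model.

```c
#include <stdint.h>

/* Hypothetical software model of vctzlsbb.  bytes[] holds the 16 byte
   elements in big-endian element order (element 0 first).  Starting at
   element 15 and moving toward element 0, count the elements whose
   least-significant bit is 0, stopping at the first element whose LSB is 1.  */
static uint64_t model_vctzlsbb (const uint8_t bytes[16])
{
  uint64_t count = 0;
  for (int i = 15; i >= 0 && (bytes[i] & 1) == 0; i--)
    count++;
  return count;  /* always in [0, 16] */
}

/* Returns 1 if zero-extending the low 32 bits reproduces the full 64-bit
   value, i.e. an extra zero_extend would be a no-op.  */
static int zext_is_noop (uint64_t result)
{
  return (uint64_t) (uint32_t) result == result;
}
```

Because every possible count fits in 5 bits, `zext_is_noop` holds for every output of the model. That is the property the new `zero_extend:DI` pattern relies on.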


Re: [PATCH] ggc, jit: forcibly clear GTY roots in jit

2023-09-12 Thread Richard Biener via Gcc-patches
On Wed, Sep 6, 2023 at 3:41 PM David Malcolm via Gcc-patches
 wrote:
>
> As part of Antoyo's work on supporting LTO in rustc_codegen_gcc, he
> noticed an ICE inside libgccjit when compiling certain rust files.
>
> Debugging libgccjit showed that outdated information from a previous
> in-memory compile was referring to ad-hoc locations in the previous
> compile's line_table.
>
> The issue turned out to be the function decls in internal_fn_fnspec_array
> from the previous compile keeping alive the symtab nodes for these
> functions, and from this finding other functions in the previous
> compile, walking their CFGs, and finding ad-hoc data pointers in an edge
> with a location_t using ad-hoc data from the previous line_table
> instance, and thus a use-after-free ICE attempting to use this ad-hoc
> data.
>
> Previously in toplev::finalize we've fixed global state "piecemeal" by
> calling out to individual source_name_cc_finalize functions.  However,
> it occurred to me that we have run-time information on where the
> GTY-marked pointers are.
>
> Hence this patch takes something of a "big hammer" approach by adding a
> new ggc_common_finalize that walks the GC roots, zeroing all of the
> pointers.  I stepped through this in the debugger and observed that, in
> particular, this correctly zeroes the internal_fn_fnspec_array at the end
> of a libgccjit compile.  Antoyo reports that this fixes the ICE for him.
> Doing so uncovered an ICE with libgccjit in dwarf2cfi.cc due to reuse of
> global variables from the previous compile, which this patch also fixes.
>
> I noticed that in ggc_mark_roots when clearing deletable roots we only
> clear the initial element in each gcc_root_tab_t.  This looks like a
> latent bug to me, which the patch fixes.  That said, there don't seem to
> be any deletable roots where the number of elements != 1.
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK.

Thanks,
Richard.
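The latent bug Dave describes (only the first element of each deletable root being cleared) is easy to see in miniature. This is a sketch, not GCC's code. `struct root_tab`, `clear_first_only` and `clear_all` are hypothetical names, loosely modeled on the base/nelt/stride fields of ggc_root_tab and on the two memset variants in the patch. The sketch also assumes, as GCC's hosts do, that an all-zero bit pattern is a null pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified root-table entry: base points at a global
   holding nelt pointers spaced stride bytes apart.  */
struct root_tab
{
  void *base;
  size_t nelt;
  size_t stride;
};

/* Old behaviour for deletable roots: only the first element is cleared.  */
static void clear_first_only (const struct root_tab *tab)
{
  for (const struct root_tab *rti = tab; rti->base != NULL; rti++)
    memset (rti->base, 0, rti->stride);
}

/* Fixed behaviour: all nelt elements of each root are cleared.  */
static void clear_all (const struct root_tab *tab)
{
  for (const struct root_tab *rti = tab; rti->base != NULL; rti++)
    memset (rti->base, 0, rti->stride * rti->nelt);
}

/* Example "GC roots": a single pointer and a two-element pointer array.  */
static int obj_a, obj_b, obj_c;
static int *single_root = &obj_a;
static int *array_root[2] = { &obj_b, &obj_c };

/* Table terminated by a NULL base, as the loops above expect.  */
static const struct root_tab roots[] = {
  { &single_root, 1, sizeof single_root },
  { array_root, 2, sizeof array_root[0] },
  { NULL, 0, 0 }
};
```

After `clear_first_only (roots)`, `array_root[1]` still points at `obj_c`, which is the kind of stale pointer that caused the libgccjit ICE. `clear_all (roots)` clears it as well.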

> Thanks
> Dave
>
> gcc/ChangeLog:
> * dwarf2cfi.cc (dwarf2cfi_cc_finalize): New.
> * dwarf2out.h (dwarf2cfi_cc_finalize): New decl.
> * ggc-common.cc (ggc_mark_roots): Multiply by rti->nelt when
> clearing the deletable gcc_root_tab_t.
> (ggc_common_finalize): New.
> * ggc.h (ggc_common_finalize): New decl.
> * toplev.cc (toplev::finalize): Call dwarf2cfi_cc_finalize and
> ggc_common_finalize.
> ---
>  gcc/dwarf2cfi.cc  |  9 +
>  gcc/dwarf2out.h   |  1 +
>  gcc/ggc-common.cc | 23 ++-
>  gcc/ggc.h |  2 ++
>  gcc/toplev.cc |  3 +++
>  5 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
> index ddc728f4ad00..f1777c0a4cf1 100644
> --- a/gcc/dwarf2cfi.cc
> +++ b/gcc/dwarf2cfi.cc
> @@ -3822,4 +3822,13 @@ make_pass_dwarf2_frame (gcc::context *ctxt)
>return new pass_dwarf2_frame (ctxt);
>  }
>
> +void dwarf2cfi_cc_finalize ()
> +{
> +  add_cfi_insn = NULL;
> +  add_cfi_vec = NULL;
> +  cur_trace = NULL;
> +  cur_row = NULL;
> +  cur_cfa = NULL;
> +}
> +
>  #include "gt-dwarf2cfi.h"
> diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
> index 870b56a6a372..61a996050ff9 100644
> --- a/gcc/dwarf2out.h
> +++ b/gcc/dwarf2out.h
> @@ -419,6 +419,7 @@ struct fixed_point_type_info
>  } scale_factor;
>  };
>
> +void dwarf2cfi_cc_finalize (void);
>  void dwarf2out_cc_finalize (void);
>
>  /* Some DWARF internals are exposed for the needs of DWARF-based debug
> diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
> index bed7a9d4d021..95803fa95a17 100644
> --- a/gcc/ggc-common.cc
> +++ b/gcc/ggc-common.cc
> @@ -86,7 +86,7 @@ ggc_mark_roots (void)
>
>for (rt = gt_ggc_deletable_rtab; *rt; rt++)
>  for (rti = *rt; rti->base != NULL; rti++)
> -  memset (rti->base, 0, rti->stride);
> +  memset (rti->base, 0, rti->stride * rti->nelt);
>
>for (rt = gt_ggc_rtab; *rt; rt++)
>  ggc_mark_root_tab (*rt);
> @@ -1293,3 +1293,24 @@ report_heap_memory_use ()
>  SIZE_AMOUNT (MALLINFO_FN ().arena));
>  #endif
>  }
> +
> +/* Forcibly clear all GTY roots.  */
> +
> +void
> +ggc_common_finalize ()
> +{
> +  const struct ggc_root_tab *const *rt;
> +  const_ggc_root_tab_t rti;
> +
> +  for (rt = gt_ggc_deletable_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +
> +  for (rt = gt_ggc_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +
> +  for (rt = gt_pch_scalar_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +}
> diff --git a/gcc/ggc.h b/gcc/ggc.h
> index 34108e2f0061..3280314f8481 100644
> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h
> @@ -368,4 +368,6 @@ inline void gt_ggc_mx (unsigned long int) { }
>  inline void gt_ggc_mx (long long int) { }
>  inline void gt_ggc_mx (unsigned long long int) { }
>
> +extern void g

Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Then you don't need to waste time on reducing the case from SPEC.



juzhe.zh...@rivai.ai
 
 


Re: [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion

2023-09-12 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Tue, 12 Sept 2023 at 05:34, Jeff Law  wrote:
>
>
>
> On 9/6/23 10:07, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch implements expansions for the cmpstrsi and cmpstrnsi
> > builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
> > instructions are available.  The expansion basically emits a comparison
> > sequence which compares XLEN bits per step if possible.
> >
> > This allows inlining calls to strcmp() and strncmp() if both strings
> > are xlen-aligned.  For strncmp() the length parameter needs to be known.
> > The benefits over calls to libc are:
> > * no call/ret instructions
> > * no stack frame allocation
> > * no register saving/restoring
> > * no alignment tests
> >
> > The inlining mechanism is gated by new switches ('-minline-strcmp' and
> > '-minline-strncmp') and by the variable 'optimize_size'.
> > The amount of emitted unrolled loop iterations can be controlled by the
> > parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
> >
> > The comparison sequence is inspired by the strcmp example
> > in the appendix of the Bitmanip specification (incl. the fast
> > result calculation in case the first word does not contain
> > a NULL byte).  Additional inspiration comes from rs6000-string.c.
> >
> > The emitted sequence is not triggering any readahead pagefault issues,
> > because only aligned strings are accessed by aligned xlen-loads.
> >
> > This patch has been tested using the glibc string tests on QEMU:
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
> > * rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*_not): Export INSN name.
> >   (_not3): Likewise.
> >   * config/riscv/riscv-protos.h (riscv_expand_strcmp): New
> >   prototype.
> >   * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >   macros.
> >   (GEN_EMIT_HELPER2): Likewise.
> >   (emit_strcmp_scalar_compare_byte): New function.
> >   (emit_strcmp_scalar_compare_subword): Likewise.
> >   (emit_strcmp_scalar_compare_word): Likewise.
> >   (emit_strcmp_scalar_load_and_compare): Likewise.
> >   (emit_strcmp_scalar_call_to_libc): Likewise.
> >   (emit_strcmp_scalar_result_calculation_nonul): Likewise.
> >   (emit_strcmp_scalar_result_calculation): Likewise.
> >   (riscv_expand_strcmp_scalar): Likewise.
> >   (riscv_expand_strcmp): Likewise.
> >   * config/riscv/riscv.md (*slt_): Export
> >   INSN name.
> >   (@slt_3): Likewise.
> >   (cmpstrnsi): Invoke expansion function for str(n)cmp.
> >   (cmpstrsi): Likewise.
> >   * config/riscv/riscv.opt: Add new parameter
> >   '-mstring-compare-inline-limit'.
> >   * doc/invoke.texi: Document new parameter
> >   '-mstring-compare-inline-limit'.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadbb-strcmp-unaligned.c: New test.
> >   * gcc.target/riscv/xtheadbb-strcmp.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-disabled.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
> >   * gcc.target/riscv/zbb-strcmp.c: New test.
> OK for the trunk.  Thanks for pushing this along.
>
> jeff
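The core idea of the strcmp expansion can be sketched in portable C. The expansion compares XLEN bits per step and uses orc.b to detect a NUL byte. This is an illustration under stated assumptions, not the emitted RTL:

- XLEN is taken as 64.
- `orc_b` is a hypothetical software stand-in for the Zbb instruction.
- The final word is finished byte by byte instead of with the patch's fast result calculation.

The strings must be 8-byte aligned, and the word containing the NUL must be fully readable. The hypothetical helper `aligned_strcmp_copy` guarantees this by copying into zero-padded buffers.

```c
#include <stdint.h>
#include <string.h>

/* Software stand-in for Zbb orc.b: each nonzero byte becomes 0xff and
   each zero byte stays 0x00.  */
static uint64_t orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Word-at-a-time strcmp for 8-byte-aligned strings whose final word is
   fully readable.  */
static int aligned_strcmp (const char *a, const char *b)
{
  for (;; a += 8, b += 8)
    {
      uint64_t wa, wb;
      memcpy (&wa, a, 8);
      memcpy (&wb, b, 8);
      /* Leave the loop at the first word that differs or holds a NUL.  */
      if (wa != wb || orc_b (wa) != UINT64_MAX)
        break;
    }
  /* Resolve the result inside the final word, byte by byte.  */
  for (int i = 0; i < 8; i++)
    {
      unsigned char ca = (unsigned char) a[i], cb = (unsigned char) b[i];
      if (ca != cb || ca == 0)
        return ca - cb;
    }
  return 0;  /* Unreachable: the final word always decides the result.  */
}

/* Helper: copy into zero-padded, 8-byte-aligned buffers so that whole-word
   reads past the NUL stay in bounds.  */
static int aligned_strcmp_copy (const char *x, const char *y)
{
  _Alignas (8) char bx[32] = { 0 }, by[32] = { 0 };
  strncpy (bx, x, sizeof bx - 1);
  strncpy (by, y, sizeof by - 1);
  return aligned_strcmp (bx, by);
}
```

Only aligned whole-word loads are performed. That matches the patch's argument that no read can cross into an unmapped page.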


Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-12 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.


On Wed, 6 Sept 2023 at 18:07, Christoph Muellner
 wrote:
>
> From: Christoph Müllner 
>
> This patch implements the expansion of the strlen builtin for RV32/RV64
> for xlen-aligned strings if Zbb or XTheadBb instructions are
> available.
> The inserted sequences are:
>
> rv32gc_zbb (RV64 is similar):
>   add a3,a0,4
>   li  a4,-1
> .L1:  lw  a5,0(a0)
>   add a0,a0,4
>   orc.b   a5,a5
>   beq a5,a4,.L1
>   not a5,a5
>   ctz a5,a5
>   srl a5,a5,0x3
>   add a0,a0,a5
>   sub a0,a0,a3
>
> rv64gc_xtheadbb (RV32 is similar):
>   add   a4,a0,8
> .L2:  ld    a5,0(a0)
>   add   a0,a0,8
>   th.tstnbz a5,a5
>   beqz  a5,.L2
>   th.rev    a5,a5
>   th.ff1    a5,a5
>   srl   a5,a5,0x3
>   add   a0,a0,a5
>   sub   a0,a0,a4
>
> This allows inlining calls to strlen(), with optimized code for
> xlen-aligned strings, resulting in the following benefits over
> a call to libc:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment test
>
> The inlining mechanism is gated by a new switch ('-minline-strlen')
> and by the variable 'optimize_size'.
>
> Tested using the glibc string tests.
>
> Signed-off-by: Christoph Müllner 
>
> gcc/ChangeLog:
>
> * config.gcc: Add new object riscv-string.o.
> riscv-string.cc.
> * config/riscv/riscv-protos.h (riscv_expand_strlen):
> New function.
> * config/riscv/riscv.md (strlen): New expand INSN.
> * config/riscv/riscv.opt: New flag 'minline-strlen'.
> * config/riscv/t-riscv: Add new object riscv-string.o.
> * config/riscv/thead.md (th_rev2): Export INSN name.
> (th_rev2): Likewise.
> (th_tstnbz2): New INSN.
> * doc/invoke.texi: Document '-minline-strlen'.
> * emit-rtl.cc (emit_likely_jump_insn): New helper function.
> (emit_unlikely_jump_insn): Likewise.
> * rtl.h (emit_likely_jump_insn): New prototype.
> (emit_unlikely_jump_insn): Likewise.
> * config/riscv/riscv-string.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
> * gcc.target/riscv/xtheadbb-strlen.c: New test.
> * gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
> * gcc.target/riscv/zbb-strlen-disabled.c: New test.
> * gcc.target/riscv/zbb-strlen-unaligned.c: New test.
> * gcc.target/riscv/zbb-strlen.c: New test.
> ---
>  gcc/config.gcc|   3 +-
>  gcc/config/riscv/riscv-protos.h   |   3 +
>  gcc/config/riscv/riscv-string.cc  | 183 ++
>  gcc/config/riscv/riscv.md |  28 +++
>  gcc/config/riscv/riscv.opt|   4 +
>  gcc/config/riscv/t-riscv  |   6 +
>  gcc/config/riscv/thead.md |   9 +-
>  gcc/doc/invoke.texi   |  11 +-
>  gcc/emit-rtl.cc   |  24 +++
>  gcc/rtl.h |   2 +
>  .../riscv/xtheadbb-strlen-unaligned.c |  14 ++
>  .../gcc.target/riscv/xtheadbb-strlen.c|  19 ++
>  .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
>  .../gcc.target/riscv/zbb-strlen-disabled.c|  15 ++
>  .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
>  gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
>  16 files changed, 366 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv-string.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b2fe7c7ceef..aff6b6a5601 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -530,7 +530,8 @@ pru-*-*)
> ;;
>  riscv*)
> cpu_type=riscv
> -   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o 
> riscv-vector-costs.o"
> +   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
> +   extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o 
> riscv-vector-costs.o"
> extra_objs="${extra_objs} riscv-vector-builtins.o 
> riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> extra_objs="${extra_objs} thead.o"
> d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6dbf6b9f943..b060d047
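The rv32gc_zbb strlen sequence quoted in this message can be followed step by step in C. This sketch rests on several assumptions:

- It models the 64-bit variant, which the message describes as similar.
- `orc_b` and `load_le64` are hypothetical software stand-ins for orc.b and a little-endian ld.
- `__builtin_ctzll`, a GCC/Clang builtin, plays the role of ctz.
- The string must be 8-byte aligned, with the word containing its NUL fully readable. The hypothetical helper `aligned_strlen_copy` ensures this with a zero-padded buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software stand-in for Zbb orc.b: nonzero bytes -> 0xff, zero bytes -> 0x00.  */
static uint64_t orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Little-endian 64-bit load; assembling the bytes explicitly keeps the
   sketch independent of host endianness.  */
static uint64_t load_le64 (const char *p)
{
  uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (uint64_t) (unsigned char) p[i] << (8 * i);
  return w;
}

static size_t aligned_strlen (const char *s)
{
  const char *p = s;
  uint64_t w;
  do
    {
      w = load_le64 (p);  /* ld  a5,0(a0) */
      p += 8;             /* add a0,a0,8 */
    }
  while (orc_b (w) == UINT64_MAX);  /* orc.b, then loop while all ones */
  /* not + ctz: after inverting, NUL bytes are 0xff, so ctz finds the first
     one; shifting the bit index right by 3 turns it into a byte index.  */
  unsigned idx = (unsigned) __builtin_ctzll (~orc_b (w)) >> 3;
  /* p points one word past the word holding the NUL.  */
  return (size_t) (p - 8 - s) + idx;
}

static size_t aligned_strlen_copy (const char *x)
{
  _Alignas (8) char buf[32] = { 0 };
  strncpy (buf, x, sizeof buf - 1);
  return aligned_strlen (buf);
}
```

The final `p - 8 - s` corresponds to the quoted `add a0,a0,a5; sub a0,a0,a3` pair. There, a3 holds the start address plus one word, which undoes the extra increment from the last loop iteration.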

Re: [PATCH] ssa_name_has_boolean_range vs signed-boolean:31 types

2023-09-12 Thread Eric Botcazou via Gcc-patches
> Does Ada have signed booleans that are BOOLEAN_TYPE but do _not_
> have [-1, 0] as range?  I think documenting [0, 1] for (single-bit
> precision?) unsigned BOOLEAN_TYPE and [-1, 1] for signed BOOLEAN_TYPE would
> be conservative.

All BOOLEAN_TYPEs are unsigned in Ada but may have precision > 1, typically 8.

-- 
Eric Botcazou




Re: [PATCH] Checking undefined_p before using the vr

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, 7 Sep 2023, Jiufu Guo wrote:

> Hi,
> 
> As discussed in PR111303:
> 
> For pattern "(X + C) / N": "div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
> Even if "X" has value-range and "X + C" does not overflow, "@3" may still
> be undefined. Like below example:
> 
> _3 = _2 + -5;
> if (0 != 0)
>   goto ; [34.00%]
> else
>   goto ; [66.00%]
> ;;  succ:   3
> ;;  4
> 
> ;; basic block 3, loop depth 0
> ;;  pred:   2
> _5 = _3 / 5; 
> ;;  succ:   4
> 
> The whole pattern "(_2 + -5) / 5" is in "bb 3", but "bb 3" is
> unreachable (because "if (0 != 0)" is always false).
> So when "get_range_query (cfun)->range_of_expr (vr3, @3)" is queried in
> "bb 3", "range_of_expr" returns an undefined "vr3", where "@3" is "_3".
> 
> So, before using "vr3", we need to check "!vr3.undefined_p ()".
> 
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?

OK, but I wonder why ->range_of_expr () doesn't return false for
undefined_p ()?  While "undefined" technically means we can treat
it as nonnegative_p (or not; maybe, but maybe not both), we seem not
to want to do that.  So why expose it to ranger users at all
(yes, internally we want to handle undefined in some places)?

Richard.

> BR,
> Jeff (Jiufu Guo)
> 
>   PR middle-end/111303
> 
> gcc/ChangeLog:
> 
>   * match.pd ((X - N * M) / N): Add undefined_p checking.
>   ((X + N * M) / N): Likewise.
>   ((X + C) div_rshift N): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr111303.c: New test.
> 
> ---
>  gcc/match.pd|  3 +++
>  gcc/testsuite/gcc.dg/pr111303.c | 11 +++
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr111303.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 801edb128f9..e2583ca7960 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -975,6 +975,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X+(N*M)" doesn't overflow.  */
> && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
> && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   && !vr4.undefined_p ()
> /* "X+N*M" is not with opposite sign as "X".  */
> && (TYPE_UNSIGNED (type)
>  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> @@ -995,6 +996,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X - (N*M)" doesn't overflow.  */
> && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
> && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   && !vr4.undefined_p ()
> /* "X-N*M" is not with opposite sign as "X".  */
> && (TYPE_UNSIGNED (type)
>  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> @@ -1025,6 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X+C" doesn't overflow.  */
> && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
> && get_range_query (cfun)->range_of_expr (vr3, @3)
> +   && !vr3.undefined_p ()
> /* "X+C" and "X" are not of opposite sign.  */
> && (TYPE_UNSIGNED (type)
> || (vr0.nonnegative_p () && vr3.nonnegative_p ())
> diff --git a/gcc/testsuite/gcc.dg/pr111303.c b/gcc/testsuite/gcc.dg/pr111303.c
> new file mode 100644
> index 000..eaabe55c105
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr111303.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Make sure no ICE. */
> +unsigned char a;
> +int b(int c) {
> +  if (c >= 5000)
> +    return c / 5;
> +}
> +void d() { b(a - 5); }
> +int main() {}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
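The reasoning can be illustrated with a toy model of value ranges (hypothetical types and functions, not GCC's value_range API): an undefined range contains no values, so any "all values satisfy P" predicate holds vacuously, including nonnegative and negative at the same time. Guarding with an explicit undefined check, in the spirit of the !vr3.undefined_p () tests added to match.pd, avoids drawing conclusions from such vacuous answers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy integer range.  UNDEFINED means no value can reach this point
   (e.g. the statement sits in an unreachable block).  Hypothetical
   model, not GCC's irange/value_range API.  */
typedef struct
{
  bool undefined;
  int64_t lo, hi; /* Inclusive bounds, meaningful only if !undefined.  */
} toy_range;

/* Vacuously true for an empty range.  */
static bool
toy_nonnegative_p (toy_range r)
{
  return r.undefined || r.lo >= 0;
}

/* Also vacuously true for an empty range.  */
static bool
toy_negative_p (toy_range r)
{
  return r.undefined || r.hi < 0;
}

/* "X and X+C are known to have the same sign", refusing to reason
   about undefined ranges instead of trusting vacuous answers.  */
static bool
same_sign_known_p (toy_range x, toy_range sum)
{
  if (x.undefined || sum.undefined)
    return false;
  return (toy_nonnegative_p (x) && toy_nonnegative_p (sum))
         || (toy_negative_p (x) && toy_negative_p (sum));
}
```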


Re: [PATCH] Tweak language choice in config-list.mk

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 11:30 AM Richard Sandiford via Gcc-patches
 wrote:
>
> When I tried to use config-list.mk, the build for every triple except
> the build machine's failed for m2.  This is because, unlike other
> languages, m2 builds target objects during all-gcc.  The build will
> therefore fail unless you have access to an appropriate binutils
> (or an equivalent).  That's quite a big ask for over 100 targets. :)
>
> This patch therefore makes m2 an optional inclusion.
>
> Doing that wasn't entirely straightforward though.  The current
> configure line includes "--enable-languages=all,...", which means
> that the "..." can only force languages to be added that otherwise
> wouldn't have been.  (I.e. the only effect of the "..." is to
> override configure autodetection.)
>
> The choice of all,ada and:
>
>   # Make sure you have a recent enough gcc (with ada support) in your path so
>   # that --enable-werror-always will work.
>
> make it clear that lack of GNAT should be a build failure rather than
> silently ignored.  This predates the D frontend, which requires GDC
> in the same way that Ada requires GNAT.  I don't know of a reason
> why D should be treated differently.
>
> The patch therefore expands the "all" into a specific list of
> languages.
>
> That in turn meant that Fortran had to be handled specially,
> since bpf and mmix don't support Fortran.
>
> Perhaps there's an argument that m2 shouldn't build target objects
> during all-gcc,

Yes, I think that's unfortunate - can you open a bugreport for this?

> but (a) it works for practical usage and (b) the
> patch is an easy workaround.  I'd be happy for the patch to be
> reverted if the build system changes.
>
> OK to install?

OK.

> Richard
>
>
> gcc/
> * contrib/config-list.mk (OPT_IN_LANGUAGES): New variable.
> ($(LIST)): Replace --enable-languages=all with a specific list.
> Disable fortran on bpf and mmix.  Enable the languages in
> OPT_IN_LANGUAGES.
> ---
>  contrib/config-list.mk | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index e570b13c71b..50ecb014bc0 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -12,6 +12,11 @@ TEST=all-gcc
>  # supply an absolute path.
>  GCC_SRC_DIR=../../gcc
>
> +# Define this to ,m2 if you want to build Modula-2.  Modula-2 builds target
> +# objects during all-gcc, so it can only be included if you've installed
> +# binutils (or an equivalent) for each target.
> +OPT_IN_LANGUAGES=
> +
>  # Use -j / -l make arguments and nice to assure a smooth resource-efficient
>  # load on the build machine, e.g. for 24 cores:
>  # svn co svn://gcc.gnu.org/svn/gcc/branches/foo-branch gcc
> @@ -126,17 +131,23 @@ $(LIST): make-log-dir
> TGT=`echo $@ | awk 'BEGIN { FS = "OPT" }; { print $$1 }'` &&  \
> TGT=`$(GCC_SRC_DIR)/config.sub $$TGT` &&  \
> case $$TGT in  \
> -   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix* | bpf-*-*)\
> +   bpf-*-*)  \
> ADDITIONAL_LANGUAGES="";  \
> ;;  \
> -   *)  \
> +   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix* | bpf-*-*)\
> +   ADDITIONAL_LANGUAGES=",fortran";  \
> +   ;;  \
> +   mmix-*-*)  \
> ADDITIONAL_LANGUAGES=",go";  \
> ;;  \
> +   *)  \
> +   ADDITIONAL_LANGUAGES=",fortran,go";  \
> +   ;;  \
> esac &&  \
> $(GCC_SRC_DIR)/configure  \
> --target=$(subst SCRIPTS,`pwd`/../scripts/,$(subst OPT,$(empty) -,$@))  \
> --enable-werror-always ${host_options}  \
> -
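The case statement above relies on sh's first-match-wins rule. As quoted here, the bpf-*-* pattern kept in the second arm can therefore never match, because the first arm already catches it. A minimal C model of the new mapping, using POSIX fnmatch(3) with the same glob patterns (the function and table names are hypothetical), makes the order explicit:

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* One arm of the case statement: up to four glob patterns
   (NULL-terminated) and the languages appended for that arm.  */
struct lang_rule
{
  const char *patterns[5];
  const char *langs;
};

/* Same order as the patched config-list.mk; the first matching arm
   wins, as in sh.  The unreachable second bpf-*-* pattern is omitted.  */
static const struct lang_rule rules[] = {
  { { "bpf-*-*", NULL }, "" },
  { { "*-*-darwin*", "*-*-cygwin*", "*-*-mingw*", "*-*-aix*", NULL },
    ",fortran" },
  { { "mmix-*-*", NULL }, ",go" },
  { { "*", NULL }, ",fortran,go" },
};

/* Hypothetical helper: ADDITIONAL_LANGUAGES for a canonical triple.  */
static const char *
additional_languages (const char *triple)
{
  for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
    for (size_t j = 0; rules[i].patterns[j] != NULL; j++)
      if (fnmatch (rules[i].patterns[j], triple, 0) == 0)
        return rules[i].langs;
  return NULL; /* Not reached: the final "*" arm matches everything.  */
}
```

Swapping the first two entries of the table would be enough to make bpf targets pick up ",fortran", which is why the order of the arms matters.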

Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 2:32 PM Segher Boessenkool
 wrote:
>
> On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> > On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches 
> > > wrote:
> > > > I started to hunt
> > > > down all the Makefile which add a -Werror but there are a lot and
> > > > eventually I got bored and gave up.
> > >
> > > I have a patch stack for that, since 2014 or so.  I build Linux with
> > > unreleased GCC versions all the time, so pretty much any new warning is
> > > fatal if you unwisely use -Werror.
> > >
> > > > Someone should patch GCC so that it checks an environment variable to
> > > > ignore -Werror.  Something like this?
> > >
> > > No.  You should patch your program, instead.
> >
> > There are 2930 Makefiles in the kernel source.
>
> Yes.  And you need patches to about thirty.  Or a bit more, if you want
> to do it more cleanly.  This isn't a guess.
>
> > > One easy way is to add a
> > > -Wno-error at the end of your command lines.  Or even just -w if you
> > > want or need a bigger hammer.
> >
> > I tried that.  Some of the Makefiles check an environment variable as
> > well if you want to turn off -Werror.  It's not a complete solution at
> > all.  I have no idea what a complete solution looks like because I gave
> > up.
>
> A solution can not involve changing the compiler.  That is just saying
> the kernel doesn't know how to fix its own problems, so let's give the
> compiler some more unnecessary problems.

You can change the compiler by replacing it with a script that appends
-Wno-error
for example.
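A minimal sketch of the wrapper approach, assuming the wrapper only needs to append -Wno-error as the last option so that it overrides an earlier plain -Werror (the helper name is hypothetical). As reported earlier in the thread, this was not a complete solution for every kernel build.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for a compiler wrapper: copy ARGV (ARGC entries)
   and append "-Wno-error" plus the terminating NULL.  The strings are
   shared, not duplicated; the caller frees only the returned array.  */
static char **
append_wno_error (int argc, char **argv)
{
  char **out = malloc ((size_t) (argc + 2) * sizeof *out);
  if (out == NULL)
    return NULL;
  for (int i = 0; i < argc; i++)
    out[i] = argv[i];
  out[argc] = (char *) "-Wno-error";
  out[argc + 1] = NULL;
  return out;
}
```

A wrapper's main would then pass the result to execvp together with the path of the real compiler driver (an assumption about how the wrapper is installed, e.g. earlier in PATH than the real gcc).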

> > > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > > There is a CONFIG_WERROR as well, so maybe use that in all places?
> >
> > That's a good idea but I'm trying to compile old kernels and not the
> > current kernel.
>
> You can patch older kernels, too, you know :-)
>
> If you need to not make any changes to your source code for some crazy
> reason (political perhaps?), just use a shell script or shell function
> instead of invoking the compiler driver directly?
>
>
> Segher


Re: [PATCH] math-opts: Add dbgcounter for FMA formation

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 6:47 PM Martin Jambor  wrote:
>
> Hello,
>
> This patch is a simple addition of a debug counter to FMA formation in
> tree-ssa-math-opts.cc.  Given that issues with FMAs do occasionally
> pop up, it seems genuinely useful.
>
> I simply added an if right after the initial checks in
> convert_mult_to_fma, even though this interacts with FMA formation
> deferring when it is active (i.e. when targeting Zen CPUs) and at the
> moment leads to producing all deferred candidates.  So when using the
> dbg counter to find a harmful set of FMAs, it is probably best to also
> set param_avoid_fma_max_bits to zero.  I could not find a better place
> that would not also make the code unnecessarily more complicated.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?

OK.

> Thanks,
>
> Martin
>
>
>
> gcc/ChangeLog:
>
> 2023-09-06  Martin Jambor  
>
> * dbgcnt.def (form_fma): New.
> * tree-ssa-math-opts.cc: Include dbgcnt.h.
> (convert_mult_to_fma): Bail out if the debug counter says so.
> ---
>  gcc/dbgcnt.def| 1 +
>  gcc/tree-ssa-math-opts.cc | 4 
>  2 files changed, 5 insertions(+)
>
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 9e2f1d857b4..871cbf75d93 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -162,6 +162,7 @@ DEBUG_COUNTER (dom_unreachable_edges)
>  DEBUG_COUNTER (dse)
>  DEBUG_COUNTER (dse1)
>  DEBUG_COUNTER (dse2)
> +DEBUG_COUNTER (form_fma)
>  DEBUG_COUNTER (gcse2_delete)
>  DEBUG_COUNTER (gimple_unroll)
>  DEBUG_COUNTER (global_alloc_at_func)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 95c22694368..3db69ad5733 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -116,6 +116,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "targhooks.h"
>  #include "domwalk.h"
>  #include "tree-ssa-math-opts.h"
> +#include "dbgcnt.h"
>
>  /* This structure represents one basic block that either computes a
> division, or is a common dominator for basic block that compute a
> @@ -3366,6 +3367,9 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
>&& !has_single_use (mul_result))
>  return false;
>
> +  if (!dbg_cnt (form_fma))
> +return false;
> +
>/* Make sure that the multiplication statement becomes dead after
>   the transformation, thus that all uses are transformed to FMAs.
>   This means we assume that an FMA operation has the same cost
> --
> 2.41.0
>
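Debug counters are typically used to bisect: rerun the failing test with the counter capped at N (GCC's -fdbg-cnt option, e.g. -fdbg-cnt=form_fma:N) and binary-search for the smallest N that reproduces the failure. The sketch below models that workflow with hypothetical names: toy_dbg_cnt is a simplified counter, not GCC's dbg_cnt, and demo_fails stands in for an actual compile-and-run step.

```c
#include <stdbool.h>

/* Simplified debug counter: allow the first LIMIT guarded
   transformations and refuse the rest.  Hypothetical model, not
   GCC's dbg_cnt implementation.  */
typedef struct
{
  unsigned count, limit;
} toy_dbg_counter;

static bool
toy_dbg_cnt (toy_dbg_counter *c)
{
  c->count++;
  return c->count <= c->limit;
}

/* Stand-in for "compile and run the test with the counter capped at
   LIMIT": here the 37th transformation is pretended to be the bad one.  */
static bool
demo_fails (unsigned limit)
{
  return limit >= 37;
}

/* Smallest LIMIT for which TEST_FAILS returns true, assuming
   LIMIT == 0 passes, LIMIT == HI fails, and failures are monotonic.  */
static unsigned
bisect_first_bad (bool (*test_fails) (unsigned), unsigned hi)
{
  unsigned lo = 0; /* Invariant: LO passes, HI fails.  */
  while (hi - lo > 1)
    {
      unsigned mid = lo + (hi - lo) / 2;
      if (test_fails (mid))
        hi = mid;
      else
        lo = mid;
    }
  return hi;
}
```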


[PATCH v2 0/11] Native complex operations

2023-09-12 Thread Sylvain Noiry via Gcc-patches
I have updated the series of patches. Most changes consist of bug fixes.

However 2 new patches add features:

PATCH 9/11: Remove useless special cases

This patch removes two special cases for complex types that are now
handled well enough by the general case. Don't hesitate to tell me if you
think I'm wrong.

PATCH 10/11: Add a fast complex multiplication pattern

In some cases where the target machine does not have a dedicated instruction
for a floating-point operation, we may let GCC expand it into a series of
basic operations, and IEEE checks are added automatically. However, a backend
developer may want to write their own fast path for an emulated operation
without having to implement the IEEE checks manually. This is what a fast
pattern is for. For example, it is possible to write a fast emulated complex
multiplication pattern and let GCC check whether the result is correct, and
otherwise call the helper.

The experimental x86 support is now patch number 11.

Thanks,

Sylvain
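The fast-pattern idea in PATCH 10/11 can be pictured in plain C: compute the cheap textbook formula, then fall back to a slower, fully IEEE-aware multiplication only when the result looks wrong. The check used below (both parts NaN) is an assumption chosen for illustration; it resembles the test GCC's own inline complex multiplication performs before calling the libgcc helper, but it is not taken from the patch, and the function names are hypothetical.

```c
#include <complex.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Build re + im*i without arithmetic, so NaN/Inf parts are kept exactly
   (a complex type has the layout of a two-element array: real, imag).  */
static double complex
make_complex (double re, double im)
{
  double parts[2] = { re, im };
  double complex z;
  memcpy (&z, parts, sizeof z);
  return z;
}

/* (a + bi) * (c + di): cheap textbook formula first; fall back to the
   compiler's full complex multiply only if both parts came out NaN.
   *USED_FALLBACK reports which path was taken.  */
static double complex
fast_cmul (double a, double b, double c, double d, bool *used_fallback)
{
  double re = a * c - b * d;
  double im = a * d + b * c;
  *used_fallback = isnan (re) && isnan (im);
  if (*used_fallback)
    return make_complex (a, b) * make_complex (c, d);
  return make_complex (re, im);
}
```

With default floating-point options the fallback multiplication typically becomes a call to libgcc's __muldc3, which can recover an infinite result when an operand is infinite, whereas the naive formula only yields NaNs.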







[PATCH v2 01/11] Native complex ops : Conditional lowering

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Allow the cplxlower pass to identify if an operation does not need
to be lowered through optabs. In this case, lowering is not performed.
The cplxlower pass now has to handle a mix of lowered and non-lowered
operations. A quick access to both parts of a complex constant is
also implemented.
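The decision described above can be sketched as follows. A hypothetical capability table stands in for the optab query, and operations the target cannot handle natively are rewritten into arithmetic on the real and imaginary parts (using the naive product formula, which ignores the NaN/infinity corner cases GCC's real lowering must handle). All names are illustrative.

```c
#include <stdbool.h>

typedef enum
{
  CPLX_ADD,
  CPLX_MUL
} cplx_op;

typedef struct
{
  double re, im;
} cplx;

/* Stand-in for the optab query: which complex operations the target
   implements natively.  */
typedef struct
{
  bool native_add, native_mul;
} target_caps;

static bool
target_has_native (const target_caps *t, cplx_op op)
{
  return op == CPLX_ADD ? t->native_add : t->native_mul;
}

/* Lower only when the target has no native pattern for OP.  */
static bool
should_lower (const target_caps *t, cplx_op op)
{
  return !target_has_native (t, op);
}

/* Lowered form: arithmetic on the real and imaginary parts.  */
static cplx
lower_op (cplx_op op, cplx x, cplx y)
{
  cplx r;
  if (op == CPLX_ADD)
    {
      r.re = x.re + y.re;
      r.im = x.im + y.im;
    }
  else
    {
      r.re = x.re * y.re - x.im * y.im;
      r.im = x.re * y.im + x.im * y.re;
    }
  return r;
}
```

Because only some operations get lowered, the pass then has to cope with a mix of whole complex values and split real/imaginary parts, which is what the "both parts" accessors in this patch are for.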

gcc/lto/ChangeLog:

* lto-common.cc (compare_tree_sccs_1): Handle both parts of a
  complex constant

gcc/ChangeLog:

* coretypes.h: Add enum for complex parts
* gensupport.cc (match_pattern): Add complex types
* lto-streamer-out.cc (DFS::DFS_write_tree_body):
(hash_tree): Handle both parts of a complex constant
* tree-complex.cc (get_component_var): Support handling of
both parts of a complex
(get_component_ssa_name): Likewise
(set_component_ssa_name): Likewise
(extract_component): Likewise
(update_complex_components): Likewise
(update_complex_components_on_edge): Likewise
(update_complex_assignment): Likewise
(update_phi_components): Likewise
(expand_complex_move): Likewise
(expand_complex_asm): Update with complex_part_t
(complex_component_cst_p): New: check if a complex
component is a constant
(target_native_complex_operation): New: Check if complex
operation is supported natively by the backend, through
the optab
(expand_complex_operations_1): Conditionally lower ops
(tree_lower_complex): Support handling of both parts of
 a complex
* tree-core.h (struct GTY): Add field for both parts of
the tree_complex struct
* tree-streamer-in.cc (lto_input_ts_complex_tree_pointers):
Handle both parts of a complex constant
* tree-streamer-out.cc (write_ts_complex_tree_pointers):
Likewise
* tree.cc (build_complex): likewise
* tree.h (class auto_suppress_location_wrappers):
(type_has_mode_precision_p): Add special case for complex
* tree-dfa.cc (get_ref_base_and_extent): Handle REALPART_EXPR
and IMAGPART_EXPR
---
 gcc/coretypes.h  |  11 ++
 gcc/gensupport.cc|   2 +
 gcc/lto-streamer-out.cc  |   2 +
 gcc/lto/lto-common.cc|   2 +
 gcc/tree-complex.cc  | 401 ++-
 gcc/tree-core.h  |   1 +
 gcc/tree-dfa.cc  |   3 +
 gcc/tree-streamer-in.cc  |   1 +
 gcc/tree-streamer-out.cc |   1 +
 gcc/tree.cc  |   8 +
 gcc/tree.h   |  15 +-
 11 files changed, 358 insertions(+), 89 deletions(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index f86dc169a40..76f49f25cad 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -448,6 +448,17 @@ enum optimize_size_level
   OPTIMIZE_SIZE_MAX
 };
 
+/* Part of a complex.  */
+
+enum complex_part_e
+{
+  REAL_P = 0,
+  IMAG_P = 1,
+  BOTH_P = 2
+};
+
+typedef enum complex_part_e complex_part_t;
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
is a pointer to a pointer, the second either NULL if the pointer to
pointer points into a GC object or the actual pointer address if
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index f7164b3214d..54f7b3cfe81 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3746,9 +3746,11 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
break;
if (*p == 0
&& (! force_int || mode_class[i] == MODE_INT
+   || mode_class[i] == MODE_COMPLEX_INT
|| mode_class[i] == MODE_VECTOR_INT)
&& (! force_partial_int
|| mode_class[i] == MODE_INT
+   || mode_class[i] == MODE_COMPLEX_INT
|| mode_class[i] == MODE_PARTIAL_INT
|| mode_class[i] == MODE_VECTOR_INT)
&& (! force_float
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ffa8954022..38c48e44867 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -985,6 +985,7 @@ DFS::DFS_write_tree_body (struct output_block *ob,
 {
   DFS_follow_tree_edge (TREE_REALPART (expr));
   DFS_follow_tree_edge (TREE_IMAGPART (expr));
+  DFS_follow_tree_edge (TREE_COMPLEX_BOTH_PARTS (expr));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
@@ -1417,6 +1418,7 @@ hash_tree (struct streamer_tree_cache_d *cache, hash_map *map,
 {
   visit (TREE_REALPART (t));
   visit (TREE_IMAGPART (t));
+  visit (TREE_COMPLEX_BOTH_PARTS (t));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 703e665b698..f647ee62f9e 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1408,6 +1408,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
 {
   compare_tree_edges (TREE_REALPART (t1), TREE_REALPART (t2));
   compare_tree_edg

[PATCH v2 03/11] Native complex ops: Add gen_rtx_complex hook

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Add a new target hook for complex element creation during
the expand pass, called gen_rtx_complex. The default implementation
calls gen_rtx_CONCAT like before. Then calls to gen_rtx_CONCAT for
complex handling are replaced by calls to targetm.gen_rtx_complex.
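The hook follows GCC's usual target-hook idiom: generic code calls through a table of function pointers (targetm) whose entries default to a generic implementation that a backend may replace. A stripped-down model of that idiom, with hypothetical names and a toy value type instead of rtx:

```c
/* Toy stand-in for an rtx: the two parts plus a flag recording whether
   they were kept as a separate pair (CONCAT-like) or a single value.  */
typedef struct
{
  double re, im;
  int is_concat;
} toy_value;

typedef toy_value (*gen_complex_fn) (double re, double im);

/* Default implementation, in the role of default_gen_rtx_complex.  */
static toy_value
default_gen_complex (double re, double im)
{
  toy_value v = { re, im, 1 };
  return v;
}

/* What a backend with native complex registers might install instead.  */
static toy_value
native_gen_complex (double re, double im)
{
  toy_value v = { re, im, 0 };
  return v;
}

/* Hook table in the role of targetm, pre-filled with the default.  */
struct toy_target
{
  gen_complex_fn gen_complex;
};

static struct toy_target toy_targetm = { default_gen_complex };
```

Generic code only ever calls toy_targetm.gen_complex, so existing targets keep the old CONCAT behaviour without any change.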

gcc/ChangeLog:

* target.def: Add gen_rtx_complex target hook
* targhooks.cc (default_gen_rtx_complex): New: Default
implementation for gen_rtx_complex
* targhooks.h: Add default_gen_rtx_complex
* doc/tm.texi: Document TARGET_GEN_RTX_COMPLEX
* doc/tm.texi.in: Add TARGET_GEN_RTX_COMPLEX
* emit-rtl.cc (gen_reg_rtx): Replace call to
gen_rtx_CONCAT by call to gen_rtx_complex
(init_emit_once): Likewise
* expmed.cc (flip_storage_order): Likewise
* optabs.cc (expand_doubleword_mod): Likewise
---
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/emit-rtl.cc| 26 +-
 gcc/expmed.cc  |  2 +-
 gcc/optabs.cc  | 11 ++-
 gcc/target.def | 10 ++
 gcc/targhooks.cc   | 27 +++
 gcc/targhooks.h|  2 ++
 8 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c4f935b5746..470497a3ade 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4620,6 +4620,12 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GEN_RTX_COMPLEX (machine_mode @var{mode}, rtx @var{real_part}, rtx @var{imag_part})
+This hook should return an rtx representing a complex of mode @var{machine_mode} built from @var{real_part} and @var{imag_part}.
+  If both arguments are @code{NULL}, create them as registers.
+ The default is @code{gen_rtx_CONCAT}.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
 This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
   @var{part} can be the real part, the imaginary part, or both of them.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b8970761c8d..27a0b321fe0 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3392,6 +3392,8 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_GEN_RTX_COMPLEX
+
 @hook TARGET_READ_COMPLEX_PART
 
 @hook TARGET_WRITE_COMPLEX_PART
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f6276a2d0b6..22012bfea13 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -1190,19 +1190,7 @@ gen_reg_rtx (machine_mode mode)
   if (generating_concat_p
   && (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
  || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT))
-{
-  /* For complex modes, don't make a single pseudo.
-Instead, make a CONCAT of two pseudos.
-This allows noncontiguous allocation of the real and imaginary parts,
-which makes much better code.  Besides, allocating DCmode
-pseudos overstrains reload on some machines like the 386.  */
-  rtx realpart, imagpart;
-  machine_mode partmode = GET_MODE_INNER (mode);
-
-  realpart = gen_reg_rtx (partmode);
-  imagpart = gen_reg_rtx (partmode);
-  return gen_rtx_CONCAT (mode, realpart, imagpart);
-}
+return targetm.gen_rtx_complex (mode, NULL, NULL);
 
   /* Do not call gen_reg_rtx with uninitialized crtl.  */
   gcc_assert (crtl->emit.regno_pointer_align_length);
@@ -6274,14 +6262,18 @@ init_emit_once (void)
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_INT)
 {
-  rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-  const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+  machine_mode imode = GET_MODE_INNER (mode);
+  rtx inner = const_tiny_rtx[0][(int) imode];
+  const_tiny_rtx[0][(int) mode] =
+   targetm.gen_rtx_complex (mode, inner, inner);
 }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_FLOAT)
 {
-  rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-  const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+  machine_mode imode = GET_MODE_INNER (mode);
+  rtx inner = const_tiny_rtx[0][(int) imode];
+  const_tiny_rtx[0][(int) mode] =
+   targetm.gen_rtx_complex (mode, inner, inner);
 }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 973c16a14d3..ce935951781 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -400,7 +400,7 @@ flip_storage_order (machine_mode mode, rtx x)
   real = flip_storage_order (GET_MODE_INNER (mode), real);
   imag = flip_storage_order (GET_MODE_INNER (mode), imag);
 
-  return gen_rtx_CONCAT (mode, real, imag);
+  return targetm.gen_rtx_complex (mode, real, imag);
 }
 
   if (UNLIKELY (reverse_storage_order_supported < 0))
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..429a20f9

[PATCH v2 02/11] Native complex ops: Move functions to hooks

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Move read_complex_part and write_complex_part to target hooks. Their
signatures also change because the type of the part argument is now
complex_part_t. Calls to these functions are updated accordingly.

gcc/ChangeLog:

* target.def: Define hooks for read_complex_part and
write_complex_part
* targhooks.cc (default_read_complex_part): New: default
implementation of read_complex_part
(default_write_complex_part): New: default implementation
of write_complex_part
* targhooks.h: Add default_read_complex_part and
default_write_complex_part
* doc/tm.texi: Document the new TARGET_READ_COMPLEX_PART
and TARGET_WRITE_COMPLEX_PART hooks
* doc/tm.texi.in: Add TARGET_READ_COMPLEX_PART and
TARGET_WRITE_COMPLEX_PART
* expr.cc
(write_complex_part): Call TARGET_WRITE_COMPLEX_PART hook
(read_complex_part): Call TARGET_READ_COMPLEX_PART hook
* expr.h: Update function signatures of read_complex_part
and write_complex_part
* builtins.cc (expand_ifn_atomic_compare_exchange_into_call):
Update calls to read_complex_part and write_complex_part
(expand_ifn_atomic_compare_exchange): Likewise
* expmed.cc (flip_storage_order): Likewise
(clear_storage_hints): Likewise
and write_complex_part
(emit_move_complex_push): Likewise
(emit_move_complex_parts): Likewise
(expand_assignment): Likewise
(expand_expr_real_2): Likewise
(expand_expr_real_1): Likewise
(const_vector_from_tree): Likewise
* internal-fn.cc (expand_arith_set_overflow): Likewise
(expand_arith_overflow_result_store): Likewise
(expand_addsub_overflow): Likewise
(expand_neg_overflow): Likewise
(expand_mul_overflow): Likewise
(expand_arith_overflow): Likewise
(expand_UADDC): Likewise
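The new complex_part_t argument replaces the former bool selector: the builtins.cc call-site updates map true to IMAG_P and false to REAL_P, and BOTH_P is new. A toy model of that selector and of a read operation on a two-part value (hypothetical names; the real hooks operate on rtx):

```c
#include <stdbool.h>

/* Same enumerators as the complex_part_e enum added to coretypes.h in
   PATCH 01/11.  */
enum complex_part_e
{
  REAL_P = 0,
  IMAG_P = 1,
  BOTH_P = 2
};
typedef enum complex_part_e complex_part_t;

/* Old bool "imag_p" argument -> new selector, as in the builtins.cc
   call-site updates (true -> IMAG_P, false -> REAL_P).  */
static complex_part_t
part_from_imag_p (bool imag_p)
{
  return imag_p ? IMAG_P : REAL_P;
}

/* Toy two-part value.  */
typedef struct
{
  double re, im;
} toy_cplx;

/* Hypothetical read helper: copy the selected part(s) into OUT and
   return how many doubles were written.  */
static int
toy_read_complex_part (const toy_cplx *c, complex_part_t part, double out[2])
{
  switch (part)
    {
    case REAL_P:
      out[0] = c->re;
      return 1;
    case IMAG_P:
      out[0] = c->im;
      return 1;
    case BOTH_P:
      out[0] = c->re;
      out[1] = c->im;
      return 2;
    }
  return 0;
}
```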
---
 gcc/builtins.cc|   8 +--
 gcc/doc/tm.texi|  10 +++
 gcc/doc/tm.texi.in |   4 ++
 gcc/expmed.cc  |   4 +-
 gcc/expr.cc| 165 +
 gcc/expr.h |   5 +-
 gcc/internal-fn.cc |  16 ++---
 gcc/target.def |  18 +
 gcc/targhooks.cc   | 139 ++
 gcc/targhooks.h|   4 ++
 10 files changed, 221 insertions(+), 152 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 3b453b3ec8c..b5cb652c413 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6349,8 +6349,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
   if (GET_MODE (boolret) != mode)
boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
   x = force_reg (mode, x);
-  write_complex_part (target, boolret, true, true);
-  write_complex_part (target, x, false, false);
+  write_complex_part (target, boolret, IMAG_P, true);
+  write_complex_part (target, x, REAL_P, false);
 }
 }
 
@@ -6405,8 +6405,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   if (GET_MODE (boolret) != mode)
boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-  write_complex_part (target, boolret, true, true);
-  write_complex_part (target, oldval, false, false);
+  write_complex_part (target, boolret, IMAG_P, true);
+  write_complex_part (target, oldval, REAL_P, false);
 }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index ff69207fb9f..c4f935b5746 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4620,6 +4620,16 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
+This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part})
+This hook should move the rtx value given by @var{val} to the specified @var{var} of the complex given by @var{cplx}.
+  @var{var} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index cad6308a87c..b8970761c8d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3392,6 +3392,10 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_READ_COMPLEX_PART
+
+@hook TARGET_WRITE_COMPLEX_PART
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b294eabb08d..973c16a14d3 100644
--- 

[PATCH v2 08/11] Native complex ops: Add explicit vector of complex

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Allow the creation and use of built-in vectors of complex types
in C, using __attribute__ ((vector_size ())).

gcc/c-family/ChangeLog:

* c-attribs.cc (vector_mode_valid_p): Add cases for
vectors of complex
(handle_mode_attribute): Likewise
(type_valid_for_vector_size): Likewise
* c-common.cc (c_common_type_for_mode): Likewise
(vector_types_compatible_elements_p): Likewise

gcc/ChangeLog:

* fold-const.cc (fold_binary_loc): Likewise

gcc/c/ChangeLog:

* c-typeck.cc (build_unary_op): Likewise
---
 gcc/c-family/c-attribs.cc | 12 ++--
 gcc/c-family/c-common.cc  | 21 +++--
 gcc/c/c-typeck.cc |  8 ++--
 gcc/fold-const.cc |  1 +
 4 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e0c4259c905..b3ca5219730 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2019,6 +2019,8 @@ vector_mode_valid_p (machine_mode mode)
   /* Doh!  What's going on?  */
   if (mclass != MODE_VECTOR_INT
   && mclass != MODE_VECTOR_FLOAT
+  && mclass != MODE_VECTOR_COMPLEX_INT
+  && mclass != MODE_VECTOR_COMPLEX_FLOAT
   && mclass != MODE_VECTOR_FRACT
   && mclass != MODE_VECTOR_UFRACT
   && mclass != MODE_VECTOR_ACCUM
@@ -2125,6 +2127,8 @@ handle_mode_attribute (tree *node, tree name, tree args,
 
case MODE_VECTOR_INT:
case MODE_VECTOR_FLOAT:
+   case MODE_VECTOR_COMPLEX_INT:
+   case MODE_VECTOR_COMPLEX_FLOAT:
case MODE_VECTOR_FRACT:
case MODE_VECTOR_UFRACT:
case MODE_VECTOR_ACCUM:
@@ -4361,9 +4365,13 @@ type_valid_for_vector_size (tree type, tree atname, tree args,
 
   if ((!INTEGRAL_TYPE_P (type)
&& !SCALAR_FLOAT_TYPE_P (type)
+   && !COMPLEX_INTEGER_TYPE_P (type)
+   && !COMPLEX_FLOAT_TYPE_P (type)
&& !FIXED_POINT_TYPE_P (type))
-  || (!SCALAR_FLOAT_MODE_P (orig_mode)
- && GET_MODE_CLASS (orig_mode) != MODE_INT
+  || ((!SCALAR_FLOAT_MODE_P (orig_mode)
+  && GET_MODE_CLASS (orig_mode) != MODE_INT)
+ && (!COMPLEX_FLOAT_MODE_P (orig_mode)
+ && GET_MODE_CLASS (orig_mode) != MODE_COMPLEX_INT)
  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
   || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
   || TREE_CODE (type) == BOOLEAN_TYPE
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 73e739c503d..f236fae94d4 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2441,7 +2441,23 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
  : make_signed_type (precision));
 }
 
-  if (COMPLEX_MODE_P (mode))
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+  && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+{
+  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+   GET_MODE_NUNITS (mode));
+  tree bool_type = build_nonstandard_boolean_type (elem_bits);
+  return build_vector_type_for_mode (bool_type, mode);
+}
+  else if (VECTOR_MODE_P (mode)
+  && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+{
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+  tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
+  if (inner_type != NULL_TREE)
+   return build_vector_type_for_mode (inner_type, mode);
+}
+  else if (COMPLEX_MODE_P (mode))
 {
   machine_mode inner_mode;
   tree inner_type;
@@ -8360,10 +8376,11 @@ vector_types_compatible_elements_p (tree t1, tree t2)
 
   gcc_assert ((INTEGRAL_TYPE_P (t1)
   || c1 == REAL_TYPE
+  || c1 == COMPLEX_TYPE
   || c1 == FIXED_POINT_TYPE)
  && (INTEGRAL_TYPE_P (t2)
  || c2 == REAL_TYPE
- || c2 == FIXED_POINT_TYPE));
+ || c2 == COMPLEX_TYPE || c2 == FIXED_POINT_TYPE));
 
   t1 = c_common_signed_type (t1);
   t2 = c_common_signed_type (t2);
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e55e887da14..25e7f68b5ab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -4576,7 +4576,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
   if (typecode == INTEGER_TYPE
  || typecode == BITINT_TYPE
  || (gnu_vector_type_p (TREE_TYPE (arg))
- && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg
+ && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))
+ && !COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+ && !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg)
{
  tree e = arg;
 
@@ -4599,7 +4601,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
  if (!noconvert)
arg = default_conversion (arg);
}
-  else if (typecode == COMPLEX_TYPE)
+  else if (typecode == COMPLEX_TYPE
+  || COMPLEX_INTEGER_TYPE

[PATCH v2 06/11] Native complex ops: Update how complex rotations are handled

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Catch complex rotations by 90° and 270° in fold-const.cc as before,
but now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270
internal functions. Also add crot90 and crot270 optabs to expose these
operations to the backends, so that COMPLEX_ROT90/COMPLEX_ROT270 are
lowered only when crot90/crot270 are not present in the optab. Finally,
convert a + crot90/270(b) into cadd90/270(a, b) in a similar way to FMAs.

gcc/ChangeLog:

* internal-fn.def: Add COMPLEX_ROT90 and COMPLEX_ROT270
* fold-const.cc (fold_binary_loc): Update the folding of
complex rotations to generate calls to COMPLEX_ROT90 and
COMPLEX_ROT270
* optabs.def: Add crot90/crot270 optabs
* tree-complex.cc (init_dont_simulate_again): Catch calls
to COMPLEX_ROT90 and COMPLEX_ROT270
(expand_complex_rotation): Conditionally lower complex
rotations if no pattern is present in the backend
(expand_complex_operations_1): Likewise
(convert_crot): Likewise
* tree-ssa-math-opts.cc (convert_crot_1): Catch complex
rotations with additions in a similar way to FMAs.
(math_opts_dom_walker::after_dom_children): Call convert_crot
if a COMPLEX_ROT90 or COMPLEX_ROT270 is identified
---
 gcc/fold-const.cc | 145 +++---
 gcc/internal-fn.def   |   2 +
 gcc/optabs.def|   2 +
 gcc/tree-complex.cc   |  83 +-
 gcc/tree-ssa-math-opts.cc | 128 +
 5 files changed, 335 insertions(+), 25 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index d19b4666c65..dc05599c7fe 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11865,30 +11865,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
}
   else
{
- /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
-This is not the same for NaNs or if signed zeros are
-involved.  */
- if (!HONOR_NANS (arg0)
- && !HONOR_SIGNED_ZEROS (arg0)
- && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
- && TREE_CODE (arg1) == COMPLEX_CST
- && real_zerop (TREE_REALPART (arg1)))
-   {
- tree rtype = TREE_TYPE (TREE_TYPE (arg0));
- if (real_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
-rtype, arg0)),
-  fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
- else if (real_minus_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  fold_build1_loc (loc, IMAGPART_EXPR, rtype, arg0),
-  negate_expr (fold_build1_loc (loc, REALPART_EXPR,
-rtype, arg0)));
-   }
-
  /* Optimize z * conj(z) for floating point complex numbers.
 Guarded by flag_unsafe_math_optimizations as non-finite
 imaginary components don't produce scalar results.  */
@@ -11901,6 +11877,127 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
return fold_mult_zconjz (loc, type, arg0);
}
+
+  /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
+This is not the same for NaNs or if signed zeros are
+involved.  */
+  if (!HONOR_NANS (arg0)
+ && !HONOR_SIGNED_ZEROS (arg0)
+ && TREE_CODE (arg1) == COMPLEX_CST
+ && (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
+ && real_zerop (TREE_REALPART (arg1
+   {
+ if (real_onep (TREE_IMAGPART (arg1)))
+   {
+ tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+ tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+negate_expr (fold_build1_loc
+ (loc,
+  IMAGPART_EXPR,
+  rtype, arg0)),
+fold_build1_loc (loc,
+ REALPART_EXPR,
+ rtype,
+ arg0));
+ if (cplx_build
+ && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR)
+   return cplx_build;
+
+ if ((TREE_CODE (arg0) == COMPLEX_EXPR)
+ && real_zerop (TREE_OPERAND (arg0, 1)))
+   return fold_build2

[PATCH v2 10/11] Native complex ops: Add a fast complex multiplication pattern

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Add a new fast_mul_optab to define a pattern corresponding to
the fast path of an IEEE-compliant complex multiplication. This way,
the backend programmer can change the fast path without having to
handle the IEEE checks manually.
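To show what "fast path" and "IEEE checks" refer to, here is a minimal sketch in C. The function names are hypothetical. The fast path is the textbook formula. GCC's default expansion only takes the slow path when both parts of the fast result are NaN, and in this sketch the slow path simply defers to the compiler's own complex multiplication. Under the default flags (no `-fcx-limited-range` or `-ffast-math`), that normally ends up in libgcc's `__mulsc3`.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Hypothetical model of the FAST_MULT fast path: exact for finite
   operands, but it can turn an infinite product into NaN + NaN*i.  */
static float complex fast_mul_model (float complex a, float complex b)
{
  float ar = crealf (a), ai = cimagf (a);
  float br = crealf (b), bi = cimagf (b);
  return CMPLXF (ar * br - ai * bi, ar * bi + ai * br);
}

/* IEEE-style wrapper: keep the fast result unless both parts are NaN,
   and only then fall back to the full (Annex G) multiplication.  */
static float complex ieee_mul_model (float complex a, float complex b)
{
  float complex r = fast_mul_model (a, b);
  if (isnan (crealf (r)) && isnan (cimagf (r)))
    return a * b;
  return r;
}
```

Splitting the two paths in this way means a backend pattern only has to replace `fast_mul_model`, while the NaN check and the fallback stay generic.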

gcc/ChangeLog:

* internal-fn.def: Add a FAST_MULT internal fn
* optabs.def: Add fast_mul_optab
* tree-complex.cc (expand_complex_multiplication_components):
Adapt complex multiplication expand to generate
FAST_MULT internal fn
(expand_complex_multiplication): Likewise
(expand_complex_operations_1): Likewise
---
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-complex.cc | 70 +
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 0ac6cd98a4f..f1046996a48 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -396,6 +396,7 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+DEF_INTERNAL_OPTAB_FN (FAST_MULT, ECF_CONST, fast_mul, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
 DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS,
ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index d146cac5eec..a90b6ee6440 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -344,6 +344,7 @@ OPTAB_D (cmla_optab, "cmla$a4")
 OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
 OPTAB_D (cmls_optab, "cmls$a4")
 OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
+OPTAB_D (fast_mul_optab, "fast_mul$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index d814e407af6..16759f1f3ba 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -1138,25 +1138,36 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai,
 
 static void
 expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
- tree type, tree ar, tree ai,
- tree br, tree bi,
- tree *rr, tree *ri)
+ tree type, tree ac, tree ar,
+ tree ai, tree bc, tree br, tree bi,
+ tree *rr, tree *ri,
+ bool fast_mult)
 {
-  tree t1, t2, t3, t4;
+  tree inner_type = TREE_TYPE (type);
+  if (!fast_mult)
+{
+  tree t1, t2, t3, t4;
 
-  t1 = gimple_build (stmts, loc, MULT_EXPR, type, ar, br);
-  t2 = gimple_build (stmts, loc, MULT_EXPR, type, ai, bi);
-  t3 = gimple_build (stmts, loc, MULT_EXPR, type, ar, bi);
+  t1 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ar, br);
+  t2 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ai, bi);
+  t3 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ar, bi);
 
-  /* Avoid expanding redundant multiplication for the common
- case of squaring a complex number.  */
-  if (ar == br && ai == bi)
-t4 = t3;
-  else
-t4 = gimple_build (stmts, loc, MULT_EXPR, type, ai, br);
+  /* Avoid expanding redundant multiplication for the common
+case of squaring a complex number.  */
+  if (ar == br && ai == bi)
+   t4 = t3;
+  else
+   t4 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ai, br);
 
-  *rr = gimple_build (stmts, loc, MINUS_EXPR, type, t1, t2);
-  *ri = gimple_build (stmts, loc, PLUS_EXPR, type, t3, t4);
+  *rr = gimple_build (stmts, loc, MINUS_EXPR, inner_type, t1, t2);
+  *ri = gimple_build (stmts, loc, PLUS_EXPR, inner_type, t3, t4);
+}
+  else
+{
+  tree rc = gimple_build (stmts, loc, CFN_FAST_MULT, type, ac, bc);
+  *rr = gimple_build (stmts, loc, REALPART_EXPR, inner_type, rc);
+  *ri = gimple_build (stmts, loc, IMAGPART_EXPR, inner_type, rc);
+}
 }
 
 /* Expand complex multiplication to scalars:
@@ -1165,13 +1176,18 @@ expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
 
 static void
 expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
-  tree ar, tree ai, tree br, tree bi,
+  tree ac, tree ar, tree ai,
+  tree bc, tree br, tree bi,
   complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
   tree inner_type = TREE_TYPE (type);
   location_t loc = gimple_location (gsi_stmt (*gsi));
   gimple_seq stmts = NULL;
+  bool fast_mult = direct_internal_fn_supported_p (IFN_FAST_MULT, type,
+  bb_optimization_type
+  

[PATCH v2 04/11] Native complex ops: Allow native complex regs and ops in rtl

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Support registers of complex types in rtl. Also adapt the functions
called during the expand pass to support native complex operations.

gcc/ChangeLog:

* explow.cc (trunc_int_for_mode): Allow complex int modes
* expr.cc (emit_move_complex_parts): Move both parts at the
same time if it is supported by the backend
(emit_move_complex): Do not move via integer if not int mode
corresponds. For complex floats, relax the constraint on the
number of registers for targets with pairs of registers, and
use native moves if it is supported by the backend.
(expand_expr_real_2): Move both parts at the same time if it
is supported by the backend
(expand_expr_real_1): Update the expand of complex constants
(const_vector_from_tree): Add the expand of both parts of a
complex constant
* real.h: Update FLOAT_MODE_FORMAT
* machmode.h: Add COMPLEX_INT_MODE_P and COMPLEX_FLOAT_MODE_P
predicates
* optabs-libfuncs.cc (gen_int_libfunc): Add support for
complex modes
(gen_intv_fp_libfunc): Likewise
* recog.cc (general_operand): Likewise
* cse.cc (try_const_anchors): Likewise
* emit-rtl.cc (validate_subreg): Likewise
---
 gcc/cse.cc   |  2 +-
 gcc/doc/tm.texi  |  2 +-
 gcc/emit-rtl.cc  |  2 +-
 gcc/explow.cc|  2 +-
 gcc/expr.cc  | 70 ++--
 gcc/internal-fn.cc   |  4 +--
 gcc/machmode.h   |  8 +
 gcc/optabs-libfuncs.cc   | 25 ++
 gcc/real.h   |  3 +-
 gcc/recog.cc |  1 +
 gcc/target.def   |  2 +-
 gcc/targhooks.cc |  8 ++---
 gcc/targhooks.h  |  3 +-
 gcc/tree-ssa-forwprop.cc |  1 +
 14 files changed, 105 insertions(+), 28 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index c46870059e6..5ce6c692070 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -1313,7 +1313,7 @@ try_const_anchors (rtx src_const, machine_mode mode)
   unsigned lower_old, upper_old;
 
   /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
-  if (!SCALAR_INT_MODE_P (mode))
+  if (!(SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode)))
 return NULL_RTX;
 
   if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 470497a3ade..1e87f798449 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4631,7 +4631,7 @@ This hook should return the rtx representing the specified @var{part} of the com
   @var{part} can be the real part, the imaginary part, or both of them.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part})
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part}, bool @var{undefined_p})
 This hook should move the rtx value given by @var{val} to the specified @var{var} of the complex given by @var{cplx}.
   @var{var} can be the real part, the imaginary part, or both of them.
 @end deftypefn
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 22012bfea13..f7c33c4afb1 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -946,7 +946,7 @@ validate_subreg (machine_mode omode, machine_mode imode,
  if this ought to be represented at all -- why can't this all be hidden
  in post-reload splitters that make arbitrarily mode changes to the
  registers themselves.  */
-  else if (VECTOR_MODE_P (omode)
+  else if ((VECTOR_MODE_P (omode) || COMPLEX_MODE_P (omode))
   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
 ;
   /* Subregs involving floating point modes are not allowed to
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 6424c0802f0..48572a40eab 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -56,7 +56,7 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode)
   int width = GET_MODE_PRECISION (smode);
 
   /* You want to truncate to a _what_?  */
-  gcc_assert (SCALAR_INT_MODE_P (mode));
+  gcc_assert (SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode));
 
   /* Canonicalize BImode to 0 and STORE_FLAG_VALUE.  */
   if (smode == BImode)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 12b74273144..01462486631 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3842,8 +3842,14 @@ emit_move_complex_parts (rtx x, rtx y)
   && REG_P (x) && !reg_overlap_mentioned_p (x, y))
 emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
-  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+  machine_mode mode = GET_MODE (x);
+  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
+write_complex_part (x, read_complex_part (y, BOTH_P), BOTH_P, true);
+  else
+{
+  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+}
 
   r

[PATCH v2 09/11] Native complex ops: remove useless special cases

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Remove two special cases which are now useless with the new complex
handling.
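For readers less familiar with forwprop, the two removed special cases targeted the source shapes sketched below. These are hypothetical examples, not taken from the testsuite. The `*_split` variants use the GNU C `__real__`/`__imag__` extension to spell out, approximately, the component-wise loads and stores that the removed code produced. With native complex handling, the patch treats this rewrite as no longer needed.

```c
#include <assert.h>
#include <complex.h>

/* 1. A complex load whose only uses are real/imag extractions.  */
static float load_whole_real (const float complex *p)
{
  float complex z = *p;          /* whole-value load ...            */
  return crealf (z);             /* ... used only for its real part */
}

static float load_split_real (const float complex *p)
{
  return __real__ *p;            /* load of the real component only */
}

/* 2. A store of a single-use complex value built from two scalars.  */
static void store_whole (float complex *p, float re, float im)
{
  *p = CMPLXF (re, im);          /* build the complex value, then one store */
}

static void store_split (float complex *p, float re, float im)
{
  __real__ *p = re;              /* two component-wise stores */
  __imag__ *p = im;
}
```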

gcc/ChangeLog:

* tree-ssa-forwprop.cc (pass_forwprop::execute): Remove
  two special cases
---
 gcc/tree-ssa-forwprop.cc | 133 +--
 1 file changed, 3 insertions(+), 130 deletions(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 30e99f812f1..0c968f6ca32 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3670,61 +3670,8 @@ pass_forwprop::execute (function *fun)
   != TARGET_MEM_REF)
   && !stmt_can_throw_internal (fun, stmt))
{
- /* Rewrite loads used only in real/imagpart extractions to
-component-wise loads.  */
- use_operand_p use_p;
- imm_use_iterator iter;
- bool rewrite = true;
- FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
-   {
- gimple *use_stmt = USE_STMT (use_p);
- if (is_gimple_debug (use_stmt))
-   continue;
- if (!is_gimple_assign (use_stmt)
- || (gimple_assign_rhs_code (use_stmt) != REALPART_EXPR
- && gimple_assign_rhs_code (use_stmt) != IMAGPART_EXPR)
- || TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0) != lhs)
-   {
- rewrite = false;
- break;
-   }
-   }
- if (rewrite)
-   {
- gimple *use_stmt;
- FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
-   {
- if (is_gimple_debug (use_stmt))
-   {
- if (gimple_debug_bind_p (use_stmt))
-   {
- gimple_debug_bind_reset_value (use_stmt);
- update_stmt (use_stmt);
-   }
- continue;
-   }
-
- tree new_rhs = build1 (gimple_assign_rhs_code (use_stmt),
-TREE_TYPE (TREE_TYPE (rhs)),
-unshare_expr (rhs));
- gimple *new_stmt
-   = gimple_build_assign (gimple_assign_lhs (use_stmt),
-  new_rhs);
-
- location_t loc = gimple_location (use_stmt);
- gimple_set_location (new_stmt, loc);
- gimple_stmt_iterator gsi2 = gsi_for_stmt (use_stmt);
- unlink_stmt_vdef (use_stmt);
- gsi_remove (&gsi2, true);
-
- gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
-   }
-
- release_defs (stmt);
- gsi_remove (&gsi, true);
-   }
- else
-   gsi_next (&gsi);
+ /* Special case removed due to better complex processing.  */
+ gsi_next (&gsi);
}
  else if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE
   && (TYPE_MODE (TREE_TYPE (lhs)) == BLKmode
@@ -3739,81 +3686,7 @@ pass_forwprop::execute (function *fun)
optimize_vector_load (&gsi);
 
  else if (code == COMPLEX_EXPR)
-   {
- /* Rewrite stores of a single-use complex build expression
-to component-wise stores.  */
- use_operand_p use_p;
- gimple *use_stmt, *def1, *def2;
- tree rhs2;
- if (single_imm_use (lhs, &use_p, &use_stmt)
- && gimple_store_p (use_stmt)
- && !gimple_has_volatile_ops (use_stmt)
- && is_gimple_assign (use_stmt)
- && (TREE_CODE (gimple_assign_lhs (use_stmt))
- != TARGET_MEM_REF))
-   {
- tree use_lhs = gimple_assign_lhs (use_stmt);
- if (auto_var_p (use_lhs))
-   DECL_NOT_GIMPLE_REG_P (use_lhs) = 1;
- tree new_lhs = build1 (REALPART_EXPR,
-TREE_TYPE (TREE_TYPE (use_lhs)),
-unshare_expr (use_lhs));
- gimple *new_stmt = gimple_build_assign (new_lhs, rhs);
- location_t loc = gimple_location (use_stmt);
- gimple_set_location (new_stmt, loc);
- gimple_set_vuse (new_stmt, gimple_vuse (use_stmt));
- gimple_set_vdef (new_stmt, make_ssa_name (gimple_vop (fun)));
- SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
- gimple_set_vuse (use_stmt, gimple_vdef (new_stmt));
- gimple_stmt_iterator gsi2 = gsi_for_stmt (use_stmt);
- gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
-
- new_lhs = build1

[PATCH v2 11/11] Native complex ops: Experimental support in x86 backend

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Add experimental support for native complex operations in the x86
backend. For now it only supports add, sub, mul, conj, neg and mov
in SCmode (complex float). Performance gains are still marginal on this
target because it has no dedicated instructions to speed up complex
operations, apart from some SIMD tricks.
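The "SIMD tricks" can be illustrated with GCC generic vectors. A `float complex` is laid out as `{real, imag}`, the same size as a two-float vector, which is why SCmode can be handled much like V2SFmode. The following is a minimal sketch of a complex multiply done with duplications, a swap and an add/subtract step. It is not the `mulsc3` pattern from `sse.md`, and `simd_cmul` is a hypothetical name.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* GCC generic vectors: two floats, plus a matching integer selector.  */
typedef float v2sf __attribute__ ((vector_size (8)));
typedef int v2si __attribute__ ((vector_size (8)));

_Static_assert (sizeof (float complex) == sizeof (v2sf), "SCmode size");

static float complex simd_cmul (float complex a, float complex b)
{
  v2sf va, vb;
  memcpy (&va, &a, sizeof va);                      /* {ar, ai} */
  memcpy (&vb, &b, sizeof vb);                      /* {br, bi} */

  v2sf re = __builtin_shuffle (va, (v2si) {0, 0});  /* {ar, ar} */
  v2sf im = __builtin_shuffle (va, (v2si) {1, 1});  /* {ai, ai} */
  v2sf sw = __builtin_shuffle (vb, (v2si) {1, 0});  /* {bi, br} */

  /* {ar*br - ai*bi, ar*bi + ai*br}: subtract in lane 0, add in lane 1,
     which is what SSE3 addsubps does in a single instruction.  */
  v2sf r = re * vb + im * sw * (v2sf) {-1.0f, 1.0f};

  float complex out;
  memcpy (&out, &r, sizeof out);
  return out;
}
```

Even with such tricks, a complex multiply still needs several shuffles, which is consistent with the marginal gains reported above.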

gcc/ChangeLog:

* config/i386/i386.cc (classify_argument): Align complex
elements to their whole size, not the size of their parts
(ix86_return_in_memory): Handle complex modes like a scalar
with the same size
(ix86_class_max_nregs): Likewise
(ix86_hard_regno_nregs): Likewise
(function_value_ms_64): Add case for SCmode
(ix86_build_const_vector): Likewise
(ix86_build_signbit_mask): Likewise
(x86_gen_rtx_complex): New: Implement the gen_rtx_complex
hook, use registers of complex modes to represent complex
elements in rtl
(x86_read_complex_part): New: Implement the read_complex_part
hook, handle registers of complex modes
(x86_write_complex_part): New: Implement the write_complex_part
hook, handle registers of complex modes
* config/i386/i386.h: Add SCmode in several predicates
* config/i386/sse.md: Add pattern for some complex operations in
SCmode. This includes movsc, addsc3, subsc3, negsc2, mulsc3,
and conjsc2
---
 gcc/config/i386/i386.cc | 296 +++-
 gcc/config/i386/i386.h  |  11 +-
 gcc/config/i386/sse.md  | 144 +++
 3 files changed, 440 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..77bf80b64b1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2348,8 +2348,8 @@ classify_argument (machine_mode mode, const_tree type,
mode_alignment = 128;
   else if (mode == XCmode)
mode_alignment = 256;
-  if (COMPLEX_MODE_P (mode))
-   mode_alignment /= 2;
+  /*if (COMPLEX_MODE_P (mode))
+   mode_alignment /= 2;*/
   /* Misaligned fields are always returned in memory.  */
   if (bit_offset % mode_alignment)
return 0;
@@ -3023,6 +3023,7 @@ pass_in_reg:
 case E_V4BFmode:
 case E_V2SImode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V1TImode:
 case E_V1DImode:
   if (!type || !AGGREGATE_TYPE_P (type))
@@ -3273,6 +3274,7 @@ pass_in_reg:
 case E_V4BFmode:
 case E_V2SImode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V1TImode:
 case E_V1DImode:
   if (!type || !AGGREGATE_TYPE_P (type))
@@ -4187,8 +4189,8 @@ function_value_ms_64 (machine_mode orig_mode, machine_mode mode,
  && !INTEGRAL_TYPE_P (valtype)
  && !VECTOR_FLOAT_TYPE_P (valtype))
break;
- if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
- && !COMPLEX_MODE_P (mode))
+ if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)))
+// && !COMPLEX_MODE_P (mode))
regno = FIRST_SSE_REG;
  break;
case 8:
@@ -4295,7 +4297,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
   || INTEGRAL_TYPE_P (type)
   || VECTOR_FLOAT_TYPE_P (type))
  && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
- && !COMPLEX_MODE_P (mode)
+ //&& !COMPLEX_MODE_P (mode)
  && (GET_MODE_SIZE (mode) == 16 || size == 16))
return false;
 
@@ -15752,6 +15754,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
 case E_V8SFmode:
 case E_V4SFmode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V8DFmode:
 case E_V4DFmode:
 case E_V2DFmode:
@@ -15800,6 +15803,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
 case E_V8SFmode:
 case E_V4SFmode:
 case E_V2SFmode:
+case E_SCmode:
 case E_V2SImode:
   vec_mode = mode;
   imode = SImode;
@@ -19894,7 +19898,8 @@ ix86_class_max_nregs (reg_class_t rclass, machine_mode mode)
   else
 {
   if (COMPLEX_MODE_P (mode))
-   return 2;
+   return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
+   //return 2;
   else
return 1;
 }
@@ -20230,7 +20235,8 @@ ix86_hard_regno_nregs (unsigned int regno, machine_mode mode)
   return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
 }
   if (COMPLEX_MODE_P (mode))
-return 2;
+return 1;
+//return 2;
   /* Register pair for mask registers.  */
   if (mode == P2QImode || mode == P2HImode)
 return 2;
@@ -23757,6 +23763,273 @@ ix86_preferred_simd_mode (scalar_mode mode)
 }
 }
 
+static rtx
+x86_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if ((real_part == imag_part) && (real_part == CONST0_RTX (imode)))
+{
+  if (CONST_DOUBLE_P (real_part))
+   return const_double_from_real_val

[PATCH v2 05/11] Native complex ops: Add the conjugate op in optabs

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Add an optab and rtl operation for the conjugate, called conj,
to expand CONJ_EXPR.
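For reference, the conjugate keeps the real part and negates the imaginary part. The sketch below shows the open-coded form that a backend `conj<mode>2` pattern (for example `conjsc2` in the x86 patch of this series) has to implement. It also shows the GNU C `~` operator on a complex value, which the C front end represents as CONJ_EXPR. The function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>

/* Open-coded conjugate: keep the real part, negate the imaginary part.  */
static float complex conj_by_hand (float complex z)
{
  return CMPLXF (crealf (z), -cimagf (z));
}

/* GNU C extension: "~" on a complex value is complex conjugation.  */
static float complex conj_gnu (float complex z)
{
  return ~z;
}
```

The portable ISO C spelling is `conjf`/`conj`. The `~` form is a GNU extension and triggers a pedantic warning.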

gcc/ChangeLog:

* rtl.def: Add a conj operation in rtl
* optabs.def: Add a conj optab
* optabs-tree.cc (optab_for_tree_code): Use the
conj_optab to convert a CONJ_EXPR
* expr.cc (expand_expr_real_2): Add a case to expand
native CONJ_EXPR
(expand_expr_real_1): Likewise
---
 gcc/expr.cc| 17 -
 gcc/optabs-tree.cc |  3 +++
 gcc/optabs.def |  3 +++
 gcc/rtl.def|  3 +++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 01462486631..937c2375133 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10487,6 +10487,18 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
return dst;
   }
 
+case CONJ_EXPR:
+  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+  if (modifier == EXPAND_STACK_PARM)
+   target = 0;
+  temp = expand_unop (mode,
+ optab_for_tree_code (CONJ_EXPR, type,
+  optab_default),
+ op0, target, 0);
+  gcc_assert (temp);
+  return REDUCE_BIT_FIELD (temp);
+
+
 default:
   gcc_unreachable ();
 }
@@ -12099,6 +12111,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
   op0 = expand_normal (treeop0);
   return read_complex_part (op0, IMAG_P);
 
+case CONJ_EXPR:
+  op0 = expand_normal (treeop0);
+  return op0;
+
 case RETURN_EXPR:
 case LABEL_EXPR:
 case GOTO_EXPR:
@@ -12122,7 +12138,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 case VA_ARG_EXPR:
 case BIND_EXPR:
 case INIT_EXPR:
-case CONJ_EXPR:
 case COMPOUND_EXPR:
 case PREINCREMENT_EXPR:
 case PREDECREMENT_EXPR:
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 40bfbb1a5ad..ee5d52a7d50 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -271,6 +271,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
return TYPE_UNSIGNED (type) ? usneg_optab : ssneg_optab;
   return trapv ? negv_optab : neg_optab;
 
+case CONJ_EXPR:
+  return conj_optab;
+
 case ABS_EXPR:
   return trapv ? absv_optab : abs_optab;
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8405d365c97 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -162,6 +162,9 @@ OPTAB_NL(umax_optab, "umax$I$a3", UMAX, "umax", '3', gen_int_libfunc)
 OPTAB_NL(neg_optab, "neg$P$a2", NEG, "neg", '2', gen_int_fp_fixed_libfunc)
 OPTAB_NX(neg_optab, "neg$F$a2")
 OPTAB_NX(neg_optab, "neg$Q$a2")
+OPTAB_NL(conj_optab, "conj$P$a2", CONJ, "conj", '2', gen_int_fp_fixed_libfunc)
+OPTAB_NX(conj_optab, "conj$F$a2")
+OPTAB_NX(conj_optab, "conj$Q$a2")
 OPTAB_VL(negv_optab, "negv$I$a2", NEG, "neg", '2', gen_intv_fp_libfunc)
 OPTAB_VX(negv_optab, "neg$F$a2")
 OPTAB_NL(ssneg_optab, "ssneg$Q$a2", SS_NEG, "ssneg", '2', gen_signed_fixed_libfunc)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 88e2b198503..0312b3ea262 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -460,6 +460,9 @@ DEF_RTL_EXPR(MINUS, "minus", "ee", RTX_BIN_ARITH)
 /* Minus operand 0.  */
 DEF_RTL_EXPR(NEG, "neg", "e", RTX_UNARY)
 
+/* Conj operand 0.  */
+DEF_RTL_EXPR(CONJ, "conj", "e", RTX_UNARY)
+
 DEF_RTL_EXPR(MULT, "mult", "ee", RTX_COMM_ARITH)
 
 /* Multiplication with signed saturation */
-- 
2.17.1


[PATCH v2 07/11] Native complex ops: Vectorization of native complex operations

2023-09-12 Thread Sylvain Noiry via Gcc-patches
Summary:
Add vectors of complex types to vectorize native operations. Because
the vectorizer was designed to work with scalar elements, several functions
and target hooks have to be adapted or duplicated to support complex types.
After that, the vectorization of native complex operations follows exactly
the same flow as scalar operations.
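For a concrete picture, here is a hypothetical loop of the kind this part of the series is meant to handle. Each iteration performs one native complex multiply-add on `float complex` elements, so with vectors of complex modes the vectorizer can treat each element as a unit instead of splitting it into real and imaginary lanes. Whether a given target actually vectorizes it depends on the target hooks and flags described above. The sketch only fixes the semantics.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* acc[i] += a[i] * b[i] on complex elements (a complex multiply-add,
   the operation behind the existing cmla optabs).  */
static void cmla_loop (float complex *restrict acc,
                       const float complex *restrict a,
                       const float complex *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] += a[i] * b[i];
}
```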

gcc/ChangeLog:

* target.def: Add preferred_simd_mode_complex and
related_mode_complex by duplicating their scalar counterparts
* targhooks.h: Add default_preferred_simd_mode_complex and
default_vectorize_related_mode_complex
* targhooks.cc (default_preferred_simd_mode_complex): New:
Default implementation of preferred_simd_mode_complex
(default_vectorize_related_mode_complex): New: Default
implementation of related_mode_complex
* doc/tm.texi: Document
TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
* doc/tm.texi.in: Add TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
* emit-rtl.cc (init_emit_once): Add the zero constant for vectors
of complex modes
* genmodes.cc (vector_class): Add case for vectors of complex
(complete_mode): Likewise
(make_complex_modes): Likewise
* gensupport.cc (match_pattern): Likewise
* machmode.h: Add vectors of complex in predicates and redefine
mode_for_vector and related_vector_mode for complex types
* mode-classes.def: Add MODE_VECTOR_COMPLEX_INT and
MODE_VECTOR_COMPLEX_FLOAT classes
* stor-layout.cc (mode_for_vector): Adapt for complex modes
using sub-functions calling a common one
(int_mode_for_mode): Add case for complex vectors
(related_vector_mode): Implement the function for complex modes
* tree-vect-generic.cc (type_for_widest_vector_mode): Add
cases for complex modes
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Adapt for complex modes
* tree.cc (build_vector_type_for_mode): Add cases for complex
modes
---
 gcc/doc/tm.texi  | 31 ++
 gcc/doc/tm.texi.in   |  4 +++
 gcc/emit-rtl.cc  | 10 +++
 gcc/genmodes.cc  |  8 ++
 gcc/gensupport.cc|  3 +++
 gcc/machmode.h   | 19 +++---
 gcc/mode-classes.def |  2 ++
 gcc/stor-layout.cc   | 45 ---
 gcc/target.def   | 39 +++
 gcc/targhooks.cc | 29 
 gcc/targhooks.h  |  4 +++
 gcc/tree-vect-generic.cc |  4 +++
 gcc/tree-vect-stmts.cc   | 57 
 gcc/tree.cc  |  2 ++
 14 files changed, 232 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 1e87f798449..f7a8a5351e2 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6251,6 +6251,13 @@ equal to @code{word_mode}, because the vectorizer can do some
 transformations even in absence of specialized @acronym{SIMD} hardware.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX (complex_mode @var{mode})
+This hook should return the preferred mode for vectorizing complex
+mode @var{mode}.  The default is
+equal to @code{word_mode}, because the vectorizer can do some
+transformations even in absence of specialized @acronym{SIMD} hardware.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_SPLIT_REDUCTION (machine_mode)
 This hook should return the preferred mode to split the final reduction
 step on @var{mode} to.  The reduction is then carried out reducing upper
@@ -6313,6 +6320,30 @@ requested mode, returning a mode with the same size as @var{vector_mode}
 when @var{nunits} is zero.  This is the correct behavior for most targets.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE_COMPLEX (machine_mode @var{vector_mode}, complex_mode @var{element_mode}, poly_uint64 @var{nunits})
+If a piece of code is using vector mode @var{vector_mode} and also wants
+to operate on elements of mode @var{element_mode}, return the vector mode
+it should use for those elements.  If @var{nunits} is nonzero, ensure that
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector
+size pairs the most naturally with @var{vector_mode}.  Return an empty
+@code{opt_machine_mode} if there is no supported vector mode with the
+required properties.
+
+There is no prescribed way of handling the case in which @var{nunits}
+is zero.  One common choice is to pick a vector mode with the same size
+as @var{vector_mode}; this is the natural choice if the target has a
+fixed vector size.  Another option is to choose a vector mode with the
+same number of elements as @var{vector_mode}; this is the natural choice
+if the target has a fixed number of elemen

Re: [PATCH] match: Don't sink comparisons into vec_cond operands.

2023-09-12 Thread Richard Biener via Gcc-patches
On Fri, Sep 8, 2023 at 7:55 PM Robin Dapp via Gcc-patches
 wrote:
>
> Hi,
>
> on riscv gcc.dg/pr70252.c ICEs at gimple-isel.cc:283.  This is because
> we created the gimple statement
>
>   mask__7.36_170 = VEC_COND_EXPR  }>;
>
> during vrp2.
>
> What happens is that, starting with
>   maskdest = (vec_cond mask1 1 0) >= (vec_cond mask2 1 0)
> we fold to
>   maskdest = mask1 >= (vec_cond (mask2 1 0))
> and then sink the "mask1 >=" into the vec_cond so we end up with
>   maskdest = vec_cond (mask2 ? mask1 : 0),
> i.e. a vec_cond with a mask "data mode".

I don't see how the patterns change the modes involved in the vec_cond
nor how they change the condition.

> In gimple-isel, when the target does not provide a vcond_mask
> implementation for that (which none does) we fail the assertion that the
> mask mode be MODE_VECTOR_INT.
>
> To prevent this, this patch restricts the match.pd sinking pattern to
> non-mask types.  I was also thinking about restricting the type of
> the operands, wondering if that would be less intrusive.

If you can show what vec_cond is supported before the transform
(with types/modes shown) and what vec_cond is not, after the transform
then those patterns need to be adjusted to check for the support of
the target operation.  I'll note that we have many patterns like

(simplify
 (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
 (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
  (vec_cond (bit_and @0 @3) @1 @2)))

which check optimize_vectors_before_lowering_p () (but even then, if
the new vec_cond isn't supported by the target but the original ones
are, we get sub-optimal code).
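The sinking rule under discussion is value-preserving lane by lane. The per-lane C model below uses hypothetical names and takes `op` to be `<`, a tcc_comparison. It checks that `(c ? a : b) op d` and `c ? (a op d) : (b op d)` agree. The problem reported in the thread is therefore not wrong values but the shape of the result. When `op` is a comparison, the arms of the sunk vec_cond are themselves masks, and gimple-isel may be unable to expand a vec_cond with mask-typed data operands if the target provides no matching vcond_mask pattern.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4

/* Per-lane model of (c ? a : b) op d with op = "<".  */
static void lhs_form (const bool c[N], const int a[N], const int b[N],
                      const int d[N], bool out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (c[i] ? a[i] : b[i]) < d[i];
}

/* Per-lane model of the sunk form c ? (a op d) : (b op d).  */
static void sunk_form (const bool c[N], const int a[N], const int b[N],
                       const int d[N], bool out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = c[i] ? (a[i] < d[i]) : (b[i] < d[i]);
}
```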

Richard.

> Bootstrapped and regression-tested on x86 and aarch64.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR target/111337
> * match.pd: Do not sink comparisons into vec_conds when the type
> is a vector mask.
> ---
>  gcc/match.pd | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8c24dae71cd..db3e698f471 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4856,7 +4856,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond @0 (view_convert! @1) (view_convert! @2
>
>  /* Sink binary operation to branches, but only if we can fold it.  */
> -(for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
> +(for op (plus minus mult bit_and bit_ior bit_xor
>  lshift rshift rdiv trunc_div ceil_div floor_div round_div
>  trunc_mod ceil_mod floor_mod round_mod min max)
>  /* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> @@ -4872,6 +4872,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(op @3 (vec_cond:s @0 @1 @2))
>(vec_cond @0 (op! @3 @1) (op! @3 @2
>
> +/* Comparison sinks might be folded into vector masks which could
> +   end up as "data" operand of a vec_cond
> +   e.g. (vec_cond @0 (mask1) (...)).
> +   gimple-isel does not handle such cases if the target does not provide
> +   a vcond_mask.  Therefore, restrict the operands to non-mask classes.  */
> +(for op (tcc_comparison)
> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> + (simplify
> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @1 @3) (op! @2 @4
> +
> +/* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> + (simplify
> +  (op (vec_cond:s @0 @1 @2) @3)
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @1 @3) (op! @2 @3
> + (simplify
> +  (op @3 (vec_cond:s @0 @1 @2))
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @3 @1) (op! @3 @2)
> +
>  #if GIMPLE
>  (match (nop_atomic_bit_test_and_p @0 @1 @4)
>   (bit_and (convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3))
> --
> 2.41.0
>


Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-12 Thread Manolis Tsamis
On Tue, Sep 12, 2023 at 3:47 AM Jeff Law  wrote:
>
>
>
> On 9/9/23 02:46, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >addi t4,sp,16
> >add  t2,a6,t4
> >shl  t3,t2,1
> >ld   a2,0(t3)
> >addi a2,1
> >sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >add  t2,a6,sp
> >shl  t3,t2,1
> >ld   a2,32(t3)
> >addi a2,1
> >sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
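To see why the two sequences above are equivalent, the sketch below recomputes both effective addresses in C. It assumes that `shl` means a left shift by one and that registers are 64-bit unsigned values; both are assumptions about the illustrative assembly. The constant 16 from the removed addi reappears as 16 << 1 = 32 in the load offset and as 8 + 16 = 24 in the store offset.

```c
#include <stdint.h>

/* Effective addresses of the load and store in the original sequence:
   addi t4,sp,16; add t2,a6,t4; shl t3,t2,1; ld 0(t3); sd 8(t2).  */
static void
addrs_before (uint64_t sp, uint64_t a6, uint64_t *ld_addr, uint64_t *sd_addr)
{
  uint64_t t4 = sp + 16;
  uint64_t t2 = a6 + t4;
  uint64_t t3 = t2 << 1;
  *ld_addr = t3 + 0;
  *sd_addr = t2 + 8;
}

/* The same addresses after the addi is removed and its constant is folded
   into the memory offsets: 16 << 1 for the load, 16 for the store.  */
static void
addrs_after (uint64_t sp, uint64_t a6, uint64_t *ld_addr, uint64_t *sd_addr)
{
  uint64_t t2 = a6 + sp;
  uint64_t t3 = t2 << 1;
  *ld_addr = t3 + 32;
  *sd_addr = t2 + 24;
}
```

The equality holds even when the additions wrap, because both addition and shift-left distribute over modular arithmetic.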
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add fold-mem-offsets.o.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >   * common.opt: New options.
> >   * doc/invoke.texi: Document new option.
> >   * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > Changes in v5:
> >  - Introduce new helper function fold_offsets_1.
> >  - Fix bug because constants could be partially propagated
> >through instructions that weren't understood.
> >  - Introduce helper class fold_mem_info that stores f-m-o
> >info for an instruction.
> >  - Calculate fold_offsets only once with do_fold_info_calculation.
> >  - Fix correctness issue by introducing compute_validity_closure.
> >  - Propagate in more cases for PLUS/MINUS with constant.
> >
> > Changes in v4:
> >  - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> >  - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> >  - Call recog when patching offsets in do_commit_offset.
> >  - Restore INSN_CODE after modifying insn in do_check_validity.
> >
> > Changes in v3:
> >  - Added propagation for more codes:
> >sub, neg, mul.
> >  - Added folding / elimination for sub and
> >const int moves.
> >  - For the validity check of the generated addresses
> >also test memory_address_addr_space_p.
> >  - Replaced GEN_INT with gen_int_mode.
> >  - Replaced some bitmap_head with auto_bitmap.
> >  - Refactor each phase into own function for readability.
> >  - Add dump details.
> >  - Replace rtx iteration with reg_mentioned_p.
> >  - Return early for codes that we can't propagate through.
> >
> > Changes in v2:
> >  - Made the pass target-independent instead of RISCV specific.
> >  - Fixed a number of bugs.
> >  - Add code to handle more ADD patterns as found
> >in other targets (x86, aarch64).
> >  - Improved naming and comments.
> >  - Fixed bitmap memory leak.
> >
>
>
> > +
> > +/* Get the single reaching definition of an instruction inside a BB.
> > +   The definition is desired for REG used in INSN.
> > +   Return the definition insn or NULL if there's no definition with
> > +   the desired criteria.  */
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> > +{
> > +  df_ref use;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  FOR_EACH_INSN_USE (use, insn)
> > +{
> > +  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> > + return NULL;
> > +  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> > + break;
> > +}
> > +
> > +  if (!use)
> > +return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (use);
> So what if there are two uses of REG in INSN?  I don't think it'd be
> common at all, but it's probably better to be safe and reject than sorry,
> right?  Or is that case filtered out earlier?
>
If the REG is the same, won't the definitions be the same even if that
REG appears multiple times in INSN?
fold_offsets_1 should be able to handle folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here, I assume we want to keep this as-is in
order not to reduce the propagation power (which I assume is similar
to ree, which uses the same logic).
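The argument for multiple uses can be illustrated with a small C sketch. It assumes that the expression is built only from additions and left shifts of R1, as in the two examples above, so it is linear modulo 2^64. If R1 = x + c, then expr (x + c) equals expr (x) + expr (c), so the constant can be folded as expr (c) however often R1 appears. The helper names are hypothetical and are not taken from fold-mem-offsets.cc.

```c
#include <stdint.h>

/* Hypothetical models of the two INSN shapes mentioned above, as functions
   of the value in R1: add R1, R1 and add (ashift R1, 1), R1.  */
static uint64_t
add_r1_r1 (uint64_t r1)
{
  return r1 + r1;
}

static uint64_t
add_shl_r1 (uint64_t r1)
{
  return (r1 << 1) + r1;
}

/* Constant to add to the memory offset once the definition R1 = x + c is
   replaced by R1 = x.  Valid only because expr is linear:
   expr (x + c) == expr (x) + expr (c) modulo 2^64.  */
static uint64_t
folded_offset (uint64_t (*expr) (uint64_t), uint64_t c)
{
  return expr (c);
}
```

With c = 16, the first shape folds 32 into the offset and the second folds 48.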

>
>
>
> > +
> > +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> Formatting nit.  The '*' should be next to the variable, not the type.
>
Thanks, I fixed this in multiple places. I thought that
check_GNU_style would whine about this...

> > +
>
> > +
> > +static HOST_WIDE_INT
> > +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap 
> > foldable_insns);
> > +
> > +/*  Helper function for fold_offsets.
> > +
> > +If 

Re: [PATCH] sccvn: Avoid ICEs on _BitInt load BIT_AND_EXPR mask [PR111338]

2023-09-12 Thread Richard Biener via Gcc-patches
On Mon, 11 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because vn_walk_cb_data::push_partial_def
> uses a fixed size buffer (64 target bytes) for its
> construction/deconstruction of partial stores and fails if larger precision
> than that is needed, and the PR93582 changes assert push_partial_def
> succeeds (and check the various other conditions much earlier when seeing
> the BIT_AND_EXPR statement, like CHAR_BIT == 8, BITS_PER_UNIT == 8,
> BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN, etc.).  So, just removing the assert
> and allowing it fail there doesn't really work and ICEs later on.
> 
> The following patch moves the bufsize out of the method and tests it
> together with the other checks.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> BTW, perhaps we could increase the bufsize as well or in addition to
> increasing it make the buffer allocated using XALLOCAVEC, but still I think
> it is useful to have some upper bound and so I think this patch is useful
> even in that case.

Yeah, the size is chosen to match the largest vector mode we currently
have.
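For the testcase in the patch, the new guard works out as follows. bufsize is 64 bytes, and the guard also requires BITS_PER_UNIT == 8. The BIT_AND_EXPR optimization is therefore only attempted for precisions up to 64 * 8 = 512 bits. _BitInt(575) from bitint-37.c is rejected up front instead of failing in push_partial_def. The C sketch below shows only that comparison. The constants mirror the patch, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Constants mirroring the patch: vn_walk_cb_data::bufsize is 64 bytes and
   the surrounding checks already require BITS_PER_UNIT == 8.  */
#define VN_BUFSIZE 64
#define VN_BITS_PER_UNIT 8

/* Hypothetical stand-in for the new condition in visit_nary_op:
   TYPE_PRECISION (type) <= vn_walk_cb_data::bufsize * BITS_PER_UNIT.  */
static bool
bit_and_opt_allowed (long precision)
{
  return precision <= (long) VN_BUFSIZE * VN_BITS_PER_UNIT;
}
```

Under this check, a 512-bit type still qualifies, while _BitInt(575) does not.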

Richard.

> 2023-09-11  Jakub Jelinek  
> 
>   PR middle-end/111338
>   * tree-ssa-sccvn.cc (struct vn_walk_cb_data): Add bufsize non-static
>   data member.
>   (vn_walk_cb_data::push_partial_def): Remove bufsize variable.
>   (visit_nary_op): Avoid the BIT_AND_EXPR with constant rhs2
>   optimization if type's precision is too large for
>   vn_walk_cb_data::bufsize.
> 
>   * gcc.dg/bitint-37.c: New test.
> 
> --- gcc/tree-ssa-sccvn.cc.jj  2023-09-06 17:28:24.232977433 +0200
> +++ gcc/tree-ssa-sccvn.cc 2023-09-08 13:22:27.928158846 +0200
> @@ -1903,6 +1903,7 @@ struct vn_walk_cb_data
>alias_set_type first_base_set;
>splay_tree known_ranges;
>obstack ranges_obstack;
> +  static constexpr HOST_WIDE_INT bufsize = 64;
>  };
>  
>  vn_walk_cb_data::~vn_walk_cb_data ()
> @@ -1973,7 +1974,6 @@ vn_walk_cb_data::push_partial_def (pd_da
>  HOST_WIDE_INT offseti,
>  HOST_WIDE_INT maxsizei)
>  {
> -  const HOST_WIDE_INT bufsize = 64;
>/* We're using a fixed buffer for encoding so fail early if the object
>   we want to interpret is bigger.  */
>if (maxsizei > bufsize * BITS_PER_UNIT
> @@ -5414,6 +5414,7 @@ visit_nary_op (tree lhs, gassign *stmt)
> && CHAR_BIT == 8
> && BITS_PER_UNIT == 8
> && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> +   && TYPE_PRECISION (type) <= vn_walk_cb_data::bufsize * BITS_PER_UNIT
> && !integer_all_onesp (gimple_assign_rhs2 (stmt))
> && !integer_zerop (gimple_assign_rhs2 (stmt)))
>   {
> --- gcc/testsuite/gcc.dg/bitint-37.c.jj   2023-09-08 13:27:51.676882523 
> +0200
> +++ gcc/testsuite/gcc.dg/bitint-37.c  2023-09-08 13:27:22.460268614 +0200
> @@ -0,0 +1,11 @@
> +/* PR middle-end/111338 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-O1" } */
> +
> +_BitInt(575) e;
> +
> +_BitInt(575)
> +foo (void)
> +{
> +  return e & 1;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] small _BitInt tweaks

2023-09-12 Thread Richard Biener via Gcc-patches
On Mon, 11 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> When discussing PR111369 with Andrew Pinski, I've realized that
> I haven't added BITINT_TYPE handling to range_check_type.  Right now
> (unsigned) max + 1 == (unsigned) min for signed _BitInt, so I think we
> don't need to do the extra hops for BITINT_TYPE (though possibly we don't
> need them for INTEGER_TYPE either in the two's complement world and we don't
> support anything else, though I really don't know if Ada or some other
> FEs don't create weird INTEGER_TYPEs).
> And, also I think it is undesirable when being asked for signed_type_for
> of unsigned _BitInt(1) (which is valid) to get signed _BitInt(1) (which is
> invalid, the standard only allows signed _BitInt(2) and larger), so the
> patch returns 1-bit signed INTEGER_TYPE for those cases.
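For context, range_check_type chooses the type used for fold-const's range checks. As far as I understand, build_range_check uses that type to turn lo <= x && x <= hi into a single unsigned comparison of (x - lo) against (hi - lo). That rewrite is only valid when (unsigned) max + 1 == (unsigned) min. The sketch below shows the rewrite for plain int rather than _BitInt, so it builds with any C compiler. The function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* lo <= x && x <= hi, written with two comparisons.  */
static bool
in_range_two_cmps (int x, int lo, int hi)
{
  return lo <= x && x <= hi;
}

/* The same test as one unsigned comparison.  Requires lo <= hi and relies
   on int -> unsigned conversion being modular, which is exactly what makes
   (unsigned) INT_MAX + 1 == (unsigned) INT_MIN.  */
static bool
in_range_one_cmp (int x, int lo, int hi)
{
  return (unsigned) x - (unsigned) lo <= (unsigned) hi - (unsigned) lo;
}
```

Values below lo wrap around to large unsigned numbers, so they fail the single comparison just as they fail the first half of the two-comparison form.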

I think the last bit is a bit surprising - do the frontends use
signed_or_unsigned_type_for and would they be confused if getting
back an INTEGER_TYPE here?

The range_check_type bits are OK.  For the tree.cc part I think
the middle-end can just handle signed 1-bit BITINT fine?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2023-09-11  Jakub Jelinek  
> 
> gcc/
>   * tree.cc (signed_or_unsigned_type_for): Return INTEGER_TYPE for
>   signed variant of unsigned _BitInt(1).
>   * fold-const.cc (range_check_type): Handle BITINT_TYPE like
>   OFFSET_TYPE.
> gcc/c-family/
>   * c-common.cc (c_common_signed_or_unsigned_type): Return INTEGER_TYPE
>   for signed variant of unsigned _BitInt(1).
> 
> --- gcc/tree.cc.jj2023-09-06 17:50:30.707589026 +0200
> +++ gcc/tree.cc   2023-09-11 16:24:58.749625569 +0200
> @@ -11096,7 +11096,7 @@ signed_or_unsigned_type_for (int unsigne
>else
>  return NULL_TREE;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE && (unsignedp || bits > 1))
>  return build_bitint_type (bits, unsignedp);
>return build_nonstandard_integer_type (bits, unsignedp);
>  }
> --- gcc/c-family/c-common.cc.jj   2023-09-06 17:34:24.467254960 +0200
> +++ gcc/c-family/c-common.cc  2023-09-11 16:24:07.873300311 +0200
> @@ -2739,7 +2739,9 @@ c_common_signed_or_unsigned_type (int un
>|| TYPE_UNSIGNED (type) == unsignedp)
>  return type;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  /* signed _BitInt(1) is invalid, avoid creating that.  */
> +  && (unsignedp || TYPE_PRECISION (type) > 1))
>  return build_bitint_type (TYPE_PRECISION (type), unsignedp);
>  
>  #define TYPE_OK(node)
> \
> --- gcc/fold-const.cc.jj  2023-09-11 11:05:47.473728473 +0200
> +++ gcc/fold-const.cc 2023-09-11 16:28:06.052141516 +0200
> @@ -5565,7 +5565,12 @@ range_check_type (tree etype)
>else
>   return NULL_TREE;
>  }
> -  else if (POINTER_TYPE_P (etype) || TREE_CODE (etype) == OFFSET_TYPE)
> +  else if (POINTER_TYPE_P (etype)
> +|| TREE_CODE (etype) == OFFSET_TYPE
> +/* Right now all BITINT_TYPEs satisfy
> +   (unsigned) max + 1 == (unsigned) min, so no need to verify
> +   that like for INTEGER_TYPEs.  */
> +|| TREE_CODE (etype) == BITINT_TYPE)
>  etype = unsigned_type_for (etype);
>return etype;
>  }
> 
>   Jakub
> 
> 



Re: [PATCH] MATCH: Simplify (a CMP1 b) ^ (a CMP2 b)

2023-09-12 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 6:22 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds the missing optimizations here.
> Note we don't need to match where CMP1 and CMP2 are complements of each
> other as that is already handled elsewhere.
>
> I added a new executable testcase to make sure we optimize it correctly,
> as I had originally messed up one of the entries for the resulting
> comparison; the testcase makes sure they are all 100% correct.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
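The six rows of each replacement table can be checked exhaustively on a small range of ints. The sketch below is a hypothetical brute-force checker, separate from the patch's pr107881-1.c testcase. It copies cmp1, cmp2 and the two rcmp lists from the match.pd hunk. It treats the ne form as equivalent to xor because both operands are booleans.

```c
#include <stdbool.h>

/* The six comparisons that appear in the new match.pd patterns.  */
static bool lt (int a, int b) { return a < b; }
static bool le (int a, int b) { return a <= b; }
static bool gt (int a, int b) { return a > b; }
static bool ge (int a, int b) { return a >= b; }
static bool eq (int a, int b) { return a == b; }
static bool ne (int a, int b) { return a != b; }

typedef bool (*cmp_fn) (int, int);

/* Rows copied from the patch: cmp1, cmp2, the rcmp for the bit_xor/ne
   form, and the rcmp for the eq form.  */
static const struct { cmp_fn cmp1, cmp2, xor_res, eq_res; } table[] = {
  { lt, gt, ne, eq },
  { lt, eq, le, gt },
  { lt, ne, gt, le },
  { le, ge, ne, eq },
  { le, eq, lt, ge },
  { le, ne, ge, lt },
};

/* Check every row for all a, b in [lo, hi]; true if all rows agree.  */
static bool
check_table (int lo, int hi)
{
  for (unsigned r = 0; r < sizeof table / sizeof table[0]; r++)
    for (int a = lo; a <= hi; a++)
      for (int b = lo; b <= hi; b++)
        {
          bool x = table[r].cmp1 (a, b);
          bool y = table[r].cmp2 (a, b);
          if ((x ^ y) != table[r].xor_res (a, b)
              || (x == y) != table[r].eq_res (a, b))
            return false;
        }
  return true;
}
```

A window as small as [-3, 3] already covers the three orderings a < b, a == b and a > b that decide each row.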

> PR tree-optimization/107881
>
> gcc/ChangeLog:
>
> * match.pd (`(a CMP1 b) ^ (a CMP2 b)`): New pattern.
> (`(a CMP1 b) == (a CMP2 b)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr107881-1.c: New test.
> * gcc.dg/tree-ssa/cmpeq-4.c: New test.
> * gcc.dg/tree-ssa/cmpxor-1.c: New test.
> ---
>  gcc/match.pd  |  20 +++
>  .../gcc.c-torture/execute/pr107881-1.c| 115 ++
>  gcc/testsuite/gcc.dg/tree-ssa/cmpeq-4.c   |  51 
>  gcc/testsuite/gcc.dg/tree-ssa/cmpxor-1.c  |  51 
>  4 files changed, 237 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpeq-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpxor-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e96e385c6fa..39c7ea1088f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3154,6 +3154,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>{ constant_boolean_node (true, type); })
>   ))
>
> +/* Optimize (a CMP b) ^ (a CMP b)  */
> +/* Optimize (a CMP b) != (a CMP b)  */
> +(for op (bit_xor ne)
> + (for cmp1 (lt lt lt le le le)
> +  cmp2 (gt eq ne ge eq ne)
> +  rcmp (ne le gt ne lt ge)
> +  (simplify
> +   (op:c (cmp1:c @0 @1) (cmp2:c @0 @1))
> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +(rcmp @0 @1)
> +
> +/* Optimize (a CMP b) == (a CMP b)  */
> +(for cmp1 (lt lt lt le le le)
> + cmp2 (gt eq ne ge eq ne)
> + rcmp (eq gt le eq ge lt)
> + (simplify
> +  (eq:c (cmp1:c @0 @1) (cmp2:c @0 @1))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +(rcmp @0 @1
> +
>  /* We can't reassociate at all for saturating types.  */
>  (if (!TYPE_SATURATING (type))
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
> new file mode 100644
> index 000..063ec4c2797
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
> @@ -0,0 +1,115 @@
> +#define func(vol, op1, op2, op3)   \
> +_Bool op1##_##op2##_##op3##_##vol (int a, int b)   \
> +{  \
> + vol _Bool x = op_##op1(a, b); \
> + vol _Bool y = op_##op2(a, b); \
> + return op_##op3(x, y);\
> +}
> +
> +#define op_lt(a, b) ((a) < (b))
> +#define op_le(a, b) ((a) <= (b))
> +#define op_eq(a, b) ((a) == (b))
> +#define op_ne(a, b) ((a) != (b))
> +#define op_gt(a, b) ((a) > (b))
> +#define op_ge(a, b) ((a) >= (b))
> +#define op_xor(a, b) ((a) ^ (b))
> +
> +
> +#define funcs(a) \
> + a(lt,lt,ne) \
> + a(lt,lt,eq) \
> + a(lt,lt,xor) \
> + a(lt,le,ne) \
> + a(lt,le,eq) \
> + a(lt,le,xor) \
> + a(lt,gt,ne) \
> + a(lt,gt,eq) \
> + a(lt,gt,xor) \
> + a(lt,ge,ne) \
> + a(lt,ge,eq) \
> + a(lt,ge,xor) \
> + a(lt,eq,ne) \
> + a(lt,eq,eq) \
> + a(lt,eq,xor) \
> + a(lt,ne,ne) \
> + a(lt,ne,eq) \
> + a(lt,ne,xor) \
> +  \
> + a(le,lt,ne) \
> + a(le,lt,eq) \
> + a(le,lt,xor) \
> + a(le,le,ne) \
> + a(le,le,eq) \
> + a(le,le,xor) \
> + a(le,gt,ne) \
> + a(le,gt,eq) \
> + a(le,gt,xor) \
> + a(le,ge,ne) \
> + a(le,ge,eq) \
> + a(le,ge,xor) \
> + a(le,eq,ne) \
> + a(le,eq,eq) \
> + a(le,eq,xor) \
> + a(le,ne,ne) \
> + a(le,ne,eq) \
> + a(le,ne,xor)  \
> + \
> + a(gt,lt,ne) \
> + a(gt,lt,eq) \
> + a(gt,lt,xor) \
> + a(gt,le,ne) \
> + a(gt,le,eq) \
> + a(gt,le,xor) \
> + a(gt,gt,ne) \
> + a(gt,gt,eq) \
> + a(gt,gt,xor) \
> + a(gt,ge,ne) \
> + a(gt,ge,eq) \
> + a(gt,ge,xor) \
> + a(gt,eq,ne) \
> + a(gt,eq,eq) \
> + a(gt,eq,xor) \
> + a(gt,ne,ne) \
> + a(gt,ne,eq) \
> + a(gt,ne,xor) \
> +  \
> + a(ge,lt,ne) \
> + a(ge,lt,eq) \
> + a(ge,lt,xor) \
> + a(ge,le,ne) \
> + a(ge,le,eq) \
> + a(ge,le,xor) \
> + a(ge,gt,ne) \
> + a(ge,gt,eq) \
> + a(ge,gt,xor) \
> + a(ge,ge,ne) \
> + a(ge,ge,eq) \
> + a(ge,ge,xor) \
> + a(ge,eq,ne) \
> + a(ge,eq,eq) \
> + a(ge,eq,xor) \
> + a(ge,ne,ne) \
> + a(ge,ne,eq) \
> + a(ge,ne,xor)
> +
> +#define funcs1(a,b,c) \
> +func(,a,b,c) \
> +func(volatile,a,b,c)
> +
> +funcs(funcs1)
> +
> +#define test(op1,op2,op3)  \
> +do {   \
> +  if (op1##_##op2##_##op3##_(x,y)  \
> +  != op1##_##op2##_##op3##_volatile(x,y))  \
> +__builtin_abort(); \
> +} while(0);
> +
> +int main()
> +{
> +  for(int x = 

Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Robin Dapp via Gcc-patches


> This is the first version of dynamic LMUL.
> I didn't test it with the full GCC testsuite.
> 
> My plan is to first pass the full GCC testsuite (including vect.exp) with
> the default LMUL = M1, and then enable dynamic LMUL to test it.
> 
> Maybe we could tolerate this ICE issue for now. Then we can test it
> with the full GCC testsuite (I believe we can reproduce it with some case
> in the GCC testsuite in the future).
> 
> Is that reasonable? If yes, I will fix all your comments and send V5.

Yes, works for me.

Regards
 Robin



libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] (was: [PATCH v5 GCC] libffi/test: Fix compilation for build sysroot)

2023-09-12 Thread Thomas Schwinge
Hi!

On 2020-04-20T14:18:40+0100, "Maciej W. Rozycki via Gcc-patches" 
 wrote:
> Fix a problem with the libffi testsuite using a method to determine the
> compiler to use resulting in the tool being different from one the
> library has been built with, and causing a catastrophic failure from the
> inability to actually choose any compiler at all in a cross-compilation
> configuration.

This has since, as far as I can tell, been resolved properly by H.J. Lu's
GCC commit 5be7b66998127286fada45e4f23bd8a2056d553e,
"libffi: Integrate build with GCC", and
GCC commit 4824ed41ba7cd63e60fd9f8769a58b79935a90d1
"libffi: Integrate testsuite with GCC testsuite".

> Address this problem by providing a DejaGNU configuration file defining
> the compiler to use, via the CC_FOR_TARGET TCL variable, set from $CC by
> autoconf, which will have all the required options set for the target
> compiler to build executables in the environment configured

As we've found, this is conceptually problematic, as discussed in

"Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing 
(instead of build-time 'CC' etc.) [PR109951]".
I therefore suggest applying to GCC libffi conceptually the same changes
as I've just pushed for libgomp:

"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]".
OK to push the attached
"libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree 
testing (instead of build-time 'CC' etc.) [PR109951]"?


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Register court: München, HRB 106955
From 8b8654d04dcbb7f0a5947bc21efc5b9c60b3b6c6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Sep 2023 10:50:00 +0200
Subject: [PATCH] libffi: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.

	PR testsuite/109951
	libffi/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
	set 'SYSROOT_CFLAGS_FOR_TARGET'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* include/Makefile.in: Likewise.
	* man/Makefile.in: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libffi.exp (libffi_target_compile): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
---
 libffi/Makefile.in  |  1 +
 libffi/configure| 10 ++
 libffi/configure.ac |  5 +++--
 libffi/include/Makefile.in  |  1 +
 libffi/man/Makefile.in  |  1 +
 libffi/testsuite/Makefile.in|  1 +
 libffi/testsuite/lib/libffi.exp |  7 +++
 7 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/libffi/Makefile.in b/libffi/Makefile.in
index 1d936b5c8a5..3a55212cc00 100644
--- a/libffi/Makefile.in
+++ b/libffi/Makefile.in
@@ -383,6 +383,7 @@ SED = @SED@
 SET_MAKE = @SET_MAKE@
 SHELL = @SHELL@
 STRIP = @STRIP@
+SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
 TARGET = @TARGET@
 TARGETDIR = @TARGETDIR@
 TARGET_OBJ = @TARGET_OBJ@
diff --git a/libffi/configure b/libffi/configure
index 9eac9c907bf..f1efd6987a3 100755
--- a/libffi/configure
+++ b/libffi/configure
@@ -666,6 +666,7 @@ TESTSUBDIR_TRUE
 MAINT
 MAINTAINER_MODE_FALSE
 MAINTAINER_MODE_TRUE
+SYSROOT_CFLAGS_FOR_TARGET
 READELF
 CXXCPP
 CPP
@@ -11634,7 +11635,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11637 "configure"
+#line 11638 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11740,7 +11741,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11743 "configure"
+#line 11744 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15137,9 +15138,10 @@ _ACEOF
 
 
 
+
+
 cat > local.exp < local.exp <

[PATCH V5] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Juzhe-Zhong
This patch supports dynamic LMUL cost modeling with
--param=riscv-autovec-lmul=dynamic.

Consider the following case:
#include <stdint.h>

void
foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
     int32_t *__restrict d,
     int32_t *__restrict d2,
     int32_t *__restrict d3,
     int32_t *__restrict d4,
     int32_t *__restrict d5,
     int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
             * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
             * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

As the demo shows, this loop causes register spilling if you specify LMUL >= 4.

Now, with --param=riscv-autovec-lmul=dynamic, GCC is able to pick LMUL = 2
for this case.

This feature is implemented by a linear-scan-based local live range analysis
that computes the maximum number of live V_REGS over the program points of
the function and uses it to determine the VF/LMUL.

Note that this patch handles both SLP and non-SLP loops.

The current approach does not consider the later instruction scheduler, which
may reduce register pressure.  In such cases we conservatively apply a smaller
VF/LMUL.  (I am not sure whether we should support live range shrinking for
such corner cases, since we don't know whether it would improve performance
significantly.)
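A much simplified C sketch of the idea follows. It assumes that each live vector value occupies LMUL registers out of the 32 architectural vector registers, and it ignores v0 and any other reservations. It finds the peak number of simultaneously live values, then picks the largest LMUL that still fits. The struct and function names are hypothetical. The point-by-point sweep stands in for the patch's linear-scan analysis in riscv-vector-costs.cc and is not that code.

```c
/* Hypothetical live range of one vector value in a loop body: the index of
   the defining statement and the index of its last use.  */
struct live_range
{
  int start;
  int end;
};

/* Largest number of values simultaneously live at any program point in
   [0, npoints), found by a plain point-by-point sweep.  */
static int
max_live_values (const struct live_range *r, int n, int npoints)
{
  int best = 0;
  for (int p = 0; p < npoints; p++)
    {
      int live = 0;
      for (int i = 0; i < n; i++)
        if (r[i].start <= p && p <= r[i].end)
          live++;
      if (live > best)
        best = live;
    }
  return best;
}

/* Largest LMUL in {8, 4, 2, 1} such that live_values * LMUL registers fit
   in the 32 vector registers; falls back to 1 if even that spills.  */
static int
choose_lmul (int live_values)
{
  static const int candidates[] = { 8, 4, 2, 1 };
  for (int i = 0; i < 4; i++)
    if (live_values * candidates[i] <= 32)
      return candidates[i];
  return 1;
}
```

For example, with 10 values live at the peak, LMUL = 4 would need 40 registers, so the sketch settles on LMUL = 2.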

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_last_live_range): New 
function.
(compute_nregs_for_mode): Ditto.
(live_range_conflict_p): Ditto.
(max_number_of_live_regs): Ditto.
(compute_lmul): Ditto.
(costs::prefer_new_lmul_p): Ditto.
(costs::better_main_loop_than_p): Ditto.
* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
(struct var_live_range): Ditto.
(struct autovec_info): Ditto.
* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/

libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] (was: [PATCH v4 1/5] libatomic/test: Fix compilation for build sy

2023-09-12 Thread Thomas Schwinge
Hi!

On 2020-04-04T00:00:44+0100, "Maciej W. Rozycki via Gcc-patches" 
 wrote:
> Fix a problem with the libatomic testsuite using a method to determine
> the compiler to use resulting in the tool being different from one the
> library has been built with, and causing a catastrophic failure from the
> lack of a suitable `--sysroot=' option where the `--with-build-sysroot='
> configuration option has been used to build the compiler resulting in
> the inability to link executables.
>
> Address this problem by providing a DejaGNU configuration file defining
> the compiler to use, via the GCC_UNDER_TEST TCL variable, set from $CC
> by autoconf, which will have all the required options set for the target
> compiler to build executables in the environment configured

As we've found, this is conceptually problematic, as discussed in

"Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing 
(instead of build-time 'CC' etc.)
[PR109951]".
I therefore suggest applying to libatomic conceptually the same changes
as I've just pushed for libgomp:

"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
build-tree testing (instead of build-time 'CC'
etc.) [PR91884, PR109951]".
OK to push the attached
"libatomic: Consider '--with-build-sysroot=[...]' for target libraries' 
build-tree testing (instead of build-time 'CC' etc.) [PR109951]"?


Regards
 Thomas


From 584bfb74e802b94c490b963bd05ed520b5c6e453 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Sep 2023 11:36:31 +0200
Subject: [PATCH] libatomic: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.

	PR testsuite/109951
	libatomic/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libatomic.exp (libatomic_init): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
	* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
	set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
---
 libatomic/Makefile.in   | 1 +
 libatomic/configure | 7 +--
 libatomic/configure.ac  | 2 ++
 libatomic/testsuite/Makefile.in | 1 +
 libatomic/testsuite/lib/libatomic.exp   | 5 +
 libatomic/testsuite/libatomic-site-extra.exp.in | 2 +-
 6 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index 83efe7d2694..2d2d64ee947 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -328,6 +328,7 @@ SET_MAKE = @SET_MAKE@
 SHELL = @SHELL@
 SIZES = @SIZES@
 STRIP = @STRIP@
+SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
 VERSION = @VERSION@
 XCFLAGS = @XCFLAGS@
 XLDFLAGS = @XLDFLAGS@
diff --git a/libatomic/configure b/libatomic/configure
index 57f320753e1..629ad22e833 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -656,6 +656,7 @@ LIBAT_BUILD_VERSIONED_SHLIB_FALSE
 LIBAT_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+SYSROOT_CFLAGS_FOR_TARGET
 enable_aarch64_lse
 libtool_VERSION
 MAINT
@@ -11402,7 +11403,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11405 "configure"
+#line 11406 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11508,7 +11509,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11511 "configure"
+#line 11512 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11866,6 +11867,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 ;;
 esac
 
+
+
 # Get target configury.
 . ${srcdir}/configure.tgt
 if test -n "$UNSUPPORTED"; then
diff --git a/libatomic/configure.ac b/libatomic/configure.ac
index 318b605a1d7..4beff2d681f 100644
--- a/libatomic/configure.ac
+++ b/libatomic/configure.ac
@@ -170,6 +170,8 @@ case "$target" in
 ;;
 esac
 
+AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)
+
 # Get target configury.
 . ${srcdir}/configure.tgt
 if test -n "$UNSUPPORTED";

Re: [PATCH V5] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Robin Dapp via Gcc-patches
LGTM.  We should just keep in mind the restrictions discussed in the
other thread.

Regards
 Robin


libgo: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] (was: [PATCH 3/4] libgo/test: Fix compilation for build sysroot)

2023-09-12 Thread Thomas Schwinge
Hi!

On 2019-11-11T18:12:44+, "Maciej W. Rozycki"  wrote:
> Fix a problem with the libgo testsuite using a method to determine the
> compiler to use resulting in the tool being different from one the
> library has been built with, and causing a catastrophic failure from the
> lack of a suitable `--sysroot=' option where the `--with-build-sysroot='
> configuration option has been used to build the compiler resulting in
> the inability to link executables.
>
> Address this problem by providing a DejaGNU configuration file defining
> the compiler to use, via the GOC_UNDER_TEST TCL variable, set from $GOC
> by autoconf, which will have all the required options set for the target
> compiler to build executables in the environment configured

As we've found, this is conceptually problematic, as discussed in

"Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing 
(instead of build-time 'CC' etc.)
[PR109951]".
I therefore suggest applying to libgo conceptually the same changes
as I've just pushed for libgomp:

"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
build-tree testing (instead of build-time 'CC'
etc.) [PR91884, PR109951]".
OK to push (via Ian/Go upstream) the attached
"libgo: Consider '--with-build-sysroot=[...]' for target libraries' build-tree 
testing (instead of build-time 'CC' etc.) [PR109951]"?

By the way, I've tested this one via hard-coding
'libgo/configure.ac:USE_DEJAGNU' to 'yes', and observing that my
"quick hack to replicate the original requirement"
('internal_error ("MISSING SYSROOT");') no longer triggers.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 81a73112e3d0b43c240c7c9040c24d68c2739bf3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Sep 2023 16:55:24 +0200
Subject: [PATCH] libgo: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit b72813a68c943643a6241418f27aa8b9d4614647
"libgo: fix DejaGNU testsuite compiler when using build sysroot" done
differently, avoiding build-tree testing use of any random gunk that may
appear in build-time 'GOC'.

	PR testsuite/109951
	libgo/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libgo.exp (libgo_init): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
	* testsuite/libgo-test-support.exp.in (GOC_UNDER_TEST): Don't set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
---
 libgo/Makefile.in | 1 +
 libgo/configure   | 7 +--
 libgo/configure.ac| 2 ++
 libgo/testsuite/Makefile.in   | 1 +
 libgo/testsuite/lib/libgo.exp | 8 
 libgo/testsuite/libgo-test-support.exp.in | 2 +-
 6 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/libgo/Makefile.in b/libgo/Makefile.in
index 40340bfb7a5..8dcb6d6a354 100644
--- a/libgo/Makefile.in
+++ b/libgo/Makefile.in
@@ -474,6 +474,7 @@ SPLIT_STACK = @SPLIT_STACK@
 STRINGOPS_FLAG = @STRINGOPS_FLAG@
 STRIP = @STRIP@
 STRUCT_EPOLL_EVENT_FD_OFFSET = @STRUCT_EPOLL_EVENT_FD_OFFSET@
+SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
 USE_DEJAGNU = @USE_DEJAGNU@
 VERSION = @VERSION@
 WARN_FLAGS = @WARN_FLAGS@
diff --git a/libgo/configure b/libgo/configure
index a607dbff68e..2f1609b42b5 100755
--- a/libgo/configure
+++ b/libgo/configure
@@ -633,6 +633,7 @@ ac_subst_vars='am__EXEEXT_FALSE
 am__EXEEXT_TRUE
 LTLIBOBJS
 LIBOBJS
+SYSROOT_CFLAGS_FOR_TARGET
 HAVE_STATIC_LINK_FALSE
 HAVE_STATIC_LINK_TRUE
 HAVE_STAT_TIMESPEC_FALSE
@@ -11544,7 +11545,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11547 "configure"
+#line 11548 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11650,7 +11651,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11653 "configure"
+#line 11654 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16147,6 +16148,8 @@ else
 fi
 
 
+
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
diff --git a/libgo/configure.ac b/libgo/configure.ac
index a5

Re: [PATCH] small _BitInt tweaks

2023-09-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 12, 2023 at 10:27:18AM +, Richard Biener wrote:
> On Mon, 11 Sep 2023, Jakub Jelinek wrote:
> > And, also I think it is undesirable when being asked for signed_type_for
> > of unsigned _BitInt(1) (which is valid) to get signed _BitInt(1) (which is
> > invalid, the standard only allows signed _BitInt(2) and larger), so the
> > patch returns 1-bit signed INTEGER_TYPE for those cases.
> 
> I think the last bit is a bit surprising - do the frontends use
> signed_or_unsigned_type_for and would they be confused if getting
> back an INTEGER_TYPE here?

I see a single c-family/c-pretty-print.cc use of signed_or_unsigned_type_for
and none of signed_type_for in the C/C++ FEs (unsigned_type_for is used in a
couple of spots, but that isn't affected).  c_common_signed_type
or c_common_signed_or_unsigned_type is used more than that, but I still
think it is mostly used for warnings and similar, or when called with
specific types like sizetype.  I don't think the FE uses (or should
use) those functions to decide e.g. on the types of expressions; that is
what common_type etc. are for.
And for very small precisions, the distinction between BITINT_TYPE and
INTEGER_TYPE should be limited to loads/stores from memory (in case
there are different rules for handling padding bits in those cases, if
any) and to function arguments/return values; I don't think any of this is
really affected by the signed_type_for/c_common_signed_type results.

And by ensuring we never create a 1-bit signed BITINT_TYPE, the backends,
for example, don't need to worry about them.

But I admit I don't feel strongly about that.

Joseph, what do you think about this?

Jakub



RE: [PATCH V5] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Tuesday, September 12, 2023 7:07 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; kito.ch...@gmail.com
Subject: Re: [PATCH V5] RISC-V: Support Dynamic LMUL Cost model

LGTM.  We should just keep in mind the restrictions discussed in the
other thread.

Regards
 Robin


[committed] libstdc++: Format Python code according to PEP8

2023-09-12 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

These files were filtered through autopep8 to reformat them more
conventionally.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Reformat.
* python/libstdcxx/v6/xmethods.py: Likewise.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 651 +++
 libstdc++-v3/python/libstdcxx/v6/xmethods.py |  58 +-
 2 files changed, 446 insertions(+), 263 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 37a447b514b..c0056de2565 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -18,10 +18,12 @@
 import gdb
 import itertools
 import re
-import sys, os, errno
+import sys
+import os
+import errno
 import datetime
 
-### Python 2 + Python 3 compatibility code
+# Python 2 + Python 3 compatibility code
 
 # Resources about compatibility:
 #
@@ -38,7 +40,7 @@ import datetime
 # 
 
 if sys.version_info[0] > 2:
-### Python 3 stuff
+# Python 3 stuff
 Iterator = object
 # Python 3 folds these into the normal functions.
 imap = map
@@ -47,7 +49,7 @@ if sys.version_info[0] > 2:
 long = int
 _utc_timezone = datetime.timezone.utc
 else:
-### Python 2 stuff
+# Python 2 stuff
 class Iterator:
 """Compatibility mixin for iterators
 
@@ -98,6 +100,8 @@ except ImportError:
 # Starting with the type ORIG, search for the member type NAME.  This
 # handles searching upward through superclasses.  This is needed to
 # work around http://sourceware.org/bugzilla/show_bug.cgi?id=13615.
+
+
 def find_type(orig, name):
 typ = orig.strip_typedefs()
 while True:
@@ -116,8 +120,10 @@ def find_type(orig, name):
 else:
 raise ValueError("Cannot find type %s::%s" % (str(orig), name))
 
+
 _versioned_namespace = '__8::'
 
+
 def lookup_templ_spec(templ, *args):
 """
 Lookup template specialization templ
@@ -139,6 +145,8 @@ def lookup_templ_spec(templ, *args):
 
 # Use this to find container node types instead of find_type,
 # see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91997 for details.
+
+
 def lookup_node_type(nodename, containertype):
 """
 Lookup specialization of template NODENAME corresponding to CONTAINERTYPE.
@@ -168,6 +176,7 @@ def lookup_node_type(nodename, containertype):
 pass
 return None
 
+
 def is_member_of_namespace(typ, *namespaces):
 """
 Test whether a type is a member of one of the specified namespaces.
@@ -181,6 +190,7 @@ def is_member_of_namespace(typ, *namespaces):
 return True
 return False
 
+
 def is_specialization_of(x, template_name):
 """
 Test whether a type is a specialization of the named class template.
@@ -195,12 +205,14 @@ def is_specialization_of(x, template_name):
return re.match('^std::(%s)?%s<.*>$' % (_versioned_namespace, template_name), x) is not None
 return re.match('^std::%s<.*>$' % template_name, x) is not None
 
+
 def strip_versioned_namespace(typename):
 global _versioned_namespace
 if _versioned_namespace:
 return typename.replace(_versioned_namespace, '')
 return typename
 
+
 def strip_inline_namespaces(type_str):
 "Remove known inline namespaces from the canonical name of a type."
 type_str = strip_versioned_namespace(type_str)
@@ -212,6 +224,7 @@ def strip_inline_namespaces(type_str):
 type_str = type_str.replace(fs_ns+'v1::', fs_ns)
 return type_str
 
+
 def get_template_arg_list(type_obj):
 "Return a type's template arguments as a list"
 n = 0
@@ -223,6 +236,7 @@ def get_template_arg_list(type_obj):
 return template_args
 n += 1
 
+
 class SmartPtrIterator(Iterator):
 "An iterator for smart pointer types with a single 'child' value"
 
@@ -238,28 +252,29 @@ class SmartPtrIterator(Iterator):
 self.val, val = None, self.val
 return ('get()', val)
 
+
 class SharedPointerPrinter:
 "Print a shared_ptr, weak_ptr, atomic, or atomic"
 
-def __init__ (self, typename, val):
+def __init__(self, typename, val):
 self.typename = strip_versioned_namespace(typename)
 self.val = val
 self.pointer = val['_M_ptr']
 
-def children (self):
+def children(self):
 return SmartPtrIterator(self.pointer)
 
 # Return the _Sp_counted_base<>* that holds the refcounts.
-def _get_refcounts (self):
+def _get_refcounts(self):
 if self.typename == 'std::atomic':
 # A tagged pointer is stored as uintptr_t.
 ptr_val = self.val['_M_refcount']['_M_val']['_M_i']
-ptr_val = ptr_val - (ptr_val % 2) # clear lock bit
+ptr_val = ptr_val - (ptr_val % 2)  # clear lock bit
 ptr_type = find_type(self.val['_M_refcount'].type, 'pointer')
 return ptr_val.cast(ptr_type)
 ret

[committed] contrib: Quote variable in test expression [PR111360]

2023-09-12 Thread Jonathan Wakely via Gcc-patches
Committed as obvious.

-- >8 --

Without the quotes, an empty $r leaves just 'test -n', which some shells
treat as true and others reject with an error.  Quoting the variable makes
a null value work as intended.

contrib/ChangeLog:

PR other/111360
* gcc_update: Quote variable.
---
 contrib/gcc_update | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 1d7bfab4935..cda2bdb0df9 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -343,7 +343,7 @@ case $vcs_type in
revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
   | sed -n 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
-   if test -n $r; then
+   if test -n "$r"; then
o=`$GCC_GIT config --get gcc-config.upstream`;
rr=`echo $r | sed -n 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
if $GCC_GIT rev-parse --verify --quiet ${o:-origin}/releases/gcc-$rr >/dev/null; then
-- 
2.41.0



Re: [PATCH 1/3] libstdc++: Remove std::bind_front specialization for no bound args

2023-09-12 Thread Patrick Palka via Gcc-patches
On Mon, 11 Sep 2023, Patrick Palka wrote:

> This specialization for the case of no bound args, added by
> r13-4214-gcbd05ca5ab1231, seems to be mostly obsoleted by
> r13-5033-ge2eab3c4edb6aa which added [[no_unique_address]] to the
> main template's data members.  And the compile time advantage of
> avoiding an empty tuple and index_sequence seems minimal.  Removing this
> specialization also means we don't have to fix the PR111327 bug in
> another place.

FWIW I don't feel strongly about removing this specialization.  If we
keep it, we'd at least be able to reuse it for std::bind_back, and it
wouldn't be hard to fix the PR111327 bug in its implementation.

> 
>   PR libstdc++/111327
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/functional (_Bind_front0): Remove.
>   (_Bind_front_t): Adjust.
> ---
>  libstdc++-v3/include/std/functional | 63 +
>  1 file changed, 1 insertion(+), 62 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
> index 60d4d1f3dd2..7d1b890bb4e 100644
> --- a/libstdc++-v3/include/std/functional
> +++ b/libstdc++-v3/include/std/functional
> @@ -996,69 +996,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>[[no_unique_address]] std::tuple<_BoundArgs...> _M_bound_args;
>  };
>  
> -  // Avoid the overhead of an empty tuple<> if there are no bound args.
> -  template
> -struct _Bind_front0
> -{
> -  static_assert(is_move_constructible_v<_Fd>);
> -
> -  // First parameter is to ensure this constructor is never used
> -  // instead of the copy/move constructor.
> -  template
> - explicit constexpr
> - _Bind_front0(int, _Fn&& __fn)
> - noexcept(is_nothrow_constructible_v<_Fd, _Fn>)
> - : _M_fd(std::forward<_Fn>(__fn))
> - { }
> -
> -  _Bind_front0(const _Bind_front0&) = default;
> -  _Bind_front0(_Bind_front0&&) = default;
> -  _Bind_front0& operator=(const _Bind_front0&) = default;
> -  _Bind_front0& operator=(_Bind_front0&&) = default;
> -  ~_Bind_front0() = default;
> -
> -  template
> - constexpr
> - invoke_result_t<_Fd&, _CallArgs...>
> - operator()(_CallArgs&&... __call_args) &
> - noexcept(is_nothrow_invocable_v<_Fd&, _CallArgs...>)
> - { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
> -
> -  template
> - constexpr
> - invoke_result_t
> - operator()(_CallArgs&&... __call_args) const &
> - noexcept(is_nothrow_invocable_v)
> - { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
> -
> -  template
> - constexpr
> - invoke_result_t<_Fd, _CallArgs...>
> - operator()(_CallArgs&&... __call_args) &&
> - noexcept(is_nothrow_invocable_v<_Fd, _CallArgs...>)
> - {
> -   return std::invoke(std::move(_M_fd),
> -  std::forward<_CallArgs>(__call_args)...);
> - }
> -
> -  template
> - constexpr
> - invoke_result_t
> - operator()(_CallArgs&&... __call_args) const &&
> - noexcept(is_nothrow_invocable_v)
> - {
> -   return std::invoke(std::move(_M_fd),
> -  std::forward<_CallArgs>(__call_args)...);
> - }
> -
> -private:
> -  [[no_unique_address]] _Fd _M_fd;
> -};
> -
>template
> -using _Bind_front_t
> -  = __conditional_t>,
> - _Bind_front, decay_t<_Args>...>>;
> +using _Bind_front_t = _Bind_front, decay_t<_Args>...>;
>  
>/** Create call wrapper by partial application of arguments to function.
> *
> -- 
> 2.42.0.158.g94e83dcf5b
> 
> 



[PATCH v1] RISC-V: Remove unused structure in cost model

2023-09-12 Thread Pan Li via Gcc-patches
From: Pan Li 

The struct range is unused, remove it.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.h (struct range): Removed.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-vector-costs.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 7f120b79619..7b5814a4cff 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -40,13 +40,6 @@ struct autovec_info
   bool end_p;
 };
 
-struct range
-{
-  unsigned int pt;
-  bool start;
-  unsigned int nregs;
-};
-
 /* rvv-specific vector costs.  */
 class costs : public vector_costs
 {
-- 
2.34.1



Re: [PATCH 00/13] libstdc++: Add support for running tests with multiple -std options

2023-09-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 11 Sept 2023 at 17:37, Jonathan Wakely via Libstdc++
 wrote:
>
> This patch series replicates the behaviour of the g++ testsuite, so that
> libstdc++ tests can easily be run for multiple different -std options in
> a single testsuite run.  As described in the updated docs, the -std
> options to use for every test can be overridden by setting v3_std_list
> in ~/.dejagnurc or $DEJAGNU, or setting $GLIBCXX_TESTSUITE_STDS in the
> environment.  If not overridden, the default is just to run with
> -std=gnu++17 (so that we don't increase the time taken for a full
> testsuite run).
>
> Tests that require a newer standard than C++17 will default to that
> newer standard and C++26, so e.g. std::format tests will be run with
> both -std=gnu++20 and -std=gnu++26.  This does increase the number of
> tests, but only for the subset of tests for C++20/23/26 features.  If
> this is too costly for testers, we can change that (this might be
> needed, because the C++20 tests for std::ranges and std::format are
> particularly slow to compile).
>
> Because a correct default will be chosen for tests that require
> something newer than C++17, we no longer need dg-options "-std=gnu++20"
> or similar in any tests.  Removing the explicit -std option allows the
> test to be run for later standards via the v3_std_list settings, so that
> we can verify that C++20 features still work in C++23 and C++26, for
> example.  This change already found some tests which failed when run
> with a later standard (see r14-3771-gf12e26f3496275).
>
> Patches 2-13 in the series remove those unnecessary dg-options from
> about half the relevant tests, but there are more than 500 others that
> still need adjusting.
>
> We can remove files like testsuite/std/format/functions/format_c++23.cc
> which only exist to duplicate existing tests with a different -std
> option.  We can remove that file now, and rely on format.cc being run
> with multiple -std options by libstdc++ maintainers.
>
> It might also be useful to add a 'make check-quick' target which runs a
> small subset of smoke tests with every standard version in v3_std_list.
> This would be a suitable target for CI bots and for packagers who want
> to verify that a build of GCC is functional, without running the entire
> libstdc++ testsuite.

There's a problem with this change. Some of our tests fail if they're
run more than once.

We have some static data files which are copied into the test
directory by libstdc++_init at the start of the run. But some tests
modify those files, so if the same test gets run multiple times, the
file is no longer in the expected state after the first test.

This only shows up when overriding the list of -std modes to include
more than one option. The tests pass on the first run, and fail for
subsequent ones:

Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++98 (test for excess errors)
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++98 execution test
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++11 (test for excess errors)
FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++11 execution test
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++14 (test for excess errors)
FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++14 execution test
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++17 (test for excess errors)
FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++17 execution test
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++20 (test for excess errors)
FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++20 execution test
PASS: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++23 (test for excess errors)
FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc  -std=gnu++23 execution test

We either need to copy the data files again after each test, or
rewrite the tests to be idempotent.



Re: [PATCH v1] RISC-V: Remove unused structure in cost model

2023-09-12 Thread Jeff Law via Gcc-patches




On 9/12/23 07:02, Pan Li via Gcc-patches wrote:

From: Pan Li 

The struct range is unused, remove it.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.h (struct range): Removed.

OK
jeff


Re: [PATCH 1/3] libstdc++: Remove std::bind_front specialization for no bound args

2023-09-12 Thread Jonathan Wakely via Gcc-patches
On Tue, 12 Sept 2023 at 13:46, Patrick Palka via Libstdc++
 wrote:
>
> On Mon, 11 Sep 2023, Patrick Palka wrote:
>
> > This specialization for the case of no bound args, added by
> > r13-4214-gcbd05ca5ab1231, seems to be mostly obsoleted by
> > r13-5033-ge2eab3c4edb6aa which added [[no_unique_address]] to the
> > main template's data members.  And the compile time advantage of
> > avoiding an empty tuple and index_sequence seems minimal.  Removing this
> > specialization also means we don't have to fix the PR111327 bug in
> > another place.
>
> FWIW I don't feel strongly about removing this specialization.  If we
> keep it, we'd at least be able to reuse it for std::bind_back, and it
> wouldn't be hard to fix the PR111327 bug in its implementation.

Yeah, I'm ambivalent. But since you've got a patch to fix 111327 ready
which doesn't include this specialization, let's remove it.

The empty std::tuple is at least already explicitly specialized, so I
agree its overhead probably isn't very significant.

OK for trunk. I'm not sure if we should change it in gcc-13 now though.

>
> >
> >   PR libstdc++/111327
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/std/functional (_Bind_front0): Remove.
> >   (_Bind_front_t): Adjust.
> > ---
> >  libstdc++-v3/include/std/functional | 63 +
> >  1 file changed, 1 insertion(+), 62 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
> > index 60d4d1f3dd2..7d1b890bb4e 100644
> > --- a/libstdc++-v3/include/std/functional
> > +++ b/libstdc++-v3/include/std/functional
> > @@ -996,69 +996,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >[[no_unique_address]] std::tuple<_BoundArgs...> _M_bound_args;
> >  };
> >
> > -  // Avoid the overhead of an empty tuple<> if there are no bound args.
> > -  template
> > -struct _Bind_front0
> > -{
> > -  static_assert(is_move_constructible_v<_Fd>);
> > -
> > -  // First parameter is to ensure this constructor is never used
> > -  // instead of the copy/move constructor.
> > -  template
> > - explicit constexpr
> > - _Bind_front0(int, _Fn&& __fn)
> > - noexcept(is_nothrow_constructible_v<_Fd, _Fn>)
> > - : _M_fd(std::forward<_Fn>(__fn))
> > - { }
> > -
> > -  _Bind_front0(const _Bind_front0&) = default;
> > -  _Bind_front0(_Bind_front0&&) = default;
> > -  _Bind_front0& operator=(const _Bind_front0&) = default;
> > -  _Bind_front0& operator=(_Bind_front0&&) = default;
> > -  ~_Bind_front0() = default;
> > -
> > -  template
> > - constexpr
> > - invoke_result_t<_Fd&, _CallArgs...>
> > - operator()(_CallArgs&&... __call_args) &
> > - noexcept(is_nothrow_invocable_v<_Fd&, _CallArgs...>)
> > - { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
> > -
> > -  template
> > - constexpr
> > - invoke_result_t
> > - operator()(_CallArgs&&... __call_args) const &
> > - noexcept(is_nothrow_invocable_v)
> > - { return std::invoke(_M_fd, std::forward<_CallArgs>(__call_args)...); }
> > -
> > -  template
> > - constexpr
> > - invoke_result_t<_Fd, _CallArgs...>
> > - operator()(_CallArgs&&... __call_args) &&
> > - noexcept(is_nothrow_invocable_v<_Fd, _CallArgs...>)
> > - {
> > -   return std::invoke(std::move(_M_fd),
> > -  std::forward<_CallArgs>(__call_args)...);
> > - }
> > -
> > -  template
> > - constexpr
> > - invoke_result_t
> > - operator()(_CallArgs&&... __call_args) const &&
> > - noexcept(is_nothrow_invocable_v)
> > - {
> > -   return std::invoke(std::move(_M_fd),
> > -  std::forward<_CallArgs>(__call_args)...);
> > - }
> > -
> > -private:
> > -  [[no_unique_address]] _Fd _M_fd;
> > -};
> > -
> >template
> > -using _Bind_front_t
> > -  = __conditional_t>,
> > - _Bind_front, decay_t<_Args>...>>;
> > +using _Bind_front_t = _Bind_front, decay_t<_Args>...>;
> >
> >/** Create call wrapper by partial application of arguments to function.
> > *
> > --
> > 2.42.0.158.g94e83dcf5b
> >
> >
>



Re: [PATCH 2/3] libstdc++: Fix std::bind_front perfect forwarding [PR111327]

2023-09-12 Thread Jonathan Wakely via Gcc-patches
On Tue, 12 Sept 2023 at 02:09, Patrick Palka via Libstdc++
 wrote:
>
> In order to properly implement a perfect forwarding call wrapper
> (before 'deducing this' at least) we need a total of 8 operator()
> overloads, 4 main ones and 4 deleted ones for each const/ref qual pair,
> as described in section 5.5 of P0847R6.  Otherwise the wrapper may
> not perfectly forward according to the value category and constness
> of the wrapped object.  This patch fixes this bug in std::bind_front.

OK for trunk, thanks.

>
> PR libstdc++/111327
>
> libstdc++-v3/ChangeLog:
>
> * include/std/functional (_Bind_front::operator()): Add deleted
> fallback overloads for each const/ref qualifier pair.  Give the
> main overloads dummy constraints to make them more specialized
> than the deleted overloads.
> * testsuite/20_util/function_objects/bind_front/111327.cc: New test.
> ---
>  libstdc++-v3/include/std/functional   | 16 
>  .../function_objects/bind_front/111327.cc | 41 +++
>  2 files changed, 57 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
>
> diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
> index 7d1b890bb4e..c50b9e4d365 100644
> --- a/libstdc++-v3/include/std/functional
> +++ b/libstdc++-v3/include/std/functional
> @@ -938,6 +938,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>~_Bind_front() = default;
>
>template
> +   requires true
> constexpr
> invoke_result_t<_Fd&, _BoundArgs&..., _CallArgs...>
> operator()(_CallArgs&&... __call_args) &
> @@ -948,6 +949,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>template
> +   requires true
> constexpr
> invoke_result_t
> operator()(_CallArgs&&... __call_args) const &
> @@ -959,6 +961,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>template
> +   requires true
> constexpr
> invoke_result_t<_Fd, _BoundArgs..., _CallArgs...>
> operator()(_CallArgs&&... __call_args) &&
> @@ -969,6 +972,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>template
> +   requires true
> constexpr
> invoke_result_t
> operator()(_CallArgs&&... __call_args) const &&
> @@ -979,6 +983,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   std::forward<_CallArgs>(__call_args)...);
> }
>
> +  template
> +   void operator()(_CallArgs&&...) & = delete;
> +
> +  template
> +   void operator()(_CallArgs&&...) const & = delete;
> +
> +  template
> +   void operator()(_CallArgs&&...) && = delete;
> +
> +  template
> +   void operator()(_CallArgs&&...) const && = delete;
> +
>  private:
>using _BoundIndices = index_sequence_for<_BoundArgs...>;
>
> diff --git a/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
> new file mode 100644
> index 000..6eb51994476
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/function_objects/bind_front/111327.cc
> @@ -0,0 +1,41 @@
> +// PR libstdc++/111327 - std::bind_front doesn't perfectly forward according
> +// to value category of the call wrapper object
> +// { dg-options "-std=gnu++20" }
> +// { dg-do compile { target c++20 } }
> +
> +#include 
> +#include 
> +
> +struct F {
> +  void operator()(...) & = delete;
> +  void operator()(...) const &;
> +};
> +
> +struct G {
> +  void operator()(...) && = delete;
> +  void operator()(...) const &&;
> +};
> +
> +int main() {
> +  auto f0 = std::bind_front(F{});
> +  f0(); // { dg-error "deleted" }
> +  std::move(f0)();
> +  std::as_const(f0)();
> +  std::move(std::as_const(f0))();
> +
> +  auto g0 = std::bind_front(G{});
> +  g0(); // { dg-error "deleted" }
> +  std::move(g0)(); // { dg-error "deleted" }
> +  std::move(std::as_const(g0))();
> +
> +  auto f1 = std::bind_front(F{}, 42);
> +  f1(); // { dg-error "deleted" }
> +  std::move(f1)();
> +  std::as_const(f1)();
> +  std::move(std::as_const(f1))();
> +
> +  auto g1 = std::bind_front(G{}, 42);
> +  g1(); // { dg-error "deleted" }
> +  std::move(g1)(); // { dg-error "deleted" }
> +  std::move(std::as_const(g1))();
> +}
> --
> 2.42.0.158.g94e83dcf5b
>



Re: [PATCH 3/3] libstdc++: Fix std::not_fn perfect forwarding [PR111327]

2023-09-12 Thread Jonathan Wakely via Gcc-patches
On Tue, 12 Sept 2023 at 02:11, Patrick Palka via Libstdc++
 wrote:
>
> The previous patch fixed perfect forwarding for std::bind_front.
> This patch fixes the same issue for std::not_fn.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and
> perhaps 13?

Yes for both, thanks.

>
> PR libstdc++/111327
>
> libstdc++-v3/ChangeLog:
>
> * include/std/functional (_GLIBCXX_NOT_FN_CALL_OP): Also define
> a deleted fallback operator() overload.  Constrain both the
> main and deleted overloads accordingly.
> * testsuite/20_util/function_objects/not_fn/111327.cc: New test.
> ---
>  libstdc++-v3/include/std/functional   | 10 +--
>  .../20_util/function_objects/not_fn/111327.cc | 29 +++
>  2 files changed, 37 insertions(+), 2 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_objects/not_fn/111327.cc
>
> diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
> index c50b9e4d365..9551e38dfdb 100644
> --- a/libstdc++-v3/include/std/functional
> +++ b/libstdc++-v3/include/std/functional
> @@ -1061,7 +1061,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// forwarding _M_fn and the function arguments with the same qualifiers,
>// and deducing the return type and exception-specification.
>  #define _GLIBCXX_NOT_FN_CALL_OP( _QUALS )  \
> -  template  \
> +  template +  typename = enable_if_t<__is_invocable<_Fn _QUALS, 
> _Args...>::value>> \
> _GLIBCXX20_CONSTEXPR\
> decltype(_S_not<__inv_res_t<_Fn _QUALS, _Args...>>())   \
> operator()(_Args&&... __args) _QUALS\
> @@ -1070,7 +1071,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> {   \
>   return !std::__invoke(std::forward< _Fn _QUALS >(_M_fn),  \
> std::forward<_Args>(__args)...);\
> -   }
> +   }   \
> +   \
> +  template +  typename = enable_if_t _Args...>::value>> \
> +   void operator()(_Args&&... __args) _QUALS = delete;
> +
>_GLIBCXX_NOT_FN_CALL_OP( & )
>_GLIBCXX_NOT_FN_CALL_OP( const & )
>_GLIBCXX_NOT_FN_CALL_OP( && )
> diff --git a/libstdc++-v3/testsuite/20_util/function_objects/not_fn/111327.cc b/libstdc++-v3/testsuite/20_util/function_objects/not_fn/111327.cc
> new file mode 100644
> index 000..93e00ee8057
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/function_objects/not_fn/111327.cc
> @@ -0,0 +1,29 @@
> +// PR libstdc++/111327 - std::bind_front (and std::not_fn) doesn't perfectly
> +// forward according to value category of the call wrapper object
> +// { dg-do compile { target c++17 } }
> +
> +#include 
> +#include 
> +
> +struct F {
> +  void operator()(...) & = delete;
> +  bool operator()(...) const &;
> +};
> +
> +struct G {
> +  void operator()(...) && = delete;
> +  bool operator()(...) const &&;
> +};
> +
> +int main() {
> +  auto f = std::not_fn(F{});
> +  f(); // { dg-error "deleted" }
> +  std::move(f)();
> +  std::as_const(f)();
> +  std::move(std::as_const(f))();
> +
> +  auto g = std::not_fn(G{});
> +  g(); // { dg-error "deleted" }
> +  std::move(g)(); // { dg-error "deleted" }
> +  std::move(std::as_const(g))();
> +}
> --
> 2.42.0.158.g94e83dcf5b
>



[PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread Juzhe-Zhong
As reported in this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337

Add support for the VECTOR BOOL vcond_mask optab to fix the following ICE:
0x1a9e309 gimple_expand_vec_cond_expr
../../../../gcc/gcc/gimple-isel.cc:283
0x1a9ea56 execute
../../../../gcc/gcc/gimple-isel.cc:390

gcc/ChangeLog:

* config/riscv/autovec.md (@vcond_mask_): New pattern.

---
 gcc/config/riscv/autovec.md | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index e9dd40af935..45a70f16ee1 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,40 @@
   [(set_attr "type" "vector")]
 )
 
+;; -
+;;  [BOOL] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmand.mm
+;; - vmor.mm
+;; - vmnot.m
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:VB 0 "register_operand")
+   (match_operand:VB 1 "register_operand")
+   (match_operand:VB 2 "register_operand")
+   (match_operand:VB 3 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* mask1 = operands[3] & operands[1].  */
+rtx mask1 = expand_binop (mode, and_optab, operands[1],
+ operands[3], NULL_RTX, 0,
+ OPTAB_DIRECT);
+/* mask2 = ~operands[3] & operands[2].  */
+rtx inverse = expand_unop (mode, one_cmpl_optab, operands[3],
+  NULL_RTX, 0);
+rtx mask2 = expand_binop (mode, and_optab, operands[2],
+ inverse, NULL_RTX, 0,
+ OPTAB_DIRECT);
+/* result = mask1 | mask2.  */
+rtx result = expand_binop (mode, ior_optab, mask1,
+  mask2, NULL_RTX, 0,
+  OPTAB_DIRECT);
+emit_move_insn (operands[0], result);
+DONE;
+  })
+
 ;; -
 ;;  [INT,FP] Comparisons
 ;; -
-- 
2.36.3
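The expander builds the select entirely out of mask logic. A sketch of the same computation on scalars, assuming (for illustration only) that each bit of a `std::uint64_t` models one lane of a `VB` mask register: lanes where operand 3 is set take operand 1, and the remaining lanes take operand 2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: one bit per lane of a mask register.
using MaskBits = std::uint64_t;

// result = (op1 & op3) | (op2 & ~op3), mirroring the three optab calls.
constexpr MaskBits select_mask(MaskBits op1, MaskBits op2, MaskBits op3) {
  MaskBits mask1 = op1 & op3;      // and_optab     (vmand.mm)
  MaskBits inverse = ~op3;         // one_cmpl_optab (vmnot.m)
  MaskBits mask2 = op2 & inverse;  // and_optab     (vmand.mm)
  return mask1 | mask2;            // ior_optab     (vmor.mm)
}
```

For example, `select_mask(0b1100, 0b1010, 0b0110)` takes lanes 1 and 2 from the first operand and lanes 0 and 3 from the second, giving `0b1100`.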



RE: [PATCH v1] RISC-V: Remove unused structure in cost model

2023-09-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, September 12, 2023 9:12 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: Wang, Yanzhang ; kito.ch...@gmail.com; 
juzhe.zh...@rivai.ai
Subject: Re: [PATCH v1] RISC-V: Remove unused structure in cost model



On 9/12/23 07:02, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> The struct range is unused, remove it.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-vector-costs.h (struct range): Removed.
OK
jeff


Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread Robin Dapp via Gcc-patches
Maybe you want to add PR target/111337 to the changelog?

The rest LGTM.

Regards
 Robin


[PATCH V2] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread Juzhe-Zhong
   PR target/111337

As reported in this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337

Support VECTOR BOOL vcond_mask to fix the following ICE:
0x1a9e309 gimple_expand_vec_cond_expr
../../../../gcc/gcc/gimple-isel.cc:283
0x1a9ea56 execute
../../../../gcc/gcc/gimple-isel.cc:390

gcc/ChangeLog:

* config/riscv/autovec.md (vcond_mask_): New pattern.

---
 gcc/config/riscv/autovec.md | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index e9dd40af935..50c0104550b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,40 @@
   [(set_attr "type" "vector")]
 )
 
+;; -
+;;  [BOOL] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmand.mm
+;; - vmor.mm
+;; - vmnot.m
+;; -
+
+(define_expand "vcond_mask_"
+  [(match_operand:VB 0 "register_operand")
+   (match_operand:VB 1 "register_operand")
+   (match_operand:VB 2 "register_operand")
+   (match_operand:VB 3 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* mask1 = operands[3] & operands[1].  */
+rtx mask1 = expand_binop (mode, and_optab, operands[1],
+ operands[3], NULL_RTX, 0,
+ OPTAB_DIRECT);
+/* mask2 = ~operands[3] & operands[2].  */
+rtx inverse = expand_unop (mode, one_cmpl_optab, operands[3],
+  NULL_RTX, 0);
+rtx mask2 = expand_binop (mode, and_optab, operands[2],
+ inverse, NULL_RTX, 0,
+ OPTAB_DIRECT);
+/* result = mask1 | mask2.  */
+rtx result = expand_binop (mode, ior_optab, mask1,
+  mask2, NULL_RTX, 0,
+  OPTAB_DIRECT);
+emit_move_insn (operands[0], result);
+DONE;
+  })
+
 ;; -
 ;;  [INT,FP] Comparisons
 ;; -
-- 
2.36.3



Re: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread juzhe.zh...@rivai.ai
OK, added it in V2:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630048.html 



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 21:29
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]
Maybe you want to add PR target/111337 to the changelog?
 
The rest LGTM.
 
Regards
Robin
 


Re: [PATCH V2] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread Robin Dapp via Gcc-patches
The PR thing needs to be moved but I can commit it.

Regards
 Robin



[PATCH 14/13] libstdc++: Re-initialize static data files used by tests

2023-09-12 Thread Jonathan Wakely via Gcc-patches
This fixes the problem observed with some filebuf tests.

Using the "@require@" string seems a bit hacky, as I don't know why that
string is in the tests in the first place ... but it is there, so this
works.

-- >8 --

Some tests rely on text files with specific content being present in the
test directory. Because the tests modify those files, running the same
test more than once in the same directory will FAIL because the content
of the file is not in the expected state.

This uses a "@require@" marker that happens to be present in those tests
to decide when we need to copy the original files into the test dir
again, so that repeated tests always see the initial file content.
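The refresh rule is small enough to model directly. The sketch below is a hypothetical C++ rendering of the `v3-dg-runtest` change (the real code is Tcl): the data files are re-copied after each run only when the test source contains `@require@` and the test runs under more than one option set. `run_variants` and its callbacks are illustrative names, not part of the testsuite.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of the Tcl loop; returns how many refreshes happened.
int run_variants(const std::string& test_source,
                 const std::vector<std::string>& option_list,
                 const std::function<void(const std::string&)>& run_one,
                 const std::function<void()>& init_data_files) {
  // Mirrors: if [search_for $test "@require@"] { set need_fresh [llength ...] }
  bool uses_data_files = test_source.find("@require@") != std::string::npos;
  std::size_t need_fresh = uses_data_files ? option_list.size() : 0;

  int refreshes = 0;
  for (const std::string& flags : option_list) {
    run_one(flags);        // dg-test with this option set
    if (need_fresh > 1) {  // only repeated runs need fresh copies
      init_data_files();
      ++refreshes;
    }
  }
  return refreshes;
}
```

A test with the marker and two option sets triggers two refreshes; a test without the marker, or with a single option set, triggers none.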

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp (v3-init-data-files): New proc.
(libstdc++_init): Use v3-init-data-files.
(v3-dg-runtest): Use v3-init-data-files to update test data
files for repeated tests.
---
 libstdc++-v3/testsuite/lib/libstdc++.exp | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 2c497707184..daace4c1d59 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -102,6 +102,12 @@ proc v3-copy-files {srcfiles} {
 }
 }
 
+proc v3-init-data-files { } {
+global srcdir
+v3-copy-files [glob -nocomplain "$srcdir/data/*.tst"]
+v3-copy-files [glob -nocomplain "$srcdir/data/*.txt"]
+}
+
 # Called once, during runtest.exp setup.
 proc libstdc++_init { testfile } {
 global env
@@ -159,8 +165,7 @@ proc libstdc++_init { testfile } {
 set dg-do-what-default run
 
 # Copy any required data files.
-v3-copy-files [glob -nocomplain "$srcdir/data/*.tst"]
-v3-copy-files [glob -nocomplain "$srcdir/data/*.txt"]
+v3-init-data-files
 
 set ld_library_path_tmp ""
 
@@ -556,11 +561,26 @@ proc v3-dg-runtest { testcases flags default-extra-flags } {
set option_list { "" }
}
 
+   # Some tests (e.g. 27_io/basic_filebuf/seek{off,pos}/char/[12]-io.cc)
+   # rely on text files with specific data being present in the test dir.
+   # Because the tests modify those files, running the same test a second
+   # time will FAIL due to the files not being in their initial state.
+   # We rely on the fact that those files contain a "@require@" comment
+   # to trigger creating fresh copies of the files for repeated tests.
+   if [search_for $test "@require@"] {
+   set need_fresh_data_files [llength $option_list]
+   } else {
+   set need_fresh_data_files 0
+   }
+
set nshort [file tail [file dirname $test]]/[file tail $test]
 
foreach flags_t $option_list {
verbose "Testing $nshort, $flags $flags_t" 1
dg-test $test "$flags $flags_t" ${default-extra-flags}
+   if { $need_fresh_data_files > 1 } {
+   v3-init-data-files
+   }
}
 }
 }
-- 
2.41.0



[PATCH] libgomp, nvptx, amdgcn: parallel reverse offload

2023-09-12 Thread Andrew Stubbs

Hi all,

This patch implements parallel execution of OpenMP reverse offload kernels.

The first problem was that GPU device kernels may request reverse 
offload (via the "ancestor" clause) once for each running offload thread 
-- of which there may be thousands -- and the existing implementation 
ran each request serially, whilst blocking all other I/O from that 
device kernel.


The second problem was that the NVPTX plugin runs the reverse offload 
kernel in the context of whichever host thread sees the request first, 
regardless of which kernel originated the request. This is probably 
logically harmless, but may lead to surprising timing when it blocks the 
wrong kernel from exiting until the reverse offload is done. It was also 
only capable of receiving and processing a single request at a time, 
across all running kernels. (GCN did not have these problems.)


Both problems are now solved by making the reverse offload requests 
asynchronous. The host threads still receive the requests in the same 
way, but instead of being run inline, each request is queued for later 
execution in another thread. The requests are then consumed from the 
message passing buffer immediately (allowing I/O to continue, in the 
case of GCN). The device threads that sent requests are still blocked 
waiting for the completion signal, but any other threads may continue as 
usual.


The queued requests are processed by a thread pool that is created on 
demand and limited in size by a new environment variable, 
GOMP_REVERSE_OFFLOAD_THREADS. This should make reverse offload much 
less of a bottleneck.


In the process of this work I have found and fixed a couple of 
target-specific issues. NVPTX asynchronous streams were independent of 
each other, but still synchronous w.r.t. the default NULL stream. Some 
GCN devices (at least gfx908) seem to have a race condition in the 
message passing system whereby the cache write-back triggered by 
__ATOMIC_RELEASE completes later than the atomically written value 
becomes visible.
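The host-side consumption described above relies on a publish/consume protocol: the producer fills a slot and then publishes it with a release store, and the consumer observes it with an acquire load. A minimal single-producer/single-consumer ring sketch of that protocol in C++ follows; `RevRequest`, `RequestRing` and the slot layout are hypothetical and do not reflect libgomp's `struct rev_offload` or `REV_OFFLOAD_QUEUE_SIZE`. The gfx908 write-back race is a hardware issue that a memory-model sketch like this cannot reproduce.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical request record; libgomp's real layout differs.
struct RevRequest {
  std::uint64_t fn;      // illustrative: host function to run
  std::uint64_t signal;  // illustrative: where to report completion
};

// Single-producer/single-consumer ring; N must be a power of two.
template <std::size_t N>
class RequestRing {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  std::array<RevRequest, N> slots_{};
  std::atomic<std::size_t> head_{0};  // next slot the producer writes
  std::atomic<std::size_t> tail_{0};  // next slot the consumer reads

 public:
  // Producer (device side): returns false when the ring is full.
  bool push(const RevRequest& req) {
    std::size_t h = head_.load(std::memory_order_relaxed);
    if (h - tail_.load(std::memory_order_acquire) == N)
      return false;
    slots_[h & (N - 1)] = req;
    // Release: the slot contents must be visible before the new head.
    head_.store(h + 1, std::memory_order_release);
    return true;
  }

  // Consumer (host side): copies the request out at once, freeing the slot.
  std::optional<RevRequest> pop() {
    std::size_t t = tail_.load(std::memory_order_relaxed);
    if (t == head_.load(std::memory_order_acquire))
      return std::nullopt;
    RevRequest req = slots_[t & (N - 1)];
    tail_.store(t + 1, std::memory_order_release);
    return req;
  }
};
```

In the design the cover letter describes, the host would then hand each popped request to a worker thread (bounded by GOMP_REVERSE_OFFLOAD_THREADS) rather than running it inline, so the ring drains immediately.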


OK for mainline?

Andrew

libgomp: parallel reverse offload

Extend OpenMP reverse offload support to allow running the host kernels
on multiple threads.  The device plugin API for reverse offload is now made
non-blocking, meaning that running the host kernel in the wrong device
context is no longer a problem.  The NVPTX message passing interface now
uses a ring buffer approximately matching GCN.

include/ChangeLog:

* gomp-constants.h (GOMP_VERSION): Bump.

libgomp/ChangeLog:

* config/gcn/target.c (GOMP_target_ext): Add "signal" field.
Fix atomics race condition.
* config/nvptx/libgomp-nvptx.h (REV_OFFLOAD_QUEUE_SIZE): New define.
(struct rev_offload): Implement ring buffer.
* config/nvptx/target.c (GOMP_target_ext): Likewise.
* env.c (initialize_env): Read GOMP_REVERSE_OFFLOAD_THREADS.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Replace "aq" parameter
with "signal" and "use_aq".
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Likewise.
* libgomp.h (gomp_target_rev): Likewise.
* plugin/plugin-gcn.c (process_reverse_offload): Add "signal".
(console_output): Pass signal value through.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct):
Attach new threads to the numbered device.
Change the flag to CU_STREAM_NON_BLOCKING.
(GOMP_OFFLOAD_run): Implement ring-buffer and remove signalling.
* target.c (gomp_target_rev): Rename to ...
(gomp_target_rev_internal): ... this, and change "dev_num" to
"devicep".
(gomp_target_rev_worker_thread): New function.
(gomp_target_rev): New function (old name).
* libgomp.texi: Document GOMP_REVERSE_OFFLOAD_THREADS.
* testsuite/libgomp.c/reverse-offload-threads-1.c: New test.
* testsuite/libgomp.c/reverse-offload-threads-2.c: New test.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 8d4e8e81303..7ce07508e9d 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -314,7 +314,7 @@ enum gomp_map_kind
 /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
should be incremented whenever an ABI-incompatible change is introduced
to the plugin interface defined in libgomp/libgomp.h.  */
-#define GOMP_VERSION   2
+#define GOMP_VERSION   3
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_GCN 3
 
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index ea5eb1ff5ed..906b04ca41e 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -103,19 +103,38 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   <= (index - 1024))
   asm ("s_sleep 64");
 
+  /* In theory, it should be enough to write "written" with __ATOMIC_RELEASE,
+ and have the rest of the data flushed to memory automatically, but some
+ devices (gfx908) seem to have a race condition where the flushed data
+ a

gcc-patches From rewriting mailman settings (Was: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions)

2023-09-12 Thread Mark Wielaard
Hi Maxim,

Adding Jeff to CC who is the official gcc-patches mailinglist admin.

On Tue, 2023-09-12 at 11:08 +0400, Maxim Kuvyrkov wrote:
> Normally, notifications from Linaro TCWG precommit CI are sent only to
> patch author and patch submitter.  In this case the sender was rewritten
> to "Benjamin Priour via Gcc-patches ",
> which was detected by Patchwork [1] as patch submitter.

BTW, really looking forward to your talk at Cauldron about this!

> Is "From:" re-write on gcc-patches@ mailing list a side-effect of [2]?
> I see that some, but not all messages to gcc-patches@ have their
> "From:" re-written.
> 
> Also, do you know if re-write of "From:" on gcc-patches@ is expected?

Yes, it is expected for emails that come from domains with a dmarc
policy. That is because the current settings of the gcc-patches
mailinglist might slightly alter the message or headers in a way that
invalidates the DKIM signature. Without From rewriting those messages
would be bounced by recipients that check the dmarc policy/dkim
signature.

As you noticed, the glibc hackers recently worked together with the
sourceware overseers to upgrade mailman and to adjust the postfix and
libc-alpha mailing list settings so that From rewriting is no longer
required (messages and headers are no longer altered in ways that
invalidate the DKIM signatures).

We (Jeff or anyone else with mailman admin privs) could use the same
settings for gcc-patches. The settings that need to be set are in that
bug:

- subject_prefix (general): (empty)
- from_is_list (general): No
- anonymous_list (general): No
- first_strip_reply_to (general): No
- reply_goes_to_list (general): Poster
- reply_to_address (general): (empty)
- include_sender_header (general): No
- drop_cc (general): No
- msg_header (nondigest): (empty)
- msg_footer (nondigest): (empty)
- scrub_nondigest (nondigest): No
- dmarc_moderation_action (privacy): Accept
- filter_content (contentfilter): No

The only visible change (apart from no more From rewriting) is that
HTML multi-parts aren't scrubbed anymore (that would be a message
altering issue). The html part is still scrubbed from the
inbox.sourceware.org archive, so b4 works just fine. But I don't know
what patchwork.sourceware.org does with HTML attachments. Of course
people really shouldn't send HTML attachments to gcc-patches, so maybe
this is not a real problem.

Let me know if you want Jeff (or me or one of the other overseers) make
the above changes to the gcc-patches mailman settings.

Cheers,

Mark

> [1] https://patchwork.sourceware.org/project/gcc/list/
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=29713



[PATCH 00/19] aarch64: Fix -fstack-protector issue

2023-09-12 Thread Richard Sandiford via Gcc-patches
This series of patches fixes deficiencies in GCC's -fstack-protector
implementation for AArch64 when using dynamically allocated stack space.
This is CVE-2023-4039.  See:

https://developer.arm.com/Arm%20Security%20Center/GCC%20Stack%20Protector%20Vulnerability%20AArch64
https://github.com/metaredteam/external-disclosures/security/advisories/GHSA-x7ch-h5rf-w2mf

for more details.

The fix is to put the saved registers above the locals area when
-fstack-protector is used.

The series also fixes a stack-clash problem that I found while working
on the CVE.  In unpatched sources, the stack-clash problem would only
trigger for unrealistic numbers of arguments (8K 64-bit arguments, or an
equivalent).  But it would be a more significant issue with the new
-fstack-protector frame layout.  It's therefore important that both
problems are fixed together.

Some reorganisation of the code seemed necessary to fix the problems in a
cleanish way.  The series is therefore quite long, but only a handful of
patches should have any effect on code generation.

See the individual patches for a detailed description.

Tested on aarch64-linux-gnu. Pushed to trunk and to all active branches.
I've also pushed backports to GCC 7+ to vendors/ARM/heads/CVE-2023-4039.

Richard Sandiford (19):
  aarch64: Use local frame vars in shrink-wrapping code
  aarch64: Avoid a use of callee_offset
  aarch64: Explicitly handle frames with no saved registers
  aarch64: Add bytes_below_saved_regs to frame info
  aarch64: Add bytes_below_hard_fp to frame info
  aarch64: Tweak aarch64_save/restore_callee_saves
  aarch64: Only calculate chain_offset if there is a chain
  aarch64: Rename locals_offset to bytes_above_locals
  aarch64: Rename hard_fp_offset to bytes_above_hard_fp
  aarch64: Tweak frame_size comment
  aarch64: Measure reg_offset from the bottom of the frame
  aarch64: Simplify top of frame allocation
  aarch64: Minor initial adjustment tweak
  aarch64: Tweak stack clash boundary condition
  aarch64: Put LR save probe in first 16 bytes
  aarch64: Simplify probe of final frame allocation
  aarch64: Explicitly record probe registers in frame info
  aarch64: Remove below_hard_fp_saved_regs_size
  aarch64: Make stack smash canary protect saved registers

 gcc/config/aarch64/aarch64.cc | 518 ++
 gcc/config/aarch64/aarch64.h  |  44 +-
 .../aarch64/stack-check-prologue-17.c |  55 ++
 .../aarch64/stack-check-prologue-18.c | 100 
 .../aarch64/stack-check-prologue-19.c | 100 
 .../aarch64/stack-check-prologue-20.c |   3 +
 .../gcc.target/aarch64/stack-protector-8.c|  95 
 .../gcc.target/aarch64/stack-protector-9.c|  33 ++
 .../aarch64/sve/pcs/stack_clash_3.c   |   6 +-
 9 files changed, 699 insertions(+), 255 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-check-prologue-17.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-check-prologue-18.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-check-prologue-19.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-check-prologue-20.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-protector-8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-protector-9.c

-- 
2.25.1



[PATCH 01/19] aarch64: Use local frame vars in shrink-wrapping code

2023-09-12 Thread Richard Sandiford via Gcc-patches
aarch64_layout_frame uses a shorthand for referring to
cfun->machine->frame:

  aarch64_frame &frame = cfun->machine->frame;

This patch does the same for some other heavy users of the structure.
No functional change intended.

gcc/
* config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
a local shorthand for cfun->machine->frame.
(aarch64_restore_callee_saves, aarch64_get_separate_components):
(aarch64_process_components): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_prologue, aarch64_expand_epilogue): Likewise.
(aarch64_layout_frame): Use existing shorthand for one more case.
---
 gcc/config/aarch64/aarch64.cc | 123 ++
 1 file changed, 64 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37d414021ca..b91f77d7b1f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8651,7 +8651,7 @@ aarch64_layout_frame (void)
   frame.is_scs_enabled
 = (!crtl->calls_eh_return
&& sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
-   && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
+   && known_ge (frame.reg_offset[LR_REGNUM], 0));
 
   /* When shadow call stack is enabled, the scs_pop in the epilogue will
  restore x30, and we don't need to pop x30 again in the traditional
@@ -9117,6 +9117,7 @@ aarch64_save_callee_saves (poly_int64 start_offset,
   unsigned start, unsigned limit, bool skip_wb,
   bool hard_fp_valid_p)
 {
+  aarch64_frame &frame = cfun->machine->frame;
   rtx_insn *insn;
   unsigned regno;
   unsigned regno2;
@@ -9131,8 +9132,8 @@ aarch64_save_callee_saves (poly_int64 start_offset,
   bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
 
   if (skip_wb
- && (regno == cfun->machine->frame.wb_push_candidate1
- || regno == cfun->machine->frame.wb_push_candidate2))
+ && (regno == frame.wb_push_candidate1
+ || regno == frame.wb_push_candidate2))
continue;
 
   if (cfun->machine->reg_is_wrapped_separately[regno])
@@ -9140,7 +9141,7 @@ aarch64_save_callee_saves (poly_int64 start_offset,
 
   machine_mode mode = aarch64_reg_save_mode (regno);
   reg = gen_rtx_REG (mode, regno);
-  offset = start_offset + cfun->machine->frame.reg_offset[regno];
+  offset = start_offset + frame.reg_offset[regno];
   rtx base_rtx = stack_pointer_rtx;
   poly_int64 sp_offset = offset;
 
@@ -9153,7 +9154,7 @@ aarch64_save_callee_saves (poly_int64 start_offset,
{
  gcc_assert (known_eq (start_offset, 0));
  poly_int64 fp_offset
-   = cfun->machine->frame.below_hard_fp_saved_regs_size;
+   = frame.below_hard_fp_saved_regs_size;
  if (hard_fp_valid_p)
base_rtx = hard_frame_pointer_rtx;
  else
@@ -9175,8 +9176,7 @@ aarch64_save_callee_saves (poly_int64 start_offset,
  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
  && !cfun->machine->reg_is_wrapped_separately[regno2]
  && known_eq (GET_MODE_SIZE (mode),
-  cfun->machine->frame.reg_offset[regno2]
-  - cfun->machine->frame.reg_offset[regno]))
+  frame.reg_offset[regno2] - frame.reg_offset[regno]))
{
  rtx reg2 = gen_rtx_REG (mode, regno2);
  rtx mem2;
@@ -9226,6 +9226,7 @@ static void
 aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
  unsigned limit, bool skip_wb, rtx *cfi_ops)
 {
+  aarch64_frame &frame = cfun->machine->frame;
   unsigned regno;
   unsigned regno2;
   poly_int64 offset;
@@ -9242,13 +9243,13 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
   rtx reg, mem;
 
   if (skip_wb
- && (regno == cfun->machine->frame.wb_pop_candidate1
- || regno == cfun->machine->frame.wb_pop_candidate2))
+ && (regno == frame.wb_pop_candidate1
+ || regno == frame.wb_pop_candidate2))
continue;
 
   machine_mode mode = aarch64_reg_save_mode (regno);
   reg = gen_rtx_REG (mode, regno);
-  offset = start_offset + cfun->machine->frame.reg_offset[regno];
+  offset = start_offset + frame.reg_offset[regno];
   rtx base_rtx = stack_pointer_rtx;
   if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
@@ -9259,8 +9260,7 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
  && !cfun->machine->reg_is_wrapped_separately[regno2]
  && known_eq (GET_MODE_SIZE (mode),
-  cfun->machine->frame.reg_offset[regno2]
-  - cfun->machine->frame.reg_offset[regno]))
+   

[PATCH 05/19] aarch64: Add bytes_below_hard_fp to frame info

2023-09-12 Thread Richard Sandiford via Gcc-patches
Following on from the previous bytes_below_saved_regs patch, this one
records the number of bytes that are below the hard frame pointer.
This eventually replaces below_hard_fp_saved_regs_size.

If a frame pointer is not needed, the epilogue adds final_adjust
to the stack pointer before restoring registers:

 aarch64_add_sp (tmp1_rtx, tmp0_rtx, final_adjust, true);

Therefore, if the epilogue needs to restore the stack pointer from
the hard frame pointer, the directly corresponding offset is:

 -bytes_below_hard_fp + final_adjust

i.e. go from the hard frame pointer to the bottom of the frame,
then add the same amount as if we were using the stack pointer
from the outset.
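A quick numeric check of that offset, with plain integers standing in for `poly_int64` and hypothetical frame sizes. It assumes, as a premise of the sketch rather than something stated in the patch, that the old `callee_offset` equals `bytes_below_saved_regs - final_adjust`; under that assumption the old expression `-callee_offset - below_hard_fp_saved_regs_size` and the new `-bytes_below_hard_fp + final_adjust` agree.

```cpp
#include <cassert>

// Plain integers stand in for poly_int64; every value below is hypothetical.
struct FrameSketch {
  long bytes_below_saved_regs;         // outgoing arguments (plus padding)
  long below_hard_fp_saved_regs_size;  // saves below the hard FP (e.g. SVE)
  long final_adjust;                   // last SP allocation in the prologue
  long callee_offset;                  // assumed: bytes_below_saved_regs - final_adjust

  constexpr long bytes_below_hard_fp() const {
    return below_hard_fp_saved_regs_size + bytes_below_saved_regs;
  }
  // Offset added to the hard FP to recover SP: old formulation.
  constexpr long old_sp_offset() const {
    return -callee_offset - below_hard_fp_saved_regs_size;
  }
  // Offset added to the hard FP to recover SP: new formulation.
  constexpr long new_sp_offset() const {
    return -bytes_below_hard_fp() + final_adjust;
  }
};

// Whole frame allocated up front, so final_adjust == 0.
constexpr FrameSketch one_shot{32, 48, 0, 32};
// Outgoing arguments allocated last by final_adjust.
constexpr FrameSketch split{32, 48, 32, 0};
```

For both sample frames the two expressions agree: -80 bytes when the whole frame is allocated up front, and -48 bytes when the outgoing arguments are allocated last.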

gcc/
* config/aarch64/aarch64.h (aarch64_frame::bytes_below_hard_fp): New
field.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize it.
(aarch64_expand_epilogue): Use it instead of
below_hard_fp_saved_regs_size.
---
 gcc/config/aarch64/aarch64.cc | 6 +++---
 gcc/config/aarch64/aarch64.h  | 5 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 49c2fbedd14..58dd8946232 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8569,6 +8569,7 @@ aarch64_layout_frame (void)
  of the callee save area.  */
   bool saves_below_hard_fp_p = maybe_ne (offset, 0);
   frame.below_hard_fp_saved_regs_size = offset;
+  frame.bytes_below_hard_fp = offset + frame.bytes_below_saved_regs;
   if (frame.emit_frame_chain)
 {
   /* FP and LR are placed in the linkage record.  */
@@ -10220,8 +10221,7 @@ aarch64_expand_epilogue (bool for_sibcall)
   poly_int64 final_adjust = frame.final_adjust;
   poly_int64 callee_offset = frame.callee_offset;
   poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
-  poly_int64 below_hard_fp_saved_regs_size
-= frame.below_hard_fp_saved_regs_size;
+  poly_int64 bytes_below_hard_fp = frame.bytes_below_hard_fp;
   unsigned reg1 = frame.wb_pop_candidate1;
   unsigned reg2 = frame.wb_pop_candidate2;
   unsigned int last_gpr = (frame.is_scs_enabled
@@ -10279,7 +10279,7 @@ aarch64_expand_epilogue (bool for_sibcall)
is restored on the instruction doing the writeback.  */
 aarch64_add_offset (Pmode, stack_pointer_rtx,
hard_frame_pointer_rtx,
-   -callee_offset - below_hard_fp_saved_regs_size,
+   -bytes_below_hard_fp + final_adjust,
tmp1_rtx, tmp0_rtx, callee_adjust == 0);
   else
  /* The case where we need to re-use the register here is very rare, so
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 83939991eb1..75fd3b59b0d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -785,6 +785,11 @@ struct GTY (()) aarch64_frame
  are saved below the hard frame pointer.  */
   poly_int64 below_hard_fp_saved_regs_size;
 
+  /* The number of bytes between the bottom of the static frame (the bottom
+ of the outgoing arguments) and the hard frame pointer.  This value is
+ always a multiple of STACK_BOUNDARY.  */
+  poly_int64 bytes_below_hard_fp;
+
   /* Offset from the base of the frame (incomming SP) to the
  top of the locals area.  This value is always a multiple of
  STACK_BOUNDARY.  */
-- 
2.25.1



[PATCH 07/19] aarch64: Only calculate chain_offset if there is a chain

2023-09-12 Thread Richard Sandiford via Gcc-patches
After previous patches, it is no longer necessary to calculate
a chain_offset in cases where there is no chain record.

gcc/
* config/aarch64/aarch64.cc (aarch64_expand_prologue): Move the
calculation of chain_offset into the emit_frame_chain block.
---
 gcc/config/aarch64/aarch64.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2c218c90906..25b5fb243a6 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -10111,16 +10111,16 @@ aarch64_expand_prologue (void)
   if (callee_adjust != 0)
 aarch64_push_regs (reg1, reg2, callee_adjust);
 
-  /* The offset of the frame chain record (if any) from the current SP.  */
-  poly_int64 chain_offset = (initial_adjust + callee_adjust
-- frame.hard_fp_offset);
-  gcc_assert (known_ge (chain_offset, 0));
-
   /* The offset of the current SP from the bottom of the static frame.  */
   poly_int64 bytes_below_sp = frame_size - initial_adjust - callee_adjust;
 
   if (emit_frame_chain)
 {
+  /* The offset of the frame chain record (if any) from the current SP.  */
+  poly_int64 chain_offset = (initial_adjust + callee_adjust
+- frame.hard_fp_offset);
+  gcc_assert (known_ge (chain_offset, 0));
+
   if (callee_adjust == 0)
{
  reg1 = R29_REGNUM;
-- 
2.25.1



[PATCH 12/19] aarch64: Simplify top of frame allocation

2023-09-12 Thread Richard Sandiford via Gcc-patches
After previous patches, it no longer really makes sense to allocate
the top of the frame in terms of varargs_and_saved_regs_size and
saved_regs_and_above.
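The old and new calculations can be compared on sample numbers. In this hypothetical sketch, plain integers replace `poly_int64`, `kStackAlign` is assumed to be 16 (`STACK_BOUNDARY / BITS_PER_UNIT` on AArch64), and the new path starts from `offset = bytes_below_saved_regs + saved_regs_size`, which follows from the line `frame.saved_regs_size = offset - frame.bytes_below_saved_regs;` in the patch.

```cpp
#include <cassert>

constexpr long kStackAlign = 16;  // assumed STACK_BOUNDARY / BITS_PER_UNIT

constexpr long align_up(long x, long a) { return (x + a - 1) / a * a; }

// Hypothetical inputs to aarch64_layout_frame.
struct LayoutInputs {
  long bytes_below_saved_regs;
  long saved_regs_size;
  long below_hard_fp_saved_regs_size;
  long locals;              // get_frame_size ()
  long saved_varargs_size;
};

struct TopOfFrame {
  long frame_size;
  long bytes_above_hard_fp;
  long bytes_above_locals;
};

// The removed calculation.
constexpr TopOfFrame old_layout(LayoutInputs in) {
  long varargs_and_saved = in.saved_regs_size + in.saved_varargs_size;
  long saved_regs_and_above =
      align_up(varargs_and_saved + in.locals, kStackAlign);
  return {saved_regs_and_above + in.bytes_below_saved_regs,
          saved_regs_and_above - in.below_hard_fp_saved_regs_size,
          in.saved_varargs_size};
}

// The added calculation: keep bumping a single offset upwards.
constexpr TopOfFrame new_layout(LayoutInputs in) {
  long offset = in.bytes_below_saved_regs + in.saved_regs_size;
  offset = align_up(offset + in.locals, kStackAlign);
  long top_of_locals = offset;
  offset += in.saved_varargs_size;
  long bytes_below_hard_fp =
      in.below_hard_fp_saved_regs_size + in.bytes_below_saved_regs;
  return {offset, offset - bytes_below_hard_fp, offset - top_of_locals};
}

constexpr LayoutInputs example{32, 48, 16, 20, 64};
```

With the sample inputs both versions give a 176-byte frame, 128 bytes above the hard frame pointer and 64 bytes above the locals. The two agree in general only when `bytes_below_saved_regs` and `saved_varargs_size` are already multiples of the stack alignment, which the old assertion and the new `gcc_assert (multiple_p (offset, ...))` rely on.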

gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Simplify
the allocation of the top of the frame.
---
 gcc/config/aarch64/aarch64.cc | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ca2e6af5d12..9578592d256 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8625,23 +8625,16 @@ aarch64_layout_frame (void)
 
   frame.saved_regs_size = offset - frame.bytes_below_saved_regs;
 
-  poly_int64 varargs_and_saved_regs_size
-= frame.saved_regs_size + frame.saved_varargs_size;
-
-  poly_int64 saved_regs_and_above
-= aligned_upper_bound (varargs_and_saved_regs_size
-  + get_frame_size (),
-  STACK_BOUNDARY / BITS_PER_UNIT);
-
-  frame.bytes_above_hard_fp
-= saved_regs_and_above - frame.below_hard_fp_saved_regs_size;
+  offset += get_frame_size ();
+  offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+  auto top_of_locals = offset;
 
-  /* Both these values are already aligned.  */
-  gcc_assert (multiple_p (frame.bytes_below_saved_regs,
- STACK_BOUNDARY / BITS_PER_UNIT));
-  frame.frame_size = saved_regs_and_above + frame.bytes_below_saved_regs;
+  offset += frame.saved_varargs_size;
+  gcc_assert (multiple_p (offset, STACK_BOUNDARY / BITS_PER_UNIT));
+  frame.frame_size = offset;
 
-  frame.bytes_above_locals = frame.saved_varargs_size;
+  frame.bytes_above_hard_fp = frame.frame_size - frame.bytes_below_hard_fp;
+  frame.bytes_above_locals = frame.frame_size - top_of_locals;
 
   frame.initial_adjust = 0;
   frame.final_adjust = 0;
-- 
2.25.1



[PATCH 02/19] aarch64: Avoid a use of callee_offset

2023-09-12 Thread Richard Sandiford via Gcc-patches
When we emit the frame chain, i.e. when we reach Here in this statement
of aarch64_expand_prologue:

  if (emit_frame_chain)
{
  // Here
  ...
}

the stack is in one of two states:

- We've allocated up to the frame chain, but no more.

- We've allocated the whole frame, and the frame chain is within easy
  reach of the new SP.

The offset of the frame chain from the current SP is available
in aarch64_frame as callee_offset.  It is also available as the
chain_offset local variable, where the latter is calculated from other
data.  (However, chain_offset is not always equal to callee_offset when
!emit_frame_chain, so chain_offset isn't redundant.)

In c600df9a4060da3c6121ff4d0b93f179eafd69d1 I switched to using
chain_offset for the initialisation of the hard frame pointer:

   aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
- stack_pointer_rtx, callee_offset,
+ stack_pointer_rtx, chain_offset,
  tmp1_rtx, tmp0_rtx, frame_pointer_needed);

But the later REG_CFA_ADJUST_CFA handling still used callee_offset.

I think the difference is harmless, but it's more logical for the
CFA note to be in sync, and it's more convenient for later patches
if it uses chain_offset.

gcc/
* config/aarch64/aarch64.cc (aarch64_expand_prologue): Use
chain_offset rather than callee_offset.
---
 gcc/config/aarch64/aarch64.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b91f77d7b1f..9fb94623693 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -10034,7 +10034,6 @@ aarch64_expand_prologue (void)
   poly_int64 initial_adjust = frame.initial_adjust;
   HOST_WIDE_INT callee_adjust = frame.callee_adjust;
   poly_int64 final_adjust = frame.final_adjust;
-  poly_int64 callee_offset = frame.callee_offset;
   poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
   poly_int64 below_hard_fp_saved_regs_size
 = frame.below_hard_fp_saved_regs_size;
@@ -10147,8 +10146,7 @@ aarch64_expand_prologue (void)
 implicit.  */
  if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
{
- rtx src = plus_constant (Pmode, stack_pointer_rtx,
-  callee_offset);
+ rtx src = plus_constant (Pmode, stack_pointer_rtx, chain_offset);
  add_reg_note (insn, REG_CFA_ADJUST_CFA,
gen_rtx_SET (hard_frame_pointer_rtx, src));
}
-- 
2.25.1



[PATCH 06/19] aarch64: Tweak aarch64_save/restore_callee_saves

2023-09-12 Thread Richard Sandiford via Gcc-patches
aarch64_save_callee_saves and aarch64_restore_callee_saves took
a parameter called start_offset that gives the offset of the
bottom of the saved register area from the current stack pointer.
However, it's more convenient for later patches if we use the
bottom of the entire frame as the reference point, rather than
the bottom of the saved registers.

Doing that removes the need for the callee_offset field.
Other than that, this is not a win on its own.  It only really
makes sense in combination with the follow-on patches.
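
As a rough illustration of why the two reference points give the same slot address, here is a minimal C model. It is not GCC code: `struct frame_model`, `old_slot_offset` and `new_slot_offset` are hypothetical names, and plain `long` stands in for `poly_int64`. Following the new function comment, it assumes the old START_OFFSET equals bytes_below_saved_regs - bytes_below_sp.

```c
/* Hypothetical, simplified model of the fields touched by this patch.
   Offsets are in bytes, measured upwards from the bottom of the static
   frame.  GCC itself uses poly_int64 rather than long.  */
struct frame_model
{
  long bytes_below_saved_regs;  /* bottom of frame -> bottom of save area */
  long reg_offset;              /* slot offset within the save area */
};

/* Old interface: START_OFFSET is the offset of the bottom of the save
   area from the current stack pointer.  */
static long
old_slot_offset (const struct frame_model *f, long start_offset)
{
  return start_offset + f->reg_offset;
}

/* New interface: the current stack pointer is BYTES_BELOW_SP bytes above
   the bottom of the static frame.  */
static long
new_slot_offset (const struct frame_model *f, long bytes_below_sp)
{
  return f->reg_offset + f->bytes_below_saved_regs - bytes_below_sp;
}
```

With hypothetical values bytes_below_saved_regs = 64 and reg_offset = 16, an SP at the bottom of the frame gives old_slot_offset (&f, 64) == new_slot_offset (&f, 0) == 80, and an SP at the bottom of the save area gives old_slot_offset (&f, 0) == new_slot_offset (&f, 64) == 16.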

gcc/
* config/aarch64/aarch64.h (aarch64_frame::callee_offset): Delete.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Remove
callee_offset handling.
(aarch64_save_callee_saves): Replace the start_offset parameter
with a bytes_below_sp parameter.
(aarch64_restore_callee_saves): Likewise.
(aarch64_expand_prologue): Update accordingly.
(aarch64_expand_epilogue): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 56 +--
 gcc/config/aarch64/aarch64.h  |  4 ---
 2 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 58dd8946232..2c218c90906 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8643,7 +8643,6 @@ aarch64_layout_frame (void)
   frame.final_adjust = 0;
   frame.callee_adjust = 0;
   frame.sve_callee_adjust = 0;
-  frame.callee_offset = 0;
 
   frame.wb_pop_candidate1 = frame.wb_push_candidate1;
   frame.wb_pop_candidate2 = frame.wb_push_candidate2;
@@ -8711,7 +8710,6 @@ aarch64_layout_frame (void)
 stp reg1, reg2, [sp, bytes_below_saved_regs]
 stp reg3, reg4, [sp, bytes_below_saved_regs + 16]  */
   frame.initial_adjust = frame.frame_size;
-  frame.callee_offset = const_below_saved_regs;
 }
   else if (saves_below_hard_fp_p
   && known_eq (frame.saved_regs_size,
@@ -9112,12 +9110,13 @@ aarch64_add_cfa_expression (rtx_insn *insn, rtx reg,
 }
 
 /* Emit code to save the callee-saved registers from register number START
-   to LIMIT to the stack at the location starting at offset START_OFFSET,
-   skipping any write-back candidates if SKIP_WB is true.  HARD_FP_VALID_P
-   is true if the hard frame pointer has been set up.  */
+   to LIMIT to the stack.  The stack pointer is currently BYTES_BELOW_SP
+   bytes above the bottom of the static frame.  Skip any write-back
+   candidates if SKIP_WB is true.  HARD_FP_VALID_P is true if the hard
+   frame pointer has been set up.  */
 
 static void
-aarch64_save_callee_saves (poly_int64 start_offset,
+aarch64_save_callee_saves (poly_int64 bytes_below_sp,
   unsigned start, unsigned limit, bool skip_wb,
   bool hard_fp_valid_p)
 {
@@ -9145,7 +9144,9 @@ aarch64_save_callee_saves (poly_int64 start_offset,
 
   machine_mode mode = aarch64_reg_save_mode (regno);
   reg = gen_rtx_REG (mode, regno);
-  offset = start_offset + frame.reg_offset[regno];
+  offset = (frame.reg_offset[regno]
+   + frame.bytes_below_saved_regs
+   - bytes_below_sp);
   rtx base_rtx = stack_pointer_rtx;
   poly_int64 sp_offset = offset;
 
@@ -9156,9 +9157,7 @@ aarch64_save_callee_saves (poly_int64 start_offset,
   else if (GP_REGNUM_P (regno)
   && (!offset.is_constant (&const_offset) || const_offset >= 512))
{
- gcc_assert (known_eq (start_offset, 0));
- poly_int64 fp_offset
-   = frame.below_hard_fp_saved_regs_size;
+ poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
  if (hard_fp_valid_p)
base_rtx = hard_frame_pointer_rtx;
  else
@@ -9222,12 +9221,13 @@ aarch64_save_callee_saves (poly_int64 start_offset,
 }
 
 /* Emit code to restore the callee registers from register number START
-   up to and including LIMIT.  Restore from the stack offset START_OFFSET,
-   skipping any write-back candidates if SKIP_WB is true.  Write the
-   appropriate REG_CFA_RESTORE notes into CFI_OPS.  */
+   up to and including LIMIT.  The stack pointer is currently BYTES_BELOW_SP
+   bytes above the bottom of the static frame.  Skip any write-back
+   candidates if SKIP_WB is true.  Write the appropriate REG_CFA_RESTORE
+   notes into CFI_OPS.  */
 
 static void
-aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
+aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
  unsigned limit, bool skip_wb, rtx *cfi_ops)
 {
   aarch64_frame &frame = cfun->machine->frame;
@@ -9253,7 +9253,9 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
 
   machine_mode mode = aarch64_reg_save_mode (regno);
   reg = gen_rtx_REG (mode, regno);
-  offset = start_offset + frame.reg_offset[regno];
+  offset = (frame.reg_offset[regno]
+   + frame.bytes_below_saved_regs
+ 

[PATCH 1/2] MATCH: [PR111364] Add some more minmax cmp operand simplifications

2023-09-12 Thread Andrew Pinski via Gcc-patches
This adds a few more minmax cmp operand simplifications that were
previously missed:
`MIN(a,b) < a` -> `a > b`
`MIN(a,b) >= a` -> `a <= b`
`MAX(a,b) > a` -> `a < b`
`MAX(a,b) <= a` -> `a >= b`
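
A quick brute-force sanity check of the four identities above, written as a standalone C sketch. The helper names are hypothetical and not part of the patch; `min_int`/`max_int` mirror the `op_min`/`op_max` macros in the new torture test. It only exercises `int` values in a small range, so it is a spot check rather than a proof; the match.pd pattern itself is guarded by ANY_INTEGRAL_TYPE_P.

```c
#include <stdbool.h>

/* Same definitions as op_min/op_max in minmaxcmp-1.c.  */
static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a > b ? a : b; }

/* Check the four new folds for every A, B in [LO, HI] (HI must be
   below INT_MAX so the loop counter cannot overflow).  Returns true
   if each rewritten comparison agrees with the original one.  */
static bool
minmax_folds_hold (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      {
        if ((min_int (a, b) < a) != (a > b))
          return false;
        if ((min_int (a, b) >= a) != (a <= b))
          return false;
        if ((max_int (a, b) > a) != (a < b))
          return false;
        if ((max_int (a, b) <= a) != (a >= b))
          return false;
      }
  return true;
}
```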

OK? Bootstrapped and tested on x86_64-linux-gnu.

Note that gcc.dg/pr96708-negative.c needed to be updated to remove the
check for MIN/MAX, as they are now (correctly) optimized away.

PR tree-optimization/111364

gcc/ChangeLog:

* match.pd (`MIN (X, Y) == X`): Extend
to min/lt, min/ge, max/gt, max/le.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/minmaxcmp-1.c: New test.
* gcc.dg/tree-ssa/minmaxcmp-2.c: New test.
* gcc.dg/pr96708-negative.c: Update testcase.
* gcc.dg/pr96708-positive.c: Add comment about `return 0`.
---
 gcc/match.pd  |  8 +--
 .../gcc.c-torture/execute/minmaxcmp-1.c   | 51 +++
 gcc/testsuite/gcc.dg/pr96708-negative.c   |  4 +-
 gcc/testsuite/gcc.dg/pr96708-positive.c   |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c   | 30 +++
 5 files changed, 89 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 51985c1bad4..36e3da4841b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3902,9 +3902,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (maxmin @0 (bit_not @1
 
 /* MIN (X, Y) == X -> X <= Y  */
-(for minmax (min min max max)
- cmp(eq  ne  eq  ne )
- out(le  gt  ge  lt )
+/* MIN (X, Y) < X -> X > Y  */
+/* MIN (X, Y) >= X -> X <= Y  */
+(for minmax (min min min min max max max max)
+ cmp(eq  ne  lt  ge  eq  ne  gt  le )
+ out(le  gt  gt  le  ge  lt  lt  ge )
  (simplify
   (cmp:c (minmax:c @0 @1) @0)
   (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
diff --git a/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
new file mode 100644
index 000..6705a053768
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
@@ -0,0 +1,51 @@
+#define func(vol, op1, op2)\
+_Bool op1##_##op2##_##vol (int a, int b)   \
+{  \
+ vol int x = op_##op1(a, b);   \
+ return op_##op2(x, a);\
+}
+
+#define op_lt(a, b) ((a) < (b))
+#define op_le(a, b) ((a) <= (b))
+#define op_eq(a, b) ((a) == (b))
+#define op_ne(a, b) ((a) != (b))
+#define op_gt(a, b) ((a) > (b))
+#define op_ge(a, b) ((a) >= (b))
+#define op_min(a, b) ((a) < (b) ? (a) : (b))
+#define op_max(a, b) ((a) > (b) ? (a) : (b))
+
+
+#define funcs(a) \
+ a(min,lt) \
+ a(max,lt) \
+ a(min,gt) \
+ a(max,gt) \
+ a(min,le) \
+ a(max,le) \
+ a(min,ge) \
+ a(max,ge) \
+ a(min,ne) \
+ a(max,ne) \
+ a(min,eq) \
+ a(max,eq)
+
+#define funcs1(a,b) \
+func(,a,b) \
+func(volatile,a,b)
+
+funcs(funcs1)
+
+#define test(op1,op2)   \
+do {\
+  if (op1##_##op2##_(x,y) != op1##_##op2##_volatile(x,y))   \
+__builtin_abort();  \
+} while(0);
+
+int main()
+{
+  for(int x = -10; x < 10; x++)
+for(int y = -10; y < 10; y++)
+{
+funcs(test)
+}
+}
diff --git a/gcc/testsuite/gcc.dg/pr96708-negative.c b/gcc/testsuite/gcc.dg/pr96708-negative.c
index 91964d3b971..c9c1aa85558 100644
--- a/gcc/testsuite/gcc.dg/pr96708-negative.c
+++ b/gcc/testsuite/gcc.dg/pr96708-negative.c
@@ -42,7 +42,7 @@ int main()
 return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
+/* Even though test[1-4] originally has MIN/MAX, those can be optimized away
+   into just comparing a and b arguments. */
 /* { dg-final { scan-tree-dump-times "return 0;" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-not { "return 1;" } "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr96708-positive.c b/gcc/testsuite/gcc.dg/pr96708-positive.c
index 65af85344b6..12c5fedfd30 100644
--- a/gcc/testsuite/gcc.dg/pr96708-positive.c
+++ b/gcc/testsuite/gcc.dg/pr96708-positive.c
@@ -42,6 +42,7 @@ int main()
 return 0;
 }
 
+/* Note main has one `return 0`. */
 /* { dg-final { scan-tree-dump-times "return 0;" 3 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
 /* { dg-final { scan-tree-dump-not { "MAX_EXPR" } "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c b/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
new file mode 100644
index 000..f64a9253cfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+/* PR tree-optimization/111364 */
+
+#define min1(a, b) ((a) < (b) ? (a) : (b))
+#define max1(a, b) ((a) > (b) ? (a) : (b))
+
+int minlt(int a, int b)
+{
+return min1(a, b) < a; // b < a or a > b
+}

[PATCH 09/19] aarch64: Rename hard_fp_offset to bytes_above_hard_fp

2023-09-12 Thread Richard Sandiford via Gcc-patches
Similarly to the previous locals_offset patch, hard_fp_offset
was described as:

  /* Offset from the base of the frame (incomming SP) to the
 hard_frame_pointer.  This value is always a multiple of
 STACK_BOUNDARY.  */
  poly_int64 hard_fp_offset;

which again took an “upside-down” view: higher offsets meant lower
addresses.  This patch renames the field to bytes_above_hard_fp instead.
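
To make the “upside-down” point concrete, here is a hypothetical C sketch (not GCC code; plain `long` instead of `poly_int64`, and `struct fp_model`, `hard_fp_address` are invented names). It assumes, per the aarch64_layout_frame hunk below, that bytes_above_hard_fp = saved_regs_and_above - below_hard_fp_saved_regs_size, and it reuses the chain_offset formula from the aarch64_expand_prologue hunk.

```c
/* Hypothetical model; sizes in bytes.  */
struct fp_model
{
  long saved_regs_and_above;           /* incoming SP -> bottom of save area */
  long below_hard_fp_saved_regs_size;  /* saves that lie below the hard FP */
};

/* Distance from the incoming SP down to the hard frame pointer.  */
static long
bytes_above_hard_fp (const struct fp_model *f)
{
  return f->saved_regs_and_above - f->below_hard_fp_saved_regs_size;
}

/* The hard FP sits bytes_above_hard_fp bytes below the incoming SP,
   so a larger value means a lower address.  */
static unsigned long
hard_fp_address (unsigned long incoming_sp, const struct fp_model *f)
{
  return incoming_sp - (unsigned long) bytes_above_hard_fp (f);
}

/* Offset of the frame chain record from the SP after INITIAL_ADJUST
   and CALLEE_ADJUST have been applied.  */
static long
chain_offset (long initial_adjust, long callee_adjust,
              const struct fp_model *f)
{
  return initial_adjust + callee_adjust - bytes_above_hard_fp (f);
}
```

For example, with hypothetical values saved_regs_and_above = 96, no saves below the hard FP and a 96-byte callee_adjust push, chain_offset (0, 96, &f) is 0, i.e. the frame record is at the new SP.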

gcc/
* config/aarch64/aarch64.h (aarch64_frame::hard_fp_offset): Rename
to...
(aarch64_frame::bytes_above_hard_fp): ...this.
* config/aarch64/aarch64.cc (aarch64_layout_frame)
(aarch64_expand_prologue): Update accordingly.
(aarch64_initial_elimination_offset): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 26 +-
 gcc/config/aarch64/aarch64.h  |  6 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index bcd1dec6f51..7d642d06871 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8629,7 +8629,7 @@ aarch64_layout_frame (void)
   + get_frame_size (),
   STACK_BOUNDARY / BITS_PER_UNIT);
 
-  frame.hard_fp_offset
+  frame.bytes_above_hard_fp
 = saved_regs_and_above - frame.below_hard_fp_saved_regs_size;
 
   /* Both these values are already aligned.  */
@@ -8678,13 +8678,13 @@ aarch64_layout_frame (void)
   else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
 max_push_offset = 256;
 
-  HOST_WIDE_INT const_size, const_below_saved_regs, const_fp_offset;
+  HOST_WIDE_INT const_size, const_below_saved_regs, const_above_fp;
   HOST_WIDE_INT const_saved_regs_size;
   if (known_eq (frame.saved_regs_size, 0))
 frame.initial_adjust = frame.frame_size;
   else if (frame.frame_size.is_constant (&const_size)
   && const_size < max_push_offset
-  && known_eq (frame.hard_fp_offset, const_size))
+  && known_eq (frame.bytes_above_hard_fp, const_size))
 {
   /* Simple, small frame with no data below the saved registers.
 
@@ -8701,8 +8701,8 @@ aarch64_layout_frame (void)
  case that it hardly seems worth the effort though.  */
   && (!saves_below_hard_fp_p || const_below_saved_regs == 0)
   && !(cfun->calls_alloca
-   && frame.hard_fp_offset.is_constant (&const_fp_offset)
-   && const_fp_offset < max_push_offset))
+   && frame.bytes_above_hard_fp.is_constant (&const_above_fp)
+   && const_above_fp < max_push_offset))
 {
   /* Frame with small area below the saved registers:
 
@@ -8720,12 +8720,12 @@ aarch64_layout_frame (void)
 sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
 save SVE registers relative to SP
 sub sp, sp, bytes_below_saved_regs  */
-  frame.initial_adjust = (frame.hard_fp_offset
+  frame.initial_adjust = (frame.bytes_above_hard_fp
  + frame.below_hard_fp_saved_regs_size);
   frame.final_adjust = frame.bytes_below_saved_regs;
 }
-  else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
-  && const_fp_offset < max_push_offset)
+  else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
+  && const_above_fp < max_push_offset)
 {
   /* Frame with large area below the saved registers, or with SVE saves,
 but with a small area above:
@@ -8735,7 +8735,7 @@ aarch64_layout_frame (void)
 [sub sp, sp, below_hard_fp_saved_regs_size]
 [save SVE registers relative to SP]
 sub sp, sp, bytes_below_saved_regs  */
-  frame.callee_adjust = const_fp_offset;
+  frame.callee_adjust = const_above_fp;
   frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
   frame.final_adjust = frame.bytes_below_saved_regs;
 }
@@ -8750,7 +8750,7 @@ aarch64_layout_frame (void)
 [sub sp, sp, below_hard_fp_saved_regs_size]
 [save SVE registers relative to SP]
 sub sp, sp, bytes_below_saved_regs  */
-  frame.initial_adjust = frame.hard_fp_offset;
+  frame.initial_adjust = frame.bytes_above_hard_fp;
   frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
   frame.final_adjust = frame.bytes_below_saved_regs;
 }
@@ -10118,7 +10118,7 @@ aarch64_expand_prologue (void)
 {
   /* The offset of the frame chain record (if any) from the current SP.  */
   poly_int64 chain_offset = (initial_adjust + callee_adjust
-- frame.hard_fp_offset);
+- frame.bytes_above_hard_fp);
   gcc_assert (known_ge (chain_offset, 0));
 
   if (callee_adjust == 0)
@@ -12851,10 +12851,10 @@ aarch64_initial_elimination_offset (unsigned from, unsigned to)
   if (to == HARD_FRAME_POINTER_REGNUM)
 {
   if (from == ARG_POINTER_REGNUM)
-   return frame.hard_fp_offset;
+   return frame.bytes_above_hard_fp;
 
   if (f
