Re: [PATCH v2] RISC-V: Bugfix for RVV widening reduction in ZVE32/64

2023-06-19 Thread juzhe.zh...@rivai.ai

LGTM


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-19 14:46
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV widening reduction in ZVE32/64
From: Pan Li 
 
The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we rely on the generated function code_for_reduc (code, mode1,
mode2), whose implementation may look like below.
 
code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}
 
Thus there is a problem here. For example, on zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which returns the code for
ZVE128+ instead of the one for ZVE32.
 
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example, ZVE32
will be code_for_reduc (max, VNx1HF, VNx2HF), and the correct code for ZVE32
will be returned as expected.
 
Please note both GCC 13 and 14 are impacted by this issue.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
gcc/ChangeLog:
 
PR target/110299
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32, VI_ZVE64, VI_ZVE32, VWI, VWI_ZVE64, VWI_ZVE32,
VF_ZVE64 and VF_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): New pattern.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
 
gcc/testsuite/ChangeLog:
 
PR target/110299
* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  16 +-
gcc/config/riscv/vector-iterators.md  | 103 
gcc/config/riscv/vector.md| 243 --
.../gcc.target/riscv/rvv/base/pr110299-1.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-1.h|   9 +
.../gcc.target/riscv/rvv/base/pr110299-2.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-2.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-3.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-3.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-4.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-4.h|  17 ++
11 files changed, 253 insertions(+), 199 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.h
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 27545113996..c6c53dc13a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,16 +1396,8 @@ public:
   rtx expand (function_expander &e) const override
   {
-machine_mode mode = e.vector_mode ();
-machine_mode ret_mode = e.ret_mode ();
-
-/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
-  return e.use_exact_insn (
- code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
-else
-  return e.use_exact_insn (
- code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
+return e.use_exact_insn (
+  code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
@@ -1420,7 +1412,7 @@ public:
   {
 return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
 e.vector_mode (),
-  e.vector_mode ()));
+  e.ret_mode ()));
   }
};
@@ -1449,7 +1441,7 @@ public:
   {
 return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
 

Re: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-19 Thread Jan Beulich via Gcc-patches
On 19.06.2023 04:07, Liu, Hongtao wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Friday, June 16, 2023 2:22 PM
>>
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -12597,11 +12597,11 @@
>> (set_attr "mode" "")])
>>
>>  (define_insn "*_vternlog_all"
>> -  [(set (match_operand:V 0 "register_operand" "=v")
>> +  [(set (match_operand:V 0 "register_operand" "=v,v")
>>  (unspec:V
>> -  [(match_operand:V 1 "register_operand" "0")
>> -   (match_operand:V 2 "register_operand" "v")
>> -   (match_operand:V 3 "bcst_vector_operand" "vmBr")
>> +  [(match_operand:V 1 "register_operand" "0,0")
>> +   (match_operand:V 2 "register_operand" "v,v")
>> +   (match_operand:V 3 "bcst_vector_operand" "vBr,m")
>> (match_operand:SI 4 "const_0_to_255_operand")]
>>UNSPEC_VTERNLOG))]
>>"TARGET_AVX512F
> Change condition to  == 64 || TARGET_AVX512VL || (TARGET_AVX512F 
> && !TARGET_PREFER_AVX256)

May I ask why you think this is necessary? The condition of the insn
already wasn't in sync with the condition used in all three splitters,
and I didn't see any reason why now they would need to be brought in
sync. First and foremost because of the use of the UNSPEC (equally
before and after this patch).

Furthermore, isn't it the case that I'm already mostly expressing
this with the "enabled" attribute? At the very least I think I
should then drop that again if I follow your request?

> Also please add a testcase for case TARGET_AVX512F && !TARGET_PREFER_AVX256.

Especially in a case like this one I'm wondering about the usefulness
of a contrived testcase: It won't test more than one minor sub-case of
the whole set of constructs covered here. But well, here as well as
for the other change I'll invent something.

Jan


Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-19 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/5/31 04:46, Carl Love wrote:
> GCC maintainers:
> 
> The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
> 6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
> test files by functionality rather than processor version.
> 
> The patch has been tested on Power 10 with no regressions.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>Carl
> 
> --
> rs6000: Update the vsx-vector-6.* tests.
> 
> The vsx-vector-6.h file is included into the processor specific test files
> vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
> contains a large number of vsx vector builtin tests.  The processor
> specific files contain the number of instructions that the tests are
> expected to generate for that processor.  The tests are compile only.
> 
> The tests are broken up into a series of files for related tests.  The
> new tests are runnable tests to verify the builtin argument types and the

But the newly added test cases are all with "dg-do compile", it doesn't
match what you said here.

> functional correctness of each test rather than verifying the type and
> number of instructions generated.

It's good to have more coverage with runnable cases, but we lose some test
coverage on the expected insn counts which the p{7,8,9}.c cases originally
provided.  Unless we can ensure it's already tested somewhere else (do
we? it wasn't stated in this patch), I think we still need those checks.

> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> ---
>  .../powerpc/vsx-vector-6-func-1op.c   | 319 +
>  .../powerpc/vsx-vector-6-func-2lop.c  | 305 +
>  .../powerpc/vsx-vector-6-func-2op.c   | 278 
>  .../powerpc/vsx-vector-6-func-3op.c   | 229 ++
>  .../powerpc/vsx-vector-6-func-cmp-all.c   | 429 ++
>  .../powerpc/vsx-vector-6-func-cmp.c   | 237 ++
>  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
>  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 --
>  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 --
>  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 --
>  10 files changed, 1797 insertions(+), 282 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> new file mode 100644
> index 000..90a360ea158
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> @@ -0,0 +1,319 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> +
> +/* Functional test of the one operand vector builtins.  */
> +
> +#include 
> +#include 
> +#include 
> +
> +#define DEBUG 0
> +
> +void abort (void);
> +
> +int
> +main () {
> +  int i;
> +  vector float f_src = { 125.44, 23.04, -338.56, 17.64};
> +  vector float f_result;
> +  vector float f_abs_expected = { 125.44, 23.04, 338.56, 17.64};
> +  vector float f_ceil_expected = { 126.0, 24.0, -338, 18.0};
> +  vector float f_floor_expected = { 125.0, 23.0, -339, 17.0};
> +  vector float f_nearbyint_expected = { 125.0, 23.0, -339, 18.0};
> +  vector float f_rint_expected = { 125.0, 23.0, -339, 18.0};
> +  vector float f_sqrt_expected = { 11.2, 4.8, 18.4, 4.2};
> +  vector float f_trunc_expected = { 125.0, 23.0, -338, 17};
> +
> +  vector double d_src = { 125.44, -338.56};
> +  vector double d_result;
> +  vector double d_abs_

Re: [PATCH V7] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-19 Thread Richard Biener via Gcc-patches
On Sun, 18 Jun 2023, Ju-Zhe Zhong wrote:

> Bootstrap and regression testing on X86 passed.
> Jeff and Richi approved.
> 
> Let's wait for Richard S final approve.

No need to wait.

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-06-18 06:53
> To: gcc-patches
> CC: jeffreyalaw; rguenther; richard.sandiford; Ju-Zhe Zhong; Robin Dapp
> Subject: [PATCH V7] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
> From: Ju-Zhe Zhong 
>  
> This patch passes bootstrap on X86; OK for trunk?
>  
> According to comments from Richi, the first patch is split to add only the
> ifn && optabs of LEN_MASK_{LOAD,STORE}; we don't apply them in the
> vectorizer in this patch.  Also add a BIAS argument for possible future
> use by s390.
>  
> The descriptions of the patterns in the doc come from Robin.
>  
> Fix for Jeff's comments.
>  
> I didn't change BIAS argument for len_load/len_store, but will need it
> in a separate patch.
>  
> After this patch is approved, will send the second patch to apply len_mask_*
> patterns into vectorizer.
>  
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
>  
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
>  
> However, targets like RVV (RISC-V Vector) can not use this approach in
> auto-vectorization since RVV use length in loop control.
>  
> This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
> like RISC-V that use length in loop control.
> Normalize load/store into LEN_MASK_{LOAD,STORE} as long as either length
> or mask is valid.  Length is the outcome of SELECT_VL or MIN_EXPR.
> Mask is the outcome of comparison.
>  
> The LEN_MASK_{LOAD,STORE} format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
>  
> Consider these 4 following cases:
>  
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
>  
> Case 1 (VLS): -mrvv-vector-bits=128
> Code:
>   for (int i = 0; i < 4; i++)
> a[i] = b[i] + c[i];
> IR (Does not use LEN_MASK_*):
>   v1 = MEM (...)
>   v2 = MEM (...)
>   v3 = v1 + v2
>   MEM[...] = v3
>  
> Case 2 (VLS): -mrvv-vector-bits=128
> Code:
>   for (int i = 0; i < 4; i++)
> if (cond[i])
>   a[i] = b[i] + c[i];
> IR (LEN_MASK_* with length = VF, mask = comparison):
>   mask = comparison
>   v1 = LEN_MASK_LOAD (length = VF, mask)
>   v2 = LEN_MASK_LOAD (length = VF, mask)
>   v3 = v1 + v2
>   LEN_MASK_STORE (length = VF, mask, v3)
>  
> Case 3 (VLA):
> Code:
>   for (int i = 0; i < n; i++)
> a[i] = b[i] + c[i];
> IR:
>   loop_len = SELECT_VL or MIN
>   v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>   v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>   v3 = v1 + v2
>   LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
>  
> Case 4 (VLA):
> Code:
>   for (int i = 0; i < n; i++)
> if (cond[i])
>   a[i] = b[i] + c[i];
> IR:
>   loop_len = SELECT_VL or MIN
>   mask = comparison
>   v1 = LEN_MASK_LOAD (length = loop_len, mask)
>   v2 = LEN_MASK_LOAD (length = loop_len, mask)
>   v3 = v1 + v2
>   LEN_MASK_STORE (length = loop_len, mask, v3)
>  
> Co-authored-by: Robin Dapp 
>  
> gcc/ChangeLog:
>  
> * doc/md.texi: Add len_mask{load,store}.
> * genopinit.cc (main): Ditto.
> (CMP_NAME): Ditto.
> * internal-fn.cc (len_maskload_direct): Ditto.
> (len_maskstore_direct): Ditto.
> (expand_call_mem_ref): Ditto.
> (expand_partial_load_optab_fn): Ditto.
> (expand_len_maskload_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> (expand_len_maskstore_optab_fn): Ditto.
> (direct_len_maskload_optab_supported_p): Ditto.
> (direct_len_maskstore_optab_supported_p): Ditto.
> * internal-fn.def (LEN_MASK_LOAD): Ditto.
> (LEN_MASK_STORE): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
>  
> ---
> gcc/doc/md.texi | 55 -
> gcc/genopinit.cc|  6 +++--
> gcc/internal-fn.cc  | 43 +++
> gcc/internal-fn.def |  4 
> gcc/optabs.def  |  2 ++
> 5 files changed, 103 insertions(+), 7 deletions(-)
>  
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index a43fd65a2b2..2e1a7916a99 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094

Re: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE

2023-06-19 Thread Kewen.Lin via Gcc-patches
Hi Hongtao,

on 2023/6/14 16:17, Hongtao Liu wrote:
> On Tue, Jun 13, 2023 at 10:07 AM Kewen Lin via Gcc-patches
>  wrote:
>>
>> This patch adjusts the cost handling on
>> VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load.  We
>> don't call function vect_model_load_cost for it any more.
>>
>> As the affected test case gcc.target/i386/pr70021.c shows,
>> the previous costing can under-cost the total generated
>> vector loads as for VMAT_CONTIGUOUS_PERMUTE function
>> vect_model_load_cost doesn't consider the group size which
>> is considered as vec_num during the transformation.
> The original PR is for the correctness issue, and I'm not sure how
> much of a performance impact the patch would be, but the change looks
> reasonable, so the test change looks ok to me.
> I'll track performance impact on SPEC2017 to see if there's any
> regression caused by the patch(Guess probably not).

Thanks for the feedback and further tracking!  Hope this (and
this whole series) doesn't impact SPEC2017 performance on x86. :)

BR,
Kewen

>>
>> This patch makes the count of vector loads in costing become
>> consistent with what we generate during the transformation.
>> To be more specific, for the given test case, for memory
>> access b[i_20], it costed 2 vector loads before; with this
>> patch it costs 8 instead, which matches the final count of
>> generated vector loads based on b.  This costing
>> change makes cost model analysis feel it's not profitable
>> to vectorize the first loop, so this patch adjusts the test
>> case without vect cost model any more.
>>
>> But note that this test case also exposes something we can
>> improve further: although the numbers of vector permutations
>> we costed and generated are consistent, DCE can further
>> optimize some unused permutations out.  It would be good if
>> we could predict that and generate only the necessary
>> permutations.
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-stmts.cc (vect_model_load_cost): Assert this function 
>> only
>> handle memory_access_type VMAT_CONTIGUOUS, remove some
>> VMAT_CONTIGUOUS_PERMUTE related handlings.
>> (vectorizable_load): Adjust the cost handling on 
>> VMAT_CONTIGUOUS_PERMUTE
>> without calling vect_model_load_cost.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
>> ---
>>  gcc/testsuite/gcc.target/i386/pr70021.c |  2 +-
>>  gcc/tree-vect-stmts.cc  | 88 ++---
>>  2 files changed, 51 insertions(+), 39 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c 
>> b/gcc/testsuite/gcc.target/i386/pr70021.c
>> index 6562c0f2bd0..d509583601e 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr70021.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr70021.c
>> @@ -1,7 +1,7 @@
>>  /* PR target/70021 */
>>  /* { dg-do run } */
>>  /* { dg-require-effective-target avx2 } */
>> -/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
>> -mtune=skylake" } */
>> +/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
>> -mtune=skylake -fno-vect-cost-model" } */
>>
>>  #include "avx2-check.h"
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 7f8d9db5363..e7a97dbe05d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
>>   slp_tree slp_node,
>>   stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
>> - || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
>> +  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
>>
>>unsigned int inside_cost = 0, prologue_cost = 0;
>>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
>>   once per group anyhow.  */
>>bool first_stmt_p = (first_stmt_info == stmt_info);
>>
>> -  /* We assume that the cost of a single load-lanes instruction is
>> - equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
>> - access is instead being provided by a load-and-permute operation,
>> - include the cost of the permutes.  */
>> -  if (first_stmt_p
>> -  && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
>> -{
>> -  /* Uses an even and odd extract operations or shuffle operations
>> -for each needed permute.  */
>> -  int group_size = DR_GROUP_SIZE (first_stmt_info);
>> -  int nstmts = ncopies * ceil_log2 (group_size) * group_size;
>> -  inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
>> -  stmt_info, 0, vect_body);
>> -
>> -  if (dump_enabled_p ())
>> -dump_printf_loc (MSG_NOTE, vect_location,
>> - "vect_model_load_cost: strided group_size = %d 
>> .\n",
>> - group_size);
>> -}
>> -
>>vect_ge

Re: [x86 PATCH] Convert ptestz of pandn into ptestc.

2023-06-19 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 16, 2023 at 3:27 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> Here's an updated version of this patch incorporating your comments.
> It uses emit_insn (target, const1_rtx), bt_comparison operator to
> combine the sete/setne to setc/setnc, and je/jne to jc/jnc patterns,
> uses scan-assembler-times in the test cases, and cleans up the silly
> cut'n'paste issue that mangled strict_low_part/subreg of a register
> that was already QImode.  I tried, but the strict_low_part variant
> really is required (some of the new test cases fail without it), but
> things are much neater now, and have fewer patterns than the original.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-16  Roger Sayle  
> Uros Bizjak  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
> expansion of ptestc with equal operands as producing const1_rtx.
> * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
> estimates of UNSPEC_PTEST, where the ptest performs the PAND
> or PAND of its operands.
> * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
> of reg_equal_p operands into an x86_stc instruction.
> (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
> (define_split): Similar to above for strict_low_part destinations.
> (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/avx-vptest-4.c: New test case.
> * gcc.target/i386/avx-vptest-5.c: Likewise.
> * gcc.target/i386/avx-vptest-6.c: Likewise.
> * gcc.target/i386/pr109973-1.c: Update test case.
> * gcc.target/i386/pr109973-2.c: Likewise.
> * gcc.target/i386/sse4_1-ptest-4.c: New test case.
> * gcc.target/i386/sse4_1-ptest-5.c: Likewise.
> * gcc.target/i386/sse4_1-ptest-6.c: Likewise.

+(define_split
+  [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0))

I think you should use

(set (strict_low_part (match_operand:QI 0 "register_operand")) ... here and ...

+   (set (strict_low_part (subreg:QI (match_dup 0) 0))

corresponding

(set (strict_low_part (match_dup 0))...

without explicit SUBREG here. This will handle all subregs
automatically, as they are also matched by "register_operand"
predicate.

OK with the above change.

Thanks,
Uros.


Tiny phiprop compile time optimization

2023-06-19 Thread Jan Hubicka via Gcc-patches
Hi,
this patch avoids unnecessary post-dominator computation and SSA updates in phiprop.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-phiprop.cc (propagate_with_phi): Add 
post_dominators_computed;
compute post dominators lazily.
(const pass_data pass_data_phiprop): Remove TODO_update_ssa.
(pass_phiprop::execute): Update; return TODO_update_ssa if something
changed.

diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 3cb4900b6be..87e3a2ccf3a 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
 
 static bool
 propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-   size_t n)
+   size_t n, bool *post_dominators_computed)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
   gimple *def_stmt;
   tree vuse;
 
+  if (!*post_dominators_computed)
+{
+ calculate_dominance_info (CDI_POST_DOMINATORS);
+ *post_dominators_computed = true;
+   }
+
   /* Only replace loads in blocks that post-dominate the PHI node.  That
  makes sure we don't end up speculating loads.  */
   if (!dominated_by_p (CDI_POST_DOMINATORS,
@@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  0, /* todo_flags_finish */
 };
 
 class pass_phiprop : public gimple_opt_pass
@@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
   gphi_iterator gsi;
   unsigned i;
   size_t n;
+  bool post_dominators_computed = false;
 
   calculate_dominance_info (CDI_DOMINATORS);
-  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   n = num_ssa_names;
   phivn = XCNEWVEC (struct phiprop_d, n);
@@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
   if (bb_has_abnormal_pred (bb))
continue;
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
+&post_dominators_computed);
 }
 
   if (did_something)
@@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
 
   free (phivn);
 
-  free_dominance_info (CDI_POST_DOMINATORS);
+  if (post_dominators_computed)
+free_dominance_info (CDI_POST_DOMINATORS);
 
-  return 0;
+  return did_something ? TODO_update_ssa : 0;
 }
 
 } // anon namespace


Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Jan Hubicka via Gcc-patches
Hi,
this was suggested earlier somewhere, but I cannot find the thread.
C++ has the assume attribute that expands into
  if (conditional)
    __builtin_unreachable ();
We do not want to account for the conditional in inline heuristics since
we know that it is going to be optimized out.

Bootstrapped/regtested x86_64-linux, will commit it later today if
there are no complaints.

gcc/ChangeLog:

* ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
(analyze_function_body): Do not account conditionals guarding
builtin_unreachable calls.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/fnsummary-1.c: New test.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a5f5a50c8a5..987da29ec34 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
   return false;
 }
 
+/* Return true if BB is builtin_unreachable.
+   We skip empty basic blocks, debug statements, clobbers and predicts.
+   CACHE is used to memoize already analyzed blocks.  */
+
+static bool
+builtin_unreachable_bb_p (basic_block bb, vec &cache)
+{
+  if (cache[bb->index])
+return cache[bb->index] - 1;
+  gimple_stmt_iterator si;
+  auto_vec  visited_bbs;
+  bool ret = false;
+  while (true)
+{
+  bool empty_bb = true;
+  visited_bbs.safe_push (bb);
+  cache[bb->index] = 3;
+  for (si = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (si) && empty_bb;
+  gsi_next_nondebug (&si))
+   {
+ if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
+ && !gimple_clobber_p (gsi_stmt (si))
+ && !gimple_nop_p (gsi_stmt (si)))
+   {
+ empty_bb = false;
+ break;
+   }
+   }
+  if (!empty_bb)
+   break;
+  else
+   bb = single_succ_edge (bb)->dest;
+  if (cache[bb->index])
+   {
+ ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
+ goto done;
+   }
+}
+  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
+  || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))
+ret = true;
+done:
+  for (basic_block vbb:visited_bbs)
+cache[vbb->index] = (unsigned char)ret + 1;
+  return ret;
+}
+
 /* Analyze function body for NODE.
EARLY indicates run from early optimization pipeline.  */
 
@@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool 
early)
   const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
   order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   nblocks = pre_and_rev_post_order_compute (NULL, order, false);
+  auto_vec cache;
+  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
   for (n = 0; n < nblocks; n++)
 {
   bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
@@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool 
early)
}
}
 
+ /* Conditionals guarding __builtin_unreachable will be
+optimized out.  */
+ if (gimple_code (stmt) == GIMPLE_COND)
+   {
+ edge_iterator ei;
+ edge e;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   if (builtin_unreachable_bb_p (e->dest, cache))
+ {
+   if (dump_file)
+ fprintf (dump_file,
+  "\t\tConditional guarding __builtin_unreachable; 
ignored\n");
+   this_time = 0;
+   this_size = 0;
+   break;
+ }
+   }
+
  /* TODO: When conditional jump or switch is known to be constant, but
 we did not translate it into the predicates, we really can account
 just maximum of the possible paths.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c 
b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
new file mode 100644
index 000..a0ece0c300b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
@@ -0,0 +1,8 @@
+/* { dg-options "-O2 -fdump-ipa-fnsummary"  } */
+int
+test(int a)
+{
+   if (a > 10)
+   __builtin_unreachable ();
+}
+/* { dg-final { scan-ipa-dump "Conditional guarding __builtin_unreachable" 
"fnsummary"  } } */


[committed] libgomp.c/target-51.c: Accept more error-msg variants in dg-output (was: Re: [committed] libgomp: Fix OMP_TARGET_OFFLOAD=mandatory)

2023-06-19 Thread Tobias Burnus

On 16.06.23 22:42, Thomas Schwinge wrote:

I see the new tests PASS, but with offloading enabled (nvptx) also see:

 PASS: libgomp.c/target-51.c (test for excess errors)
 PASS: libgomp.c/target-51.c execution test
 [-PASS:-]{+FAIL:+} libgomp.c/target-51.c output pattern test

... due to:

 Output was:

 libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used 
for offloading

 Should match:
 .*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found.*


Thanks for the report. I can offer yet another wording for the same program – 
and also
with nvptx enabled:

libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for 
offloading

And I can also offer (which is already in the testcase with "! offload_device"):

libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is 
available

I think I will just match "..., but .*" without distinguishing 
check_effective_target_* ...

... which I now did in commit r14-1926-g01fe115ba7eafe (see also attached 
patch).

* * *

With offloading, there are simply too many possibilities:

* Not compiled with offloading support - vs. with (ENABLE_OFFLOADING)
* Support compiled in but either compiler or library support not installed
  (requires configuring with --enable-offload-defaulted)
* Offloading libgomp plugins there but no CUDA or hsa runtime libraries
* The latter being installed but no device available

Plus -foffload=disable or only enabling an (at runtime) unavailable or
unsupported device type or other issues like CUDA and device present but
an issue with the kernel driver (or similar half-broken states) or ...

[And with remote testing issues related to dg-set-target-env-var and only
few systems supporting offloading, a full test coverage is even harder.]

Tobias
commit 01fe115ba7eafebcf97bbac9e157038a003d0c85
Author: Tobias Burnus 
Date:   Mon Jun 19 09:52:10 2023 +0200

libgomp.c/target-51.c: Accept more error-msg variants in dg-output

Depending on the details, the testcase can fail with different but
related messages; all of the following all could be observed for this
testcase:

  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available

Before, the last two were tested for with 'target offload_device' and
'! offload_device', respectively. Now, all three are accepted by matching
'.*' already after 'but' and without distinguishing whether the effective
target is an offload_device or not.

(For completeness, there is a fourth error that follows this pattern:
'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.)

libgomp/

* testsuite/libgomp.c/target-51.c: Accept more error msg variants
as expected dg-output.
---
 libgomp/testsuite/libgomp.c/target-51.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/target-51.c b/libgomp/testsuite/libgomp.c/target-51.c
index bbe9ade6e24..db0363bfc14 100644
--- a/libgomp/testsuite/libgomp.c/target-51.c
+++ b/libgomp/testsuite/libgomp.c/target-51.c
@@ -9,8 +9,7 @@
 
 /* See comment in target-50.c/target-50.c for why the output differs.  */
 
-/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available.*" { target { ! offload_device } } } */
-/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found.*" { target offload_device } } */
+/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*" } } */
 
 int
 main ()


[PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Pan Li via Gcc-patches
From: Pan Li 

We already extended the machine mode from 8 to 16 bits. But there is still
one place missing from the tree-streamer: it has a hard-coded array for
the machine modes, of size 256.

In the LTO pass, we memset the array with a count of MAX_MACHINE_MODE, but
the value of MAX_MACHINE_MODE will grow as more and more modes are added,
while the machine mode array in tree-streamer still stays at 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table will touch memory out of range unexpectedly.

This patch takes MAX_MACHINE_MODE as the size of the array in
tree-streamer, to make sure there is no potential out-of-range memory
access in the future.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* tree-streamer.cc (streamer_mode_table): Use MAX_MACHINE_MODE
as array size.
* tree-streamer.h (streamer_mode_table): Ditto.
---
 gcc/tree-streamer.cc | 2 +-
 gcc/tree-streamer.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
index ed65a7692e3..a28ef9c7920 100644
--- a/gcc/tree-streamer.cc
+++ b/gcc/tree-streamer.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
During streaming in, we translate the on the disk mode using this
table.  For normal LTO it is set to identity, for ACCEL_COMPILER
depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
+unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 
 /* Check that all the TS_* structures handled by the streamer_write_* and
streamer_read_* routines are exactly ALL the structures defined in
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 170d61cf20b..51a292c8d80 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
 void streamer_write_integer_cst (struct output_block *, tree);
 
 /* In tree-streamer.cc.  */
-extern unsigned char streamer_mode_table[1 << 8];
+extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
-- 
2.34.1



RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Li, Pan2 via Gcc-patches
Adding Richard Biener for review; sorry for the inconvenience.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, June 19, 2023 4:07 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; Li, Pan2 
; Wang, Yanzhang ; 
kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

From: Pan Li 

We already extended the machine mode from 8 to 16 bits. But there is still
one place missing from the tree-streamer: it has a hard-coded array for
the machine modes, of size 256.

In the LTO pass, we memset the array with a count of MAX_MACHINE_MODE, but
the value of MAX_MACHINE_MODE will grow as more and more modes are added,
while the machine mode array in tree-streamer still stays at 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table will touch memory out of range unexpectedly.

This patch takes MAX_MACHINE_MODE as the size of the array in
tree-streamer, to make sure there is no potential out-of-range memory
access in the future.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* tree-streamer.cc (streamer_mode_table): Use MAX_MACHINE_MODE
as array size.
* tree-streamer.h (streamer_mode_table): Ditto.
---
 gcc/tree-streamer.cc | 2 +-
 gcc/tree-streamer.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
index ed65a7692e3..a28ef9c7920 100644
--- a/gcc/tree-streamer.cc
+++ b/gcc/tree-streamer.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
During streaming in, we translate the on the disk mode using this
table.  For normal LTO it is set to identity, for ACCEL_COMPILER
depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
+unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 
 /* Check that all the TS_* structures handled by the streamer_write_* and
streamer_read_* routines are exactly ALL the structures defined in
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 170d61cf20b..51a292c8d80 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
 void streamer_write_integer_cst (struct output_block *, tree);
 
 /* In tree-streamer.cc.  */
-extern unsigned char streamer_mode_table[1 << 8];
+extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
-- 
2.34.1



Re: [PATCH V7] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-19 Thread Robin Dapp via Gcc-patches
>> Bootstrap and Regression on X86 passed.
>> Jeff and Richi approved.
>>
>> Let's wait for Richard S final approve.
> 
> No need to wait.

Thanks, I pushed it in Juzhe's stead, fixing the LEN_LOAD/LEN_STORE
documentation (+ vs -) as r14-1932.

Regards
 Robin


Re: Tiny phiprop compile time optimization

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Jan Hubicka wrote:

> Hi,
> this patch avoids unnecessary post dominator and update_ssa in phiprop.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-phiprop.cc (propagate_with_phi): Add 
> post_dominators_computed;
>   compute post dominators lazily.
>   (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
>   (pass_phiprop::execute): Update; return TODO_update_ssa if something
>   changed.
> 
> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> index 3cb4900b6be..87e3a2ccf3a 100644
> --- a/gcc/tree-ssa-phiprop.cc
> +++ b/gcc/tree-ssa-phiprop.cc
> @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
>  
>  static bool
>  propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> - size_t n)
> + size_t n, bool *post_dominators_computed)
>  {
>tree ptr = PHI_RESULT (phi);
>gimple *use_stmt;
> @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> phiprop_d *phivn,
>gimple *def_stmt;
>tree vuse;
>  
> +  if (!*post_dominators_computed)
> +{
> +   calculate_dominance_info (CDI_POST_DOMINATORS);
> +   *post_dominators_computed = true;

I think you can save the parameter by using dom_info_available_p () here
and ...

> + }
> +
>/* Only replace loads in blocks that post-dominate the PHI node.  That
>   makes sure we don't end up speculating loads.  */
>if (!dominated_by_p (CDI_POST_DOMINATORS,
> @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
>0, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  TODO_update_ssa, /* todo_flags_finish */
> +  0, /* todo_flags_finish */
>  };
>  
>  class pass_phiprop : public gimple_opt_pass
> @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
>gphi_iterator gsi;
>unsigned i;
>size_t n;
> +  bool post_dominators_computed = false;
>  
>calculate_dominance_info (CDI_DOMINATORS);
> -  calculate_dominance_info (CDI_POST_DOMINATORS);
>  
>n = num_ssa_names;
>phivn = XCNEWVEC (struct phiprop_d, n);
> @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
>if (bb_has_abnormal_pred (bb))
>   continue;
>for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
> +  &post_dominators_computed);
>  }
>  
>if (did_something)
> @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
>  
>free (phivn);
>  
> -  free_dominance_info (CDI_POST_DOMINATORS);
> +  if (post_dominators_computed)
> +free_dominance_info (CDI_POST_DOMINATORS);

unconditionally free_dominance_info here.

> -  return 0;
> +  return did_something ? TODO_update_ssa : 0;

I guess that change is following general practice and good to catch
undesired changes (update_ssa will exit early when there's nothing
to do anyway).

OK with those changes.


[PATCH v4 3/9] MIPS: Add instruction about global pointer register for mips16e2

2023-06-19 Thread Jie Mei
The mips16e2 ASE uses eight general-purpose registers
from mips32, along with some special-purpose registers.
The GPRs are s0-1, v0-1, a0-3, and the special
registers are t8, gp, sp, ra.

As mentioned above, mips16e2 uses the special register
gp, the global pointer register, which is referenced by
some of the instructions in the ASE, for instance
ADDIU, LB/LBU, etc.

This patch adds these instructions with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Generate
instructions that use the global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h (MIPS16_GP_LOADS): Define a new macro.
(GLOBAL_POINTER_REGNUM): Move to machine description `mips.md`.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Moved to here from above.
(*lowsi_mips16_gp): New `define_insn` based on `*low_mips16`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc |  10 +-
 gcc/config/mips/mips.h  |   6 +-
 gcc/config/mips/mips.md |  11 +++
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c | 101 
 4 files changed, 121 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 585a3682c7b..be470bbb50d 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2474,6 +2474,9 @@ mips_regno_mode_ok_for_base_p (int regno, machine_mode 
mode,
   if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
 return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
 
+  if (MIPS16_GP_LOADS && regno == GLOBAL_POINTER_REGNUM)
+return (UNITS_PER_WORD > 4 ? GET_MODE_SIZE (mode) <= 4 : true);
+
   return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
 }
 
@@ -2689,7 +2692,8 @@ static bool
 mips16_unextended_reference_p (machine_mode mode, rtx base,
   unsigned HOST_WIDE_INT offset)
 {
-  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0)
+  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0
+  && REGNO (base) != GLOBAL_POINTER_REGNUM)
 {
   if (GET_MODE_SIZE (mode) == 4 && base == stack_pointer_rtx)
return offset < 256U * GET_MODE_SIZE (mode);
@@ -3249,7 +3253,7 @@ mips16_gp_pseudo_reg (void)
 rtx
 mips_pic_base_register (rtx temp)
 {
-  if (!TARGET_MIPS16)
+  if (MIPS16_GP_LOADS ||!TARGET_MIPS16)
 return pic_offset_table_rtx;
 
   if (currently_expanding_to_rtl)
@@ -8756,7 +8760,7 @@ mips_init_relocs (void)
}
 }
 
-  if (TARGET_MIPS16)
+  if (!MIPS16_GP_LOADS && TARGET_MIPS16)
 {
   /* The high part is provided by a pseudo copy of $gp.  */
   mips_split_p[SYMBOL_GP_RELATIVE] = true;
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index a94b253e898..3ec33fbba71 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1375,6 +1375,8 @@ struct mips_cpu_info {
 /* ISA includes the pop instruction.  */
 #define ISA_HAS_POP(TARGET_OCTEON && !TARGET_MIPS16)
 
+#define MIPS16_GP_LOADS(ISA_HAS_MIPS16E2 && !TARGET_64BIT)
+
 /* The CACHE instruction is available in non-MIPS16 code.  */
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
@@ -2067,10 +2069,6 @@ FP_ASM_SPEC "\
function address than to call an address kept in a register.  */
 #define NO_FUNCTION_CSE 1
 
-/* The ABI-defined global pointer.  Sometimes we use a different
-   register in leaf functions: see PIC_OFFSET_TABLE_REGNUM.  */
-#define GLOBAL_POINTER_REGNUM (GP_REG_FIRST + 28)
-
 /* We normally use $28 as the global pointer.  However, when generating
n32/64 PIC, it is better for leaf functions to use a call-clobbered
register instead.  They can then avoid saving and restoring $28
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 5c00ecd50c1..71050ba4f43 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -167,6 +167,7 @@
(GET_FCSR_REGNUM2)
(SET_FCSR_REGNUM4)
(PIC_FUNCTION_ADDR_REGNUM   25)
+   (GLOBAL_POINTER_REGNUM  28)
(RETURN_ADDR_REGNUM 31)
(CPRESTORE_SLOT_REGNUM  76)
(GOT_VERSION_REGNUM 79)
@@ -4678,6 +4679,16 @@
   [(set_attr "alu_type" "add")
(set_attr "mode" "")])
 
+(define_insn "*lowsi_mips16_gp"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+(lo_sum:SI (reg:SI GLOBAL_POINTER_REGNUM)
+ (match_operand 1 "immediate_operand" "")))]
+  "MIPS16_GP_LOADS"
+  "addiu\t%0,$28,%R1"
+  [(set_attr "alu_type" "add")
+   (set_attr "mode" "SI")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*low_mips16"
   [(set (match_operand:P 0 "register_operand" "=d")
(lo_sum:P (match_operand:P 1 "register_operand" "0

[PATCH v4 5/9] MIPS: Add LUI instruction for mips16e2

2023-06-19 Thread Jie Mei
This patch adds the LUI instruction from mips16e2
with a corresponding test.

gcc/ChangeLog:

* config/mips/mips.cc (mips_symbol_insns_1): Generate LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* config/mips/mips.md: Same as above.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.
---
 gcc/config/mips/mips.cc  | 44 ++--
 gcc/config/mips/mips.md  |  2 +-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 22 
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 33a1bada831..cdc04bec217 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2295,7 +2295,9 @@ mips_symbol_insns_1 (enum mips_symbol_type type, 
machine_mode mode)
 The final address is then $at + %lo(symbol).  With 32-bit
 symbols we just need a preparatory LUI for normal mode and
 a preparatory LI and SLL for MIPS16.  */
-  return ABI_HAS_64BIT_SYMBOLS ? 6 : TARGET_MIPS16 ? 3 : 2;
+  return ABI_HAS_64BIT_SYMBOLS
+ ? 6
+ : (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? 3 : 2;
 
 case SYMBOL_GP_RELATIVE:
   /* Treat GP-relative accesses as taking a single instruction on
@@ -2867,7 +2869,7 @@ mips_const_insns (rtx x)
 
   /* This is simply an LUI for normal mode.  It is an extended
 LI followed by an extended SLL for MIPS16.  */
-  return TARGET_MIPS16 ? 4 : 1;
+  return TARGET_MIPS16 ? (ISA_HAS_MIPS16E2 ? 2 : 4) : 1;
 
 case CONST_INT:
   if (TARGET_MIPS16)
@@ -2879,7 +2881,10 @@ mips_const_insns (rtx x)
: SMALL_OPERAND_UNSIGNED (INTVAL (x)) ? 2
: IN_RANGE (-INTVAL (x), 0, 255) ? 2
: SMALL_OPERAND_UNSIGNED (-INTVAL (x)) ? 3
-   : 0);
+   : ISA_HAS_MIPS16E2
+ ? (trunc_int_for_mode (INTVAL (x), SImode) == INTVAL (x)
+? 4 : 8)
+ : 0);
 
   return mips_build_integer (codes, INTVAL (x));
 
@@ -5252,6 +5257,11 @@ mips_output_move (rtx dest, rtx src)
  if (!TARGET_MIPS16)
return "li\t%0,%1\t\t\t# %X1";
 
+ if (ISA_HAS_MIPS16E2
+ && LUI_INT (src)
+ && !SMALL_OPERAND_UNSIGNED (INTVAL (src)))
+   return "lui\t%0,%%hi(%1)\t\t\t# %X1";
+
  if (SMALL_OPERAND_UNSIGNED (INTVAL (src)))
return "li\t%0,%1";
 
@@ -5260,7 +5270,7 @@ mips_output_move (rtx dest, rtx src)
}
 
   if (src_code == HIGH)
-   return TARGET_MIPS16 ? "#" : "lui\t%0,%h1";
+   return (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? "#" : "lui\t%0,%h1";
 
   if (CONST_GP_P (src))
return "move\t%0,%1";
@@ -11983,13 +11993,25 @@ mips_output_function_prologue (FILE *file)
 {
   if (TARGET_MIPS16)
{
- /* This is a fixed-form sequence.  The position of the
-first two instructions is important because of the
-way _gp_disp is defined.  */
- output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
- output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
- output_asm_insn ("sll\t$2,16", 0);
- output_asm_insn ("addu\t$2,$3", 0);
+ if (ISA_HAS_MIPS16E2)
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("lui\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
+ else
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("sll\t$2,16", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
}
   else
{
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index db11646709c..b2ab23dc931 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4634,7 +4634,7 @@
 (define_split
   [(set (match_operand:P 0 "d_operand")
(high:P (match_operand:P 1 "symbolic_operand_with_high")))]
-  "TARGET_MIPS16 && reload_completed"
+  "TARGET_MIPS16 && reload_completed && !ISA_HAS_MIPS16E2"
   [(set (match_dup 0) (unspec:P [(match_dup 1)] UNSPEC_UNSHIFTED_HIGH))
(set (match_dup 0) (ashift:P (match_dup 0) (const_int 16)))])
 
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2.c 
b/gcc/testsuite/gcc.target/mips/mips16e2.c
index ce8b4f1819b..780891b4056 100644
--- a/gcc/testsuite/gcc.target/mips/mips16e2.c

[PATCH v4 7/9] MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2

2023-06-19 Thread Jie Mei
The MIPS16e2 ASE has PREF, LL and SC instructions;
they use a 9-bit immediate, like mips32r6.
Pre-R6 MIPS32 uses a 16-bit immediate.

gcc/ChangeLog:

* config/mips/mips.h(ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.
---
 gcc/config/mips/mips.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index e09a6c60157..05ccd2061c7 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1248,7 +1248,8 @@ struct mips_cpu_info {
 && !TARGET_MIPS16)
 
 /* ISA has data prefetch, LL and SC with limited 9-bit displacement.  */
-#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6)
+#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6  \
+|| ISA_HAS_MIPS16E2)
 
 /* ISA has data indexed prefetch instructions.  This controls use of
'prefx', along with TARGET_HARD_FLOAT and TARGET_DOUBLE_FLOAT.
@@ -1341,7 +1342,8 @@ struct mips_cpu_info {
 #define ISA_HAS_SYNCI (mips_isa_rev >= 2 && !TARGET_MIPS16)
 
 /* ISA includes sync.  */
-#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900) && 
!TARGET_MIPS16)
+#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900)  \
+ && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_SYNC  \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
@@ -1350,7 +1352,8 @@ struct mips_cpu_info {
 /* ISA includes ll and sc.  Note that this implies ISA_HAS_SYNC
because the expanders use both ISA_HAS_SYNC and ISA_HAS_LL_SC
instructions.  */
-#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900 && 
!TARGET_MIPS16)
+#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900  \
+  && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_LL_SC \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
-- 
2.40.1


[PATCH v4 0/9] MIPS: Add MIPS16e2 ASE instrucions.

2023-06-19 Thread Jie Mei
Patch V2: adds a new patch.
Patch V3: `%{mmips16e2} \` was put in the wrong place in the first patch;
V3 fixes it.
Patch V4: fixed style errors in the patch.

The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.

This series of patches adds all instructions from MIPS16E2 ASE
with corresponding tests.

Jie Mei (9):
  MIPS: Add basic support for mips16e2
  MIPS: Add MOVx instructions support for mips16e2
  MIPS: Add instruction about global pointer register for mips16e2
  MIPS: Add bitwise instructions for mips16e2
  MIPS: Add LUI instruction for mips16e2
  MIPS: Add load/store word left/right instructions for mips16e2
  MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2
  MIPS: Add CACHE instruction for mips16e2
  MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain
conditions

 gcc/config/mips/constraints.md|   4 +
 gcc/config/mips/mips-protos.h |   4 +
 gcc/config/mips/mips.cc   | 164 ++--
 gcc/config/mips/mips.h|  33 ++-
 gcc/config/mips/mips.md   | 200 ---
 gcc/config/mips/mips.opt  |   4 +
 gcc/config/mips/predicates.md |  21 +-
 gcc/doc/invoke.texi   |   7 +
 gcc/testsuite/gcc.target/mips/mips.exp|  10 +
 .../gcc.target/mips/mips16e2-cache.c  |  34 +++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c |  68 +
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c   | 101 
 gcc/testsuite/gcc.target/mips/mips16e2.c  | 240 ++
 13 files changed, 826 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

-- 
2.40.1


[PATCH v4 6/9] MIPS: Add load/store word left/right instructions for mips16e2

2023-06-19 Thread Jie Mei
This patch adds the LWL/LWR and SWL/SWR instructions with their
corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc (mips_expand_ins_as_unaligned_store):
Add logic for generating the instructions.
* config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md(mov_l): Generates instructions.
(mov_r): Same as above.
(mov_l): Adjusted for the conditions above.
(mov_r): Same as above.
(mov_l_mips16e2): Add machine description for `define_insn 
mov_l_mips16e2`.
(mov_r_mips16e2): Add machine description for `define_insn 
mov_r_mips16e2`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc  |  15 ++-
 gcc/config/mips/mips.h   |   3 +-
 gcc/config/mips/mips.md  |  43 +++--
 gcc/testsuite/gcc.target/mips/mips16e2.c | 116 +++
 4 files changed, 169 insertions(+), 8 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index cdc04bec217..124a82b6a46 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -8603,12 +8603,25 @@ mips_expand_ins_as_unaligned_store (rtx dest, rtx src, 
HOST_WIDE_INT width,
 return false;
 
   mode = int_mode_for_size (width, 0).require ();
-  src = gen_lowpart (mode, src);
+  if (TARGET_MIPS16
+  && src == const0_rtx)
+src = force_reg (mode, src);
+  else
+src = gen_lowpart (mode, src);
+
   if (mode == DImode)
 {
+  if (TARGET_MIPS16)
+   gcc_unreachable ();
   emit_insn (gen_mov_sdl (dest, src, left));
   emit_insn (gen_mov_sdr (copy_rtx (dest), copy_rtx (src), right));
 }
+  else if (TARGET_MIPS16)
+{
+  emit_insn (gen_mov_swl_mips16e2 (dest, src, left));
+  emit_insn (gen_mov_swr_mips16e2 (copy_rtx (dest), copy_rtx (src),
+  right));
+}
   else
 {
   emit_insn (gen_mov_swl (dest, src, left));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index eefe2aa0910..e09a6c60157 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1180,7 +1180,8 @@ struct mips_cpu_info {
  && (MODE) == V2SFmode))   \
 && !TARGET_MIPS16)
 
-#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 && !TARGET_MIPS16)
+#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 \
+&& (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 
 #define ISA_HAS_IEEE_754_LEGACY(mips_isa_rev <= 5)
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index b2ab23dc931..a0c1ea8e762 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4488,10 +4488,12 @@
(unspec:GPR [(match_operand:BLK 1 "memory_operand" "m")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_LOAD_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "l\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_r"
   [(set (match_operand:GPR 0 "register_operand" "=d")
@@ -4499,17 +4501,20 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_operand:GPR 3 "register_operand" "0")]
UNSPEC_LOAD_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "r\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_l"
   [(set (match_operand:BLK 0 "memory_operand" "=m")
(unspec:BLK [(match_operand:GPR 1 "reg_or_0_operand" "dJ")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_STORE_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "l\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
@@ -4520,11 +4525,37 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_dup 0)]
UNSPEC_STORE_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "r\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
 
+(define_insn "mov_l_mips16e2"
+  [(set (match_operand:BLK 0 "memory_operand" "=m")
+(unspec:BLK [(match_operand:GPR 1 "register_operand" "d")
+(match_operand:QI 2 "memory_operand" "ZC")]
+   UNSPEC_STORE_LEFT))]
+  "TARGET_MIPS16 && ISA_HAS_MIPS16E2
+   && mips_mem_fits_mode_p (m

[PATCH v4 9/9] MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain conditions

2023-06-19 Thread Jie Mei
This patch makes mips16e2 at -O0 act the same as at -O1~3
by generating ZEB/ZEH instead of ANDI,
which shrinks the code size.

gcc/ChangeLog:

* config/mips/mips.md (*and3_mips16): Generate
ZEB/ZEH instructions.
---
 gcc/config/mips/mips.md | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 8a8663a171f..e1beb84a287 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -3357,9 +3357,9 @@
(set_attr "mode" "")])
 
 (define_insn "*and3_mips16"
-  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d,d,d,d")
-   (and:GPR (match_operand:GPR 1 "nonimmediate_operand" 
"%W,W,W,d,0,d,0,0?")
-(match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d,Yx,Yz,K")))]
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d,d,d,d,d,d")
+   (and:GPR (match_operand:GPR 1 "nonimmediate_operand" 
"%0,0,W,W,W,d,0,d,0,0?")
+(match_operand:GPR 2 "and_operand" 
"Yb,Yh,Yb,Yh,Yw,Yw,d,Yx,Yz,K")))]
   "TARGET_MIPS16 && and_operands_ok (mode, operands[1], operands[2])"
 {
   int len;
@@ -3368,38 +3368,42 @@
   switch (which_alternative)
 {
 case 0:
+  return "zeb\t%0";
+case 1:
+  return "zeh\t%0";
+case 2:
   operands[1] = gen_lowpart (QImode, operands[1]);
   return "lbu\t%0,%1";
-case 1:
+case 3:
   operands[1] = gen_lowpart (HImode, operands[1]);
   return "lhu\t%0,%1";
-case 2:
+case 4:
   operands[1] = gen_lowpart (SImode, operands[1]);
   return "lwu\t%0,%1";
-case 3:
+case 5:
   return "#";
-case 4:
+case 6:
   return "and\t%0,%2";
-case 5:
+case 7:
   len = low_bitmask_len (mode, INTVAL (operands[2]));
   operands[2] = GEN_INT (len);
   return "ext\t%0,%1,0,%2";
-case 6:
+case 8:
   mips_bit_clear_info (mode, INTVAL (operands[2]), &pos, &len);
   operands[1] = GEN_INT (pos);
   operands[2] = GEN_INT (len);
   return "ins\t%0,$0,%1,%2";
-case 7:
+case 9:
   return "andi\t%0,%x2";
 default:
   gcc_unreachable ();
 }
 }
-  [(set_attr "move_type" 
"load,load,load,shift_shift,logical,ext_ins,ext_ins,andi")
+  [(set_attr "move_type" 
"andi,andi,load,load,load,shift_shift,logical,ext_ins,ext_ins,andi")
(set_attr "mode" "")
-   (set_attr "extended_mips16" "no,no,no,no,no,yes,yes,yes")
+   (set_attr "extended_mips16" "no,no,no,no,no,no,no,yes,yes,yes")
(set (attr "enabled")
-   (cond [(and (eq_attr "alternative" "7")
+   (cond [(and (eq_attr "alternative" "9")
   (not (match_test "ISA_HAS_MIPS16E2")))
  (const_string "no")
  (and (eq_attr "alternative" "0,1")
-- 
2.40.1


[PATCH v4 8/9] MIPS: Add CACHE instruction for mips16e2

2023-06-19 Thread Jie Mei
This patch adds the CACHE instruction from mips16e2
with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc (mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Define a new macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc   | 25 --
 gcc/config/mips/mips.h|  3 +-
 gcc/config/mips/mips.md   |  3 +-
 .../gcc.target/mips/mips16e2-cache.c  | 34 +++
 4 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 124a82b6a46..b286e927fea 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2845,6 +2845,9 @@ mips_9bit_offset_address_p (rtx x, machine_mode mode)
   return (mips_classify_address (&addr, x, mode, false)
  && addr.type == ADDRESS_REG
  && CONST_INT_P (addr.offset)
+ && (!TARGET_MIPS16E2
+ || M16_REG_P (REGNO (addr.reg))
+ || REGNO (addr.reg) >= FIRST_PSEUDO_REGISTER)
  && MIPS_9BIT_OFFSET_P (INTVAL (addr.offset)));
 }
 
@@ -15412,9 +15415,13 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
The function is available on the current target if !TARGET_MIPS16.
 
BUILTIN_AVAIL_MIPS16
-   The function is available on the current target if TARGET_MIPS16.  */
+   The function is available on the current target if TARGET_MIPS16.
+
+   BUILTIN_AVAIL_MIPS16E2
+   The function is available on the current target if TARGET_MIPS16E2.  */
 #define BUILTIN_AVAIL_NON_MIPS16 1
 #define BUILTIN_AVAIL_MIPS16 2
+#define BUILTIN_AVAIL_MIPS16E2 4
 
 /* Declare an availability predicate for built-in functions that
require non-MIPS16 mode and also require COND to be true.
@@ -15426,6 +15433,17 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
return (COND) ? BUILTIN_AVAIL_NON_MIPS16 : 0;   \
  }
 
+/* Declare an availability predicate for built-in functions that
+   require non-MIPS16 mode or MIPS16E2 and also require COND to be true.
+   NAME is the main part of the predicate's name.  */
+#define AVAIL_MIPS16E2_OR_NON_MIPS16(NAME, COND)   \
+ static unsigned int   \
+ mips_builtin_avail_##NAME (void)  \
+ { \
+   return ((COND) ? BUILTIN_AVAIL_NON_MIPS16 | BUILTIN_AVAIL_MIPS16E2  \
+  : 0);\
+ }
+
 /* Declare an availability predicate for built-in functions that
support both MIPS16 and non-MIPS16 code and also require COND
to be true.  NAME is the main part of the predicate's name.  */
@@ -15471,7 +15489,7 @@ AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dsp_64, TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
-AVAIL_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
+AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
 AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 
 /* Construct a mips_builtin_description from the given arguments.
@@ -17471,7 +17489,8 @@ mips_expand_builtin (tree exp, rtx target, rtx 
subtarget ATTRIBUTE_UNUSED,
   d = &mips_builtins[fcode];
   avail = d->avail ();
   gcc_assert (avail != 0);
-  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16))
+  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16)
+  && (!TARGET_MIPS16E2 || !(avail & BUILTIN_AVAIL_MIPS16E2)))
 {
   error ("built-in function %qE not supported for MIPS16",
 DECL_NAME (fndecl));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 05ccd2061c7..0b6ea78290e 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1386,7 +1386,8 @@ struct mips_cpu_info {
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
 /* The CACHE instruction is available.  */
-#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && !TARGET_MIPS16)
+#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && (!TARGET_MIPS16 \
+   || TARGET_MIPS16E2))
 
 /* Tell collect what flags to pass to nm.  */
 #ifndef NM_FLAGS
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index a0c1ea8e762..8a8663a171f 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5751,7 +5751,8 @@
 (match_operand:QI 1 "address_op
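The availability machinery in the CACHE patch above is a small bitmask test. The sketch below illustrates how the new BUILTIN_AVAIL_MIPS16E2 bit combines with the existing bits; the macro names mirror those in gcc/config/mips/mips.cc, but the predicate itself is a simplified illustration of the check in mips_expand_builtin, not the actual GCC code.

```cpp
#include <cassert>

/* Bits mirroring gcc/config/mips/mips.cc; values as in the patch.  */
#define BUILTIN_AVAIL_NON_MIPS16 1
#define BUILTIN_AVAIL_MIPS16     2
#define BUILTIN_AVAIL_MIPS16E2   4

/* Simplified model of the check in mips_expand_builtin: a built-in is
   usable unless we are compiling MIPS16 code and neither the MIPS16 bit
   nor the MIPS16E2 bit (together with -mmips16e2) allows it.  */
static int
builtin_usable_p (unsigned avail, int target_mips16, int target_mips16e2)
{
  if (!target_mips16)
    return (avail & BUILTIN_AVAIL_NON_MIPS16) != 0;
  if (avail & BUILTIN_AVAIL_MIPS16)
    return 1;
  return (target_mips16e2 && (avail & BUILTIN_AVAIL_MIPS16E2)) ? 1 : 0;
}
```

With the patch, the cache built-in's mask becomes BUILTIN_AVAIL_NON_MIPS16 | BUILTIN_AVAIL_MIPS16E2, so it stays rejected for plain MIPS16 code but is accepted once -mmips16e2 is enabled.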

[PATCH v4 1/9] MIPS: Add basic support for mips16e2

2023-06-19 Thread Jie Mei
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).

This patch adds basic support for mips16e2 used by the
following series of patches.

gcc/ChangeLog:

* config/mips/mips.cc (mips_file_start): Add mips16e2 info
for output file.
* config/mips/mips.h (__mips_mips16e2): Define a new
predefined macro.
(ISA_HAS_MIPS16E2): Define a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle the recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.
---
 gcc/config/mips/mips.cc|  3 ++-
 gcc/config/mips/mips.h |  8 
 gcc/config/mips/mips.opt   |  4 
 gcc/config/mips/predicates.md  |  2 +-
 gcc/doc/invoke.texi|  7 +++
 gcc/testsuite/gcc.target/mips/mips.exp | 10 ++
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..585a3682c7b 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -10047,7 +10047,8 @@ mips_file_start (void)
 fputs ("\t.module\tmsa\n", asm_out_file);
   if (TARGET_XPA)
 fputs ("\t.module\txpa\n", asm_out_file);
-  /* FIXME: MIPS16E2 is not supported by GCC? gas does support it */
+  if (TARGET_MIPS16E2)
+fputs ("\t.module\tmips16e2\n", asm_out_file);
   if (TARGET_CRC)
 fputs ("\t.module\tcrc\n", asm_out_file);
   if (TARGET_GINV)
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 6daf6d37165..c6781670a54 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -475,6 +475,9 @@ struct mips_cpu_info {
   if (mips_base_compression_flags & MASK_MIPS16)   \
builtin_define ("__mips16");\
\
+  if (TARGET_MIPS16E2) \
+   builtin_define ("__mips_mips16e2"); \
+   \
   if (TARGET_MIPS3D)   \
builtin_define ("__mips3d");\
\
@@ -1291,6 +1294,10 @@ struct mips_cpu_info {
 /* The MSA ASE is available.  */
 #define ISA_HAS_MSA(TARGET_MSA && !TARGET_MIPS16)
 
+/* The MIPS16e V2 instructions are available.  */
+#define ISA_HAS_MIPS16E2   (TARGET_MIPS16 && TARGET_MIPS16E2 \
+   && !TARGET_64BIT)
+
 /* True if the result of a load is not available to the next instruction.
A nop will then be needed between instructions like "lw $4,..."
and "addiu $4,$4,1".  */
@@ -1450,6 +1457,7 @@ struct mips_cpu_info {
 %{msym32} %{mno-sym32} \
 %{mtune=*}" \
 FP_ASM_SPEC "\
+%{mmips16e2} \
 %(subtarget_asm_spec)"
 
 /* Extra switches sometimes passed to the linker.  */
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 195f5be01cc..4968ed0d544 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -380,6 +380,10 @@ msplit-addresses
 Target Mask(SPLIT_ADDRESSES)
 Optimize lui/addiu address loads.
 
+mmips16e2
+Target Var(TARGET_MIPS16E2) Init(0)
+Enable the MIPS16e V2 instructions.
+
 msym32
 Target Var(TARGET_SYM32)
 Assume all symbols have 32-bit values.
diff --git a/gcc/config/mips/predicates.md b/gcc/config/mips/predicates.md
index e34de2937cc..87460a64652 100644
--- a/gcc/config/mips/predicates.md
+++ b/gcc/config/mips/predicates.md
@@ -369,7 +369,7 @@
 {
   /* When generating mips16 code, TARGET_LEGITIMATE_CONSTANT_P rejects
  CONST_INTs that can't be loaded using simple insns.  */
-  if (TARGET_MIPS16)
+  if (TARGET_MIPS16 && !TARGET_MIPS16E2)
 return false;
 
   /* Don't handle multi-word moves this way; we don't want to introduce
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee78591c73e..1fa00594a91 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26735,6 +26735,13 @@ MIPS16 code generation can also be controlled on a 
per-function basis
 by means of @code{mips16} and @code{nomips16} attributes.
 @xref{Function Attributes}, for more information.
 
+@opindex mmips16e2
+@opindex mno-mips16e2
+@item -mmips16e2
+@itemx -mno-mips16e2
+Use (do not use) the MIPS16e2 ASE.  This option modifies the behavior
+of the @option{-mips16} option such that it targets th

[PATCH v4 2/9] MIPS: Add MOVx instructions support for mips16e2

2023-06-19 Thread Jie Mei
This patch adds the MOVx instructions from mips16e2
(movn, movz, movtn, movtz) with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_CONDMOVE): Add condition for
ISA_HAS_MIPS16E2.
* config/mips/mips.md (*mov_on_): Add logic for
MOVx insts.
(*mov_on__mips16e2): Generate MOVx instruction.
(*mov_on__ne): Add logic for MOVx insts.
(*mov_on__ne_mips16e2): Generate MOVx instruction.
* config/mips/predicates.md (reg_or_0_operand_mips16e2): New predicate
for MOVx insts.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.
---
 gcc/config/mips/mips.h|  1 +
 gcc/config/mips/mips.md   | 38 ++-
 gcc/config/mips/predicates.md |  6 ++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c | 68 +++
 4 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index c6781670a54..a94b253e898 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1081,6 +1081,7 @@ struct mips_cpu_info {
ST Loongson 2E/2F.  */
 #define ISA_HAS_CONDMOVE(ISA_HAS_FP_CONDMOVE   \
 || TARGET_MIPS5900 \
+|| ISA_HAS_MIPS16E2\
 || TARGET_LOONGSON_2EF)
 
 /* ISA has LDC1 and SDC1.  */
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..5c00ecd50c1 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7341,26 +7341,60 @@
 (const_int 0)])
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 mov%T4\t%0,%z2,%1
 mov%t4\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+   (if_then_else:GPR
+(match_operator 4 "equality_operator"
+   [(match_operand:MOVECC 1 "register_operand" 
",,t,t")
+(const_int 0)])
+(match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+(match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+  "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+mov%T4\t%0,%z2,%1
+mov%t4\t%0,%z3,%1
+movt%T4\t%0,%z2
+movt%t4\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on__ne"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
(if_then_else:GPR
 (match_operand:GPR2 1 "register_operand" ",")
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 movn\t%0,%z2,%1
 movz\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__ne_mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+  (if_then_else:GPR
+   (match_operand:GPR2 1 "register_operand" 
",,t,t")
+   (match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+   (match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+ "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+movn\t%0,%z2,%1
+movz\t%0,%z3,%1
+movtn\t%0,%z2
+movtz\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on_"
   [(set (match_operand:SCALARF 0 "register_operand" "=f,f")
(if_then_else:SCALARF
diff --git a/gcc/config/mips/predicates.md b/gcc/config/mips/predicates.md
index 87460a64652..4f9458ed72f 100644
--- a/gcc/config/mips/predicates.md
+++ b/gcc/config/mips/predicates.md
@@ -114,6 +114,12 @@
(not (match_test "TARGET_MIPS16")))
(match_operand 0 "register_operand")))
 
+(define_predicate "reg_or_0_operand_mips16e2"
+  (ior (and (match_operand 0 "const_0_operand")
+   (ior (not (match_test "TARGET_MIPS16"))
+ (match_test "ISA_HAS_MIPS16E2")))
+   (match_operand 0 "register_operand")))
+
 (define_predicate "const_1_operand"
   (and (match_code "const_int,const_double,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c 
b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
new file mode 100644
index 000..6e9dd82ebf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
@@ -0,0 +1,68 @@
+/* { dg-options "-mno-abicalls -mgpopt -G8 -mabi=32 -mips16 -mmips16e2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+/* Test MOVN.  */
+
+

[PATCH v4 4/9] MIPS: Add bitwise instructions for mips16e2

2023-06-19 Thread Jie Mei
There are shortened bitwise instructions in the mips16e2 ASE,
for instance ANDI, ORI/XORI, EXT, INS, etc.

This patch adds these instructions with corresponding tests.

gcc/ChangeLog:

* config/mips/constraints.md (Yz): New constraint for mips16e2.
* config/mips/mips-protos.h (mips_bit_clear_p): Declare new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc (mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h (ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (extended_mips16): Generates EXT and INS
instructions.
(*and3): Generates INS instruction.
(*and3_mips16): Generates EXT, INS and ANDI instructions.
(ior3): Add logic for ORI instruction.
(*ior3_mips16_asmacro): Generates ORI instruction.
(*ior3_mips16): Add logic for XORI instruction.
(*xor3_mips16): Generates XORI instruction.
(*extzv): Add logic for EXT instruction.
(*insv): Add logic for INS instruction.
* config/mips/predicates.md (bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/constraints.md   |   4 +
 gcc/config/mips/mips-protos.h|   4 +
 gcc/config/mips/mips.cc  |  67 ++-
 gcc/config/mips/mips.h   |   3 +-
 gcc/config/mips/mips.md  |  91 
 gcc/config/mips/predicates.md|  13 ++-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 102 +++
 7 files changed, 263 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

diff --git a/gcc/config/mips/constraints.md b/gcc/config/mips/constraints.md
index 49d1a43c613..22d4d84f074 100644
--- a/gcc/config/mips/constraints.md
+++ b/gcc/config/mips/constraints.md
@@ -264,6 +264,10 @@
   (and (match_code "const_vector")
(match_test "op == CONST0_RTX (mode)")))
 
+(define_constraint "Yz"
+  "@internal"
+  (match_operand 0 "bit_clear_operand"))
+
 (define_constraint "YA"
   "@internal
An unsigned 6-bit constant."
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..2791b9f220a 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,8 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern bool mips_bit_clear_p (enum machine_mode, unsigned HOST_WIDE_INT);
+extern void mips_bit_clear_info (enum machine_mode, unsigned HOST_WIDE_INT,
+ int *, int *);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index be470bbb50d..33a1bada831 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3895,6 +3895,10 @@ mips16_constant_cost (int code, HOST_WIDE_INT x)
return 0;
   return -1;
 
+case ZERO_EXTRACT:
+  /* The bit position and size are immediate operands.  */
+  return ISA_HAS_EXT_INS ? COSTS_N_INSNS (1) : -1;
+
 default:
   return -1;
 }
@@ -22753,7 +22757,68 @@ mips_asm_file_end (void)
   if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
-
+
+void
+mips_bit_clear_info (enum machine_mode mode, unsigned HOST_WIDE_INT m,
+ int *start_pos, int *size)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+  unsigned int end_pos = GET_MODE_SIZE (mode) * BITS_PER_UNIT;
+
+  for (shift = 0 ; shift < (GET_MODE_SIZE (mode) * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+  if (curr_val != prev_val)
+   {
+ change_count++;
+ switch (change_count)
+   {
+ case 1:
+   *start_pos = shift;
+   break;
+ case 2:
+   end_pos = shift;
+   break;
+ default:
+   gcc_unreachable ();
+   }
+   }
+  prev_val = curr_val;
+   }
+  *size = (end_pos - *start_pos);
+}
+
+bool
+mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+
+  if (mode != SImode && mode != VOIDmode)
+return false;
+
+  if (!ISA_HAS_EXT_INS)
+return false;
+
+  for (shift = 0 ; shift < (UNITS_PER_WORD * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+  if (curr_val != prev_val)
+   chan
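The mips_bit_clear_p/mips_bit_clear_info pair above recognizes AND masks whose zero bits form one contiguous field, so that "x & m" can be emitted as an INS of $0 into that field. A compact way to state the same test is sketched below; this is an illustration only, not the GCC implementation, which instead walks the bits and counts value transitions.

```cpp
#include <cassert>
#include <stdint.h>

/* Return true if the 32-bit mask M clears exactly one contiguous bit
   field, and report its position and size.  Such an "x & m" can be done
   by inserting zeros: INS dst, $0, pos, size.  */
static bool
single_clear_field_p (uint32_t m, int *pos, int *size)
{
  uint32_t inv = ~m;                 /* the cleared bits, as a set run */
  if (inv == 0)
    return false;                    /* nothing is cleared */
  int start = __builtin_ctz (inv);
  uint32_t run = inv >> start;
  /* One contiguous field iff RUN looks like 0...01...1.  */
  if ((run & (run + 1)) != 0)
    return false;
  *pos = start;
  *size = __builtin_popcount (inv);
  return true;
}
```

For example, 0xffff00ff qualifies (one 8-bit field cleared at position 8), while 0x00ff00ff does not (two separate cleared fields).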

[PATCH] Fix build of aarch64

2023-06-19 Thread Richard Biener via Gcc-patches
The following fixes a reference to LOOP_VINFO_MASKS array in the
aarch64 backend after my changes.

Building on aarch64-linux, will push if that succeeds.

Richard.

* config/aarch64/aarch64.cc
(aarch64_vector_costs::analyze_loop_vinfo): Fix reference
to LOOP_VINFO_MASKS.
---
 gcc/config/aarch64/aarch64.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index df37bde6a78..ee37ceaa255 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16256,7 +16256,8 @@ aarch64_vector_costs::analyze_loop_vinfo (loop_vec_info 
loop_vinfo)
   unsigned int num_masks = 0;
   rgroup_controls *rgm;
   unsigned int num_vectors_m1;
-  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
+  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo).rgc_vec,
+   num_vectors_m1, rgm)
if (rgm->type)
  num_masks += num_vectors_m1 + 1;
   for (auto &ops : m_ops)
-- 
2.35.3


Re: [PATCH 2/2] rust: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-19 Thread Thomas Schwinge
Hi Paul!

On 2023-06-16T11:00:02-0500, "Paul E. Murphy via Gcc-patches" 
 wrote:
> This was noticed when fixing the gccgo usage of the macro, the
> rust usage is very similar.
>
> TARGET_AIX is defined as a non-zero value on linux/powerpc64le
> which may cause unexpected behavior.  TARGET_AIX_OS should be
> used to toggle AIX specific behavior.
>
> gcc/rust/ChangeLog:
>
>   * rust-object-export.cc [TARGET_AIX]: Rename and update
>   usage to TARGET_AIX_OS.

I don't have rights to formally approve this GCC/Rust change, but I'll
note that it follows "as obvious" (see
, "Obvious fixes") to the
corresponding GCC/Go change, which has been approved:
,
and which is where this GCC/Rust code has been copied from, so I suggest
you push both patches at once.


Grüße
 Thomas


> ---
>  gcc/rust/rust-object-export.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/rust/rust-object-export.cc b/gcc/rust/rust-object-export.cc
> index 1143c767784..f9a395f6964 100644
> --- a/gcc/rust/rust-object-export.cc
> +++ b/gcc/rust/rust-object-export.cc
> @@ -46,8 +46,8 @@
>  #define RUST_EXPORT_SECTION_NAME ".rust_export"
>  #endif
>
> -#ifndef TARGET_AIX
> -#define TARGET_AIX 0
> +#ifndef TARGET_AIX_OS
> +#define TARGET_AIX_OS 0
>  #endif
>
>  /* Return whether or not GCC has reported any errors.  */
> @@ -91,7 +91,7 @@ rust_write_export_data (const char *bytes, unsigned int 
> size)
>  {
>gcc_assert (targetm_common.have_named_sections);
>sec = get_section (RUST_EXPORT_SECTION_NAME,
> -  TARGET_AIX ? SECTION_EXCLUDE : SECTION_DEBUG, NULL);
> +  TARGET_AIX_OS ? SECTION_EXCLUDE : SECTION_DEBUG, NULL);
>  }
>
>switch_to_section (sec);
> --
> 2.31.1
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Li, Pan2 wrote:

> Adding Richard Biener for review; sorry for the inconvenience.
> 
> Pan
> 
> -Original Message-
> From: Li, Pan2  
> Sent: Monday, June 19, 2023 4:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; Li, 
> Pan2 ; Wang, Yanzhang ; 
> kito.ch...@gmail.com
> Subject: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init
> 
> From: Pan Li 
> 
> We already extended the machine mode from 8 to 16 bits, but there is still
> one place missing in the tree-streamer: it has a hard-coded array of
> size 256 for the machine modes.
> 
> In the lto pass, we memset the array with a count of MAX_MACHINE_MODE, but
> the value of MAX_MACHINE_MODE will grow as more and more modes are added,
> while the machine mode array in the tree-streamer is still left at 256.
> 
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will unexpectedly touch memory out of range.
> 
> This patch takes MAX_MACHINE_MODE as the size of the array in the
> tree-streamer, to make sure there is no potential unexpected memory
> access in the future.

You also have to fix bp_pack_machine_mode/bp_unpack_machine_mode which
streams exactly values in [0, 1<<8 - 1].

CCing Jakub who invented this code.

Richard.


> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * tree-streamer.cc (streamer_mode_table): Use MAX_MACHINE_MODE
>   as array size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
> ---
>  gcc/tree-streamer.cc | 2 +-
>  gcc/tree-streamer.h  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
> index ed65a7692e3..a28ef9c7920 100644
> --- a/gcc/tree-streamer.cc
> +++ b/gcc/tree-streamer.cc
> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> During streaming in, we translate the on the disk mode using this
> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
> depending on the mode_table content.  */
> -unsigned char streamer_mode_table[1 << 8];
> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  
>  /* Check that all the TS_* structures handled by the streamer_write_* and
> streamer_read_* routines are exactly ALL the structures defined in
> diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
> index 170d61cf20b..51a292c8d80 100644
> --- a/gcc/tree-streamer.h
> +++ b/gcc/tree-streamer.h
> @@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
>  void streamer_write_integer_cst (struct output_block *, tree);
>  
>  /* In tree-streamer.cc.  */
> -extern unsigned char streamer_mode_table[1 << 8];
> +extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>hashval_t, unsigned *);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
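To make the out-of-range access concrete: the old table was declared with a hard-coded 256-entry size while the clearing code uses MAX_MACHINE_MODE, so any target whose mode count exceeds 256 writes past the array. A minimal sketch of the before/after shape follows; the 300 here is a made-up stand-in for a MAX_MACHINE_MODE that has grown past 256.

```cpp
#include <cassert>
#include <string.h>

/* Stand-in for a MAX_MACHINE_MODE that has grown past 256, as happens
   once many new (e.g. RVV vector) modes are added.  */
#define MAX_MACHINE_MODE 300

/* Old declaration: unsigned char streamer_mode_table[1 << 8];
   memset (streamer_mode_table, 0, MAX_MACHINE_MODE) would then write
   300 bytes into a 256-byte object -- an out-of-bounds write.

   Fixed declaration, sized by the real mode count: */
static unsigned char streamer_mode_table[MAX_MACHINE_MODE];

static size_t
init_mode_table (void)
{
  memset (streamer_mode_table, 0, MAX_MACHINE_MODE);  /* now in bounds */
  return sizeof (streamer_mode_table);
}
```

As Richard notes above, the on-disk packing helpers also assume values fit in 8 bits, so sizing the table alone is not the whole fix.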


Re: Optimize std::max early

2023-06-19 Thread Richard Biener via Gcc-patches
On Sun, Jun 18, 2023 at 5:55 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> we currently produce very bad code on loops using std::vector as a stack,
> since we fail to inline push_back, which in turn prevents SRA, and we fail
> to optimize out some store-to-load pairs (PR109849).
>
> I looked into why this function is not inlined here while clang inlines it.  We
> currently estimate it at 66 instructions, and the inline limits are 15 at -O2
> and 30 at -O3.  Clang has a similar estimate, but still decides to inline at -O2.
>
> I looked into the reason why the body is so large, and one problem I spotted is
> the way std::max is implemented, taking and returning references to the values.
>
>   const T& max( const T& a, const T& b );
>
> This makes it necessary to store the values to memory and load them later.
> (Max is used by the code computing the new size of a vector on resize.)
> Two stores, a conditional and a load account for 8 instructions, while a
> MAX_EXPR counts as 1 and has a much better chance to fold with the
> surrounding code.
>
> We optimize this to MAX_EXPR, but only during late optimizations.  I think this
> is a common enough coding pattern that we ought to make it transparent to
> early opts and IPA.  The following is the easiest fix: it simply adds the
> phiprop pass, which turns a PHI of address values into a PHI of values, so
> that later FRE can propagate values across memory, phiopt discovers the
> MAX_EXPR pattern, and DSE removes the memory stores.
>
> Bootstrapped/regtested on x86_64-linux; does this look like a reasonable thing
> to do?

OK.

Thanks,
Richard.

> Looking into how expensive the pass is, I think it is very cheap, except
> that it computes postdominators and updates SSA even if no patterns
> are matched.  I will send a patch to avoid that.
>
> gcc/ChangeLog:
>
> PR tree-optimization/109811
> PR tree-optimization/109849
> * passes.def: Add phiprop to early optimization passes.
> * tree-ssa-phiprop.cc: Allow cloning.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/109811
> PR tree-optimization/109849
> * gcc.dg/tree-ssa/phiprop-1.c: New test.
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c9a8f19747b..faa5208b26b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -88,6 +88,8 @@ along with GCC; see the file COPYING3.  If not see
>   /* pass_build_ealias is a dummy pass that ensures that we
>  execute TODO_rebuild_alias at this point.  */
>   NEXT_PASS (pass_build_ealias);
> + /* Do phiprop before FRE so we optimize std::min and std::max well. 
>  */
> + NEXT_PASS (pass_phiprop);
>   NEXT_PASS (pass_fre, true /* may_iterate */);
>   NEXT_PASS (pass_early_vrp);
>   NEXT_PASS (pass_merge_phi);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
> new file mode 100644
> index 000..9f52c2a7298
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } 
> */
> +int max(int a, int b)
> +{
> +int *ptr;
> +if (a > b)
> +  ptr = &a;
> +else
> +  ptr = &b;
> +return *ptr;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
> "phiprop1"} } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */
> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> index 3cb4900b6be..5dc505df420 100644
> --- a/gcc/tree-ssa-phiprop.cc
> +++ b/gcc/tree-ssa-phiprop.cc
> @@ -476,6 +476,7 @@ public:
>{}
>
>/* opt_pass methods: */
> +  opt_pass * clone () final override { return new pass_phiprop (m_ctxt); }
>bool gate (function *) final override { return flag_tree_phiprop; }
>unsigned int execute (function *) final override;
>
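The reference-returning std::max shape discussed above, and the value form that phiprop plus phiopt effectively recover, can be seen side by side. This is a sketch of the two shapes involved, not the GCC pass itself.

```cpp
#include <cassert>

// Reference form, as in std::max: the result is a "PHI of addresses"
// (&a or &b), which forces a and b to live in memory and blocks SRA
// during early optimizations.
static const int &
ref_max (const int &a, const int &b)
{
  return a < b ? b : a;
}

// Value form: what the rewrite boils down to -- a single MAX_EXPR that
// folds freely with the surrounding code.
static int
val_max (int a, int b)
{
  return a < b ? b : a;
}
```

The new phiprop-1.c testcase in the patch checks exactly this rewrite: the pointer-PHI version of max must end up as one MAX_EXPR by release_ssa time.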


[committed] vect: Restore aarch64 bootstrap

2023-06-19 Thread Richard Sandiford via Gcc-patches
Spot-tested on aarch64-linux-gnu, pushed as obvious.

Richard


gcc/
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Handle null niters_skip.
---
 gcc/tree-vect-loop-manip.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 213d248b485..20f570e4a0d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -820,7 +820,8 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   tree ni_actual_type = TREE_TYPE (niters);
   unsigned int ni_actual_precision = TYPE_PRECISION (ni_actual_type);
   tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
-  niters_skip = gimple_convert (&preheader_seq, compare_type, niters_skip);
+  if (niters_skip)
+niters_skip = gimple_convert (&preheader_seq, compare_type, niters_skip);
 
   /* Convert NITERS to the same size as the compare.  */
   if (compare_precision > ni_actual_precision
-- 
2.25.1



Re: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-19 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 19, 2023 at 3:09 PM Jan Beulich via Gcc-patches
 wrote:
>
> On 19.06.2023 04:07, Liu, Hongtao wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Friday, June 16, 2023 2:22 PM
> >>
> >> --- a/gcc/config/i386/sse.md
> >> +++ b/gcc/config/i386/sse.md
> >> @@ -12597,11 +12597,11 @@
> >> (set_attr "mode" "")])
> >>
> >>  (define_insn "*_vternlog_all"
> >> -  [(set (match_operand:V 0 "register_operand" "=v")
> >> +  [(set (match_operand:V 0 "register_operand" "=v,v")
> >>  (unspec:V
> >> -  [(match_operand:V 1 "register_operand" "0")
> >> -   (match_operand:V 2 "register_operand" "v")
> >> -   (match_operand:V 3 "bcst_vector_operand" "vmBr")
> >> +  [(match_operand:V 1 "register_operand" "0,0")
> >> +   (match_operand:V 2 "register_operand" "v,v")
> >> +   (match_operand:V 3 "bcst_vector_operand" "vBr,m")
> >> (match_operand:SI 4 "const_0_to_255_operand")]
> >>UNSPEC_VTERNLOG))]
> >>"TARGET_AVX512F
> > Change condition to  == 64 || TARGET_AVX512VL || (TARGET_AVX512F 
> > && !TARGET_PREFER_AVX256)
>
> May I ask why you think this is necessary? The condition of the insn
> already wasn't in sync with the condition used in all three splitters,
I think it's a latent bug in the original
*_vternlog_all; it should be  == 64 ||
TARGET_AVX512VL instead of TARGET_AVX512F, since TARGET_AVX512VL is
needed for 128/256-bit vectors. The bug isn't exposed since the
pattern is only generated by those 3 splitters, which are guarded by
TARGET_AVX512VL. But we can just fix this to make the pattern exactly
correct.
> and I didn't see any reason why now they would need to be brought in
> sync. First and foremost because of the use of the UNSPEC (equally
> before and after this patch).
>
> Furthermore, isn't it the case that I'm already mostly expressing
> this with the "enabled" attribute? At the very least I think I
> should drop that again then if following your request?
You only handle alternative 1, but for alternative 0 it is still
enabled when TARGET_PREFER_AVX256 && !TARGET_AVX512VL for 128/256-bit
vectors. You don't need to drop that; alternative 1 still needs
 == 64 || TARGET_AVX512VL, since a memory_operand can't be
used by the zmm instruction for 128/256-bit vectors.
>
> > Also please add a testcase for case TARGET_AVX512F && !TARGET_PREFER_AVX256.
>
> Especially in a case like this one I'm wondering about the usefulness
> of a contrived testcase: It won't test more than one minor sub-case of
> the whole set of constructs covered here. But well, here as well as
> for the other change I'll invent something.
We don't need all sub-cases; one is enough to guard that your
optimization won't be broken by a later commit,
i.e.
typedef int v4si __attribute((vector_size(16)));

v4si
foo (v4si a, v4si b, v4si c)
{
return (a & b) | c;
}

We can now generate vpternlog with -mavx512f -O2 -mno-avx512vl
-mprefer-vector-width=512.

>
> Jan



-- 
BR,
Hongtao


[committed] Doc update: -foffload-options= examples + OpenMP in Fortran intrinsic modules

2023-06-19 Thread Tobias Burnus

Since r14-1807-g4bcb46b3ade179, the options -foffload-options='-lgfortran -lm'
are no longer required, as those libraries get automatically linked on the
offload side if they are linked on the host side. (Linking with g++ implies
-lm, with gfortran '-lgfortran -lm', while an explicit 'gcc -lm -lgfortran'
would also do the autolinking.) [Note: This only affects those two libraries.]

As that reduced the number of useful flags in the example, new ones were
added, including -foffload-options=-O3. While possible, I believe that flag
is misleading, as it implies that no optimization is done unless an -O...
flag has been passed to the run-time library.

But, like -lm and -lgfortran now, the -O... flags are also automatically
passed on from the host flags. (While they can be overridden, this is usually
not required.) As the previous line already shows an example, i.e.
  "-foffload-options='-fno-math-errno -ffinite-math-only'",
I think we can get rid of this line without needing to find another example.

* * *

Looking at Fortran's fortran/intrinsic.texi, I saw references to 4.5 and 5.0;
I have now added 5.1 and 5.2 – hopefully, we can soon (GCC 15?) replace the
v4.5 to 5.2 references by just 5.2, having implemented all of 5.x.

(The reference to v4.5 in GCC's and gfortran's invoke.texi also feels odd,
as v5.* is supported and the attribute syntax shown for C/C++ is only in
v5.x. To be changed - but not now.)


Additionally, fortran/intrinsic.texi listed the content of the OMP_* modules
(except for the API routines) but missed two recently added named constants,
which I now added.

Committed as r14-1936-ge9c1679c350be0.

Like always, comments are highly welcome!

Tobias
commit e9c1679c350be09cec5354a3d98915c3afe02c87
Author: Tobias Burnus 
Date:   Mon Jun 19 10:24:08 2023 +0200

Doc update: -foffload-options= examples + OpenMP in Fortran intrinsic modules

With LTO, the -O.. flags of the host are passed on to the lto compiler, which
also includes offloading compilers. Therefore, using -foffload-options=-O3 is
misleading, as it implies that the default optimizations are not used without
it. Hence, this flag has now been removed from the usage examples.

The Fortran documentation lists the content (except for the API routines)
of the intrinsic OpenMP modules OMP_LIB and OMP_LIB_KINDS; this commit adds
two missing named constants and also links to the OpenMP 5.1 and 5.2
specs for completeness.

gcc/ChangeLog:

* doc/invoke.texi (-foffload-options): Remove '-O3' from the examples.

gcc/fortran/ChangeLog:

* intrinsic.texi (OpenMP Modules OMP_LIB and OMP_LIB_KINDS): Also
add references to the OpenMP 5.1 and 5.2 spec; add omp_initial_device
and omp_invalid_device named constants.
---
 gcc/doc/invoke.texi|  2 +-
 gcc/fortran/intrinsic.texi | 20 
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fafdee30f66..215ab0dd05c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2718,7 +2718,7 @@ Typical command lines are
 
 @smallexample
 -foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
--foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-O3
+-foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
 @opindex fopenacc
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index db227ea..6c7ad03a02c 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -15247,8 +15247,9 @@ with the following options: @code{-fno-unsafe-math-optimizations
 @table @asis
 @item @emph{Standard}:
 OpenMP Application Program Interface v4.5,
-OpenMP Application Program Interface v5.0 (partially supported) and
-OpenMP Application Program Interface v5.1 (partially supported).
+OpenMP Application Program Interface v5.0 (partially supported),
+OpenMP Application Program Interface v5.1 (partially supported) and
+OpenMP Application Program Interface v5.2 (partially supported).
 @end table
 
 The OpenMP Fortran runtime library routines are provided both in
@@ -15262,9 +15263,13 @@ below.
 
 For details refer to the actual
 @uref{https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf,
-OpenMP Application Program Interface v4.5} and
+OpenMP Application Program Interface v4.5},
 @uref{https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
-OpenMP Application Program Interface v5.0}.
+OpenMP Application Program Interface v5.0},
+@uref{https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-1.pdf,
+OpenMP Application Program Interface v5.1} and
+@uref{https://www.openmp.org/wp-content/uplo

Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this was suggested earlier somewhere, but I cannot find the thread.
> C++ has an assume attribute that expands into
>   if (conditional)
> __builtin_unreachable ()
> We do not want to account the conditional in inline heuristics since
> we know that it is going to be optimized out.
>
> Bootstrapped/regtested x86_64-linux, will commit it later today if
> there are no complaints.

I think we also had the request to not account the condition feeding
stmts (if they only feed it and have no side-effects).  libstdc++ has
complex range comparisons here.  Also ...

> gcc/ChangeLog:
>
> * ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
> (analyze_function_body): Do not account conditionals guarding
> builtin_unreachable calls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/ipa/fnsummary-1.c: New test.
>
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index a5f5a50c8a5..987da29ec34 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
>return false;
>  }
>
> +/* Return true if BB is builtin_unreachable.
> +   We skip empty basic blocks, debug statements, clobbers and predicts.
> +   CACHE is used to memoize already analyzed blocks.  */
> +
> +static bool
> +builtin_unreachable_bb_p (basic_block bb, vec &cache)
> +{
> +  if (cache[bb->index])
> +return cache[bb->index] - 1;
> +  gimple_stmt_iterator si;
> +  auto_vec  visited_bbs;
> +  bool ret = false;
> +  while (true)
> +{
> +  bool empty_bb = true;
> +  visited_bbs.safe_push (bb);
> +  cache[bb->index] = 3;
> +  for (si = gsi_start_nondebug_bb (bb);
> +  !gsi_end_p (si) && empty_bb;
> +  gsi_next_nondebug (&si))
> +   {
> + if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
> + && !gimple_clobber_p (gsi_stmt (si))
> + && !gimple_nop_p (gsi_stmt (si)))
> +   {
> + empty_bb = false;
> + break;
> +   }
> +   }
> +  if (!empty_bb)
> +   break;
> +  else
> +   bb = single_succ_edge (bb)->dest;
> +  if (cache[bb->index])
> +   {
> + ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
> + goto done;
> +   }
> +}
> +  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
> +  || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))

... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

> +ret = true;
> +done:
> +  for (basic_block vbb:visited_bbs)
> +cache[vbb->index] = (unsigned char)ret + 1;
> +  return ret;
> +}
> +
>  /* Analyze function body for NODE.
> EARLY indicates run from early optimization pipeline.  */
>
> @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool 
> early)
>const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> +  auto_vec cache;
> +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));

A sbitmap with two bits per entry would be more space efficient here.  bitmap
has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
adding the corresponding to sbitmap.h would likely ease use there as well.
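For illustration, a cache with two bits per entry could be packed like this (a hand-rolled sketch of the idea, not sbitmap's actual interface):

```cpp
#include <cstdint>
#include <vector>
#include <cstddef>

// Sketch of a two-bits-per-entry cache packed into 64-bit words, enough
// for the four states the walk needs (0 = unknown, 1 = false, 2 = true,
// 3 = on the current walk).  Names and layout are illustrative only.
struct two_bit_cache
{
  std::vector<uint64_t> words;
  explicit two_bit_cache (std::size_t n) : words ((n + 31) / 32, 0) {}

  unsigned get (std::size_t i) const
  { return (words[i / 32] >> (2 * (i % 32))) & 3; }

  void set (std::size_t i, unsigned v)
  {
    uint64_t &w = words[i / 32];
    std::size_t shift = 2 * (i % 32);
    w = (w & ~(uint64_t (3) << shift)) | (uint64_t (v & 3) << shift);
  }
};
```

This stores 32 entries per 64-bit word, an 4x space saving over the one-byte-per-block vector in the patch.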

>for (n = 0; n < nblocks; n++)
>  {
>bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> @@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool 
> early)
> }
> }
>
> + /* Conditionals guarding __builtin_unreachable will be
> +optimized out.  */
> + if (gimple_code (stmt) == GIMPLE_COND)
> +   {
> + edge_iterator ei;
> + edge e;
> + FOR_EACH_EDGE (e, ei, bb->succs)
> +   if (builtin_unreachable_bb_p (e->dest, cache))
> + {
> +   if (dump_file)
> + fprintf (dump_file,
> +  "\t\tConditional guarding 
> __builtin_unreachable; ignored\n");
> +   this_time = 0;
> +   this_size = 0;
> +   break;
> + }
> +   }
> +
>   /* TODO: When conditional jump or switch is known to be constant, 
> but
>  we did not translate it into the predicates, we really can 
> account
>  just maximum of the possible paths.  */
> diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c 
> b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> new file mode 100644
> index 000..a0ece0c300b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-options "-O2 -fdump-ipa-fnsummary"  } */
> +int
> +test(int a)
> +{
> +   if (a > 10)
> +   __builtin_unreachable ();
> +}
> +/* { dg-final { scan-ipa-dump "Conditional guarding

[PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Pan Li via Gcc-patches
From: Pan Li 

We already extended the machine mode from 8 to 16 bits, but there is
still one place missing in the tree-streamer, which has a hard-coded
array for the machine modes of size 256.

In the lto pass, we memset the array with MAX_MACHINE_MODE as the count,
but the value of MAX_MACHINE_MODE will grow as more and more modes are
added, while the machine mode array in the tree-streamer still stays at 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table will touch memory out of range unexpectedly.

This patch takes MAX_MACHINE_MODE as the size of the array in the
tree-streamer, to make sure there is no potential unexpected memory
access in the future.
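The sizing problem can be illustrated with a small standalone sketch (hypothetical constants: MAX_MODE stands in for a MAX_MACHINE_MODE that has grown past the old bound):

```cpp
#include <cstring>

// Hypothetical stand-ins: MAX_MODE plays the role of a MAX_MACHINE_MODE
// that has grown beyond the old hard-coded bound of 1 << 8.
constexpr int OLD_BOUND = 1 << 8;  // the fixed 256 in tree-streamer
constexpr int MAX_MODE  = 300;     // assumed mode count, > 256

// After the fix, the table is sized by the mode count itself ...
unsigned char mode_table[MAX_MODE];

bool memset_stays_in_bounds ()
{
  // ... so clearing MAX_MODE bytes (what lto_output_init_mode_table
  // effectively does) stays inside the array.  With the old 256-byte
  // table the same memset would write 44 bytes past the end.
  std::memset (mode_table, 0, MAX_MODE);
  return (int) sizeof (mode_table) >= MAX_MODE && MAX_MODE > OLD_BOUND;
}
```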

Signed-off-by: Pan Li 

gcc/ChangeLog:

* lto-streamer-in.cc (lto_input_mode_table): Use
MAX_MACHINE_MODE for memory allocation.
* tree-streamer.cc: Use MAX_MACHINE_MODE for array size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Ditto.
(bp_unpack_machine_mode): Ditto.
---
 gcc/lto-streamer-in.cc | 3 ++-
 gcc/tree-streamer.cc   | 2 +-
 gcc/tree-streamer.h| 7 ---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2cb83406db5..102b7e18526 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
*file_data)
 internal_error ("cannot read LTO mode table from %s",
file_data->file_name);
 
-  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
+  unsigned char *table = ggc_cleared_vec_alloc (
+MAX_MACHINE_MODE);
   file_data->mode_table = table;
   const struct lto_simple_header_with_strings *header
 = (const struct lto_simple_header_with_strings *) data;
diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
index ed65a7692e3..a28ef9c7920 100644
--- a/gcc/tree-streamer.cc
+++ b/gcc/tree-streamer.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
During streaming in, we translate the on the disk mode using this
table.  For normal LTO it is set to identity, for ACCEL_COMPILER
depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
+unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 
 /* Check that all the TS_* structures handled by the streamer_write_* and
streamer_read_* routines are exactly ALL the structures defined in
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 170d61cf20b..be3a1938e76 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
 void streamer_write_integer_cst (struct output_block *, tree);
 
 /* In tree-streamer.cc.  */
-extern unsigned char streamer_mode_table[1 << 8];
+extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -108,7 +108,7 @@ inline void
 bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
 {
   streamer_mode_table[mode] = 1;
-  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
 }
 
 inline machine_mode
@@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
 {
   return (machine_mode)
   ((class lto_input_block *)
-   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
+   MAX_MACHINE_MODE)];
 }
 
 #endif  /* GCC_TREE_STREAMER_H  */
-- 
2.34.1



[PATCH] tree-optimization/110298 - CFG cleanup and stale nb_iterations

2023-06-19 Thread Richard Biener via Gcc-patches
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names.  But we do this only after cleaning
up the CFG which in turn can end up accessing it.  Fixed by
swapping the two.
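The bug follows a generic pattern: cached analysis results must be dropped before running a pass that both invalidates and may consult them. A minimal sketch with illustrative types (not GCC's real data structures):

```cpp
#include <vector>
#include <optional>
#include <cstddef>

// Illustrative stand-ins only: the cached niter expression is modeled
// as an index into the still-live SSA names it refers to.
struct loop_info
{
  std::vector<int> ssa_names;          // names the loop still references
  std::optional<std::size_t> cached_niter; // may refer into ssa_names
};

void unroll_cleanup_fixed (loop_info &l)
{
  // The fixed ordering: reset the cached info first (scev_reset) ...
  l.cached_niter.reset ();
  // ... then run the cleanup that removes names (cleanup_tree_cfg),
  // so nothing can read a stale reference into the removed names.
  l.ssa_names.clear ();
}
```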

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.

* gcc.dg/torture/pr110298.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110298.c | 20 
 gcc/tree-ssa-loop-ivcanon.cc|  7 ---
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110298.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110298.c 
b/gcc/testsuite/gcc.dg/torture/pr110298.c
new file mode 100644
index 000..139f5c77d89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110298.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+int a, b, c, d, e;
+int f() {
+  c = 0;
+  for (; c >= 0; c--) {
+d = 0;
+for (; d <= 0; d++) {
+  e = 0;
+  for (; d + c + e >= 0; e--)
+;
+  a = 1;
+  b = 0;
+  for (; a; ++b)
+a *= 2;
+  for (; b + d >= 0;)
+return 0;
+}
+  }
+}
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 6a962a9f503..491b57ec0f1 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1520,15 +1520,16 @@ tree_unroll_loops_completely (bool may_increase_size, 
bool unroll_outer)
}
  BITMAP_FREE (fathers);
 
+ /* Clean up the information about numbers of iterations, since
+complete unrolling might have invalidated it.  */
+ scev_reset ();
+
  /* This will take care of removing completely unrolled loops
 from the loop structures so we can continue unrolling now
 innermost loops.  */
  if (cleanup_tree_cfg ())
update_ssa (TODO_update_ssa_only_virtuals);
 
- /* Clean up the information about numbers of iterations, since
-complete unrolling might have invalidated it.  */
- scev_reset ();
  if (flag_checking && loops_state_satisfies_p (LOOP_CLOSED_SSA))
verify_loop_closed_ssa (true);
}
-- 
2.35.3


RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks Richard for the review. I went through the occurrences of (1 << 8) and
found another one besides bp. Updated to PATCH v2 as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622151.html

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, June 19, 2023 4:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; Jakub Jelinek 
Subject: RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode 
init

On Mon, 19 Jun 2023, Li, Pan2 wrote:

> Add Richard Biener for reviewing, sorry for inconvenient.
> 
> Pan
> 
> -Original Message-
> From: Li, Pan2  
> Sent: Monday, June 19, 2023 4:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; Li, 
> Pan2 ; Wang, Yanzhang ; 
> kito.ch...@gmail.com
> Subject: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init
> 
> From: Pan Li 
> 
> We already extended the machine mode from 8 to 16 bits, but there is
> still one place missing in the tree-streamer, which has a hard-coded
> array for the machine modes of size 256.
> 
> In the lto pass, we memset the array with MAX_MACHINE_MODE as the count,
> but the value of MAX_MACHINE_MODE will grow as more and more modes are
> added, while the machine mode array in the tree-streamer still stays at 256.
> 
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will touch memory out of range unexpectedly.
> 
> This patch takes MAX_MACHINE_MODE as the size of the array in the
> tree-streamer, to make sure there is no potential unexpected memory
> access in the future.

You also have to fix bp_pack_machine_mode/bp_unpack_machine_mode which
streams exactly values in [0, 1<<8 - 1].

CCing Jakub who invented this code.

Richard.


> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * tree-streamer.cc (streamer_mode_table): Use MAX_MACHINE_MODE
>   as array size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
> ---
>  gcc/tree-streamer.cc | 2 +-
>  gcc/tree-streamer.h  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
> index ed65a7692e3..a28ef9c7920 100644
> --- a/gcc/tree-streamer.cc
> +++ b/gcc/tree-streamer.cc
> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> During streaming in, we translate the on the disk mode using this
> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
> depending on the mode_table content.  */
> -unsigned char streamer_mode_table[1 << 8];
> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  
>  /* Check that all the TS_* structures handled by the streamer_write_* and
> streamer_read_* routines are exactly ALL the structures defined in
> diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
> index 170d61cf20b..51a292c8d80 100644
> --- a/gcc/tree-streamer.h
> +++ b/gcc/tree-streamer.h
> @@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
>  void streamer_write_integer_cst (struct output_block *, tree);
>  
>  /* In tree-streamer.cc.  */
> -extern unsigned char streamer_mode_table[1 << 8];
> +extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>hashval_t, unsigned *);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 08:40:58AM +, Richard Biener wrote:
> You also have to fix bp_pack_machine_mode/bp_unpack_machine_mode which
> streams exactly values in [0, 1<<8 - 1].
> 
> CCing Jakub who invented this code.

For stream-out, all it stores is a bool flag whether the mode is streamed
out, and on stream in it contains a mapping table between host and
offloading modes.
For stream in, it actually isn't used despite the comment maybe suggesting
it is, so I guess using MAX_MACHINE_MODE for it is ok.

As you said,
inline void
bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
{
  streamer_mode_table[mode] = 1;
  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
}

inline machine_mode
bp_unpack_machine_mode (struct bitpack_d *bp)
{
  return (machine_mode)
   ((class lto_input_block *)
bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
}
needs changing for the case when MAX_MACHINE_MODE > 256, but far more
places make similar assumptions:
E.g. lto_write_mode_table has
  bp_pack_value (&bp, m, 8);
(if MAX_MACHINE_MODE > 256, this can't encode all modes),
  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
(ditto).
lto_input_mode_table has e.g.
  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
  file_data->mode_table = table;
Here we need to decide if we keep requiring that offloading architectures
still have MAX_MACHINE_MODE <= 256 or not.  Currently the offloading
arches are nvptx and amdgcn, intelmic support has been removed.
If yes, table can have unsigned char elements, but its size actually depends
on the number of modes on the host side, so lto_write_mode_table would need
to stream out the host MAX_MACHINE_MODE value and lto_input_mode_table
stream it in and use instead of the 1 << 8 size above.
If not, mode_table and unsigned char * would need to change to unsigned
short *, or just conditionally depending on if MAX_MACHINE_MODE <= 256 or
not.
Then
  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
{
...
  machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
again hardcodes 8 bits; that needs to match how many bits the host
compiler packs in lto_write_mode_table.
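The hazard with the fixed 8-bit fields can be seen in a tiny model of bp_pack_value-style packing (simplified to a single word; a hypothetical helper, not the real bitpack API):

```cpp
#include <cstdint>

// Simplified model of packing a value into `bits` bits and reading it
// back, as bp_pack_value/bp_unpack_value do: only the low `bits` bits
// survive, so a mode id >= (1 << bits) silently truncates.
uint32_t pack_then_unpack (uint32_t value, unsigned bits)
{
  uint64_t mask = (uint64_t (1) << bits) - 1;
  uint64_t word = uint64_t (value) & mask;   // pack: high bits are lost
  return uint32_t (word & mask);             // unpack
}
```

With 8-bit fields, a hypothetical mode id of 300 comes back as 44, which is exactly the kind of silent host/offload disagreement discussed here.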

Jakub



Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, pan2...@intel.com wrote:

> From: Pan Li 
> 
> We already extended the machine mode from 8 to 16 bits, but there is
> still one place missing in the tree-streamer, which has a hard-coded
> array for the machine modes of size 256.
> 
> In the lto pass, we memset the array with MAX_MACHINE_MODE as the count,
> but the value of MAX_MACHINE_MODE will grow as more and more modes are
> added, while the machine mode array in the tree-streamer still stays at 256.
> 
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will touch memory out of range unexpectedly.
> 
> This patch takes MAX_MACHINE_MODE as the size of the array in the
> tree-streamer, to make sure there is no potential unexpected memory
> access in the future.

Please review more carefully:

void
lto_input_mode_table (struct lto_file_decl_data *file_data)
{
...
  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)

reads 8 bits again.

  ibit = bp_unpack_value (&bp, 8);
  fbit = bp_unpack_value (&bp, 8);

likewise.

Also, file_data->mode_table is indexed by the _host_ mode, so you
have to allocate enough space to fill in all streamed modes, but
you are using the target's MAX_MACHINE_MODE here.  I think we
need to stream the host's MAX_MACHINE_MODE.

Richard.


> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * lto-streamer-in.cc (lto_input_mode_table): Use
>   MAX_MACHINE_MODE for memory allocation.
>   * tree-streamer.cc: Use MAX_MACHINE_MODE for array size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
>   (bp_pack_machine_mode): Ditto.
>   (bp_unpack_machine_mode): Ditto.
> ---
>  gcc/lto-streamer-in.cc | 3 ++-
>  gcc/tree-streamer.cc   | 2 +-
>  gcc/tree-streamer.h| 7 ---
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index 2cb83406db5..102b7e18526 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc (
> +MAX_MACHINE_MODE);
>file_data->mode_table = table;
>const struct lto_simple_header_with_strings *header
>  = (const struct lto_simple_header_with_strings *) data;
> diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
> index ed65a7692e3..a28ef9c7920 100644
> --- a/gcc/tree-streamer.cc
> +++ b/gcc/tree-streamer.cc
> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> During streaming in, we translate the on the disk mode using this
> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
> depending on the mode_table content.  */
> -unsigned char streamer_mode_table[1 << 8];
> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  
>  /* Check that all the TS_* structures handled by the streamer_write_* and
> streamer_read_* routines are exactly ALL the structures defined in
> diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
> index 170d61cf20b..be3a1938e76 100644
> --- a/gcc/tree-streamer.h
> +++ b/gcc/tree-streamer.h
> @@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
>  void streamer_write_integer_cst (struct output_block *, tree);
>  
>  /* In tree-streamer.cc.  */
> -extern unsigned char streamer_mode_table[1 << 8];
> +extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>hashval_t, unsigned *);
> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }
>  
>  #endif  /* GCC_TREE_STREAMER_H  */
> 



Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 05:05:48PM +0800, pan2...@intel.com wrote:
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc (
> +MAX_MACHINE_MODE);

Incorrect formatting.  And, see my other mail, this is wrong.

> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }

And these two are wrong as well.  The value passed to bp_pack_enum
has to match the one used on bp_unpack_enum.  But that is not the case
after your changes.  You stream out with the host MAX_MACHINE_MODE, and
stream in for normal LTO with the same value (ok), but for offloading
targets (nvptx, amdgcn) with a different MAX_MACHINE_MODE.  That will
immediately result in LTO streaming being out of sync and ICEs all around.
The reason for using 1 << 8 there was exactly to make it interoperable for
offloading.  What could be perhaps done is that you stream out the
host MAX_MACHINE_MODE value somewhere and stream it in inside of
lto_input_mode_table before you allocate the table.  But, that streamed
in host max_machine_mode has to be remembered somewhere and used e.g. in
bp_unpack_machine_mode instead of MAX_MACHINE_MODE.
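The suggestion amounts to prefixing the stream with the writer's mode count so the reader can size its table and field widths from it. A byte-buffer sketch of that handshake (illustrative only, not the real LTO stream format):

```cpp
#include <cstdint>
#include <vector>

// Writer side: record the host's mode count as a 16-bit header before
// any mode data, instead of baking 1 << 8 into both sides.
std::vector<uint8_t> write_header (uint16_t host_max_mode)
{
  return { uint8_t (host_max_mode & 0xff), uint8_t (host_max_mode >> 8) };
}

// Reader side: recover the writer's bound first, then allocate the mode
// table and choose unpack widths from it, even when the reader's own
// MAX_MACHINE_MODE differs (as on nvptx/amdgcn offload compilers).
uint16_t read_host_max_mode (const std::vector<uint8_t> &in)
{
  return uint16_t (in[0] | (in[1] << 8));
}
```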

Jakub



RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 7:28 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina 
> Subject: [PATCH] Remove -save-temps from tests using -flto
> 
> The following removes -save-temps, which doesn't seem to serve any good
> purpose, from tests that also run with -flto added.  That can cause ltrans
> files to race with other multilibs being tested, and I'm frequently seeing
> linker complaints that the architecture doesn't match here.
> 
> I'm not sure whether the .ltrans.o files end up in a non-gccN/-specific
> directory, or if we end up sharing the same directory for different
> multilibs (not sure if it's easily possible to avoid that).
> 
> Parallel testing on x86_64-unknown-linux-gnu in progress.
> 
> Tamar, was there any reason to use -save-temps here?

At the time I was getting unresolved errors from these without it.
But perhaps that's something to do with dejagnu versions?

Tamar

> 
>   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
>   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
>  9 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> index e9ec9603af6..e6810433d70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> index 06c103d3885..f83078b5d51 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> index 91b82fb5988..8895d5c263c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> index 59f339fb8c5..77d4deb633c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vect

Re: [PATCH] [contrib] validate_failures.py: Don't consider summary line in wrong place

2023-06-19 Thread Thiago Jung Bauermann via Gcc-patches


Jeff Law  writes:

> On 6/16/23 06:02, Thiago Jung Bauermann via Gcc-patches wrote:
>> contrib/ChangeLog:
>>  * testsuite-management/validate_failures.py (IsInterestingResult):
>>  Add result_set argument and use it.  Adjust callers.
> Thanks.  I pushed this to the trunk.

Thank you!

-- 
Thiago


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Sun, 18 Jun 2023 at 19:37, Jan Hubicka  wrote:

> Hi,
> _M_check_len is used in vector reallocations.  It computes __n + __s, but
> checks for the case that (__n + __s) * sizeof (Tp) would overflow ptrdiff_t.
> Since we know that __s is the size of an already allocated memory block, if
> __n is not too large this will never happen on 64-bit systems, since memory
> is not that large.  This patch adds __builtin_constant_p checks for this
> case.  This reduces the size of the fully inlined push_back function, which
> is critical for loops controlled by a std::vector-based stack.
>
> With the patch to optimize std::max and to handle SRA candidates, we
> now fully inline push_back with -O3 (not with -O2); however, there are
> still quite a few silly things, for example:
>
>   //  _78 is original size of the allocated vector.
>
>   _76 = stack$_M_end_of_storage_177 - _142;
>   _77 = _76 /[ex] 8;
>   _78 = (long unsigned int) _77;
>   _79 = MAX_EXPR <_78, 1>;
>   _80 = _78 + _79; // this is result of _M_check_len doubling the
> allocated vector size.
>   if (_80 != 0)// result will always be non-zero.
> goto ; [54.67%]
>   else
> goto ; [45.33%]
>
>[local count: 30795011]:
>   if (_80 > 1152921504606846975)  // doubling successfully allocated
> memory will never get so large.
> goto ; [10.00%]
>   else
> goto ; [90.00%]
>
>[local count: 3079501]:
>   if (_80 > 2305843009213693951)  // I wonder if we really want to have
> two different throws
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 1539750]:
>   std::__throw_bad_array_new_length ();
>
>[local count: 1539750]:
>   std::__throw_bad_alloc ();
>
>[local count: 27715510]:
>   _108 = _80 * 8;
>   _109 = operator new (_108);
>
> Maybe we want to add an assumption that the result of the function is
> never greater than max_size to get rid of the two checks above.  However,
> this will still be recognized only after inlining and will continue to
> confuse the inliner heuristics.
>
> Bootstrapped/regtested x86_64-linux.  I am not too familiar with libstdc++
> internals,
> so would welcome comments and ideas.
>
> libstdc++-v3/ChangeLog:
>
> PR tree-optimization/110287
> * include/bits/stl_vector.h: Optimize _M_check_len for constantly
> sized
> types and allocations.
>
> diff --git a/libstdc++-v3/include/bits/stl_vector.h
> b/libstdc++-v3/include/bits/stl_vector.h
> index 70ced3d101f..3ad59fe3e2b 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -1895,11 +1895,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>size_type
>_M_check_len(size_type __n, const char* __s) const
>{
> -   if (max_size() - size() < __n)
> - __throw_length_error(__N(__s));
> +   // On 64bit systems vectors of small sizes can not
> +   // reach overflow by growing by small sizes; before
> +   // this happens, we will run out of memory.
> +   if (__builtin_constant_p (sizeof (_Tp))
>

This shouldn't be here, of course sizeof is a constant.

No space before the opening parens, libstdc++ doesn't follow GNU style.



> +   && __builtin_constant_p (__n)
> +   && sizeof (ptrdiff_t) >= 8
> +   && __n < max_size () / 2)
>

This check is not OK. As I said in Bugzilla just now, max_size() depends on
the allocator, which could return something much smaller than PTRDIFF_MAX.
You can't make this assumption for all specializations of std::vector.

If Alloc::max_size() == 100 and this->size() == 100 then this function
needs to throw length_error for *any* n. In the general case you cannot
remove size() from this condition.

For std::allocator it's safe to assume that max_size() is related to
PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.



> + return size() + (std::max)(size(), __n);
>
> +   else
> + {
> +   if (max_size() - size() < __n)
> + __throw_length_error(__N(__s));
>
> -   const size_type __len = size() + (std::max)(size(), __n);
> -   return (__len < size() || __len > max_size()) ? max_size() : __len;
> +   const size_type __len = size() + (std::max)(size(), __n);
> +   return (__len < size() || __len > max_size()) ? max_size() :
> __len;
> + }
>}
>
>// Called by constructors to check initial size.
>
>


[PATCH] debug/110295 - mixed up early/late debug for member DIEs

2023-06-19 Thread Richard Biener via Gcc-patches
When we process a scope typedef during early debug creation, and we
have already created a DIE for the type (the decl is TYPE_DECL_IS_STUB)
while that DIE is still in limbo, we end up just re-parenting that type
DIE instead of properly creating a DIE for the decl, eventually picking
up the now completed type and creating DIEs for the members.  Instead
this is currently deferred to the second time we come here, when we
annotate the DIEs with locations late; by then the type DIE is no longer
in limbo and we fall through, doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.
---
 gcc/dwarf2out.cc  |  3 ++-
 gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, 
dw_die_ref context_die)
 
   if (die != NULL && die->die_parent == NULL)
 add_child_die (context_die, die);
-  else if (TREE_CODE (decl_or_origin) == IMPORTED_DECL)
+
+  if (TREE_CODE (decl_or_origin) == IMPORTED_DECL)
 {
   if (early_dwarf)
dwarf2out_imported_module_or_decl_1 (decl_or_origin, DECL_NAME 
(decl_or_origin),
diff --git a/gcc/testsuite/g++.dg/debug/pr110295.C 
b/gcc/testsuite/g++.dg/debug/pr110295.C
new file mode 100644
index 000..10cad557095
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/pr110295.C
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-options "-g" }
+
+template 
+struct QCachedT
+{
+  void operator delete(void *, T *) {}
+};
+template
+void exercise()
+{
+  struct thing_t
+: QCachedT
+  {
+  };
+  thing_t *list[1];
+  new thing_t; // { dg-warning "" }
+}
+int main() { exercise<1>(); }
-- 
2.35.3


Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Jan Hubicka via Gcc-patches
> On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > this was suggested earlier somewhere, but I cannot find the thread.
> > C++ has the assume attribute that expands into
> >   if (conditional)
> > __builtin_unreachable ()
> > We do not want to account the conditional in inline heuristics since
> > we know that it is going to be optimized out.
> >
> > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > there are no complaints.
> 
> I think we also had the request to not account the condition feeding
> stmts (if they only feed it and have no side-effects).  libstdc++ has
> complex range comparisons here.  Also ...

I was thinking of this: it depends on how smart we want to get.
We also have dead conditionals guarding clobbers, predicts and other
stuff.  In general we can use the mark phase of cd-dce, telling it to
ignore those statements, and then use its result in the analysis.

Also, the question is how much we can rely on the middle-end optimizing
out unreachables.  For example:
int a;
int b[3];
test()
{
  if (a > 0)
{
  for (int i = 0; i < 3; i++)
  b[i] = 0;
  __builtin_unreachable ();
}
}

IMO this can be optimized to an empty function.  I believe we used to
have code in tree-cfgcleanup to remove statements just before
__builtin_unreachable which cannot terminate execution, but perhaps it
existed only in my local tree?
We could also perhaps declare unreachable NOVOPs, which would make DSE
remove the stores.

> 
> ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

You are right.  I tested it with -funreachable-traps but it does not do
what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable

Also if I try to call it by hand I get:

jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details 
-fsanitize=unreachable -fsanitize-trap=unreachable
t.c: In function ‘test’:
t.c:9:13: warning: implicit declaration of function 
‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? 
[-Wimplicit-function-declaration]
9 | __builtin_unreachable_trap ();
  | ^~
  | __builtin_unreachable trap

Which is not as helpful as it is trying to be.
> 
> > +ret = true;
> > +done:
> > +  for (basic_block vbb:visited_bbs)
> > +cache[vbb->index] = (unsigned char)ret + 1;
> > +  return ret;
> > +}
> > +
> >  /* Analyze function body for NODE.
> > EARLY indicates run from early optimization pipeline.  */
> >
> > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool 
> > early)
> >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > +  auto_vec cache;
> > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> 
> A sbitmap with two bits per entry would be more space efficient here.  bitmap
> has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> adding the corresponding to sbitmap.h would likely ease use there as well.

I did not know about the chunk API which is certainly nice :)
sbitmap will always allocate, while here we stay on the stack for small
functions, and I am not sure whether the extra bit operations would be
offset by the smaller memset, but overall I think it is all in the noise.

Honza


RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 19, 2023 7:28 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Tamar Christina 
> > Subject: [PATCH] Remove -save-temps from tests using -flto
> > 
> > The following removes -save-temps that doesn't seem to have any good
> > reason from tests that also run with -flto added.  That can cause ltrans 
> > files to
> > race with other multilibs tested and I'm frequently seeing linker complaints
> > that the architecture doesn't match here.
> > 
> > I'm not sure whether the .ltrans.o files end up in a non gccN/ specific 
> > directory
> > or if we end up sharing the same dir for different multilibs (not sure if 
> > it's easily
> > possible to avoid that).
> > 
> > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > 
> > Tamar, was there any reason to use -save-temps here?
> 
> At the time I was getting unresolved errors from these without it.
> But perhaps that's something to do with dejagnu versions?

I don't know.  Can you check if there's an issue on your side when
removing -save-temps?

Richard.

> Tamar
> 
> > 
> > * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
> > * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
> >  9 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > index e9ec9603af6..e6810433d70 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > index 06c103d3885..f83078b5d51 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > index 059bfb3ae62..e33a824df07 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > index 059bfb3ae62..e33a824df07 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > index 91b82fb5988..8895d5c263c 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w

Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c' (was: [committed] libgomp.c/target-51.c: Accept more error-msg variants in dg-output (was: Re: [committed] libgomp: Fix OMP_TARGET_OFFLOAD

2023-06-19 Thread Thomas Schwinge
Hi!

On 2023-06-19T10:02:58+0200, Tobias Burnus  wrote:
> On 16.06.23 22:42, Thomas Schwinge wrote:
>> I see the new tests PASS, but with offloading enabled (nvptx) also see:
>>
>>  PASS: libgomp.c/target-51.c (test for excess errors)
>>  PASS: libgomp.c/target-51.c execution test
>>  [-PASS:-]{+FAIL:+} libgomp.c/target-51.c output pattern test
>>
>> ... due to:
>>
>>  Output was:
>>
>>  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be 
>> used for offloading
>>
>>  Should match:
>>  .*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not 
>> found.*
>
> Thanks for the report. I can offer yet another wording for the same program – 
> and also
> with nvptx enabled:
>
> libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used 
> for offloading
>
> And I can also offer (which is already in the testcase with "! 
> offload_device"):
>
> libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is 
> available
>
> I think I will just match "..., but .*" without distinguishing 
> check_effective_target_* ...
>
> ... which I now did in commit r14-1926-g01fe115ba7eafe (see also attached 
> patch).

Pushed commit de2d3b69eefde005759279d6739d9a0dbd2a05cc
"Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'",
see attached.


Regards
 Thomas


> * * *
>
> With offloading, there are simply too many possibilities:
>
> * Not compiled with offloading support - vs. with (ENABLE_OFFLOADING)
> * Support compiled in but either compiler or library support not installed
>(requires configuring with --enable-offload-defaulted)
> * Offloading libgomp plugins there but no CUDA or hsa runtime libraries
> * The latter being installed but no device available
>
> Plus -foffload=disable or only enabling an (at runtime) unavailable or
> unsupported device type or other issues like CUDA and device present but
> an issue with the kernel driver (or similar half-broken states) or ...
>
> [And with remote testing issues related to dg-set-target-env-var and only
> few systems supporting offloading, a full test coverage is even harder.]
>
> Tobias


>From de2d3b69eefde005759279d6739d9a0dbd2a05cc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 19 Jun 2023 12:20:15 +0200
Subject: [PATCH] Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'

ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}"

Fix-up for recent commit 01fe115ba7eafebcf97bbac9e157038a003d0c85
"libgomp.c/target-51.c: Accept more error-msg variants in dg-output".

	libgomp/
	* testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax
	error.
---
 libgomp/testsuite/libgomp.c/target-51.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c/target-51.c b/libgomp/testsuite/libgomp.c/target-51.c
index db0363bfc14..7ff8122861f 100644
--- a/libgomp/testsuite/libgomp.c/target-51.c
+++ b/libgomp/testsuite/libgomp.c/target-51.c
@@ -9,7 +9,7 @@
 
 /* See comment in target-50.c/target-50.c for why the output differs.  */
 
-/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*" } } */
+/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*" } */
 
 int
 main ()
-- 
2.34.1



[PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Toru Kisuki via Gcc-patches
Hi,


With -O3 -fsignaling-nans -fno-signed-zeros, the compiler should not
simplify 'x + 0.0' to 'x'.


GCC Bugzilla : Bug 110305


gcc/ChangeLog:

2023-06-19  Toru Kisuki  

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):

---
 gcc/simplify-rtx.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index e152918b0f1..cc96b36ad4e 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2698,7 +2698,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
 when x is NaN, infinite, or finite and nonzero.  They aren't
 when x is -0 and the rounding mode is not towards -infinity,
 since (-0) + 0 is then 0.  */
-  if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
+  if (!HONOR_SIGNED_ZEROS (mode) && !HONOR_SNANS (mode)
+  && trueop1 == CONST0_RTX (mode))
return op0;

   /* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
--
2.38.1



RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 11:19 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Remove -save-temps from tests using -flto
> 
> On Mon, 19 Jun 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 19, 2023 7:28 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Tamar Christina 
> > > Subject: [PATCH] Remove -save-temps from tests using -flto
> > >
> > > The following removes -save-temps that doesn't seem to have any good
> > > reason from tests that also run with -flto added.  That can cause
> > > ltrans files to race with other multilibs tested and I'm frequently
> > > seeing linker complaints that the architecture doesn't match here.
> > >
> > > I'm not sure whether the .ltrans.o files end up in a non gccN/
> > > specific directory or if we end up sharing the same dir for
> > > different multilibs (not sure if it's easily possible to avoid that).
> > >
> > > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Tamar, was there any reason to use -save-temps here?
> >
> > At the time I was getting unresolved errors from these without it.
> > But perhaps that's something to do with dejagnu versions?
> 
> I don't know.  Can you check if there's an issue on your side when removing -
> save-temps?

Nope no issues, all tests still pass.

Regards,
Tamar
> 
> Richard.
> 
> > Tamar
> >
> > >
> > >   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
> > >   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
> > >  9 files changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > index e9ec9603af6..e6810433d70 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > index 06c103d3885..f83078b5d51 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/ve

Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jan Hubicka via Gcc-patches
> > -   if (max_size() - size() < __n)
> > - __throw_length_error(__N(__s));
> > +   // On 64bit systems vectors of small sizes can not
> > +   // reach overflow by growing by small sizes; before
> > +   // this happens, we will run out of memory.
> > +   if (__builtin_constant_p (sizeof (_Tp))
> >
> 
> This shouldn't be here, of course sizeof is a constant.
OK :)
> 
> No space before the opening parens, libstdc++ doesn't follow GNU style.
Fixed.
> 
> 
> 
> > +   && __builtin_constant_p (__n)
> > +   && sizeof (ptrdiff_t) >= 8
> > +   && __n < max_size () / 2)
> >
> 
> This check is not OK. As I said in Bugzilla just now, max_size() depends on
> the allocator, which could return something much smaller than PTRDIFF_MAX.
> You can't make this assumption for all specializations of std::vector.
> 
> If Alloc::max_size() == 100 and this->size() == 100 then this function
> needs to throw length_error for *any* n. In the general case you cannot
> remove size() from this condition.
> 
> For std::allocator it's safe to assume that max_size() is related to
> PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.

Here is the updated version.  I simply test __builtin_constant_p on
max_size and check that it is large enough.  For that we need to copy it
into a temporary variable, since we fold __builtin_constant_p (function (x))
to a constant early, before the function gets inlined.

I also added __builtin_unreachable to let the compiler determine the
return value range, as discussed in the PR.

Honza

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..7a1966405ca 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1895,11 +1895,29 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   size_type
   _M_check_len(size_type __n, const char* __s) const
   {
-   if (max_size() - size() < __n)
- __throw_length_error(__N(__s));
+   const size_type __max_size = max_size();
+   // On 64bit systems vectors can not reach overflow by growing
+   // by small sizes; before this happens, we will run out of memory.
+   if (__builtin_constant_p(__n)
+   && __builtin_constant_p(__max_size)
+   && sizeof(ptrdiff_t) >= 8
+   && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
+   && __n < __max_size / 2)
+ {
+   const size_type __len = size() + (std::max)(size(), __n);
+   // let compiler know that __len has sane value range.
+   if (__len < __n || __len >= __max_size)
+ __builtin_unreachable();
+   return __len;
+ }
+   else
+ {
+   if (__max_size - size() < __n)
+ __throw_length_error(__N(__s));
 
-   const size_type __len = size() + (std::max)(size(), __n);
-   return (__len < size() || __len > max_size()) ? max_size() : __len;
+   const size_type __len = size() + (std::max)(size(), __n);
+   return (__len < size() || __len > __max_size) ? __max_size : __len;
+ }
   }
 
   // Called by constructors to check initial size.


RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 19, 2023 11:19 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] Remove -save-temps from tests using -flto
> > 
> > On Mon, 19 Jun 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, June 19, 2023 7:28 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Tamar Christina 
> > > > Subject: [PATCH] Remove -save-temps from tests using -flto
> > > >
> > > > The following removes -save-temps that doesn't seem to have any good
> > > > reason from tests that also run with -flto added.  That can cause
> > > > ltrans files to race with other multilibs tested and I'm frequently
> > > > seeing linker complaints that the architecture doesn't match here.
> > > >
> > > > I'm not sure whether the .ltrans.o files end up in a non gccN/
> > > > specific directory or if we end up sharing the same dir for
> > > > different multilibs (not sure if it's easily possible to avoid that).
> > > >
> > > > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > > >
> > > > Tamar, was there any reason to use -save-temps here?
> > >
> > > At the time I was getting unresolved errors from these without it.
> > > But perhaps that's something to do with dejagnu versions?
> > 
> > I don't know.  Can you check if there's an issue on your side when removing 
> > -
> > save-temps?
> 
> Nope no issues, all tests still pass.

Pushed then.

Richard.


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches wrote:
> - if (max_size() - size() < __n)
> -   __throw_length_error(__N(__s));
> + const size_type __max_size = max_size();
> + // On 64bit systems vectors can not reach overflow by growing
> + // by small sizes; before this happens, we will run out of memory.
> + if (__builtin_constant_p(__n)
> + && __builtin_constant_p(__max_size)
> + && sizeof(ptrdiff_t) >= 8
> + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)

Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?

Jakub



Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
>
> > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > > this was suggested earlier somewhere, but I cannot find the thread.
> > > C++ has the assume attribute that expands into
> > >   if (conditional)
> > > __builtin_unreachable ()
> > > We do not want to account the conditional in inline heuristics since
> > > we know that it is going to be optimized out.
> > >
> > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > there are no complaints.
> >
> > I think we also had the request to not account the condition feeding
> > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > complex range comparisons here.  Also ...
>
> I was thinking of this: it depends on how smart we want to get.
> We also have dead conditionals guarding clobbers, predicts and other
> stuff.  In general we can use the mark phase of cd-dce, telling it to
> ignore those statements, and then use its result in the analysis.

Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
which might be a way to do this (of course we cannot use that 1:1).  Also
then consider

 a = a + 1;
 if (a > 10)
   __builtin_unreachable ();
 if (a < 5)
   __builtin_unreachable ();

and a has more than one use but both are going away.  So indeed a
more global analysis would be needed to get the full benefit.

> Also, the question is how much we can rely on the middle-end optimizing
> out unreachables.  For example:
> int a;
> int b[3];
> test()
> {
>   if (a > 0)
> {
>   for (int i = 0; i < 3; i++)
>   b[i] = 0;
>   __builtin_unreachable ();
> }
> }
>
> IMO this can be optimized to an empty function.  I believe we used to
> have code in tree-cfgcleanup to remove statements just before
> __builtin_unreachable which cannot terminate execution, but perhaps it
> existed only in my local tree?

I think we rely on DCE/DSE here and explicit unreachable () pruning after
VRP picked up things (I think it simply gets the secondary effect optimizing
the condition it created the range for in the first pass).

DSE is apparently not able to kill the stores, I will fix that.  I think
DCE can, but only for non-aliased stores.

> We could also perhaps declare unreachable NOVOPs which would make DSE to
> remove the stores.

But only because of a bug in DSE ... it also removes them if that
__builtin_unreachable ()
is GIMPLE_RESX.

> >
> > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
>
> You are right.  I tested it with -funreachable-traps but it does not do
> what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
>
> Also if I try to call it by hand I get:
>
> jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps 
> -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> t.c: In function ‘test’:
> t.c:9:13: warning: implicit declaration of function 
> ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? 
> [-Wimplicit-function-declaration]
> 9 | __builtin_unreachable_trap ();
>   | ^~
>   | __builtin_unreachable trap
>
> Which is not as helpful as it is trying to be.
> >
> > > +ret = true;
> > > +done:
> > > +  for (basic_block vbb:visited_bbs)
> > > +cache[vbb->index] = (unsigned char)ret + 1;
> > > +  return ret;
> > > +}
> > > +
> > >  /* Analyze function body for NODE.
> > > EARLY indicates run from early optimization pipeline.  */
> > >
> > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, 
> > > bool early)
> > >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > +  auto_vec<unsigned char> cache;
> > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> >
> > A sbitmap with two bits per entry would be more space efficient here.  
> > bitmap
> > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > adding the corresponding to sbitmap.h would likely ease use there as well.
>
> I did not know about the chunk API which is certainly nice :)
> sbitmap will always allocate, while here we stay on stack for small
> functions and I am not sure how much extra bit operations would not
> offset extra memset, but overall I think it is all in noise.

Ah, yeah.
> Honza


[committed] amdgcn: Delete inactive libfuncs

2023-06-19 Thread Andrew Stubbs
There were implementations for HImode division in libgcc, but there were 
no matching libfuncs defined in the compiler, so the code was inactive 
(GCC only defines SImode and DImode, by default, and amdgcn only adds 
TImode explicitly).


On trying to activate it I find that the definition of 
TARGET_PROMOTE_FUNCTION_MODE causes all unsigned HImode values to be 
sign-extended to SImode when calling libfuncs, thus breaking the values 
(presumably because they don't have a prototype?). I can't see an 
obvious advantage for having these functions for scalars, at this time, 
so I'm just deleting them ahead of adding divmod and vector implementations.


Committed to mainline, and OG13 will follow shortly.

Andrew

amdgcn: Delete inactive libfuncs

The HImode libfuncs weren't called and trying to enable them fails because
TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness
isn't known.

libgcc/ChangeLog:

* config/gcn/lib2-gcn.h (QItype, UQItype, HItype, UHItype): Delete.
(__divhi3, __modhi3, __udivhi3, __umodhi3): Delete.
* config/gcn/t-amdgcn: Don't build lib2-divmod-hi.c.
* config/gcn/lib2-divmod-hi.c: Removed.

diff --git a/libgcc/config/gcn/lib2-divmod-hi.c 
b/libgcc/config/gcn/lib2-divmod-hi.c
deleted file mode 100644
index f4584aabcd9..000
--- a/libgcc/config/gcn/lib2-divmod-hi.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
-   Contributed by Altera and Mentor Graphics, Inc.
-
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
-
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
-
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
-
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-.  */
-
-#include "lib2-gcn.h"
-
-/* 16-bit HI divide and modulo as used in gcn.  */
-
-static UHItype
-udivmodhi4 (UHItype num, UHItype den, word_type modwanted)
-{
-  UHItype bit = 1;
-  UHItype res = 0;
-
-  while (den < num && bit && !(den & (1L<<15)))
-{
-  den <<=1;
-  bit <<=1;
-}
-  while (bit)
-{
-  if (num >= den)
-   {
- num -= den;
- res |= bit;
-   }
-  bit >>=1;
-  den >>=1;
-}
-  if (modwanted)
-return num;
-  return res;
-}
-
-
-HItype
-__divhi3 (HItype a, HItype b)
-{
-  word_type neg = 0;
-  HItype res;
-
-  if (a < 0)
-{
-  a = -a;
-  neg = !neg;
-}
-
-  if (b < 0)
-{
-  b = -b;
-  neg = !neg;
-}
-
-  res = udivmodhi4 (a, b, 0);
-
-  if (neg)
-res = -res;
-
-  return res;
-}
-
-
-HItype
-__modhi3 (HItype a, HItype b)
-{
-  word_type neg = 0;
-  HItype res;
-
-  if (a < 0)
-{
-  a = -a;
-  neg = 1;
-}
-
-  if (b < 0)
-b = -b;
-
-  res = udivmodhi4 (a, b, 1);
-
-  if (neg)
-res = -res;
-
-  return res;
-}
-
-
-UHItype
-__udivhi3 (UHItype a, UHItype b)
-{
-  return udivmodhi4 (a, b, 0);
-}
-
-
-UHItype
-__umodhi3 (UHItype a, UHItype b)
-{
-  return udivmodhi4 (a, b, 1);
-}
-
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 645245b2128..67ad9bafc19 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -27,10 +27,6 @@
 
 /* Types.  */
 
-typedef char QItype __attribute__ ((mode (QI)));
-typedef unsigned char UQItype __attribute__ ((mode (QI)));
-typedef short HItype __attribute__ ((mode (HI)));
-typedef unsigned short UHItype __attribute__ ((mode (HI)));
 typedef int SItype __attribute__ ((mode (SI)));
 typedef unsigned int USItype __attribute__ ((mode (SI)));
 typedef int DItype __attribute__ ((mode (DI)));
@@ -48,10 +44,6 @@ extern SItype __divsi3 (SItype, SItype);
 extern SItype __modsi3 (SItype, SItype);
 extern USItype __udivsi3 (USItype, USItype);
 extern USItype __umodsi3 (USItype, USItype);
-extern HItype __divhi3 (HItype, HItype);
-extern HItype __modhi3 (HItype, HItype);
-extern UHItype __udivhi3 (UHItype, UHItype);
-extern UHItype __umodhi3 (UHItype, UHItype);
 extern SItype __mulsi3 (SItype, SItype);
 
 #endif /* LIB2_GCN_H */
diff --git a/libgcc/config/gcn/t-amdgcn b/libgcc/config/gcn/t-amdgcn
index 38bde54a096..e64953e6185 100644
--- a/libgcc/config/gcn/t-amdgcn
+++ b/libgcc/config/gcn/t-amdgcn
@@ -1,6 +1,5 @@
 LIB2ADD += $(srcdir)/config/gcn/atomic.c \
   $(srcdir)/config/gcn/lib2-divmod.c \
-  $(srcdir)/config/gcn/lib2-divmod-hi

[committed] amdgcn: minimal V64TImode vector support

2023-06-19 Thread Andrew Stubbs
This patch adds just enough TImode vector support to use them for moving 
data about. This is primarily for the use of divmodv64di4, which will 
use TImode to return a pair of DImode values.


The TImode vectors have no other operators defined, and there are no 
hardware instructions to support this mode, beyond load and store.


Committed to mainline, and OG13 will follow shortly.

Andrew

amdgcn: minimal V64TImode vector support

Just enough support for TImode vectors to exist, load, store, move,
without any real instructions available.

This is primarily for the use of divmodv64di4, which uses TImode to
return a pair of DImode values.

gcc/ChangeLog:

* config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function.
* config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators.
(V_MOV, V_MOV_ALT): Likewise.
(scalar_mode, SCALAR_MODE): Add TImode.
(vnsi, VnSI, vndi, VnDI): Likewise.
(vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV.
(mov, mov_unspec): Use V_MOV.
(*mov_4reg): New insn.
(mov_exec): New 4reg variant.
(mov_sgprbase): Likewise.
(reload_in, reload_out): Use V_MOV.
(vec_set): Likewise.
(vec_duplicate): New 4reg variant.
(vec_extract): Likewise.
(vec_extract): Rename to ...
(vec_extract): ... this, and use V_MOV.
(vec_extract_nop): New 4reg variant.
(fold_extract_last_): Use V_MOV.
(vec_init): Rename to ...
(vec_init): ... this, and use V_MOV.
(gather_load, gather_expr,
gather_insn_1offset, gather_insn_1offset_ds,
gather_insn_2offsets): Use V_MOV.
(scatter_store, scatter_expr,
scatter_insn_1offset,
scatter_insn_1offset_ds,
scatter_insn_2offsets): Likewise.
(maskloaddi, maskstoredi, mask_gather_load,
mask_scatter_store): Likewise.
* config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p.
(gcn_hard_regno_mode_ok): Likewise.
(GEN_VNM): Add TImode support.
(USE_TI): New macro. Separate TImode operations from non-TImode ones.
(gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode,
V8TImode, and V2TImode.
(print_operand):  Add 'J' and 'K' print codes.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 287ce17d422..3befb2b7caa 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -136,6 +136,17 @@ vgpr_2reg_mode_p (machine_mode mode)
   return (mode == DImode || mode == DFmode);
 }
 
+/* Return true if MODE is valid for four VGPR registers.  */
+
+inline bool
+vgpr_4reg_mode_p (machine_mode mode)
+{
+  if (VECTOR_MODE_P (mode))
+mode = GET_MODE_INNER (mode);
+
+  return (mode == TImode);
+}
+
 /* Return true if MODE can be handled directly by VGPR operations.  */
 
 inline bool
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 7290cdc2fd0..284dda73da9 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -96,6 +96,10 @@ (define_mode_iterator V_2REG_ALT
   V32DI V32DF
   V64DI V64DF])
 
+; Vector modes for four vector registers
+(define_mode_iterator V_4REG [V2TI V4TI V8TI V16TI V32TI V64TI])
+(define_mode_iterator V_4REG_ALT [V2TI V4TI V8TI V16TI V32TI V64TI])
+
 ; Vector modes with native support
 (define_mode_iterator V_noQI
  [V2HI V2HF V2SI V2SF V2DI V2DF
@@ -136,7 +140,7 @@ (define_mode_iterator SV_SFDF
   V32SF V32DF
   V64SF V64DF])
 
-; All of above
+; All modes in which we want to do more than just moves.
 (define_mode_iterator V_ALL
  [V2QI V2HI V2HF V2SI V2SF V2DI V2DF
   V4QI V4HI V4HF V4SI V4SF V4DI V4DF
@@ -175,97 +179,113 @@ (define_mode_iterator SV_FP
   V32HF V32SF V32DF
   V64HF V64SF V64DF])
 
+; All modes that need moves, including those without many insns.
+(define_mode_iterator V_MOV
+ [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+  V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+  V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+  V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+  V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+  V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+(define_mode_iterator V_MOV_ALT
+ [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+  V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+  V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+  V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+  V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+  V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+
 (define_mode_attr scalar_mode
-  [(QI "qi") (HI "hi") (SI "si")
+  [(QI "qi") (HI "hi

Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 1:30 PM Richard Biener
 wrote:
>
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > C++ has an assume attribute that expands into
> > > >   if (conditional)
> > > > __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > there are no complaints.
> > >
> > > I think we also had the request to not account the condition feeding
> > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart do we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use the mark phase of cd-dce, telling it to ignore
> > those statements and then use its result in the analysis.
>
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
>
>  a = a + 1;
>  if (a > 10)
>__builtin_unreachable ();
>  if (a < 5)
>__builtin_unreachable ();
>
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.
>
> > Also question is how much we can rely on middle-end optimizing out
> > unreachables.  For example:
> > int a;
> > int b[3];
> > test()
> > {
> >   if (a > 0)
> > {
> >   for (int i = 0; i < 3; i++)
> >   b[i] = 0;
> >   __builtin_unreachable ();
> > }
> > }
> >
> > IMO can be optimized to empty function.  I believe we used to have code
> > in tree-cfgcleanup to remove statements just before
> > __builtin_unreachable which can not terminate execution, but perhaps it
> > existed only in my local tree?
>
> I think we rely on DCE/DSE here and explicit unreachable () pruning after
> VRP picked up things (I think it simply gets the secondary effect optimizing
> the condition it created the range for in the first pass).
>
> DSE is apparently not able to kill the stores, I will fix that.  I think
> DCE can, but only for non-aliased stores.
>
> > We could also perhaps declare unreachable NOVOPs which would make DSE to
> > remove the stores.
>
> But only because of a bug in DSE ... it also removes them if that
> __builtin_unreachable ()
> is GIMPLE_RESX.

Oh, and __builtin_unreachable is already 'const' and thus without any VOPs.  The
issue in DSE is that DSE will not run into __builtin_unreachable because it has
no VOPs.  Instead DSE relies on eventually seeing a VUSE for all paths leaving
a function, it doesn't have a way to consider __builtin_unreachable killing all
memory (it would need a VDEF for that).

It might be possible to record which virtual operands are live at BBs without
successors (but the VUSE on returns was an attempt to avoid the need for that).

So there's no easy way to fix DSE here.

> > >
> > > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
> >
> > You are right.  I tested it with -funreachable-traps but it does not do
> > what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
> >
> > Also if I try to call it by hand I get:
> >
> > jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps 
> > -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> > t.c: In function ‘test’:
> > t.c:9:13: warning: implicit declaration of function 
> > ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? 
> > [-Wimplicit-function-declaration]
> > 9 | __builtin_unreachable_trap ();
> >   | ^~
> >   | __builtin_unreachable trap
> >
> > Which is not as helpful as it is trying to be.
> > >
> > > > +ret = true;
> > > > +done:
> > > > +  for (basic_block vbb:visited_bbs)
> > > > +cache[vbb->index] = (unsigned char)ret + 1;
> > > > +  return ret;
> > > > +}
> > > > +
> > > >  /* Analyze function body for NODE.
> > > > EARLY indicates run from early optimization pipeline.  */
> > > >
> > > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, 
> > > > bool early)
> > > >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN 
> > > > (cfun)->count;
> > > >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > > >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > > +  auto_vec<unsigned char> cache;
> > > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> > >
> > > A sbitmap with two bits per entry would be more space efficient here.  
> > > bitmap
> > > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > > adding the corresponding to sbitmap.h

Re: [PATCH V6] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 00:56, Robin Dapp wrote:

If the pattern is not allowed to fail, then what code enforces the bias
argument's restrictions?  I don't see it in the generic expander code.


I have no idea since this is just copied from len_load/len_store, which is
s390 target-dependent stuff.

I have sent V7 patch with fixing doc by following your suggestion.



We have:

signed char
internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
{
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);

   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
   if (insn_operand_matches (icode, 3, GEN_INT (0)))
return 0;
   if (insn_operand_matches (icode, 3, GEN_INT (-1)))
return -1;
 }

   return VECT_PARTIAL_BIAS_UNSUPPORTED;

Ah.  That's not where I expected to find it.  Thanks for pointing it out.

Jeff


Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
 wrote:
>
> Hi,
>
>
> With -O3 -fsignaling-nans -fno-signed-zeros, the compiler should not
> simplify 'x + 0.0' to 'x'.
>

OK if you bootstrapped / tested this change.

Thanks,
Richard.

> GCC Bugzilla : Bug 110305
>
>
> gcc/ChangeLog:
>
> 2023-06-19  Toru Kisuki  
>
> * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
>
> ---
>  gcc/simplify-rtx.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index e152918b0f1..cc96b36ad4e 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2698,7 +2698,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
> code,
>  when x is NaN, infinite, or finite and nonzero.  They aren't
>  when x is -0 and the rounding mode is not towards -infinity,
>  since (-0) + 0 is then 0.  */
> -  if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
> +  if (!HONOR_SIGNED_ZEROS (mode) && !HONOR_SNANS (mode)
> +  && trueop1 == CONST0_RTX (mode))
> return op0;
>
>/* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
> --
> 2.38.1
>


Re: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:


LGTM

ACK for the trunk.
jeff


[PATCH] [i386] Reject too large vectors for partial vector vectorization

2023-06-19 Thread Richard Biener via Gcc-patches
The following works around the x86 backend not asking the
vectorizer to compare the costs of the different possible vector
sizes the backend advertises through the vector_modes hook.  When
enabling masked epilogues or main loops this means we will
select the preferred vector mode, which is usually the largest, even
for loops that do not iterate close to the number of lanes the
vector has.  When not using masking the vectorizer would reject any
mode resulting in a VF bigger than the number of iterations,
but with masking they are simply masked out.

So this overloads the finish_cost function and matches for
the problematic case, forcing a high cost to make us try a
smaller vector size.


Bootstrapped and tested on x86_64-unknown-linux-gnu.  This should
avoid regressing 525.x264_r with partial vector epilogues and
instead improves it by 25% with -march=znver4 (need to re-check
that, that was true with some earlier attempt).

This falls short of enabling cost comparison in the x86 backend
which I also considered doing for --param vect-partial-vector-usage=1
but which will also cause a much larger churn and compile-time
impact (but it should be bearable as seen with aarch64).

I've filed PR110310 for an oddity I noticed around vectorizing
epilogues, I failed to adjust things for the case in that PR.

I'm using INT_MAX to fend off the vectorizer, I wondered if
we should be able to signal that with a bool return value of
finish_cost?  Though INT_MAX seems to work fine.

Does this look reasonable?

Thanks,
Richard.

* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Overload.  For masked main loops make sure the vectorization
factor isn't more than double the number of iterations.


* gcc.target/i386/vect-partial-vectors-1.c: New testcase.
* gcc.target/i386/vect-partial-vectors-2.c: Likewise.
---
 gcc/config/i386/i386.cc   | 26 +++
 .../gcc.target/i386/vect-partial-vectors-1.c  | 13 ++
 .../gcc.target/i386/vect-partial-vectors-2.c  | 12 +
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b20cb86b822..32851a514a9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23666,6 +23666,7 @@ class ix86_vector_costs : public vector_costs
  stmt_vec_info stmt_info, slp_tree node,
  tree vectype, int misalign,
  vect_cost_model_location where) override;
+  void finish_cost (const vector_costs *) override;
 };
 
 /* Implement targetm.vectorize.create_costs.  */
@@ -23918,6 +23919,31 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
   return retval;
 }
 
+void
+ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
+{
+  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
+  if (loop_vinfo && !m_costing_for_scalar)
+{
+  /* We are currently not asking the vectorizer to compare costs
+between different vector mode sizes.  When using predication
+that will end up always choosing the prefered mode size even
+if there's a smaller mode covering all lanes.  Test for this
+situation and artificially reject the larger mode attempt.
+???  We currently lack masked ops for sub-SSE sized modes,
+so we could restrict this rejection to AVX and AVX512 modes
+but error on the safe side for now.  */
+  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+ && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+ && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
+ > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
+   m_costs[vect_body] = INT_MAX;
+}
+
+  vector_costs::finish_cost (scalar_costs);
+}
+
 /* Validate target specific memory model bits in VAL. */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c 
b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
new file mode 100644
index 000..3834720e8e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl -mprefer-vector-width=512 --param 
vect-partial-vector-usage=1" } */
+
+void foo (int * __restrict a, int *b)
+{
+  for (int i = 0; i < 4; ++i)
+a[i] = b[i] + 42;
+}
+
+/* We do not want to optimize this using masked AVX or AVX512
+   but unmasked SSE.  */
+/* { dg-final { scan-assembler-not "\[yz\]mm" } } */
+/* { dg-final { scan-assembler "xmm" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c 
b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c
new file mode 100644
index 000..4ab2cbc4203
--- /dev/nul

RE: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks Jeff, will commit this one after the RVV float reduction PATCH (reviewed 
by Juzhe already).

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 7:45 PM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64



On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:
> 
> LGTM
ACK for the trunk.
jeff


Re: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 00:05, juzhe.zh...@rivai.ai wrote:

LGTM.

OK
jeff


RE: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks Jakub for reviewing; sorry for the confusion, I will give it another
try in PATCH v3.

Pan

-Original Message-
From: Jakub Jelinek  
Sent: Monday, June 19, 2023 5:17 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; rguent...@suse.de
Subject: Re: [PATCH] RISC-V: Fix out of range memory access of machine mode 
table

On Mon, Jun 19, 2023 at 05:05:48PM +0800, pan2...@intel.com wrote:
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (
> +    MAX_MACHINE_MODE);

Incorrect formatting.  And, see my other mail, this is wrong.

> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }

And these two are wrong as well.  The value passed to bp_pack_enum
has to match the one used on bp_unpack_enum.  But that is not the case
after your changes.  You stream out with the host MAX_MACHINE_MODE, and
stream in for normal LTO with the same value (ok), but for offloading
targets (nvptx, amdgcn) with a different MAX_MACHINE_MODE.  That will
immediately result in LTO streaming being out of sync and ICEs all around.
The reason for using 1 << 8 there was exactly to make it interoperable for
offloading.  What could be perhaps done is that you stream out the
host MAX_MACHINE_MODE value somewhere and stream it in inside of
lto_input_mode_table before you allocate the table.  But, that streamed
in host max_machine_mode has to be remembered somewhere and used e.g. in
bp_unpack_machine_mode instead of MAX_MACHINE_MODE.

Jakub



RE: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Ok for trunk?

And a reminder to myself that this PATCH should be committed before the RVV 
widen reduction one.

Pan

From: 钟居哲 
Sent: Sunday, June 18, 2023 9:15 PM
To: Li, Pan2 ; gcc-patches 
Cc: rdapp.gcc ; Jeff Law ; Li, Pan2 
; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

Thanks for fixing it for me.
LGTM now.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-18 10:57
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64
From: Pan Li <pan2...@intel.com>

The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributions.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

Thus there will be a problem here. For example, with zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF) which will return the code of
the ZVE128+ instead of the ZVE32 logically.

This patch will merge the 3 patterns into one pattern, and pass both the
input_vector and the ret_vector of code_for_reduc. For example, ZVE32
will be code_for_reduc (max, VNx1HF, VNx2HF), then the correct code for ZVE32
will be returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored-by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

PR target/110277

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
ret_mode.
* config/riscv/vector-iterators.md: Add VHF, VSF, VDF,
VHF_LMUL1, VSF_LMUL1, VDF_LMUL1, and remove unused attr.
* config/riscv/vector.md (@pred_reduc_): Removed.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_): New pattern.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110277-1.c: New test.
* gcc.target/riscv/rvv/base/pr110277-1.h: New test.
* gcc.target/riscv/rvv/base/pr110277-2.c: New test.
* gcc.target/riscv/rvv/base/pr110277-2.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |   5 +-
gcc/config/riscv/vector-iterators.md  | 128 +++---
gcc/config/riscv/vector.md| 363 +++---
.../gcc.target/riscv/rvv/base/pr110277-1.c|   9 +
.../gcc.target/riscv/rvv/base/pr110277-1.h|  33 ++
.../gcc.target/riscv/rvv/base/pr110277-2.c|  11 +
.../gcc.target/riscv/rvv/base/pr110277-2.h|  33 ++
7 files changed, 366 insertions(+), 216 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.h

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 53bd0ed2534..27545113996 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1400,8 +1400,7 @@ public:
 machine_mode ret_mode = e.ret_mode ();
 /* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
-   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
   return e.use_exact_insn (
code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
 else
@@ -1435,7 +1434,7 @@ public:
   rtx expand (function_expander &e) const override
   {
 return e.use_exact_insn (
-  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.vector_mode ()));
+  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e2c8ade98eb..6169116482a 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -967,6 +967,33 @@ (define_mode_iterator VDI [
   (VNx16DI "TA

Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 07:14, 钟居哲 wrote:

Thanks for fixing it for me.
LGTM now.

OK for the trunk.
jeff


Re: [PATCH] Improved SUBREG simplifications in simplify-rtx.cc's simplify_subreg.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 04:22, Roger Sayle wrote:


An x86 backend improvement that I'm working on results in combine attempting
to recognize:

(set (reg:DI 87 [ xD.2846 ])
  (ior:DI (subreg:DI (ashift:TI (zero_extend:TI (reg:DI 92))
(const_int 64 [0x40])) 0)
  (reg:DI 91)))

where the lowpart SUBREG has difficulty seeing through the (hi<<64)
that the lowpart must be zero.  Rather than workaround this in the
backend, the better fix is to teach simplify-rtx that
lowpart((hi<<64)|lo) -> lo and highpart((hi<<64)|lo) -> hi, so that
all backends benefit.  Reducing the number of places where the
middle-end generates a SUBREG of something other than REG is a
good thing.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures, except for pr78904-1b.c, for which a backend
solution has just been proposed.  Ok for mainline?


2023-06-18  Roger Sayle  

gcc/ChangeLog
 * simplify-rtx.cc (simplify_subreg):  Optimize lowpart SUBREGs
 of ASHIFT to const0_rtx with sufficiently large shift count.
 Optimize highpart SUBREGs of ASHIFT as the shift operand when
 the shift count is the correct offset.  Optimize SUBREGs of
 multi-word logic operations if the SUBREGs of both operands
 can be simplified.

OK
Jeff


Re: [PATCH] combine: Narrow comparison of memory and constant

2023-06-19 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Mon, Jun 12, 2023 at 03:29:00PM -0600, Jeff Law wrote:
> 
> 
> On 6/12/23 01:57, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> > Comparisons between memory and constants might be done in a smaller mode
> > resulting in smaller constants which might finally end up as immediates
> > instead of in the literal pool.
> > 
> > For example, on s390x a non-symmetric comparison like
> >x <= 0x3fff
> > results in the constant being spilled to the literal pool and an 8 byte
> > memory comparison is emitted.  Ideally, an equivalent comparison
> >x0 <= 0x3f
> > where x0 is the most significant byte of x, is emitted where the
> > constant is smaller and more likely to materialize as an immediate.
> > 
> > Similarly, comparisons of the form
> >x >= 0x4000
> > can be shortened into x0 >= 0x40.
> > 
> > I'm not entirely sure whether combine is the right place to implement
> > something like this.  In my first try I implemented it in
> > TARGET_CANONICALIZE_COMPARISON but then thought other targets might
> > profit from it, too.  simplify_context::simplify_relational_operation_1
> > seems to be the wrong place since code/mode may change.  Any opinions?
> > 
> > gcc/ChangeLog:
> > 
> > * combine.cc (simplify_compare_const): Narrow comparison of
> > memory and constant.
> > (try_combine): Adapt new function signature.
> > (simplify_comparison): Adapt new function signature.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/cmp-mem-const-1.c: New test.
> > * gcc.target/s390/cmp-mem-const-2.c: New test.
> This does seem more general than we'd want to do in the canonicalization
> hook.  So thanks for going the extra mile and doing a generic
> implementation.
> 
> 
> 
> 
> > @@ -11987,6 +11988,79 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> > break;
> >   }
> > +  /* Narrow non-symmetric comparison of memory and constant as e.g.
> > + x0...x7 <= 0x3fff into x0 <= 0x3f where x0 is the most
> > + significant byte.  Likewise, transform x0...x7 >= 0x4000 
> > into
> > + x0 >= 0x40.  */
> > +  if ((code == LEU || code == LTU || code == GEU || code == GTU)
> > +  && is_a  (GET_MODE (op0), &int_mode)
> > +  && MEM_P (op0)
> > +  && !MEM_VOLATILE_P (op0)
> > +  && (unsigned HOST_WIDE_INT)const_op > 0xff)
> > +{
> > +  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT)const_op;
> > +  enum rtx_code adjusted_code = code;
> > +
> > +  /* If the least significant bit is already zero, then adjust the
> > +comparison in the hope that we hit cases like
> > +  op0  <= 0x3dfe
> > +where the adjusted comparison
> > +  op0  <  0x3dff
> > +can be shortened into
> > +  op0' <  0x3d.  */
> > +  if (code == LEU && (n & 1) == 0)
> > +   {
> > + ++n;
> > + adjusted_code = LTU;
> > +   }
> > +  /* or e.g. op0 < 0x4020  */
> > +  else if (code == LTU && (n & 1) == 0)
> > +   {
> > + --n;
> > + adjusted_code = LEU;
> > +   }
> > +  /* or op0 >= 0x4001  */
> > +  else if (code == GEU && (n & 1) == 1)
> > +   {
> > + --n;
> > + adjusted_code = GTU;
> > +   }
> > +  /* or op0 > 0x3fff.  */
> > +  else if (code == GTU && (n & 1) == 1)
> > +   {
> > + ++n;
> > + adjusted_code = GEU;
> > +   }
> > +
> > +  scalar_int_mode narrow_mode_iter;
> > +  bool lower_p = code == LEU || code == LTU;
> > +  bool greater_p = !lower_p;
> > +  FOR_EACH_MODE_UNTIL (narrow_mode_iter, int_mode)
> > +   {
> > + unsigned nbits = GET_MODE_PRECISION (int_mode)
> > + - GET_MODE_PRECISION (narrow_mode_iter);
> > + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << nbits) - 1;
> > + unsigned HOST_WIDE_INT lower_bits = n & mask;
> > + if ((lower_p && lower_bits == mask)
> > + || (greater_p && lower_bits == 0))
> > +   {
> > + n >>= nbits;
> > + break;
> > +   }
> > +   }
> > +
> > +  if (narrow_mode_iter < int_mode)
> > +   {
> > + poly_int64 offset = BYTES_BIG_ENDIAN
> > +   ? 0
> > +   : GET_MODE_SIZE (int_mode)
> > + - GET_MODE_SIZE (narrow_mode_iter);
> Go ahead and add some parenthesis here.  I'd add one pair around the whole
> RHS of that assignment.  The '?' and ':' would line up under the 'B' in that
> case.  Similarly add them around the false arm of the ternary.  The '-' will
> line up under the 'G'.

Done.

> 
> Going to trust you got the little endian adjustment correct here ;-)

Sadly I gave it a try on x64, aarch64, and powerpc64le and in all cases
the resulting instructions were rejected either because the costs were
higher or because the new instructions failed to match.  Thus currently I
have tested this only thoroughly on s390x.

> 
> 
> > /* Compute some predicates 

Re: ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2023-06-19 Thread Lewis Hyatt via Gcc-patches
May I please ping this one? FWIW, it's 10 months old now without any feedback.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html

Most of the changes are just adapting the testsuite to look for the
improved diagnostic location. Otherwise it's a handful of lines in
libcpp and it just changes this:

t.cpp:1: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  |

to this:

t.cpp:1:9: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  | ^

which closes out PR66290. Thank you!

-Lewis

On Thu, Jan 12, 2023 at 6:31 PM Lewis Hyatt  wrote:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html
> May I please ping this one again? It will enable closing out the PR. Thanks!
>
> -Lewis
>
> On Thu, Dec 1, 2022 at 9:22 AM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> >
> > May I please ping this one? Thanks!
> > I have also re-attached the rebased patch here.
> >
> > -Lewis
> >
> > On Wed, Oct 12, 2022 at 06:37:50PM -0400, Lewis Hyatt wrote:
> > > Hello-
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > >
> > > Since Jeff was kind enough to ack one of my other preprocessor patches
> > > today, I have become emboldened to ping this one again too :). Would
> > > anyone have some time to take a look at it please? Thanks!
> > >
> > > -Lewis
> > >
> > > On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
> > > >
> > > > Hello-
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > > > May I please ping this patch? Thank you.
> > > >
> > > > -Lewis
> > > >
> > > > On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> > > > >
> > > > >
> > > > > When libcpp reports diagnostics whose locus is a macro name (such as 
> > > > > for
> > > > > -Wunused-macros), it uses the location in the cpp_macro object that 
> > > > > was
> > > > > stored by _cpp_new_macro. This is currently set to 
> > > > > pfile->directive_line,
> > > > > which contains the line number only and no column information. This 
> > > > > patch
> > > > > changes the stored location to the src_loc for the token defining the 
> > > > > macro
> > > > > name, which includes the location and range information.
> > > > >
> > > > > libcpp/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * macro.cc (_cpp_create_definition): Add location argument.
> > > > > * internal.h (_cpp_create_definition): Adjust prototype.
> > > > > * directives.cc (do_define): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > (do_undef): Stop passing inferior location to 
> > > > > cpp_warning_with_line;
> > > > > the default from cpp_warning is better.
> > > > > (cpp_pop_definition): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > * pch.cc (cpp_read_state): Likewise.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * c-c++-common/cpp/macro-ranges.c: New test.
> > > > > * c-c++-common/cpp/line-2.c: Adapt to check for column 
> > > > > information
> > > > > on macro-related libcpp warnings.
> > > > > * c-c++-common/cpp/line-3.c: Likewise.
> > > > > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > > > > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > > > > * c-c++-common/pragma-diag-14.c: Likewise.
> > > > > * c-c++-common/pragma-diag-15.c: Likewise.
> > > > > * g++.dg/modules/macro-2_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_e.C: Likewise.
> > > > > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > > > > * gcc.dg/builtin-redefine.c: Likewise.
> > > > > * gcc.dg/cpp/Wunused.c: Likewise.
> > > > > * gcc.dg/cpp/redef2.c: Likewise.
> > > > > * gcc.dg/cpp/redef3.c: Likewise.
> > > > > * gcc.dg/cpp/redef4.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > > > > * gcc.dg/cpp/undef2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > > > > ---
> > > > >
> > > > > Notes:
> > > > > Hello-
> > > > >
> > > > > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was 
> > > > > originally
> > > > > about the entirely wrong location for -Wunused-macros in C++ 
> > > > > mode, which
> > > > > behavior was fixed by r13-1903, but before closing it out I 
> > > > > wanted to also
> > > >

RE: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 9:51 PM
To: 钟居哲 ; Li, Pan2 ; gcc-patches 

Cc: rdapp.gcc ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64



On 6/18/23 07:14, 钟居哲 wrote:
> Thanks for fixing it for me.
> LGTM now.
OK for the trunk.
jeff


[PATCH v2] combine: Narrow comparison of memory and constant

2023-06-19 Thread Stefan Schulze Frielinghaus via Gcc-patches
Comparisons between memory and constants might be done in a smaller mode
resulting in smaller constants which might finally end up as immediates
instead of in the literal pool.

For example, on s390x a non-symmetric comparison like
  x <= 0x3fff
results in the constant being spilled to the literal pool and an 8 byte
memory comparison is emitted.  Ideally, an equivalent comparison
  x0 <= 0x3f
where x0 is the most significant byte of x, is emitted where the
constant is smaller and more likely to materialize as an immediate.

Similarly, comparisons of the form
  x >= 0x4000
can be shortened into x0 >= 0x40.

Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
Note, the new tests show that for the mentioned little-endian targets
the optimization does not materialize since either the costs of the new
instructions are higher or they do not match.  Still ok for mainline?

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Narrow comparison of
memory and constant.
(try_combine): Adapt new function signature.
(simplify_comparison): Adapt new function signature.

gcc/testsuite/ChangeLog:

* gcc.dg/cmp-mem-const-1.c: New test.
* gcc.dg/cmp-mem-const-2.c: New test.
* gcc.dg/cmp-mem-const-3.c: New test.
* gcc.dg/cmp-mem-const-4.c: New test.
* gcc.dg/cmp-mem-const-5.c: New test.
* gcc.dg/cmp-mem-const-6.c: New test.
* gcc.target/s390/cmp-mem-const-1.c: New test.
---
 gcc/combine.cc| 79 +--
 gcc/testsuite/gcc.dg/cmp-mem-const-1.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-2.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-3.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-4.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-5.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-6.c| 17 
 .../gcc.target/s390/cmp-mem-const-1.c | 24 ++
 8 files changed, 200 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c
 create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 5aa0ec5c45a..56e15a93409 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code, 
machine_mode, rtx,
 static int recog_for_combine (rtx *, rtx_insn *, rtx *);
 static rtx gen_lowpart_for_combine (machine_mode, rtx);
 static enum rtx_code simplify_compare_const (enum rtx_code, machine_mode,
-rtx, rtx *);
+rtx *, rtx *);
 static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
 static void update_table_tick (rtx);
 static void record_value_for_reg (rtx, rtx_insn *, rtx);
@@ -3185,7 +3185,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
  if (is_a  (GET_MODE (i2dest), &mode))
compare_code = simplify_compare_const (compare_code, mode,
-  op0, &op1);
+  &op0, &op1);
  target_canonicalize_comparison (&compare_code, &op0, &op1, 1);
}
 
@@ -11796,13 +11796,14 @@ gen_lowpart_for_combine (machine_mode omode, rtx x)
(CODE OP0 const0_rtx) form.
 
The result is a possibly different comparison code to use.
-   *POP1 may be updated.  */
+   *POP0 and *POP1 may be updated.  */
 
 static enum rtx_code
 simplify_compare_const (enum rtx_code code, machine_mode mode,
-   rtx op0, rtx *pop1)
+   rtx *pop0, rtx *pop1)
 {
   scalar_int_mode int_mode;
+  rtx op0 = *pop0;
   HOST_WIDE_INT const_op = INTVAL (*pop1);
 
   /* Get the constant we are comparing against and turn off all bits
@@ -11987,6 +11988,74 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   break;
 }
 
+  /* Narrow non-symmetric comparison of memory and constant as e.g.
+ x0...x7 <= 0x3fff into x0 <= 0x3f where x0 is the most
+ significant byte.  Likewise, transform x0...x7 >= 0x4000 into
+ x0 >= 0x40.  */
+  if ((code == LEU || code == LTU || code == GEU || code == GTU)
+  && is_a  (GET_MODE (op0), &int_mode)
+  && MEM_P (op0)
+  && !MEM_VOLATILE_P (op0)
+  /* The optimization makes only sense for constants which are big enough
+so that we have a chance to chop off something at all.  */
+  && (unsigned HOST_WIDE_INT) const_op > 0xff
+  /* Ensure that we do not overflow during

RE: [PATCH v2] RISC-V: Bugfix for RVV widening reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 7:45 PM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV widening reduction in ZVE32/64



On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:
> 
> LGTM
ACK for the trunk.
jeff


RE: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Monday, June 19, 2023 9:10 PM
To: juzhe.zh...@rivai.ai; Li Xu ; gcc-patches 

Cc: kito.cheng ; palmer 
Subject: Re: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement



On 6/19/23 00:05, juzhe.zh...@rivai.ai wrote:
> LGTM.
OK
jeff


[committed] recog: Change return type of predicate functions from int to bool

2023-06-19 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool and change return type of
split_all_insns_noflow to void.

gcc/ChangeLog:

* recog.h (check_asm_operands): Change return type from int to bool.
(insn_invalid_p): Ditto.
(verify_changes): Ditto.
(apply_change_group): Ditto.
(constrain_operands): Ditto.
(constrain_operands_cached): Ditto.
(validate_replace_rtx_subexp): Ditto.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(added_clobbers_hard_reg_p): Ditto.
(peep2_regno_dead_p): Ditto.
(peep2_reg_dead_p): Ditto.
(store_data_bypass_p): Ditto.
(if_test_bypass_p): Ditto.
* rtl.h (split_all_insns_noflow): Change
return type from unsigned int to void.
* genemit.cc (output_added_clobbers_hard_reg_p): Change return type
of generated added_clobbers_hard_reg_p from int to bool and adjust
function body accordingly.  Change "used" variable type from
int to bool.
* recog.cc (check_asm_operands): Change return type
from int to bool and adjust function body accordingly.
(insn_invalid_p): Ditto.  Change "is_asm" variable to bool.
(verify_changes): Change return type from int to bool.
(apply_change_group): Change return type from int to bool
and adjust function body accordingly.
(validate_replace_rtx_subexp): Change return type from int to bool.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(constrain_operands_cached): Ditto.
(constrain_operands): Ditto.  Change "lose" and "win"
variables type from int to bool.
(split_all_insns_noflow): Change return type from unsigned int
to void and adjust function body accordingly.
(peep2_regno_dead_p): Change return type from int to bool.
(peep2_reg_dead_p): Ditto.
(peep2_find_free_register): Change "success"
variable type from int to bool
(store_data_bypass_p_1): Change return type from int to bool.
(store_data_bypass_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 33c9ec05d6f..1ce0564076d 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -688,26 +688,27 @@ output_added_clobbers_hard_reg_p (void)
 {
   struct clobber_pat *clobber;
   struct clobber_ent *ent;
-  int clobber_p, used;
+  int clobber_p;
+  bool used;
 
-  printf ("\n\nint\nadded_clobbers_hard_reg_p (int insn_code_number)\n");
+  printf ("\n\nbool\nadded_clobbers_hard_reg_p (int insn_code_number)\n");
   printf ("{\n");
   printf ("  switch (insn_code_number)\n");
   printf ("{\n");
 
   for (clobber_p = 0; clobber_p <= 1; clobber_p++)
 {
-  used = 0;
+  used = false;
   for (clobber = clobber_list; clobber; clobber = clobber->next)
if (clobber->has_hard_reg == clobber_p)
  for (ent = clobber->insns; ent; ent = ent->next)
{
  printf ("case %d:\n", ent->code_number);
- used++;
+ used = true;
}
 
   if (used)
-   printf ("  return %d;\n\n", clobber_p);
+   printf ("  return %s;\n\n", clobber_p ? "true" : "false");
 }
 
   printf ("default:\n");
diff --git a/gcc/recog.cc b/gcc/recog.cc
index fd09145d45e..37432087812 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -133,7 +133,7 @@ asm_labels_ok (rtx body)
 /* Check that X is an insn-body for an `asm' with operands
and that the operands mentioned in it are legitimate.  */
 
-int
+bool
 check_asm_operands (rtx x)
 {
   int noperands;
@@ -142,7 +142,7 @@ check_asm_operands (rtx x)
   int i;
 
   if (!asm_labels_ok (x))
-return 0;
+return false;
 
   /* Post-reload, be more strict with things.  */
   if (reload_completed)
@@ -156,9 +156,9 @@ check_asm_operands (rtx x)
 
   noperands = asm_noperands (x);
   if (noperands < 0)
-return 0;
+return false;
   if (noperands == 0)
-return 1;
+return true;
 
   operands = XALLOCAVEC (rtx, noperands);
   constraints = XALLOCAVEC (const char *, noperands);
@@ -171,10 +171,10 @@ check_asm_operands (rtx x)
   if (c[0] == '%')
c++;
   if (! asm_operand_ok (operands[i], c, constraints))
-   return 0;
+   return false;
 }
 
-  return 1;
+  return true;
 }
 
 /* Static data for the next two routines.  */
@@ -212,8 +212,8 @@ static int temporarily_undone_changes = 0;
 
If IN_GROUP is zero, this is a single change.  Try to recognize the insn
or validate the memory reference with the change applied.  If the result
-   is not valid for the machine, suppress the change and return zero.
-   Otherwise, perform the change and return 1.  */
+   is not valid for the machine, suppress the change and return false.
+   Otherwise, perform the change and return true.  */
 
 static bool
 validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
@@ -232,7 +232,7 @@ validate_change_1 (rtx 

Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:

> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
> wrote:
> > - if (max_size() - size() < __n)
> > -   __throw_length_error(__N(__s));
> > + const size_type __max_size = max_size();
> > + // On 64bit systems vectors can not reach overflow by growing
> > + // by small sizes; before this happens, we will run out of memory.
> > + if (__builtin_constant_p(__n)
> > + && __builtin_constant_p(__max_size)
> > + && sizeof(ptrdiff_t) >= 8
> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>
> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>

For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
for a user-defined allocator that has a silly max_size(), yes, that's
possible.

I still don't really understand why any change is needed here. The PR says
that the current _M_check_len brings in the EH code, but how/why does that
happen? The __throw_length_error function is not inline, it's defined in
libstdc++.so, so why isn't it just an extern call? Is the problem that it
makes _M_check_len potentially-throwing? Because that's basically the
entire point of _M_check_len: to throw the exception that is required by
the C++ standard. We need to be very careful about removing that required
throw! And after we call _M_check_len we call allocate unconditionally, so
_M_realloc_insert can always throw (we only call _M_realloc_insert in the
case where we've already decided a reallocation is definitely needed).

Would this version of _M_check_len help?

  size_type
  _M_check_len(size_type __n, const char* __s) const
  {
const size_type __size = size();
const size_type __max_size = max_size();

if (__is_same(allocator_type, allocator<_Tp>)
  && __size > __max_size / 2)
  __builtin_unreachable(); // Assume std::allocator can't fill
memory.
else if (__size > __max_size)
  __builtin_unreachable();

if (__max_size - __size < __n)
  __throw_length_error(__N(__s));

const size_type __len = __size + (std::max)(__size, __n);
return (__len < __size || __len > __max_size) ? __max_size : __len;
  }

This only applies to std::allocator, not user-defined allocators (because
we don't know their semantics). It also seems like less of a big hack!


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
P.S. please CC libstd...@gcc.gnu.org for all libstdc++ patches.

On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely  wrote:

> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
>
>> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
>> wrote:
>> > - if (max_size() - size() < __n)
>> > -   __throw_length_error(__N(__s));
>> > + const size_type __max_size = max_size();
>> > + // On 64bit systems vectors can not reach overflow by growing
>> > + // by small sizes; before this happens, we will run out of memory.
>> > + if (__builtin_constant_p(__n)
>> > + && __builtin_constant_p(__max_size)
>> > + && sizeof(ptrdiff_t) >= 8
>> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>>
>> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>>
>
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp).
> But for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
>
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only call _M_realloc_insert in the
> case where we've already decided a reallocation is definitely needed).
>
> Would this version of _M_check_len help?
>
>   size_type
>   _M_check_len(size_type __n, const char* __s) const
>   {
> const size_type __size = size();
> const size_type __max_size = max_size();
>
> if (__is_same(allocator_type, allocator<_Tp>)
>   && __size > __max_size / 2)
>   __builtin_unreachable(); // Assume std::allocator can't fill
> memory.
> else if (__size > __max_size)
>   __builtin_unreachable();
>
> if (__max_size - __size < __n)
>   __throw_length_error(__N(__s));
>
> const size_type __len = __size + (std::max)(__size, __n);
> return (__len < __size || __len > __max_size) ? __max_size : __len;
>   }
>
> This only applies to std::allocator, not user-defined allocators (because
> we don't know their semantics). It also seems like less of a big hack!
>
>
>


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely  wrote:

> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
>
>> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
>> wrote:
>> > - if (max_size() - size() < __n)
>> > -   __throw_length_error(__N(__s));
>> > + const size_type __max_size = max_size();
>> > + // On 64bit systems vectors can not reach overflow by growing
>> > + // by small sizes; before this happens, we will run out of memory.
>> > + if (__builtin_constant_p(__n)
>> > + && __builtin_constant_p(__max_size)
>> > + && sizeof(ptrdiff_t) >= 8
>> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>>
>> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>>
>
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp).
> But for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
>
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only call _M_realloc_insert in the
> case where we've already decided a reallocation is definitely needed).
>
> Would this version of _M_check_len help?
>
>   size_type
>   _M_check_len(size_type __n, const char* __s) const
>   {
> const size_type __size = size();
> const size_type __max_size = max_size();
>
> if (__is_same(allocator_type, allocator<_Tp>)
>   && __size > __max_size / 2)
>

This check is wrong for C++17 and older standards, because max_size()
changed value in C++20.

In C++17 it was SIZE_MAX / sizeof(T) but in C++20 it's PTRDIFF_MAX /
sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
would not hold.


[PATCH] rs6000, __builtin_set_fpscr_rn add return value

2023-06-19 Thread Carl Love via Gcc-patches
GCC maintainers:


The GLibC team requested a builtin to replace the mffscrn and mffscrni inline
asm instructions in the GLibC code.  Previously there was discussion on adding
builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be best to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void. 
So, changing the return type to double and returning the  FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functionally equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backwardly
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.

The GLibC team has reviewed the patch to make sure it met their needs
as a drop-in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Delete.
(__builtin_set_fpscr_rn_i): New builtin definition.
(__builtin_set_fpscr_rn_d): New builtin definition.
* config/rs6000/rs6000-overload.def (__builtin_set_fpscr_rn): New
overloaded definition.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn_d): New define expand.
(rs6000_set_fpscr_rn_i): Renamed from rs6000_set_fpscr_rn.  Added
return argument.  Updated to use new rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Add new tests to check
the double return value.  Add tests for the overloaded double argument.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   6 +
 gcc/config/rs6000/rs6000.md   | 122 ---
 gcc/doc/extend.texi   |  25 ++-
 .../powerpc/test_fpscr_rn_builtin.c   | 143 +-
 5 files changed, 262 insertions(+), 41 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 289a37998b1..30e0b0bb06d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -237,8 +237,11 @@
   const __ibm128 __builtin_pack_ibm128 (double, double);
 PACK_IF packif {ibm128}
 
-  void __builtin_set_fpscr_rn (const int[0,3]);
-SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
+  double __builtin_set_fpscr_rn_i (const int[0,3]);
+SET_FPSCR_RN_I rs6000_set_fpscr_rn_i {nosoft}
+
+  double __builtin_set_fpscr_rn_d (double);
+SET_FPSCR_RN_D rs6000_set_fpscr_rn_d {nosoft}
 
   const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
 UNPACK_IF unpackif {ibm128}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..bb

Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jan Hubicka via Gcc-patches
> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
> 
> > On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
> > wrote:
> > > - if (max_size() - size() < __n)
> > > -   __throw_length_error(__N(__s));
> > > + const size_type __max_size = max_size();
> > > + // On 64bit systems vectors can not reach overflow by growing
> > > + // by small sizes; before this happens, we will run out of memory.
> > > + if (__builtin_constant_p(__n)
> > > + && __builtin_constant_p(__max_size)
> > > + && sizeof(ptrdiff_t) >= 8
> > > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
> >
> > Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
> >
> 
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
> for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
> 
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it

It is a really quite interesting performance problem which does affect real
code.  An extra extern call counts (especially since it is seen as
3 calls by the inliner).  

This is _M_check_len after early optimizations (so as seen by inline
heuristics):

  <bb 2> [local count: 1073741824]:
  _15 = this_7(D)->D.26656._M_impl.D.25963._M_finish;
  _14 = this_7(D)->D.26656._M_impl.D.25963._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  <bb 4> [local count: 1073312329]:
  D.27696 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27696(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27696 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto <bb 8>; [35.00%]
  else
    goto <bb 7>; [65.00%]

  <bb 7> [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  <bb 8> [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

So a lot of code that is essentially semantically equivalent to:

   return __size + MAX_EXPR (__n, __size)

at least with the default allocator.

The early inliner decides that it is not a good idea to early inline.
At this stage we inline mostly calls where we expect code to get
smaller after inlining, and since the function contains another
uninlinable call, this does not seem likely.

With -O3 we will inline it later at IPA stage, but only when the code is
considered hot. 
With -O2 we decide to keep it offline if the unit contains multiple
calls to the function; otherwise we inline it, since it wins in the code
size estimation model.

The problem is that _M_check_len is used by _M_realloc_insert, which later
feeds the result to the allocator.  There is extra redundancy since the
allocator can call std::__throw_bad_array_new_length and
std::__throw_bad_alloc for bad sizes, but _M_check_len will not produce
them, which is something we won't work out before inlining it.

As a result _M_realloc_insert is seen as a very large function by the
inliner heuristics (71 instructions).  Functions that are not
declared inline are inlined if smaller than 15 instructions with -O2
and 30 instructions with -O3. So we don't inline.

This hurts common loops that use a vector as a stack and call push_back
in an inner loop.  Not inlining prevents SRA, and we end up saving and
loading the end-of-vector pointer on every iteration of the loop.

The following testcase:

#include <utility>
#include <vector>

typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
std::vector<std::pair<uint32_t, uint32_t>> stack;
stack.push_back (pair);
while (!stack.empty()) {
std::pair<uint32_t, uint32_t> cur = stack.back();
stack.pop_back();
if (!cur.first)
{
cur.second++;
stack.push_back (cur);
}
if (cur.second > 1)
break;
}
}
int
main()
{
for (int i = 0; i < 1; i++)
  test();
}

Runs for me in 0.5s with _M_realloc_insert not inlined and 0.048s with
_M_realloc_insert inlined.  Clang inlines it even at -O2 and does
0.063s.  I believe it is the reason why the jpegxl library is slower
when built with GCC, and since such loops are quite common in, say, a
DFS walk, I think it is a frequent problem.
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (w

[PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-19 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I refactored the gimple IR build to make the code look cleaner.

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Add 
LEN_MASK_{LOAD,STORE} vectorizer support.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* optabs-query.cc (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  35 +-
 gcc/optabs-query.cc|  25 +++-
 gcc/tree-vect-stmts.cc | 259 +
 3 files changed, 213 insertions(+), 106 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_STORE_LANES:
   return 2;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
   return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+   = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_argno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 276f8408dd7..4394d391200 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -566,11 +566,14 @@ can_vec_mask_load_store_p (machine_mode mode,
   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
+  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
   machine_mode vmode;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
+return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing
+  || convert_optab_handler (len_op, mode, mask_mode)
+   != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
  the mask load/store supported.  */
@@ -584,7 +587,9 @@ can_vec_mask_load_store_p (machine_mode mode,
   vmode = targetm.vectorize.preferred_simd_mode (smode);
   if (VECTOR_MODE_P (vmode)
   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-  && convert_optab_h

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Thiago Jung Bauermann via Gcc-patches


Hello Manolis,

Philipp Tomsich  writes:

> On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
>> > in all cases, due to maybe_mode_change returning NULL. Relax this
>> > restriction and allow propagation when no mode change is requested.
>> >
>> > gcc/ChangeLog:
>> >
>> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
>> > propagation.
>> Thanks for the clarification.  This is OK for the trunk.  It looks
>> generic enough to have value going forward now rather than waiting.
>
> Rebased, retested, and applied to trunk.  Thanks!

Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 
1

Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
-fno-stack-protector  check-function-bodies caller_pred
FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
#8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - 
z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Alexander Monakov via Gcc-patches


Ping. OK for trunk?

On Mon, 5 Jun 2023, Alexander Monakov wrote:

> Ping for the front-end maintainers' input.
> 
> On Mon, 22 May 2023, Richard Biener wrote:
> 
> > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> >  wrote:
> > >
> > > Implement -ffp-contract=on for C and C++ without changing default
> > > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
> > 
> > The documentation changes mention the defaults are changed for
> > standard modes, I suppose you want to remove that hunk.
> > 
> > > gcc/c-family/ChangeLog:
> > >
> > > * c-gimplify.cc (fma_supported_p): New helper.
> > > (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
> > > contraction.
> > >
> > > gcc/ChangeLog:
> > >
> > > * common.opt (fp_contract_mode) [on]: Remove fallback.
> > > * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
> > > * doc/invoke.texi (-ffp-contract): Update.
> > > * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
> > > ---
> > >  gcc/c-family/c-gimplify.cc | 78 ++
> > >  gcc/common.opt |  3 +-
> > >  gcc/config/sh/sh.md|  2 +-
> > >  gcc/doc/invoke.texi|  8 ++--
> > >  gcc/trans-mem.cc   |  3 ++
> > >  5 files changed, 88 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > > index ef5c7d919f..f7635d3b0c 100644
> > > --- a/gcc/c-family/c-gimplify.cc
> > > +++ b/gcc/c-family/c-gimplify.cc
> > > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "c-ubsan.h"
> > >  #include "tree-nested.h"
> > >  #include "context.h"
> > > +#include "tree-pass.h"
> > > +#include "internal-fn.h"
> > >
> > >  /*  The gimplification pass converts the language-dependent trees
> > >  (ld-trees) emitted by the parser into language-independent trees
> > > @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
> > > body)
> > >return bind;
> > >  }
> > >
> > > +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
> > > +
> > > +static bool
> > > +fma_supported_p (enum internal_fn fn, tree type)
> > > +{
> > > +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> > > +}
> > > +
> > >  /* Gimplification of expression trees.  */
> > >
> > >  /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
> > > @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
> > > ATTRIBUTE_UNUSED,
> > > break;
> > >}
> > >
> > > +case PLUS_EXPR:
> > > +case MINUS_EXPR:
> > > +  {
> > > +   tree type = TREE_TYPE (*expr_p);
> > > +   /* For -ffp-contract=on we need to attempt FMA contraction only
> > > +  during initial gimplification.  Late contraction across 
> > > statement
> > > +  boundaries would violate language semantics.  */
> > > +   if (SCALAR_FLOAT_TYPE_P (type)
> > > +   && flag_fp_contract_mode == FP_CONTRACT_ON
> > > +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
> > > +   && fma_supported_p (IFN_FMA, type))
> > > + {
> > > +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
> > > +
> > > +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
> > > +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
> > > +
> > > +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
> > > +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
> > > + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
> > > +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + {
> > > +   std::swap (op0_p, op1_p);
> > > +   std::swap (neg_mul, neg_add);
> > > + }
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
> > > + {
> > > +   op0_p = &TREE_OPERAND (*op0_p, 0);
> > > +   neg_mul = !neg_mul;
> > > + }
> > > +   if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + break;
> > > +   auto_vec ops (3);
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
> > > +   ops.quick_push (*op1_p);
> > > +
> > > +   enum internal_fn ifn = IFN_FMA;
> > > +   if (neg_mul)
> > > + {
> > > +   if (fma_supported_p (IFN_FNMA, type))
> > > + ifn = IFN_FNMA;
> > > +   else
> > > + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
> > > + }
> > > +   if (neg_add)
> > > + {
> > > +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
> > > IFN_FNMS;
> > > +   if (fma_supported_p (ifn2, type))
> > > + ifn = ifn2;
> > > +   else
> > > +

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Manolis Tsamis
On Mon, Jun 19, 2023 at 7:57 PM Thiago Jung Bauermann
 wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich  writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> >> > propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, 
> sp 1
>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
> z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f6

Re: [PATCH] debug/110295 - mixed up early/late debug for member DIEs

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/19/23 06:15, Richard Biener wrote:

When we process a scope typedef during early debug creation, and we
have already created a DIE for the type, and the decl is
TYPE_DECL_IS_STUB, and this DIE is still in limbo, we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, eventually picking up the now-completed
type and creating DIEs for the members.  Instead this is currently
deferred to the second time we come here, when we annotate the
DIEs with locations late; by then the type DIE is no longer
in limbo and we fall through, doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.
---
  gcc/dwarf2out.cc  |  3 ++-
  gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, 
dw_die_ref context_die)
  
if (die != NULL && die->die_parent == NULL)

  add_child_die (context_die, die);


I wonder about reorganizing the function a bit to unify this parent 
setting with the one a bit below, which already falls through to 
gen_decl_die:



  if (decl && DECL_P (decl))
{
  die = lookup_decl_die (decl);

  /* Early created DIEs do not have a parent as the decls refer 
 to the function as DECL_CONTEXT rather than the BLOCK.  */

  if (die && die->die_parent == NULL)
{
  gcc_assert (in_lto_p);
  add_child_die (context_die, die);
}
}


OK either way.

Jason



Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:

On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
 wrote:


Hi,


With -O3 -fsignaling-nans -fno-signed-zeros, the compiler should not simplify
'x + 0.0' to 'x'.



OK if you bootstrapped / tested this change.
I suspect Toru doesn't have write access.  So I went ahead and did an
x86 bootstrap & regression test, which passed.  The ChangeLog entry
needed fleshing out a bit, and I fixed a minor whitespace problem in the
patch itself.


Pushed to the trunk.


jeff


Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 07:16, 钟居哲 wrote:

Thanks for cleaning up the code for the future ABI support patch.
Let's wait for Jeff's or Robin's comments.

Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff


Re: Tiny phiprop compile time optimization

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>
> > Hi,
> > this patch avoids unnecessary post dominator and update_ssa in phiprop.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiprop.cc (propagate_with_phi): Add 
> > post_dominators_computed;
> >   compute post dominators lazily.
> >   (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
> >   (pass_phiprop::execute): Update; return TODO_update_ssa if something
> >   changed.
> >
> > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> > index 3cb4900b6be..87e3a2ccf3a 100644
> > --- a/gcc/tree-ssa-phiprop.cc
> > +++ b/gcc/tree-ssa-phiprop.cc
> > @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
> >
> >  static bool
> >  propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> > - size_t n)
> > + size_t n, bool *post_dominators_computed)
> >  {
> >tree ptr = PHI_RESULT (phi);
> >gimple *use_stmt;
> > @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> > phiprop_d *phivn,
> >gimple *def_stmt;
> >tree vuse;
> >
> > +  if (!*post_dominators_computed)
> > +{
> > +   calculate_dominance_info (CDI_POST_DOMINATORS);
> > +   *post_dominators_computed = true;
>
> I think you can save the parameter by using dom_info_available_p () here
> and ...
>
> > + }
> > +
> >/* Only replace loads in blocks that post-dominate the PHI node.  
> > That
> >   makes sure we don't end up speculating loads.  */
> >if (!dominated_by_p (CDI_POST_DOMINATORS,
> > @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
> >0, /* properties_provided */
> >0, /* properties_destroyed */
> >0, /* todo_flags_start */
> > -  TODO_update_ssa, /* todo_flags_finish */
> > +  0, /* todo_flags_finish */
> >  };
> >
> >  class pass_phiprop : public gimple_opt_pass
> > @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
> >gphi_iterator gsi;
> >unsigned i;
> >size_t n;
> > +  bool post_dominators_computed = false;
> >
> >calculate_dominance_info (CDI_DOMINATORS);
> > -  calculate_dominance_info (CDI_POST_DOMINATORS);
> >
> >n = num_ssa_names;
> >phivn = XCNEWVEC (struct phiprop_d, n);
> > @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
> >if (bb_has_abnormal_pred (bb))
> >   continue;
> >for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> > + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
> > +  &post_dominators_computed);
> >  }
> >
> >if (did_something)
> > @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
> >
> >free (phivn);
> >
> > -  free_dominance_info (CDI_POST_DOMINATORS);
> > +  if (post_dominators_computed)
> > +free_dominance_info (CDI_POST_DOMINATORS);
>
> unconditionally free_dominance_info here.
>
> > -  return 0;
> > +  return did_something ? TODO_update_ssa : 0;
>
> I guess that change is following general practice and good to catch
> undesired changes (update_ssa will exit early when there's nothing
> to do anyway).

I wonder if TODO_update_ssa_only_virtuals should be used here rather
than TODO_update_ssa as the code produces ssa names already and just
adds memory loads/stores. But I could be wrong.

Thanks,
Andrew Pinski


>
> OK with those changes.


Re: Tiny phiprop compile time optimization

2023-06-19 Thread Richard Biener via Gcc-patches



> Am 19.06.2023 um 20:08 schrieb Andrew Pinski via Gcc-patches 
> :
> 
> On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
>  wrote:
>> 
>>> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>>> 
>>> Hi,
>>> this patch avoids unnecessary post-dominator computation and SSA updates in phiprop.
>>> 
>>> Bootstrapped/regtested x86_64-linux, OK?
>>> 
>>> gcc/ChangeLog:
>>> 
>>>  * tree-ssa-phiprop.cc (propagate_with_phi): Add 
>>> post_dominators_computed;
>>>  compute post dominators lazily.
>>>  (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
>>>  (pass_phiprop::execute): Update; return TODO_update_ssa if something
>>>  changed.
>>> 
>>> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
>>> index 3cb4900b6be..87e3a2ccf3a 100644
>>> --- a/gcc/tree-ssa-phiprop.cc
>>> +++ b/gcc/tree-ssa-phiprop.cc
>>> @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
>>> 
>>> static bool
>>> propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
>>> - size_t n)
>>> + size_t n, bool *post_dominators_computed)
>>> {
>>>   tree ptr = PHI_RESULT (phi);
>>>   gimple *use_stmt;
>>> @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
>>> phiprop_d *phivn,
>>>   gimple *def_stmt;
>>>   tree vuse;
>>> 
>>> +  if (!*post_dominators_computed)
>>> +{
>>> +   calculate_dominance_info (CDI_POST_DOMINATORS);
>>> +   *post_dominators_computed = true;
>> 
>> I think you can save the parameter by using dom_info_available_p () here
>> and ...
>> 
>>> + }
>>> +
>>>   /* Only replace loads in blocks that post-dominate the PHI node.  That
>>>  makes sure we don't end up speculating loads.  */
>>>   if (!dominated_by_p (CDI_POST_DOMINATORS,
>>> @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
>>>   0, /* properties_provided */
>>>   0, /* properties_destroyed */
>>>   0, /* todo_flags_start */
>>> -  TODO_update_ssa, /* todo_flags_finish */
>>> +  0, /* todo_flags_finish */
>>> };
>>> 
>>> class pass_phiprop : public gimple_opt_pass
>>> @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
>>>   gphi_iterator gsi;
>>>   unsigned i;
>>>   size_t n;
>>> +  bool post_dominators_computed = false;
>>> 
>>>   calculate_dominance_info (CDI_DOMINATORS);
>>> -  calculate_dominance_info (CDI_POST_DOMINATORS);
>>> 
>>>   n = num_ssa_names;
>>>   phivn = XCNEWVEC (struct phiprop_d, n);
>>> @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
>>>   if (bb_has_abnormal_pred (bb))
>>>  continue;
>>>   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
>>> + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
>>> +  &post_dominators_computed);
>>> }
>>> 
>>>   if (did_something)
>>> @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
>>> 
>>>   free (phivn);
>>> 
>>> -  free_dominance_info (CDI_POST_DOMINATORS);
>>> +  if (post_dominators_computed)
>>> +free_dominance_info (CDI_POST_DOMINATORS);
>> 
>> unconditionally free_dominance_info here.
>> 
>>> -  return 0;
>>> +  return did_something ? TODO_update_ssa : 0;
>> 
>> I guess that change is following general practice and good to catch
>> undesired changes (update_ssa will exit early when there's nothing
>> to do anyway).
> 
> I wonder if TODO_update_ssa_only_virtuals should be used here rather
> than TODO_update_ssa as the code produces ssa names already and just
> adds memory loads/stores. But I could be wrong.

I guess it should be able to update virtual SSA form itself.  But it’s been 
some time since I wrote the pass …

> 
> Thanks,
> Andrew Pinski
> 
> 
>> 
>> OK with those changes.
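
The lazy-computation pattern under review can be sketched outside GCC in plain C (a minimal illustration with invented names, not GCC code): an expensive analysis is computed only when some work item actually needs it, a validity flag plays the role of dom_info_available_p (), and the driver returns a nonzero "TODO" value only when something changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for an expensive analysis such as
   calculate_dominance_info (CDI_POST_DOMINATORS).  We count
   invocations to show it runs at most once per pass. */
static int analysis_runs = 0;
static bool analysis_valid = false;

static void compute_analysis (void)
{
  if (analysis_valid)
    return;                      /* mirrors dom_info_available_p () */
  analysis_runs++;
  analysis_valid = true;
}

/* Per-item worker: only items that survive the cheap early-outs
   trigger the analysis, mirroring propagate_with_phi. */
static bool process_item (int item)
{
  if (item % 2 != 0)             /* cheap bail-out, no analysis needed */
    return false;
  compute_analysis ();
  return true;
}

/* Driver: frees the analysis unconditionally and returns a nonzero
   "TODO" value only when something changed, mirroring the pass
   returning TODO_update_ssa conditionally. */
static int run_pass (const int *items, int n)
{
  bool did_something = false;
  for (int i = 0; i < n; i++)
    did_something |= process_item (items[i]);
  analysis_valid = false;        /* free_dominance_info analogue */
  return did_something ? 1 : 0;
}
```

If no item needs the analysis, it is never computed at all, which is the compile-time saving the patch is after.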


Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Richard Biener via Gcc-patches



> Am 19.06.2023 um 19:03 schrieb Alexander Monakov :
> 
> 
> Ping. OK for trunk?

Ok if the FE maintainers do not object within 48h.

Thanks,
Richard 

>> On Mon, 5 Jun 2023, Alexander Monakov wrote:
>> 
>> Ping for the front-end maintainers' input.
>> 
>>> On Mon, 22 May 2023, Richard Biener wrote:
>>> 
>>> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
>>>  wrote:
 
 Implement -ffp-contract=on for C and C++ without changing default
 behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
>>> 
>>> The documentation changes mention the defaults are changed for
>>> standard modes, I suppose you want to remove that hunk.
>>> 
 gcc/c-family/ChangeLog:
 
* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.
 
 gcc/ChangeLog:
 
* common.opt (fp_contract_mode) [on]: Remove fallback.
* config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
* doc/invoke.texi (-ffp-contract): Update.
* trans-mem.cc (diagnose_tm_1): Skip internal function calls.
 ---
 gcc/c-family/c-gimplify.cc | 78 ++
 gcc/common.opt |  3 +-
 gcc/config/sh/sh.md|  2 +-
 gcc/doc/invoke.texi|  8 ++--
 gcc/trans-mem.cc   |  3 ++
 5 files changed, 88 insertions(+), 6 deletions(-)
 
 diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
 index ef5c7d919f..f7635d3b0c 100644
 --- a/gcc/c-family/c-gimplify.cc
 +++ b/gcc/c-family/c-gimplify.cc
 @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-ubsan.h"
 #include "tree-nested.h"
 #include "context.h"
 +#include "tree-pass.h"
 +#include "internal-fn.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
 @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
 body)
   return bind;
 }
 
 +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
 +
 +static bool
 +fma_supported_p (enum internal_fn fn, tree type)
 +{
 +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
 +}
 +
 /* Gimplification of expression trees.  */
 
 /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
 @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
 ATTRIBUTE_UNUSED,
break;
   }
 
 +case PLUS_EXPR:
 +case MINUS_EXPR:
 +  {
 +   tree type = TREE_TYPE (*expr_p);
 +   /* For -ffp-contract=on we need to attempt FMA contraction only
 +  during initial gimplification.  Late contraction across 
 statement
 +  boundaries would violate language semantics.  */
 +   if (SCALAR_FLOAT_TYPE_P (type)
 +   && flag_fp_contract_mode == FP_CONTRACT_ON
 +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
 +   && fma_supported_p (IFN_FMA, type))
 + {
 +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
 +
 +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
 +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
 +
 +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
 +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
 + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
 +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
 + {
 +   std::swap (op0_p, op1_p);
 +   std::swap (neg_mul, neg_add);
 + }
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
 + {
 +   op0_p = &TREE_OPERAND (*op0_p, 0);
 +   neg_mul = !neg_mul;
 + }
 +   if (TREE_CODE (*op0_p) != MULT_EXPR)
 + break;
 +   auto_vec ops (3);
 +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
 +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
 +   ops.quick_push (*op1_p);
 +
 +   enum internal_fn ifn = IFN_FMA;
 +   if (neg_mul)
 + {
 +   if (fma_supported_p (IFN_FNMA, type))
 + ifn = IFN_FNMA;
 +   else
 + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
 + }
 +   if (neg_add)
 + {
 +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
 IFN_FNMS;
 +   if (fma_supported_p (ifn2, type))
 + ifn = ifn2;
 +   else

Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 17:06, Juzhe-Zhong wrote:

This patch is a proposal and is **NOT** ready to push, since after it the
total number of machine modes will exceed 255, which will cause an ICE in
LTO:
   internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290
Right.  Note that an ack from Jakub or Richi will be sufficient for the 
LTO fixes to go forward.





We need to add VLS modes for the following reasons:
1. Enhance GNU vectors codegen:
For example:
  typedef int32_t vnx8si __attribute__ ((vector_size (32)));

  __attribute__ ((noipa)) void
  f_vnx8si (int32_t * in, int32_t * out)
  {
vnx8si v = *(vnx8si*)in;
*(vnx8si *) out = v;
  }

compile option: --param=riscv-autovec-preference=scalable
 before this patch:
 f_vnx8si:
 ld  a2,0(a0)
 ld  a3,8(a0)
 ld  a4,16(a0)
 ld  a5,24(a0)
 addi sp,sp,-32
 sd  a2,0(a1)
 sd  a3,8(a1)
 sd  a4,16(a1)
 sd  a5,24(a1)
 addi sp,sp,32
 jr  ra

After this patch:
f_vnx8si:
 vsetivli zero,8,e32,m2,ta,ma
 vle32.v v2,0(a0)
 vse32.v v2,0(a1)
 ret
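
For reference, the source-level feature involved here is the portable GNU vector extension; a host-runnable sketch of the same 256-bit copy, plus the element-wise arithmetic the extension provides (memcpy is used instead of the cast to sidestep alignment concerns in a standalone example):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t vnx8si __attribute__ ((vector_size (32)));

/* Same shape as the f_vnx8si example above: a 256-bit load and store
   that a vector-capable target can lower to one vector load and one
   vector store. */
static void copy_vnx8si (const int32_t *in, int32_t *out)
{
  vnx8si v;
  memcpy (&v, in, sizeof v);
  memcpy (out, &v, sizeof v);
}

/* Element-wise operators come for free with the extension. */
static vnx8si add_vnx8si (vnx8si a, vnx8si b)
{
  return a + b;
}
```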

2. Enhance VLA SLP:
void
f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
{
   for (int i = 0; i < 100; ++i)
 {
   a[i * 8] = b[i * 8] + c[i * 8];
   a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
   a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
   a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
   a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
   a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
   a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
   a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
 }
}


..
Loop body:
  ...
  vrgatherei16.vv...
  ...

Tail:
 lbu a4,792(a1)
 lbu a5,792(a2)
 addw a5,a5,a4
 sb  a5,792(a0)
 lbu a5,793(a2)
 addw a5,a5,a4
 sb  a5,793(a0)
 lbu a4,794(a1)
 lbu a5,794(a2)
 addw a5,a5,a4
 sb  a5,794(a0)
 lbu a5,795(a2)
 addw a5,a5,a4
 sb  a5,795(a0)
 lbu a4,796(a1)
 lbu a5,796(a2)
 addw a5,a5,a4
 sb  a5,796(a0)
 lbu a5,797(a2)
 addw a5,a5,a4
 sb  a5,797(a0)
 lbu a4,798(a1)
 lbu a5,798(a2)
 addw a5,a5,a4
 sb  a5,798(a0)
 lbu a5,799(a2)
 addw a5,a5,a4
 sb  a5,799(a0)
 ret

The tail elements need VLS modes to vectorize like ARM SVE:

f:
 mov x3, 0
 cntb x5
 mov x4, 792
 whilelo p7.b, xzr, x4
.L2:
 ld1b z31.b, p7/z, [x1, x3]
 ld1b z30.b, p7/z, [x2, x3]
 trn1 z31.b, z31.b, z31.b
 add z31.b, z31.b, z30.b
 st1b z31.b, p7, [x0, x3]
 add x3, x3, x5
 whilelo p7.b, x3, x4
 b.any .L2
Tail:
 ldr b31, [x1, 792]
 ldr b27, [x1, 794]
 ldr b28, [x1, 796]
 dup v31.8b, v31.b[0]
 ldr b29, [x1, 798]
 ldr d30, [x2, 792]
 ins v31.b[2], v27.b[0]
 ins v31.b[3], v27.b[0]
 ins v31.b[4], v28.b[0]
 ins v31.b[5], v28.b[0]
 ins v31.b[6], v29.b[0]
 ins v31.b[7], v29.b[0]
 add v31.8b, v30.8b, v31.8b
 str d31, [x0, 792]
 ret

Notice that ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.






gcc/ChangeLog:

 * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
GNU vectors.
 (ADJUST_ALIGNMENT): Ditto.
 (ADJUST_BYTESIZE): Ditto.

 (ADJUST_PRECISION): Ditto.
 (VECTOR_MODES): Ditto.
 * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
 (get_regno_alignment): Ditto.
 * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
 (const_vlmax_p): Ditto.
 (legitimize_move): Ditto.
 (get_vlmul): Ditto.
 (get_regno_alignment): Ditto.
 (get_ratio): Ditto.
 (get_vector_mode): Ditto.
 * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
 * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Ditto.
 (VLS_ENTRY): Ditto.
 (riscv_v_ext_mode_p): Ditto.
 (riscv_hard_regno_nregs): Ditto.
 (riscv_hard_regno_mode_ok): Ditto.
 * config/riscv/riscv.md: Ditto.
 * config/riscv/vector-iterators.md: Ditto.
 * config/riscv/vector.md: Ditto.
 * config/riscv/autovec-vls.md: New file.

---
So I expected we were going to have to define some static length 
patterns at some point.  So this isn't a huge surprise.








diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6421e933ca9..6fc1c433069 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/ri

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:

IVOPTs has strip_offset which suffers from the same issues regarding
integer overflow that split_constant_offset did but the latter was
fixed quite some time ago.  The following implements strip_offset
in terms of split_constant_offset, removing the redundant and
incorrect implementation.

The implementations are not exactly the same, strip_offset relies
on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
simply assumes those do not happen and truncates them.  By
the same means strip_offset also handles POLY_INT_CSTs but
split_constant_offset does not.  Massaging the latter to
behave like strip_offset in those cases might be the way to go?

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Comments?

Thanks,
Richard.

PR tree-optimization/110243
* tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
(strip_offset): Make it a wrapper around split_constant_offset.

* gcc.dg/torture/pr110243.c: New testcase.

Your call -- IMHO you know this code far better than I.

jeff
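
The class of bug being removed is easy to reproduce outside the compiler: naively rewriting (T)(i + c) * s as (T)i * s + c * s is only valid when i + c does not wrap. A hedged sketch, with well-defined unsigned wraparound standing in for the overflow (the function names are invented for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Address-style computation as written: the narrow addition wraps
   first, then the result is widened and scaled. */
static int64_t addr_as_written (uint32_t i)
{
  return (int64_t) (uint32_t) (i + 1u) * 4;
}

/* Naive "split" form an offset-stripping analysis might produce:
   widen, scale, then add the split-out constant offset.  Equal to the
   form above only when i + 1 does not wrap. */
static int64_t addr_split (uint32_t i)
{
  return (int64_t) i * 4 + 4;
}
```

For i = UINT32_MAX the two forms diverge, which is the kind of miscompile a correctness-audited split_constant_offset has to avoid.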


Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> > 



> Hi Carl,
> 
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
> 
> Missing an entry for DI_to_TI.

Oops, missed that.  Sorry, fixed.

> > 



> 
> >  
> >const signed long long __builtin_vsx_scalar_extract_expq
> > (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > +  vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
> 
> As I pointed out previously, the related id is VSEEQP, since both of
> them

Oops, I guess I forgot to change that.  Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF?  It's also consistent with
> what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.
> 
> >  
> >const signed __int128 __builtin_vsx_scalar_extract_sigq
> > (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > +  vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
> 
> Similar to the above, s/VSESIGKF/VSESQPV/
 
Changed to VSESQPV.
> 
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned
> > __int128, \
> >   unsigned long
> > long);
> > -VSIEQP xsiexpqp_kf {}
> > +VSIEQP xsiexpqp_kf_di {}
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
> >unsigned
> > long long);
> >  VSIEQPF xsiexpqpf_kf {}
> >  
> > +  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> > +VSIEQPV xsiexpqp_kf_v2di {}
> > +
> >const signed int __builtin_vsx_scalar_test_data_class_qp
> > (_Float128, \
> >  const
> > int<7>);
> >  VSTDCQP xststdcqp_kf {}
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..11060f697db 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >128-bit variant of built-in function.  */
> > if (GET_MODE_PRECISION (arg1_mode) > 64)
> >   {
> > -   /* If first argument is of float variety, choose variant
> > -  that expects __ieee128 argument.  Otherwise, expect
> > -  __int128 argument.  */
> > +   /* If first argument is of float variety, choose the
> > variant that
> > +  expects __ieee128 argument.  If the first argument is
> > vector
> > +  int, choose the variant that expects vector unsigned
> > +  __int128 argument.  Otherwise, expect scalar __int128
> > argument.
> > +   */
> > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> >   instance_code = RS6000_BIF_VSIEQPF;
> > +   else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT)
> > + instance_code = RS6000_BIF_VSIEQPV;
> > else
> >   instance_code = RS6000_BIF_VSIEQP;
> >   }
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..05a5ca6a04d 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -4515,6 +4515,18 @@
> >  VSIEQP
> >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> > long long);
> >  VSIEQPF
> > +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
> > +VSIEQPV
> > +
> > +[VEC_VSEEV, scalar_extract_exp_to_vec, \
> > +__builtin_vec_scalar_extract_exp_to_vector]
> > +  vull __builtin_vec_scalar_extract_exp_to_vector (_Float128);
> > +VSEEXPKF
> > +
> 
> Need to update if the above changes.

changed 
> 
> > +[VEC_VSESV, scalar_extract_sig_to_vec, \
> > +__builtin_vec_scalar_extract_sig_to_vector]
> > +  vuq __builtin_vec_scalar_extract_sig_to_vector (_Float128);
> > +VSESIGKF
> >  
> 
> Ditto.

changed

> 



> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-
> > exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-
> > 8.c
> > new file mode 100644
> > index 000..e24e09012d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-8.c
> > @@ -0,0 +1,58 @@
> > +/* { dg-do run { target { powerpc*-*-* } } } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p9vector_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -save-temps" } */
> > +
> > +#include 
> > +#include 
> > +
> > +#if DE

Re: [PATCH] Introduce hardbool attribute for C

2023-06-19 Thread Bernhard Reutner-Fischer via Gcc-patches
On 16 June 2023 07:35:27 CEST, Alexandre Oliva via Gcc-patches 
 wrote:

index 0..634feaed4deef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/hardbool-err.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+typedef _Bool __attribute__ ((__hardbool__))
+hbbl; /* { dg-error "integral types" } */
+
+typedef double __attribute__ ((__hardbool__))
+hbdbl; /* { dg-error "integral types" } */
+
+enum x;
+typedef enum x __attribute__ ((__hardbool__))
+hbenum; /* { dg-error "integral types" } */
+
+struct s;
+typedef struct s __attribute__ ((__hardbool__))
+hbstruct; /* { dg-error "integral types" } */
+
+typedef int __attribute__ ((__hardbool__ (0, 0)))
+hb00; /* { dg-error "different values" } */
+
+typedef int __attribute__ ((__hardbool__ (4, 16))) hb4x;
+struct s {
+ hb4x m:2;
+}; /* { dg-error "is a GCC extension|different values" } */
+/* { dg-warning "changes value" "warning" { target *-*-* } .-1 } */
+
+hb4x __attribute__ ((vector_size (4 * sizeof (hb4x))))
+vvar; /* { dg-error "invalid vector type" } */

Arm-chair, tinfoil hat still on, didn't look closely, hence:

I don't see explicit tests with _Complex or __complex__. Would we want to 
check these here, or are they handled through the "underlying" tests above?

I'd welcome a Fortran interop note in the docs, as hinted previously, to cover 
the out-of-the-box behavior. It's probably reasonably unlikely, but better safe 
than sorry?
cheers,


[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches


Kewen, GCC maintainers:

Version 6, Fixed missing change log entry.  Changed builtin id names as
requested.  Missed making the change on the last version.  Fixed
comment in the three test cases.  Reran regression suite on Power 10,
no regressions.

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch add three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.
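
What these builtins do for IEEE binary128 can be illustrated portably for binary64, where the biased exponent occupies bits 62..52 and the trailing significand the low 52 bits (a host-only sketch for illustration; the Power builtins operate on __ieee128 values held in vector registers, and xsxsigqp additionally merges in the implicit leading bit for normal numbers):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw biased exponent field of an IEEE binary64 value, analogous to
   what scalar_extract_exp_to_vec returns for binary128. */
static uint64_t extract_exp (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return (bits >> 52) & 0x7ff;
}

/* Trailing significand field, analogous to scalar_extract_sig_to_vec. */
static uint64_t extract_sig (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits & ((UINT64_C (1) << 52) - 1);
}

/* Rebuild a value from sign/exponent/significand fields, analogous to
   the vector form of scalar_insert_exp. */
static double insert_exp (uint64_t sign, uint64_t exp, uint64_t sig)
{
  uint64_t bits = (sign << 63) | (exp << 52) | sig;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

The bit manipulation itself is cheap; the point of the new builtins is avoiding the vector-to-GPR-and-back traffic that the scalar-typed variants force on Power.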

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_ to xsxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 .../power

Re: [PATCH v2] RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/14/23 01:57, Jin Ma wrote:

In order to avoid interrupt functions to change the FCSR, it needs to be saved
and restored at the beginning and end of the function.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for 
FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt 
functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.

Thanks.  I pushed this to the trunk.
jeff


Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Richard Sandiford via Gcc-patches
Jeff Law  writes:
> On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
>> IVOPTs has strip_offset which suffers from the same issues regarding
>> integer overflow that split_constant_offset did but the latter was
>> fixed quite some time ago.  The following implements strip_offset
>> in terms of split_constant_offset, removing the redundant and
>> incorrect implementation.
>> 
>> The implementations are not exactly the same, strip_offset relies
>> on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
>> simply assumes those do not happen and truncates them.  By
>> the same means strip_offset also handles POLY_INT_CSTs but
>> split_constant_offset does not.  Massaging the latter to
>> behave like strip_offset in those cases might be the way to go?
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> Comments?
>> 
>> Thanks,
>> Richard.
>> 
>>  PR tree-optimization/110243
>>  * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
>>  (strip_offset): Make it a wrapper around split_constant_offset.
>> 
>>  * gcc.dg/torture/pr110243.c: New testcase.
> Your call -- IMHO you know this code far better than I.

+1, but LGTM FWIW.  I couldn't see anything obvious (and valid)
that strip_offset_1 handles and split_constant_offset doesn't.

Thanks,
Richard


[PATCH 00/14] [og13] OpenMP/OpenACC: map clause and OMP gimplify rework

2023-06-19 Thread Julian Brown
This series (for the og13 branch) is a rebased and merged version of
the first few patches of the series previously sent upstream for mainline:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

The series contains patches 1-6 and the parts of 8 ("C++
"declare mapper" support) that pertain to reorganisation of
gimplify.cc:gimplify_{scan,adjust}_omp_clauses.

The series also contains reversions and rewrites of several patches
that needed adjustment in order to fit in with the new clause-processing
arrangements.

Tested with offloading to AMD GCN. I will apply shortly.

Thanks,

Julian

Julian Brown (14):
  Revert "Assumed-size arrays with non-lexical data mappings"
  Revert "Fix references declared in lexically-enclosing OpenACC data
region"
  Revert "Fix implicit mapping for array slices on lexically-enclosing
data constructs (PR70828)"
  Revert "openmp: Handle C/C++ array reference base-pointers in array
sections"
  OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
{c_}finish_omp_clause
  OpenMP/OpenACC: Rework clause expansion and nested struct handling
  OpenMP: implicitly map base pointer for array-section pointer
components
  OpenMP: Pointers and member mappings
  OpenMP/OpenACC: Unordered/non-constant component offset runtime
diagnostic
  OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc
  OpenACC: Reimplement "inheritance" for lexically-nested offload
regions
  OpenACC: "declare create" fixes wrt. "allocatable" variables
  OpenACC: Allow implicit uses of assumed-size arrays in offload regions
  OpenACC: Improve implicit mapping for non-lexically nested offload
regions

 gcc/c-family/c-common.h   |   74 +-
 gcc/c-family/c-omp.cc |  837 -
 gcc/c/c-parser.cc |   17 +-
 gcc/c/c-typeck.cc |  773 ++--
 gcc/cp/parser.cc  |   17 +-
 gcc/cp/pt.cc  |4 +-
 gcc/cp/semantics.cc   | 1065 +++---
 gcc/fortran/dependency.cc |  128 +
 gcc/fortran/dependency.h  |1 +
 gcc/fortran/gfortran.h|1 +
 gcc/fortran/trans-openmp.cc   |  376 +-
 gcc/gimplify.cc   | 2239 
 gcc/omp-general.cc|  424 +++
 gcc/omp-general.h |   69 +
 gcc/omp-low.cc|   23 +-
 .../c-c++-common/goacc/acc-data-chain.c   |2 +-
 .../c-c++-common/goacc/combined-reduction.c   |2 +-
 .../c-c++-common/goacc/reduction-1.c  |4 +-
 .../c-c++-common/goacc/reduction-10.c |9 +-
 .../c-c++-common/goacc/reduction-2.c  |4 +-
 .../c-c++-common/goacc/reduction-3.c  |4 +-
 .../c-c++-common/goacc/reduction-4.c  |4 +-
 gcc/testsuite/c-c++-common/gomp/clauses-2.c   |2 +-
 gcc/testsuite/c-c++-common/gomp/target-50.c   |2 +-
 .../c-c++-common/gomp/target-enter-data-1.c   |4 +-
 .../c-c++-common/gomp/target-implicit-map-2.c |3 +-
 .../g++.dg/gomp/static-component-1.C  |   23 +
 gcc/testsuite/gcc.dg/gomp/target-3.c  |2 +-
 .../gfortran.dg/goacc/assumed-size.f90|   35 +
 .../gfortran.dg/goacc/loop-tree-1.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-9.f90  |2 +-
 .../gfortran.dg/gomp/map-subarray-2.f90   |   57 +
 .../gfortran.dg/gomp/map-subarray.f90 |   40 +
 gcc/tree-pretty-print.cc  |3 +
 gcc/tree.h|8 +
 include/gomp-constants.h  |9 +-
 libgomp/oacc-mem.c|6 +-
 libgomp/target.c  |   91 +-
 libgomp/testsuite/libgomp.c++/baseptrs-3.C|  275 ++
 libgomp/testsuite/libgomp.c++/baseptrs-4.C| 3154 +
 libgomp/testsuite/libgomp.c++/baseptrs-5.C|   62 +
 libgomp/testsuite/libgomp.c++/class-array-1.C |   59 +
 libgomp/testsuite/libgomp.c++/target-48.C |   32 +
 libgomp/testsuite/libgomp.c++/target-49.C |   37 +
 .../libgomp.c-c++-common/baseptrs-1.c |   50 +
 .../libgomp.c-c++-common/baseptrs-2.c |   70 +
 .../map-arrayofstruct-1.c |   38 +
 .../map-arrayofstruct-2.c |   58 +
 .../map-arrayofstruct-3.c |   68 +
 .../target-implicit-map-2.c   |2 +
 .../target-implicit-map-5.c   |   50 +
 .../libgomp.c-c++-common/target-map-zlas-1.c  |   36 +
 .../libgomp.fortran/map-subarray-2.f90|  108 +
 .../libgomp.fortran/map-subarray-3.f90|   62 +
 .../libgomp.fortran/map-subarray-4.f90|   35 +
 .../libgomp.fortran/map-subarray-5.f90|   54 +
 .../libgomp.fortran/map-subarray-6.f90|   26 +
 .../libgomp

[PATCH 04/14] Revert "openmp: Handle C/C++ array reference base-pointers in array sections"

2023-06-19 Thread Julian Brown
This reverts commit 3385743fd2fa15a2a750a29daf6d4f97f5aad0ae.

2023-06-16  Julian Brown  

Revert:
2022-02-24  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-enter-data-1.c: Adjust testcase.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
---
 gcc/c/c-typeck.cc | 27 +
 gcc/cp/semantics.cc   | 28 +
 .../c-c++-common/gomp/target-enter-data-1.c   |  3 +-
 .../libgomp.c-c++-common/ptr-attach-2.c   | 60 ---
 4 files changed, 3 insertions(+), 115 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.c-c++-common/ptr-attach-2.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 450214556f9..9591d67251e 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14113,10 +14113,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
maybe_zero_len = true;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -14238,9 +14234,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (non_contiguous)
{
@@ -14288,23 +14281,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
-  tree aref = t;
-  for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
   first = c_fully_fold (first, false, NULL);
   OMP_CLAUSE_DECL (c) = first;
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
@@ -14339,8 +14315,7 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  break;
}
   tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF
- || TREE_CODE (t) == INDIRECT_REF)
+  if (TREE_CODE (t) == COMPONENT_REF)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e7bda6fa060..93ff7cf5e1b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5605,10 +5605,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (processing_template_decl && maybe_zero_len)
return false;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -5728,9 +5724,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (!processing_template_decl)
{
@@ -5782,24 +5775,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
- tree aref = t;
- for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = convert_from_reference (aref);
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
  OMP_CLAUSE_DECL (c) = first;
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
return false;
@@ -5841,8 +5816,7 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  bool reference_always_pointer = true;
  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
  OMP_CLAUSE_MAP);
- 

[PATCH 02/14] Revert "Fix references declared in lexically-enclosing OpenACC data region"

2023-06-19 Thread Julian Brown
This reverts commit c9cd2bac6a5127a01c6f47e5636a926ac39b5e21.

2023-06-16  Julian Brown  

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.
---
 gcc/fortran/trans-openmp.cc |  2 +-
 gcc/gimplify.cc | 55 +
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e55c8292d05..96e91a3bc50 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1611,7 +1611,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   tree present = gfc_omp_check_optional_argument (decl, true);
   tree orig_decl = NULL_TREE;
-  if (DECL_P (decl) && POINTER_TYPE_P (TREE_TYPE (decl)))
+  if (POINTER_TYPE_P (TREE_TYPE (decl)))
 {
   if (!gfc_omp_privatize_by_reference (decl)
  && !GFC_DECL_GET_SCALAR_POINTER (decl)
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 3729b986801..80f1f3a657f 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -227,7 +227,6 @@ struct oacc_array_mapping_info
   tree mapping;
   tree pset;
   tree pointer;
-  tree ref;
 };
 
 struct gimplify_omp_ctx
@@ -11248,9 +11247,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  }
if (base_ptr
&& OMP_CLAUSE_CODE (base_ptr) == OMP_CLAUSE_MAP
-   && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALLOC
-|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER))
&& OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
&& ((OMP_CLAUSE_MAP_KIND (base_ptr)
 == GOMP_MAP_FIRSTPRIVATE_POINTER)
@@ -11269,19 +11265,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
ai.mapping = unshare_expr (c);
ai.pset = pset ? unshare_expr (pset) : NULL;
ai.pointer = unshare_expr (base_ptr);
-   ai.ref = NULL_TREE;
-   if (TREE_CODE (base_addr) == INDIRECT_REF
-   && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base_addr, 0)))
-   == REFERENCE_TYPE))
- {
-   base_addr = TREE_OPERAND (base_addr, 0);
-   tree ref_clause = OMP_CLAUSE_CHAIN (base_ptr);
-   gcc_assert ((OMP_CLAUSE_CODE (ref_clause)
-== OMP_CLAUSE_MAP)
-   && (OMP_CLAUSE_MAP_KIND (ref_clause)
-   == GOMP_MAP_POINTER));
-   ai.ref = unshare_expr (ref_clause);
- }
ctx->decl_data_clause->put (base_addr, ai);
  }
if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
@@ -12464,15 +12447,11 @@ gomp_oacc_needs_data_present (tree decl)
   && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
 return NULL;
 
-  tree type = TREE_TYPE (decl);
-  if (TREE_CODE (type) == REFERENCE_TYPE)
-type = TREE_TYPE (type);
-
-  if (TREE_CODE (type) != ARRAY_TYPE
-  && TREE_CODE (type) != POINTER_TYPE
-  && TREE_CODE (type) != RECORD_TYPE
-  && (TREE_CODE (type) != POINTER_TYPE
- || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
+  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != RECORD_TYPE
+  && (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+ || TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != ARRAY_TYPE))
 return NULL;
 
   decl = get_base_address (decl);
@@ -12626,12 +12605,6 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
 {
   tree mapping = array_info->mapping;
   tree pointer = array_info->pointer;
-  gomp_map_kind presence_kind = GOMP_MAP_FORCE_PRESENT;
-  bool no_alloc = (OMP_CLAUSE_CODE (mapping) == OMP_CLAUSE_MAP
-  && OMP_CLAUSE_MAP_KIND (mapping) == GOMP_MAP_IF_PRESENT);
-
-  if (no_alloc || omp_check_optional_argument (decl, false))
-presence_kind = GOMP_MAP_IF_PRESENT;
 
   if (code == OMP_CLAUSE_FIRSTPRIVATE)
/* Oops, we have the wrong type of clause.  Rebuild it.  */
@@ -12639,15 +12612,14 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
 

[PATCH 01/14] Revert "Assumed-size arrays with non-lexical data mappings"

2023-06-19 Thread Julian Brown
This reverts commit 72733f6e6f6ec1bb9884fea8bfbebd3de03d9374.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.cc (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
---
 gcc/fortran/trans-openmp.cc | 22 +-
 gcc/gimplify.cc | 14 --
 gcc/omp-low.cc  |  5 -
 3 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e8f3b24e5f8..e55c8292d05 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1588,18 +1588,10 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree decl = OMP_CLAUSE_DECL (c);
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  For OpenACC this restriction is lifted
- if the array has already been mapped:
-
-   - Using a lexically-enclosing data region: in that case we see the
- GOMP_MAP_FORCE_PRESENT mapping kind here.
-
-   - Using a non-lexical data mapping ("acc enter data").
-
- In the latter case we change the mapping type to GOMP_MAP_FORCE_PRESENT.
- This raises an error for OpenMP in the caller
- (gimplify.c:gimplify_adjust_omp_clauses_1).  OpenACC will raise a runtime
- error if the implicitly-referenced assumed-size array is not mapped.  */
+ explicitly using array sections.  An exception is if the array is
+ mapped explicitly in an enclosing data construct for OpenACC, in which
+ case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
+ error.  */
   if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
   && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
@@ -1607,7 +1599,11 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
-OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+{
+  error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+  return;
+}
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
 return;
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 09c596f026e..3729b986801 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -12828,26 +12828,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
   *list_p = clause;
   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
   gimplify_omp_ctxp = ctx->outer_context;
-  gomp_map_kind kind = (code == OMP_CLAUSE_MAP) ? OMP_CLAUSE_MAP_KIND (clause)
-   : (gomp_map_kind) GOMP_MAP_LAST;
   /* Don't call omp_finish_clause on implicitly added OMP_CLAUSE_PRIVATE
  in simd.  Those are only added for the local vars inside of simd body
  and they don't need to be e.g. default constructible.  */
   if (code != OMP_CLAUSE_PRIVATE || ctx->region_type != ORT_SIMD) 
 lang_hooks.decls.omp_finish_clause (clause, pre_p,
(ctx->region_type & ORT_ACC) != 0);
-  /* Allow OpenACC to have implicit assumed-size arrays via FORCE_PRESENT,
- which should work as long as the array has previously been mapped
- explicitly on the target (e.g. by "enter data").  Raise an error for
- OpenMP.  */
-  if (lang_GNU_Fortran ()
-  && code == OMP_CLAUSE_MAP
-  && (ctx->region_type & ORT_ACC) == 0
-  && kind == GOMP_MAP_TOFROM
-  && OMP_CLAUSE_MAP_KIND (clause) == GOMP_MAP_FORCE_PRESENT)
-error_at (OMP_CLAUSE_LOCATION (clause),
- "implicit mapping of assumed size array %qD",
- OMP_CLAUSE_DECL (clause));
   if (gimplify_omp_ctxp)
 for (; clause != chain; clause = OMP_CLAUSE_CHAIN (clause))
   if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 3424eba2217..59143d8efe5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14353,11 +14353,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
  s = OMP_CLAUSE_SIZE (c);
if (s == NULL_TREE)
  s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
-   /* Fortran assumed-size arrays have zero size because the type is
-  incomplete.  Set the size to one to allow the runtime to remap
-  any existing data that is already present on the accelerator.  */
-   if (s == NULL_TREE && is_gimple_omp_oacc (ctx->stmt))
- s = integer_one_node;

[PATCH 03/14] Revert "Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)"

2023-06-19 Thread Julian Brown
This reverts commit a84b89b8f070f1efe86ea347e98d57e6bc32ae2d.

Relevant tests are temporarily disabled or XFAILed.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.cc (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.

gcc/fortran/
Revert:
* trans-openmp.cc: Handle implicit "present".

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: Partly disable test.
* gfortran.dg/goacc/pr70828.f90: Likewise.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: XFAIL test.
* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: XFAIL test.
---
 gcc/fortran/trans-openmp.cc   |  10 +-
 gcc/gimplify.cc   | 143 +-
 gcc/omp-low.cc|  10 +-
 .../c-c++-common/goacc/acc-data-chain.c   |   4 +-
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90   |   3 +-
 .../libgomp.oacc-c-c++-common/pr70828-2.c |   2 +
 .../libgomp.oacc-c-c++-common/pr70828.c   |   2 +
 .../libgomp.oacc-fortran/pr70828-2.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-3.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-4.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-5.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-6.f90|   2 +
 .../libgomp.oacc-fortran/pr70828.f90  |   2 +
 13 files changed, 28 insertions(+), 158 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 96e91a3bc50..809b96bc220 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,13 +1587,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  An exception is if the array is
- mapped explicitly in an enclosing data construct for OpenACC, in which
- case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
- error.  */
-  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
-  && TREE_CODE (decl) == PARM_DECL
+  /* Assumed-size arrays can't be mapped implicitly, they have to be
+ mapped explicitly using array sections.  */
+  if (TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 80f1f3a657f..e3384c7f65b 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -218,17 +218,6 @@ enum gimplify_defaultmap_kind
   GDMK_POINTER
 };
 
-/* Used to record clauses representing array slices on data directives that
-   may affect implicit mapping semantics on enclosed OpenACC parallel/kernels
-   regions.  PSET is used for Fortran array slices with array descriptors,
-   or NULL otherwise.  */
-struct oacc_array_mapping_info
-{
-  tree mapping;
-  tree pset;
-  tree pointer;
-};
-
 struct gimplify_omp_ctx
 {
   struct gimplify_omp_ctx *outer_context;
@@ -250,7 +239,6 @@ struct gimplify_omp_ctx
   bool in_for_exprs;
   bool ompacc;
   int defaultmap[5];
-  hash_map *decl_data_clause;
 };
 
 struct privatize_reduction
@@ -485,7 +473,6 @@ new_omp_context (enum omp_region_type region_type)
   c->defaultmap[GDMK_AGGREGATE] = GOVD_MAP;
   c->defaultmap[GDMK_ALLOCATABLE] = GOVD_MAP;
   c->defaultmap[GDMK_POINTER] = GOVD_MAP;
-  c->decl_data_clause = NULL;
 
   return c;
 }
@@ -498,8 +485,6 @@ delete_omp_context (struct gimplify_omp_ctx *c)
   splay_tree_delete (c->variables);
   delete c->privatized_types;
   c->loop_iter_var.release ();
-  if (c->decl_data_clause)
-delete c->decl_data_clause;
   XDELETE (c);
 }
 
@@ -11235,41 +11220,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
case OMP_TARGET:
  break;
case OACC_DATA:
- {
-   tree base_ptr = OMP_CLAUSE_CHAIN (c);
-   tree pset = NULL;
-   

[PATCH 07/14] OpenMP: implicitly map base pointer for array-section pointer components

2023-06-19 Thread Julian Brown
Following from discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html

and also upstream OpenMP issue 342, this patch changes mapping for array
sections of pointer components on compute regions like this:

  #pragma omp target map(s.ptr[0:10])
  {
...use of 's'...
  }

so the base pointer 's.ptr' is implicitly mapped, and thus pointer
attachment happens.  This is subtly different in the "enter data"
case, e.g.:

  #pragma omp target enter data map(s.ptr[0:10])

if 's.ptr' (or the whole of 's') is not present on the target before
the directive is executed, the array section is copied to the target
but pointer attachment does *not* take place, since 's' (or 's.ptr')
is not mapped implicitly for "enter data".

To get a pointer attachment with "enter data", you can do, e.g.:

  #pragma omp target enter data map(s.ptr, s.ptr[0:10])

  #pragma omp target
  {
...implicit use of 's'...
  }

That is, once the attachment has happened, implicit mapping of 's'
and uses of 's.ptr[...]' work correctly in the target region.

ChangeLog

2022-12-12  Julian Brown  

gcc/
* gimplify.cc (omp_accumulate_sibling_list): Don't require
explicitly-mapped base pointer for compute regions.

gcc/testsuite/
* c-c++-common/gomp/target-implicit-map-2.c: Update expected scan output.

libgomp/
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c/target-22.c: Remove explicit base pointer
mappings.
---
 gcc/gimplify.cc   |  9 ++--
 .../c-c++-common/gomp/target-implicit-map-2.c |  3 +-
 .../target-implicit-map-2.c   |  2 +
 .../target-implicit-map-5.c   | 50 +++
 .../libgomp.c-c++-common/target-map-zlas-1.c  | 36 +
 libgomp/testsuite/libgomp.c/target-22.c   |  3 +-
 6 files changed, 97 insertions(+), 6 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-zlas-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9be5d9c5328..6a43c792450 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10696,6 +10696,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
   poly_int64 cbitpos;
   tree ocd = OMP_CLAUSE_DECL (grp_end);
   bool openmp = !(region_type & ORT_ACC);
+  bool target = (region_type & ORT_TARGET) != 0;
   tree *continue_at = NULL;
 
   while (TREE_CODE (ocd) == ARRAY_REF)
@@ -10800,9 +10801,9 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
}
 
  /* For OpenMP semantics, we don't want to implicitly allocate
-space for the pointer here.  A FRAGILE_P node is only being
-created so that omp-low.cc is able to rewrite the struct
-properly.
+space for the pointer here for non-compute regions (e.g. "enter
+data").  A FRAGILE_P node is only being created so that
+omp-low.cc is able to rewrite the struct properly.
 For references (to pointers), we want to actually allocate the
 space for the reference itself in the sorted list following the
 struct node.
@@ -10810,6 +10811,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 mapping of the attachment point, but not otherwise.  */
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
@@ -11122,6 +11124,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
diff --git a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c 
b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
index 5ba1d7efe08..72df5b1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
@@ -49,4 +49,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(tofrom:a 
\[len: [0-9]+\]\[implicit\]\)} "gimple" } } */
 
-/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 1\]\) map\(alloc:a\.ptr \[len: 0\]\) map\(tofrom:\*_[0-9]+ \[len: 
[0-9]+\]\) map\(attach:a\.ptr \[bias: 0\]\)} "gimple" } } */
+/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 1\]

[PATCH 05/14] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

2023-06-19 Thread Julian Brown
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clause and finish_omp_clause, in preparation for the
following patch (to clarify the diff a little).

2022-09-13  Julian Brown  

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clause): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
---
 gcc/c/c-typeck.cc   | 615 +-
 gcc/cp/semantics.cc | 788 ++--
 2 files changed, 707 insertions(+), 696 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 9591d67251e..2cfe2174bab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15520,321 +15520,326 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_TO:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE__CACHE_:
- t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST)
-   {
- grp_start_p = pc;
- grp_sentinel = OMP_CLAUSE_CHAIN (c);
+ {
+   t = OMP_CLAUSE_DECL (c);
+   if (TREE_CODE (t) == TREE_LIST)
+ {
+   grp_start_p = pc;
+   grp_sentinel = OMP_CLAUSE_CHAIN (c);
 
- if (handle_omp_array_sections (c, ort))
-   remove = true;
- else
-   {
- t = OMP_CLAUSE_DECL (c);
- if (!omp_mappable_type (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "array section does not have mappable type "
-   "in %qs clause",
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- else if (TYPE_ATOMIC (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "%<_Atomic%> %qE in %qs clause", t,
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- while (TREE_CODE (t) == ARRAY_REF)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   {
- do
-   {
- t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == MEM_REF
- || TREE_CODE (t) == INDIRECT_REF)
-   {
- t = TREE_OPERAND (t, 0);
- STRIP_NOPS (t);
- if (TREE_CODE (t) == POINTER_PLUS_EXPR)
-   t = TREE_OPERAND (t, 0);
-   }
-   }
- while (TREE_CODE (t) == COMPONENT_REF
-|| TREE_CODE (t) == ARRAY_REF);
-
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
- && OMP_CLAUSE_MAP_IMPLICIT (c)
- && (bitmap_bit_p (&map_head, DECL_UID (t))
- || bitmap_bit_p (&map_field_head, DECL_UID (t))
- || bitmap_bit_p (&map_firstprivate_head,
-  DECL_UID (t
-   {
- remove = true;
- break;
-   }
- if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
-   break;
- if (bitmap_bit_p (&map_head, DECL_UID (t)))
-   {
- if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in motion "
- "clauses", t);
- else if (ort == C_ORT_ACC)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in data "
- "clauses", t);
- else
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in map "
- "clauses", t);
- remove = true;
-   }
- else
-   {
- bitmap_set_bit (&map_head, DECL_UID (t));
- bitmap_set_bit (&map_field_head, DECL_UID (t));
-   }
-   }
-  

[PATCH 13/14] OpenACC: Allow implicit uses of assumed-size arrays in offload regions

2023-06-19 Thread Julian Brown
This patch reimplements the functionality of the previously-reverted
patch "Assumed-size arrays with non-lexical data mappings". The purpose
is to support implicit uses of assumed-size arrays for Fortran when those
arrays have already been mapped on the target some other way (e.g. by
"acc enter data").

This relates to upstream OpenACC issue 489 (not yet resolved).

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Treat implicitly-mapped
assumed-size arrays as zero-sized for OpenACC, rather than an error.

gcc/testsuite/
* gfortran.dg/goacc/assumed-size.f90: Don't expect error.

libgomp/
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90: New
test.
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90: New
test.
---
 gcc/fortran/trans-openmp.cc   | 16 ++--
 .../gfortran.dg/goacc/assumed-size.f90|  4 +-
 .../nonlexical-assumed-size-1.f90 | 28 +
 .../nonlexical-assumed-size-2.f90 | 40 +++
 4 files changed, 82 insertions(+), 6 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 819d79cda28..230cebf250b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,6 +1587,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
 return;
 
   tree decl = OMP_CLAUSE_DECL (c);
+  bool assumed_size = false;
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be
  mapped explicitly using array sections.  */
@@ -1597,9 +1598,14 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
 {
-  error_at (OMP_CLAUSE_LOCATION (c),
-   "implicit mapping of assumed size array %qD", decl);
-  return;
+  if (openacc)
+   assumed_size = true;
+  else
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+ return;
+   }
 }
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
@@ -1654,7 +1660,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   else
{
  OMP_CLAUSE_DECL (c) = decl;
- OMP_CLAUSE_SIZE (c) = NULL_TREE;
+ OMP_CLAUSE_SIZE (c) = assumed_size ? size_zero_node : NULL_TREE;
+ if (assumed_size)
+   OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION (c) = 1;
}
   if (TREE_CODE (TREE_TYPE (orig_decl)) == REFERENCE_TYPE
  && (GFC_DECL_GET_SCALAR_POINTER (orig_decl)
diff --git a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 
b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
index 4fced2e70c9..12f44c4743a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
@@ -4,7 +4,8 @@
 ! exit data, respectively.
 
 ! This does not appear to be supported by the OpenACC standard as of version
-! 3.0.  Check for an appropriate error message.
+! 3.0.  There is however real-world code that relies on this working, so we
+! make an attempt to support it.
 
 program test
   implicit none
@@ -26,7 +27,6 @@ subroutine dtest (a, n)
   !$acc enter data copyin(a(1:n))
 
   !$acc parallel loop
-! { dg-error {implicit mapping of assumed size array 'a'} "" { target *-*-* } 
.-1 }
   do i = 1, n
  a(i) = i
   end do
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
new file mode 100644
index 000..4b61e1cee9b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+!$acc enter data copyin(arr(1:10))
+
+!$acc serial
+arr(5) = 5
+!$acc end serial
+
+!$acc exit data copyout(arr(1:10))
+
+end subroutine subr
+end program p
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
new file mode 100644
index 000..daf7089915f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
@@ -0,0 +1,40 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+! At first glance, it might not be obvious how this works.  The "enter data"
+! and "exit data" operations expand to a pair o
