date:20240711

Re: [PATCH, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-11 Thread Kewen.Lin

Hi Haochen,

on 2024/7/11 13:50, HAO CHEN GUI wrote:
> Hi,
>   This patch adds TARGET_FLOAT128_HW into pattern conditions for quad-
> precision insns. Also it removes FLOAT128_IEEE_P check from pattern
> conditions if the mode of pattern is IEEE128 as the mode iterator -
> IEEE128 already checks with FLOAT128_IEEE_P.

I noticed that several patterns carry a similarly redundant
FLOAT128_IBM_P condition.  Could you make a separate patch that removes
both the redundant FLOAT128_IBM_P and FLOAT128_IEEE_P checks?  That
keeps it apart from this TARGET_FLOAT128_HW change and makes it a pure
NFC (no functional change) patch.

> 
>   For test case float128-cmp2-runnable.c, it should be guarded with
> ppc_float128_hw as it calls qp insns. The p9vector_hw is covered with
> ppc_float128_hw, so it's removed.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns
> 
> gcc/
>   * config/rs6000/rs6000.md (*fpmask, floatdidf2, floatti2,
>   floatunsti2, fix_truncti2): Add guard
>   TARGET_FLOAT128_HW.
>   (add3, sub3, mul3, div3, sqrt2,
>   copysign3_hard, copysign3_soft, @neg2_hw,
>   @abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw,
>   *nfma4_hw, *nfms4_hw,
>   extend2_hw, truncdf2_hw,
>   truncsf2_hw, fix_trunc2,
>   *fix_trunc2_mem,
>   float_si2_hw, floatuns_di2_hw, floor2,
>   ceil2, btrunc2, round2, add3_odd,
>   sub3_odd, mul3_odd, div3_odd, sqrt2_odd,
>   fma4_odd, *fms4_odd, *nfma4_odd,
>   *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
>   Remove guard FLOAT128_IEEE_P.
>   * config/rs6000/vsx.md (xsxexpqp__,
>   xsxsigqp__, xsiexpqpf_,
>   xsiexpqp__, xscmpexpqp__,
>   *xscmpexpqp, xststdcnegqp_): Add guard TARGET_FLOAT128_HW.
>   (xststdc_, *xststdc_, xststdc_): Add guard
>   TARGET_FLOAT128_HW for the IEEE128 mode.
> 
> gcc/testsuite/
>   * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace
>   ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.
> 
> patch.diff

snip...

> "xscmpuqp %0,%1,%2"
>[(set_attr "type" "veccmp")
> (set_attr "size" "128")])
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 56d1d8c737e..b5c143b1523 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__"
>   (unspec:V2DI_DI
> [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

TARGET_FLOAT128_HW checks ISA_3_0_MASKS_IEEE which has OPTION_MASK_P9_VECTOR,
so I think TARGET_P9_VECTOR is redundant here.
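
The redundancy argument can be illustrated with a minimal sketch. The bit values and names below (kP9Vector, kVsx, kFloat128Hw, kIsa30MasksIeee and the helper functions) are hypothetical stand-ins, not the real rs6000 definitions. The sketch only assumes, as stated above, that the hardware float128 check requires a mask that contains the P9 vector bit.

```cpp
#include <cstdint>

// Hypothetical option bits; the real rs6000 masks have different values.
constexpr std::uint64_t kP9Vector   = 1u << 0;
constexpr std::uint64_t kVsx        = 1u << 1;
constexpr std::uint64_t kFloat128Hw = 1u << 2;

// Stand-in for a mask like ISA_3_0_MASKS_IEEE: it contains the P9 vector bit.
constexpr std::uint64_t kIsa30MasksIeee = kP9Vector | kVsx;

// Assumed model of the hardware float128 check: the hardware flag counts
// only when every bit of the required mask is set as well.
constexpr bool float128_hw_enabled (std::uint64_t flags)
{
  return (flags & kFloat128Hw) != 0
         && (flags & kIsa30MasksIeee) == kIsa30MasksIeee;
}

constexpr bool p9_vector_enabled (std::uint64_t flags)
{
  return (flags & kP9Vector) != 0;
}

// The condition as written in the patch: both checks combined.
constexpr bool combined_condition (std::uint64_t flags)
{
  return p9_vector_enabled (flags) && float128_hw_enabled (flags);
}
```

Under this model, combined_condition and float128_hw_enabled agree for every flag combination, which is why the extra P9 vector test adds nothing.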

>"xsxexpqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__"
>   (unspec:VEC_TI [(match_operand:IEEE128 1
>   "altivec_register_operand" "v")]
>UNSPEC_VSX_SXSIG))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsxsigqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_"
>[(match_operand:IEEE128 1 "altivec_register_operand" "v")
> (match_operand:DI 2 "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__"
>(match_operand:V2DI_DI 2
> "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (CMP_TEST:SI (match_dup 3)
>(const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>  {
>if ( == UNORDERED && !HONOR_NANS (mode))
>  {
> @@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp"
> (match_operand:IEEE128 2 "altivec_register_operand" 
> "v")]
> UNSPEC_VSX_SCMPEXPQP)
>(match_operand:SI 3 "zero_constant" "j")))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xscmpexpqp %0,%1,%2"
>[(set_attr "type" "fpcompare")])
> 
> @@ -5315,7 +5315,8 @@ (define_expand "xststdc_"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (eq:SI (match_dup 3)
>  (const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR
> +   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
>  {
>operands[3] = gen_reg_rtx (CCFPmode);
>operands[4] = CONST0_RTX (SImode);
> @@ -5324,7 +5325,9 @@ (define_expand "xststdc_"
>  (define_expand "isinf2"
>[(use (match_oper

RE: [PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-11 Thread Tamar Christina

Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Wednesday, July 10, 2024 3:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different
> dot-prod modes.
> 
> From: Victor Do Nascimento 
> 
> Given the novel treatment of the dot product optab as a conversion we
> are now able to target, for a given architecture, different
> relationships between output modes and input modes.
> 
> This is made clearer by way of example. Previously, on AArch64, the
> following loop was vectorizable:
> 
> uint32_t udot4(int n, uint8_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++)
>     sum += data[i] * data[i];
>   return sum;
> }
> 
> while the following wasn't:
> 
> uint32_t udot2(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++)
>     sum += data[i] * data[i];
>   return sum;
> }
> 
> Under the new treatment of the dot product optab, they are both now
> vectorizable.
> 
> This adds the relevant target-agnostic check to ensure this behaviour
> in the autovectorizer.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 38 +++
>  1 file changed, 38 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 000..5caa7b81fce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* Ensure both the two-way and four-way dot products are autovectorized.  */
> +#include <stdint.h>
> +
> +uint32_t udot4(int n, uint8_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i++) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot4(int n, int8_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i++) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i++) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i++) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

These tests only test that you have vectorized the loops, not that the loop
was vectorized using dotprod.  I think you want to have a scan for
DOT_PROD_EXPR as well, gated to the targets that support two-way dot prod.

Cheers,
Tamar

> --
> 2.34.1

[PATCH 1/3 v3] RISC-V: Add vector type of BFloat16 format

2024-07-11 Thread Feng Wang

v3: Rebase
v2: Rebase
This patch adds the vector types for the BFloat16 format.  The
subsequent Zvfbfmin and Zvfbfwma extension patches need to be based
on this patch.
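
As a rough illustration of how the builtin type names listed in the ChangeLog below are composed, here is a minimal standalone sketch. The helpers lmul_suffix and bfloat16_type_name are hypothetical, and the validity range is a simplification inferred from the type names in this ChangeLog (mf4 up to m8); the real generator uses its own valid_type and to_lmul helpers.

```cpp
#include <sstream>
#include <string>

// Assumed LMUL spelling: 2^k for k >= 0 is "m<2^k>", 2^-k is "mf<2^k>".
std::string lmul_suffix (int lmul_log2)
{
  std::stringstream s;
  if (lmul_log2 >= 0)
    s << "m" << (1 << lmul_log2);
  else
    s << "mf" << (1 << -lmul_log2);
  return s.str ();
}

// Simplified stand-in for the validity check: accept only the LMUL values
// that appear as BFloat16 types in the ChangeLog (mf4, mf2, m1, m2, m4, m8).
std::string bfloat16_type_name (int lmul_log2)
{
  if (lmul_log2 < -2 || lmul_log2 > 3)
    return "INVALID";

  std::stringstream mode;
  mode << "vbfloat16" << lmul_suffix (lmul_log2) << "_t";
  return mode.str ();
}
```

With these assumptions, an lmul_log2 of -2 gives vbfloat16mf4_t and 3 gives vbfloat16m8_t, matching the names in the ChangeLog.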

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (bfloat16_type):
Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.
(bfloat16_wide_type): Ditto.
(same_ratio_eew_bf16_type): Ditto.
(main): Ditto.
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
(RVV_WHOLE_MODES): Add vector type for BFloat16.
(RVV_FRACT_MODE): Ditto.
(RVV_NF4_MODES): Ditto.
(RVV_NF8_MODES): Ditto.
(RVV_NF2_MODES): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vbfloat16mf4_t):
(vbfloat16mf2_t): Add builtin vector type for BFloat16.
(vbfloat16m1_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m8_t): Ditto.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add required_ext checking for BFloat16.
* config/riscv/riscv-vector-builtins.def (vbfloat16mf4_t):
Add vector_type for BFloat16 in builtins.def.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
(vbfloat16m8_t): Ditto.
(double_trunc_bfloat_scalar): Add scalar_type def for BFloat16.
(double_trunc_bfloat_vector): Add vector_type def for BFloat16.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_BF_16):
Add required definition of BFloat16 ext.
* config/riscv/riscv-vector-switch.def (ENTRY):
Add vector_type information for BFloat16.
(TUPLE_ENTRY): Add tuple vector_type information for BFloat16.

---
 gcc/config/riscv/genrvv-type-indexer.cc   | 115 ++
 gcc/config/riscv/riscv-modes.def  |  30 -
 .../riscv/riscv-vector-builtins-types.def |  50 
 gcc/config/riscv/riscv-vector-builtins.cc |   7 +-
 gcc/config/riscv/riscv-vector-builtins.def|  55 -
 gcc/config/riscv/riscv-vector-builtins.h  |   1 +
 gcc/config/riscv/riscv-vector-switch.def  |  36 ++
 7 files changed, 291 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index 27cbd14982c..8626ddeaaa8 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -117,6 +117,42 @@ inttype (unsigned sew, int lmul_log2, unsigned nf, bool unsigned_p)
   return mode.str ();
 }
 
+std::string
+bfloat16_type (int lmul_log2)
+{
+  if (!valid_type (16, lmul_log2, /*float_t*/ true))
+return "INVALID";
+
+  std::stringstream mode;
+  mode << "vbfloat16" << to_lmul (lmul_log2) << "_t";
+  return mode.str ();
+}
+
+std::string
+bfloat16_wide_type (int lmul_log2)
+{
+  if (!valid_type (32, lmul_log2, /*float_t*/ true))
+return "INVALID";
+
+  std::stringstream mode;
+  mode << "vfloat32" << to_lmul (lmul_log2) << "_t";
+  return mode.str ();
+}
+
+std::string
+bfloat16_type (int lmul_log2, unsigned nf)
+{
+  if (!valid_type (16, lmul_log2, nf, /*float_t*/ true))
+return "INVALID";
+
+  s

[PATCH 1/2] gcov: Cache source files

2024-07-11 Thread Jørgen Kvalsvik

Cache the source files as they are read, rather than discarding them at
the end of output_lines (), and move the reading of the source file to
the new function slurp.

This patch does not really change anything other than moving the file
reading out of output_file, but sets gcov up for more interaction with
the source file. The motivating example is reporting coverage on
functions from different source files, notably C++ headers and
((always_inline)).

Here is an example of what gcov does today:

hello.h:
inline __attribute__((always_inline))
int hello (const char *s)
{
  if (s)
    printf ("hello, %s!\n", s);
  else
    printf ("hello, world!\n");
  return 0;
}

hello.c:
int notmain(const char *entity)
{
  return hello (entity);
}

int main()
{
  const char *empty = 0;
  if (!empty)
    hello (empty);
  else
    puts ("Goodbye!");
}

$ gcov -abc hello
function notmain called 0 returned 0% blocks executed 0%
#:4:int notmain(const char *entity)
%:4-block 2
branch  0 never executed (fallthrough)
branch  1 never executed
-:5:{
#:6:  return hello (entity);
%:6-block 7
-:7:}

Clearly there is a branch in notmain, but the branch comes from the
inlining of hello. This is not very obvious from looking at the output.
Here is hello.h.gcov:

-:3:inline __attribute__((always_inline))
-:4:int hello (const char *s)
-:5:{
#:6:  if (s)
%:6-block 3
branch  0 never executed (fallthrough)
branch  1 never executed
%:6-block 2
branch  2 never executed (fallthrough)
branch  3 never executed
#:7:printf ("hello, %s!\n", s);
%:7-block 4
call0 never executed
%:7-block 3
call1 never executed
-:8:  else
#:9:printf ("hello, world!\n");
%:9-block 5
call0 never executed
%:9-block 4
call1 never executed
#:   10:  return 0;
%:   10-block 6
%:   10-block 5
-:   11:}

The blocks from the different call sites have all been interleaved.

The reporting could be tuned to list the inlined function, too, like
this:

1:4:int notmain(const char *entity)
-: == inlined from hello.h ==
1:6:  if (s)
branch  0 taken 0 (fallthrough)
branch  1 taken 1
#:7:printf ("hello, %s!\n", s);
%:7-block 3
call0 never executed
-:8:  else
1:9:printf ("hello, world!\n");
1:9-block 4
call0 returned 1
1:   10:  return 0;
1:   10-block 5
-: == inlined from hello.h (end) ==
-:5:{
1:6:  return hello (entity);
1:6-block 7
-:7:}

Implementing something to this effect relies on having the sources for
both files (hello.c, hello.h) available, which is what this patch sets
up.
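
The caching idea can be sketched as follows. This is a minimal illustration under assumed names (source_cache and its members are hypothetical) and not the implementation in the patch: each file is read once on first request and later requests return the stored lines.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache: file name -> lines of that file.
class source_cache
{
public:
  // Return the lines of PATH, reading the file only on the first request.
  // A file that cannot be opened is cached as an empty vector.
  const std::vector<std::string> &slurp (const std::string &path)
  {
    auto it = cache_.find (path);
    if (it != cache_.end ())
      return it->second;

    std::vector<std::string> lines;
    std::ifstream in (path);
    for (std::string line; std::getline (in, line);)
      lines.push_back (line);
    ++reads_;
    return cache_.emplace (path, std::move (lines)).first->second;
  }

  // Number of times a file was actually read.
  unsigned reads () const { return reads_; }

private:
  std::map<std::string, std::vector<std::string>> cache_;
  unsigned reads_ = 0;
};
```

With such a cache, a report for hello.c that needs lines from hello.h can ask for both files repeatedly without reading either one again.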

Note that the previous reading code leaked the source file content, so
explicitly storing it is neither a big departure nor a significant
performance cost. I verified this with valgrind:

With slurp:

$ valgrind gcov ./hello
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'
== ==
== == HEAP SUMMARY:
== == in use at exit: 84,907 bytes in 54 blocks
== ==   total heap usage: 254 allocs, 200 frees, 137,156 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,237 bytes in 22 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
== ==still reachable: 83,108 bytes in 14 blocks
== ==   of which reachable via heuristic:
== == newarray   : 1,544 bytes in 1 blocks
== == suppressed: 0 bytes in 0 blocks
== == Rerun with --leak-check=full to see details of leaked memory
== ==
== == For lists of detected and suppressed errors, rerun with: -s
== == ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Without slurp:

$ valgrind gcov ./demo
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'

File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'

Lines executed:87.50% of 8
== ==
== == HEAP SUMMARY:
== == in use at exit: 85,316 bytes in 82 blocks
== ==   total heap usage: 250 allocs, 168 frees, 137,084 bytes allocated
== ==
== == LEAK SUMMARY:
== ==definitely lost: 1,646 bytes in 50 blocks
== ==indirectly lost: 562 bytes in 18 blocks
== ==  possibly lost: 0 bytes in 0 blocks
==

[PATCH 2/3 v3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-07-11 Thread Feng Wang

v3: Modify warning message in riscv.cc
v2: Rebase
According to the intrinsic doc, this patch adds the 'Zvfbfmin' and
'Zvfbfwma' intrinsic functions.
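
As background for the three intrinsic families named in the ChangeLog below, here is a scalar model of the conversions involved. It relies on BFloat16 being the upper 16 bits of an IEEE binary32 value. The function names are hypothetical, only round-to-nearest-even is modelled (the narrowing intrinsics also exist in a variant with an explicit rounding mode operand), and NaN handling is simplified.

```cpp
#include <cstdint>
#include <cstring>

// Narrow binary32 to BFloat16 bits with round-to-nearest-even.
// Simplification: any NaN is returned as a quiet NaN.
std::uint16_t float_to_bf16 (float f)
{
  std::uint32_t bits;
  std::memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return std::uint16_t ((bits >> 16) | 0x0040u);
  // Add 0x7fff plus the lowest kept bit so that ties round to even.
  std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
  return std::uint16_t ((bits + rounding) >> 16);
}

// Widen BFloat16 bits to binary32; this direction is always exact.
float bf16_to_float (std::uint16_t h)
{
  std::uint32_t bits = std::uint32_t (h) << 16;
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

// Scalar model of a widening multiply-accumulate: BFloat16 inputs,
// binary32 accumulator.
float widen_macc (float acc, std::uint16_t a, std::uint16_t b)
{
  return acc + bf16_to_float (a) * bf16_to_float (b);
}
```

In this model float_to_bf16 plays the role of the narrowing conversion (vfncvtbf16), bf16_to_float the widening one (vfwcvtbf16), and widen_macc the widening multiply-accumulate (vfwmaccbf16), each applied to one element.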

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
Add 'Zvfbfmin' intrinsic in bases.
(class vfwcvtbf16_f): Ditto.
(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 
'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-functions.def 
(REQUIRED_EXTENSIONS):
Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
(vfncvtbf16_f): Ditto.
(vfncvtbf16_f_frm): Ditto.
(vfwcvtbf16_f): Ditto.
(vfwmaccbf16): Ditto.
(vfwmaccbf16_frm): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
Add vector intrinsic build judgment for BFloat16.
(build_all): Ditto.
(BASE_NAME_MAX_LEN): Adjust max length.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
Add new operand type for BFloat16.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
(validate_instance_type_required_extensions):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins.h (enum required_ext):
Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
(reqired_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Add match case for 'Zvfbfmin' and 
'Zvfbfwma'.
* config/riscv/riscv.cc (riscv_validate_vector_type):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.

---
 .../riscv/riscv-vector-builtins-bases.cc  | 69 +++
 .../riscv/riscv-vector-builtins-bases.h   |  7 ++
 .../riscv/riscv-vector-builtins-functions.def | 15 
 .../riscv/riscv-vector-builtins-shapes.cc | 31 -
 .../riscv/riscv-vector-builtins-types.def | 13 
 gcc/config/riscv/riscv-vector-builtins.cc | 67 ++
 gcc/config/riscv/riscv-vector-builtins.h  | 34 ++---
 gcc/config/riscv/riscv.cc | 13 ++--
 8 files changed, 232 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 6483faba39c..193392fbcc2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2417,6 +2417,60 @@ public:
   }
 };
 
+/* Implements vfncvtbf16_f. */
+template <enum frm_op_type FRM_OP = NO_FRM>
+class vfncvtbf16_f : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
+  bool may_require_frm_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+return e.use_exact_insn (code_for_pred_trunc_to_bf16 (e.vector_mode ()));
+  }
+};
+
+/* Implements vfwcvtbf16_f. */
+class vfwcvtbf16_f : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+return e.use_exact_insn (code_for_pred_extend_bf16_to (e.vector_mode ()));
+  }
+};
+
+/* Implements vfwmaccbf16. */
+template <enum frm_op_type FRM_OP = NO_FRM>
+class vfwmaccbf16 : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
+  bool may_require_frm_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_widen_ternop_insn (
+   code_for_pred_widen_bf16_mul_scalar (e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_widen_ternop_insn (
+   code_for_pred_widen_bf16_mul (e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 static CONSTEXPR const vsetvl vsetvl_obj;
 static CONSTEXPR const vsetvl vsetvlmax_obj;
 static CONSTEXPR const loadstore vle_obj;
@@ -2734,6 +2788,14 @@ static CONSTEXPR const crypto_vv   
vsm4r_obj;
 static CONSTEXPR const vsm3me vsm3me_obj;
 static CONSTEXPR const vaeskf2_vsm3c   vsm3c_obj;
 
+/* Zvfbfmin */
+static CONSTEXPR const vfncvtbf16_f<NO_FRM> vfncvtbf16_f_obj;
+static CONSTEXPR const vfncvtbf16_f<HAS_FRM> vfncvtbf16_f_frm_obj;
+static CONSTEXPR const vfwcvtbf16_f vfwcvtbf16_f_obj;
+/* Zvfbfwma; */
+static CONSTEXPR const vfwmaccbf16<NO_FRM> vfwmaccbf16_obj;
+static CONSTEXPR const vfwmaccbf16<HAS_FRM> vfwmaccbf16_frm_obj;
+
 /* Declare the function base NAME, pointing it to an instance
of class _obj.  */
 #define BASE(NAME) \
@@ -3054,4 +3116,11 @@ BASE (vsm4k)
 BASE (vsm4r)
 BASE (vsm3me)

[PATCH 3/3 v3] RISC-V: Add md files for vector BFloat16

2024-07-11 Thread Feng Wang

V3: Add Bfloat16 vector insn in generic-vector-ooo.md
v2: Rebase
According to the BFloat16 spec, this patch adds some vector iterators
and new patterns to the md files.

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/generic-vector-ooo.md: Add def_insn_reservation for 
vector BFloat16.
* config/riscv/riscv.md: Add new insn name for vector BFloat16.
* config/riscv/vector-iterators.md: Add some iterators for vector 
BFloat16.
* config/riscv/vector.md: Add some attribute for vector BFloat16.
* config/riscv/vector-bfloat16.md: New file. Add insn patterns for
vector BFloat16.

---
 gcc/config/riscv/generic-vector-ooo.md |   4 +-
 gcc/config/riscv/riscv.md  |  13 +-
 gcc/config/riscv/vector-bfloat16.md| 135 
 gcc/config/riscv/vector-iterators.md   | 169 -
 gcc/config/riscv/vector.md | 103 +--
 5 files changed, 407 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/riscv/vector-bfloat16.md

diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index 5e933c83841..efe6bc41e86 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -53,7 +53,7 @@
 (define_insn_reservation "vec_fcmp" 3
   (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
-   vfncvtftoi,vfncvtftof")
+   vfncvtftoi,vfncvtftof,vfncvtbf16,vfwcvtbf16")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector integer multiplication.
@@ -69,7 +69,7 @@
 
 ;; Vector float multiplication and FMA.
 (define_insn_reservation "vec_fmul" 6
-  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd")
+  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector crypto, assumed to be a generic operation for now.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c0c960353eb..777fb780117 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -200,6 +200,7 @@
   RVVMF64BI,RVVMF32BI,RVVMF16BI,RVVMF8BI,RVVMF4BI,RVVMF2BI,RVVM1BI,
   RVVM8QI,RVVM4QI,RVVM2QI,RVVM1QI,RVVMF2QI,RVVMF4QI,RVVMF8QI,
   RVVM8HI,RVVM4HI,RVVM2HI,RVVM1HI,RVVMF2HI,RVVMF4HI,
+  RVVM8BF,RVVM4BF,RVVM2BF,RVVM1BF,RVVMF2BF,RVVMF4BF,
   RVVM8HF,RVVM4HF,RVVM2HF,RVVM1HF,RVVMF2HF,RVVMF4HF,
   RVVM8SI,RVVM4SI,RVVM2SI,RVVM1SI,RVVMF2SI,
   RVVM8SF,RVVM4SF,RVVM2SF,RVVM1SF,RVVMF2SF,
@@ -219,6 +220,11 @@
   RVVM2x4HI,RVVM1x4HI,RVVMF2x4HI,RVVMF4x4HI,
   RVVM2x3HI,RVVM1x3HI,RVVMF2x3HI,RVVMF4x3HI,
   RVVM4x2HI,RVVM2x2HI,RVVM1x2HI,RVVMF2x2HI,RVVMF4x2HI,
+  RVVM1x8BF,RVVMF2x8BF,RVVMF4x8BF,RVVM1x7BF,RVVMF2x7BF,
+  RVVMF4x7BF,RVVM1x6BF,RVVMF2x6BF,RVVMF4x6BF,RVVM1x5BF,
+  RVVMF2x5BF,RVVMF4x5BF,RVVM2x4BF,RVVM1x4BF,RVVMF2x4BF,
+  RVVMF4x4BF,RVVM2x3BF,RVVM1x3BF,RVVMF2x3BF,RVVMF4x3BF,
+  RVVM4x2BF,RVVM2x2BF,RVVM1x2BF,RVVMF2x2BF,RVVMF4x2BF,
   RVVM1x8HF,RVVMF2x8HF,RVVMF4x8HF,RVVM1x7HF,RVVMF2x7HF,
   RVVMF4x7HF,RVVM1x6HF,RVVMF2x6HF,RVVMF4x6HF,RVVM1x5HF,
   RVVMF2x5HF,RVVMF4x5HF,RVVM2x4HF,RVVM1x4HF,RVVMF2x4HF,
@@ -462,6 +468,10 @@
 ;; vsm4r       crypto vector SM4 Rounds instructions
 ;; vsm3me      crypto vector SM3 Message Expansion instructions
 ;; vsm3c       crypto vector SM3 Compression instructions
+;; 18.Vector BF16 instructions
+;; vfncvtbf16  vector narrowing single floating-point to brain floating-point instruction
+;; vfwcvtbf16  vector widening brain floating-point to single floating-point instruction
+;; vfwmaccbf16  vector BF16 widening multiply-accumulate
 (define_attr "type"
   "unknown,branch,jump,jalr,ret,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
@@ -483,7 +493,7 @@
    vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,
    vgather,vcompress,vmov,vector,vandn,vbrev,vbrev8,vrev8,vclz,vctz,vcpop,vrol,vror,vwsll,
    vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz,
-   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c"
+   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c,vfncvtbf16,vfwcvtbf16,vfwmaccbf16"
   (cond [(eq_attr "got" "load") (const_string "load")
 
 ;; If a doubleword move uses these expensive instructions,
@@ -4343,6 +4353,7 @@
 (include "generic-ooo.md")
 (include "vector.md")
 (include "vector-crypto.md")
+(include "vector-bfloat16.md")
 (include "zicond.md")
 (include "sfb.md")
 (include "zc.md")
diff --git a/gcc/config/riscv/vector-bfloat16.md b/gcc/config/riscv/vector-bfloat16.md
new file mode 100644
index 000..562aa8ee5ed
--- /dev/null
+++ b/gcc/config/riscv/vector-bfloat16.md
@@ -0,0 +1,135 @@
+;; Machine description for RISC-V bfloat16 extensions.
+;; Copyright (C) 2024 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as publis

RE: [PATCH 02/10] autovectorizer: Add basic support for convert optabs

2024-07-11 Thread Tamar Christina

Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Wednesday, July 10, 2024 3:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH 02/10] autovectorizer: Add basic support for convert optabs
> 
> Given the shift from modeling dot products as direct optabs to
> treating them as conversion optabs, we make necessary changes to the
> autovectorizer code to ensure that given the relevant tree code,
> together with the input and output data modes, we can retrieve the
> relevant optab and subsequently the insn_code for it.
> 
> gcc/ChangeLog:
> 
>   * gimple-match-exports.cc (directly_supported_p): Add overload
>   for conversion-type optabs.
>   * gimple-match.h (directly_supported_p): Add new function
>   prototype.
>   * optabs.cc (expand_widen_pattern_expr): Make the
>   DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
>   retrieve icode.
>   * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
>   call conversion-type overloaded `directly_supported_p'.
>   * tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
>   (vect_recog_dot_prod_pattern): s/direct/conv/ in call to
>   `vect_supportable_direct_optab_p'.
> ---
>  gcc/gimple-match-exports.cc | 18 
>  gcc/gimple-match.h  |  2 ++
>  gcc/optabs.cc   |  3 ++-
>  gcc/tree-vect-loop.cc   |  1 +
>  gcc/tree-vect-patterns.cc   | 43 +++--
>  5 files changed, 64 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index aacf3ff0414..c079fa1fb19 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -1381,6 +1381,24 @@ directly_supported_p (code_helper code, tree type,
> optab_subtype query_type)
> && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
>  }
> 
> +/* As above, overloading the function for conversion-type optabs.  */
> +bool
> +directly_supported_p (code_helper code, tree type_out, tree type_in,
> +   optab_subtype query_type)
> +{
> +
> +  if (code.is_tree_code ())
> +{
> +  convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
> + query_type);
> +  return (optab != unknown_optab
> +   && convert_optab_handler (optab, TYPE_MODE (type_out),
> +                             TYPE_MODE (type_in)) != CODE_FOR_nothing);
> +}
> +  gcc_unreachable ();
> +}
> +

Since the argument is a code_helper, why not provide the implementation for
internal_fn as well, similar to the existing directly_supported_p?  You can
get the optab from an ifn with direct_internal_fn_optab.

>  /* A wrapper around the internal-fn.cc versions of 
> get_conditional_internal_fn
> for a code_helper CODE operating on type TYPE.  */
> 
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index d710fcbace2..0333a5db00a 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
> 
>  #ifdef GCC_OPTABS_TREE_H
>  bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
> +bool directly_supported_p (code_helper, tree, tree,
> +optab_subtype = optab_default);
>  #endif
> 
>  internal_fn get_conditional_internal_fn (code_helper, tree);
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 185c5b1a705..32737fb80e8 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>  widen_pattern_optab
>= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
>if (ops->code == WIDEN_MULT_PLUS_EXPR
> -  || ops->code == WIDEN_MULT_MINUS_EXPR)
> +  || ops->code == WIDEN_MULT_MINUS_EXPR
> +  || ops->code == DOT_PROD_EXPR)
>  icode = find_widening_optab_handler (widen_pattern_optab,
>TYPE_MODE (TREE_TYPE (ops->op2)),
>tmode0);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a64b5082bd1..7e4c1e0f52e 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info
> stmt_info)
> 
>gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
>return !directly_supported_p (DOT_PROD_EXPR,
> + STMT_VINFO_VECTYPE (stmt_info),
>   STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
>   optab_vector_mixed_sign);
>  }
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..c4dd627aa90 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -248,6 +248,45 @@ vect_supportable_direct_optab_p (vec_info *vinf

[pushed] Ensure function.end_line in source_info.lines

2024-07-11 Thread Jørgen Kvalsvik

Ensure that function.end_line is included in the lines vector for the
source file, even if it is not explicitly touched by a basic block. This
ensures consistency with what you would expect. For example, this file
has sources[sum.cc].lines.size () == 23 and main.end_line == 22 without
adjusting sources.lines, which in this case is a no-op.

#:   17:int main ()
-:   18:{
#:   19:  sum (1, 2);
#:   20:  sum (1.1, 2);
#:   21:  sum (2.2, 2.3);
#:   22:}

This is a useful property when combined with selective reporting.
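
The adjustment amounts to a grow-only resize. The sketch below restates it on a plain vector; line_info and ensure_end_line are hypothetical names for illustration, standing in for the per-line records and the inline check in the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-line record.
struct line_info
{
  bool exists = false;
};

// Grow LINES so that END_LINE is a valid index; never shrink it.
void ensure_end_line (std::vector<line_info> &lines, std::size_t end_line)
{
  if (lines.size () <= end_line)
    lines.resize (end_line + 1);
}
```

When the vector already has more entries than end_line the call does nothing, so later code can index lines[end_line] without a separate bounds check.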

gcc/ChangeLog:

* gcov.cc (process_all_functions): Ensure fn.end_line is
included in source[fn].lines.
---
 gcc/gcov.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 2e4bd9d3c5d..7b4a075c5db 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1544,6 +1544,12 @@ process_all_functions (void)
}
}
 
+ /* Make sure to include the last line for this function even when it
+is not directly covered by a basic block, for example when } is on
+its own line.  */
+ if (sources[fn->src].lines.size () <= fn->end_line)
+   sources[fn->src].lines.resize (fn->end_line + 1);
+
  /* Allocate lines for group function, following start_line
 and end_line information of the function.  */
  if (fn->is_group)
-- 
2.39.2

[pushed] Add function filtering to gcov

2024-07-11 Thread Jørgen Kvalsvik

Add the --include and --exclude flags to gcov to control what functions
to report on. This is meant to make gcov more practical when writing
test suites or performing other coverage experiments, which tend to
focus on a few functions at a time. This really shines in combination
with the -t/--stdout flag. With support for more expansive metrics in
gcov like modified condition/decision coverage (MC/DC) and path
coverage, output quickly gets overwhelming without filtering.

The approach is quite simple: filters are egrep regexes and are
evaluated left-to-right, and the last filter "wins", that is, if a
function matches an --include and a subsequent --exclude, it should not
be included in the output. All of the output machinery works on the
function table, so optionally (not) adding functions makes even the
JSON output work as expected, and only minor changes are needed to
suppress the filtered-out functions.
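
The "last filter wins" rule can be sketched as follows. This is only an
illustration under stated assumptions, not the code in gcov.cc: fn_filter,
make_filter and reported are hypothetical names, std::regex with the POSIX
extended grammar stands in for the egrep matching, and the default for a
function that matches no filter (excluded when the first filter is an
--include, included otherwise) is inferred from the demo output below.

```cpp
#include <regex>
#include <string>
#include <vector>

// One --include or --exclude option, in command-line order (hypothetical).
struct fn_filter {
  std::regex pattern;
  bool include;
};

fn_filter make_filter(const std::string& re, bool include)
{
  // POSIX extended syntax, i.e. what egrep accepts.
  return {std::regex(re, std::regex::extended), include};
}

// Decide whether a function name should be reported.
bool reported(const std::vector<fn_filter>& filters, const std::string& name)
{
  // Assumed default: keep everything unless the first filter is an include.
  bool keep = filters.empty() || !filters.front().include;
  // Evaluate left-to-right; every matching filter overrides the decision,
  // so the last matching filter wins.
  for (const fn_filter& f : filters)
    if (std::regex_search(name, f.pattern))
      keep = f.include;
  return keep;
}
```

With the filters --include=su --exclude=sum in that order, this sketch
reports sub but not sum or mul.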

Demo: math.c

int mul (int a, int b) {
return a * b;
}

int sub (int a, int b) {
return a - b;
}

int sum (int a, int b) {
return a + b;
}

Plain matches:

$ gcov -t math --include=sum
-:0:Source:math.c
-:0:Graph:math.gcno
-:0:Data:-
-:0:Runs:0
#:9:int sum (int a, int b) {
#:   10:return a + b;
-:   11:}

$ gcov -t math --include=mul
-:0:Source:math.c
-:0:Graph:math.gcno
-:0:Data:-
-:0:Runs:0
#:1:int mul (int a, int b) {
#:2:return a * b;
-:3:}

Regex match:

$ gcov -t math --include=su
-:0:Source:math.c
-:0:Graph:math.gcno
-:0:Data:-
-:0:Runs:0
#:5:int sub (int a, int b) {
#:6:return a - b;
-:7:}
#:9:int sum (int a, int b) {
#:   10:return a + b;
-:   11:}

And similar for exclude:

$ gcov -t math --exclude=sum
-:0:Source:math.c
-:0:Graph:math.gcno
-:0:Data:-
-:0:Runs:0
#:1:int mul (int a, int b) {
#:2:return a * b;
-:3:}
#:5:int sub (int a, int b) {
#:6:return a - b;
-:7:}

And json, for good measure:

$ gcov -t math --include=sum --json | jq ".files[].lines[]"
{
  "line_number": 9,
  "function_name": "sum",
  "count": 0,
  "unexecuted_block": true,
  "block_ids": [],
  "branches": [],
  "calls": []
}
{
  "line_number": 10,
  "function_name": "sum",
  "count": 0,
  "unexecuted_block": true,
  "block_ids": [
2
  ],
  "branches": [],
  "calls": []
}

Matching generally works well for mangled names, as the mangled names
also have the base symbol name in them. By default, functions are matched
by the mangled name, which means matching on base names always works as
expected. The -M flag makes the matching work on the demangled name,
which is quite useful when you only want to report on specific
overloads and can use the full type names.
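
The difference between the two matching modes can be sketched as below.
This is a rough illustration, not gcov's code: demangle and matches are
hypothetical helpers, abi::__cxa_demangle from <cxxabi.h> (available with
GCC and Clang) is assumed to be usable, and _Z3sumii and _Z3sumdd are the
Itanium ABI manglings of sum(int, int) and sum(double, double).

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <regex>
#include <string>

// Return the demangled form of a symbol, or the input if it cannot be
// demangled (hypothetical helper).
std::string demangle(const char* mangled)
{
  int status = 0;
  char* buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string out = (status == 0 && buf) ? buf : mangled;
  std::free(buf);  // the buffer is malloc'ed; free(nullptr) is harmless
  return out;
}

// Match an egrep-style pattern against either the mangled name (default)
// or the demangled name (the -M behaviour described above).
bool matches(const std::string& pattern, const char* mangled, bool on_demangled)
{
  std::regex re(pattern, std::regex::extended);
  std::string subject = on_demangled ? demangle(mangled) : std::string(mangled);
  return std::regex_search(subject, re);
}
```

In this sketch the pattern sum matches both overloads through their mangled
names, while sum\(double matched against the demangled names selects only
the double overload.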

Why not just use grep? grep is not really sufficient as grep is very
line oriented, and the reports that benefit the most from filtering
often unpredictably span multiple lines based on the state of coverage.
For example, a condition coverage report for 3 terms/6 outcomes only
outputs 1 line when all conditions are covered, and 7 with no lines
covered.

gcc/ChangeLog:

* doc/gcov.texi: Add --include, --exclude, --match-on-demangled
documentation.
* gcov.cc (struct fnfilter): New.
(print_usage): Add --include, --exclude, -M,
--match-on-demangled.
(process_args): Likewise.
(release_structures): Release filters.
(read_graph_file): Only add function_infos matching filters.
(output_lines): Likewise.

gcc/testsuite/ChangeLog:

* lib/gcov.exp: Add filtering test function.
* g++.dg/gcov/gcov-19.C: New test.
* g++.dg/gcov/gcov-20.C: New test.
* g++.dg/gcov/gcov-21.C: New test.
* gcc.misc-tests/gcov-25.c: New test.
* gcc.misc-tests/gcov-26.c: New test.
* gcc.misc-tests/gcov-27.c: New test.
* gcc.misc-tests/gcov-28.c: New test.
---
 gcc/doc/gcov.texi  | 113 ++
 gcc/gcov.cc| 127 +++--
 gcc/testsuite/g++.dg/gcov/gcov-19.C|  35 +++
 gcc/testsuite/g++.dg/gcov/gcov-20.C|  38 
 gcc/testsuite/g++.dg/gcov/gcov-21.C|  32 +++
 gcc/testsuite/gcc.misc-tests/gcov-25.c |  25 +
 gcc/testsuite/gcc.misc-tests/gcov-26.c |  25 +
 gcc/testsuite/gcc.misc-tests/gcov-27.c |  24 +
 gcc/testsuite/gcc.misc-tests/gcov-28.c |  22 +
 gcc/testsuite/lib/gcov.exp |  53 ++-
 10 files changed, 484 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-19.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-20.C
 create mode 100644 gcc/tests

Re: [PATCH 1/1] AArch64: Add LUTI ACLE for SVE2

2024-07-11 Thread Kyrylo Tkachov

Hi Vladimir,

> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
> 
> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
> 
> LUTI instructions are used for efficient table lookups with 2-bit
> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
> the low 128 bits of the table vector using packed 2-bit indices,
> while LUTI4 can read from the low 128 or 256 bits of the table
> vector or from two table vectors using packed 4-bit indices.
> These instructions fill the destination vector by copying elements
> indexed by segments of the source vector, selected by the vector
> segment index.
> 
> The changes include the addition of a new AArch64 option
> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
> for the new LUTI instruction shapes, and implementations of the
> svluti2 and svluti4 builtins.
> 
> New tests are added as well
> ---
> gcc/config/aarch64/aarch64-c.cc   |  1 +
> .../aarch64/aarch64-option-extensions.def |  2 +
> .../aarch64/aarch64-sve-builtins-shapes.cc| 41 +
> .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
> .../aarch64/aarch64-sve-builtins-sve2.cc  | 17 +++
> .../aarch64/aarch64-sve-builtins-sve2.def |  4 ++
> .../aarch64/aarch64-sve-builtins-sve2.h   |  2 +
> gcc/config/aarch64/aarch64-sve2.md| 45 +++
> gcc/config/aarch64/aarch64.h  |  5 +++
> gcc/config/aarch64/iterators.md   | 10 +
> .../aarch64/sve/acle/asm/test_sve_acle.h  | 16 ++-
> .../aarch64/sve2/acle/asm/luti2_bf16.c| 35 +++
> .../aarch64/sve2/acle/asm/luti2_f16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti2_s16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti2_s8.c  | 35 +++
> .../aarch64/sve2/acle/asm/luti2_u16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti2_u8.c  | 35 +++
> .../aarch64/sve2/acle/asm/luti4_bf16.c| 35 +++
> .../aarch64/sve2/acle/asm/luti4_bf16_x2.c | 15 +++
> .../aarch64/sve2/acle/asm/luti4_f16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti4_f16_x2.c  | 15 +++
> .../aarch64/sve2/acle/asm/luti4_s16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti4_s16_x2.c  | 15 +++
> .../aarch64/sve2/acle/asm/luti4_s8.c  | 25 +++
> .../aarch64/sve2/acle/asm/luti4_u16.c | 35 +++
> .../aarch64/sve2/acle/asm/luti4_u16_x2.c  | 15 +++
> .../aarch64/sve2/acle/asm/luti4_u8.c  | 25 +++
> gcc/testsuite/lib/target-supports.exp | 12 +
> 28 files changed, 616 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
> 

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..840f52e08ed 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")

AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")

+AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
+

I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to use 
any SME state. I don’t think +lut should be enabling +sme2 for the user.

+;; -
+;;  Table lookup
+;; -
+;; Includes:
+;; - LUTI2
+;; - LUTI4
+;; --

Re: [PATCH 1/1] AArch64: Add LUTI ACLE for SVE2

2024-07-11 Thread Kyrylo Tkachov



> On 11 Jul 2024, at 09:18, Kyrylo Tkachov  wrote:
> 
> Hi Vladimir,
> 
>> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
>> 
>> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
>> 
>> LUTI instructions are used for efficient table lookups with 2-bit
>> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
>> the low 128 bits of the table vector using packed 2-bit indices,
>> while LUTI4 can read from the low 128 or 256 bits of the table
>> vector or from two table vectors using packed 4-bit indices.
>> These instructions fill the destination vector by copying elements
>> indexed by segments of the source vector, selected by the vector
>> segment index.
>> 
>> The changes include the addition of a new AArch64 option
>> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
>> for the new LUTI instruction shapes, and implementations of the
>> svluti2 and svluti4 builtins.
>> 
>> New tests are added as well
>> ---
>> gcc/config/aarch64/aarch64-c.cc   |  1 +
>> .../aarch64/aarch64-option-extensions.def |  2 +
>> .../aarch64/aarch64-sve-builtins-shapes.cc| 41 +
>> .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
>> .../aarch64/aarch64-sve-builtins-sve2.cc  | 17 +++
>> .../aarch64/aarch64-sve-builtins-sve2.def |  4 ++
>> .../aarch64/aarch64-sve-builtins-sve2.h   |  2 +
>> gcc/config/aarch64/aarch64-sve2.md| 45 +++
>> gcc/config/aarch64/aarch64.h  |  5 +++
>> gcc/config/aarch64/iterators.md   | 10 +
>> .../aarch64/sve/acle/asm/test_sve_acle.h  | 16 ++-
>> .../aarch64/sve2/acle/asm/luti2_bf16.c| 35 +++
>> .../aarch64/sve2/acle/asm/luti2_f16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti2_s16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti2_s8.c  | 35 +++
>> .../aarch64/sve2/acle/asm/luti2_u16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti2_u8.c  | 35 +++
>> .../aarch64/sve2/acle/asm/luti4_bf16.c| 35 +++
>> .../aarch64/sve2/acle/asm/luti4_bf16_x2.c | 15 +++
>> .../aarch64/sve2/acle/asm/luti4_f16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti4_f16_x2.c  | 15 +++
>> .../aarch64/sve2/acle/asm/luti4_s16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti4_s16_x2.c  | 15 +++
>> .../aarch64/sve2/acle/asm/luti4_s8.c  | 25 +++
>> .../aarch64/sve2/acle/asm/luti4_u16.c | 35 +++
>> .../aarch64/sve2/acle/asm/luti4_u16_x2.c  | 15 +++
>> .../aarch64/sve2/acle/asm/luti4_u8.c  | 25 +++
>> gcc/testsuite/lib/target-supports.exp | 12 +
>> 28 files changed, 616 insertions(+), 1 deletion(-)
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
>> create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
>> 
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..840f52e08ed 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> 
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> 
> +AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
> +
> 
> I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to 
> use any SME state. I don’t think +lut should be enabling +sme2 for the user
> 
> +;; --

[PATCH] RISC-V: skip vector tests if target not supporting v extension

2024-07-11 Thread Jerry Zhang Jian

The original method tried to overwrite the march option when the target
doesn't support the v extension, which caused unexpected compile and
runtime test failures.

This patch changes the way targets that don't support the v extension
are handled by simply skipping the test cases that require it.

The test cases under g[cc|++].dg/vect/vect.exp will be skipped on rv64gc after 
this patch.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: skip vector tests if target not supporting v 
extension

Signed-off-by: Jerry Zhang Jian 
---
 gcc/testsuite/lib/target-supports.exp | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f001c28072f..71d7e569a7d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11807,10 +11807,7 @@ proc check_vect_support_and_set_flags { } {
lappend DEFAULT_VECTCFLAGS "-mno-vector-strict-align"
}
} else {
-   foreach item [add_options_for_riscv_v ""] {
-   lappend DEFAULT_VECTCFLAGS $item
-   }
-   set dg-do-what-default compile
+   return 0
}
 } elseif [istarget loongarch*-*-*] {
   # Set the default vectorization option to "-mlsx" due to the problem
-- 
2.45.2

Re: [PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-11 Thread Kyrylo Tkachov

Hi Victor,

> On 10 Jul 2024, at 16:05, Victor Do Nascimento  
> wrote:
> 
> Given recent changes to the dot_prod standard pattern name, this patch
> fixes the aarch64 back-end by implementing the following changes:
> 
> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
> 2. Rewrite initialization and function expansion mechanism for simd
> builtins.
> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
> builtins.
> 
> Finally, given that it is now possible for the compiler to
> differentiate between the two- and four-way dot product, we add a test
> to ensure that autovectorization picks up on dot-product patterns
> where the result is twice the width of the operands.
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
>UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
>(aarch64_init_builtin_dotprod_functions): New.
>(aarch64_init_simd_builtins): Add call to
>`aarch64_init_builtin_dotprod_functions'.
>(aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
>handling.
>* config/aarch64/aarch64-simd-builtins.def: Remove macro
>expansion-based initialization and expansion
>of (u|s|us)dot_prod builtins.
>* config/aarch64/aarch64-simd.md
>(dot_prod): Deleted.
>(dot_prod): New.
>(usdot_prod): Deleted.
>(usdot_prod): New.
>(sadv16qi): Adjust call to gen_udot_prod take second mode.
>(popcount): fix use of `udot_prod_optab'.
>* config/aarch64/aarch64-sve-builtins-base.cc
>(svdot_impl::expand): s/direct/convert/ in
>`convert_optab_handler_for_sign' function call.
>(svusdot_impl::expand): add second mode argument in call to
>`code_for_dot_prod'.
>* config/aarch64/aarch64-sve-builtins.cc
>(function_expander::convert_optab_handler_for_sign): New class
>method.
>* config/aarch64/aarch64-sve-builtins.h
>(class function_expander): Add prototype for new
>`convert_optab_handler_for_sign' method.
>* gcc/config/aarch64/aarch64-sve.md
>(dot_prod): Deleted.
>(dot_prod): New.
>(@dot_prod): Deleted.
>(@dot_prod): New.
>(sad): Adjust call to gen_udot_prod take second mode.
>* gcc/config/aarch64/aarch64-sve2.md
>(@aarch64_sve_dotvnx4sivnx8hi): Deleted.
>(dot_prodvnx4sivnx8hi): New.
> 
> gcc/testsuite/ChangeLog:
>* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
> ---
> gcc/config/aarch64/aarch64-builtins.cc| 71 +++
> gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
> gcc/config/aarch64/aarch64-simd.md|  9 +--
> .../aarch64/aarch64-sve-builtins-base.cc  | 13 ++--
> gcc/config/aarch64/aarch64-sve-builtins.cc| 17 +
> gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
> gcc/config/aarch64/aarch64-sve.md |  6 +-
> gcc/config/aarch64/aarch64-sve2.md|  2 +-
> gcc/config/aarch64/iterators.md   |  1 +
> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++
> 10 files changed, 133 insertions(+), 18 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 30669f8aa18..6c7c86d0e6e 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -783,6 +783,12 @@ enum aarch64_builtins
>   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
>   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
>  + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
> +  AARCH64_BUILTIN_SDOTV8QI,
> +  AARCH64_BUILTIN_SDOTV16QI,
> +  AARCH64_BUILTIN_UDOTV8QI,
> +  AARCH64_BUILTIN_UDOTV16QI,
> +  AARCH64_BUILTIN_USDOTV8QI,
> +  AARCH64_BUILTIN_USDOTV16QI,
>   AARCH64_CRC32_BUILTIN_BASE,
>   AARCH64_CRC32_BUILTINS
>   AARCH64_CRC32_BUILTIN_MAX,
> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
>   aarch64_init_simd_intrinsics ();
> }
> 
> +void
> +aarch64_init_builtin_dotprod_functions (void)
> +{
> +  tree fndecl = NULL;
> +  tree ftype = NULL;
> +
> +  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
> +  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
> +  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
> +  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
> +  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
> +  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
> +  tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned);
> +  tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none);
> +
> +  struct builtin_dec

Re: [PATCH 1/3 v3] RISC-V: Add vector type of BFloat16 format

2024-07-11 Thread Kito Cheng

OK for this patch set. I know you already got LGTM from JuZhe or me
before, so this is just an explicit ack to let you know it's still OK
once CI has passed.

On Thu, Jul 11, 2024 at 3:11 PM Feng Wang  wrote:
>
> v3: Rebase
> v2: Rebase
> The vector type of BFloat16 format is added in this patch;
> subsequent extensions to zvfbfmin and zvfbfwma need to be based
> on this patch.
>
> Signed-off-by: Feng Wang 
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (bfloat16_type):
> Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.
> (bfloat16_wide_type): Ditto.
> (same_ratio_eew_bf16_type): Ditto.
> (main): Ditto.
> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> (RVV_WHOLE_MODES): Add vector type for BFloat16.
> (RVV_FRACT_MODE): Ditto.
> (RVV_NF4_MODES): Ditto.
> (RVV_NF8_MODES): Ditto.
> (RVV_NF2_MODES): Ditto.
> * config/riscv/riscv-vector-builtins-types.def (vbfloat16mf4_t):
> (vbfloat16mf2_t): Add builtin vector type for BFloat16.
> (vbfloat16m1_t): Ditto.
> (vbfloat16m2_t): Ditto.
> (vbfloat16m4_t): Ditto.
> (vbfloat16m8_t): Ditto.
> (vbfloat16mf4x2_t): Ditto.
> (vbfloat16mf4x3_t): Ditto.
> (vbfloat16mf4x4_t): Ditto.
> (vbfloat16mf4x5_t): Ditto.
> (vbfloat16mf4x6_t): Ditto.
> (vbfloat16mf4x7_t): Ditto.
> (vbfloat16mf4x8_t): Ditto.
> (vbfloat16mf2x2_t): Ditto.
> (vbfloat16mf2x3_t): Ditto.
> (vbfloat16mf2x4_t): Ditto.
> (vbfloat16mf2x5_t): Ditto.
> (vbfloat16mf2x6_t): Ditto.
> (vbfloat16mf2x7_t): Ditto.
> (vbfloat16mf2x8_t): Ditto.
> (vbfloat16m1x2_t): Ditto.
> (vbfloat16m1x3_t): Ditto.
> (vbfloat16m1x4_t): Ditto.
> (vbfloat16m1x5_t): Ditto.
> (vbfloat16m1x6_t): Ditto.
> (vbfloat16m1x7_t): Ditto.
> (vbfloat16m1x8_t): Ditto.
> (vbfloat16m2x2_t): Ditto.
> (vbfloat16m2x3_t): Ditto.
> (vbfloat16m2x4_t): Ditto.
> (vbfloat16m4x2_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc (check_required_extensions):
> Add required_ext checking for BFloat16.
> * config/riscv/riscv-vector-builtins.def (vbfloat16mf4_t):
> Add vector_type for BFloat16 in builtins.def.
> (vbfloat16mf4x2_t): Ditto.
> (vbfloat16mf4x3_t): Ditto.
> (vbfloat16mf4x4_t): Ditto.
> (vbfloat16mf4x5_t): Ditto.
> (vbfloat16mf4x6_t): Ditto.
> (vbfloat16mf4x7_t): Ditto.
> (vbfloat16mf4x8_t): Ditto.
> (vbfloat16mf2_t): Ditto.
> (vbfloat16mf2x2_t): Ditto.
> (vbfloat16mf2x3_t): Ditto.
> (vbfloat16mf2x4_t): Ditto.
> (vbfloat16mf2x5_t): Ditto.
> (vbfloat16mf2x6_t): Ditto.
> (vbfloat16mf2x7_t): Ditto.
> (vbfloat16mf2x8_t): Ditto.
> (vbfloat16m1_t): Ditto.
> (vbfloat16m1x2_t): Ditto.
> (vbfloat16m1x3_t): Ditto.
> (vbfloat16m1x4_t): Ditto.
> (vbfloat16m1x5_t): Ditto.
> (vbfloat16m1x6_t): Ditto.
> (vbfloat16m1x7_t): Ditto.
> (vbfloat16m1x8_t): Ditto.
> (vbfloat16m2_t): Ditto.
> (vbfloat16m2x2_t): Ditto.
> (vbfloat16m2x3_t): Ditto.
> (vbfloat16m2x4_t): Ditto.
> (vbfloat16m4_t): Ditto.
> (vbfloat16m4x2_t): Ditto.
> (vbfloat16m8_t): Ditto.
> (double_trunc_bfloat_scalar): Add scalar_type def for BFloat16.
> (double_trunc_bfloat_vector): Add vector_type def for BFloat16.
> * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_BF_16):
> Add required definition of BFloat16 ext.
> * config/riscv/riscv-vector-switch.def (ENTRY):
> Add vector_type information for BFloat16.
> (TUPLE_ENTRY): Add tuple vector_type information for BFloat16.
>
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc   | 115 ++
>  gcc/config/riscv/riscv-modes.def  |  30 -
>  .../riscv/riscv-vector-builtins-types.def |  50 
>  gcc/config/riscv/riscv-vector-builtins.cc |   7 +-
>  gcc/config/riscv/riscv-vector-builtins.def|  55 -
>  gcc/config/riscv/riscv-vector-builtins.h  |   1 +
>  gcc/config/riscv/riscv-vector-switch.def  |  36 ++
>  7 files changed, 291 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 27cbd14982c..8626ddeaaa8 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -117,6 +117,42 @@ inttype (unsigned sew, int lmul_log2, unsigned nf, bool 
> unsigned_p)
>return mode.str ();
>  }
>
> +std::string
> +bfloat16_type (int lmul_log2)
> +{
> +  if (!valid_type (16, lmul_log2, /*float_t*/ true))
> +return "INVALID";
> +
> +  std::stringstream mode;
> +  mode << "vbfloat16" << to_l

[PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI

Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on such targets
because it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf. It
also fixes the issue in PR114678.
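
  The folding rule can be sketched outside of GCC as a small three-valued
function. This is only an illustration of the idea, not the range_operator
code in the patch: range_sketch, tri and fold_isinf are hypothetical names,
and a closed interval of doubles plus two NaN flags stands in for frange.

```cpp
#include <cmath>

// Result of folding isinf over a whole range of inputs.
enum class tri { no, yes, maybe };

// Hypothetical stand-in for a floating-point range: every non-NaN value
// lies in [lo, hi]; the flags say whether NaN is possible or certain.
struct range_sketch {
  double lo;
  double hi;
  bool maybe_nan;
  bool known_nan;
};

tri fold_isinf(const range_sketch& r)
{
  // isinf (NaN) is false, so a range that is only NaN folds to "no".
  if (r.known_nan)
    return tri::no;
  const bool lo_inf = std::isinf(r.lo);
  const bool hi_inf = std::isinf(r.hi);
  // The range is exactly one infinity and cannot be NaN: always true.
  if (!r.maybe_nan && lo_inf && r.lo == r.hi)
    return tri::yes;
  // Neither bound is infinite, so no value in the range is infinite;
  // a possible NaN does not change that.
  if (!lo_inf && !hi_inf)
    return tri::no;
  // Otherwise the range mixes infinite and other values.
  return tri::maybe;
}
```

  In the new test, the condition x > __DBL_MAX__ leaves only +Inf, which
corresponds to the "yes" case here, so the guarded link_error call can be
removed.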

  Compared with previous version, the main change is to remove xfail for
s390 in range-sincos.c and vrp-float-abs-1.c.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds range op for builtin isinf.

gcc/
PR target/114678
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
PR target/114678
* gcc.dg/tree-ssa/range-isinf.c: New test.
* gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a80b93cf063..24559951dd6 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1153,6 +1153,63 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
index 35b38c3c914..337f9cda02f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range

Re: [PATCH] fixincludes: skip stdio_stdarg_h on darwin

2024-07-11 Thread Iain Sandoe



> On 10 Jul 2024, at 16:45, Iain Sandoe  wrote:

>> On 10 Jul 2024, at 16:25, FX Coudert  wrote:
>> 
>> I found another useless fixincludes on darwin, but this one was a bit harder 
>> to diagnose. GCC trunk applies a fix to <stdio.h> on modern Darwin: it is 
>> stdio_stdarg_h. That fix is actually part of a pair, along with 
>> stdio_va_list, and they appear to work around issues with some old Unix (or 
>> BSD?) headers and the definition of va_list. It is not entirely clear to me 
>> what they fix, but they have been here forever.
>> 
>> They use various bypass mechanisms, but those are fragile. I have no idea if 
>> the fix is actually needed on any still-supported system, and maybe some 
>> global reviewer might want to remove it. But for now, I only want to bypass 
>> the fix on Darwin: it is useless there, and applying it makes our builds 
>> more fragile (and sensitive to the SDK version). Solaris has already opted 
>> out years ago, and now we do the same.
>> 
>> To show the madness of this fix, the macOS headers actually contain a 
>> comment that is supposed to trigger the bypass:
>> 
>> /* DO NOT REMOVE THIS COMMENT: fixincludes needs to see:
>> * __gnuc_va_list and include <stdarg.h> */
>> 
>> This kludge was added to the Apple headers in Libc-391 released around 2004. 
>> But it recently became ineffective, due to the majority of the content of 
>> <stdio.h> being moved into <_stdio.h> (which is not covered by fixincludes).
>> 
>> Anyway, the only sane thing to do is to disarm this fix on darwin, as the 
>> attached patch does.
> 
> Right, if the comment was added in 2004, we have no still-supported OS 
> versions that are relevant,
> 
>> Tested on aarch64-apple-darwin24, OK to push?
> 
> Yes, OK for trunk, and backports after some bake time,

I have reverted the commit with r15-1964-g619f587f68525178  since it breaks 
bootstrap on supported
x86_64 OS versions.  We need to revise it and check across the whole supported 
OS range.
thanks
Iain

Re: Ping^3 [PATCH-1v3] Value Range: Add range op for builtin isinf

2024-07-11 Thread HAO CHEN GUI

Hi Ruoyao,
  Thanks for your info. I updated my patch and sent it for review.

Thanks
Gui Haochen

在 2024/7/10 22:01, Xi Ruoyao 写道:
> On Wed, 2024-07-10 at 21:54 +0800, Xi Ruoyao wrote:
>> On Mon, 2024-07-01 at 09:11 +0800, HAO CHEN GUI wrote:
>>> Hi,
>>>   Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
>>
>> I guess you can add PR114678 into the subject and the ChangeLog, and
>> also mention the patch in the bugzilla.
> 
> And, remove xfail in vrp-float-abs-1.c and range-sincos.c (if this patch
> works as intended they should no longer fail).
>

Re: [PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-07-11 Thread HAO CHEN GUI

Hi Jeff,

在 2024/7/11 6:25, Jeff Law 写道:
> OK.  But given this patch is several months old, can you re-bootstrap & test 
> before committing to the trunk.

Thanks. I will rebase the patch and test it again.

Thanks
Gui Haochen

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Andre Vehreschild

Hi Harald,

thank you very much for ok'ing this large patch. Merged as
gcc-15-1965-ge4f2f46e015

Looking forward to getting (no) bug reports ;-)

Thanks again,

Andre

On Wed, 10 Jul 2024 20:52:37 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> Am 10.07.24 um 10:45 schrieb Andre Vehreschild:
> > Hi Harald,
> >
> > thanks for the review. I totally agree, that this patch has gotten
> > bigger than I expected (and wanted). But things are as they are.
> >
> > About the coding style: I have worked in so many projects, that I
> > consider a consistent coding style luxury. I esp. do not have my
> > own one anymore. The formatting you are seeing in my patches is the
> > result of clang-format with the provided parameter file in
> > contrib/clang-format. I was happy to have a tool to do the
> > formatting, that I could integrate into my IDE, because previously
> > it was hard to mimic the GNU style. I try to get to the GNU style
> > as good as possible, where I consider clang-format doing garbage.
> >
> > I see that clang-format has a "very specific opinion" on how to
> > format the lines you mentioned, but it will "correct" them any time
> > I change them and touch them later. I now have forbidden
> > clang-format to touch the code lines, but this means to add
> > formatter specific comments. Is this ok?
>
> yes, this is much better now!  Thanks.
>
> (I entirely rely on Emacs' formatting when working with C.  Sometimes
> the indentation at first may appear unexpected, but in most of these
> cases I find that it helps to just use explicit parentheses to
> convince Emacs.  This is documented.)
>
> > About the assumed size arrays, that was a small change and is added
> > now.
>
> Great!
>
> > Note, the runtime part of the patch (pr96992_3p1.patch) did not
> > change and is therefore not updated.
> >
> > Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
>
> Yes, this is OK now.
>
> Thanks for the patch and your patience ;-)
>
> Harald
>
>
> > Regards,
> > Andre
> >
> > On Fri, 5 Jul 2024 22:10:16 +0200
> > Harald Anlauf  wrote:
> >
> >> Hi Andre,
> >>
> >> On 03.07.24 at 12:58, Andre Vehreschild wrote:
> >>> Hi Harald,
> >>>
> >>> I am sorry for the long delay, but fixing the negative stride
> >>> led from one issue to the next. I finally got a version that
> >>> does not regress. Please have a look.
> >>>
> >>> This patch has two parts:
> >>> 1. The runtime library part in pr96992_3p1.patch and
> >>> 2. the compiler changes in pr96992_3p2.patch.
> >>>
> >>> In my branch also the two patches from Paul for pr59104 and
> >>> pr102689 are living, which might lead to small shifts during
> >>> application of the patches.
> >>>
> >>> NOTE, this patch adds internal packing and unpacking of class
> >>> arrays similar to the regular pack and unpack. I think this is
> >>> necessary, because the regular un-/pack does not use the vptr's
> >>> _copy routine for moving data and therefore may produce bugs.
> >>>
> >>> The un-/pack_class routines are yet only used for converting a
> >>> derived type array to a class array. Extending their use when a
> >>> UN-/PACK() is applied on a class array is still to be done (as
> >>> part of another PR).
> >>>
> >>> Regtests fine on x86_64-pc-linux-gnu/ Fedora 39.
> >>
> >> this is a really huge patch to review, and I am not sure that I
> >> can do this without help from others.  Paul?  Anybody else?
> >>
> >> As far as I can tell for now:
> >>
> >> - pr96992_3p1.patch (the libgfortran part) looks good to me.
> >>
> >> - git had some whitespace issues with pr96992_3p2.patch as
> >> attached, but I could fix that locally and do some testing
> >> parallel to reading.
> >>
> >> A few advance comments on the latter patch:
> >>
> >> - my understanding is that the PR at the end of a summary line
> >> should be like in:
> >>
> >> Fortran: Fix rejecting class arrays of different ranks as storage
> >> association argument [PR96992]
> >>
> >> I was told that this helps people explicitly scanning for the
> >> PR number in that place.
> >>
> >> - some rewrites of logical conditions change the coding style from
> >> what is the recommended GNU coding style, and I find the more
> >> compact way used in some places harder to grok (but that may be
> >> just me). Example:
> >>
> >> @@ -8850,20 +8857,24 @@ gfc_conv_array_parameter (gfc_se * se,
> >> gfc_expr
> >> * expr, bool g77,
> >>  /* There is no need to pack and unpack the array, if it is
> >> contiguous and not a deferred- or assumed-shape array, or if it is
> >> simply contiguous.  */
> >> -  no_pack = ((sym && sym->as
> >> -&& !sym->attr.pointer
> >> -&& sym->as->type != AS_DEFERRED
> >> -&& sym->as->type != AS_ASSUMED_RANK
> >> -&& sym->as->type != AS_ASSUMED_SHAPE)
> >> -||
> >> -   (ref && ref->u.ar.as
> >> -&& ref->u.ar.as->type != AS_DEFERRED
> >> +  no_pack = false;
> >> +  gfc_array_spec *as;
> >> +  if (sym)
> >> +{
> >> +  symbol_attribute

[Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

2024-07-11 Thread Ajit Agarwal

Hello All:

Unroll factor is determined with max distance across loop iterations.
The logic for determining the loop unroll factor is based on
data dependency across loop iterations.

The max distance across loop iterations is the unrolling factor
that helps in predictive commoning.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-optimization, predcom: Improve unroll factor for predictive commoning

Unroll factor is determined with max distance across loop iterations.
The logic for determining the loop unroll factor is based on
data dependency across loop iterations.

The max distance across loop iterations is the unrolling factor
that helps in predictive commoning.

2024-07-11  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-predcom.cc: Change in determining unroll factor with
data dependence across loop iterations.
---
 gcc/tree-predcom.cc | 51 ++---
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
index 9844fee1e97..029b02f5990 100644
--- a/gcc/tree-predcom.cc
+++ b/gcc/tree-predcom.cc
@@ -409,6 +409,7 @@ public:
   /* Perform the predictive commoning optimization for chains, make this
  public for being called in callback execute_pred_commoning_cbck.  */
   void execute_pred_commoning (bitmap tmp_vars);
+  unsigned determine_unroll_factor (const vec<chain_p> &chains);
 
 private:
   /* The pointer to the given loop.  */
@@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
chain,
copies as possible.  CHAINS is the list of chains that will be
optimized.  */
 
-static unsigned
-determine_unroll_factor (const vec<chain_p> &chains)
+unsigned
+pcom_worker::determine_unroll_factor (const vec<chain_p> &chains)
 {
   chain_p chain;
-  unsigned factor = 1, af, nfactor, i;
+  unsigned factor = 1, i;
   unsigned max = param_max_unroll_times;
+  struct data_dependence_relation *ddr;
+  unsigned nfactor = 0;
+  int nzfactor = 0;
+
+  /* Best unroll factor is the maximum distance across loop
+ iterations.  */
+  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
+{
+  for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
+   {
+ lambda_vector vec = DDR_DIST_VECT (ddr, j);
+ widest_int distance = vec[j];
+ unsigned offset = distance.to_uhwi ();
+ if (offset == 0)
+   continue;
+
+ int dist = offset - nzfactor;
+ if (dist  == 0)
+   continue;
 
+ if (nfactor == 0)
+   {
+ nfactor = offset;
+ nzfactor = offset;
+   }
+ else if (dist <= nzfactor)
+   nfactor = offset;
+
+ if (nfactor > 0 && nfactor <= max)
+   factor = nfactor;
+   }
+}
+
+  int max_use = 0;
   FOR_EACH_VEC_ELT (chains, i, chain)
 {
   if (chain->type == CT_INVARIANT)
@@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains)
  continue;
}
 
-  /* The best unroll factor for this chain is equal to the number of
-temporary variables that we create for it.  */
-  af = chain->length;
   if (chain->has_max_use_after)
-   af++;
-
-  nfactor = factor * af / gcd (factor, af);
-  if (nfactor <= max)
-   factor = nfactor;
+   max_use++;
 }
-
+  factor += max_use;
   return factor;
 }
 
-- 
2.43.5
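
For reference, the rule that the diff above removes can be read in isolation. The following is only a minimal standalone sketch of those removed lines, not the GCC code itself: the struct chain_info, gcd_u and lcm_unroll_factor are hypothetical names, and the guard for a zero per-chain factor is an assumption added so the sketch stays well defined (the real function skips some chains earlier by other means).

```c
#include <assert.h>

/* Hypothetical stand-in for the two chain fields the removed code reads.  */
struct chain_info
{
  unsigned length;        /* plays the role of chain->length */
  int has_max_use_after;  /* plays the role of chain->has_max_use_after */
};

/* Greatest common divisor by Euclid's algorithm.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Fold each chain's preferred factor into the running factor via the
   least common multiple, ignoring any step that would exceed MAX.  */
unsigned
lcm_unroll_factor (const struct chain_info *chains, unsigned n, unsigned max)
{
  assert (max >= 1);
  unsigned factor = 1;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned af = chains[i].length;
      if (chains[i].has_max_use_after)
        af++;
      if (af == 0)
        continue;  /* assumption: a zero factor carries no information */
      unsigned nfactor = factor * af / gcd_u (factor, af);
      if (nfactor <= max)
        factor = nfactor;
    }
  return factor;
}
```

With chains of length 2 and 3 and a limit of 8 this yields 6; if both chains also have a max use after, the per-chain factors become 3 and 4, their LCM of 12 exceeds the limit, and the factor stays at 3.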

[PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread pan2 . li

From: Pan Li 

This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:

Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }

DEF_VEC_SAT_U_SUB_ZIP(uint8_t, uint16_t)

vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli       a4,zero,e32,m1,ta,ma
  vmv.v.x       v6,a1
  vsetvli       zero,zero,e16,mf2,ta,ma
  vid.v         v2
  li            a4,-1
  vnclipu.wi    v6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v       v3,0(a3)
  vrsub.vx      v5,v2,a6
  mv            a7,a4
  addw          a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv     v1,v1,v6  // .SAT_SUB
  vrgather.vv   v3,v1,v5
  vse16.v       v3,0(a3)
  sub           a3,a3,t1
  bgtu          t4,a4,.L3

Passed the rv64gcv tests.
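
To make the scalar semantics of the zip form concrete, here is a hand expansion of the macro for one type pair, as a minimal sketch. The function name sat_u_sub_zip_u16 is hypothetical and chosen only for this illustration; the body follows the macro text, with X pointing one past the last element.

```c
#include <assert.h>
#include <stdint.h>

/* Hand expansion of the zip form for T1 = uint16_t, T2 = uint32_t.
   X points one past the last element; the loop walks backwards over
   LIMIT elements and stores the saturated difference in place.  */
void
sat_u_sub_zip_u16 (uint16_t *x, uint32_t b, unsigned limit)
{
  assert (limit > 0);  /* the do/while body always runs at least once */
  uint32_t a;
  uint16_t *p = x;
  do
    {
      a = *--p;
      /* Clamp to zero instead of wrapping when B exceeds the element.  */
      *p = (uint16_t) (a >= b ? a - b : 0);
    }
  while (--limit);
}
```

For the array {10, 5, 0, 65535} and b = 7 the result is {3, 0, 0, 65528}; any b above 65535 clears every element.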

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
 .../rvv/autovec/binop/vec_sat_binary_vx.h | 22 +
 .../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++
 .../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 
 .../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 +
 5 files changed, 155 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c..416a1e49a47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
 }
 
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
 #define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zip(x, b, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP_WRAP(T1, T2, x, b, N) \
+  RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+
 
/**/
 /* Saturation Sub Truncated (Unsigned and Signed) 
*/
 
/**/
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
new file mode 100644
index 000..d238c6392de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
@@ -0,0 +1,22 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY_VX_H
+#define HAVE_DEFINED_VEC_SAT_BINARY_VX_H
+
+int
+main ()
+{
+  unsigned i, k;
+  T d;
+
+  for (i =

[PATCH] tree-optimization/115868 - ICE with .MASK_CALL in simdclone

2024-07-11 Thread Richard Biener

The following adjusts mask recording which didn't take into account
that we can merge call arguments from two vectors like

  _50 = {vect_d_1.253_41, vect_d_1.254_43};
  _51 = VIEW_CONVERT_EXPR(mask__19.257_49);
  _52 = (unsigned int) _51;
  _53 = _Z3bazd.simdclone.7 (_50, _52);
  _54 = BIT_FIELD_REF <_53, 256, 0>;
  _55 = BIT_FIELD_REF <_53, 256, 256>;

The testcase g++.dg/vect/pr68762-2.cc exercises this on x86_64 with
partial vector usage enabled and AVX512 support.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

PR tree-optimization/115868
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Adjust
the number of mask copies required for vect_record_loop_mask.
---
 gcc/tree-vect-stmts.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 8b5d82c005c..5c9f2329ad3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4349,9 +4349,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
case SIMD_CLONE_ARG_TYPE_MASK:
  if (loop_vinfo
  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-   vect_record_loop_mask (loop_vinfo,
-  &LOOP_VINFO_MASKS (loop_vinfo),
-  ncopies, vectype, op);
+   {
+ unsigned mult
+   = exact_div (bestn->simdclone->simdlen,
+TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+ vect_record_loop_mask (loop_vinfo,
+&LOOP_VINFO_MASKS (loop_vinfo),
+ncopies * mult, vectype, op);
+   }
 
  break;
}
-- 
2.35.3
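
The arithmetic behind the adjusted mask count can be written out on its own. This is a sketch only: mask_copies is a hypothetical helper, and plain unsigned integers stand in for the poly_int values that exact_div works on in the patch.

```c
#include <assert.h>

/* Number of loop masks to record when one simd clone call consumes
   SIMDLEN lanes but each vector of the chosen type holds NUNITS lanes.
   The division is expected to be exact, as with exact_div.  */
unsigned
mask_copies (unsigned ncopies, unsigned simdlen, unsigned nunits)
{
  assert (nunits != 0 && simdlen % nunits == 0);
  unsigned mult = simdlen / nunits;
  return ncopies * mult;
}
```

Reading the GIMPLE in the description as a clone with eight double lanes fed from two four-lane vectors, mask_copies (1, 8, 4) gives 2, whereas the previous code recorded only ncopies masks.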

[PATCH] tree-optimization/115867 - ICE with simdcall vectorization in masked loop

2024-07-11 Thread Richard Biener

When only a loop mask is to be supplied for the inbranch arg to a
simd function we fail to handle integer mode masks correctly.  We
need to guess the number of elements represented by it.  This assumes
that excess arguments are all for masks; I wasn't able to create
a simdclone with more than one integer mode mask argument.

The gcc.dg/vect/vect-simd-clone-20.c exercises this with -mavx512vl

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

PR tree-optimization/115867
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Properly
guess the number of mask elements for integer mode masks.
---
 gcc/tree-vect-stmts.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5c9f2329ad3..712399e46a3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4753,7 +4753,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
  SIMD_CLONE_ARG_TYPE_MASK);
 
  tree masktype = bestn->simdclone->args[mask_i].vector_type;
- callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
+ if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
+   /* Guess the number of lanes represented by masktype.  */
+   callee_nelements = exact_div (bestn->simdclone->simdlen,
+ bestn->simdclone->nargs - nargs);
+ else
+   callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
  o = vector_unroll_factor (nunits, callee_nelements);
  for (m = j * o; m < (j + 1) * o; m++)
{
-- 
2.35.3
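
The guess made for integer mode masks reduces to one division. A minimal sketch follows, assuming (as the description does) that every clone argument beyond the call's own arguments is a mask; int_mask_nelements is a hypothetical name, and plain integers replace the poly_int types.

```c
#include <assert.h>

/* Lanes represented by one integer mode mask argument: the clone's
   SIMDLEN split evenly over the excess (mask) arguments.  */
unsigned
int_mask_nelements (unsigned simdlen, unsigned clone_nargs,
                    unsigned call_nargs)
{
  assert (clone_nargs > call_nargs);
  unsigned nmasks = clone_nargs - call_nargs;
  assert (simdlen % nmasks == 0);
  return simdlen / nmasks;
}
```

With a single mask argument the mask covers all simdlen lanes; two mask arguments would each cover half.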

Re: [PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 9:03 AM Tamar Christina  wrote:
>
> Hi Victor,
>
> > -Original Message-
> > From: Victor Do Nascimento 
> > Sent: Wednesday, July 10, 2024 3:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different 
> > dot-
> > prod modes.
> >
> > From: Victor Do Nascimento 
> >
> > Given the novel treatment of the dot product optab as a conversion we
> > are now able to target, for a given architecture, different
> > relationships between output modes and input modes.
> >
> > This is made clearer by way of example. Previously, on AArch64, the
> > following loop was vectorizable:
> >
> > uint32_t udot4(int n, uint8_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++)
> >     sum += data[i] * data[i];
> >   return sum;
> > }
> >
> > while the following wasn't:
> >
> > uint32_t udot2(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++)
> >     sum += data[i] * data[i];
> >   return sum;
> > }
> >
> > Under the new treatment of the dot product optab, they are both now
> > vectorizable.
> >
> > This adds the relevant target-agnostic check to ensure this behaviour
> > in the autovectorizer.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 38 +++
> >  1 file changed, 38 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > new file mode 100644
> > index 000..5caa7b81fce
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* Ensure both the two-way and four-way dot products are autovectorized.  
> > */
> > +#include <stdint.h>
> > +
> > +uint32_t udot4(int n, uint8_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i++) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot4(int n, int8_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i++) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +uint32_t udot2(int n, uint16_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i++) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot2(int n, int16_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i++) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>
> These tests only test that you have vectorized the loops, not that the loop 
> was vectorized
> using dotprod.  I think you want to have a scan for DOT_PROD_EXPR as well, 
> gated to the
> targets that support two-way dot prod.

Ideally they'd also verify correctness, thus make them have runtime checks.

> Cheers,
> Tamar
>
> > --
> > 2.34.1
>
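
A runtime check along the lines suggested above could look roughly like the following. It is only a sketch: check_udot2 and the choice of input data are assumptions for illustration, and a real testsuite file would also need the usual dg directives and a main that calls the check.

```c
#include <assert.h>
#include <stdint.h>

#define N 64

/* Same loop shape as the two-way unsigned case in the proposed test.  */
uint32_t
udot2 (int n, uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += data[i] * data[i];
  return sum;
}

/* Fill the input with 0..N-1 and compare against the closed form for
   the sum of squares, (N-1) * N * (2N-1) / 6.  Aborts on mismatch.  */
void
check_udot2 (void)
{
  uint16_t data[N];
  for (int i = 0; i < N; i++)
    data[i] = (uint16_t) i;
  uint32_t expected = (uint32_t) (N - 1) * N * (2 * N - 1) / 6;
  assert (udot2 (N, data) == expected);
}
```

Comparing against a closed form keeps the reference value independent of how the loop itself is compiled.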

[PATCH 1/4] vect: Shorten name of macro SLP_TREE_NUMBER_OF_VEC_STMTS

2024-07-11 Thread Feng Xue OS

This patch series is recomposed and split from
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655974.html.

As I will add a new field tightly coupled with "vec_stmts_size", following
the original naming convention would make the new macro very long. So it is
better to choose equally meaningful but shorter names. This patch renames
the existing macro; the other new patch handles the new field and macro
accordingly.

Thanks,
Feng

---
gcc/
* tree-vectorizer.h (SLP_TREE_NUMBER_OF_VEC_STMTS): Change the macro
to SLP_TREE_VEC_STMTS_NUM.
* tree-vect-stmts.cc (vect_model_simple_cost): Likewise.
(check_load_store_for_partial_vectors): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise. And replace direct field reference
to "vec_stmts_size" with the new macro.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
Likewise.
(vectorizable_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vectorizable_phi): Likewise.
(vectorizable_recurr): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Likewise.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_prologue_cost_for_slp): Likewise.
(vect_slp_analyze_node_operations): Likewise.
(vect_create_constant_vectors): Likewise.
(vect_get_slp_vect_def): Likewise.
(vect_transform_slp_perm_load_1): Likewise.
(vectorizable_slp_permutation_1): Likewise.
(vect_schedule_slp_node): Likewise.
(vectorize_slp_instance_root_stmt): Likewise.
---
 gcc/tree-vect-loop.cc  | 17 +++---
 gcc/tree-vect-slp.cc   | 34 +--
 gcc/tree-vect-stmts.cc | 52 --
 gcc/tree-vectorizer.h  |  2 +-
 4 files changed, 51 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd1..c183e2b6068 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7471,7 +7471,7 @@ vect_reduction_update_partial_vector_usage (loop_vec_info 
loop_vinfo,
   unsigned nvectors;
 
   if (slp_node)
-   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+   nvectors = SLP_TREE_VEC_STMTS_NUM (slp_node);
   else
nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
 
@@ -8121,7 +8121,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
|| reduction_type == CONST_COND_REDUCTION
|| reduction_type == EXTRACT_LAST_REDUCTION)
   && slp_node
-  && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)
+  && SLP_TREE_VEC_STMTS_NUM (slp_node) > 1)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -8600,7 +8600,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   if (slp_node)
 {
   ncopies = 1;
-  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+  vec_num = SLP_TREE_VEC_STMTS_NUM (slp_node);
 }
   else
 {
@@ -9196,7 +9196,7 @@ vectorizable_phi (vec_info *,
 for the scalar and the vector PHIs.  This avoids artificially
 favoring the vector path (but may pessimize it in some cases).  */
   if (gimple_phi_num_args (as_a <gphi *> (stmt_info->stmt)) > 1)
-   record_stmt_cost (cost_vec, SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
+   record_stmt_cost (cost_vec, SLP_TREE_VEC_STMTS_NUM (slp_node),
  vector_stmt, stmt_info, vectype, 0, vect_body);
   STMT_VINFO_TYPE (stmt_info) = phi_info_type;
   return true;
@@ -9304,7 +9304,7 @@ vectorizable_recurr (loop_vec_info loop_vinfo, 
stmt_vec_info stmt_info,
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   unsigned ncopies;
   if (slp_node)
-ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+ncopies = SLP_TREE_VEC_STMTS_NUM (slp_node);
   else
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
   poly_int64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
@@ -10217,8 +10217,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
  }
  /* loop cost for vec_loop.  */
  inside_cost
-   = record_stmt_cost (cost_vec,
-   SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
+   = record_stmt_cost (cost_vec, SLP_TREE_VEC_STMTS_NUM (slp_node),
vector_stmt, stmt_info, 0, vect_body);
  /* prologue cost for vec_init (if not nested) and step.  */
  prologue_cost = record_stmt_cost (cost_vec, 1 + !nested_in_vect_loop,
@@ -10289,7 +10288,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
}
 
   /*

Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal  wrote:
>
> Hello All:
>
> Unroll factor is determined with max distance across loop iterations.
> The logic for determining the loop unroll factor is based on
> data dependency across loop iterations.
>
> The max distance across loop iterations is the unrolling factor
> that helps in predictive commoning.

The old comment in the code says

> -  /* The best unroll factor for this chain is equal to the number of
> -temporary variables that we create for it.  */

why is that wrong and why is the max dependence distance more correct?

Do you have a testcase that shows how this makes a (positive) difference?

Richard.

> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> tree-optimization, predcom: Improve unroll factor for predictive commoning
>
> Unroll factor is determined with max distance across loop iterations.
> The logic for determining the loop unroll factor is based on
> data dependency across loop iterations.
>
> The max distance across loop iterations is the unrolling factor
> that helps in predictive commoning.
>
> 2024-07-11  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * tree-predcom.cc: Change in determining unroll factor with
> data dependence across loop iterations.
> ---
>  gcc/tree-predcom.cc | 51 ++---
>  1 file changed, 39 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
> index 9844fee1e97..029b02f5990 100644
> --- a/gcc/tree-predcom.cc
> +++ b/gcc/tree-predcom.cc
> @@ -409,6 +409,7 @@ public:
>/* Perform the predictive commoning optimization for chains, make this
>   public for being called in callback execute_pred_commoning_cbck.  */
>void execute_pred_commoning (bitmap tmp_vars);
> +  unsigned determine_unroll_factor (const vec<chain_p> &chains);
>
>  private:
>/* The pointer to the given loop.  */
> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
> chain,
> copies as possible.  CHAINS is the list of chains that will be
> optimized.  */
>
> -static unsigned
> -determine_unroll_factor (const vec<chain_p> &chains)
> +unsigned
> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains)
>  {
>chain_p chain;
> -  unsigned factor = 1, af, nfactor, i;
> +  unsigned factor = 1, i;
>unsigned max = param_max_unroll_times;
> +  struct data_dependence_relation *ddr;
> +  unsigned nfactor = 0;
> +  int nzfactor = 0;
> +
> +  /* Best unroll factor is the maximum distance across loop
> + iterations.  */
> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
> +{
> +  for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
> +   {
> + lambda_vector vec = DDR_DIST_VECT (ddr, j);
> + widest_int distance = vec[j];
> + unsigned offset = distance.to_uhwi ();
> + if (offset == 0)
> +   continue;
> +
> + int dist = offset - nzfactor;
> + if (dist  == 0)
> +   continue;
>
> + if (nfactor == 0)
> +   {
> + nfactor = offset;
> + nzfactor = offset;
> +   }
> + else if (dist <= nzfactor)
> +   nfactor = offset;
> +
> + if (nfactor > 0 && nfactor <= max)
> +   factor = nfactor;
> +   }
> +}
> +
> +  int max_use = 0;
>FOR_EACH_VEC_ELT (chains, i, chain)
>  {
>if (chain->type == CT_INVARIANT)
> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> &chains)
>   continue;
> }
>
> -  /* The best unroll factor for this chain is equal to the number of
> -temporary variables that we create for it.  */
> -  af = chain->length;
>if (chain->has_max_use_after)
> -   af++;
> -
> -  nfactor = factor * af / gcd (factor, af);
> -  if (nfactor <= max)
> -   factor = nfactor;
> +   max_use++;
>  }
> -
> +  factor += max_use;
>return factor;
>  }
>
> --
> 2.43.5
>

[PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Feng Xue OS

The number of vector stmts of an operation is calculated based on the output
vectype. This is an over-estimate for a lane-reducing operation. Sometimes, to
work around the issue, we have to rely on additional logic to deduce the exact
number by other means. To remove this inconvenience, this patch "turns" a
lane-reducing operation into a normal one by inserting new trivial
statements like zero-valued PHIs and pass-through copies, which can be
optimized away by later passes. At the same time, a new field is added to the
slp node to hold the number of vector stmts that are really effective after
vectorization. For example:

  int sum = 1;
  for (i)
{
  sum += d0[i] * d1[i];  // dot-prod 
}

  The vector size is 128-bit, vectorization factor is 16.  Reduction
  statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
{
  sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
  sum_v1 = sum_v1;  // copy
  sum_v2 = sum_v2;  // copy
  sum_v3 = sum_v3;  // copy
}

  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
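
The two counts in this example follow from a single division. The sketch below is not the patch's vect_get_slp_num_vectors; num_vectors is a hypothetical helper that assumes the count is VF times the number of lanes divided by the units per vector, and that the inputs are 8-bit elements (which is what a vectorization factor of 16 with 128-bit vectors suggests).

```c
#include <assert.h>

/* Vector statements needed to cover VF * LANES scalar results when
   each vector holds NUNITS elements.  */
unsigned
num_vectors (unsigned vf, unsigned lanes, unsigned nunits)
{
  assert (nunits != 0 && (vf * lanes) % nunits == 0);
  return vf * lanes / nunits;
}
```

Counted by the output vectype (4 ints per vector) this gives 4, matching sum_v0 through sum_v3; counted by the input vectype (16 elements per vector) it gives 1, the single DOT_PROD that is really effective.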

Thanks,
Feng

---
gcc/
* tree-vectorizer.h (vec_stmts_effec_size): New field in _slp_tree.
(SLP_TREE_VEC_STMTS_EFFEC_NUM): New macro.
(vect_get_num_vectors): New overload function.
(vect_get_slp_num_vectors): New function.
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage): Use
effective vector stmts number.
(vectorizable_reduction): Compute number of effective vector stmts for
lane-reducing op and reduction PHI.
(vect_transform_reduction): Insert copies for lane-reducing so as to
fix inaccurate vector stmts number.
(vect_transform_cycle_phi): Only need to calculate vector PHI number
based on input vectype for non-slp code path.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize effective vector
stmts number to zero.
(vect_slp_analyze_node_operations_1): Remove adjustment on vector
stmts number specific to slp reduction.
(vect_slp_analyze_node_operations): Compute number of vector elements
for constant/external slp node with vect_get_slp_num_vectors.
---
 gcc/tree-vect-loop.cc | 139 --
 gcc/tree-vect-slp.cc  |  56 ++---
 gcc/tree-vectorizer.h |  45 ++
 3 files changed, 183 insertions(+), 57 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c183e2b6068..5ad9836d6c8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7471,7 +7471,7 @@ vect_reduction_update_partial_vector_usage (loop_vec_info 
loop_vinfo,
   unsigned nvectors;
 
   if (slp_node)
-   nvectors = SLP_TREE_VEC_STMTS_NUM (slp_node);
+   nvectors = SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node);
   else
nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
 
@@ -7594,6 +7594,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   stmt_vec_info phi_info = stmt_info;
   if (!is_a <gphi *> (stmt_info->stmt))
 {
+  if (lane_reducing_stmt_p (stmt_info->stmt) && slp_node)
+   {
+ /* Compute number of effective vector statements for lane-reducing
+ops.  */
+ vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
+ gcc_assert (vectype_in);
+ SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node)
+   = vect_get_slp_num_vectors (loop_vinfo, slp_node, vectype_in);
+   }
   STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
   return true;
 }
@@ -8012,14 +8021,25 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (STMT_VINFO_LIVE_P (phi_info))
 return false;
 
-  if (slp_node)
-ncopies = 1;
-  else
-ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
+  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
 
-  gcc_assert (ncopies >= 1);
+  if (slp_node)
+{
+  ncopies = 1;
 
-  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits_out))
+   {
+ /* Not all vector reduction PHIs would be used, compute number
+of the effective statements.  */
+ SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node)
+   = vect_get_slp_num_vectors (loop_vinfo, slp_node, vectype_in);
+   }
+}
+  else
+{
+  ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
+  gcc_assert (ncopies >= 1);
+}
 
   if (nested_cycle)
 {
@@ -8360,7 +8380,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
|| (slp_node
   && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
   && SLP_TREE_LANES (slp_node) == 1
-  && vect_get_num_copies (loop_vinfo, vectype_in) > 1))
+  && SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node) > 1))
   && (STMT_VINFO_RELEVANT (stmt

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild  wrote:
>
> Hi Harald,
>
> thank you very much for ok'ing this large patch. Merged as
> gcc-15-1965-ge4f2f46e015
>
> Looking forward to get (no) bug reports ;-)

This seems to break bootstrap with

../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
 9135 |   tmp = build_call_expr_loc (input_location,
 9136 |                              gfor_fndecl_in_unpack_class, 4, tmp,
 9137 |                              packedptr,
 9138 |                              size_in_bytes (TREE_TYPE (ctree)),
 9139 |                              pack_attr);
../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
 8665 |   tree pack_attr;
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
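
For readers unfamiliar with this diagnostic, the general shape that tends to trigger it, and one common way to quiet it, can be sketched as below. This is a generic illustration, not the code in trans-array.cc: unpack_cost and its parameters are hypothetical, and the int variable merely plays the role of the tree pack_attr declaration.

```c
#include <assert.h>

/* A variable set under one condition and read later under the same
   condition.  The compiler may fail to prove that the two conditions
   agree, so giving the variable a value at its declaration makes every
   path to the use well defined.  */
int
unpack_cost (int is_class, int base)
{
  assert (is_class == 0 || is_class == 1);

  int pack_attr = 0;  /* initialized up front instead of left unset */

  if (is_class)
    pack_attr = base + 1;

  /* ... unrelated work would sit here in real code ... */

  if (is_class)
    return pack_attr * 2;
  return base;
}
```

Whether initializing at the declaration or restructuring the branches is the better fix for the actual function is a judgment for the patch author.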


> Thanks again,
>
> Andre
>
> On Wed, 10 Jul 2024 20:52:37 +0200
> Harald Anlauf  wrote:
>
> > Hi Andre,
> >
> > On 10.07.24 at 10:45, Andre Vehreschild wrote:
> > > Hi Harald,
> > >
> > > thanks for the review. I totally agree, that this patch has gotten
> > > bigger than I expected (and wanted). But things are as they are.
> > >
> > > About the coding style: I have worked in so many projects, that I
> > > consider a consistent coding style luxury. I esp. do not have my
> > > own one anymore. The formatting you are seeing in my patches is the
> > > result of clang-format with the provided parameter file in
> > > contrib/clang-format. I was happy to have a tool to do the
> > > formatting, that I could integrate into my IDE, because previously
> > > it was hard to mimic the GNU style. I try to get to the GNU style
> > > as good as possible, where I consider clang-format doing garbage.
> > >
> > > I see that clang-format has a "very specific opinion" on how to
> > > format the lines you mentioned, but it will "correct" them any time
> > > I change them and touch them later. I now have forbidden
> > > clang-format to touch the code lines, but this means to add
> > > formatter specific comments. Is this ok?
> >
> > yes, this is much better now!  Thanks.
> >
> > (I entirely rely on Emacs' formatting when working with C.  Sometimes
> > the indentation at first may appear unexpected, but in most of these
> > cases I find that it helps to just use explicit parentheses to
> > convince Emacs.  This is documented.)
> >
> > > About the assumed size arrays, that was a small change and is added
> > > now.
> >
> > Great!
> >
> > > Note, the runtime part of the patch (pr96992_3p1.patch) did not
> > > change and is therefore not updated.
> > >
> > > Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
> >
> > Yes, this is OK now.
> >
> > Thanks for the patch and your patience ;-)
> >
> > Harald
> >
> >
> > > Regards,
> > > Andre
> > >
> > > On Fri, 5 Jul 2024 22:10:16 +0200
> > > Harald Anlauf  wrote:
> > >
> > >> Hi Andre,
> > >>
> > >> Am 03.07.24 um 12:58 schrieb Andre Vehreschild:
> > >>> Hi Harald,
> > >>>
> > >>> I am sorry for the long delay, but fixing the negative stride
> > >>> led from one issue to the next. I finally got a version that
> > >>> does not regress. Please have a look.
> > >>>
> > >>> This patch has two parts:
> > >>> 1. The runtime library part in pr96992_3p1.patch and
> > >>> 2. the compiler changes in pr96992_3p2.patch.
> > >>>
> > >>> In my branch also the two patches from Paul for pr59104 and
> > >>> pr102689 are living, which might lead to small shifts during
> > >>> application of the patches.
> > >>>
> > >>> NOTE, this patch adds internal packing and unpacking of class
> > >>> arrays similar to the regular pack and unpack. I think this is
> > >>> necessary, because the regular un-/pack does not use the vptr's
> > >>> _copy routine for moving data and therefore may produce bugs.
> > >>>
> > >>> The un-/pack_class routines are yet only used for converting a
> > >>> derived type array to a class array. Extending their use when a
> > >>> UN-/PACK() is applied on a class array is still to be done (as
> > >>> part of another PR).
> > >>>
> > >>> Regtests fine on x86_64-pc-linux-gnu/ Fedora 39.
> > >>
> > >> this is a really huge patch to review, and I am not sure that I
> > >> can do this without help from others.  Paul?  Anybody else?
> > >>
> > >> As far as I can tell for now:
> > >>
> > >> - pr96992_3p1.patch (the libgfortran part) looks good to me.
> > >>
> > >> - git had some whitespace issues with pr96992_3p2.patch as
> > >>

[PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-11 Thread Feng Xue OS

For lane-reducing operations (dot-prod/widen-sum/sad) in a loop reduction, the
current vectorizer can only handle the pattern if the reduction chain does not
contain any other operation, whether that operation is normal or lane-reducing.

This patch removes some constraints in reduction analysis to allow multiple
arbitrary lane-reducing operations with mixed input vectypes in a loop
reduction chain. For example:

   int sum = 1;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
 }

The vector size is 128-bit and the vectorization factor is 16. The reduction
statements would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 1 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 }

sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0 + sum_v1
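
The transformation above can be modelled in plain C to check that it keeps the scalar result. This is only a sketch under assumptions: reduce_scalar and reduce_lanes are hypothetical names, d0/d1/w are taken to be signed char and s0/s1 to be short (which is why SAD needs two vector statements), and the mapping of elements to lanes is illustrative, since the real lane layout is target-specific.

```c
#include <assert.h>
#include <stdlib.h>

#define VF    16   /* scalar iterations covered by one vector iteration */
#define LANES 4    /* int lanes in one 128-bit accumulator */
#define NACC  4    /* accumulators sum_v0 .. sum_v3 */

/* Scalar reference: the loop from the example.  */
int reduce_scalar(int n, const signed char *d0, const signed char *d1,
                  const signed char *w, const short *s0, const short *s1)
{
    int sum = 1;
    for (int i = 0; i < n; i++) {
        sum += d0[i] * d1[i];        /* dot-prod */
        sum += w[i];                 /* widen-sum */
        sum += abs(s0[i] - s1[i]);   /* sad */
    }
    return sum;
}

/* Lane-wise model of the transformed loop; n must be a multiple of VF.  */
int reduce_lanes(int n, const signed char *d0, const signed char *d1,
                 const signed char *w, const short *s0, const short *s1)
{
    assert(n % VF == 0);

    /* sum_v0 = { 0, 0, 0, 1 }, sum_v1 .. sum_v3 are all zero.  */
    int acc[NACC][LANES] = { { 0, 0, 0, 1 } };

    for (int base = 0; base < n; base += VF) {
        /* DOT_PROD: 16 char products folded into the 4 lanes of sum_v0.  */
        for (int k = 0; k < VF; k++)
            acc[0][k / 4] += d0[base + k] * d1[base + k];

        /* WIDEN_SUM: 16 chars folded into the 4 lanes of sum_v0.  */
        for (int k = 0; k < VF; k++)
            acc[0][k / 4] += w[base + k];

        /* SAD: 16 shorts need two input vectors, so elements 0..7 go to
           sum_v0 and elements 8..15 go to sum_v1.  */
        for (int k = 0; k < VF; k++)
            acc[k / 8][(k % 8) / 2] += abs(s0[base + k] - s1[base + k]);

        /* sum_v2 and sum_v3 are only copied, so nothing to do here.  */
    }

    /* Final reduction: sum_v0 + sum_v1 + sum_v2 + sum_v3, then across lanes.  */
    int sum = 0;
    for (int a = 0; a < NACC; a++)
        for (int l = 0; l < LANES; l++)
            sum += acc[a][l];
    return sum;
}
```

Because integer addition is associative (ignoring overflow), any assignment of elements to lanes and accumulators gives the same total, which is why the untouched accumulators can simply be copied.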

Thanks,
Feng

---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (vectorizable_lane_reducing): New function
declaration.
* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
vectorizable_lane_reducing to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost computation
code related to emulated_mixed_dot_prod.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing.
(vect_transform_reduction): Extend transformation to support reduction
statements with mixed input vectypes for non-slp code path.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c  |  64 
 .../gcc.dg/vect/vect-reduc-chain-2.c  |  79 +
 .../gcc.dg/vect/vect-reduc-chain-3.c  |  68 +
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-1.c  |  95 ++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-2.c  |  67 
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-3.c  |  79 +
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-4.c  |  63 
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  60 
 gcc/tree-vect-loop.cc | 285 +-
 gcc/tree-vect-stmts.cc|   2 +
 gcc/tree-vectorizer.h |   2 +
 11 files changed, 785 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 000..80b0089ea0f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,64 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res,
+   SIGNEDNESS_2 char *restrict a,
+   SIGNEDNESS_2 char *restrict b,
+   SIGNEDNESS_2 char *restrict c,
+   SIGNEDNESS_2 char *restrict d,
+   SIGNEDNESS_1 int *restrict e)
+{
+  for (int i = 0; i < N; ++i)
+

[PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-11 Thread Feng Xue OS

When transforming multiple lane-reducing operations in a loop reduction chain,
the corresponding vectorized statements were originally generated into def-use
cycles starting from 0. A def-use cycle with a smaller index would therefore
contain more statements, which means more instruction dependencies. For example:

   int sum = 1;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
   sum += n[i];   // normal 
 }

Original transformation result:

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   ...
 }

For higher instruction parallelism in the final vectorized loop, a better
approach is to distribute the effective vector lane-reducing ops evenly among
all def-use cycles. In the transformation below, DOT_PROD, WIDEN_SUM and the
SADs are generated into separate cycles, so the instruction dependencies among
them are eliminated.

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = sum_v1;  // copy
   sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
   sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);

   ...
 }
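
The placement shown above can be sketched as a round-robin over the def-use cycles. This is a minimal illustration and not the code in the patch: place_lane_reducing_op is a hypothetical helper, and its pos argument only plays a role similar to the reduc_result_pos field named in the ChangeLog below.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: an operation that expands to NSTMTS
   effective vector statements is placed into NCYCLES def-use cycles,
   starting at *POS.  The chosen cycle indices are stored in OUT and *POS
   is advanced so that the next operation continues where this one ended.  */
void place_lane_reducing_op(int ncycles, int nstmts, int *pos, int *out)
{
    assert(ncycles > 0 && nstmts >= 0 && nstmts <= ncycles);

    for (int k = 0; k < nstmts; k++)
        out[k] = (*pos + k) % ncycles;

    *pos = (*pos + nstmts) % ncycles;
}
```

With four cycles, calling it for DOT_PROD (1 statement), WIDEN_SUM (1 statement) and SAD (2 statements) in that order yields cycles 0, then 1, then 2 and 3, matching the example.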

Thanks,
Feng

---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
(vect_transform_reduction): Add a new parameter of slp_instance type.
* tree-vect-stmts.cc (vect_transform_stmt): Add a new argument
slp_node_instance to vect_transform_reduction.
* tree-vect-loop.cc (vect_transform_reduction): Add a new parameter
slp_node_instance. Generate lane-reducing statements in an optimized
order.
---
 gcc/tree-vect-loop.cc  | 73 +++---
 gcc/tree-vect-stmts.cc |  3 +-
 gcc/tree-vectorizer.h  |  8 -
 3 files changed, 71 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a3374fb2d1a..841ef4c9120 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8673,7 +8673,8 @@ vect_emulate_mixed_dot_prod (loop_vec_info loop_vinfo, 
stmt_vec_info stmt_info,
 bool
 vect_transform_reduction (loop_vec_info loop_vinfo,
  stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
- gimple **vec_stmt, slp_tree slp_node)
+ gimple **vec_stmt, slp_tree slp_node,
+ slp_instance slp_node_instance)
 {
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -8863,6 +8864,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
+  sum += n[i];   // normal 
 }
 
 The vector size is 128-bit，vectorization factor is 16.  Reduction
@@ -8880,25 +8882,30 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 
-  sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
-  sum_v1 = sum_v1;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 
-  sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-  sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-  sum_v2 = sum_v2;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+  sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
   sum_v3 = sum_v3;  // copy
+
+  sum_v0 += n_v0[i: 0  ~ 3 ];
+  sum_v1 += n_v1[i: 4  ~ 7 ];
+  sum_v2 += n_v2[i: 8  ~ 11];
+  sum_v3 += n_v3[i: 12 ~ 15];
 }
 
-  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0 + sum_v1
-   */
+Moreover, for a higher instruction pa

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
 wrote:
>
> On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild  wrote:
> >
> > Hi Harald,
> >
> > thank you very much for ok'ing this large patch. Merged as
> > gcc-15-1965-ge4f2f46e015
> >
> > Looking forward to get (no) bug reports ;-)
>
> This seems to break bootstrap with
>
> ../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
> ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
>  9135 |   tmp = build_call_expr_loc (input_location,
>  9136 |                              gfor_fndecl_in_unpack_class, 4, tmp,
>  9137 |                              packedptr,
>  9138 |                              size_in_bytes (TREE_TYPE (ctree)),
>  9139 |                              pack_attr);
> ../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
>  8665 |   tree pack_attr;
> cc1plus: all warnings being treated as errors
> make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1

It seems to be a false positive, but GCC's little mind is too weak to prove that
(yes, we err on the side of emitting a diagnostic if we can't prove it's
initialized).

Richard.


Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 11:04 AM Andre Vehreschild  wrote:
>
> Hi Richard,
>
> I am sorry to hear that. Shall I revert?

I would suggest instead fixing it by initializing the variable with NULL
(and a comment).
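
As a rough illustration of that kind of fix (a sketch only, not the code in trans-array.cc): the variable is assigned on one path and read on another path guarded by a related condition. select_pack_attr and class_attr are hypothetical names, and a compiler may well see through an example this small; the point is only the shape of the pattern and the initializer with its comment.

```c
#include <assert.h>
#include <stddef.h>

static const int class_attr = 42;   /* hypothetical attribute value */

/* Hypothetical stand-in for the pattern behind the diagnostic.  */
const int *select_pack_attr(int is_class_array, int do_unpack)
{
    /* Set only when packing class arrays.  The explicit NULL keeps
       -Wmaybe-uninitialized quiet when the compiler cannot prove that
       the read below is always preceded by the assignment.  */
    const int *pack_attr = NULL;

    if (is_class_array)
        pack_attr = &class_attr;

    /* ... in real code, a lot of unrelated work happens here ... */

    if (do_unpack && is_class_array)
        return pack_attr;   /* only reached after the assignment above */

    return NULL;
}
```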

> - Andre
> On Thu, 11 Jul 2024 10:57:48 +0200
> Richard Biener  wrote:
>
> > On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild 
> > > wrote:
> > > >
> > > > Hi Harald,
> > > >
> > > > thank you very much for ok'ing this large patch. Merged as
> > > > gcc-15-1965-ge4f2f46e015
> > > >
> > > > Looking forward to get (no) bug reports ;-)
> > >
> > > This seems to break bootstrap with
> > >
> > > ../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
> > > ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
> > >  9135 |   tmp = build_call_expr_loc (input_location,
> > >  9136 |                              gfor_fndecl_in_unpack_class, 4, tmp,
> > >  9137 |                              packedptr,
> > >  9138 |                              size_in_bytes (TREE_TYPE (ctree)),
> > >  9139 |                              pack_attr);
> > > ../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
> > >  8665 |   tree pack_attr;
> > > cc1plus: all warnings being treated as errors
> > > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> >
> > It seems to be a false positive but GCCs little mind is too weak to
> > prove that (yes, we error on the side of emitting a diagnostic if we
> > can't prove it's initialized)
> >
> > Richard.

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Andre Vehreschild

Hi Richard,

I am sorry to hear that. Shall I revert?

- Andre
On Thu, 11 Jul 2024 10:57:48 +0200
Richard Biener  wrote:

> On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
>  wrote:
> >
> > On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild 
> > wrote:  
> > >
> > > Hi Harald,
> > >
> > > thank you very much for ok'ing this large patch. Merged as
> > > gcc-15-1965-ge4f2f46e015
> > >
> > > Looking forward to get (no) bug reports ;-)  
> >
> > This seems to break bootstrap with
> >
> > ../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
> > ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
> >  9135 |   tmp = build_call_expr_loc (input_location,
> >  9136 |                              gfor_fndecl_in_unpack_class, 4, tmp,
> >  9137 |                              packedptr,
> >  9138 |                              size_in_bytes (TREE_TYPE (ctree)),
> >  9139 |                              pack_attr);
> > ../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
> >  8665 |   tree pack_attr;
> > cc1plus: all warnings being treated as errors
> > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> 
> It seems to be a false positive but GCCs little mind is too weak to
> prove that (yes, we error on the side of emitting a diagnostic if we
> can't prove it's initialized)
> 
> Richard.

[pushed] wwwdocs: gcc-3.*: Drop FTP links to kernel tarballs

2024-07-11 Thread Gerald Pfeifer

Most browsers these days do not support FTP any longer, and while
kernel.org offers downloads via other means, realistically, who is
going to download 2.4.0 or 2.4.18 kernels anyways?
---
 htdocs/gcc-3.0/criteria.html | 4 +---
 htdocs/gcc-3.1/criteria.html | 4 +---
 htdocs/gcc-3.3/criteria.html | 4 +---
 htdocs/gcc-3.4/criteria.html | 4 +---
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/htdocs/gcc-3.0/criteria.html b/htdocs/gcc-3.0/criteria.html
index 146bf056..57e7e897 100644
--- a/htdocs/gcc-3.0/criteria.html
+++ b/htdocs/gcc-3.0/criteria.html
@@ -229,9 +229,7 @@ different programming languages.
 http://www.kernel.org";>Linux kernel
 C
 2.4.0
-ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.0.tar.gz";>
-linux-2.4.0.tar.gz
+linux-2.4.0.tar.gz
 
 http://www.gnu.org/software/emacs/";>GNU Emacs
 C
diff --git a/htdocs/gcc-3.1/criteria.html b/htdocs/gcc-3.1/criteria.html
index 078303be..13d5a7bd 100644
--- a/htdocs/gcc-3.1/criteria.html
+++ b/htdocs/gcc-3.1/criteria.html
@@ -245,9 +245,7 @@ shown here are used for GCC 3.1 integration testing.
 https://www.kernel.org";>Linux kernel
 C
 2.4.18
-ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.18.tar.bz2";>
-linux-2.4.18.tar.gz
+linux-2.4.18.tar.gz
  
 
 http://www.osl.iu.edu/research/mtl/";>MTL
diff --git a/htdocs/gcc-3.3/criteria.html b/htdocs/gcc-3.3/criteria.html
index ceb21ac7..ec32b28d 100644
--- a/htdocs/gcc-3.3/criteria.html
+++ b/htdocs/gcc-3.3/criteria.html
@@ -249,9 +249,7 @@ shown here are used for GCC 3.3 integration testing.
 http://www.kernel.org";>Linux kernel
 C
 2.4.18
-ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.18.tar.bz2";>
-linux-2.4.18.tar.gz
+linux-2.4.18.tar.gz
  
 
 http://www.osl.iu.edu/research/mtl/";>MTL
diff --git a/htdocs/gcc-3.4/criteria.html b/htdocs/gcc-3.4/criteria.html
index e3d84320..8860aa36 100644
--- a/htdocs/gcc-3.4/criteria.html
+++ b/htdocs/gcc-3.4/criteria.html
@@ -249,9 +249,7 @@ shown here are used for GCC 3.4 integration testing.
 http://www.kernel.org";>Linux kernel
 C
 2.4.18
-ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.18.tar.bz2";>
-linux-2.4.18.tar.gz
+linux-2.4.18.tar.gz
  
 
 http://www.osl.iu.edu/research/mtl/";>MTL
-- 
2.45.2

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Andre Vehreschild

Hi Richard,

would that be sufficient? Bootstrap is still running for me...

From c30c2cf829a094ba5e4c2c31333bed6e8c0d32af Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 11:21:04 +0200
Subject: [PATCH] [Fortran] Fix bootstrap broken by gcc-15-1965-ge4f2f46e015

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_array_parameter): Init variable to
NULL_TREE to fix bootstrap.
---
 gcc/fortran/trans-array.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 0fffa07495c..5558ab69969 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -8750,7 +8750,7 @@ gfc_conv_array_parameter (gfc_se *se, gfc_expr *expr, 
bool g77,
   tree stmt;
   tree parent = DECL_CONTEXT (current_function_decl);
   tree ctree;
-  tree pack_attr;
+  tree pack_attr = NULL_TREE; /* Set when packing class arrays.  */
   bool full_array_var;
   bool this_array_result;
   bool contiguous;
-- 
2.45.2

Sorry for the breakage.

Regards,
Andre

On Thu, 11 Jul 2024 11:06:47 +0200
Richard Biener  wrote:

> On Thu, Jul 11, 2024 at 11:04 AM Andre Vehreschild 
> wrote:
> >
> > Hi Richard,
> >
> > I am sorry to hear that. Shall I revert?  
> 
> I would suggest to instead fix by initializing the variable with NULL
> (and a comment).
> 
> > - Andre
> > On Thu, 11 Jul 2024 10:57:48 +0200
> > Richard Biener  wrote:
> >  
> > > On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
> > >  wrote:  
> > > >
> > > > On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild
> > > >  wrote:  
> > > > >
> > > > > Hi Harald,
> > > > >
> > > > > thank you very much for ok'ing this large patch. Merged as
> > > > > gcc-15-1965-ge4f2f46e015
> > > > >
> > > > > Looking forward to get (no) bug reports ;-)  
> > > >
> > > > This seems to break bootstrap with
> > > >
> > > > ../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
> > > > ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
> > > >  9135 |   tmp = build_call_expr_loc (input_location,
> > > >  9136 |                              gfor_fndecl_in_unpack_class, 4, tmp,
> > > >  9137 |                              packedptr,
> > > >  9138 |                              size_in_bytes (TREE_TYPE (ctree)),
> > > >  9139 |                              pack_attr);
> > > > ../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
> > > >  8665 |   tree pack_attr;
> > > > cc1plus: all warnings being treated as errors
> > > > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> > >
> > > > It seems to be a false positive but GCC's little mind is too weak
> > > > to prove that (yes, we err on the side of emitting a diagnostic
> > > > if we can't prove it's initialized)
> > >
> > > Richard.
> > >  
> > > >  
> > > > > Thanks again,
> > > > >
> > > > > Andre
> > > > >
> > > > > On Wed, 10 Jul 2024 20:52:37 +0200
> > > > > Harald Anlauf  wrote:
> > > > >  
> > > > > > Hi Andre,
> > > > > >
> > > > > > Am 10.07.24 um 10:45 schrieb Andre Vehreschild:  
> > > > > > > Hi Harald,
> > > > > > >
> > > > > > > thanks for the review. I totally agree, that this patch
> > > > > > > has gotten bigger than I expected (and wanted). But
> > > > > > > things are as they are.
> > > > > > >
> > > > > > > About the coding style: I have worked in so many projects,
> > > > > > > that I consider a consistent coding style luxury. I esp.
> > > > > > > > do not have my own one anymore. The formatting you are
> > > > > > > seeing in my patches is the result of clang-format with
> > > > > > > the provided parameter file in contrib/clang-format. I
> > > > > > > was happy to have a tool to do the formatting, that I
> > > > > > > could integrate into my IDE, because previously it was
> > > > > > > hard to mimic the GNU style. I try to get to the GNU
> > > > > > > style as good as possible, where I consider clang-format
> > > > > > > doing garbage.
> > > > > > >
> > > > > > > I see that clang-format has a "very specific opinion" on
> > > > > > > how to format the lines you mentioned, but it will
> > > > > > > "correct" them any time I change them and touch them
> > > > > > > later. I now have forbidden clang-format to touch the
> > > > > > > code lines, but this means to add formatter specific
> > > > > > > comments. Is this ok?  
> > > > > >
> > > > > > yes, this is much better now!  Thanks.
> > > > > >
> > > > > > (I en

[pushed] wwwdocs: news: Update Google Summer of Code 2014 project link

2024-07-11 Thread Gerald Pfeifer

Quite a different link - kudos to Google for the proper redirect alerting 
of the change _and_ providing the new URL.

Pushed.

Gerald
---
 htdocs/news.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index f13a8249..5e782349 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -328,7 +328,7 @@
 GCC Google Summer of Code 2014
 [2014-02-24] wwwdocs:
 GCC has been accepted as a
-http://www.google-melange.com/gsoc/org2/google/gsoc2014/gcc";>Google 
Summer of Code 2014 project.
+https://www.google-melange.com/archive/gsoc/2014/orgs/gcc";>Google 
Summer of Code 2014 project.
 Students, mentors and project ideas welcome!
 
 Intel AVX-512 support
-- 
2.45.2

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 10:53 AM Feng Xue OS
 wrote:
>
> Vector stmts number of an operation is calculated based on output vectype.
> This is over-estimated for lane-reducing operation. Sometimes, to work around
> the issue, we have to rely on additional logic to deduce an exactly accurate
> number by other means. Aiming at the inconvenience, in this patch, we would
> "turn" lane-reducing operation into a normal one by inserting new trivial
> statements like zero-valued PHIs and pass-through copies, which could be
> optimized away by later passes. At the same time, a new field is added for
> slp node to hold number of vector stmts that are really effective after
> vectorization. For example:

Adding Richard into the loop.

I'm sorry, but this feels a bit backwards - in the end I was hoping that we
can get rid of SLP_TREE_NUMBER_OF_VEC_STMTS completely.
We do currently have the odd ncopies (non-SLP) vs. vec_num (SLP)
duality but in reality all vectorizable_* should know the number of
stmt copies (or output vector defs) to produce by looking at the vector
type and the vectorization factor (and in the SLP case the number of
lanes represented by the node).

That means that in the end vectorizable_* could at transform time
simply make sure that SLP_TREE_VEC_DEF is appropriately
created (currently generic code does this based on
SLP_TREE_NUMBER_OF_VEC_STMTS and also generic code
tries to determine SLP_TREE_NUMBER_OF_VEC_STMTS).

There are a lot (well, not too many) uses of SLP_TREE_NUMBER_OF_VEC_STMTS
that short-cut "appropriate" vec_num computation based on
VF, vector type and lanes.  I hope all of them could vanish.

You add vect_get_[slp_]num_vectors, but I'd rather see a single

inline unsigned int
vect_get_num_copies (vec_info *vinfo, tree vectype, slp_tree node = nullptr)
{
  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
    {
      if (node)
        return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
                                     * SLP_TREE_LANES (node), vectype);
      else
        return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                                     vectype);
    }
  else
    return vect_get_num_vectors (SLP_TREE_LANES (node), vectype);
}

so that

  if (slp_node)
    {
      ncopies = 1;
      vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
    }
  else
    {
      ncopies = vect_get_num_copies (loop_vinfo, vectype);
      vec_num = 1;
    }

can become

  if (slp_node)
    {
      ncopies = 1;
      vec_num = vect_get_num_copies (loop_vinfo, vectype, slp_node);
    }
  else
    {
      ncopies = vect_get_num_copies (loop_vinfo, vectype/*, slp_node */);
      vec_num = 1;
    }

without actually resolving the ncopies/vec_num duality (that will solve itself
when we have achieved only-SLP).

I think if SLP_TREE_NUMBER_OF_VEC_STMTS is gone then having the
few places you needed to fix compute the correct number of stmts should be OK.
Note this will probably require moving the allocation of SLP_TREE_VEC_DEFS
to the code doing the transform (and remove/adjust some sanity checking we do).
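
To make the arithmetic behind the proposed helper concrete, here is a
minimal standalone sketch.  It is only a model: the function names and the
use of plain unsigned integers are assumptions made for illustration,
whereas the real code operates on poly_uint64 values and tree vector types.

```c
#include <assert.h>

/* Toy stand-in for vect_get_num_vectors: how many vectors with
   NUNITS_PER_VECTOR lanes are needed to hold NUNITS scalars.
   (Hypothetical name; plain integers are an assumption.)  */
static unsigned
model_num_vectors (unsigned nunits, unsigned nunits_per_vector)
{
  assert (nunits_per_vector != 0 && nunits % nunits_per_vector == 0);
  return nunits / nunits_per_vector;
}

/* Toy stand-in for the proposed vect_get_num_copies.  VF is the loop
   vectorization factor, or 0 to model basic-block vectorization.
   LANES is the number of SLP lanes, or 0 to model the non-SLP path.  */
static unsigned
model_num_copies (unsigned vf, unsigned lanes, unsigned nunits_per_vector)
{
  if (vf != 0)
    return model_num_vectors (lanes ? vf * lanes : vf, nunits_per_vector);
  return model_num_vectors (lanes, nunits_per_vector);
}
```

For the quoted dot-product example (VF 16, 128-bit vectors), the int
result type has 4 lanes per vector, so the model yields 16 / 4 = 4 copies.
Computed on the input vector type instead (16 lanes, assuming 8-bit inputs
as the 16-element vectors imply), it yields 1.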

Thanks,
Richard.

>   int sum = 1;
>   for (i)
> {
>   sum += d0[i] * d1[i];  // dot-prod 
> }
>
>   The vector size is 128-bit, vectorization factor is 16.  Reduction
>   statements would be transformed as:
>
>   vector<4> int sum_v0 = { 0, 0, 0, 1 };
>   vector<4> int sum_v1 = { 0, 0, 0, 0 };
>   vector<4> int sum_v2 = { 0, 0, 0, 0 };
>   vector<4> int sum_v3 = { 0, 0, 0, 0 };
>
>   for (i / 16)
> {
>   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
>   sum_v1 = sum_v1;  // copy
>   sum_v2 = sum_v2;  // copy
>   sum_v3 = sum_v3;  // copy
> }
>
>   sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
>
> Thanks,
> Feng
>
> ---
> gcc/
> * tree-vectorizer.h (vec_stmts_effec_size): New field in _slp_tree.
> (SLP_TREE_VEC_STMTS_EFFEC_NUM): New macro.
> (vect_get_num_vectors): New overload function.
> (vect_get_slp_num_vectors): New function.
> * tree-vect-loop.cc (vect_reduction_update_partial_vector_usage): Use
> effective vector stmts number.
> (vectorizable_reduction): Compute number of effective vector stmts for
> lane-reducing op and reduction PHI.
> (vect_transform_reduction): Insert copies for lane-reducing so as to
> fix inaccurate vector stmts number.
> (vect_transform_cycle_phi): Only need to calculate vector PHI number
> based on input vectype for non-slp code path.
> * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize effective vector
> stmts number to zero.
> (vect_slp_analyze_node_operations_1): Remove adjustment on vector
> stmts number specific to slp reduction.
> (vect_slp_analyze_node_operations): Compute number of vector elements
> for constant/external slp node with vect_get_slp_num_vectors.
> ---
>  gcc/tree-vect-loop.cc | 139 --
>  gcc/tree-vect-slp.cc  |  56 ++-

RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-11 Thread Prathamesh Kulkarni



> -Original Message-
> From: Harald Anlauf 
> Sent: Thursday, July 11, 2024 12:53 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> 
> Hi Prathamesh,
> 
> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> > Hi,
> > The attached patch lowers zeroing array assignment to memset for
> allocatable arrays.
> >
> > For example:
> > subroutine test(z, n)
> >  implicit none
> >  integer :: n
> >  real(4), allocatable :: z(:,:,:)
> >
> >  allocate(z(n, 8192, 2048))
> >  z = 0
> > end subroutine
> >
> > results in following call to memset instead of 3 nested loops for z
> = 0:
> >  (void) __builtin_memset ((void *) z->data, 0, (unsigned long)
> > ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1) *
> > (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1)) *
> > (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >
> > The patch significantly improves speedup for an internal Fortran
> application on AArch64 -mcpu=grace (and potentially on other AArch64
> cores too).
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the patch look OK to commit ?
> 
> no, it is NOT ok.
> 
> Consider:
> 
> subroutine test0 (n, z)
>implicit none
>integer :: n
>real, pointer :: z(:,:,:) ! need not be contiguous!
>z = 0
> end subroutine
> 
> After your patch this also generates a memset, but this cannot be true
> in general.  One would need to have a test on contiguity of the array
> before memset can be used.
> 
> In principle this is a nice idea, and IIRC there exists a very old PR
> on this (by Thomas König?).  So it might be worth pursuing.
Hi Harald,
Thanks for the suggestions!
The attached patch checks gfc_is_simply_contiguous (expr, true, false) before
lowering to memset, which avoids generating memset for your example above.
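
To illustrate why the contiguity check matters, here is a rough C sketch.
The descriptor struct and function name are hypothetical and much simpler
than gfortran's real array descriptor.  The patch makes its decision at
compile time, whereas this sketch tests the stride at run time purely to
make the two cases visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified rank-1 array descriptor.  gfortran's real
   descriptor has more fields and one triple per dimension.  */
struct toy_desc
{
  float *data;
  ptrdiff_t stride;   /* distance between elements, in elements */
  size_t extent;      /* number of elements */
};

/* Zero the elements described by D.  memset is only valid when the
   elements are adjacent in memory; otherwise fall back to a loop.
   Returns 1 if the memset path was taken, 0 otherwise.  */
static int
toy_zero_assign (struct toy_desc *d)
{
  if (d->stride == 1)
    {
      memset (d->data, 0, d->extent * sizeof (float));
      return 1;
    }
  for (size_t i = 0; i < d->extent; i++)
    d->data[(ptrdiff_t) i * d->stride] = 0.0f;
  return 0;
}
```

With a stride of 2, a memset over the same byte range would also clobber
the elements in between, which is the problem with the pointer example.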

Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
> 
> Thanks,
> Harald
> 
> 
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh

Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 477c2720187..f9a7f70b2a3 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11515,18 +11515,24 @@ gfc_trans_zero_assign (gfc_expr * expr)
   type = TREE_TYPE (dest);
   if (POINTER_TYPE_P (type))
 type = TREE_TYPE (type);
-  if (!GFC_ARRAY_TYPE_P (type))
-return NULL_TREE;
-
-  /* Determine the length of the array.  */
-  len = GFC_TYPE_ARRAY_SIZE (type);
-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (GFC_ARRAY_TYPE_P (type))
+{
+  /* Determine the length of the array.  */
+  len = GFC_TYPE_ARRAY_SIZE (type);
+  if (!len || TREE_CODE (len) != INTEGER_CST)
+   return NULL_TREE;
+}
+  else if (GFC_DESCRIPTOR_TYPE_P (type)
+ && gfc_is_simply_contiguous (expr, true, false))
+{
+  if (POINTER_TYPE_P (TREE_TYPE (dest)))
+   dest = build_fold_indirect_ref_loc (input_location, dest);
+  len = gfc_conv_descriptor_size (dest, GFC_TYPE_ARRAY_RANK (type));
+  dest = gfc_conv_descriptor_data_get (dest);
+}
+  else
 return NULL_TREE;
 
-  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
-  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, len,
-fold_convert (gfc_array_index_type, tmp));
-
   /* If we are zeroing a local array avoid taking its address by emitting
  a = {} instead.  */
   if (!POINTER_TYPE_P (TREE_TYPE (dest)))
@@ -11534,6 +11540,11 @@ gfc_trans_zero_assign (gfc_expr * expr)
   dest, build_constructor (TREE_TYPE (dest),
  NULL));
 
+  /* Multiply len by element size.  */
+  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
+  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
+len, fold_convert (gfc_array_index_type, tmp));
+
   /* Convert arguments to the correct types.  */
   dest = fold_convert (pvoid_type_node, dest);
   len = fold_convert (size_type_node, len);
diff --git a/gcc/testsuite/gfortran.dg/array_memset_3.f90 
b/gcc/testsuite/gfortran.dg/array_memset_3.f90
new file mode 100644
index 000..753006f7a91
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/array_memset_3.f90
@@ -0,0 +1,45 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-original" }
+
+subroutine test1(n)
+  implicit none
+integer(8) :: n
+real(4), allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = 0
+end subroutine
+
+s

[PATCH v2 2/2] arm: [MVE intrinsics] Improve vdupq_n implementation

2024-07-11 Thread Christophe Lyon

This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup pattern into
@mve_vdupq_n, and removes the now useless
@mve_q_n_f and @mve_q_n_ ones.

As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_q_m_n_ and
@mve_q_m_n_f.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0

or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:

* The signed versions of vdupq_* tests lack a version with an
immediate argument.  This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
new tests this patch introduces.  Note there's no need for s8/u8 tests
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3

Changes v1->v2:
* Dropped change to cost computation for Neon, and associated
  testcases updates (crypto-vsha1*)
* Updated expected regexp in vdupq_n_[su]16-2.c to account for
  different assembly comments (none for arm-none-eabi, '@ movhi' for
  arm-linux-gnueabihf)

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for CONST_DOUBLE.  Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup): Rename into ...
(@mve_vdup_n): ... this.
(@mve_q_n_f): Delete.
(@mve_q_n_): Delete.
(@mve_q_m_n_): Update mve_unpredicated_insn
attribute.
(@mve_q_m_n_f): Likewise.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrin

[PATCH v2 1/2] arm: [MVE intrinsics] fix vdup iterator

2024-07-11 Thread Christophe Lyon

This patch fixes a bug where the mode iterator for mve_vdup
should be MVE_VLD_ST instead of MVE_vecs: V2DI and V2DF (thus vdup.64)
are not supported by MVE.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vdup): Fix mode iterator.
---
 gcc/config/arm/mve.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 4b4d6298ffb..afe5fba698c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -95,8 +95,8 @@ (define_insn "mve_mov"
(set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
 
 (define_insn "mve_vdup"
-  [(set (match_operand:MVE_vecs 0 "s_register_operand" "=w")
-   (vec_duplicate:MVE_vecs
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
+   (vec_duplicate:MVE_VLD_ST
  (match_operand: 1 "s_register_operand" "r")))]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
   "vdup.\t%q0, %1"
-- 
2.34.1

Re: [PATCH v2 1/3] c++: Introduce USING_DECLs for non-function usings [PR114683]

2024-07-11 Thread Nathaniel Shead

On Tue, Jul 09, 2024 at 05:43:59PM -0400, Jason Merrill wrote:
> On 7/9/24 9:44 AM, Nathaniel Shead wrote:
> > On Mon, Jul 08, 2024 at 12:26:41PM -0400, Jason Merrill wrote:
> > > For a using-decl in the same scope as the original decl, won't this
> > > replace it so only the using-decl is visible to lookup?  I had expected
> > > to omit the USING_DECL in that case.
> > 
> > Yup it will; I think I'd originally done that so that the more recent
> > (re-)declaration would be the one referred to by diagnostics, but in
> > retrospect that seems unhelpful; fixed.  (Though need to keep the
> > replacement for CONST_DECLs, because the modules handling otherwise only
> > handles them in the context of their containing enumeration type, which
> > isn't what we want here; I've added a new test for this as well.)
> 
> Ah, using-25, sure.  I would think we could still tell the difference by
> comparing PURVIEW/EXPORT on the CONST_DECL to those of its type?
> 
> Or perhaps have add_binding_entity skip implicitly inserted enumerators, and
> instead insert them again when reading the enum, which should also save a
> bit of space.
> 
> Jason
> 

So maybe something like the following?

Bootstrapped and regtested on x86_64-pc-linux-gnu, can be applied either
incrementally on the previous patch or separately as you prefer.

-- >8 --

Subject: [PATCH] c++/modules: Avoid unnecessary wrapping for CONST_DECLs

Enumerators are only written when writing the type definition (at which
point they are all written); this will happen regardless of scoped vs
unscoped or whether the enum is explicitly exported.  In all other cases
where an enumerator needs to be written (e.g. template parameters), it
is just a backreference to the type decl and the name of the value.

'add_binding_entity' needs to explicitly write the names of unscoped
enumerators so that lazy loading will trigger when the name is found by
name lookup; it does this by pretending that the enum declarations are
always usings so that it doesn't double-write definitions.  By also
checking if the enumerator was marked purview/exported we can use that
to override a non-purview/non-exported TYPE_DECL and ensure it's made
visible regardless.

When reading we should get the exported flag on the enumeration
constant, and so should properly create a binding for it.  We don't need
to do anything to handle importedness as that checking is skipped for
EK_USINGs.

Some other places assume that module information for a CONST_DECL
inherits module information from its containing type.  This includes:

- get_originating_module_decl, for determining if the name was imported
  or has module attachment; I don't /think/ this change should affect
  that, so I'm leaving this untouched.

- binding_cmp, for sorting by exportedness; since now an enumerator
  could be exported without the containing decl being exported, we need
  to handle this here too.

With all this in mind, we can avoid creating a new USING_DECL for a
same-scope using that reveals a CONST_DECL by ensuring that we
special-case CONST_DECLs with purview/exported flags appropriately.

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Handle
CONST_DECLs with different purview/exported from their enum.
(binding_cmp): Likewise.
(set_instantiating_module): Support CONST_DECLs.
* name-lookup.cc (do_nonmember_using_decl): Don't special-case
CONST_DECLs.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  | 19 +--
 gcc/cp/name-lookup.cc |  6 ++
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7187d251d1d..d385b422168 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13129,7 +13129,10 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags flags, void *data_)
   tree inner = decl;

   if (TREE_CODE (inner) == CONST_DECL
- && TREE_CODE (DECL_CONTEXT (inner)) == ENUMERAL_TYPE)
+ && TREE_CODE (DECL_CONTEXT (inner)) == ENUMERAL_TYPE
+ /* A using-decl could make a CONST_DECL purview for a non-purview
+enumeration.  */
+ && (!DECL_LANG_SPECIFIC (inner) || !DECL_MODULE_PURVIEW_P (inner)))
inner = TYPE_NAME (DECL_CONTEXT (inner));
   else if (TREE_CODE (inner) == TEMPLATE_DECL)
inner = DECL_TEMPLATE_RESULT (inner);
@@ -13164,7 +13167,10 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags flags, void *data_)
  gcc_checking_assert (TREE_CODE (decl) == CONST_DECL);

  flags = WMB_Flags (flags | WMB_Using);
- if (DECL_MODULE_EXPORT_P (TYPE_NAME (TREE_TYPE (decl))))
+ if (DECL_MODULE_EXPORT_P (TYPE_NAME (TREE_TYPE (decl)))
+ /* A using-decl can make an enum constant exported for a
+non-exported enumeration.  */
+ || (DECL_LANG_SPECIFIC (decl) && DECL_MODULE_EXPORT_P (decl)))
flags = WMB_Flags (flags | WMB_Export);
}

Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread juzhe.zh...@rivai.ai


LGTM


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-11 16:29
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark
From: Pan Li 
 
This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:
 
Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }
 
DEF_VEC_SAT_U_SUB_ZIP(uint16_t, uint32_t)
 
vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli       a4,zero,e32,m1,ta,ma
  vmv.v.x       v6,a1
  vsetvli       zero,zero,e16,mf2,ta,ma
  vid.v         v2
  li            a4,-1
  vnclipu.wi    v6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v       v3,0(a3)
  vrsub.vx      v5,v2,a6
  mv            a7,a4
  addw          a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv     v1,v1,v6  // .SAT_SUB
  vrgather.vv   v3,v1,v5
  vse16.v       v3,0(a3)
  sub           a3,a3,t1
  bgtu          t4,a4,.L3
 
Passed the rv64gcv tests.
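
As a scalar reference for what the vectorized loop computes, the macro
body can be written out by hand for one type pair.  This is only a sketch:
the function name is made up for illustration, and the uint16_t/uint32_t
pair is chosen to match the assembly label above.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-expanded zip form with T1 = uint16_t and T2 = uint32_t
   (hypothetical name).  X must point one past the last element to
   process, because the loop pre-decrements; LIMIT must be at least 1.  */
static void
sat_u_sub_zip_u16_u32 (uint16_t *x, uint32_t b, unsigned limit)
{
  uint32_t a;
  uint16_t *p = x;
  do
    {
      a = *--p;
      /* Saturating unsigned subtract: clamp at 0 instead of wrapping.  */
      *p = (uint16_t) (a >= b ? a - b : 0);
    }
  while (--limit);
}
```

Because the array is walked backwards from its end, the generated code
reverses each loaded vector with vrgather before and after the saturating
subtract.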
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
.../rvv/autovec/binop/vec_sat_binary_vx.h | 22 +
.../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++
.../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 
.../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 +
5 files changed, 155 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c..416a1e49a47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
}
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
#define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
#define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zip(x, b, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP_WRAP(T1, T2, x, b, N) \
+  RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+
/**/
/* Saturation Sub Truncated (Unsigned and Signed) */
/**/
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
new file mode 100644
index 000..d238c6392de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
@@ -0,0

RE: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread Li, Pan2

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, July 11, 2024 6:32 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip 
benchmark


LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-07-11 16:29
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark
From: Pan Li <pan2...@intel.com>

This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:

Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }

DEF_VEC_SAT_U_SUB_ZIP(uint16_t, uint32_t)

vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli       a4,zero,e32,m1,ta,ma
  vmv.v.x       v6,a1
  vsetvli       zero,zero,e16,mf2,ta,ma
  vid.v         v2
  li            a4,-1
  vnclipu.wi    v6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v       v3,0(a3)
  vrsub.vx      v5,v2,a6
  mv            a7,a4
  addw          a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv     v1,v1,v6  // .SAT_SUB
  vrgather.vv   v3,v1,v5
  vse16.v       v3,0(a3)
  sub           a3,a3,t1
  bgtu          t4,a4,.L3

Passed the rv64gcv tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
.../rvv/autovec/binop/vec_sat_binary_vx.h | 22 +
.../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++
.../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 
.../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 +
5 files changed, 155 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c..416a1e49a47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
}
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
#define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
#define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zip(x, b, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP_WRAP(T1, T2, x, b, N) \
+  RUN_VEC_SAT_U_SUB_FMT_Z

[patch,avr,applied] Tidy up subtract + zero_extend insns

2024-07-11 Thread Georg-Johann Lay


There are currently five insns and five splits that handle
subtraction where the subtrahend is zero-extended to the mode
of the minuend.

This patch represents them as one insn (and one split) using
mode iterators.

Applied as obvious.

Johann

--

AVR: Tidy up subtract-and-zero_extend insns.

There are several insns that perform a subtraction where
the subtrahend is zero-extended to the mode of the minuend.
This patch uses one insn (and split) with mode iterators
instead of spelling out each variant individually.
This has the additional benefit that u32 - u24 is also supported,
which previously wasn't.
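
The byte-wise scheme can be modelled in plain C as below.  This is only an
illustrative sketch, not the code from the patch: the function name is
hypothetical, and registers are modelled as little-endian byte arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise model of  DST = DST - zero_extend (SRC)  on little-endian
   byte arrays, mirroring a SUB on byte 0 followed by SBC on the
   remaining bytes.  Bytes of SRC at index N_SRC and above are read
   as 0, which models using the zero register as the SBC operand.  */
static void
model_sub_zero_extend (uint8_t *dst, int n_dst, const uint8_t *src, int n_src)
{
  unsigned borrow = 0;
  for (int i = 0; i < n_dst; i++)
    {
      unsigned subtrahend = (i < n_src ? src[i] : 0u) + borrow;
      borrow = dst[i] < subtrahend;
      dst[i] = (uint8_t) (dst[i] - subtrahend);
    }
}
```

The loop runs once per byte of the minuend, which corresponds to the
statement that the instruction count equals the mode size of the
destination.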

gcc/
* config/avr/avr-protos.h (avr_out_minus): New prototype.
* config/avr/avr.cc (avr_out_minus): New function.
* config/avr/avr.md (*sub3.zero_extend.)
(*sub3.zero_extend._split): New insns.
(*subpsi3_zero_extend.qi_split): Remove insn_and_split.
(*subpsi3_zero_extend.hi_split): Remove insn_and_split.
(*subhi3_zero_extend1_split): Remove insn_and_split.
(*subsi3_zero_extend_split): Remove insn_and_split.
(*subsi3_zero_extend.hi_split): Remove insn_and_split.
(*subpsi3_zero_extend.qi): Remove insn.
(*subpsi3_zero_extend.hi): Remove insn.
(*subhi3_zero_extend1): Remove insn.
(*subsi3_zero_extend): Remove insn.
(*subsi3_zero_extend.hi): Remove insn.
gcc/testsuite/
* gcc.target/avr/torture/sub-zerox.c: New test.

diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index dc23cfbf461..6e02161759c 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -95,6 +95,7 @@ extern void avr_output_addr_vec (rtx_insn*, rtx);
 extern const char *avr_out_sbxx_branch (rtx_insn *insn, rtx operands[]);
 extern const char* avr_out_bitop (rtx, rtx*, int*);
 extern const char* avr_out_plus (rtx, rtx*, int* =NULL, bool =true);
+extern const char* avr_out_minus (rtx*);
 extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index d299fceb782..4a7cbd0e7bc 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -8843,6 +8843,36 @@ lshrsi3_out (rtx_insn *insn, rtx operands[], int *len)
 }
 
 
+/* Output subtraction of integer registers XOP[0] and XOP[2] and return ""
+
+  XOP[0] = XOP[0] - XOP[2]
+
+   where the mode of XOP[0] is in { HI, PSI, SI }, and the mode of
+   XOP[2] is in { QI, HI, PSI }.  When the mode of XOP[0] is larger
+   than the mode of XOP[2], then the latter is zero-extended on the fly.
+   The number of instructions will be the mode size of XOP[0].  */
+
+const char *
+avr_out_minus (rtx *xop)
+{
+  int n_bytes0 = GET_MODE_SIZE (GET_MODE (xop[0]));
+  int n_bytes2 = GET_MODE_SIZE (GET_MODE (xop[2]));
+
+  output_asm_insn ("sub %0,%2", xop);
+
+  for (int i = 1; i < n_bytes0; ++i)
+{
+  rtx op[2];
+  op[0] = all_regs_rtx[i + REGNO (xop[0])];
+  op[1] = (i < n_bytes2) ? all_regs_rtx[i + REGNO (xop[2])] : zero_reg_rtx;
+
+  output_asm_insn ("sbc %0,%1", op);
+}
+
+  return "";
+}
+
+
 /* Output addition of register XOP[0] and compile time constant XOP[2].
INSN is a single_set insn or an insn pattern.
CODE == PLUS:  perform addition by using ADD instructions or
@@ -12717,7 +12747,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
 	  *total = COSTS_N_INSNS (2);
 	  return true;
 	}
-  // *sub3_zero_extend1
+  // *sub3.zero_extend.
   if (REG_P (XEXP (x, 0))
 	  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND)
 	{
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 2783b8c986f..8c3e55a91ee 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -2030,47 +2030,6 @@ (define_insn "*subpsi3"
   "sub %0,%2\;sbc %B0,%B2\;sbc %C0,%C2"
   [(set_attr "length" "3")])
 
-(define_insn_and_split "*subpsi3_zero_extend.qi_split"
-  [(set (match_operand:PSI 0 "register_operand"   "=r")
-(minus:PSI (match_operand:SI 1 "register_operand"  "0")
-   (zero_extend:PSI (match_operand:QI 2 "register_operand" "r"]
-  ""
-  "#"
-  "&& reload_completed"
-  [(parallel [(set (match_dup 0)
-   (minus:PSI (match_dup 1)
-  (zero_extend:PSI (match_dup 2
-  (clobber (reg:CC REG_CC))])])
-
-(define_insn "*subpsi3_zero_extend.qi"
-  [(set (match_operand:PSI 0 "register_operand"   "=r")
-(minus:PSI (match_operand:SI 1 "register_operand"  "0")
-   (zero_extend:PSI (match_operand:QI 2 "register_operand" "r"
-   (clobber (reg:CC REG_CC))]
-  "reload_completed"
-  "sub %A0,%2\;sbc %B0,__zero_reg__\;sbc %C0,__zero_reg__"
-  [(set_attr "length" "3")])
-
-(define_insn_and_split "*subpsi3_zero_extend.hi_split"
-  [(set (match_operand:PSI 0 "register

[PATCH V2] rs6000: Don't pass -many to the assembler [PR112868]

2024-07-11 Thread jeevitha

Hi All,

The following patch has been bootstrapped and regtested with default
configuration [--enable-checking=yes] and with --enable-checking=release on
powerpc64le-linux.

This patch removes passing the -many assembler option for release builds. Now,
GCC no longer passes -many under any conditions to the assembler.

This patch exposes the issue with target_powerpc_ppu_ok, which makes a few
test cases unsupported. Those changes will be in another patch.

2024-07-11  Jeevitha Palanisamy  

gcc/
PR target/112868
* config/rs6000/rs6000.h (ASM_OPT_ANY): Removed Define.
(ASM_CPU_SPEC): Remove ASM_OPT_ANY usage.

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 9211f91740a..a5bd8e461a0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -94,12 +94,6 @@
   "%{mdejagnu-*: %

Re: [PATCH V2] rs6000: Don't pass -many to the assembler [PR112868]

2024-07-11 Thread Sam James

jeevitha  writes:

> Hi All,
>
> The following patch has been bootstrapped and regtested with default
> configuration [--enable-checking=yes] and with --enable-checking=release on
> powerpc64le-linux.
>
> This patch removes passing the -many assembler option for release builds. Now,
> GCC no longer passes -many under any conditions to the assembler.
>
> This patch exposes the issue with target_powerpc_ppu_ok, which makes a few
> test cases unsupported. Those changes will be in another patch.

For our part, I think we really need PR113652 fixed first, or it'll end up
regressing builds w/ -mcpu=7450. Other than that, we hit no issues in
our testing downstream in Gentoo.

>
> 2024-07-11  Jeevitha Palanisamy  
>
> gcc/
>   PR target/112868
>   * config/rs6000/rs6000.h (ASM_OPT_ANY): Removed Define.
>   (ASM_CPU_SPEC): Remove ASM_OPT_ANY usage.
>
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 9211f91740a..a5bd8e461a0 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -94,12 +94,6 @@
>"%{mdejagnu-*: % SUBTARGET_DRIVER_SELF_SPECS
>  
> -#if CHECKING_P
> -#define ASM_OPT_ANY ""
> -#else
> -#define ASM_OPT_ANY " -many"
> -#endif
> -
>  /* Common ASM definitions used by ASM_SPEC among the various targets for
> handling -mcpu=xxx switches.  There is a parallel list in driver-rs6000.cc to
> provide the default assembler options if the user uses -mcpu=native, so if
> @@ -166,8 +160,7 @@
>   mvsx: -mpower7; \
>   mpowerpc64: -mppc64;: %(asm_default)}; \
>:%eMissing -mcpu option in ASM_CPU_SPEC?\n} \
> -%{mvsx: -mvsx -maltivec; maltivec: -maltivec}" \
> -ASM_OPT_ANY
> +%{mvsx: -mvsx -maltivec; maltivec: -maltivec}"
>  
>  #define CPP_DEFAULT_SPEC ""
>  



RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-07-11 Thread Richard Biener

On Wed, 10 Jul 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, June 20, 2024 8:55 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing 
> > modes
> > known not to overflow. [PR114932]
> > 
> > On Wed, 19 Jun 2024, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Wednesday, June 19, 2024 1:14 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > bin.ch...@linux.alibaba.com
> > > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned addressing
> > modes
> > > > known not to overflow. [PR114932]
> > > >
> > > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > When the patch for PR114074 was applied we saw a good boost in
> > > > > exchange2.
> > > > >
> > > > > This boost was partially caused by a simplification of the addressing
> > > > > modes.  With the patch applied IV opts saw the following form for the
> > > > > base addressing;
> > > > >
> > > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) * 324) + 36)
> > > > >
> > > > > vs what we normally get:
> > > > >
> > > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D) * 81) + 9) * 4
> > > > >
> > > > > This is because the patch promoted multiplies where one operand is a
> > > > > constant from a signed multiply to an unsigned one, to attempt to fold
> > > > > away the constant.
> > > > >
> > > > > This patch attempts the same but due to the various problems with SCEV
> > > > > and niters not being able to analyze the resulting forms (i.e. PR114322)
> > > > > we can't do it during SCEV or in the general form like in fold-const
> > > > > like extract_muldiv attempts.
> > > > >
> > > > > Instead this applies the simplification during IVopts initialization
> > > > > when we create the IV.  Essentially when we know the IV won't overflow
> > > > > with regards to niters then we perform an affine fold which gets it to
> > > > > simplify the internal computation, even if this is signed because we
> > > > > know that for IVOPTs uses the IV won't ever overflow.  This allows IV
> > > > > opts to see the simplified form without influencing the rest of the
> > > > > compiler.
> > > > >
> > > > > as mentioned in PR114074 it would be good to fix the missed
> > > > > optimization in the other passes so we can perform this in general.
> > > > >
> > > > > The reason this has a big impact on fortran code is that fortran
> > > > > doesn't seem to have unsigned integer types.  As such all its
> > > > > addressing is created with signed types and folding does not happen on
> > > > > it due to the possible overflow.
> > > > >
> > > > > concretely on AArch64 this changes the results from generation:
> > > > >
> > > > > mov x27, -108
> > > > > mov x24, -72
> > > > > mov x23, -36
> > > > > add x21, x1, x0, lsl 2
> > > > > add x19, x20, x22
> > > > > .L5:
> > > > > add x0, x22, x19
> > > > > add x19, x19, 324
> > > > > ldr d1, [x0, x27]
> > > > > add v1.2s, v1.2s, v15.2s
> > > > > str d1, [x20, 216]
> > > > > ldr d0, [x0, x24]
> > > > > add v0.2s, v0.2s, v15.2s
> > > > > str d0, [x20, 252]
> > > > > ldr d31, [x0, x23]
> > > > > add v31.2s, v31.2s, v15.2s
> > > > > str d31, [x20, 288]
> > > > > bl  digits_20_
> > > > > cmp x21, x19
> > > > > bne .L5
> > > > >
> > > > > into:
> > > > >
> > > > > .L5:
> > > > > ldr d1, [x19, -108]
> > > > > add v1.2s, v1.2s, v15.2s
> > > > > str d1, [x20, 216]
> > > > > ldr d0, [x19, -72]
> > > > > add v0.2s, v0.2s, v15.2s
> > > > > str d0, [x20, 252]
> > > > > ldr d31, [x19, -36]
> > > > > add x19, x19, 324
> > > > > add v31.2s, v31.2s, v15.2s
> > > > > str d31, [x20, 288]
> > > > > bl  digits_20_
> > > > > cmp x21, x19
> > > > > bne .L5
> > > > >
> > > > > The two patches together results in a 10% performance increase in
> > > > > exchange2 in SPECCPU 2017 and a 4% reduction in binary size and a 5%
> > > > > improvement in compile time. There's also a 5% performance improvement
> > > > > in fotonik3d and similar reduction in binary size.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/

[PATCH] LoongArch: Add support to annotate tablejump

2024-07-11 Thread Xi Ruoyao

This is per the request from the kernel developers.  For generating the
ORC unwind info, the objtool program needs to analyze the control flow
of a .o file.  If a jump table is used, objtool has to correlate the
jump instruction with the table.

On x86 (where objtool was initially developed) it's simple: a relocation
entry naturally correlates them because one single instruction is used
for table-based jump.  But on a RISC machine objtool would have to
reconstruct the data flow if it must find out the correlation on its
own.

So, emit an additional section to store the correlation info as pairs of
addresses, each pair contains the address of a jump instruction (jr) and
the address of the jump table.  This is very trivial to implement in
GCC.
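
For illustration only, a consumer of such a section might decode it roughly
as below.  This is a sketch under assumptions that are not stated in the
patch: a 64-bit little-endian target, relocations already applied, and the
section contents handed over as a flat byte buffer.  All names are
hypothetical and this is not how objtool is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded record: the address of the jump instruction and the
// address of the jump table it dispatches through.
struct tablejump_pair_sketch
{
  uint64_t jump_insn_addr;
  uint64_t jump_table_addr;
};

// Read one 8-byte little-endian value (assumed encoding).
static uint64_t
read_le64_sketch (const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | p[i];
  return v;
}

// Split the section contents into (jump, table) address pairs.
std::vector<tablejump_pair_sketch>
decode_tablejump_annotate_sketch (const std::vector<uint8_t> &section)
{
  // Each record is two 8-byte addresses; anything else is treated as
  // malformed input in this sketch.
  assert (section.size () % 16 == 0);

  std::vector<tablejump_pair_sketch> pairs;
  for (std::size_t off = 0; off < section.size (); off += 16)
    pairs.push_back ({ read_le64_sketch (&section[off]),
                       read_le64_sketch (&section[off + 8]) });
  return pairs;
}
```

With the pairs in hand, a tool can look up the jump table for a given jr
instruction directly instead of reconstructing the data flow.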

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(mannotate-tablejump): New option.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.md (tablejump): Emit
additional correlation info between the jump instruction and the
jump table, if -mannotate-tablejump.
* doc/invoke.texi: Document -mannotate-tablejump.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/jump-table-annotate.c: New test.

Suggested-by: Tiezhu Yang 
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/genopts/loongarch.opt.in |  4 
 gcc/config/loongarch/loongarch.md | 12 +++-
 gcc/config/loongarch/loongarch.opt|  4 
 gcc/doc/invoke.texi   | 13 -
 .../gcc.target/loongarch/jump-table-annotate.c| 15 +++
 5 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/jump-table-annotate.c

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index d00950cb4f4..d5bbf01d85e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -301,3 +301,7 @@ default value is 4.
 ; CPUCFG independently, so we use bit flags to specify them.
 TargetVariable
 HOST_WIDE_INT la_isa_evolution = 0
+
+mannotate-tablejump
+Target Mask(ANNOTATE_TABLEJUMP) Save
+Annotate table jump instruction (jr {reg}) to correlate it with the jump table.
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index b3cae49832e..6d9fdc257f8 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3548,12 +3548,22 @@ (define_expand "tablejump"
   DONE;
 })
 
+(define_mode_attr mode_size [(DI "8") (SI "4")])
+
 (define_insn "@tablejump"
   [(set (pc)
(match_operand:P 0 "register_operand" "e"))
(use (label_ref (match_operand 1 "" "")))]
   ""
-  "jr\t%0"
+  {
+return TARGET_ANNOTATE_TABLEJUMP
+  ? "1:jr\t%0\n\t"
+   ".pushsection\t.discard.tablejump_annotate\n\t"
+   "\t.<mode_size>byte\t1b\n\t"
+   "\t.<mode_size>byte\t%1\n\t"
+   ".popsection"
+  : "jr\t%0";
+  }
   [(set_attr "type" "jump")
(set_attr "mode" "none")])
 
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 91cb5236ad8..6a396b539c4 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -310,6 +310,10 @@ default value is 4.
 TargetVariable
 HOST_WIDE_INT la_isa_evolution = 0
 
+mannotate-tablejump
+Target Mask(ANNOTATE_TABLEJUMP) Save
+Annotate table jump instruction (jr {reg}) to correlate it with the jump table
+
 mfrecipe
 Target Mask(ISA_FRECIPE) Var(la_isa_evolution)
 Support frecipe.{s/d} and frsqrte.{s/d} instructions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4d671c4f6d8..f27d2d6bb87 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1065,7 +1065,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as
 -mrecip  -mrecip=@var{opt} -mfrecipe -mno-frecipe -mdiv32 -mno-div32
 -mlam-bh -mno-lam-bh -mlamcas -mno-lamcas -mld-seq-sa -mno-ld-seq-sa
--mtls-dialect=@var{opt}}
+-mtls-dialect=@var{opt} -mannotate-tablejump -mno-annotate-tablejump}
 
 @emph{M32R/D Options}
 @gccoptlist{-m32r2  -m32rx  -m32r
@@ -27352,6 +27352,17 @@ Whether a load-load barrier (@code{dbar 0x700}) is needed.  When build with
 This option controls which tls dialect may be used for general dynamic and
 local dynamic TLS models.
 
+@opindex mannotate-tablejump
+@opindex mno-annotate-tablejump
+@item -mannotate-tablejump
+@itemx -mno-annotate-tablejump
+Create an annotation section @code{.discard.tablejump_annotate} to
+correlate the @code{jirl} instruction and the jump table when a jump
+table is used to optimize the @code{switch} statement.  Some external
+tools, for example @file{objtool} of the Linux kernel building system,
+need the annotation to analysis the control flow.  The default is
+@option{-mno-annotate-tablejump}.
+
 @table @samp
 @item trad
 Use traditional TLS. This

[PATCH] LoongArch: Implement scalar isinf, isnormal, and isfinite via fclass

2024-07-11 Thread Xi Ruoyao

Doing so can avoid loading FP constants from the memory.  It also
partially fixes PR 66462 as fclass does not signal on sNaN.
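
To make the FCLASS_MASK values used in the patch (68, 136, 952) easier to
read, here is a small software model of the classification in C++.  It is a
sketch only: the bit layout of the fclass result is my understanding of the
LoongArch manual and is an assumption here, and every name ending in
_sketch is hypothetical.  The model decodes the IEEE double bits by hand, so
no FP constant is needed and no FP operation is performed on the input.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed fclass result bits (one bit set per class).
constexpr unsigned FC_SNAN = 1u << 0, FC_QNAN = 1u << 1;
constexpr unsigned FC_NEG_INF = 1u << 2, FC_NEG_NORMAL = 1u << 3;
constexpr unsigned FC_NEG_SUBNORMAL = 1u << 4, FC_NEG_ZERO = 1u << 5;
constexpr unsigned FC_POS_INF = 1u << 6, FC_POS_NORMAL = 1u << 7;
constexpr unsigned FC_POS_SUBNORMAL = 1u << 8, FC_POS_ZERO = 1u << 9;

// The three masks from the patch, rebuilt from the class bits.
static_assert ((FC_NEG_INF | FC_POS_INF) == 68, "isinf mask");
static_assert ((FC_NEG_NORMAL | FC_POS_NORMAL) == 136, "isnormal mask");
static_assert ((FC_NEG_NORMAL | FC_NEG_SUBNORMAL | FC_NEG_ZERO
                | FC_POS_NORMAL | FC_POS_SUBNORMAL | FC_POS_ZERO) == 952,
               "isfinite mask");

// Helper to build a double from raw bits (handy for inf, NaN, subnormals).
double
double_from_bits_sketch (uint64_t bits)
{
  double d;
  std::memcpy (&d, &bits, sizeof d);
  return d;
}

// Software model of fclass.d: returns exactly one class bit.
unsigned
fclass_d_sketch (double x)
{
  uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  bool neg = (bits >> 63) != 0;
  unsigned exp = static_cast<unsigned> ((bits >> 52) & 0x7ff);
  uint64_t frac = bits & ((uint64_t (1) << 52) - 1);

  if (exp == 0x7ff)
    {
      if (frac == 0)
        return neg ? FC_NEG_INF : FC_POS_INF;
      // The top fraction bit distinguishes quiet from signaling NaNs.
      return (frac >> 51) ? FC_QNAN : FC_SNAN;
    }
  if (exp == 0)
    {
      if (frac == 0)
        return neg ? FC_NEG_ZERO : FC_POS_ZERO;
      return neg ? FC_NEG_SUBNORMAL : FC_POS_SUBNORMAL;
    }
  return neg ? FC_NEG_NORMAL : FC_POS_NORMAL;
}

// Same shape as the expander: classify, AND with the mask, compare with 0.
bool isinf_sketch (double x) { return (fclass_d_sketch (x) & 68u) != 0; }
bool isnormal_sketch (double x) { return (fclass_d_sketch (x) & 136u) != 0; }
bool isfinite_sketch (double x) { return (fclass_d_sketch (x) & 952u) != 0; }
```

The three predicates differ only in the mask, which is why a single
define_expand over an int iterator can cover all of them.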

gcc/ChangeLog:

* config/loongarch/loongarch.md (extendsidi2): Add ("=r", "f")
alternative and use movfr2gr.s for it.  The spec clearly states
movfr2gr.s sign extends the value to GRLEN.
(fclass_): Make the result SImode instead of a floating
mode.  The fclass results are really not FP values.
(FCLASS_MASK): New define_int_iterator.
(fclass_optab): New define_int_attr.
(): New define_expand
template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/fclass-compile.c: New test.
* gcc.target/loongarch/fclass-run.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  There are two
regressions, range-sincos.c and vrp-float-abs-1.c, but they should be
fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html.

There is a redundant "andi" in the code generation for the test case:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656764.html.

I suppose the fix for this redundant "andi" is to use word_mode instead
of SImode for operand 0, but that does not work as of now:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656772.html.

Ok for trunk (now, or after the fixes for range-sincos.c and
vrp-float-abs-1.c are committed)?  IMO the redundant "andi" can be fixed
later.

 gcc/config/loongarch/loongarch.md | 53 ---
 .../gcc.target/loongarch/fclass-compile.c | 20 +++
 .../gcc.target/loongarch/fclass-run.c | 53 +++
 3 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/fclass-compile.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/fclass-run.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index e4434c3bd4e..b3cae49832e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1829,16 +1829,17 @@ (define_insn "*zero_extendhi_truncqi"
 ;;  
 
 (define_insn "extendsidi2"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r")
(sign_extend:DI
-   (match_operand:SI 1 "nonimmediate_operand" "r,ZC,m,k")))]
+   (match_operand:SI 1 "nonimmediate_operand" "r,ZC,m,k,f")))]
   "TARGET_64BIT"
   "@
slli.w\t%0,%1,0
ldptr.w\t%0,%1
ld.w\t%0,%1
-   ldx.w\t%0,%1"
-  [(set_attr "move_type" "sll0,load,load,load")
+   ldx.w\t%0,%1
+   movfr2gr.s\t%0,%1"
+  [(set_attr "move_type" "sll0,load,load,load,mftg")
(set_attr "mode" "DI")])
 
 (define_insn "extend2"
@@ -4162,14 +4163,52 @@ (define_insn "loongarch_movgr2fcsr"
   "movgr2fcsr\t$r%0,%1")
 
 (define_insn "fclass_"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
- UNSPEC_FCLASS))]
+  [(set (match_operand:SI 0 "register_operand" "=f")
+   (unspec:SI [(match_operand:ANYF 1 "register_operand" "f")]
+  UNSPEC_FCLASS))]
   "TARGET_HARD_FLOAT"
   "fclass.\t%0,%1"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+(define_int_iterator FCLASS_MASK [68 136 952])
+(define_int_attr fclass_optab
+  [(68 "isinf")
+   (136"isnormal")
+   (952"isfinite")])
+
+(define_expand "2"
+  [(match_operand:SI   0 "register_operand" "=r")
+   (match_operand:ANYF 1 "register_operand" " f")
+   (const_int FCLASS_MASK)]
+  "TARGET_HARD_FLOAT"
+  {
+rtx ft0 = gen_reg_rtx (SImode);
+rtx t0 = gen_reg_rtx (word_mode);
+rtx mask = GEN_INT ();
+
+emit_insn (gen_fclass_ (ft0, operands[1]));
+
+if (TARGET_64BIT)
+  emit_insn (gen_extend_insn (t0, ft0, DImode, SImode, 0));
+else
+  emit_move_insn (t0, ft0);
+
+emit_move_insn (t0, gen_rtx_AND (word_mode, t0, mask));
+emit_move_insn (t0, gen_rtx_NE (word_mode, t0, const0_rtx));
+
+if (TARGET_64BIT)
+  {
+   t0 = lowpart_subreg (SImode, t0, DImode);
+   SUBREG_PROMOTED_VAR_P (t0) = 1;
+   SUBREG_PROMOTED_SET (t0, SRP_SIGNED);
+  }
+
+emit_move_insn (operands[0], t0);
+
+DONE;
+  })
+
 (define_insn "bytepick_w_"
   [(set (match_operand:SI 0 "register_operand" "=r")
(ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/loongarch/fclass-compile.c b/gcc/testsuite/gcc.target/loongarch/fclass-compile.c
new file mode 100644
index 000..9c24d6e263c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/fclass-compile.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mfpu=64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler-times "fclass\\.s" 1 } } */
+/* { dg-final { scan-assembler-times "fclass\\.d" 1 } } */
+
+__attribute__ ((noipa)) int
+test_fclass_f (float f)
+{
+  return __builtin_isinf (f)
+| __builtin_isno

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 11:24 AM Andre Vehreschild  wrote:
>
> Hi Richard,
>
> would that be sufficient? Bootstrap is still running for me...

Yes.

Richard.

> From c30c2cf829a094ba5e4c2c31333bed6e8c0d32af Mon Sep 17 00:00:00 2001
> From: Andre Vehreschild 
> Date: Thu, 11 Jul 2024 11:21:04 +0200
> Subject: [PATCH] [Fortran] Fix bootstrap broken by gcc-15-1965-ge4f2f46e015
>
> gcc/fortran/ChangeLog:
>
> * trans-array.cc (gfc_conv_array_parameter): Init variable to
> NULL_TREE to fix bootstrap.
> ---
>  gcc/fortran/trans-array.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
> index 0fffa07495c..5558ab69969 100644
> --- a/gcc/fortran/trans-array.cc
> +++ b/gcc/fortran/trans-array.cc
> @@ -8750,7 +8750,7 @@ gfc_conv_array_parameter (gfc_se *se, gfc_expr *expr, bool g77,
>tree stmt;
>tree parent = DECL_CONTEXT (current_function_decl);
>tree ctree;
> -  tree pack_attr;
> +  tree pack_attr = NULL_TREE; /* Set when packing class arrays.  */
>bool full_array_var;
>bool this_array_result;
>bool contiguous;
> --
> 2.45.2
>
> Sorry for the breakage.
>
> Regards,
> Andre
>
> On Thu, 11 Jul 2024 11:06:47 +0200
> Richard Biener  wrote:
>
> > On Thu, Jul 11, 2024 at 11:04 AM Andre Vehreschild 
> > wrote:
> > >
> > > Hi Richard,
> > >
> > > I am sorry to hear that. Shall I revert?
> >
> > I would suggest to instead fix by initializing the variable with NULL
> > (and a comment).
> >
> > > - Andre
> > > On Thu, 11 Jul 2024 10:57:48 +0200
> > > Richard Biener  wrote:
> > >
> > > > On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild
> > > > >  wrote:
> > > > > >
> > > > > > Hi Harald,
> > > > > >
> > > > > > thank you very much for ok'ing this large patch. Merged as
> > > > > > gcc-15-1965-ge4f2f46e015
> > > > > >
> > > > > > Looking forward to get (no) bug reports ;-)
> > > > >
> > > > > This seems to break bootstrap with
> > > > >
> > > > > > ../../gcc/gcc/fortran/trans-array.cc: In function ‘void gfc_conv_array_parameter (gfc_se*, gfc_expr*, bool, const gfc_symbol*, const char*, tree_node**, tree_node**, tree_node**)’:
> > > > > > ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’ may be used uninitialized [-Werror=maybe-uninitialized]
> > > > > >  9135 |   tmp = build_call_expr_loc (input_location,
> > > > > >  9136 |       gfor_fndecl_in_unpack_class, 4, tmp,
> > > > > >  9137 |       packedptr,
> > > > > >  9138 |       size_in_bytes (TREE_TYPE (ctree)),
> > > > > >  9139 |       pack_attr);
> > > > > > ../../gcc/gcc/fortran/trans-array.cc:8665:8: note: ‘pack_attr’ was declared here
> > > > > >  8665 |   tree pack_attr;
> > > > > cc1plus: all warnings being treated as errors
> > > > > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> > > >
> > > > It seems to be a false positive but GCCs little mind is too weak
> > > > to prove that (yes, we error on the side of emitting a diagnostic
> > > > if we can't prove it's initialized)
> > > >
> > > > Richard.
> > > >
> > > > >
> > > > > > Thanks again,
> > > > > >
> > > > > > Andre
> > > > > >
> > > > > > On Wed, 10 Jul 2024 20:52:37 +0200
> > > > > > Harald Anlauf  wrote:
> > > > > >
> > > > > > > Hi Andre,
> > > > > > >
> > > > > > > Am 10.07.24 um 10:45 schrieb Andre Vehreschild:
> > > > > > > > Hi Harald,
> > > > > > > >
> > > > > > > > thanks for the review. I totally agree, that this patch
> > > > > > > > has gotten bigger than I expected (and wanted). But
> > > > > > > > things are as they are.
> > > > > > > >
> > > > > > > > About the coding style: I have worked in so many projects,
> > > > > > > > that I consider a consistent coding style luxury. I esp.
> > > > > > > > do not have my own one anymore. The formating you are
> > > > > > > > seeing in my patches is the result of clang-format with
> > > > > > > > the provided parameter file in contrib/clang-format. I
> > > > > > > > was happy to have a tool to do the formatting, that I
> > > > > > > > could integrate into my IDE, because previously it was
> > > > > > > > hard to mimic the GNU style. I try to get to the GNU
> > > > > > > > style as good as possible, where I consider clang-format
> > > > > > > > doing garbage.
> > > > > > > >
> > > > > > > > I see that clang-format has a "very specific opinion" on
> > > > > > > > how to format the lines you mentioned, but it will
> > > > > > > > "correct

RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

2024-07-11 Thread Richard Biener

On Wed, 10 Jul 2024, Tamar Christina wrote:

> > > > I might also point back to the idea I threw in somewhere, adding
> > > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > > > touching hashing?
> > > >
> > >
> > > Yes, that can indeed be done with this approach.  The hashing was that
> > > before I was trying to prevent the "duplicate" IV expressions from being
> > > created in the first place by modifying get_loop_invariant_expr.
> > >
> > > This function looks up if we have already seen a particular IV expression
> > > and if we have it just returns that expression.  However after reading
> > > more of the code I realized this wasn't the right approach, as without
> > > also dealing with the candidates we'd end up creating IV expressions that
> > > can't be handled by any candidate.
> > >
> > > IVopts would just give up then.  Reading the code it seems that
> > > get_loop_invariant_expr is just there to prevent blatant duplicates.
> > > i.e. it treats `(signed) a` and `a` as the same.
> > >
> > > This is also why I think that everywhere else *has* to continue stripping
> > > the expression.
> > >
> > > On a note from Richard S that he thought IVopts already had some code to
> > > deal with expressions that differ only in signs led me to take a
> > > different approach.
> > >
> > > The goal wasn't to remove the different sign/unsigned IV expressions, but
> > > instead get them to be servable by the same candidates. i.e. we want them
> > > in the same candidate groups and then candidate optimization will just do
> > > its thing.
> > >
> > > That seemed a more natural fit to how it worked.
> > 
> > Yeah, I agree that sounds like the better strategy.
> > 
> > > Do you want me to try the operand_equal_p approach? Though in this case
> > > the issue is we not only need to know they're equal, but also need to
> > > know the scale factor.
> > 
> > For this case yes, but if you'd keep the code as-is, the equal with scale
> > factor one case would be fixed.  Not a case with different scale factors
> > though - but conversions "elsewhere" should be handled via the stripping.
> > So it would work to simply adjust the operand_equal_p check here?
> > 
> > > get_computation_aff_1 scales the common type IV by the scale we
> > > determined, so I think operand_equal_p would not be very useful here.
> > > But it does look like constant_multiple_of can just be implemented with
> > > aff_combination_constant_multiple_p.
> > >
> > > Should I try?
> > 
> > You've had the other places where you replace operand_equal_p with
> > affine-compute and compare.  As said that has some associated cost
> > as well as a limit on the number of elements after which it resorts
> > back to operand_equal_p.  So for strict equality tests implementing
> > a weaker operand_equal_p might be a better solution.
> > 
> 
> The structural comparison is implemented as a new mode for operand_equal_p
> which compares two expressions ignoring NOP converts (unless their bitsizes
> differ) and ignoring constant values, but requiring both operands be a
> constant.
>
> There is one downside compared to affine comparison, in that this approach
> does not deal well with commutative operations. i.e. it does not see
> a + (b + c) as equivalent to c + (b + a).
>
> This means we lose out on some of the more complicated addressing modes, but
> with so many operations the address will likely be split anyway and we'll deal
> with it then.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114932
>   * fold-const.cc (operand_compare::operand_equal_p): Use it.
>   (operand_compare::verify_hash_value): Likewise.
>   * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
>   * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114932
>   * gfortran.dg/addressing-modes_2.f90: New test.
> 
> -- inline copy of --
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>   precision differences.  */
>if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
>  {
> +  if (flags & OEP_STRUCTURAL_EQ)
> + return true;
> +

Hmm, so you ignore all constants?
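
For readers following along, the comparison described in the quoted text can
be modelled on a toy expression tree as below.  This is only a sketch of the
idea (strip NOP conversions, treat any two constants as equal, compare
operands strictly in order); it is not the operand_equal_p code, it leaves
out the bitsize check on conversions, and all type and function names are
hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression node; K_NOP stands in for a value-preserving conversion.
struct expr_sketch
{
  enum kind_t { K_CONST, K_VAR, K_NOP, K_PLUS, K_MULT };
  kind_t kind;
  long value;                            // used by K_CONST only
  std::string name;                      // used by K_VAR only
  std::shared_ptr<expr_sketch> op0, op1; // operands, if any
};
typedef std::shared_ptr<expr_sketch> expr_p;

expr_p
node_sketch (expr_sketch::kind_t k, expr_p a, expr_p b = nullptr)
{
  expr_p e = std::make_shared<expr_sketch> ();
  e->kind = k;
  e->value = 0;
  e->op0 = a;
  e->op1 = b;
  return e;
}

expr_p
const_sketch (long v)
{
  expr_p e = node_sketch (expr_sketch::K_CONST, nullptr);
  e->value = v;
  return e;
}

expr_p
var_sketch (const std::string &n)
{
  expr_p e = node_sketch (expr_sketch::K_VAR, nullptr);
  e->name = n;
  return e;
}

// Skip over any number of NOP conversions.
static const expr_sketch *
strip_nops_sketch (const expr_sketch *e)
{
  while (e->kind == expr_sketch::K_NOP)
    e = e->op0.get ();
  return e;
}

// Structural equality: same shape, same variables, constant values ignored.
bool
structural_eq_sketch (const expr_sketch *a, const expr_sketch *b)
{
  a = strip_nops_sketch (a);
  b = strip_nops_sketch (b);
  if (a->kind != b->kind)
    return false;
  switch (a->kind)
    {
    case expr_sketch::K_CONST:
      return true;                      // both constant, values not compared
    case expr_sketch::K_VAR:
      return a->name == b->name;
    default:
      // K_PLUS / K_MULT: operands are compared in order, so commuted
      // operands are NOT recognised as equal.
      return structural_eq_sketch (a->op0.get (), b->op0.get ())
             && structural_eq_sketch (a->op1.get (), b->op1.get ());
    }
}
```

In this model x * 81 + 9 and (nop) x * 324 + 36 compare equal, while
a + (b + c) and c + (b + a) do not, which is the limitation mentioned in the
quoted description.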

>/* Address of INTEGER_CST is not defined; check that we did not forget
>to drop the OEP_ADDRESS_OF flags.  */
>gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
> @@ -3204,7 +3207,8 @@ operand_co

[PATCH v2 00/11] aarch64: Extend aarch64_feature_flags to 128 bits

2024-07-11 Thread Andrew Carlotti

The end goal of the series is to change the definition of aarch64_feature_flags
from a uint64_t typedef to a class with 128 bits of storage.  This class is a
new template bitmap type that uses operator overloading to mimic the existing
integer interface as much as possible.
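
To give a feel for the approach, here is a minimal sketch of a multi-word
flag type with overloaded operators.  It is not the type added by the
series: the class name, the bit () helper and the choice of operators are
assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical N x 64-bit flag set that can be used much like a uint64_t.
template <int N>
class wide_flags_sketch
{
public:
  wide_flags_sketch () : m_words () {}
  // Implicit conversion from an integer keeps old "flags | 1" style code
  // compiling; the value lands in the low word.
  wide_flags_sketch (uint64_t low) : m_words () { m_words[0] = low; }

  // Analogue of (uint64_t (1) << n), valid for n up to 64 * N - 1.
  static wide_flags_sketch bit (unsigned n)
  {
    assert (n < 64u * N);
    wide_flags_sketch r;
    r.m_words[n / 64] = uint64_t (1) << (n % 64);
    return r;
  }

  wide_flags_sketch operator| (const wide_flags_sketch &o) const
  {
    wide_flags_sketch r;
    for (int i = 0; i < N; ++i)
      r.m_words[i] = m_words[i] | o.m_words[i];
    return r;
  }

  wide_flags_sketch operator& (const wide_flags_sketch &o) const
  {
    wide_flags_sketch r;
    for (int i = 0; i < N; ++i)
      r.m_words[i] = m_words[i] & o.m_words[i];
    return r;
  }

  wide_flags_sketch operator~ () const
  {
    wide_flags_sketch r;
    for (int i = 0; i < N; ++i)
      r.m_words[i] = ~m_words[i];
    return r;
  }

  bool operator== (const wide_flags_sketch &o) const
  {
    for (int i = 0; i < N; ++i)
      if (m_words[i] != o.m_words[i])
        return false;
    return true;
  }

  // True if any bit is set, so "if (flags & FEATURE)" keeps working.
  explicit operator bool () const
  {
    for (int i = 0; i < N; ++i)
      if (m_words[i])
        return true;
    return false;
  }

private:
  uint64_t m_words[N];
};

// 128 bits of storage, as targeted by the series.
typedef wide_flags_sketch<2> feature_flags_sketch;
```

Because the operators mirror the integer ones, most existing expressions on
the flags can stay as they are; only code that relies on the value being a
plain integer (for example, using it in a switch or printing it) would need
changes.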

The changes are mostly in the backend, but patch 10/11 introduces this new
bitmap type in the middle end.

Compared to the previous version:
Patches 01-03 are the previous patches 01, 02 and 05.
Patch 04 is a rebased version of the previous 06.
Patches 05-06 are the previous 07-08.
Patches 07 and 09 are a couple of hunks from the old 03/04.
Patch 08 redefines the TARGET_* macros in the manner suggested by Richard S.
Patch 10 introduces the new templated bbitmap type.
Patch 11 is a replacement for the old 12, using the new type from patch 10.

This is bootstrapped and regression tested on aarch64.  Is it ok for master?

[PATCH v2 01/11] aarch64: Remove unused global aarch64_tune_flags

2024-07-11 Thread Andrew Carlotti

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_tune_flags): Remove unused global variable.
(aarch64_override_options_internal): Remove dead assignment.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7f0cc47d0f071de9297068baa85c6d5fc4d7fa5b..2a67383bf9d21631664aba82e753120a0173efcf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -349,9 +349,6 @@ static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
-/* Mask to specify which instruction scheduling options should be used.  */
-uint64_t aarch64_tune_flags = 0;
-
 /* Global flag for PC relative loads.  */
 bool aarch64_pcrelative_literal_loads;
 
@@ -18273,7 +18270,6 @@ void
 aarch64_override_options_internal (struct gcc_options *opts)
 {
   const struct processor *tune = aarch64_get_tune_cpu (opts->x_selected_tune);
-  aarch64_tune_flags = tune->flags;
   aarch64_tune = tune->sched_core;
   /* Make a copy of the tuning parameters attached to the core, which
  we may later overwrite.  */

[PATCH v2 02/11] aarch64: Move AARCH64_NUM_ISA_MODES definition

2024-07-11 Thread Andrew Carlotti

AARCH64_NUM_ISA_MODES will be used within aarch64-opts.h in a later
commit.
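
As a reminder of how the counting idiom being moved works, here is a small self-contained sketch. It assumes an inline list macro in place of the real .def file, and all DEMO_* names and the three entries are hypothetical.

```cpp
#include <cstdint>

// Hypothetical inline list standing in for a .def file; each X (...) entry
// plays the role of one DEF_AARCH64_ISA_MODE line.
#define DEMO_ISA_MODE_LIST(X) X (SM_ON) X (SM_OFF) X (ZA_ON)

// Each entry expands to "+ 1", so the initializer becomes (0 + 1 + 1 + 1).
#define DEMO_COUNT_ENTRY(IDENT) + 1
constexpr unsigned int DEMO_NUM_ISA_MODES = (0
  DEMO_ISA_MODE_LIST (DEMO_COUNT_ENTRY)
);
#undef DEMO_COUNT_ENTRY

// The count can then size a mask, in the style of AARCH64_FL_ISA_MODES.
constexpr uint64_t DEMO_ISA_MODE_MASK
  = (uint64_t (1) << DEMO_NUM_ISA_MODES) - 1;

static_assert (DEMO_NUM_ISA_MODES == 3, "one increment per list entry");
static_assert (DEMO_ISA_MODE_MASK == 7, "three low bits set");
```

Because the count is a constant expression, it can be used from any header that is included early enough, which is presumably why the definition has to live next to the typedefs that will depend on it.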

gcc/ChangeLog:

* config/aarch64/aarch64.h (DEF_AARCH64_ISA_MODE): Move to...
* config/aarch64/aarch64-opts.h (DEF_AARCH64_ISA_MODE): ...here.


diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index a05c0d3ded1c69802f15eebb8c150c7dcc62b4ef..06a4fed3833482543891b4f7c778933f7cebd631 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -24,6 +24,11 @@
 
 #ifndef USED_FOR_TARGET
 typedef uint64_t aarch64_feature_flags;
+
+constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
+#define DEF_AARCH64_ISA_MODE(IDENT) + 1
+#include "aarch64-isa-modes.def"
+);
 #endif
 
 /* The various cores that implement AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index fac1882bcb38eae3690c2dc366ebc6c3f64ee940..2be6dc4089b81d2a4e1ba6861b25094774198406 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -183,11 +183,6 @@ enum class aarch64_feature : unsigned char {
 
 constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_OFF;
 
-constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
-#define DEF_AARCH64_ISA_MODE(IDENT) + 1
-#include "aarch64-isa-modes.def"
-);
-
 /* The mask of all ISA modes.  */
 constexpr auto AARCH64_FL_ISA_MODES
   = (aarch64_feature_flags (1) << AARCH64_NUM_ISA_MODES) - 1;

[PATCH 2/2] RISC-V: Allow uninitialized preferred_else_value for RVV

2024-07-11 Thread YunQiang Su

From: YunQiang Su 

PR target/115840.

In riscv_preferred_else_value, we create an uninitialized tmp var
for the else value, instead of 0 (as default_preferred_else_value does)
or the pre-existing VAR (as aarch64 does), so that we can use the
agnostic policy.

The problem is that `warn_uninit` will emit a warning:
  ‘({anonymous})’ may be used uninitialized

Let's mark this tmp var as "allow_uninitialized".
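
The intended effect can be pictured with a toy model of a warning routine that honours such a flag. This is only a sketch under stated assumptions: demo_var and demo_warn_uninit are hypothetical stand-ins, not GCC's tree or warn_uninit, and the message text is simplified.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a variable declaration node.
struct demo_var
{
  std::string name;
  bool has_undefined_value;
  unsigned allow_uninitialized : 1;   // set for deliberately undefined temps
};

// Record a warning for VAR unless it is defined or explicitly exempted.
// Returns true if a warning was recorded.
inline bool
demo_warn_uninit (const demo_var &var, std::vector<std::string> &warnings)
{
  if (!var.has_undefined_value)
    return false;
  if (var.allow_uninitialized)
    return false;
  warnings.push_back ("'" + var.name + "' may be used uninitialized");
  return true;
}
```

In this model only variables that are both undefined and not exempted produce a diagnostic, so a compiler-generated temporary can be exempted without weakening the check for user variables.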

This problem was found when trying to build glibc with the V extension.

gcc
PR target/115840.
* config/riscv/riscv.cc (riscv_preferred_else_value): Mark
tmp_var as allow_uninitialized.

gcc/testsuite
* gcc.dg/vect/pr115840.c: New testcase.
---
 gcc/config/riscv/riscv.cc|  6 +-
 gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 61fa74e9322..08159d7cbbc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
tree *ops)
 {
   if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
-return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
+{
+  tree tmp_var = create_tmp_var (vectype);
+  TREE_ALLOW_UNINITIALIZED (tmp_var) = 1;
+  return get_or_create_ssa_default_def (cfun, tmp_var);
+}
 
   return default_preferred_else_value (ifn, vectype, nops, ops);
 }
diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c b/gcc/testsuite/gcc.dg/vect/pr115840.c
new file mode 100644
index 000..09dc9e4eb7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall -Werror" } */
+
+double loads[16];
+
+void
+foo (double loadavg[], int count)
+{
+  for (int i = 0; i < count; i++)
+loadavg[i] = loads[i] / 1.5;
+}
-- 
2.45.1

[PATCH 1/2] Add allow_uninitialized to tree_base.u.bits for VAR_DECL

2024-07-11 Thread YunQiang Su

From: YunQiang Su 

An uninitialized internal temp variable may be useful in some cases,
such as for COND_LEN_MUL etc. on RISC-V with the V extension: if a
constant or a pre-existing VAR is used, we have to use the "undisturbed"
policy; if an uninitialized VAR is used, we can use "agnostic".
With "agnostic", microarchitectures can omit copying part of
the VAR.
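
To make the difference between the two policies concrete, here is a toy software model of a vector multiply whose tail lanes follow either policy. It is a sketch, not how GCC or any hardware implements it: model_mul and tail_policy are hypothetical names, and for the agnostic case the model picks one of the outcomes the V specification permits (tail elements overwritten with all ones; retaining the old value is also permitted).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

enum class tail_policy { undisturbed, agnostic };

// Toy model: lanes [0, vl) are computed, lanes [vl, VLMAX) are the tail.
template <std::size_t VLMAX>
std::array<uint32_t, VLMAX>
model_mul (const std::array<uint32_t, VLMAX> &dest,
           const std::array<uint32_t, VLMAX> &a,
           const std::array<uint32_t, VLMAX> &b,
           std::size_t vl, tail_policy policy)
{
  std::array<uint32_t, VLMAX> result {};
  for (std::size_t i = 0; i < VLMAX; i++)
    if (i < vl)
      result[i] = a[i] * b[i];
    else if (policy == tail_policy::undisturbed)
      result[i] = dest[i];          // old destination value must be kept
    else
      result[i] = ~uint32_t (0);    // one permitted agnostic outcome
  return result;
}
```

Under "undisturbed" the result depends on the previous contents of the destination, so those contents have to be available; under "agnostic" the model never reads them, which is the freedom the commit message refers to.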

gcc
* tree-core.h (tree_base): Add u.bits.allow_uninitialized.
* tree.h: Add new macro TREE_ALLOW_UNINITIALIZED.
* tree-ssa-uninit.cc (warn_uninit): Don't warn if VAR is
marked as allow_uninitialized.
---
 gcc/tree-core.h| 5 -
 gcc/tree-ssa-uninit.cc | 4 
 gcc/tree.h | 4 
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..984201199f6 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1101,7 +1101,10 @@ struct GTY(()) tree_base {
   unsigned nameless_flag : 1;
   unsigned atomic_flag : 1;
   unsigned unavailable_flag : 1;
-  unsigned spare0 : 2;
+  /* Don't warn if uninitialized.  RISC-V V has tail agnostic/undisturbed
+policy, which may benefit if we use an uninitialized var.  */
+  unsigned allow_uninitialized : 1;
+  unsigned spare0 : 1;
 
   unsigned spare1 : 8;
 
diff --git a/gcc/tree-ssa-uninit.cc b/gcc/tree-ssa-uninit.cc
index 726684e472a..12861e1dbc9 100644
--- a/gcc/tree-ssa-uninit.cc
+++ b/gcc/tree-ssa-uninit.cc
@@ -142,6 +142,10 @@ warn_uninit (opt_code opt, tree t, tree var, gimple *context,
   if (!has_undefined_value_p (t))
 return;
 
+  /* VAR may mark itself as allow_uninitialized.  */
+  if (TREE_ALLOW_UNINITIALIZED (var))
+return;
+
   /* Ignore COMPLEX_EXPR as initializing only a part of a complex
  turns in a COMPLEX_EXPR with the not initialized part being
  set to its previous (undefined) value.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 28e8e71b036..381780fde2e 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3311,6 +3311,10 @@ extern void decl_fini_priority_insert (tree, priority_type);
 #define VAR_DECL_IS_VIRTUAL_OPERAND(NODE) \
   (VAR_DECL_CHECK (NODE)->base.u.bits.saturating_flag)
 
+/* In a VAR_DECL, nonzero if NODE is allowed to be uninitialized.  */
+#define TREE_ALLOW_UNINITIALIZED(NODE) \
+  (VAR_DECL_CHECK (NODE)->base.u.bits.allow_uninitialized)
+
 /* In a VAR_DECL, nonzero if this is a non-local frame structure.  */
 #define DECL_NONLOCAL_FRAME(NODE)  \
   (VAR_DECL_CHECK (NODE)->base.default_def_flag)
-- 
2.45.1

[PATCH v2 04/11] aarch64: Introduce aarch64_isa_mode type

2024-07-11 Thread Andrew Carlotti

Currently there are many places where an aarch64_feature_flags variable
is used, but only the bottom three isa mode bits are set and read.
Using a separate data type for these values makes it clearer that
they're not expected or required to have any of their upper feature bits
set.  It will also make things simpler and more efficient when we extend
aarch64_feature_flags to 128 bits.

This patch uses explicit casts whenever converting from an
aarch64_feature_flags value to an aarch64_isa_mode value.  This isn't
strictly necessary, but serves to highlight the locations where an
explicit conversion will become necessary later.
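
A rough illustration of the kind of conversion site being highlighted, assuming the later 128-bit representation is a two-word aggregate. All demo_* names are hypothetical, and the count of three mode bits is taken from the description above.

```cpp
#include <cstdint>

typedef uint64_t demo_isa_mode;                   // plays the role of the new typedef
struct demo_feature_flags { uint64_t val[2]; };   // hypothetical 128-bit form

constexpr unsigned int DEMO_NUM_ISA_MODES = 3;    // the bottom three bits
constexpr demo_isa_mode DEMO_ISA_MODE_MASK
  = (demo_isa_mode (1) << DEMO_NUM_ISA_MODES) - 1;

// Explicit conversion from the wide flags to the narrow mode value.  Once
// the flags are no longer an integer, a site like this cannot rely on an
// implicit conversion.
constexpr demo_isa_mode
demo_extract_isa_mode (demo_feature_flags flags)
{
  return demo_isa_mode (flags.val[0] & DEMO_ISA_MODE_MASK);
}

static_assert (demo_extract_isa_mode (demo_feature_flags {{0x1005, 0xff}}) == 5,
               "only the three low bits of word 0 survive");
```

Keeping the mode in a plain integer type means code that only cares about the mode bits never has to carry the full-width value around.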

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h: Add aarch64_isa_mode typedef.
* config/aarch64/aarch64-protos.h
(aarch64_gen_callee_cookie): Use aarch64_isa_mode parameter.
(aarch64_sme_vq_immediate): Ditto.
* config/aarch64/aarch64.cc
(aarch64_fntype_pstate_sm): Use aarch64_isa_mode values.
(aarch64_fntype_pstate_za): Ditto.
(aarch64_fndecl_pstate_sm): Ditto.
(aarch64_fndecl_pstate_za): Ditto.
(aarch64_fndecl_isa_mode): Ditto.
(aarch64_cfun_incoming_pstate_sm): Ditto.
(aarch64_cfun_enables_pstate_sm): Ditto.
(aarch64_call_switches_pstate_sm): Ditto.
(aarch64_gen_callee_cookie): Ditto.
(aarch64_callee_isa_mode): Ditto.
(aarch64_insn_callee_abi): Ditto.
(aarch64_sme_vq_immediate): Ditto.
(aarch64_add_offset_temporaries): Ditto.
(aarch64_add_offset): Ditto.
(aarch64_add_sp): Ditto.
(aarch64_sub_sp): Ditto.
(aarch64_guard_switch_pstate_sm): Ditto.
(aarch64_switch_pstate_sm): Ditto.
(aarch64_init_cumulative_args): Ditto.
(aarch64_allocate_and_probe_stack_space): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_start_call_args): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_end_call_args): Ditto.
(aarch64_set_current_function): Ditto, with added conversions.
(aarch64_handle_attr_arch): Avoid macro with changed type.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_isa_flags): Ditto.
(aarch64_switch_pstate_sm_for_landing_pad):
Use aarch64_isa_mode values.
(aarch64_switch_pstate_sm_for_jump): Ditto.
(pass_switch_pstate_sm::gate): Ditto.
* config/aarch64/aarch64.h
(AARCH64_ISA_MODE_{SM_ON|SM_OFF|ZA_ON}): New macros.
(AARCH64_FL_SM_STATE): Mark as possibly unused.
(AARCH64_ISA_MODE_SM_STATE): New aarch64_isa_mode mask.
(AARCH64_DEFAULT_ISA_MODE): New aarch64_isa_mode value.
(AARCH64_FL_DEFAULT_ISA_MODE): Define using above value.
(AARCH64_ISA_MODE): Change type to aarch64_isa_mode.
(arm_pcs): Use aarch64_isa_mode value.


diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 06a4fed3833482543891b4f7c778933f7cebd631..2c36bfaad19b999238601d44709c280ef987046b 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -23,6 +23,8 @@
 #define GCC_AARCH64_OPTS_H
 
 #ifndef USED_FOR_TARGET
+typedef uint64_t aarch64_isa_mode;
+
 typedef uint64_t aarch64_feature_flags;
 
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 42639e9efcf1e0f9362f759ae63a31b8eeb0d581..f64afe2889018e1c4735a1677e6bf5febc4a7665 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -767,7 +767,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
-rtx aarch64_gen_callee_cookie (aarch64_feature_flags, arm_pcs);
+rtx aarch64_gen_callee_cookie (aarch64_isa_mode, arm_pcs);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
 bool aarch64_expand_cpymem (rtx *, bool);
@@ -808,7 +808,7 @@ int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_rdsvl_immediate_p (const_rtx);
 rtx aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT,
- aarch64_feature_flags);
+ aarch64_isa_mode);
 char *aarch64_output_rdsvl (const_rtx);
 bool aarch64_addsvl_addspl_immediate_p (const_rtx);
 char *aarch64_output_addsvl_addspl (rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2be6dc4089b81d2a4e1ba6861b25094774198406..dfb244307635a7aa1c552acd55a635cd0bdeeb39 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -181,7 +181,17 @@ enum class aarch64_feature : unsigned char {
 #include "aarch64-arches.def"
 #undef HANDLE
 
-constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_O

[PATCH v2 06/11] aarch64: Decouple feature flag option storage type

2024-07-11 Thread Andrew Carlotti

The awk scripts that process the .opt files are relatively fragile and
only handle a limited set of data types correctly.  The unrecognised
aarch64_feature_flags type is handled as a uint64_t, which happens to be
correct for now.  However, that assumption will change when we extend
the mask to 128 bits.

This patch changes the option members to use uint64_t types, and adds a
"_0" suffix to the names (both for future extensibility, and to allow
the original name to be used for the full aarch64_feature_flags mask
within generator files).
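
The reordered setter in the diff below can be read as "store the assembler flags, adjust a local copy, then store the ISA flags". A simplified stand-alone sketch of that shape follows; the demo_* names, the bit assignments and the general_regs_only field are hypothetical, and the mask stands in for the result of feature_deps::get_flags_off.

```cpp
#include <cstdint>

typedef uint64_t demo_flags;
constexpr demo_flags DEMO_FL_FP = demo_flags (1) << 0;     // hypothetical bits
constexpr demo_flags DEMO_FL_SIMD = demo_flags (1) << 1;
constexpr demo_flags DEMO_FL_CRC = demo_flags (1) << 2;

// Option storage uses plain uint64_t members with a "_0" suffix.
struct demo_options
{
  uint64_t x_asm_isa_flags_0;
  uint64_t x_isa_flags_0;
  bool general_regs_only;
};

// Each stored word is written exactly once; the masking happens on the
// local FLAGS value rather than on the stored member.
inline void
demo_set_asm_isa_flags (demo_options *opts, demo_flags flags)
{
  opts->x_asm_isa_flags_0 = flags;
  if (opts->general_regs_only)
    flags &= ~(DEMO_FL_FP | DEMO_FL_SIMD);   // stand-in for the FP dependents
  opts->x_isa_flags_0 = flags;
}
```

Working on the local value means the stored members are only ever assigned, never modified in place, which should make it easier to split the value across more than one member later.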

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Reorder, and add suffix to names.
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Add "_0" suffix.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Redefine using renamed uint64_t value.
(aarch64_isa_flags): Ditto.
* config/aarch64/aarch64.opt:
(aarch64_asm_isa_flags): Rename to...
(aarch64_asm_isa_flags_0): ...this, and change to uint64_t.
(aarch64_isa_flags): Rename to...
(aarch64_isa_flags_0): ...this, and change to uint64_t.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 63c50189a09d5c7c713f57e23a8172f44bf6bec5..bd0770dd0d84005701afed35d4af356380a405e9 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -66,15 +66,16 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
-/* Set OPTS->x_aarch64_asm_isa_flags to FLAGS and update
-   OPTS->x_aarch64_isa_flags accordingly.  */
+
+/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
+   OPTS->x_aarch64_isa_flags_0 accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags = flags;
-  opts->x_aarch64_isa_flags = flags;
+  opts->x_aarch64_asm_isa_flags_0 = flags;
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
-opts->x_aarch64_isa_flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+  opts->x_aarch64_isa_flags_0 = flags;
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 193f2486176b6bac372a143e2f52041c5a28ebaf..903e708565dc7830e9544813dd315f99d489cad2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -23,13 +23,18 @@
 #define GCC_AARCH64_H
 
 #define aarch64_get_asm_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0))
 #define aarch64_get_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0))
 
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
-#ifndef GENERATOR_FILE
+#ifdef GENERATOR_FILE
+#undef aarch64_asm_isa_flags
+#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0))
+#undef aarch64_isa_flags
+#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0))
+#else
 #undef aarch64_asm_isa_flags
 #define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6356c419399bd324929cd599e5a4b926b0383469..45aab49de27bdfa0fb3f67ec06c7dcf0ac242fb3 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -31,10 +31,10 @@ TargetVariable
 enum aarch64_arch selected_arch = aarch64_no_arch
 
 TargetVariable
-aarch64_feature_flags aarch64_asm_isa_flags = 0
+uint64_t aarch64_asm_isa_flags_0 = 0
 
 TargetVariable
-aarch64_feature_flags aarch64_isa_flags = 0
+uint64_t aarch64_isa_flags_0 = 0
 
 TargetVariable
 unsigned aarch_enable_bti = 2

[PATCH v2 07/11] aarch64: Add explicit bool cast to return value

2024-07-11 Thread Andrew Carlotti

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Add bool cast.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 69f481ecfc848e6c5b61516c2f7b8bff5cd4f8b8..229e438115c268f876c6d77e1fcc4faa2f7d231b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -30284,7 +30284,7 @@ aarch64_valid_sysreg_name_p (const char *regname)
   if (sysreg == NULL)
 return aarch64_is_implem_def_reg (regname);
   if (sysreg->arch_reqs)
-return (aarch64_isa_flags & sysreg->arch_reqs);
+return bool (aarch64_isa_flags & sysreg->arch_reqs);
   return true;
 }

[PATCH v2 03/11] aarch64: Eliminate a temporary variable.

2024-07-11 Thread Andrew Carlotti

The name would become misleading in a later commit anyway, and I think
this is marginally more readable.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_override_options): Remove temporary variable.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2a67383bf9d21631664aba82e753120a0173efcf..67c97569b7d4b5502e8dfc111eced65d2aee5cb2 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18856,7 +18856,6 @@ aarch64_override_options (void)
   SUBTARGET_OVERRIDE_OPTIONS;
 #endif
 
-  auto isa_mode = AARCH64_FL_DEFAULT_ISA_MODE;
   if (cpu && arch)
 {
   /* If both -mcpu and -march are specified, warn if they are not
@@ -18879,25 +18878,25 @@ aarch64_override_options (void)
}
 
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (cpu)
 {
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu_isa | isa_mode);
+  aarch64_set_asm_isa_flags (cpu_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (arch)
 {
   cpu = &all_cores[arch->ident];
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else
 {
   /* No -mcpu or -march specified, so use the default CPU.  */
   cpu = &all_cores[TARGET_CPU_DEFAULT];
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu->flags | isa_mode);
+  aarch64_set_asm_isa_flags (cpu->flags | AARCH64_FL_DEFAULT_ISA_MODE);
 }
 
   selected_tune = tune ? tune->ident : cpu->ident;

[PATCH v2 08/11] aarch64: Add bool conversion to TARGET_* macros

2024-07-11 Thread Andrew Carlotti

Use a new AARCH64_HAVE_ISA macro in TARGET_* definitions, and eliminate
all the AARCH64_ISA_* feature macros.
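
The aarch64.h hunk is cut off in this archive, so the following is only a guess at the general shape of such a helper: paste the feature name onto a flag prefix, test it against the current flags, and convert the result to bool. Every DEMO_* name and bit position here is hypothetical.

```cpp
#include <cstdint>

typedef uint64_t demo_flags;
constexpr demo_flags DEMO_FL_SVE = demo_flags (1) << 5;   // hypothetical bits
constexpr demo_flags DEMO_FL_SME = demo_flags (1) << 9;

// Stand-in for the current ISA flags.
static demo_flags demo_isa_flags = DEMO_FL_SVE;

// Assumed shape of the helper: paste the feature name and convert to bool.
#define DEMO_HAVE_ISA(X) (bool (demo_isa_flags & DEMO_FL_##X))

#define DEMO_TARGET_SVE DEMO_HAVE_ISA (SVE)
#define DEMO_TARGET_SME DEMO_HAVE_ISA (SME)

// Because the macro already yields bool, it can be returned or used in a
// conditional expression whatever type the flags have.
inline bool demo_have_sve () { return DEMO_TARGET_SVE; }
inline int demo_arch_digit () { return DEMO_TARGET_SME ? 9 : 8; }
```

Doing the conversion once inside the helper keeps every TARGET_* use site unchanged if the flag type later stops converting to bool implicitly.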

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc
(aarch64_define_unconditional_macros): Use TARGET_V8R macro.
(aarch64_update_cpp_builtins): Use TARGET_* macros.
* config/aarch64/aarch64.h (AARCH64_HAVE_ISA): New macro.
(AARCH64_ISA_SM_OFF, AARCH64_ISA_SM_ON, AARCH64_ISA_ZA_ON)
(AARCH64_ISA_V8A, AARCH64_ISA_V8_1A, AARCH64_ISA_CRC)
(AARCH64_ISA_FP, AARCH64_ISA_SIMD, AARCH64_ISA_LSE)
(AARCH64_ISA_RDMA, AARCH64_ISA_V8_2A, AARCH64_ISA_F16)
(AARCH64_ISA_SVE, AARCH64_ISA_SVE2, AARCH64_ISA_SVE2_AES)
(AARCH64_ISA_SVE2_BITPERM, AARCH64_ISA_SVE2_SHA3)
(AARCH64_ISA_SVE2_SM4, AARCH64_ISA_SME, AARCH64_ISA_SME_I16I64)
(AARCH64_ISA_SME_F64F64, AARCH64_ISA_SME2, AARCH64_ISA_V8_3A)
(AARCH64_ISA_DOTPROD, AARCH64_ISA_AES, AARCH64_ISA_SHA2)
(AARCH64_ISA_V8_4A, AARCH64_ISA_SM4, AARCH64_ISA_SHA3)
(AARCH64_ISA_F16FML, AARCH64_ISA_RCPC, AARCH64_ISA_RCPC8_4)
(AARCH64_ISA_RNG, AARCH64_ISA_V8_5A, AARCH64_ISA_TME)
(AARCH64_ISA_MEMTAG, AARCH64_ISA_V8_6A, AARCH64_ISA_I8MM)
(AARCH64_ISA_F32MM, AARCH64_ISA_F64MM, AARCH64_ISA_BF16)
(AARCH64_ISA_SB, AARCH64_ISA_RCPC3, AARCH64_ISA_V8R)
(AARCH64_ISA_PAUTH, AARCH64_ISA_V8_7A, AARCH64_ISA_V8_8A)
(AARCH64_ISA_V8_9A, AARCH64_ISA_V9A, AARCH64_ISA_V9_1A)
(AARCH64_ISA_V9_2A, AARCH64_ISA_V9_3A, AARCH64_ISA_V9_4A)
(AARCH64_ISA_MOPS, AARCH64_ISA_LS64, AARCH64_ISA_CSSC)
(AARCH64_ISA_D128, AARCH64_ISA_THE, AARCH64_ISA_GCS): Remove.
(TARGET_BASE_SIMD, TARGET_SIMD, TARGET_FLOAT)
(TARGET_NON_STREAMING, TARGET_STREAMING, TARGET_ZA, TARGET_SHA2)
(TARGET_SHA3, TARGET_AES, TARGET_SM4, TARGET_F16FML)
(TARGET_CRC32, TARGET_LSE, TARGET_FP_F16INST)
(TARGET_SIMD_F16INST, TARGET_DOTPROD, TARGET_SVE, TARGET_SVE2)
(TARGET_SVE2_AES, TARGET_SVE2_BITPERM, TARGET_SVE2_SHA3)
(TARGET_SVE2_SM4, TARGET_SME, TARGET_SME_I16I64)
(TARGET_SME_F64F64, TARGET_SME2, TARGET_ARMV8_3, TARGET_JSCVT)
(TARGET_FRINT, TARGET_TME, TARGET_RNG, TARGET_MEMTAG)
(TARGET_I8MM, TARGET_SVE_I8MM, TARGET_SVE_F32MM)
(TARGET_SVE_F64MM, TARGET_BF16_FP, TARGET_BF16_SIMD)
(TARGET_SVE_BF16, TARGET_PAUTH, TARGET_BTI, TARGET_MOPS)
(TARGET_LS64, TARGET_CSSC, TARGET_SB, TARGET_RCPC, TARGET_RCPC2)
(TARGET_RCPC3, TARGET_SIMD_RDMA, TARGET_ARMV9_4, TARGET_D128)
(TARGET_THE, TARGET_GCS): Redefine using AARCH64_HAVE_ISA.
(TARGET_V8R, TARGET_V9A): New.
* config/aarch64/aarch64.md (arch_enabled): Use TARGET_RCPC2.
* config/aarch64/iterators.md (GPI_I16): Use TARGET_FP_F16INST.
(GPF_F16): Ditto.
* config/aarch64/predicates.md
(aarch64_rcpc_memory_operand): Use TARGET_RCPC2.


diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 2aff097dd33c1892d255f7227c72dc90892bc78a..f9b9e379375507c5c49cac280f3a8c3e34c9aec9 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -64,7 +64,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__ARM_ARCH_8A");
 
   builtin_define_with_int_value ("__ARM_ARCH_PROFILE",
-  AARCH64_ISA_V8R ? 'R' : 'A');
+  TARGET_V8R ? 'R' : 'A');
   builtin_define ("__ARM_FEATURE_CLZ");
   builtin_define ("__ARM_FEATURE_IDIV");
   builtin_define ("__ARM_FEATURE_UNALIGNED");
@@ -132,7 +132,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (flag_unsafe_math_optimizations, "__ARM_FP_FAST", pfile);
 
   cpp_undef (pfile, "__ARM_ARCH");
-  builtin_define_with_int_value ("__ARM_ARCH", AARCH64_ISA_V9A ? 9 : 8);
+  builtin_define_with_int_value ("__ARM_ARCH", TARGET_V9A ? 9 : 8);
 
   builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM",
 flag_short_enums ? 1 : 4);
@@ -259,7 +259,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 
   aarch64_def_or_undef (TARGET_LS64,
"__ARM_FEATURE_LS64", pfile);
-  aarch64_def_or_undef (AARCH64_ISA_RCPC, "__ARM_FEATURE_RCPC", pfile);
+  aarch64_def_or_undef (TARGET_RCPC, "__ARM_FEATURE_RCPC", pfile);
   aarch64_def_or_undef (TARGET_D128, "__ARM_FEATURE_SYSREG128", pfile);
 
   aarch64_def_or_undef (TARGET_SME, "__ARM_FEATURE_SME", pfile);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 903e708565dc7830e9544813dd315f99d489cad2..902de6bd269d786d58248d1d2e1614217fa2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -68,18 +68,6 @@
 #define BYTES_BIG_ENDIAN (TARGET_BIG_END != 0)
 #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN)
 
-/* AdvSIMD is supported in the default configuration, unless disabled by
-   -mgeneral-regs-only or by the +nosimd extension.  The set of available
-   instructions is then subdivided into:
-
-   -

[PATCH v2 09/11] aarch64: Use constructor explicitly in get_flags_off

2024-07-11 Thread Andrew Carlotti

gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h
(get_flags_off): Construct aarch64_feature_flags (0) explicitly.


diff --git a/gcc/config/aarch64/aarch64-feature-deps.h b/gcc/config/aarch64/aarch64-feature-deps.h
index 79126db88254b89f74a8583d50a77bc27865e265..a14ae22b72980bef5eec80588f06d9ced895dfd7 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -97,9 +97,10 @@ template struct info;
 constexpr aarch64_feature_flags
 get_flags_off (aarch64_feature_flags mask)
 {
-  return (0
+  return (aarch64_feature_flags (0)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) \
- | (feature_deps::IDENT ().enable & mask ? AARCH64_FL_##IDENT : 0)
+ | (feature_deps::IDENT ().enable & mask ? AARCH64_FL_##IDENT \
+ : aarch64_feature_flags (0))
 #include "config/aarch64/aarch64-option-extensions.def"
  );
 }

RE: [PATCH]middle-end: Implement conditional store vectorizer pattern [PR115531]

2024-07-11 Thread Richard Biener

On Wed, 10 Jul 2024, Tamar Christina wrote:

> > > >
> > > > > + }
> > > > > +
> > > > > +  if (new_code == ERROR_MARK)
> > > > > + {
> > > > > +   /* We couldn't flip the condition, so invert the mask 
> > > > > instead.  */
> > > > > +   itype = TREE_TYPE (cmp_ls);
> > > > > +   conv = gimple_build_assign (var, BIT_XOR_EXPR, cmp_ls,
> > > > > +   build_int_cst (itype, 1));
> > > > > + }
> > > > > +
> > > > > +  mask_vec_type = get_mask_type_for_scalar_type (loop_vinfo, 
> > > > > itype);
> > > > > +  append_pattern_def_seq (vinfo, stmt_vinfo, conv, mask_vec_type,
> > itype);
> > > > > +  /* Then prepare the boolean mask as the mask conversion pattern
> > > > > +  won't hit on the pattern statement.  */
> > > > > +  cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, 
> > > > > stmt_vinfo);
> > > >
> > > > Isn't this somewhat redundant with the below call?
> > > >
> > > > I fear of bad [non-]interactions with bool pattern recognition btw.
> > >
> > > So this is again another issue with that patterns don't apply to newly 
> > > produced
> > patterns.
> > > and so they can't serve as root for new patterns.  This is why the 
> > > scatter/gather
> > pattern
> > > addition refactored part of the work into these helper functions.
> > >
> > > I did actually try to just add a secondary loop that iterates over newly 
> > > produced
> > patterns
> > > but you later run into problems where a new pattern completely cancels 
> > > out an
> > old pattern
> > > rather than just extend it.
> > >
> > > So at the moment, unless the code ends up being hybrid, whatever the bool
> > recog pattern
> > > does is just ignored as irrelevant.
> > >
> > > But If we don't invert the compare then it should be simpler as the 
> > > original
> > compare is
> > > never in a pattern.
> > >
> > > I'll respin with these changes.
> > 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/115531
>   * tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
>   (vect_recog_cond_store_pattern): New.
>   (vect_vect_recog_func_ptrs): Use it.
>   * target.def (conditional_operation_is_expensive): New.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in: Document it.
>   * targhooks.cc (default_conditional_operation_is_expensive): New.
>   * targhooks.h (default_conditional_operation_is_expensive): New.
>   * tree-vectorizer.h (may_be_nonaddressable_p): New.

It's declared in tree-ssa-loop-ivopts.h so just include that.

> 
> -- inline copy of patch --
> 
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> f10d9a59c6673a02823fc05132235af3a1ad7c65..c7535d07f4ddd16d55e0ab9b609a2bf95931a2f4
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6449,6 +6449,13 @@ The default implementation returns a 
> @code{MODE_VECTOR_INT} with the
>  same size and number of elements as @var{mode}, if such a mode exists.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool 
> TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE (unsigned @var{ifn})
> +This hook returns true if masked operation @var{ifn} (really of
> +type @code{internal_fn}) should be considered more expensive to use than
> +implementing the same operation without masking.  GCC can then try to use
> +unconditional operations instead with extra selects.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE 
> (unsigned @var{ifn})
>  This hook returns true if masked internal function @var{ifn} (really of
>  type @code{internal_fn}) should be considered expensive when the mask is
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 
> 24596eb2f6b4e9ea3ea3464fda171d99155f4c0f..64cea3b1edaf8ec818c0e8095ab50b00ae0cb857
>  100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4290,6 +4290,8 @@ address;  but often a machine-dependent strategy can 
> generate better code.
>  
>  @hook TARGET_VECTORIZE_GET_MASK_MODE
>  
> +@hook TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
> +
>  @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
>  
>  @hook TARGET_VECTORIZE_CREATE_COSTS
> diff --git a/gcc/target.def b/gcc/target.def
> index 
> ce4d1ecd58be0a1c8110c6993556a52a2c69168e..3de1aad4c84d3df0b171a411f97e1ce70b6f63b5
>  100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2033,6 +2033,18 @@ same size and number of elements as @var{mode}, if 
> such a mode exists.",
>   (machine_mode mode),
>   default_get_mask_mode)
>  
> +/* Function to say whether a conditional operation is expensive when
> +   compared to non-masked operations.  */
> +DEFHOOK
> +(conditional_operation_is_expensive,
> + "This hook returns true if masked operation @var{ifn} (really of\n\
> +type @code{internal_fn}) should be considered more expensive to use than\n\
> +implementing the same operation without masking.  GCC can

[PATCH v2 05/11] aarch64: Define aarch64_get_{asm_|}isa_flags

2024-07-11 Thread Andrew Carlotti

Building an aarch64_feature_flags value from data within a gcc_options
or cl_target_option struct will get more complicated in a later commit.
Use a macro to avoid doing this manually in more than one location.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_handle_option): Use new macro.
* config/aarch64/aarch64.cc
(aarch64_override_options_internal): Ditto.
(aarch64_option_print): Ditto.
(aarch64_set_current_function): Ditto.
(aarch64_can_inline_p): Ditto.
(aarch64_declare_function_name): Ditto.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): New.
(aarch64_get_isa_flags): New.
(aarch64_asm_isa_flags): Use new macro.
(aarch64_isa_flags): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 951d041d3109b935e90a7cb5d714940414e81761..63c50189a09d5c7c713f57e23a8172f44bf6bec5 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -111,7 +111,7 @@ aarch64_handle_option (struct gcc_options *opts,
 
 case OPT_mgeneral_regs_only:
   opts->x_target_flags |= MASK_GENERAL_REGS_ONLY;
-  aarch64_set_asm_isa_flags (opts, opts->x_aarch64_asm_isa_flags);
+  aarch64_set_asm_isa_flags (opts, aarch64_get_asm_isa_flags (opts));
   return true;
 
 case OPT_mfix_cortex_a53_835769:
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index dfb244307635a7aa1c552acd55a635cd0bdeeb39..193f2486176b6bac372a143e2f52041c5a28ebaf 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -22,15 +22,18 @@
 #ifndef GCC_AARCH64_H
 #define GCC_AARCH64_H
 
+#define aarch64_get_asm_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+#define aarch64_get_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
 #ifndef GENERATOR_FILE
 #undef aarch64_asm_isa_flags
-#define aarch64_asm_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_asm_isa_flags)
+#define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
-#define aarch64_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_isa_flags)
+#define aarch64_isa_flags (aarch64_get_isa_flags (&global_options))
 #endif
 
 /* Target CPU builtins.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 726f769e708d09285a291472bda4babcc3241d00..69f481ecfc848e6c5b61516c2f7b8bff5cd4f8b8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18331,10 +18331,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
-  if ((opts->x_aarch64_isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
-  && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+  aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
+  if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
+  && !(isa_flags & AARCH64_FL_SME))
 {
-  if (opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+  if (isa_flags & AARCH64_FL_SM_ON)
error ("streaming functions require the ISA extension %qs", "sme");
   else
error ("functions with SME state require the ISA extension %qs",
@@ -18343,8 +18344,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
  " option %<-march%>, or by using the %"
  " attribute or pragma", "sme");
   opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
-  auto new_flags = (opts->x_aarch64_asm_isa_flags
-   | feature_deps::SME ().enable);
+  auto new_flags = isa_flags | feature_deps::SME ().enable;
   aarch64_set_asm_isa_flags (opts, new_flags);
 }
 
@@ -19038,9 +19038,9 @@ aarch64_option_print (FILE *file, int indent, struct cl_target_option *ptr)
   const struct processor *cpu
 = aarch64_get_tune_cpu (ptr->x_selected_tune);
   const struct processor *arch = aarch64_get_arch (ptr->x_selected_arch);
+  aarch64_feature_flags isa_flags = aarch64_get_asm_isa_flags(ptr);
   std::string extension
-= aarch64_get_extension_string_for_isa_flags (ptr->x_aarch64_asm_isa_flags,
- arch->flags);
+= aarch64_get_extension_string_for_isa_flags (isa_flags, arch->flags);
 
   fprintf (file, "%*sselected tune = %s\n", indent, "", cpu->name);
   fprintf (file, "%*sselected arch = %s%s\n", indent, "",
@@ -19100,7 +19100,7 @@ aarch64_set_current_function (tree fndecl)
   auto new_isa_mode = (fndecl
   ? aarch64_fndecl_isa_mode (fndecl)
   : AARCH64_DEFAULT_ISA_MODE);
-  auto isa_flags = TREE_TARGET_OPTION (new_tree)->x_a

[PATCH v2 11/11] aarch64: Extend aarch64_feature_flags to 128 bits

2024-07-11 Thread Andrew Carlotti

Replace the existing uint64_t typedef with a bbitmap<2> typedef.  Most
of the preparatory work was carried out in previous commits, so this
patch itself is fairly small.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Store a second uint64_t value.
* config/aarch64/aarch64-opts.h
(aarch64_feature_flags): Switch typedef to bbitmap<2>.
* config/aarch64/aarch64.cc
(aarch64_set_current_function): Extract isa mode from val[0].
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Load a second uint64_t value.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Ditto.
(aarch64_isa_flags): Ditto.
(HANDLE): Use bbitmap<2>::from_index to initialise flags.
(AARCH64_FL_ISA_MODES): Do arithmetic on integer type.
(AARCH64_ISA_MODE): Extract value from bbitmap<2> array.
* config/aarch64/aarch64.opt
(aarch64_asm_isa_flags_1): New variable.
(aarch64_isa_flags_1): Ditto.
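
The ChangeLog above describes storing the wider flag set in two 64-bit option variables and recombining them on load. A minimal sketch of that round trip follows, assuming hypothetical stand-ins (`wide_flags`, `option_store`, `fp_dependent` and the three helper functions are illustrative names, not GCC's types or the bit values GCC uses):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit flag set: two 64-bit words.
struct wide_flags { uint64_t val[2]; };

// Hypothetical stand-in for the option structure: one variable per word.
struct option_store
{
  uint64_t asm_isa_flags_0 = 0, asm_isa_flags_1 = 0;
  uint64_t isa_flags_0 = 0, isa_flags_1 = 0;
  bool general_regs_only = false;
};

// Bits assumed, for this sketch only, to be disabled by general-regs-only.
constexpr wide_flags fp_dependent = { { 0x6, 0x1 } };

// Store the assembler flags word by word, then store the (possibly
// reduced) compiler-visible flags the same way.
inline void set_asm_isa_flags (option_store &opts, wide_flags flags)
{
  opts.asm_isa_flags_0 = flags.val[0];
  opts.asm_isa_flags_1 = flags.val[1];

  if (opts.general_regs_only)
    {
      flags.val[0] &= ~fp_dependent.val[0];
      flags.val[1] &= ~fp_dependent.val[1];
    }

  opts.isa_flags_0 = flags.val[0];
  opts.isa_flags_1 = flags.val[1];
}

// Recombine the two stored words into one value.
inline wide_flags get_asm_isa_flags (const option_store &opts)
{ return wide_flags { { opts.asm_isa_flags_0, opts.asm_isa_flags_1 } }; }

inline wide_flags get_isa_flags (const option_store &opts)
{ return wide_flags { { opts.isa_flags_0, opts.isa_flags_1 } }; }
```

Keeping every write behind a single setter is what lets the two stored copies stay consistent, whichever word a given feature bit lives in.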


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index bd0770dd0d84005701afed35d4af356380a405e9..64b65b7ff9e4bf7c72bf0c5db6fa976a51fe9f32 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -67,15 +67,19 @@ static const struct default_options aarch_option_optimization_table[] =
   };
 
 
-/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
-   OPTS->x_aarch64_isa_flags_0 accordingly.  */
+/* Set OPTS->x_aarch64_asm_isa_flags_<0..n> to FLAGS and update
+   OPTS->x_aarch64_isa_flags_<0..n> accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags_0 = flags;
+  opts->x_aarch64_asm_isa_flags_0 = flags.val[0];
+  opts->x_aarch64_asm_isa_flags_1 = flags.val[1];
+
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
 flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
-  opts->x_aarch64_isa_flags_0 = flags;
+
+  opts->x_aarch64_isa_flags_0 = flags.val[0];
+  opts->x_aarch64_isa_flags_1 = flags.val[1];
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 2c36bfaad19b999238601d44709c280ef987046b..80ec1a05253da62b20eebb5e491f04c6da6851e7 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -23,14 +23,16 @@
 #define GCC_AARCH64_OPTS_H
 
 #ifndef USED_FOR_TARGET
-typedef uint64_t aarch64_isa_mode;
+#include "bbitmap.h"
 
-typedef uint64_t aarch64_feature_flags;
+typedef uint64_t aarch64_isa_mode;
 
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
 #define DEF_AARCH64_ISA_MODE(IDENT) + 1
 #include "aarch64-isa-modes.def"
 );
+
+typedef bbitmap<2> aarch64_feature_flags;
 #endif
 
 /* The various cores that implement AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 902de6bd269d786d58248d1d2e1614217fa2..8056c33795738b779bc14697803da3eee04fe330 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -23,17 +23,21 @@
 #define GCC_AARCH64_H
 
 #define aarch64_get_asm_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0))
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0, \
+ (opts)->x_aarch64_asm_isa_flags_1))
 #define aarch64_get_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0))
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0, \
+ (opts)->x_aarch64_isa_flags_1))
 
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
 #ifdef GENERATOR_FILE
 #undef aarch64_asm_isa_flags
-#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0))
+#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0,\
+ aarch64_asm_isa_flags_1))
 #undef aarch64_isa_flags
-#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0))
+#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0, \
+ aarch64_isa_flags_1))
 #else
 #undef aarch64_asm_isa_flags
 #define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
@@ -167,8 +171,8 @@ enum class aarch64_feature : unsigned char {
 
 /* Define unique flags for each of the above.  */
 #define HANDLE(IDENT) \
-  constexpr auto AARCH64_FL_##IDENT \
-= aarch64_feature_flags (1) << int (aarch64_feature::IDENT);
+  constexpr auto AARCH64_FL_##IDENT ATTRIBUTE_UNUSED \
+= aarch64_feature_flags::from_index (int (aarch64_feature::IDENT));
 #define DEF_AARCH64_ISA_MODE(IDENT) HANDLE (IDENT)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) HANDLE (IDENT)
 #define AARCH64_ARCH(A, B, IDENT, D, E) HANDLE (IDENT)
@@ -191,7 +195,7 @@ constexpr auto AARCH64_ISA

[PATCH v2 10/11] Add new bbitmap class

2024-07-11 Thread Andrew Carlotti

This class provides a constant-size bitmap that can be used as almost a
drop-in replacement for bitmaps stored in integer types.  The
implementation is entirely within the header file and uses recursive
templated operations to support effective optimisation and usage in
constexpr expressions.

This initial implementation hardcodes the choice of uint64_t elements
for storage and initialisation, but this could instead be specified via
a second template parameter.
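
To make the idea concrete, here is a minimal sketch of a constant-size bitmap usable in constexpr expressions. It is hand-unrolled for two words rather than using the recursive templates of the patch, and `flags128` with its members is a hypothetical name chosen for illustration, not the `bbitmap` class itself:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word bitmap; illustrative only.
struct flags128
{
  uint64_t val[2];

  // Build a value with only bit INDEX set.  INDEX is assumed to be
  // in the range 0..127; only the selected branch is evaluated.
  static constexpr flags128 from_index (int index)
  {
    return flags128 { { index < 64 ? uint64_t (1) << index : uint64_t (0),
                        index >= 64 ? uint64_t (1) << (index - 64)
                                    : uint64_t (0) } };
  }

  // Element-wise operators, so the type reads like an integer bitmask.
  constexpr flags128 operator| (flags128 o) const
  { return flags128 { { val[0] | o.val[0], val[1] | o.val[1] } }; }

  constexpr flags128 operator& (flags128 o) const
  { return flags128 { { val[0] & o.val[0], val[1] & o.val[1] } }; }

  constexpr flags128 operator~ () const
  { return flags128 { { ~val[0], ~val[1] } }; }

  // True if any bit in either word is set.
  constexpr explicit operator bool () const
  { return val[0] != 0 || val[1] != 0; }

  constexpr bool operator== (flags128 o) const
  { return val[0] == o.val[0] && val[1] == o.val[1]; }
};

// Compile-time checks: bit 3 lands in word 0, bit 70 in word 1.
constexpr flags128 bit3 = flags128::from_index (3);
constexpr flags128 bit70 = flags128::from_index (70);
static_assert ((bit3 | bit70).val[0] == 8, "bit 3 lives in word 0");
static_assert ((bit3 | bit70).val[1] == 64, "bit 70 lives in word 1");
static_assert (!(bit3 & bit70), "distinct bits do not overlap");
```

Code written as `if (flags & SOME_FLAG)` keeps working with such a type, which is what makes an "almost drop-in" replacement for an integer bitmask plausible.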

gcc/ChangeLog:

* bbitmap.h: New file.


diff --git a/gcc/bbitmap.h b/gcc/bbitmap.h
new file mode 100644
index 
..108ac1bf9e6042f5ae16988bc1688fe0045a3293
--- /dev/null
+++ b/gcc/bbitmap.h
@@ -0,0 +1,238 @@
+/* Functions to support fixed-length bitmaps.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_BBITMAP_H
+#define GCC_BBITMAP_H
+
+/* Implementation of bounded (fixed length) bitmaps.
+
+   This provides a drop-in replacement for bitmaps that have outgrown the
+   storage capacity of a single integer.
+
+   Sets are stored as a fixed length array of uint64_t elements.  The length of
+   this array is given as a template parameter.  */
+
+template
+struct bbitmap_operators
+{
+  template
+  static constexpr Result binary(Operator op, const Arg &x, const Arg &y,
+Rest ...rest)
+  {
+return bbitmap_operators::template binary
+  (op, x, y, op (x.val[M - 1], y.val[M - 1]), rest...);
+  }
+
+  template
+  static void compound(Operator op, Arg &x, const Arg &y)
+  {
+bbitmap_operators::template compound (op, x, y);
+x.val[M - 1] = op (x.val[M - 1], y.val[M - 1]);
+  }
+
+  template
+  static constexpr Result bit_not(const Arg &x, Rest ...rest)
+  {
+return bbitmap_operators::template bit_not
+  (x, ~(x.val[M - 1]), rest...);
+  }
+
+  template
+  static constexpr bool non_zero(const Arg &x)
+  {
+return (bool) x.val[M - 1]
+  || bbitmap_operators::template non_zero (x);
+  }
+
+  template
+  static constexpr bool equal(const Arg &x, const Arg &y)
+  {
+return x.val[M - 1] == y.val[M - 1]
+  && bbitmap_operators::template equal (x, y);
+  }
+
+#define SUB_INDEX(i) (i - (M - 1) * 64)
+#define IN_SUB_RANGE(i) (SUB_INDEX (i) >= 0 && SUB_INDEX (i) < 64)
+
+  template
+  static constexpr Result from_index(int index, Rest ...rest)
+  {
+return bbitmap_operators::template from_index
+  (index,
+   (IN_SUB_RANGE (index)
+   ? (uint64_t (1) << (IN_SUB_RANGE (index) ? SUB_INDEX (index) : 0))
+   : uint64_t (0)),
+   rest...);
+  }
+
+#undef IN_SUB_RANGE
+#undef SUB_INDEX
+
+};
+
+template<>
+struct bbitmap_operators<0>
+{
+  template
+  static constexpr Result binary(Operator, const Arg, const Arg,
+Rest ...rest)
+  {
+return Result { rest... };
+  }
+
+  template
+  static void compound(Operator, Arg, const Arg)
+  {
+return;
+  }
+
+  template
+  static constexpr Result bit_not(const Arg, Rest ...rest)
+  {
+return Result { rest... };
+  }
+
+  template
+  static constexpr bool non_zero(const Arg)
+  {
+return false;
+  }
+
+  template
+  static constexpr bool equal(const Arg, const Arg)
+  {
+return true;
+  }
+
+  template
+  static constexpr Result from_index(int, Rest ...rest)
+  {
+return Result { rest... };
+  }
+};
+
+template
+constexpr T bbitmap_element_or(T x, T y) { return x | y;}
+
+template
+constexpr T bbitmap_element_and(T x, T y) { return x & y;}
+
+template
+constexpr T bbitmap_element_xor(T x, T y) { return x ^ y;}
+
+
+
+template 
+class GTY((user)) bbitmap
+{
+public:
+  uint64_t val[N];
+
+  template
+  constexpr bbitmap(Rest ...rest) : val{(uint64_t) rest...} {}
+
+  constexpr bbitmap operator|(const bbitmap other) const
+{
+  return bbitmap_operators::template 
binary>(bbitmap_element_or,
+  *this, other);
+}
+
+  bbitmap operator|=(const bbitmap other)
+{
+  bbitmap_operators::template compound (bbitmap_element_or,
+ *this, other);
+  return *this;
+}
+
+  constexpr bbitmap operator&(const bbitmap other) const
+{
+  return bbitmap_operators::template 
binary>(bbitmap_element_and,
+  *this, other);
+}
+
+

Re: [PATCH 2/2] RISC-V: Allow uninitialized preferred_else_value for RVV

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 2:13 PM YunQiang Su  wrote:
>
> From: YunQiang Su 
>
> PR target/115840.
>
> In riscv_preferred_else_value, we create an uninitialized tmp var
> for else value, instead of the 0 (as default_preferred_else_value)
> or the pre-exists VAR (as aarch64 does), so that we can use agnostic
> policy.
>
> The problem is that `warn_uninit` will emit a warning:
>   ({anonymous})’ may be used uninitialized
>
> Let's mark this tmp var as "allow_uninitialized".
>
> This problem is found when I try to build glibc with V extension.
>
> gcc
> PR target/115840.
> * config/riscv/riscv.cc(riscv_preferred_else_value): Mark
> tmp_var as allow_unitialized.
>
> gcc/testsuite
> * gcc.dg/vect/pr115840.c: New testcase.
> ---
>  gcc/config/riscv/riscv.cc|  6 +-
>  gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 61fa74e9322..08159d7cbbc 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree 
> vectype, unsigned int nops,
> tree *ops)
>  {
>if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> -return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> +{
> +  tree tmp_var = create_tmp_var (vectype);
> +  TREE_ALLOW_UNINITIALIZED (tmp_var) = 1;

Does it work when you do

 TREE_NO_WARNING (tmp_var) = 1;

?

> +  return get_or_create_ssa_default_def (cfun, tmp_var);
> +}
>
>return default_preferred_else_value (ifn, vectype, nops, ops);
>  }
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c 
> b/gcc/testsuite/gcc.dg/vect/pr115840.c
> new file mode 100644
> index 000..09dc9e4eb7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall -Werror" } */
> +
> +double loads[16];
> +
> +void
> +foo (double loadavg[], int count)
> +{
> +  for (int i = 0; i < count; i++)
> +loadavg[i] = loads[i] / 1.5;
> +}
> --
> 2.45.1
>

Re: [PATCH 1/2] Add allow_uninitialized to tree_base.u.bits for VAR_DECL

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 2:14 PM YunQiang Su  wrote:
>
> From: YunQiang Su 
>
> Uninitialized internal temp variable may be useful in some case,
> such as for COND_LEN_MUL etc on RISC-V with V extension: If an
> const or pre-exists VAR is used, we have to use "undisturbed"
> policy; if an uninitialized VAR is used, we can use "agnostic".
> With "agnostic", the microarchitectures can omit copying part of
> the VAR.

No please, there's TREE_NO_WARNING already.

> gcc
> * tree-core.h(tree_base): Add u.bits.allow_uninitialized.
> * tree.h: Add new macro TREE_ALLOW_UNINITIALIZED.
> * tree-ssa-uninit.cc(warn_uninit): Don't warn if VAR is
> marked as allow_uninitialized.
> ---
>  gcc/tree-core.h| 5 -
>  gcc/tree-ssa-uninit.cc | 4 
>  gcc/tree.h | 4 
>  3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 27c569c7702..984201199f6 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1101,7 +1101,10 @@ struct GTY(()) tree_base {
>unsigned nameless_flag : 1;
>unsigned atomic_flag : 1;
>unsigned unavailable_flag : 1;
> -  unsigned spare0 : 2;
> +  /* Don't warn if uninitialized.  RISC-V V has tail agnostic/undisturbed
> +policy, which may be get benifits if we use an uninitialized var.  */
> +  unsigned allow_uninitialized : 1;
> +  unsigned spare0 : 1;
>
>unsigned spare1 : 8;
>
> diff --git a/gcc/tree-ssa-uninit.cc b/gcc/tree-ssa-uninit.cc
> index 726684e472a..12861e1dbc9 100644
> --- a/gcc/tree-ssa-uninit.cc
> +++ b/gcc/tree-ssa-uninit.cc
> @@ -142,6 +142,10 @@ warn_uninit (opt_code opt, tree t, tree var, gimple 
> *context,
>if (!has_undefined_value_p (t))
>  return;
>
> +  /* VAR may mark itself as allow_uninitialized.  */
> +  if (TREE_ALLOW_UNINITIALIZED (var))
> +return;
> +
>/* Ignore COMPLEX_EXPR as initializing only a part of a complex
>   turns in a COMPLEX_EXPR with the not initialized part being
>   set to its previous (undefined) value.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 28e8e71b036..381780fde2e 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -3311,6 +3311,10 @@ extern void decl_fini_priority_insert (tree, 
> priority_type);
>  #define VAR_DECL_IS_VIRTUAL_OPERAND(NODE) \
>(VAR_DECL_CHECK (NODE)->base.u.bits.saturating_flag)
>
> +/* In a VAR_DECL, nonzero if NODE is allowed to be uninitialized.  */
> +#define TREE_ALLOW_UNINITIALIZED(NODE) \
> +  (VAR_DECL_CHECK (NODE)->base.u.bits.allow_uninitialized)
> +
>  /* In a VAR_DECL, nonzero if this is a non-local frame structure.  */
>  #define DECL_NONLOCAL_FRAME(NODE)  \
>(VAR_DECL_CHECK (NODE)->base.default_def_flag)
> --
> 2.45.1
>

[PATCH v2] RISC-V: NO_WARNING preferred else value for RVV

2024-07-11 Thread YunQiang Su

From: YunQiang Su 

PR target/115840.

In riscv_preferred_else_value, we create an uninitialized tmp var
for the else value, instead of 0 (as default_preferred_else_value does)
or the pre-existing VAR (as aarch64 does), so that we can use the
agnostic policy.

The problem is that `warn_uninit` will emit a warning:
  '({anonymous})' may be used uninitialized

Let's mark this tmp var as NO_WARNING.

This problem was found when I tried to build glibc with the V extension.

gcc
PR target/115840.
* config/riscv/riscv.cc(riscv_preferred_else_value): Mark
tmp_var as NO_WARNING.

gcc/testsuite
* gcc.dg/vect/pr115840.c: New testcase.
---
 gcc/config/riscv/riscv.cc|  6 +-
 gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 61fa74e9322..276998a992b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
tree *ops)
 {
   if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
-return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
+{
+  tree tmp_var = create_tmp_var (vectype);
+  TREE_NO_WARNING (tmp_var) = 1;
+  return get_or_create_ssa_default_def (cfun, tmp_var);
+}
 
   return default_preferred_else_value (ifn, vectype, nops, ops);
 }
diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c b/gcc/testsuite/gcc.dg/vect/pr115840.c
new file mode 100644
index 000..09dc9e4eb7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall -Werror" } */
+
+double loads[16];
+
+void
+foo (double loadavg[], int count)
+{
+  for (int i = 0; i < count; i++)
+loadavg[i] = loads[i] / 1.5;
+}
-- 
2.45.1

Re: [PATCH 2/2] RISC-V: Allow uninitialized preferred_else_value for RVV

2024-07-11 Thread YunQiang Su

Richard Biener wrote on Thu, Jul 11, 2024 at 20:21:
>
> On Thu, Jul 11, 2024 at 2:13 PM YunQiang Su  wrote:
> >
> > From: YunQiang Su 
> >
> > PR target/115840.
> >
> > In riscv_preferred_else_value, we create an uninitialized tmp var
> > for else value, instead of the 0 (as default_preferred_else_value)
> > or the pre-exists VAR (as aarch64 does), so that we can use agnostic
> > policy.
> >
> > The problem is that `warn_uninit` will emit a warning:
> >   ({anonymous})’ may be used uninitialized
> >
> > Let's mark this tmp var as "allow_uninitialized".
> >
> > This problem is found when I try to build glibc with V extension.
> >
> > gcc
> > PR target/115840.
> > * config/riscv/riscv.cc(riscv_preferred_else_value): Mark
> > tmp_var as allow_unitialized.
> >
> > gcc/testsuite
> > * gcc.dg/vect/pr115840.c: New testcase.
> > ---
> >  gcc/config/riscv/riscv.cc|  6 +-
> >  gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 61fa74e9322..08159d7cbbc 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree 
> > vectype, unsigned int nops,
> > tree *ops)
> >  {
> >if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> > -return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> > +{
> > +  tree tmp_var = create_tmp_var (vectype);
> > +  TREE_ALLOW_UNINITIALIZED (tmp_var) = 1;
>
> Does it work when you do
>
>  TREE_NO_WARNING (tmp_var) = 1;
>

Thanks.  It works.  I did notice it, but I was worried that TREE_NO_WARNING
might also suppress some other warnings.

> ?
>
> > +  return get_or_create_ssa_default_def (cfun, tmp_var);
> > +}
> >
> >return default_preferred_else_value (ifn, vectype, nops, ops);
> >  }
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c 
> > b/gcc/testsuite/gcc.dg/vect/pr115840.c
> > new file mode 100644
> > index 000..09dc9e4eb7c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-Wall -Werror" } */
> > +
> > +double loads[16];
> > +
> > +void
> > +foo (double loadavg[], int count)
> > +{
> > +  for (int i = 0; i < count; i++)
> > +loadavg[i] = loads[i] / 1.5;
> > +}
> > --
> > 2.45.1
> >

Re: [PATCH] fixincludes: skip stdio_stdarg_h on darwin

2024-07-11 Thread FX Coudert

Hi Iain,

Sorry about that, thanks for reverting. It appears to be a SDK version issue, 
so my analysis of the old SDK versions was incorrect. Could you try (when you 
get some time) the attached patch on one of the versions that was broken by my 
earlier patch? (darwin19 or darwin21).

FX



test.diff
Description: Binary data

Re: [PATCH 2/2] RISC-V: Allow uninitialized preferred_else_value for RVV

2024-07-11 Thread Richard Biener

On Thu, Jul 11, 2024 at 2:45 PM YunQiang Su  wrote:
>
> Richard Biener wrote on Thu, Jul 11, 2024 at 20:21:
> >
> > On Thu, Jul 11, 2024 at 2:13 PM YunQiang Su  wrote:
> > >
> > > From: YunQiang Su 
> > >
> > > PR target/115840.
> > >
> > > In riscv_preferred_else_value, we create an uninitialized tmp var
> > > for else value, instead of the 0 (as default_preferred_else_value)
> > > or the pre-exists VAR (as aarch64 does), so that we can use agnostic
> > > policy.
> > >
> > > The problem is that `warn_uninit` will emit a warning:
> > >   ({anonymous})’ may be used uninitialized
> > >
> > > Let's mark this tmp var as "allow_uninitialized".
> > >
> > > This problem is found when I try to build glibc with V extension.
> > >
> > > gcc
> > > PR target/115840.
> > > * config/riscv/riscv.cc(riscv_preferred_else_value): Mark
> > > tmp_var as allow_unitialized.
> > >
> > > gcc/testsuite
> > > * gcc.dg/vect/pr115840.c: New testcase.
> > > ---
> > >  gcc/config/riscv/riscv.cc|  6 +-
> > >  gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
> > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c
> > >
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 61fa74e9322..08159d7cbbc 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree 
> > > vectype, unsigned int nops,
> > > tree *ops)
> > >  {
> > >if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> > > -return get_or_create_ssa_default_def (cfun, create_tmp_var 
> > > (vectype));
> > > +{
> > > +  tree tmp_var = create_tmp_var (vectype);
> > > +  TREE_ALLOW_UNINITIALIZED (tmp_var) = 1;
> >
> > Does it work when you do
> >
> >  TREE_NO_WARNING (tmp_var) = 1;
> >
>
> Thanks.  It works.  I did notice it, but I was worried that TREE_NO_WARNING
> might also suppress some other warnings.

That's quite unlikely in this case but yes, TREE_NO_WARNING is a big hammer.
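
The trade-off can be sketched with a toy model. Everything here is hypothetical (`node_model`, `warning_kind` and `should_warn` are illustrative names and do not reflect how GCC's tree flags or diagnostics are implemented); it only shows why a blanket flag is broader than a dedicated one:

```cpp
#include <cassert>

// Hypothetical warning kinds; purely illustrative.
enum class warning_kind { uninitialized, unused };

// Hypothetical node carrying two suppression flags.
struct node_model
{
  bool no_warning = false;           // blanket flag: silences every kind
  bool allow_uninitialized = false;  // narrow flag: silences one kind only
};

// Decide whether a diagnostic of KIND should be issued for N.
inline bool should_warn (const node_model &n, warning_kind kind)
{
  if (n.no_warning)
    return false;
  if (kind == warning_kind::uninitialized && n.allow_uninitialized)
    return false;
  return true;
}
```

For a compiler-generated temporary that the user never sees, the extra breadth of the blanket flag is unlikely to hide anything useful, which matches the judgment above.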

> > ?
> >
> > > +  return get_or_create_ssa_default_def (cfun, tmp_var);
> > > +}
> > >
> > >return default_preferred_else_value (ifn, vectype, nops, ops);
> > >  }
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c 
> > > b/gcc/testsuite/gcc.dg/vect/pr115840.c
> > > new file mode 100644
> > > index 000..09dc9e4eb7c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-Wall -Werror" } */
> > > +
> > > +double loads[16];
> > > +
> > > +void
> > > +foo (double loadavg[], int count)
> > > +{
> > > +  for (int i = 0; i < count; i++)
> > > +loadavg[i] = loads[i] / 1.5;
> > > +}
> > > --
> > > 2.45.1
> > >

[i386] adjust flag_omit_frame_pointer in a single function [PR113719] (was: Re: [PATCH] [i386] restore recompute to override opts after change [PR113719])

2024-07-11 Thread Alexandre Oliva

On Jul  4, 2024, Alexandre Oliva  wrote:

> On Jul  3, 2024, Rainer Orth  wrote:

> Hmm, I wonder if leaf frame pointer has to do with that.

It did, in a way.

The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target.  The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:

- ix86_recompute_optlev_based_flags computes and sets a
  -f[no-]omit-frame-pointer default depending on
  USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size

- ix86_option_override_internal enables flag_omit_frame_pointer for
  -momit-leaf-frame-pointer to take effect

ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer.  It is called during global process_options.

But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes.  If they
differ, the testcase fails.

In order to fix this, we need to process all overriders of this option
whenever we process any of them.  Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.
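
A minimal sketch of why a single function avoids the mismatch, under simplifying assumptions: `opts_model` and `recompute_frame_pointer_flags` are hypothetical names, and the default is reduced to one boolean rather than the real USE_IX86_FRAME_POINTER / optimize_size logic.

```cpp
#include <cassert>

// Hypothetical model of the two option bits involved; not gcc_options.
struct opts_model
{
  bool omit_frame_pointer;
  bool omit_leaf_frame_pointer;
};

// All overriders live in one place, so the global-options path and the
// optimize-attribute path reach the same final state.
inline void recompute_frame_pointer_flags (opts_model &o, bool default_omit,
                                           bool explicitly_set)
{
  // Step 1: apply the target default unless the user chose a value.
  if (!explicitly_set)
    o.omit_frame_pointer = default_omit;

  // Step 2: the "keep nonleaf frame pointers" overrider from the patch.
  if (o.omit_frame_pointer)
    o.omit_leaf_frame_pointer = false;
  else if (o.omit_leaf_frame_pointer)
    o.omit_frame_pointer = true;
}
```

If step 2 ran on only one of the two paths, the same inputs could yield different `omit_frame_pointer` values for the global options and for an optimize attribute, which is the kind of mismatch the test case detects.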

Regstrapped on x86_64-linux-gnu.  Also verified that the regression is
cured with an i686-solaris cross compiler.  Ok to install?

for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.
---
 gcc/config/i386/i386-options.cc |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 5824c0cb072eb..059ef3ae6ad44 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1911,6 +1911,12 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
opts->x_flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
}
 }
+
+  /* Keep nonleaf frame pointers.  */
+  if (opts->x_flag_omit_frame_pointer)
+opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER;
+  else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags))
+opts->x_flag_omit_frame_pointer = 1;
 }

 /* Implement part of TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
@@ -2590,12 +2596,6 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_target_flags |= MASK_NO_RED_ZONE;
 }

-  /* Keep nonleaf frame pointers.  */
-  if (opts->x_flag_omit_frame_pointer)
-opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER;
-  else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags))
-opts->x_flag_omit_frame_pointer = 1;
-
   /* If we're doing fast math, we don't care about comparison order
  wrt NaNs.  This lets us use a shorter comparison sequence.  */
   if (opts->x_flag_finite_math_only)

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-07-11 Thread Tamar Christina

 -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 11, 2024 12:39 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes
> known not to overflow. [PR114932]
> 
> On Wed, 10 Jul 2024, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, June 20, 2024 8:55 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing
> > > modes known not to overflow. [PR114932]
> > >
> > > On Wed, 19 Jun 2024, Tamar Christina wrote:
> > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Wednesday, June 19, 2024 1:14 PM
> > > > > To: Tamar Christina 
> > > > > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > > > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned
> > > > > addressing modes known not to overflow. [PR114932]
> > > > >
> > > > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > When the patch for PR114074 was applied we saw a good boost in
> > > > > > > exchange2.
> > > > > > >
> > > > > > > This boost was partially caused by a simplification of the
> > > > > > > addressing modes.  With the patch applied IV opts saw the
> > > > > > > following form for the base addressing;
> > > > > > >
> > > > > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) * 324) + 36)
> > > > > > >
> > > > > > > vs what we normally get:
> > > > > > >
> > > > > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D) * 81) + 9) * 4
> > > > > > >
> > > > > > > This is because the patch promoted multiplies where one operand is
> > > > > > > a constant from a signed multiply to an unsigned one, to attempt to
> > > > > > > fold away the constant.
> > > > > > >
> > > > > > > This patch attempts the same but due to the various problems with
> > > > > > > SCEV and niters not being able to analyze the resulting forms (i.e.
> > > > > > > PR114322) we can't do it during SCEV or in the general form like in
> > > > > > > fold-const like extract_muldiv attempts.
> > > > > > >
> > > > > > > Instead this applies the simplification during IVopts
> > > > > > > initialization when we create the IV.  Essentially when we know the
> > > > > > > IV won't overflow with regards to niters then we perform an affine
> > > > > > > fold which gets it to simplify the internal computation, even if
> > > > > > > this is signed because we know that for IVOPTs uses the IV won't
> > > > > > > ever overflow.  This allows IV opts to see the simplified form
> > > > > > > without influencing the rest of the compiler.
> > > > > > >
> > > > > > > as mentioned in PR114074 it would be good to fix the missed
> > > > > > > optimization in the other passes so we can perform this in general.
> > > > > > >
> > > > > > > The reason this has a big impact on fortran code is that fortran
> > > > > > > doesn't seem to have unsigned integer types.  As such all it's
> > > > > > > addressing are created with signed types and folding does not
> > > > > > > happen on them due to the possible overflow.
> > > > > > >
> > > > > > > concretely on AArch64 this changes the results from generation:
> > > > > >
> > > > > > mov x27, -108
> > > > > > mov x24, -72
> > > > > > mov x23, -36
> > > > > > add x21, x1, x0, lsl 2
> > > > > > add x19, x20, x22
> > > > > > .L5:
> > > > > > add x0, x22, x19
> > > > > > add x19, x19, 324
> > > > > > ldr d1, [x0, x27]
> > > > > > add v1.2s, v1.2s, v15.2s
> > > > > > str d1, [x20, 216]
> > > > > > ldr d0, [x0, x24]
> > > > > > add v0.2s, v0.2s, v15.2s
> > > > > > str d0, [x20, 252]
> > > > > > ldr d31, [x0, x23]
> > > > > > add v31.2s, v31.2s, v15.2s
> > > > > > str d31, [x20, 288]
> > > > > > bl  digits_20_
> > > > > > cmp x21, x19
> > > > > > bne .L5
> > > > > >
> > > > > > into:
> > > > > >
> > > > > > .L5:
> > > > > > ldr d1, [x19, -108]
> > > > > > add v1.2s, v1.2s, v15.2s
> > > > > > str d1, [x20, 216]
> > > > > > ldr d0, [x19, -72]
> > > > > > add v0.2s, v0.2s, v15.2s
> > > > > > str d0, [x20, 252]
> > > > > > ldr d31, [x19, -36]
> > > > > > add x19, x19, 324
> > > > > > add v31.2s, v31.2s, v15.2s
> > > > > > str d31, [x20, 288]
> > > > > > bl  digits_20_
> > > > > > cmp x21, x19
> > > > > >

Re: [PATCH 3/3] RISC-V: load and store-lanes with SLP

2024-07-11 Thread Richard Biener

On Wed, 10 Jul 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following is a prototype for how to represent load/store-lanes
> > within SLP.  I've for now settled with having a single load node
> > with multiple permute nodes acting as selection, one for each loaded lane
> > and a single store node fed from all stored lanes.  For
> >
> >   for (int i = 0; i < 1024; ++i)
> > {
> >   a[2*i] = b[2*i] + 7;
> >   a[2*i+1] = b[2*i+1] * 3;
> > }
> >
> > you have the following SLP graph where I explain how things are set
> > up and code-generated:
> >
> > t.c:23:21: note:   SLP graph after lowering permutations:
> > t.c:23:21: note:   node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int
> > t.c:23:21: note:   op template: *_6 = _7;
> > t.c:23:21: note:stmt 0 *_6 = _7;
> > t.c:23:21: note:stmt 1 *_12 = _13;
> > t.c:23:21: note:children 0x50dc488 0x50dc6e8
> >
> > This is the store node, it's marked with ldst_lanes = true during
> > SLP discovery.  This node code-generates
> >
> >   vect_array.65[0] = vect__7.61_29;
> >   vect_array.65[1] = vect__13.62_28;
> >   MEM  [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65);
> >
> > ...
> > t.c:23:21: note:   node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int
> > t.c:23:21: note:   op: VEC_PERM_EXPR
> > t.c:23:21: note:stmt 0 _5 = *_4;
> > t.c:23:21: note:lane permutation { 0[0] }
> > t.c:23:21: note:children 0x50dc948
> > t.c:23:21: note:   node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int
> > t.c:23:21: note:   op: VEC_PERM_EXPR
> > t.c:23:21: note:stmt 0 _11 = *_10;
> > t.c:23:21: note:lane permutation { 0[1] }
> > t.c:23:21: note:children 0x50dc948
> >
> > These are the selection nodes, marked with ldst_lanes = true.
> > They code generate nothing.
> >
> > t.c:23:21: note:   node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int
> > t.c:23:21: note:   op template: _5 = *_4;
> > t.c:23:21: note:stmt 0 _5 = *_4;
> > t.c:23:21: note:stmt 1 _11 = *_10;
> > t.c:23:21: note:load permutation { 0 1 }
> >
> > This is the load node, marked with ldst_lanes = true (the load
> > permutation is only accurate when taking into account the lane permute
> > in the selection nodes).  It code generates
> >
> >   vect_array.58 = .LOAD_LANES (MEM  [(int *)vectp_b.56_33]);
> >   vect__5.59_31 = vect_array.58[0];
> >   vect__5.60_30 = vect_array.58[1];
> >
> > This scheme allows to leave code generation in vectorizable_load/store
> > mostly as-is.
> >
> > While this should support both load-lanes and (masked) store-lanes
> > the decision to do either is done during SLP discovery time and
> > cannot be reversed without altering the SLP tree - as-is the SLP
> > tree is not usable for non-store-lanes on the store side, the
> > load side is OK representation-wise but will very likely fail
> > permute handling as the lowering to deal with the two input vector
> > restriction isn't done - but of course since the permute node is
> > marked as to be ignored that doesn't work out.  So I've put
> > restrictions in place that fail vectorization if a load/store-lane
> > SLP tree is later classified differently by get_load_store_type.
> >
> > With this I've disabled the code scrapping SLP as it will no longer
> > fire.  I'll note that for example 
> > gcc.target/aarch64/sve/mask_struct_store_3.c
> > will not get SLP store-lanes used because the full store SLPs just
> > fine though we then fail to handle the "splat" load-permutation
> >
> > t2.c:5:21: note:   node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int
> > t2.c:5:21: note:   op template: _6 = *_5;
> > t2.c:5:21: note:stmt 0 _6 = *_5;
> > t2.c:5:21: note:stmt 1 _6 = *_5;
> > t2.c:5:21: note:stmt 2 _6 = *_5;
> > t2.c:5:21: note:stmt 3 _6 = *_5;
> > t2.c:5:21: note:load permutation { 0 0 0 0 }
> >
> > the load permute lowering code currently doesn't consider it worth
> > lowering single loads from a group (or in this case not grouped loads).
> > The expectation is the target can handle this by two interleaves with
> > itself.
> >
> > So what we see here is that while the explicit SLP representation is
> > helpful in some cases, in cases like this it would require changing
> > it when we make decisions how to vectorize.  My idea is that this
> > all will change a lot when we re-do SLP discovery (for loops) and
> > when we get rid of non-SLP as I think vectorizable_* should be
> > allowed to alter the SLP graph during analysis.
> >
> > I'm not sure what's the best way forward - if we can decide to
> > live with (temporary) regressions in this area?  There is the possibility
> > to do the "non-SLP" mode by forcing single-lane discovery everywhere(?)
> > as a temporary measure.  Unfortunately this will alter the VF and thus
> > cannot be done on-the-fly per SLP instance I think (much like we cannot
> > currently cancel only one SLP instance without a full re-analysis).
>
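As a rough illustration of the lane semantics described in the quoted message, here is a scalar C++ model of the example loop. It is only a sketch: `Vec4`, `load_lanes2`, `store_lanes2` and `run_lanes` are hypothetical names invented here, not GCC internals, and the model assumes a group size of two and four elements per vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a "vector(4) int" value (assumption: four lanes per vector).
using Vec4 = std::array<int, 4>;

// Scalar model of a two-lane .LOAD_LANES: de-interleave 8 ints into
// lane 0 (even elements) and lane 1 (odd elements).
std::array<Vec4, 2> load_lanes2(const int *p) {
  std::array<Vec4, 2> lanes{};
  for (std::size_t i = 0; i < 4; ++i) {
    lanes[0][i] = p[2 * i];
    lanes[1][i] = p[2 * i + 1];
  }
  return lanes;
}

// Scalar model of a two-lane .STORE_LANES: re-interleave both lanes.
void store_lanes2(int *p, const std::array<Vec4, 2> &lanes) {
  for (std::size_t i = 0; i < 4; ++i) {
    p[2 * i] = lanes[0][i];
    p[2 * i + 1] = lanes[1][i];
  }
}

// The loop from the message, four scalar iterations per step.
// Assumes b.size() is a multiple of 8.
std::vector<int> run_lanes(const std::vector<int> &b) {
  std::vector<int> a(b.size());
  for (std::size_t i = 0; i + 8 <= b.size(); i += 8) {
    const std::array<Vec4, 2> in = load_lanes2(&b[i]);
    std::array<Vec4, 2> out{};
    for (std::size_t j = 0; j < 4; ++j) {
      out[0][j] = in[0][j] + 7;  // a[2*i]   = b[2*i] + 7
      out[1][j] = in[1][j] * 3;  // a[2*i+1] = b[2*i+1] * 3
    }
    store_lanes2(&a[i], out);
  }
  return a;
}
```

The two selection nodes in the SLP graph correspond to picking `in[0]` and `in[1]` here, which is why they need no code of their own once the load has produced the array of lanes.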

Re: [PATCH] c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].

2024-07-11 Thread Alexandre Oliva

On Jul  9, 2024, Iain Sandoe  wrote:

>    if (!gimple_seq_empty_p (n) && !gimple_seq_empty_p (e))
>      {
>        geh_else *stmt = gimple_build_eh_else (n, e);
>        gimple_seq_add_stmt (&cleanup, stmt);
>      }

> Which essentially says “if either of the sub-expressions to this are empty, 
> then do not build it”
> Was there a reason for this, or is it a typo?

Most certainly a thinko :-(

Thanks for identifying it and for proposing a fix for it!
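For readers following along, the quoted condition can be modelled with two booleans. This is only a sketch of the condition as quoted, not of the proposed fix (which is not shown in this message); `builds_eh_else` is a hypothetical helper, and its parameters stand for the results of `gimple_seq_empty_p (n)` and `gimple_seq_empty_p (e)`.

```cpp
#include <cassert>

// Mirrors the quoted condition: the EH_ELSE statement is built only when
// both sequences are non-empty, so a single empty sequence suppresses it.
bool builds_eh_else(bool n_empty, bool e_empty) {
  return !n_empty && !e_empty;
}
```

With this predicate, a non-empty `e` paired with an empty `n` yields no statement at all, which is the behaviour being questioned.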

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH] [libstdc++] [testsuite] xfail 128bit from_chars on all aarch64--

2024-07-11 Thread Alexandre Oliva



Having observed failures of these two tests on yet another aarch64
operating system, and having concluded that the conditions that
trigger the problem ought to be present on all aarch64 targets, I'm
now matching any aarch64 target_os to enable the workaround.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
aarch64.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/8.cc: Define SKIP_LONG_DOUBLE
on all aarch64-*-* targets.
* testsuite/20_util/to_chars/float128_c++23.cc: Xfail on all
aarch64-*-* targets.
---
 libstdc++-v3/testsuite/20_util/from_chars/8.cc |2 +-
 .../testsuite/20_util/to_chars/float128_c++23.cc   |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/from_chars/8.cc b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
index bacad89943b5f..e92b64349025e 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/8.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
@@ -17,7 +17,7 @@
 
 // { dg-do run { target c++23 } }
 // { dg-add-options ieee }
-// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* aarch64-*-rtems* } }
+// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-* } }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
index 6cb9cadcd2041..840131c1e5691 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
@@ -19,7 +19,7 @@
 // { dg-require-effective-target ieee_floats }
 // { dg-require-effective-target size32plus }
 // { dg-add-options ieee }
-// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-vxworks* aarch64-*-rtems* } }
+// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-* } }
 
 #include 
 #include 


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH] [libstdc++] [testsuite] require dfprt on some tests

2024-07-11 Thread Alexandre Oliva



On a target that doesn't enable decimal float components in libgcc
(because the libc doesn't define all required FE_* macros), but whose
compiler supports _Decimal* types, the effective target requirement
dfp passes, but several tests won't link because the runtime support
they depend on is missing.  State their dfprt requirement.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
aarch64.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/decimal/binary-arith.cc: Require dfprt.
* testsuite/decimal/comparison.cc: Likewise.
* testsuite/decimal/compound-assignment-memfunc.cc: Likewise.
* testsuite/decimal/compound-assignment.cc: Likewise.
* testsuite/decimal/make-decimal.cc: Likewise.
* testsuite/decimal/pr54036-1.cc: Likewise.
* testsuite/decimal/pr54036-2.cc: Likewise.
* testsuite/decimal/pr54036-3.cc: Likewise.
* testsuite/decimal/unary-arith.cc: Likewise.
---
 libstdc++-v3/testsuite/decimal/binary-arith.cc |2 +-
 libstdc++-v3/testsuite/decimal/comparison.cc   |2 +-
 .../decimal/compound-assignment-memfunc.cc |2 +-
 .../testsuite/decimal/compound-assignment.cc   |2 +-
 libstdc++-v3/testsuite/decimal/make-decimal.cc |2 +-
 libstdc++-v3/testsuite/decimal/pr54036-1.cc|2 +-
 libstdc++-v3/testsuite/decimal/pr54036-2.cc|2 +-
 libstdc++-v3/testsuite/decimal/pr54036-3.cc|2 +-
 libstdc++-v3/testsuite/decimal/unary-arith.cc  |2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/testsuite/decimal/binary-arith.cc b/libstdc++-v3/testsuite/decimal/binary-arith.cc
index c10a8b6466cb0..3eeed7ea97501 100644
--- a/libstdc++-v3/testsuite/decimal/binary-arith.cc
+++ b/libstdc++-v3/testsuite/decimal/binary-arith.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 
 // ISO/IEC TR 24733  3.2.8  Binary arithmetic operators.
 
diff --git a/libstdc++-v3/testsuite/decimal/comparison.cc b/libstdc++-v3/testsuite/decimal/comparison.cc
index cf34c8d74badc..424dd8bd26659 100644
--- a/libstdc++-v3/testsuite/decimal/comparison.cc
+++ b/libstdc++-v3/testsuite/decimal/comparison.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 
 // ISO/IEC TR 24733  3.2.9  Comparison operators.
 
diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
index 817d4bb10b1e9..d520af9a68d49 100644
--- a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
+++ b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 
 // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
 // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment.cc b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
index 2d3e325856988..5aa87e78a739a 100644
--- a/libstdc++-v3/testsuite/decimal/compound-assignment.cc
+++ b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 
 // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
 // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
diff --git a/libstdc++-v3/testsuite/decimal/make-decimal.cc b/libstdc++-v3/testsuite/decimal/make-decimal.cc
index aa75ac89d4792..560196cb305e1 100644
--- a/libstdc++-v3/testsuite/decimal/make-decimal.cc
+++ b/libstdc++-v3/testsuite/decimal/make-decimal.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 // { dg-options "-Wno-pedantic" }
 
 // ISO/IEC TR 24733  3.2.5  Initialization from coefficient and exponent.
diff --git a/libstdc++-v3/testsuite/decimal/pr54036-1.cc b/libstdc++-v3/testsuite/decimal/pr54036-1.cc
index 508738701ca01..a07e4c351651c 100644
--- a/libstdc++-v3/testsuite/decimal/pr54036-1.cc
+++ b/libstdc++-v3/testsuite/decimal/pr54036-1.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-require-effective-target dfp }
+// { dg-require-effective-target dfprt }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/decimal/pr54036-2.cc b/libstdc++-v3/testsuite/decimal/pr54036-2.cc
index cb9e8c5932adb..e0a5797a25d49 100644
--- a/libstdc++-v3/testsuite

[PATCH] [libstdc++] [testsuite] avoid arbitrary errno codes

2024-07-11 Thread Alexandre Oliva



Passing an arbitrary error number to strerror* functions doesn't seem
to be portable; 19_diagnostics/system_error/cons-1.cc is hitting
runtime errors in the block that attempts to instantiate a
std::system_error for error number 95.  The range of errno macros
defined on this target doesn't reach 95.

I'm changing the test to try to use a couple of select error codes,
falling back to a lower error number if neither is present.
Hopefully this doesn't change the nature of what is being tested for.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
aarch64.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/19_diagnostics/system_error/cons-1.cc: Use lower
error numbers, preferring some macro-defined ones.
---
 .../19_diagnostics/system_error/cons-1.cc  |   14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc b/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
index 16aa960b2ee28..e227c67542411 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
@@ -37,8 +37,18 @@ int main()
 
   // 2
   {
-std::system_error err2(95, std::system_category(), s);
-VERIFY( err2.code() == std::error_code(95, std::system_category()) );
+int eno;
+#if defined EOPNOTSUPP
+eno = EOPNOTSUPP;
+#elif defined ENOSYS
+eno = ENOSYS;
+#else
+// strerror (used to combine with the given message) may fail if
+// the error number is out of range for the system.
+eno = 42;
+#endif
+std::system_error err2(eno, std::system_category(), s);
+VERIFY( err2.code() == std::error_code(eno, std::system_category()) );
 VERIFY( std::string((err2.what(), s)).find(s) != std::string::npos );
   }
 

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive
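
The error-number selection in the patch above can be tried outside the testsuite with a sketch along these lines. `pick_errno` and `roundtrips` are hypothetical helper names introduced for illustration; the fallback value 42 is taken from the patch, and the check on `what()` relies on the standard's requirement that the message incorporates the supplied string.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Prefer an error number the target is known to define; fall back to a
// small value, since strerror may fail for out-of-range numbers.
int pick_errno() {
#if defined EOPNOTSUPP
  return EOPNOTSUPP;
#elif defined ENOSYS
  return ENOSYS;
#else
  return 42;
#endif
}

// Builds a system_error from the chosen number and checks that both the
// error code and the caller's message survive the round trip.
bool roundtrips(const std::string &msg) {
  const int eno = pick_errno();
  std::system_error err(eno, std::system_category(), msg);
  return err.code() == std::error_code(eno, std::system_category())
         && std::string(err.what()).find(msg) != std::string::npos;
}
```

Calling `roundtrips("a test")` exercises the same two properties the modified test block verifies, without hard-coding 95.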

RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

2024-07-11 Thread Tamar Christina

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 11, 2024 1:10 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during 
> candidate
> selection [PR114932]
> 
> On Wed, 10 Jul 2024, Tamar Christina wrote:
> 
> > > > > I might also point back to the idea I threw in somewhere, adding
> > > > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > > > operand_equal_p.  You mentioned hashing IIRC but I don't see the
> > > > > patches touching hashing?
> > > > >
> > > >
> > > > Yes, that can indeed be done with this approach.  The hashing was that
> > > > before I was trying to prevent the "duplicate" IV expressions from being
> > > > created in the first place by modifying get_loop_invariant_expr.
> > > >
> > > > This function looks up if we have already seen a particular IV
> > > > expression and if we have it just returns that expression.  However after
> > > > reading more of the code I realized this wasn't the right approach, as
> > > > without also dealing with the candidates we'd end up creating IV
> > > > expressions that can't be handled by any candidate.
> > > >
> > > > IVopts would just give up then.  Reading the code it seems that
> > > > get_loop_invariant_expr is just there to prevent blatant duplicates,
> > > > i.e. it treats `(signed) a` and `a` as the same.
> > > >
> > > > This is also why I think that everywhere else *has* to continue
> > > > stripping the expression.
> > > >
> > > > A note from Richard S that he thought IVopts already had some code to
> > > > deal with expressions that differ only in signs led me to take a
> > > > different approach.
> > > >
> > > > The goal wasn't to remove the different signed/unsigned IV expressions,
> > > > but instead to get them to be servable by the same candidates, i.e. we
> > > > want them in the same candidate groups and then candidate optimization
> > > > will just do its thing.
> > > >
> > > > That seemed a more natural fit to how it worked.
> > >
> > > Yeah, I agree that sounds like the better strategy.
> > >
> > > > Do you want me to try the operand_equal_p approach? Though in this case
> > > > the issue is we not only need to know they're equal, but also need to
> > > > know the scale factor.
> > >
> > > For this case yes, but if you'd keep the code as-is, the equal with scale
> > > factor one case would be fixed.  Not a case with different scale factors
> > > though - but conversions "elsewhere" should be handled via the stripping.
> > > So it would work to simply adjust the operand_equal_p check here?
> > >
> > > > get_computation_aff_1 scales the common type IV by the scale we
> > > > determined, so I think operand_equal_p would not be very useful here.
> > > > But it does look like constant_multiple_of can just be implemented with
> > > > aff_combination_constant_multiple_p.
> > > >
> > > > Should I try?
> > >
> > > You've had the other places where you replace operand_equal_p with
> > > affine-compute and compare.  As said that has some associated cost
> > > as well as a limit on the number of elements after which it resorts
> > > back to operand_equal_p.  So for strict equality tests implementing
> > > a weaker operand_equal_p might be a better solution.
> > >
> >
> > The structural comparison is implemented as a new mode for operand_equal_p
> > which compares two expressions ignoring NOP converts (unless their bitsizes
> > differ) and ignoring constant values, but requiring both operands be a
> > constant.
> >
> > There is one downside compared to affine comparison, in that this approach
> > does not deal well with commutative operations, i.e. it does not see
> > a + (b + c) as equivalent to c + (b + a).
> >
> > This means we lose out on some of the more complicated addressing modes, but
> > with so many operations the address will likely be split anyway and we'll
> > deal with it then.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/114932
> > * fold-const.cc (operand_compare::operand_equal_p): Use it.
> > (operand_compare::verify_hash_value): Likewise.
> > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/114932
> > * gfortran.dg/addressing-modes_2.f90: New test.
> >
> > -- inline copy of --
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
>
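
The structural comparison described in the quoted message can be illustrated with a toy expression tree. This is a sketch under stated assumptions, not GCC's operand_equal_p: `Expr`, `strip_nops`, `structural_eq` and the builder helpers are hypothetical names, and the rules encoded are only the ones the message states (same-width conversions are skipped, constant values are ignored but must pair with constants, operand order matters).

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Toy expression node; not a GCC tree.
struct Expr {
  enum Kind { Var, Const, Add, Mul, Nop };
  Kind kind = Var;
  std::string name;  // used by Var
  long value = 0;    // used by Const
  int bits = 64;     // width of the value this node produces
  ExprPtr lhs, rhs;  // operands; Nop uses lhs only
};

ExprPtr var(const std::string &name, int bits = 64) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Var; e->name = name; e->bits = bits;
  return e;
}
ExprPtr cst(long value, int bits = 64) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Const; e->value = value; e->bits = bits;
  return e;
}
ExprPtr binary(Expr::Kind kind, const ExprPtr &a, const ExprPtr &b) {
  auto e = std::make_shared<Expr>();
  e->kind = kind; e->bits = a->bits; e->lhs = a; e->rhs = b;
  return e;
}
ExprPtr add(const ExprPtr &a, const ExprPtr &b) { return binary(Expr::Add, a, b); }
ExprPtr mul(const ExprPtr &a, const ExprPtr &b) { return binary(Expr::Mul, a, b); }
// Conversion of `a` to a type that is `bits` wide.
ExprPtr nop(const ExprPtr &a, int bits) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Nop; e->bits = bits; e->lhs = a;
  return e;
}

// Skip conversions that keep the bit size (e.g. signed <-> unsigned).
const Expr *strip_nops(const Expr *e) {
  while (e->kind == Expr::Nop && e->bits == e->lhs->bits)
    e = e->lhs.get();
  return e;
}

// Same shape, same variables; constants match any constant.
// Operand order is significant, so commuted operands do not compare equal.
bool structural_eq(const Expr *a, const Expr *b) {
  a = strip_nops(a);
  b = strip_nops(b);
  if (a->kind != b->kind)
    return false;
  switch (a->kind) {
  case Expr::Var:
    return a->name == b->name;
  case Expr::Const:
    return true;  // value deliberately ignored
  case Expr::Nop:  // width-changing conversion that survived stripping
    return a->bits == b->bits && structural_eq(a->lhs.get(), b->lhs.get());
  case Expr::Add:
  case Expr::Mul:
    return structural_eq(a->lhs.get(), b->lhs.get())
           && structural_eq(a->rhs.get(), b->rhs.get());
  }
  return false;
}
```

Under this model `i * 4` and `(unsigned) i * 8` compare equal, while `a + (b + c)` and `c + (b + a)` do not, which is the commutativity limitation the message mentions.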

[PATCH] [analyzer] [testsuite] avoid unexpected null dereference warning

2024-07-11 Thread Alexandre Oliva



The analyzer testsuite, on a customer's own operating system, reports
a potential NULL pointer dereference in flex-without-call-summaries.c.
I'm not sure why it shows up on that system, but not on others, but
the test is not meant to test for that warning, so I'm silencing it.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
aarch64.  Ok to install?


for  gcc/testsuite/ChangeLog

* c-c++-common/analyzer/flex-without-call-summaries.c: Disable
null dereference analyzer warnings.
---
 .../analyzer/flex-without-call-summaries.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
index c6ecb25d25d59..1aad2bc896b7e 100644
--- a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
+++ b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "-fno-analyzer-call-summaries" } */
 
+/* { dg-additional-options "-Wno-analyzer-null-dereference" } */
 /* { dg-additional-options "-Wno-analyzer-too-complex" } */
 /* { dg-additional-options "-D_POSIX_SOURCE" } */
 

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [Fortran, Patch, PR 96992, V4] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-11 Thread Andre Vehreschild

Hi Richard,

bootstrap finally passed and the fix is now merged as
gcc-15-1971-gb9513c6746b

Thanks for your patience.

- Andre


On Thu, 11 Jul 2024 14:01:02 +0200
Richard Biener  wrote:

> On Thu, Jul 11, 2024 at 11:24 AM Andre Vehreschild 
> wrote:
> >
> > Hi Richard,
> >
> > would that be sufficient? Bootstrap is still running for me...  
> 
> Yes.
> 
> Richard.
> 
> > From c30c2cf829a094ba5e4c2c31333bed6e8c0d32af Mon Sep 17 00:00:00
> > 2001 From: Andre Vehreschild 
> > Date: Thu, 11 Jul 2024 11:21:04 +0200
> > Subject: [PATCH] [Fortran] Fix bootstrap broken by
> > gcc-15-1965-ge4f2f46e015
> >
> > gcc/fortran/ChangeLog:
> >
> > * trans-array.cc (gfc_conv_array_parameter): Init variable
> > to NULL_TREE to fix bootstrap.
> > ---
> >  gcc/fortran/trans-array.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
> > index 0fffa07495c..5558ab69969 100644
> > --- a/gcc/fortran/trans-array.cc
> > +++ b/gcc/fortran/trans-array.cc
> > @@ -8750,7 +8750,7 @@ gfc_conv_array_parameter (gfc_se *se,
> > gfc_expr *expr, bool g77, tree stmt;
> >tree parent = DECL_CONTEXT (current_function_decl);
> >tree ctree;
> > -  tree pack_attr;
> > +  tree pack_attr = NULL_TREE; /* Set when packing class arrays.  */
> >bool full_array_var;
> >bool this_array_result;
> >bool contiguous;
> > --
> > 2.45.2
> >
> > Sorry for the breakage.
> >
> > Regards,
> > Andre
> >
> > On Thu, 11 Jul 2024 11:06:47 +0200
> > Richard Biener  wrote:
> >  
> > > On Thu, Jul 11, 2024 at 11:04 AM Andre Vehreschild 
> > > wrote:  
> > > >
> > > > Hi Richard,
> > > >
> > > > I am sorry to hear that. Shall I revert?  
> > >
> > > I would suggest to instead fix by initializing the variable with
> > > NULL (and a comment).
> > >  
> > > > - Andre
> > > > On Thu, 11 Jul 2024 10:57:48 +0200
> > > > Richard Biener  wrote:
> > > >  
> > > > > On Thu, Jul 11, 2024 at 10:54 AM Richard Biener
> > > > >  wrote:  
> > > > > >
> > > > > > On Thu, Jul 11, 2024 at 10:04 AM Andre Vehreschild
> > > > > >  wrote:  
> > > > > > >
> > > > > > > Hi Harald,
> > > > > > >
> > > > > > > thank you very much for ok'ing this large patch. Merged as
> > > > > > > gcc-15-1965-ge4f2f46e015
> > > > > > >
> > > > > > > Looking forward to get (no) bug reports ;-)  
> > > > > >
> > > > > > This seems to break bootstrap with
> > > > > >
> > > > > > > ../../gcc/gcc/fortran/trans-array.cc: In function ‘void
> > > > > > > gfc_conv_array_paramete (gfc_se*, gfc_expr*, bool, const
> > > > > > > gfc_symbol*, const char*, tree_node**, tree_node**,
> > > > > > > tree_node**)’:
> > > > > > > ../../gcc/gcc/fortran/trans-array.cc:9135:41: error: ‘pack_attr’
> > > > > > > may be used uninitialized [-Werror=maybe-uninitialized]
> > > > > > >  9135 |   tmp = build_call_expr_loc (input_location,
> > > > > > >  9136 |     gfor_fndecl_in_unpack_class, 4, tmp,
> > > > > > >  9137 |     packedptr,
> > > > > > >  9138 |     size_in_bytes (TREE_TYPE (ctree)),
> > > > > > >  9139 |     pack_attr);
> > > > > > > ../../gcc/gcc/fortran/trans-array.cc:8665:8: note:
> > > > > > > ‘pack_attr’ was declared here
> > > > > > >  8665 |   tree pack_attr;
> > > > > > > cc1plus: all warnings being treated as errors
> > > > > > > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> > > > > > make[3]: *** [Makefile:1198: fortran/trans-array.o] Error 1
> > > > > >  
> > > > >
> > > > > It seems to be a false positive but GCCs little mind is too
> > > > > weak to prove that (yes, we error on the side of emitting a
> > > > > diagnostic if we can't prove it's initialized)
> > > > >
> > > > > Richard.
> > > > >  
> > > > > >  
> > > > > > > Thanks again,
> > > > > > >
> > > > > > > Andre
> > > > > > >
> > > > > > > On Wed, 10 Jul 2024 20:52:37 +0200
> > > > > > > Harald Anlauf  wrote:
> > > > > > >  
> > > > > > > > Hi Andre,
> > > > > > > >
> > > > > > > > Am 10.07.24 um 10:45 schrieb Andre Vehreschild:  
> > > > > > > > > Hi Harald,
> > > > > > > > >
> > > > > > > > > thanks for the review. I totally agree, that this
> > > > > > > > > patch has gotten bigger than I expected (and wanted).
> > > > > > > > > But things are as they are.
> > > > > > > > >
> > > > > > > > > About the coding style: I have worked in so many
> > > > > > > > > projects, that I consider a consistent coding style
> > > > > > > > > luxury. I esp. do not have my own one anymore. The
> > > > > > > > > formating you are seeing in my patches is the result
> > > > > > > > > of clang-format with the provided parameter file in
> > > > > > > > > contrib/clang-format. I was happy to have a tool to
> > > > >

Re: [PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-11 Thread Richard Sandiford

Kyrylo Tkachov  writes:
> Hi Victor,
>
>> On 10 Jul 2024, at 16:05, Victor Do Nascimento  
>> wrote:
>> 
>> Given recent changes to the dot_prod standard pattern name, this patch
>> fixes the aarch64 back-end by implementing the following changes:
>> 
>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
>> 2. Rewrite initialization and function expansion mechanism for simd
>> builtins.
>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
>> builtins.
>> 
>> Finally, given that it is now possible for the compiler to
>> differentiate between the two- and four-way dot product, we add a test
>> to ensure that autovectorization picks up on dot-product patterns
>> where the result is twice the width of the operands.
>> 
>> gcc/ChangeLog:
>> 
>>* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>>New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
>>UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
>>(aarch64_init_builtin_dotprod_functions): New.
>>(aarch64_init_simd_builtins): Add call to
>>`aarch64_init_builtin_dotprod_functions'.
>>(aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
>>handling.
>>* config/aarch64/aarch64-simd-builtins.def: Remove macro
>>expansion-based initialization and expansion
>>of (u|s|us)dot_prod builtins.
>>* config/aarch64/aarch64-simd.md
>>(dot_prod): Deleted.
>>(dot_prod): New.
>>(usdot_prod): Deleted.
>>(usdot_prod): New.
>>(sadv16qi): Adjust call to gen_udot_prod take second mode.
>>(popcount): fix use of `udot_prod_optab'.
>>* config/aarch64/aarch64-sve-builtins-base.cc
>>(svdot_impl::expand): s/direct/convert/ in
>>`convert_optab_handler_for_sign' function call.
>>(svusdot_impl::expand): add second mode argument in call to
>>`code_for_dot_prod'.
>>* config/aarch64/aarch64-sve-builtins.cc
>>(function_expander::convert_optab_handler_for_sign): New class
>>method.
>>* config/aarch64/aarch64-sve-builtins.h
>>(class function_expander): Add prototype for new
>>`convert_optab_handler_for_sign' method.
>>* gcc/config/aarch64/aarch64-sve.md
>>(dot_prod): Deleted.
>>(dot_prod): New.
>>(@dot_prod): Deleted.
>>(@dot_prod): New.
>>(sad): Adjust call to gen_udot_prod take second mode.
>>* gcc/config/aarch64/aarch64-sve2.md
>>(@aarch64_sve_dotvnx4sivnx8hi): Deleted.
>>(dot_prodvnx4sivnx8hi): New.
>> 
>> gcc/testsuite/ChangeLog:
>>* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
>> ---
>> gcc/config/aarch64/aarch64-builtins.cc| 71 +++
>> gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
>> gcc/config/aarch64/aarch64-simd.md|  9 +--
>> .../aarch64/aarch64-sve-builtins-base.cc  | 13 ++--
>> gcc/config/aarch64/aarch64-sve-builtins.cc| 17 +
>> gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
>> gcc/config/aarch64/aarch64-sve.md |  6 +-
>> gcc/config/aarch64/aarch64-sve2.md|  2 +-
>> gcc/config/aarch64/iterators.md   |  1 +
>> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++
>> 10 files changed, 133 insertions(+), 18 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
>> index 30669f8aa18..6c7c86d0e6e 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>> @@ -783,6 +783,12 @@ enum aarch64_builtins
>>   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
>>   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
>>  + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
>> +  AARCH64_BUILTIN_SDOTV8QI,
>> +  AARCH64_BUILTIN_SDOTV16QI,
>> +  AARCH64_BUILTIN_UDOTV8QI,
>> +  AARCH64_BUILTIN_UDOTV16QI,
>> +  AARCH64_BUILTIN_USDOTV8QI,
>> +  AARCH64_BUILTIN_USDOTV16QI,
>>   AARCH64_CRC32_BUILTIN_BASE,
>>   AARCH64_CRC32_BUILTINS
>>   AARCH64_CRC32_BUILTIN_MAX,
>> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
>>   aarch64_init_simd_intrinsics ();
>> }
>> 
>> +void
>> +aarch64_init_builtin_dotprod_functions (void)
>> +{
>> +  tree fndecl = NULL;
>> +  tree ftype = NULL;
>> +
>> +  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
>> +  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
>> +  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
>> +  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
>> +  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
>> +  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
>> +  tree uv4si = aarch64_simd_built
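
To make the two-way versus four-way distinction from the quoted patch description concrete, here is a scalar sketch of both reductions for unsigned inputs. The function names `udot4` and `udot2` are chosen for illustration only (the latter echoes the name of the new test function, whose body is not shown in this message), and the code models the arithmetic, not any particular instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Four-way dot product: each 32-bit accumulator lane receives the sum of
// four products of 8-bit elements (result is four times the operand width).
std::vector<std::uint32_t> udot4(std::vector<std::uint32_t> acc,
                                 const std::vector<std::uint8_t> &a,
                                 const std::vector<std::uint8_t> &b) {
  for (std::size_t i = 0; i < acc.size(); ++i)
    for (std::size_t j = 0; j < 4; ++j)
      acc[i] += std::uint32_t(a[4 * i + j]) * std::uint32_t(b[4 * i + j]);
  return acc;
}

// Two-way dot product: each 32-bit accumulator lane receives the sum of
// two products of 16-bit elements (result is twice the operand width).
std::vector<std::uint32_t> udot2(std::vector<std::uint32_t> acc,
                                 const std::vector<std::uint16_t> &a,
                                 const std::vector<std::uint16_t> &b) {
  for (std::size_t i = 0; i < acc.size(); ++i)
    for (std::size_t j = 0; j < 2; ++j)
      acc[i] += std::uint32_t(a[2 * i + j]) * std::uint32_t(b[2 * i + j]);
  return acc;
}
```

Both functions produce the same accumulator type from different input types, which is why a single-mode pattern name could not tell them apart and a second mode was added.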

[Patch, Fortran, PR88624, v1] Fix Rejects allocatable coarray passed as a dummy argument

2024-07-11 Thread Andre Vehreschild

Hi all,

the attached patch fixes the use of coarrays as dummy arguments. The coarray
dummy argument was not dereferenced correctly, which is fixed now.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gcc dot gnu dot org
From 374ab1eec7621136de2d9f642b8abf13de197a41 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 10:07:12 +0200
Subject: [PATCH] [Fortran] Fix Rejects allocatable coarray passed as a dummy
 argument [88624]

Coarray parameters of procedures/functions need to be dereffed, because
they are references to the descriptor but the routine expected the
descriptor directly.

	PR fortran/88624

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Treat
	pointers/references (e.g. from parameters) correctly by derefing
	them.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/dummy_1.f90: Add calling function through
	function.
---
 gcc/fortran/trans-expr.cc | 35 +--
 gcc/testsuite/gfortran.dg/coarray/dummy_1.f90 |  2 ++
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 60495f199dc..0eba029a67a 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7797,16 +7797,26 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		   && CLASS_DATA (fsym)->attr.codimension
 		   && !CLASS_DATA (fsym)->attr.allocatable)))
 	{
-	  tree caf_decl, caf_type;
+	  tree caf_decl, caf_type, caf_desc = NULL_TREE;
 	  tree offset, tmp2;

 	  caf_decl = gfc_get_tree_for_caf_expr (e);
 	  caf_type = TREE_TYPE (caf_decl);
-
-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type)
-	  && (GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_ALLOCATABLE
-		  || GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_POINTER))
-	tmp = gfc_conv_descriptor_token (caf_decl);
+	  if (POINTER_TYPE_P (caf_type)
+	  && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_type)))
+	caf_desc = TREE_TYPE (caf_type);
+	  else if (GFC_DESCRIPTOR_TYPE_P (caf_type))
+	caf_desc = caf_type;
+
+	  if (caf_desc
+	  && (GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_ALLOCATABLE
+		  || GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_POINTER))
+	{
+	  tmp = POINTER_TYPE_P (TREE_TYPE (caf_decl))
+		  ? build_fold_indirect_ref (caf_decl)
+		  : caf_decl;
+	  tmp = gfc_conv_descriptor_token (tmp);
+	}
 	  else if (DECL_LANG_SPECIFIC (caf_decl)
 		   && GFC_DECL_TOKEN (caf_decl) != NULL_TREE)
 	tmp = GFC_DECL_TOKEN (caf_decl);
@@ -7819,8 +7829,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,

 	  vec_safe_push (stringargs, tmp);

-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type)
-	  && GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_ALLOCATABLE)
+	  if (caf_desc
+	  && GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_ALLOCATABLE)
 	offset = build_int_cst (gfc_array_index_type, 0);
 	  else if (DECL_LANG_SPECIFIC (caf_decl)
 		   && GFC_DECL_CAF_OFFSET (caf_decl) != NULL_TREE)
@@ -7830,8 +7840,13 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  else
 	offset = build_int_cst (gfc_array_index_type, 0);

-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type))
-	tmp = gfc_conv_descriptor_data_get (caf_decl);
+	  if (caf_desc)
+	{
+	  tmp = POINTER_TYPE_P (TREE_TYPE (caf_decl))
+		  ? build_fold_indirect_ref (caf_decl)
+		  : caf_decl;
+	  tmp = gfc_conv_descriptor_data_get (tmp);
+	}
 	  else
 	{
 	  gcc_assert (POINTER_TYPE_P (caf_type));
diff --git a/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90 b/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
index 33e95853ad4..c437b2a10fc 100644
--- a/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
+++ b/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
@@ -66,5 +66,7 @@
 if (lcobound(A, dim=1) /= 2) STOP 13
 if (ucobound(A, dim=1) /= 3) STOP 14
 if (lcobound(A, dim=2) /= 5) STOP 15
+
+call sub4(A)  ! Check PR88624 is fixed.
   end subroutine sub5
   end
--
2.45.2

Re: [PATCH v2] RISC-V: NO_WARNING preferred else value for RVV

2024-07-11 Thread Kito Cheng

Lgtm, thanks :)

On Thu, 11 Jul 2024 at 20:45, YunQiang Su wrote:

> From: YunQiang Su 
>
> PR target/115840.
>
> In riscv_preferred_else_value, we create an uninitialized tmp var
> for the else value, instead of the 0 (as default_preferred_else_value)
> or the pre-existing VAR (as aarch64 does), so that we can use the agnostic
> policy.
>
> The problem is that `warn_uninit` will emit a warning:
>   '({anonymous})' may be used uninitialized
>
> Let's mark this tmp var as NO_WARNING.
>
> This problem was found when I tried to build glibc with the V extension.
>
> gcc
> PR target/115840.
> * config/riscv/riscv.cc (riscv_preferred_else_value): Mark
> tmp_var as NO_WARNING.
>
> gcc/testsuite
> * gcc.dg/vect/pr115840.c: New testcase.
> ---
>  gcc/config/riscv/riscv.cc|  6 +-
>  gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115840.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 61fa74e9322..276998a992b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11431,7 +11431,11 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
> tree *ops)
>  {
>if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> -return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> +{
> +  tree tmp_var = create_tmp_var (vectype);
> +  TREE_NO_WARNING (tmp_var) = 1;
> +  return get_or_create_ssa_default_def (cfun, tmp_var);
> +}
>
>return default_preferred_else_value (ifn, vectype, nops, ops);
>  }
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c b/gcc/testsuite/gcc.dg/vect/pr115840.c
> new file mode 100644
> index 000..09dc9e4eb7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall -Werror" } */
> +
> +double loads[16];
> +
> +void
> +foo (double loadavg[], int count)
> +{
> +  for (int i = 0; i < count; i++)
> +loadavg[i] = loads[i] / 1.5;
> +}
> --
> 2.45.1
>
>

[Patch, Fortran, PR84244, v1] Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535

2024-07-11 Thread Andre Vehreschild

Hi all,

the attached patch fixes a segfault in the compiler, where for pointer
components of a derived type the caf_token in the component was not
set, when the derived type was previously used outside of a coarray.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gcc dot gnu dot org
From 88f209316a980fbe78423d6aba747bb6b7fd404f Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 15:44:56 +0200
Subject: [PATCH] [Fortran] Fix ICE in recompute_tree_invariant_for_addr_expr,
 at tree.c:4535 [PR84244]

Declaring an unused function with a derived type having a pointer
component and using that derived type as a coarray led the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.

	PR fortran/84244

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
	generated for a component, link it to the component it is
	generated for (the previous one).

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_5.f08: New test.
---
 gcc/fortran/trans-types.cc|  6 +-
 .../gfortran.dg/coarray/ptr_comp_5.f08| 19 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index c76cdca4eae..83c0708ccbd 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2647,7 +2647,7 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
   tree *chain = NULL;
   bool got_canonical = false;
   bool unlimited_entity = false;
-  gfc_component *c;
+  gfc_component *c, *last_c = nullptr;
   gfc_namespace *ns;
   tree tmp;
   bool coarray_flag, class_coarray_flag;
@@ -2947,10 +2947,14 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
 	 types.  */
   if (class_coarray_flag || !c->backend_decl)
 	c->backend_decl = field;
+  if (c->attr.caf_token && last_c)
+	last_c->caf_token = field;

   if (c->attr.pointer && (c->attr.dimension || c->attr.codimension)
 	  && !(c->ts.type == BT_DERIVED && strcmp (c->name, "_data") == 0))
 	GFC_DECL_PTR_ARRAY_P (c->backend_decl) = 1;
+
+  last_c = c;
 }

   /* Now lay out the derived type, including the fields.  */
diff --git a/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08 b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
new file mode 100644
index 000..ed3a8db13fa
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
@@ -0,0 +1,19 @@
+! { dg-do compile }
+
+! Check PR84244 does not ICE anymore.
+
+program ptr_comp_5
+  integer, target :: dest = 42
+  type t
+integer, pointer :: p
+  end type
+  type(t) :: o[*]
+
+  o%p => dest
+contains
+  ! This unused routine is crucial for the ICE.
+  function f(x)
+type(t), intent(in) ::x
+  end function
+end program
+
--
2.45.2

Re: [PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-11 Thread Kyrylo Tkachov



> On 11 Jul 2024, at 15:41, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Victor,
>> 
>>> On 10 Jul 2024, at 16:05, Victor Do Nascimento 
>>>  wrote:
>>> 
>>> Given recent changes to the dot_prod standard pattern name, this patch
>>> fixes the aarch64 back-end by implementing the following changes:
>>> 
>>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
>>> 2. Rewrite initialization and function expansion mechanism for simd
>>> builtins.
>>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
>>> builtins.
>>> 
>>> Finally, given that it is now possible for the compiler to
>>> differentiate between the two- and four-way dot product, we add a test
>>> to ensure that autovectorization picks up on dot-product patterns
>>> where the result is twice the width of the operands.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>   * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>>>   New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
>>>   UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
>>>   (aarch64_init_builtin_dotprod_functions): New.
>>>   (aarch64_init_simd_builtins): Add call to
>>>   `aarch64_init_builtin_dotprod_functions'.
>>>   (aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
>>>   handling.
>>>   * config/aarch64/aarch64-simd-builtins.def: Remove macro
>>>   expansion-based initialization and expansion
>>>   of (u|s|us)dot_prod builtins.
>>>   * config/aarch64/aarch64-simd.md
>>>   (dot_prod): Deleted.
>>>   (dot_prod): New.
>>>   (usdot_prod): Deleted.
>>>   (usdot_prod): New.
>>>   (sadv16qi): Adjust call to gen_udot_prod take second mode.
>>>   (popcount): fix use of `udot_prod_optab'.
>>>   * config/aarch64/aarch64-sve-builtins-base.cc
>>>   (svdot_impl::expand): s/direct/convert/ in
>>>   `convert_optab_handler_for_sign' function call.
>>>   (svusdot_impl::expand): add second mode argument in call to
>>>   `code_for_dot_prod'.
>>>   * config/aarch64/aarch64-sve-builtins.cc
>>>   (function_expander::convert_optab_handler_for_sign): New class
>>>   method.
>>>   * config/aarch64/aarch64-sve-builtins.h
>>>   (class function_expander): Add prototype for new
>>>   `convert_optab_handler_for_sign' method.
>>>   * gcc/config/aarch64/aarch64-sve.md
>>>   (dot_prod): Deleted.
>>>   (dot_prod): New.
>>>   (@dot_prod): Deleted.
>>>   (@dot_prod): New.
>>>   (sad): Adjust call to gen_udot_prod take second mode.
>>>   * gcc/config/aarch64/aarch64-sve2.md
>>>   (@aarch64_sve_dotvnx4sivnx8hi): Deleted.
>>>   (dot_prodvnx4sivnx8hi): New.
>>> 
>>> gcc/testsuite/ChangeLog:
>>>   * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
>>> ---
>>> gcc/config/aarch64/aarch64-builtins.cc| 71 +++
>>> gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
>>> gcc/config/aarch64/aarch64-simd.md|  9 +--
>>> .../aarch64/aarch64-sve-builtins-base.cc  | 13 ++--
>>> gcc/config/aarch64/aarch64-sve-builtins.cc| 17 +
>>> gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
>>> gcc/config/aarch64/aarch64-sve.md |  6 +-
>>> gcc/config/aarch64/aarch64-sve2.md|  2 +-
>>> gcc/config/aarch64/iterators.md   |  1 +
>>> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++
>>> 10 files changed, 133 insertions(+), 18 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
>>> index 30669f8aa18..6c7c86d0e6e 100644
>>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>>> @@ -783,6 +783,12 @@ enum aarch64_builtins
>>>  AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
>>>  AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
>>> + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
>>> +  AARCH64_BUILTIN_SDOTV8QI,
>>> +  AARCH64_BUILTIN_SDOTV16QI,
>>> +  AARCH64_BUILTIN_UDOTV8QI,
>>> +  AARCH64_BUILTIN_UDOTV16QI,
>>> +  AARCH64_BUILTIN_USDOTV8QI,
>>> +  AARCH64_BUILTIN_USDOTV16QI,
>>>  AARCH64_CRC32_BUILTIN_BASE,
>>>  AARCH64_CRC32_BUILTINS
>>>  AARCH64_CRC32_BUILTIN_MAX,
>>> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
>>>  aarch64_init_simd_intrinsics ();
>>> }
>>> 
>>> +void
>>> +aarch64_init_builtin_dotprod_functions (void)
>>> +{
>>> +  tree fndecl = NULL;
>>> +  tree ftype = NULL;
>>> +
>>> +  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
>>> +  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
>>> +  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
>>> +  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifi

Re: [PATCH 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-07-11 Thread Richard Sandiford

Victor Do Nascimento  writes:
> Given the specification in the GCC internals manual defines the
> {u|s}dot_prod standard name as taking "two signed elements of the
> same mode, adding them to a third operand of wider mode", there is
> currently ambiguity in the relationship between the mode of the first
> two arguments and that of the third.
>
> This vagueness means that, in theory, different modes may be
> supportable in the third argument.  This flexibility would allow for a
> given backend to add to the accumulator a different number of
> vectorized products, e.g. a backend may provide instructions for both:
>
>   accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
>
> and
>
>   accum += a[0] * b[0] + a[1] * b[1],
>
> as is now seen in the SVE2.1 extension to AArch64.  In spite of the
> aforementioned flexibility, modeling the dot-product operation as a
> direct optab means that we have no way to encode both input and the
> accumulator data modes into the backend pattern name, which prevents
> us from harnessing this flexibility.
>
> We therefore make all dot_prod optabs conversions, allowing, for
> example, for the encoding of both 2-way and 4-way dot product backend
> patterns.
>
> gcc/ChangeLog:
>
>   * optabs.def (sdot_prod_optab): Convert from OPTAB_D to
>   OPTAB_CD.
>   (udot_prod_optab): Likewise.
>   (usdot_prod_optab): Likewise.
>   * doc/md.texi (Standard Names): update entries for u,s and us
>   dot_prod names.
> ---
>  gcc/doc/md.texi | 18 +-
>  gcc/optabs.def  |  6 +++---
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 7f4335e0aac..2a74e473f05 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5748,15 +5748,15 @@ for (i = 0; i < LEN + BIAS; i++)
>  operand0 += operand2[i];
>  @end smallexample
>  
> -@cindex @code{sdot_prod@var{m}} instruction pattern
> -@item @samp{sdot_prod@var{m}}
> +@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{sdot_prod@var{m}@var{n}}
>  
>  Compute the sum of the products of two signed elements.
>  Operand 1 and operand 2 are of the same mode. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.

Now that we can put names to both modes, how about replacing the
description with something like this:

Multiply operand 1 by operand 2 without loss of precision, given that
both operands contain signed elements.  Add each product to the overlapping
element of operand 3 and store the result in operand 0.  Operands 0 and 3
have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
having narrower elements than @var{m}.

This is all personal taste though, so it's just a suggestion.

Same idea for the others.
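
To make the wording concrete, here is a rough scalar model of the signed
case in C++.  It is only a sketch under stated assumptions: sdot_model,
Ways and the array shapes are hypothetical names for illustration and
are not part of GCC.  Ways is the number of narrow elements that overlap
one wide element, e.g. 4 for QI inputs with an SI accumulator and 2 for
HI inputs with an SI accumulator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a signed dot product.  Each narrow
// element pair is widened before multiplying, so the product loses no
// precision, and is then added to the wide element it overlaps.
template <std::size_t Ways, typename Wide, typename Narrow,
	  std::size_t Lanes, std::size_t N>
std::array<Wide, Lanes>
sdot_model (const std::array<Wide, Lanes> &acc,
	    const std::array<Narrow, N> &a,
	    const std::array<Narrow, N> &b)
{
  static_assert (N == Lanes * Ways,
		 "narrow inputs must cover the accumulator exactly");
  std::array<Wide, Lanes> res = acc;
  for (std::size_t i = 0; i < Lanes; i++)
    for (std::size_t j = 0; j < Ways; j++)
      {
	std::size_t k = i * Ways + j;
	res[i] += static_cast<Wide> (a[k]) * static_cast<Wide> (b[k]);
      }
  return res;
}
```

Calling sdot_model<4> with int8_t inputs models the four-way form and
sdot_model<2> with int16_t inputs models the two-way form; both use the
same int32_t accumulator type, which is why the pattern name needs both
modes to tell them apart.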

OK with that change from my POV, but happy to hear other suggestions.

Thanks,
Richard

>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5766,15 +5766,15 @@ sdot 
> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{udot_prod@var{m}} instruction pattern
> -@item @samp{udot_prod@var{m}}
> +@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{udot_prod@var{m}@var{n}}
>  
>  Compute the sum of the products of two unsigned elements.
>  Operand 1 and operand 2 are of the same mode. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5784,14 +5784,14 @@ udot unsigned op3> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{usdot_prod@var{m}} instruction pattern
> -@item @samp{usdot_prod@var{m}}
> +@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{usdot_prod@var{m}@var{n}}
>  Compute the sum of the products of elements of different signs.
>  Operand 1 must be unsigned and operand 2 signed. Their
>  product, which is of a wider mode, is computed and added to operand 3.
>  Operand 3 is of a mode equal or wider than the mode of the product. The
>  result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 4

Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-11 Thread Andrew MacLeod

I have no issues with any of the range-op work in this patch series,
but I am unequipped to review the floating point aspects of any of the
patches.

We just need someone to sign off that this properly reflects those builtins.

Andrew


On 7/11/24 03:32, HAO CHEN GUI wrote:

Hi,
   The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on such targets, as
it calls builtin_isinf.

   This patch fixes the problem by adding a range op for builtin isinf. It
also fixes the issue in PR114678.
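
   As a rough illustration of the folding rule, here is a toy model in
C++. It is a sketch only: toy_range, tri and fold_isinf are hypothetical
names and do not use the frange API. The result is 1 when the operand is
known to be an infinity, 0 when it is known to be NaN or neither bound is
an infinity, and varying otherwise.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy stand-in for a floating-point range; this is not GCC's frange.
struct toy_range
{
  double lo, hi;	// Inclusive bounds, lo <= hi.
  bool maybe_nan;	// The value may also be NaN.
  bool only_nan;	// The value is known to be NaN (bounds unused).
};

enum class tri { zero, one, varying };

// Fold isinf over a toy range.
tri
fold_isinf (const toy_range &r)
{
  if (r.only_nan)
    return tri::zero;		// NaN is not an infinity.
  // Known infinity: both bounds are the same infinity, NaN excluded.
  if (!r.maybe_nan && std::isinf (r.lo) && r.lo == r.hi)
    return tri::one;
  // No infinite bound: the value is finite or NaN, so isinf is 0.
  if (!std::isinf (r.lo) && !std::isinf (r.hi))
    return tri::zero;
  return tri::varying;
}
```

   For example, a range of [0, +INF] folds to varying, while [-1, 1]
with a possible NaN still folds to 0.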

   Compared with the previous version, the main change is to remove xfail
for s390 in range-sincos.c and vrp-float-abs-1.c.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds range op for builtin isinf.

gcc/
PR target/114678
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
PR target/114678
* gcc.dg/tree-ssa/range-isinf.c: New test.
* gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a80b93cf063..24559951dd6 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1153,6 +1153,63 @@ private:
bool m_is_pos;
  } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

  // Implement range operator for CFN_BUILT_IN_
  class cfn_parity : public range_operator
@@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call ()
m_operator = &op_cfn_signbit;
break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
  CASE_CFN_COPYSIGN_ALL:
m_op1 = gimple_call_arg (call, 0);
m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-

Re: [PATCH] [libstdc++] [testsuite] avoid arbitrary errno codes

2024-07-11 Thread Jonathan Wakely

On Thu, 11 Jul 2024 at 14:23, Alexandre Oliva  wrote:
>
>
> Passing an arbitrary error number to strerror* functions doesn't seem
> to be portable; 19_diagnostics/system_error/cons-1.cc is hitting
> runtime errors in the block that attempts to instantiate a
> std::system_error for error number 95.  The range of errno macros
> defined on this target doesn't reach 95.

The C standard is clear:

"Typically, the values for errnum come from errno, but strerror shall
map any value of type int to a message."

And std::system_error doesn't limit the values that can be passed to
it either. So I'd prefer to keep testing with arbitrary int values,
because that *should* work.
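
As a minimal sketch of the property I'd like the test to keep checking
(round_trips is a hypothetical helper name, not something in the
testsuite), any int should survive the trip through std::system_error
along with the caller's message:

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical helper: true if a system_error built from an arbitrary
// int keeps both the error code and the caller-supplied message.
bool
round_trips (int errnum, const std::string &msg)
{
  std::system_error err (errnum, std::system_category (), msg);
  return err.code () == std::error_code (errnum, std::system_category ())
	 && std::string (err.what ()).find (msg) != std::string::npos;
}
```

This assumes a libc whose strerror copes with values outside the range
of its errno macros, which is exactly the behaviour in question on the
failing target.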


>
> I'm changing the test to try to use a couple of select error codes,
> falling back to a lower error number if neither are present.
> Hopefully this doesn't change the nature of what is being tested for.

I think it does.

>
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
> aarch64.  Ok to install?
>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/19_diagnostics/system_error/cons-1.cc: Use lower
> error numbers, preferring some macro-defined ones.
> ---
>  .../19_diagnostics/system_error/cons-1.cc  |   14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc b/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
> index 16aa960b2ee28..e227c67542411 100644
> --- a/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
> +++ b/libstdc++-v3/testsuite/19_diagnostics/system_error/cons-1.cc
> @@ -37,8 +37,18 @@ int main()
>
>// 2
>{
> -std::system_error err2(95, std::system_category(), s);
> -VERIFY( err2.code() == std::error_code(95, std::system_category()) );
> +int eno;
> +#if defined EOPNOTSUPP
> +eno = EOPNOTSUPP;
> +#elif defined ENOSYS
> +eno = ENOSYS;
> +#else
> +// strerror (used to combine with the given message) may fail if
> +// the error number is out of range for the system.
> +eno = 42;
> +#endif
> +std::system_error err2(eno, std::system_category(), s);
> +VERIFY( err2.code() == std::error_code(eno, std::system_category()) );
>  VERIFY( std::string((err2.what(), s)).find(s) != std::string::npos );
>}
>
>
> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
>

[PATCH] s390: Align cjump_64 and icjump_64

2024-07-11 Thread Stefan Schulze Frielinghaus

During machine reorg we optimize backward jumps and transform insns as
e.g.

(jump_insn 118 117 119 (set (pc)
(if_then_else (ne (reg:CCRAW 33 %cc)
(const_int 8 [0x8]))
(label_ref 134)
(pc))) "dec_math_1.f90":204:8 discrim 1 2161 {*cjump_64}
 (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
(int_list:REG_BR_PROB 719407028 (nil)))
 -> 134)

into

(jump_insn 118 117 432 (set (pc)
(if_then_else (ne (reg:CCRAW 33 %cc)
(const_int 8 [0x8]))
(pc)
(label_ref 433))) "dec_math_1.f90":204:8 discrim 1 -1
 (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
(int_list:REG_BR_PROB 719407028 (nil)))
 -> 433)

The latter is not recognized anymore since *icjump_64 only matches
CC_REGNUM against zero.  Fixed by aligning *cjump_64 and *icjump_64.

gcc/ChangeLog:

* config/s390/s390.md (*icjump_64): Allow raw CC comparisons,
i.e., any constant integer between 0 and 15 for CC comparisons.
---
 Bootstrap and regtest are still running.  Assuming no regressions, ok
 for {mainline,11,12,13,14}?  Would be great to see this in 14.2 RC :)

 gcc/config/s390/s390.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index f5d7003dfad..d3931b09417 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -9556,7 +9556,8 @@
 (define_insn "*icjump_64"
   [(set (pc)
 (if_then_else
-  (match_operator 1 "s390_comparison" [(reg CC_REGNUM) (const_int 0)])
+  (match_operator 1 "s390_comparison" [(reg CC_REGNUM)
+  (match_operand 2 "const_int_operand" "")])
   (pc)
   (label_ref (match_operand 0 "" ""))))]
   ""
-- 
2.45.2

Re: [PATCH] [libstdc++] [testsuite] require dfprt on some tests

2024-07-11 Thread Jonathan Wakely

On Thu, 11 Jul 2024 at 14:22, Alexandre Oliva  wrote:
>
>
> On a target that doesn't enable decimal float components in libgcc
> (because the libc doesn't define all required FE_* macros), but whose
> compiler supports _Decimal* types, the effective target requirement
> dfp passes, but several tests won't link because the runtime support
> they depend on is missing.  State their dfprt requirement.
>
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
> aarch64.  Ok to install?

OK, thanks.


>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/decimal/binary-arith.cc: Require dfprt.
> * testsuite/decimal/comparison.cc: Likewise.
> * testsuite/decimal/compound-assignment-memfunc.cc: Likewise.
> * testsuite/decimal/make-decimal.cc: Likewise.
> * testsuite/decimal/pr54036-1.cc: Likewise.
> * testsuite/decimal/pr54036-2.cc: Likewise.
> * testsuite/decimal/pr54036-3.cc: Likewise.
> * testsuite/decimal/unary-arith.cc: Likewise.
> ---
>  libstdc++-v3/testsuite/decimal/binary-arith.cc |2 +-
>  libstdc++-v3/testsuite/decimal/comparison.cc   |2 +-
>  .../decimal/compound-assignment-memfunc.cc |2 +-
>  .../testsuite/decimal/compound-assignment.cc   |2 +-
>  libstdc++-v3/testsuite/decimal/make-decimal.cc |2 +-
>  libstdc++-v3/testsuite/decimal/pr54036-1.cc|2 +-
>  libstdc++-v3/testsuite/decimal/pr54036-2.cc|2 +-
>  libstdc++-v3/testsuite/decimal/pr54036-3.cc|2 +-
>  libstdc++-v3/testsuite/decimal/unary-arith.cc  |2 +-
>  9 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/decimal/binary-arith.cc b/libstdc++-v3/testsuite/decimal/binary-arith.cc
> index c10a8b6466cb0..3eeed7ea97501 100644
> --- a/libstdc++-v3/testsuite/decimal/binary-arith.cc
> +++ b/libstdc++-v3/testsuite/decimal/binary-arith.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require-effective-target dfp }
> +// { dg-require-effective-target dfprt }
>
>  // ISO/IEC TR 24733  3.2.8  Binary arithmetic operators.
>
> diff --git a/libstdc++-v3/testsuite/decimal/comparison.cc b/libstdc++-v3/testsuite/decimal/comparison.cc
> index cf34c8d74badc..424dd8bd26659 100644
> --- a/libstdc++-v3/testsuite/decimal/comparison.cc
> +++ b/libstdc++-v3/testsuite/decimal/comparison.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require-effective-target dfp }
> +// { dg-require-effective-target dfprt }
>
>  // ISO/IEC TR 24733  3.2.9  Comparison operators.
>
> diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> index 817d4bb10b1e9..d520af9a68d49 100644
> --- a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> +++ b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require-effective-target dfp }
> +// { dg-require-effective-target dfprt }
>
>  // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
>  // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
> diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment.cc b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> index 2d3e325856988..5aa87e78a739a 100644
> --- a/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> +++ b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require-effective-target dfp }
> +// { dg-require-effective-target dfprt }
>
>  // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
>  // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
> diff --git a/libstdc++-v3/testsuite/decimal/make-decimal.cc b/libstdc++-v3/testsuite/decimal/make-decimal.cc
> index aa75ac89d4792..560196cb305e1 100644
> --- a/libstdc++-v3/testsuite/decimal/make-decimal.cc
> +++ b/libstdc++-v3/testsuite/decimal/make-decimal.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require-effective-target dfp }
> +// { dg-require-effective-target dfprt }
>  // { dg-options "-Wno-pedantic" }
>
>  // ISO/IEC TR 24733  3.2.5  Initialization from coefficient and exponent.
> diff --git a/libstdc++-v3/testsuite/decimal/pr54036-1.cc b/libstdc++-v3/testsuite/decimal/pr54036-1.cc
> index 508738701ca01..a07e4c351651c 100644
> --- a/libstdc++-v3/testsuite/decimal/pr54036-1.cc
> +++ b/libstdc++-v3/testsuite/decimal/pr54036-1.cc
> @@ -15,7 +15,7 @@
>  // with this library; see the file COPYING3.  If not see
> // <http://www.gnu.org/licenses/>.
>
> -// { dg-require

Re: [PATCH] [libstdc++] [testsuite] require dfprt on some tests

2024-07-11 Thread Jonathan Wakely

On Thu, 11 Jul 2024 at 15:28, Jonathan Wakely  wrote:
>
> On Thu, 11 Jul 2024 at 14:22, Alexandre Oliva  wrote:
> >
> >
> > On a target that doesn't enable decimal float components in libgcc
> > (because the libc doesn't define all required FE_* macros), but whose
> > compiler supports _Decimal* types, the effective target requirement
> > dfp passes, but several tests won't link because the runtime support
> > they depend on is missing.  State their dfprt requirement.
> >
> > Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
> > aarch64.  Ok to install?
>
> OK, thanks.

btw, you touched it last, so now you own the decimal floating-point code ;-)

>
>
> >
> >
> > for  libstdc++-v3/ChangeLog
> >
> > * testsuite/decimal/binary-arith.cc: Require dfprt.
> > * testsuite/decimal/comparison.cc: Likewise.
> > * testsuite/decimal/compound-assignment-memfunc.cc: Likewise.
> > * testsuite/decimal/make-decimal.cc: Likewise.
> > * testsuite/decimal/pr54036-1.cc: Likewise.
> > * testsuite/decimal/pr54036-2.cc: Likewise.
> > * testsuite/decimal/pr54036-3.cc: Likewise.
> > * testsuite/decimal/unary-arith.cc: Likewise.
> > ---
> >  libstdc++-v3/testsuite/decimal/binary-arith.cc |2 +-
> >  libstdc++-v3/testsuite/decimal/comparison.cc   |2 +-
> >  .../decimal/compound-assignment-memfunc.cc |2 +-
> >  .../testsuite/decimal/compound-assignment.cc   |2 +-
> >  libstdc++-v3/testsuite/decimal/make-decimal.cc |2 +-
> >  libstdc++-v3/testsuite/decimal/pr54036-1.cc|2 +-
> >  libstdc++-v3/testsuite/decimal/pr54036-2.cc|2 +-
> >  libstdc++-v3/testsuite/decimal/pr54036-3.cc|2 +-
> >  libstdc++-v3/testsuite/decimal/unary-arith.cc  |2 +-
> >  9 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/libstdc++-v3/testsuite/decimal/binary-arith.cc 
> > b/libstdc++-v3/testsuite/decimal/binary-arith.cc
> > index c10a8b6466cb0..3eeed7ea97501 100644
> > --- a/libstdc++-v3/testsuite/decimal/binary-arith.cc
> > +++ b/libstdc++-v3/testsuite/decimal/binary-arith.cc
> > @@ -15,7 +15,7 @@
> >  // with this library; see the file COPYING3.  If not see
> >  // .
> >
> > -// { dg-require-effective-target dfp }
> > +// { dg-require-effective-target dfprt }
> >
> >  // ISO/IEC TR 24733  3.2.8  Binary arithmetic operators.
> >
> > diff --git a/libstdc++-v3/testsuite/decimal/comparison.cc 
> > b/libstdc++-v3/testsuite/decimal/comparison.cc
> > index cf34c8d74badc..424dd8bd26659 100644
> > --- a/libstdc++-v3/testsuite/decimal/comparison.cc
> > +++ b/libstdc++-v3/testsuite/decimal/comparison.cc
> > @@ -15,7 +15,7 @@
> >  // with this library; see the file COPYING3.  If not see
> >  // .
> >
> > -// { dg-require-effective-target dfp }
> > +// { dg-require-effective-target dfprt }
> >
> >  // ISO/IEC TR 24733  3.2.9  Comparison operators.
> >
> > diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc 
> > b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> > index 817d4bb10b1e9..d520af9a68d49 100644
> > --- a/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> > +++ b/libstdc++-v3/testsuite/decimal/compound-assignment-memfunc.cc
> > @@ -15,7 +15,7 @@
> >  // with this library; see the file COPYING3.  If not see
> >  // .
> >
> > -// { dg-require-effective-target dfp }
> > +// { dg-require-effective-target dfprt }
> >
> >  // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
> >  // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
> > diff --git a/libstdc++-v3/testsuite/decimal/compound-assignment.cc 
> > b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> > index 2d3e325856988..5aa87e78a739a 100644
> > --- a/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> > +++ b/libstdc++-v3/testsuite/decimal/compound-assignment.cc
> > @@ -15,7 +15,7 @@
> >  // with this library; see the file COPYING3.  If not see
> >  // .
> >
> > -// { dg-require-effective-target dfp }
> > +// { dg-require-effective-target dfprt }
> >
> >  // ISO/IEC TR 24733  3.2.2.6  Compound assignment (decimal32).
> >  // ISO/IEC TR 24733  3.2.3.6  Compound assignment (decimal64).
> > diff --git a/libstdc++-v3/testsuite/decimal/make-decimal.cc 
> > b/libstdc++-v3/testsuite/decimal/make-decimal.cc
> > index aa75ac89d4792..560196cb305e1 100644
> > --- a/libstdc++-v3/testsuite/decimal/make-decimal.cc
> > +++ b/libstdc++-v3/testsuite/decimal/make-decimal.cc
> > @@ -15,7 +15,7 @@
> >  // with this library; see the file COPYING3.  If not see
> >  // .
> >
> > -// { dg-require-effective-target dfp }
> > +// { dg-require-effective-target dfprt }
> >  // { dg-options "-Wno-pedantic" }
> >
> >  // ISO/IEC TR 24733  3.2.5  Initialization from coefficient and exponent.
> > diff --git a/libstdc++-v3/testsuite/deci

Re: [PATCH] [libstdc++] [testsuite] xfail 128bit from_chars on all aarch64-*-*

2024-07-11 Thread Jonathan Wakely

On Thu, 11 Jul 2024 at 14:21, Alexandre Oliva  wrote:
>
>
> Having observed failures of these two tests on yet another aarch64
> operating system, and having concluded that the conditions that
> trigger the problem ought to be present on all aarch64 targets, I'm
> now matching any aarch64 target_os to enable the workaround.

That's concerning, aarch64-unknown-linux-gnu with glibc should work OK
for float128_t, because aarch64 already has 128-bit long double, so
there's no good reason that float128_t wouldn't work. What are the
conditions that trigger the problem?

I've only just noticed that the macro name makes no sense, as this is
about float128_t, not long double. They're distinct types (although
for aarch64 they have the same representation). So we should at the
very least rename the macro to something like SKIP_FLOAT128, but I'd
also like to understand why float128_t fails on a target where long
double works OK and has the same binary128 representation.

>
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 targeting
> aarch64.  Ok to install?
>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/20_util/from_chars/8.cc: Define SKIP_LONG_DOUBLE
> on all aarch64-*-* targets.
> * testsuite/20_util/to_chars/float128_c++23.cc: Xfail on all
> aarch64-*-* targets.
> ---
>  libstdc++-v3/testsuite/20_util/from_chars/8.cc |2 +-
>  .../testsuite/20_util/to_chars/float128_c++23.cc   |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/20_util/from_chars/8.cc 
> b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> index bacad89943b5f..e92b64349025e 100644
> --- a/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> +++ b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> @@ -17,7 +17,7 @@
>
>  // { dg-do run { target c++23 } }
>  // { dg-add-options ieee }
> -// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* aarch64-*-rtems* } }
> +// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-* } }
>
>  #include 
>  #include 
> diff --git a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc 
> b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> index 6cb9cadcd2041..840131c1e5691 100644
> --- a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> +++ b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> @@ -19,7 +19,7 @@
>  // { dg-require-effective-target ieee_floats }
>  // { dg-require-effective-target size32plus }
>  // { dg-add-options ieee }
> -// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-vxworks* aarch64-*-rtems* } }
> +// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-* } }
>
>  #include 
>  #include 
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
>

Re: [PATCH] [libstdc++] [testsuite] xfail 128bit from_chars on all aarch64-*-*

2024-07-11 Thread Andreas Schwab

On Jul 11 2024, Jonathan Wakely wrote:

> On Thu, 11 Jul 2024 at 14:21, Alexandre Oliva  wrote:
>>
>>
>> Having observed failures of these two tests on yet another aarch64
>> operating system, and having concluded that the conditions that
>> trigger the problem ought to be present on all aarch64 targets, I'm
>> now matching any aarch64 target_os to enable the workaround.
>
> That's concerning, aarch64-unknown-linux-gnu with glibc should work OK
> for float128_t, because aarch64 already has 128-bit long double, so
> there's no good reason that float128_t wouldn't work. What are the
> conditions that trigger the problem?

Both tests run successfully on aarch64-linux.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-11 Thread Filip Kastl

> > > > +/* Check that the "exponential index transform" can be applied to this 
> > > > switch.
> > > > +
> > > > +   See comment of the exp_index_transform function for details about 
> > > > this
> > > > +   transformation.
> > > > +
> > > > +   We want:
> > > > +   - This form of the switch is more efficient
> > > > +   - Cases are powers of 2
> > > > +
> > > > +   Expects that SWTCH has at least one case.  */
> > > > +
> > > > +bool
> > > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > > +{
> > > > +  tree index = gimple_switch_index (swtch);
> > > > +  tree index_type = TREE_TYPE (index);
> > > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > > +
> > > > +  /* Check that we can efficiently compute logarithm of 2^k (using 
> > > > FFS) and
> > > > + test that a given number is 2^k for some k (using POPCOUNT).  */
> > > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > > +|| !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, 
> > > > opt_type))
> > > > +return false;
> > > > +
> > > 
> > > See above, I think this can be improved.  Would be nice to split out
> > > a can_pow2p (type) and can_log2 (type) and a corresponding
> > > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > > and alternate variants added when required.
> > > 
> > 
> > Just to check that I understand this correctly:  You'd like me to create
> > functions can_pow2p, can_log2.  Those functions will check that there are
> > optabs for the target machine which allow us to efficiently test
> > power-of-2-ness of a number and which allow us to efficiently compute the
> > base-2 log of a power-of-2 number.  You'd also like me to create functions
> > gen_pow2p and gen_log2 which generate this code.  For now these functions 
> > will
> > just use POPCOUNT and FFS but they can be later extended to also consider
> > different instructions.  Is that right?
> 
> Right.
> 
> > Into which file should I put these functions?
> 
> Just in this file for now.
>  
> > Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> > (x & -x) == x so testing pow2p can always be done efficiently.
> 
> If you add this fallback then can_pow2p / gen_pow2p wouldn't be
> necessary indeed.

Hi Richard,

I put some work into splitting out the can_ and gen_ functions as you
suggested.  I'm still a bit unsure what your vision of these is, so before I
submit all the changes I made to the patch as version 2 I would like to share
how I implemented the functions (see below).  Is this how you imagined the
functions?  Would you change something, or do they look OK?

I wasn't sure how generic to make the functions.  The more generic the more
convoluted the implementation becomes.  For example: I could make them more
generic by also including a gsi_iterator_update parameter or I could make them
less generic but more straightforward by removing the BEFORE parameter.

Cheers,
Filip Kastl


/* Does the target have optabs needed to efficiently compute exact base two
   logarithm of a value with type TYPE?

   See gen_log2.  */

static bool
can_log2 (tree type, optimization_type opt_type)
{
  /* Check if target supports FFS.  */
  return direct_internal_fn_supported_p (IFN_FFS, type, opt_type);
}

/* Assume that OP is a power of two.  Build a sequence of gimple statements
   efficiently computing the base two logarithm of OP using special optabs.
   Insert statements before GSI if BEFORE is true or after GSI otherwise.
   Return the result value as an ssa name tree.

   Leave GSI at the same statement (GSI_SAME_STMT behavior).

   Should only be used if target supports the needed optabs.  See can_log2.  */

static tree
gen_log2 (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
{
  /* Use .FFS (op) - 1.  */
  gimple *s = gsi->ptr;
  tree type = TREE_TYPE (op);
  gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
  tree tmp1 = gimple_build (gsi, before, update, loc, IFN_FFS, type, op);
  tree tmp2 = gimple_build (gsi, before, update, loc, MINUS_EXPR, type, tmp1,
			    build_one_cst (type));
  gsi->ptr = s;
  return tmp2;
}
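
For illustration only, the two operations these helpers emit can be written
in plain C.  This is a sketch, not part of the patch: the names
is_pow2_popcount, is_pow2_bitand and exact_log2 are made up here, and GCC's
__builtin_popcountll and __builtin_ffsll stand in for the .POPCOUNT and .FFS
internal functions.  Note that (x & -x) == x also holds for zero, so the
sketch excludes zero explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Power-of-two test via population count: exactly one bit is set.  */
static bool
is_pow2_popcount (uint64_t x)
{
  return __builtin_popcountll (x) == 1;
}

/* Fallback without popcount: x & -x isolates the lowest set bit, so it
   equals x only when x has at most one bit set.  Zero also satisfies
   (x & -x) == x and is therefore rejected separately.  */
static bool
is_pow2_bitand (uint64_t x)
{
  return x != 0 && (x & -x) == x;
}

/* Exact base-two logarithm of a power of two.  ffs returns the 1-based
   index of the lowest set bit, hence the "- 1".  */
static int
exact_log2 (uint64_t x)
{
  assert (is_pow2_bitand (x));
  return __builtin_ffsll ((long long) x) - 1;
}
```

With these definitions, exact_log2 (1024) evaluates to 10, and both
predicates agree on every nonzero input.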

/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
   special optabs if the target supports them.  Insert statements before GSI if
   BEFORE is true or after GSI otherwise.  Return the result value as a
   boolean_type_node ssa name tree.

   Leave GSI at the same statement (GSI_SAME_STMT behavior).  */

static tree
gen_pow2p (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
{
  tree result;

  /* Use either .POPCOUNT (op) == 1 or (op & -op) == op.  */
  tree type = TREE_TYPE (op);
  gimple *s = gsi->ptr;
  gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
  optimization_type opt_type = bb_optimization

Re: [PATCH] s390: Align cjump_64 and icjump_64

2024-07-11 Thread Stefan Schulze Frielinghaus

On Thu, Jul 11, 2024 at 04:29:19PM +0200, Stefan Schulze Frielinghaus wrote:
> During machine reorg we optimize backward jumps and transform insns as
> e.g.
> 
> (jump_insn 118 117 119 (set (pc)
> (if_then_else (ne (reg:CCRAW 33 %cc)
> (const_int 8 [0x8]))
> (label_ref 134)
> (pc))) "dec_math_1.f90":204:8 discrim 1 2161 {*cjump_64}
>  (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
> (int_list:REG_BR_PROB 719407028 (nil)))
>  -> 134)
> 
> into
> 
> (jump_insn 118 117 432 (set (pc)
> (if_then_else (ne (reg:CCRAW 33 %cc)
> (const_int 8 [0x8]))
> (pc)
> (label_ref 433))) "dec_math_1.f90":204:8 discrim 1 -1
>  (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
> (int_list:REG_BR_PROB 719407028 (nil)))
>  -> 433)
> 
> The latter is not recognized anymore since *icjump_64 only matches
> CC_REGNUM against zero.  Fixed by aligning *cjump_64 and *icjump_64.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md (*icjump_64): Allow raw CC comparisons,
>   i.e., any constant integer between 0 and 15 for CC comparisons.
> ---
>  Bootstrap and regtest are still running.  Assuming no regressions, ok
>  for {mainline,11,12,13,14}?  Would be great to see this in 14.2 RC :)

I didn't have the schedule for 11.5 RC in mind which is tomorrow and the
release a week afterwards.  I hope this is still appropriate for 11.5?

Cheers,
Stefan

> 
>  gcc/config/s390/s390.md | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index f5d7003dfad..d3931b09417 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -9556,7 +9556,8 @@
>  (define_insn "*icjump_64"
>[(set (pc)
>  (if_then_else
> -  (match_operator 1 "s390_comparison" [(reg CC_REGNUM) (const_int 0)])
> +  (match_operator 1 "s390_comparison" [(reg CC_REGNUM)
> +(match_operand 2 "const_int_operand" "")])
>(pc)
>(label_ref (match_operand 0 "" ""))))]
>""
> -- 
> 2.45.2
>

Re: [PATCH] s390: Align cjump_64 and icjump_64

2024-07-11 Thread Jakub Jelinek

On Thu, Jul 11, 2024 at 05:09:41PM +0200, Stefan Schulze Frielinghaus wrote:
> I didn't have the schedule for 11.5 RC in mind which is tomorrow and the
> release a week afterwards.  I hope this is still appropriate for 11.5?

From my side, if Andreas or somebody else approves it, it is tested on 11
branch and committed by tomorrow, it can be added.
But I'd like to know what patches I should wait for tomorrow and approximate
ETA (and ideally before end of working day in Europe).  Once rc1 is done, only
severe blockers will be possible.

Jakub

Re: [PATCH v2 1/3] c++: Introduce USING_DECLs for non-function usings [PR114683]

2024-07-11 Thread Jason Merrill


On 7/11/24 6:22 AM, Nathaniel Shead wrote:

On Tue, Jul 09, 2024 at 05:43:59PM -0400, Jason Merrill wrote:

On 7/9/24 9:44 AM, Nathaniel Shead wrote:

On Mon, Jul 08, 2024 at 12:26:41PM -0400, Jason Merrill wrote:

For a using-decl in the same scope as the original decl, won't this replace
it so only the using-decl is visible to lookup?  I had expected to omit the
USING_DECL in that case.


Yup it will; I think I'd originally done that so that more recent
(re-)declaration would be the one referred to by diagnostics, but in
retrospect that seems unhelpful; fixed.  (Though need to keep the
replacement for CONST_DECLs, because the modules handling otherwise only
handles them in the context of their containing enumeration type, which
isn't what we want here; I've added a new test for this as well.)


Ah, using-25, sure.  I would think we could still tell the difference by
comparing PURVIEW/EXPORT on the CONST_DECL to those of its type?

Or perhaps have add_binding_entity skip implicitly inserted enumerators, and
instead insert them again when reading the enum, which should also save a
bit of space.

Jason



So maybe something like the following?

Bootstrapped and regtested on x86_64-pc-linux-gnu, can be applied either
incrementally on the previous patch or separately as you prefer.


Looks good. I think squash this patch into the previous one; the 
combined patch is OK.



-- >8 --

Subject: [PATCH] c++/modules: Avoid unnecessary wrapping for CONST_DECLs

Enumerators are only written when writing the type definition (at which
point they are all written); this will happen regardless of scoped vs
unscoped or whether the enum is explicitly exported.  All other cases
where an enumerator needs to be written (e.g. template parameters) they
are just a backreference to the type decl and the name of the value.

'add_binding_entity' needs to explicitly write the names of unscoped
enumerators so that lazy loading will trigger when the name is found by
name lookup; it does this by pretending that the enum declarations are
always usings so that it doesn't double-write definitions.  By also
checking if the enumerator was marked purview/exported we can use that
to override a non-purview/non-exported TYPE_DECL and ensure it's made
visible regardless.

When reading we should get the exported flag on the enumeration
constant, and so should properly create a binding for it.  We don't need
to do anything to handle importedness as that checking is skipped for
EK_USINGs.

Some other places assume that module information for a CONST_DECL
inherits module information from its containing type.  This includes:

- get_originating_module_decl, for determining if the name was imported
   or has module attachment; I don't /think/ this change should affect
   that, so I'm leaving this untouched.

- binding_cmp, for sorting by exportedness; since now an enumerator
   could be exported without the containing decl being exported, we need
   to handle this here too.

With all this in mind, we can avoid creating a new USING_DECL for a
same-scope using that reveals a CONST_DECL by ensuring that we
special-case CONST_DECLs with purview/exported flags appropriately.

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Handle
CONST_DECLs with different purview/exported from their enum.
(binding_cmp): Likewise.
(set_instantiating_module): Support CONST_DECLs.
* name-lookup.cc (do_nonmember_using_decl): Don't special-case
CONST_DECLs.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  | 19 +--
  gcc/cp/name-lookup.cc |  6 ++
  2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7187d251d1d..d385b422168 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13129,7 +13129,10 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags flags, void *data_)
tree inner = decl;
  
if (TREE_CODE (inner) == CONST_DECL

- && TREE_CODE (DECL_CONTEXT (inner)) == ENUMERAL_TYPE)
+ && TREE_CODE (DECL_CONTEXT (inner)) == ENUMERAL_TYPE
+ /* A using-decl could make a CONST_DECL purview for a non-purview
+enumeration.  */
+ && (!DECL_LANG_SPECIFIC (inner) || !DECL_MODULE_PURVIEW_P (inner)))
inner = TYPE_NAME (DECL_CONTEXT (inner));
else if (TREE_CODE (inner) == TEMPLATE_DECL)
inner = DECL_TEMPLATE_RESULT (inner);
@@ -13164,7 +13167,10 @@ depset::hash::add_binding_entity (tree decl, WMB_Flags flags, void *data_)
  gcc_checking_assert (TREE_CODE (decl) == CONST_DECL);
  
  	  flags = WMB_Flags (flags | WMB_Using);

- if (DECL_MODULE_EXPORT_P (TYPE_NAME (TREE_TYPE (decl))))
+ if (DECL_MODULE_EXPORT_P (TYPE_NAME (TREE_TYPE (decl)))
+ /* A using-decl can make an enum constant exported for a
+non-exported enumeration.  */
+ || (DECL_LANG_SPECIFIC (decl) && DECL_MODULE_EXPORT_P (decl)

Re: [PATCH v2 01/11] aarch64: Remove unused global aarch64_tune_flags

2024-07-11 Thread Kyrylo Tkachov

Hi Andrew,

> On 11 Jul 2024, at 14:11, Andrew Carlotti  wrote:
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64.cc
>(aarch64_tune_flags): Remove unused global variable.
>(aarch64_override_options_internal): Remove dead assignment.
> 
> 

Ok. I’d consider it obvious even.
Thanks,
Kyrill

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 7f0cc47d0f071de9297068baa85c6d5fc4d7fa5b..2a67383bf9d21631664aba82e753120a0173efcf 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -349,9 +349,6 @@ static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
> /* The processor for which instructions should be scheduled.  */
> enum aarch64_processor aarch64_tune = cortexa53;
> 
> -/* Mask to specify which instruction scheduling options should be used.  */
> -uint64_t aarch64_tune_flags = 0;
> -
> /* Global flag for PC relative loads.  */
> bool aarch64_pcrelative_literal_loads;
> 
> @@ -18273,7 +18270,6 @@ void
> aarch64_override_options_internal (struct gcc_options *opts)
> {
>   const struct processor *tune = aarch64_get_tune_cpu (opts->x_selected_tune);
> -  aarch64_tune_flags = tune->flags;
>   aarch64_tune = tune->sched_core;
>   /* Make a copy of the tuning parameters attached to the core, which
>  we may later overwrite.  */
