[Committed] RISC-V: Fix typo [VSETVL PASS]

2023-10-23 Thread Juzhe-Zhong
While fixing an issue, I found a typo in the VSETVL pass.

Change 'use_by' into 'used_by'.

Committed, as the fix is very obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): 
Fix typo.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 5948d7260c2..47b459fddd4 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -778,7 +778,7 @@ public:
   bb_info *get_bb () const { return m_bb; }
   uint8_t get_max_sew () const { return m_max_sew; }
   insn_info *get_read_vl_insn () const { return m_read_vl_insn; }
-  bool vl_use_by_non_rvv_insn_p () const { return m_vl_used_by_non_rvv_insn; }
+  bool vl_used_by_non_rvv_insn_p () const { return m_vl_used_by_non_rvv_insn; }
 
   bool has_imm_avl () const { return m_avl && CONST_INT_P (m_avl); }
   bool has_vlmax_avl () const { return vlmax_avl_p (m_avl); }
@@ -1204,7 +1204,7 @@ public:
 if (get_read_vl_insn ())
   fprintf (file, "%sread_vl_insn: insn %u\n", indent,
   get_read_vl_insn ()->uid ());
-if (vl_use_by_non_rvv_insn_p ())
+if (vl_used_by_non_rvv_insn_p ())
   fprintf (file, "%suse_by_non_rvv_insn=true\n", indent);
   }
 };
@@ -1486,7 +1486,7 @@ private:
 if (prev.get_ratio () != next.get_ratio ())
   return false;
 
-if (next.has_vl () && next.vl_use_by_non_rvv_insn_p ())
+if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
   return false;
 
 if (vector_config_insn_p (prev.get_insn ()->rtl ()) && next.get_avl_def ()
@@ -2721,7 +2721,7 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
  fprintf (dump_file, "\n");
}
- if (!curr_info.vl_use_by_non_rvv_insn_p ()
+ if (!curr_info.vl_used_by_non_rvv_insn_p ()
  && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
m_delete_list.safe_push (curr_info);
 
@@ -3133,7 +3133,7 @@ pre_vsetvl::pre_global_vsetvl_info ()
continue;
  curr_info = block_info.local_infos[0];
}
-  if (curr_info.valid_p () && !curr_info.vl_use_by_non_rvv_insn_p ()
+  if (curr_info.valid_p () && !curr_info.vl_used_by_non_rvv_insn_p ()
  && preds_has_same_avl_p (curr_info))
curr_info.set_change_vtype_only ();
 
-- 
2.36.3



[PATCH 0/4] RISC-V: Fix 'Zicbop'-related bugs (fix ICE and remove broken built-in)

2023-10-23 Thread Tsukasa OI
Hello,

As I explained earlier, the built-in function for RISC-V,
"__builtin_riscv_zicbop_cbo_prefetchi", is completely broken and should be
removed.

Also, as I noted earlier, the __builtin_prefetch built-in function can
cause an ICE when its first argument is NULL or certain (though not all)
fixed addresses, such as ((void *)0x20).
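
A minimal reproducer sketch, based on the test cases added later in this
series (the exact -march string is illustrative):

  /* Compile with e.g. -march=rv64i_zicbop -mabi=lp64.  */
  void foo (void)
  {
    __builtin_prefetch ((void *)0, 0);     /* NULL address: ICEs.  */
    __builtin_prefetch ((void *)0x20, 1);  /* Fixed address: may also ICE.  */
  }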


Rather than adding new, working prefetch built-in functions, this patch set
focuses on fixing those major bugs, and is intended for fast approval so it
makes it into GCC 14 (except for patch 1, which merely renames the
"prefetch" availabilities for built-in functions).


Thanks,
Tsukasa




Tsukasa OI (4):
  RISC-V: Recategorize "prefetch" availabilities
  RISC-V: Remove broken __builtin_riscv_zicbop_cbo_prefetchi
  RISC-V: Add not broken RW prefetch RTL instructions without offsets
  RISC-V: Fix ICE by expansion and register coercion

 gcc/config/riscv/riscv-builtins.cc|  4 +-
 gcc/config/riscv/riscv-cmo.def|  4 --
 gcc/config/riscv/riscv.md | 56 +--
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  6 --
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c |  8 +--
 .../riscv/cmo-zicbop-by-common-ice-1.c| 13 +
 .../riscv/cmo-zicbop-by-common-ice-2.c|  7 +++
 7 files changed, 61 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c


base-commit: c2d41cdfeadb82d921b01c0e104d83f47e2262a2
-- 
2.42.0



[PATCH 1/4] RISC-V: Recategorize "prefetch" availabilities

2023-10-23 Thread Tsukasa OI
From: Tsukasa OI 

Because these availabilities apply to all prefetch instructions, "prefetch"
fits better than "prefetchi".

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc: Rename availabilities
"prefetchi{32,64}" to "prefetch{32,64}".
* config/riscv/riscv-cmo.def
(__builtin_riscv_zicbop_cbo_prefetchi):
Reflect availability name changes.
---
 gcc/config/riscv/riscv-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-cmo.def | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index fc3976f3ba12..ce549eb3782d 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -103,8 +103,8 @@ AVAIL (inval32, TARGET_ZICBOM && !TARGET_64BIT)
 AVAIL (inval64, TARGET_ZICBOM && TARGET_64BIT)
 AVAIL (zero32,  TARGET_ZICBOZ && !TARGET_64BIT)
 AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
-AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
-AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
+AVAIL (prefetch32, TARGET_ZICBOP && !TARGET_64BIT)
+AVAIL (prefetch64, TARGET_ZICBOP && TARGET_64BIT)
 AVAIL (crypto_zbkb32, TARGET_ZBKB && !TARGET_64BIT)
 AVAIL (crypto_zbkb64, TARGET_ZBKB && TARGET_64BIT)
 AVAIL (crypto_zbkx32, TARGET_ZBKX && !TARGET_64BIT)
diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
index ff713b78e19e..017370d1d0e3 100644
--- a/gcc/config/riscv/riscv-cmo.def
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -13,8 +13,8 @@ RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", 
RISCV_BUILTIN_DIRECT_NO_TARGET, RISCV
 RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE_VOID_PTR, zero64),
 
 // zicbop
-RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI, prefetchi32),
-RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_UDI_FTYPE_UDI, prefetchi64),
+RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI, prefetch32),
+RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_UDI_FTYPE_UDI, prefetch64),
 
 // zbkc or zbc
 RISCV_BUILTIN (clmul_si, "clmul", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI_USI, clmul_zbkc32_or_zbc32),
-- 
2.42.0



[PATCH 2/4] RISC-V: Remove broken __builtin_riscv_zicbop_cbo_prefetchi

2023-10-23 Thread Tsukasa OI
From: Tsukasa OI 

__builtin_riscv_zicbop_cbo_prefetchi (corresponding to the "prefetch.i"
instruction from the 'Zicbop' extension) is completely broken (not even
functional) and should be removed rather than fixed, because there is no
good way to "fix" this built-in function.
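
The test case removed below illustrates how broken the interface is: the
only way to call it is to pass an integer that is used directly as the
address to prefetch, and to "return" the result of a hint instruction that
produces no value:

  int foo1 (void)
  {
    /* Prefetches address 1 and pretends to yield a value.  */
    return __builtin_riscv_zicbop_cbo_prefetchi (1);
  }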

gcc/ChangeLog:

* config/riscv/riscv-cmo.def
(__builtin_riscv_zicbop_cbo_prefetchi): Remove since it's broken.
* config/riscv/riscv.md (unspecv): Remove UNSPECV_PREI.
(riscv_prefetchi_<mode>): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Remove references to
__builtin_riscv_zicbop_cbo_prefetchi.
* gcc.target/riscv/cmo-zicbop-2.c: Ditto with minor tidying.
---
 gcc/config/riscv/riscv-cmo.def| 4 
 gcc/config/riscv/riscv.md | 9 -
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 6 --
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 8 +---
 4 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
index 017370d1d0e3..dbd5d2f0d9eb 100644
--- a/gcc/config/riscv/riscv-cmo.def
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -12,10 +12,6 @@ RISCV_BUILTIN (inval_di, "zicbom_cbo_inval", 
RISCV_BUILTIN_DIRECT_NO_TARGET, RIS
 RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE_VOID_PTR, zero32),
 RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE_VOID_PTR, zero64),
 
-// zicbop
-RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI, prefetch32),
-RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_UDI_FTYPE_UDI, prefetch64),
-
 // zbkc or zbc
 RISCV_BUILTIN (clmul_si, "clmul", RISCV_BUILTIN_DIRECT, 
RISCV_USI_FTYPE_USI_USI, clmul_zbkc32_or_zbc32),
 RISCV_BUILTIN (clmul_di, "clmul", RISCV_BUILTIN_DIRECT, 
RISCV_UDI_FTYPE_UDI_UDI, clmul_zbkc64_or_zbc64),
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 23d91331290b..4b445cb8be9c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -118,7 +118,6 @@
   UNSPECV_FLUSH
   UNSPECV_INVAL
   UNSPECV_ZERO
-  UNSPECV_PREI
 
   ;; Zihintpause unspec
   UNSPECV_PAUSE
@@ -3493,14 +3492,6 @@
 }
   [(set_attr "type" "cbo")])
 
-(define_insn "riscv_prefetchi_"
-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
-  (match_operand:X 1 "imm5_operand" "i")]
-  UNSPECV_PREI)]
-  "TARGET_ZICBOP"
-  "prefetch.i\t%a0"
-  [(set_attr "type" "cbo")])
-
 (define_expand "extv"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(sign_extract:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
index c5d78c1763d3..54b764fb7452 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -13,11 +13,5 @@ void foo (char *p)
   __builtin_prefetch (p, 1, 3);
 }
 
-int foo1()
-{
-  return __builtin_riscv_zicbop_cbo_prefetchi(1);
-}
-
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
 /* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
 /* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
index 6576365b39ca..917adc8f2008 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -13,11 +13,5 @@ void foo (char *p)
   __builtin_prefetch (p, 1, 3);
 }
 
-int foo1()
-{
-  return __builtin_riscv_zicbop_cbo_prefetchi(1);
-}
-
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
 /* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */ 
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
-- 
2.42.0



[PATCH 3/4] RISC-V: Add not broken RW prefetch RTL instructions without offsets

2023-10-23 Thread Tsukasa OI
From: Tsukasa OI 

To prepare for adding new, working prefetch built-in functions and for
fixing an ICE in __builtin_prefetch, this commit adds two new instructions,
each corresponding to a 'Zicbop' prefetch hint instruction, but with no
specifiable offset field, for simplicity.

This commit deliberately omits an instruction corresponding to "prefetch.i"
because it is not needed to fix the ICE (an instruction corresponding to
"prefetch.i" will be added in a separate commit).

gcc/ChangeLog:

* config/riscv/riscv.md (unspecv): Add UNSPECV_PREFETCH_R and
UNSPECV_PREFETCH_W.
(riscv_prefetch_r_<mode>, riscv_prefetch_w_<mode>): New.
---
 gcc/config/riscv/riscv.md | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4b445cb8be9c..e67a6d1f1b81 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -118,6 +118,8 @@
   UNSPECV_FLUSH
   UNSPECV_INVAL
   UNSPECV_ZERO
+  UNSPECV_PREFETCH_R
+  UNSPECV_PREFETCH_W
 
   ;; Zihintpause unspec
   UNSPECV_PAUSE
@@ -3492,6 +3494,20 @@
 }
   [(set_attr "type" "cbo")])
 
+(define_insn "riscv_prefetch_r_"
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+  UNSPECV_PREFETCH_R)]
+  "TARGET_ZICBOP"
+  "prefetch.r\t0(%0)"
+  [(set_attr "type" "cbo")])
+
+(define_insn "riscv_prefetch_w_"
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+  UNSPECV_PREFETCH_W)]
+  "TARGET_ZICBOP"
+  "prefetch.w\t0(%0)"
+  [(set_attr "type" "cbo")])
+
 (define_expand "extv"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(sign_extract:GPR (match_operand:GPR 1 "register_operand" "r")
-- 
2.42.0



[PATCH 4/4] RISC-V: Fix ICE by expansion and register coercion

2023-10-23 Thread Tsukasa OI
From: Tsukasa OI 

A "prefetch" instruction on RISC-V GCC emits a machine hint instruction
directly when the 'Zicbop' extension is enabled but it could cause an ICE
when the address argument of __builtin_prefetch is an integral constant
(such like 0 [NULL] or some other [but possibly not all] fixed addresses).

This is caused by the fact that the "r" constraint is not actually checked
and something other than a register can be the first argument of the
"prefetch" RTL instruction.

It fixes the problem by changing "prefetch" from a native instruction to
an expansion and coercing the address to a register there.
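
With the fix, a constant address is first forced into a register, so a call
that previously ICEd now compiles to something roughly like this (register
allocation is illustrative):

  __builtin_prefetch ((void *)0x20, 0);
  /* =>  li         a5,32
         prefetch.r 0(a5)  */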

gcc/ChangeLog:

* config/riscv/riscv.md (prefetch): Turn into an expander that
coerces the address argument into a register and emits one of the
new prefetch instructions, instead of emitting a machine
instruction directly.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-by-common-ice-1.c: New ICE test.
* gcc.target/riscv/cmo-zicbop-by-common-ice-2.c: Ditto.
---
 gcc/config/riscv/riscv.md | 43 ---
 .../riscv/cmo-zicbop-by-common-ice-1.c| 13 ++
 .../riscv/cmo-zicbop-by-common-ice-2.c|  7 +++
 3 files changed, 48 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e67a6d1f1b81..bf232345b1ab 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3479,21 +3479,6 @@
   [(set_attr "type" "cbo")]
 )
 
-(define_insn "prefetch"
-  [(prefetch (match_operand 0 "address_operand" "r")
- (match_operand 1 "imm5_operand" "i")
- (match_operand 2 "const_int_operand" "n"))]
-  "TARGET_ZICBOP"
-{
-  switch (INTVAL (operands[1]))
-  {
-case 0: return "prefetch.r\t%a0";
-case 1: return "prefetch.w\t%a0";
-default: gcc_unreachable ();
-  }
-}
-  [(set_attr "type" "cbo")])
-
 (define_insn "riscv_prefetch_r_"
   [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
   UNSPECV_PREFETCH_R)]
@@ -3508,6 +3493,34 @@
   "prefetch.w\t0(%0)"
   [(set_attr "type" "cbo")])
 
+(define_expand "prefetch"
+  [(prefetch (match_operand 0 "address_operand" "")
+(match_operand 1 "const_int_operand" "")
+(match_operand 2 "const_int_operand" ""))]
+  "TARGET_ZICBOP"
+{
+  operands[0] = force_reg (Pmode, operands[0]);
+  switch (INTVAL (operands[1]))
+{
+case 0:
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_prefetch_r_di (operands[0]));
+  else
+   emit_insn (gen_riscv_prefetch_r_si (operands[0]));
+  break;
+case 1:
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_prefetch_w_di (operands[0]));
+  else
+   emit_insn (gen_riscv_prefetch_w_si (operands[0]));
+  break;
+default:
+  gcc_unreachable ();
+}
+  DONE;
+}
+  [(set_attr "type" "cbo")])
+
 (define_expand "extv"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(sign_extract:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
new file mode 100644
index ..47e83f29cc5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32i_zicbop -mabi=ilp32" } */
+
+void foo (void)
+{
+  /* Second argument defaults to zero (read).  */
+  __builtin_prefetch (0);
+  __builtin_prefetch (0, 0);
+  __builtin_prefetch (0, 1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch\\.r" 2 } } */
+/* { dg-final { scan-assembler-times "prefetch\\.w" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c
new file mode 100644
index ..a245b8163c1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zicbop -mabi=lp64" } */
+
+#include "cmo-zicbop-by-common-ice-1.c"
+
+/* { dg-final { scan-assembler-times "prefetch\\.r" 2 } } */
+/* { dg-final { scan-assembler-times "prefetch\\.w" 1 } } */
-- 
2.42.0



Pushed: [PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-23 Thread Xi Ruoyao
Pushed r14-{4848..4852}.

On Thu, 2023-10-19 at 22:02 +0800, Xi Ruoyao wrote:
> For relaxation we are now generating assembler macros for symbolic
> addresses everywhere, but this is limiting scheduling and there are
> known situations where the relaxation cannot improve the code.
> 
> 1. When we are performing LTO during a final link and the linker plugin
> is used, la.global won't be relaxed because it references an
> external or preemptable symbol.
> 2. The linker currently does not relax la.tls.*.
> 3. For la.local + ld/st pairs, if the address is only used once,
> emitting pcalau12i + ld/st is never worse than relying on linker
> relaxation.
> 
> Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs
> for these cases, but assembler macros for other cases.  Use it as the
> default if the assembler supports both explicit relocs and relaxation.
> 
> LTO-bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> Xi Ruoyao (5):
>   LoongArch: Add enum-style -mexplicit-relocs= option
>   LoongArch: Use explicit relocs for GOT access when
>     -mexplicit-relocs=auto and LTO during a final link with linker
>     plugin
>   LoongArch: Use explicit relocs for TLS access with
>     -mexplicit-relocs=auto
>   LoongArch: Use explicit relocs for addresses only used for one load or
>     store with -mexplicit-relocs=auto and -mcmodel={normal,medium}
>   LoongArch: Document -mexplicit-relocs={auto,none,always}
> 
>  .../loongarch/genopts/loongarch-strings   |   6 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |  21 ++-
>  gcc/config/loongarch/loongarch-def.h  |   6 +
>  gcc/config/loongarch/loongarch-protos.h   |   1 +
>  gcc/config/loongarch/loongarch-str.h  |   5 +
>  gcc/config/loongarch/loongarch.cc |  75 --
>  gcc/config/loongarch/loongarch.h  |   3 +
>  gcc/config/loongarch/loongarch.md | 128 +-
>  gcc/config/loongarch/loongarch.opt    |  21 ++-
>  gcc/config/loongarch/predicates.md    |  15 +-
>  gcc/doc/invoke.texi   |  37 +++--
>  .../loongarch/explicit-relocs-auto-lto.c  |  26 
>  ...-relocs-auto-single-load-store-no-anchor.c |   6 +
>  .../explicit-relocs-auto-single-load-store.c  |  14 ++
>  .../explicit-relocs-auto-tls-ld-gd.c  |   9 ++
>  .../explicit-relocs-auto-tls-le-ie.c  |   6 +
>  16 files changed, 343 insertions(+), 36 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] convert_to_complex vs invalid_conversion [PR111903]

2023-10-23 Thread Richard Biener
On Sat, Oct 21, 2023 at 8:07 PM Andrew Pinski  wrote:
>
> convert_to_complex, when creating a COMPLEX_EXPR, does
> not currently check whether the real or imag part
> is error_mark_node. This later confuses the gimplifier
> when there is a SAVE_EXPR wrapped around that COMPLEX_EXPR.
> The simple fix is, after calling convert inside convert_to_complex_1,
> to check whether either result is an error_operand and return
> error_mark_node in that case.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.

> PR c/111903
>
> gcc/ChangeLog:
>
> * convert.cc (convert_to_complex_1): Return
> error_mark_node if either convert was an error
> when converting from a scalar.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/float16-8.c: New test.
> ---
>  gcc/convert.cc|  9 +++--
>  gcc/testsuite/gcc.target/i386/float16-8.c | 12 
>  2 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-8.c
>
> diff --git a/gcc/convert.cc b/gcc/convert.cc
> index 80d86fe3708..ac6af7026a7 100644
> --- a/gcc/convert.cc
> +++ b/gcc/convert.cc
> @@ -1006,8 +1006,13 @@ convert_to_complex_1 (tree type, tree expr, bool 
> fold_p)
>  case ENUMERAL_TYPE:
>  case BOOLEAN_TYPE:
>  case BITINT_TYPE:
> -  return build2 (COMPLEX_EXPR, type, convert (subtype, expr),
> -convert (subtype, integer_zero_node));
> +  {
> +   tree real = convert (subtype, expr);
> +   tree imag = convert (subtype, integer_zero_node);
> +   if (error_operand_p (real) || error_operand_p (imag))
> + return error_mark_node;
> +   return build2 (COMPLEX_EXPR, type, real, imag);
> +  }
>
>  case COMPLEX_TYPE:
>{
> diff --git a/gcc/testsuite/gcc.target/i386/float16-8.c 
> b/gcc/testsuite/gcc.target/i386/float16-8.c
> new file mode 100644
> index 000..003f82e7146
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-8.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-sse" } */
> +/* PR c/111903 */
> +
> +int i;
> +_Float16 f;
> +int bar(...);
> +void
> +foo (void)
> +{
> +  i /= bar ((_Complex _Float16) f); /* { dg-error "" } */
> +}
> --
> 2.39.3
>


Re: [PATCHv2] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2023-10-23 Thread Richard Biener
On Sun, Oct 22, 2023 at 2:13 AM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> This patch moves the `(a-b) CMP 0 ? (a-b) : (b-a)` optimization
> from fold_cond_expr_with_comparison to match.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

> Changes in:
> v2: Removes `(a == b) ? 0 : (b - a)` handling since it was handled
> via r14-3606-g3d86e7f4a8ae
> Change zerop to integer_zerop for `(a - b) == 0 ? 0 : (b - a)`,
> Add `(a - b) != 0 ? (a - b) : 0` handling.
>
> gcc/ChangeLog:
>
> * match.pd (`(A - B) CMP 0 ? (A - B) : (B - A)`):
> New patterns.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-38.c: New test.
> ---
>  gcc/match.pd   | 46 --
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c | 45 +
>  2 files changed, 88 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a56838fb388..ce8d159d260 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5650,9 +5650,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>(cnd @0 @2 @1)))
>
> -/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
> -   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
> -   Need to handle UN* comparisons.
> +/* abs/negative simplifications moved from fold_cond_expr_with_comparison.
>
> None of these transformations work for modes with signed
> zeros.  If A is +/-0, the first two transformations will
> @@ -5717,6 +5715,48 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (convert (negate (absu:utype @0
> (negate (abs @0)
>   )
> +
> + /* (A - B) == 0 ? (A - B) : (B - A)same as (B - A) */
> + (for cmp (eq uneq)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus@3 @2 @1))
> +   (if (!HONOR_SIGNED_ZEROS (type))
> +@3))
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) integer_zerop) integer_zerop (minus@3 @2 @1))
> +   @3)
> + )
> + /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
> + (for cmp (ne ltgt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +   (if (!HONOR_SIGNED_ZEROS (type))
> +@0))
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) integer_zerop) @0 integer_zerop)
> +   @0)
> + )
> + /* (A - B) >=/> 0 ? (A - B) : (B - A)same as abs (A - B) */
> + (for cmp (ge gt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +   (if (!HONOR_SIGNED_ZEROS (type)
> +   && !TYPE_UNSIGNED (type))
> +(abs @0
> + /* (A - B) <=/< 0 ? (A - B) : (B - A)same as -abs (A - B) */
> + (for cmp (le lt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +   (if (!HONOR_SIGNED_ZEROS (type)
> +   && !TYPE_UNSIGNED (type))
> +(if (ANY_INTEGRAL_TYPE_P (type)
> +&& !TYPE_OVERFLOW_WRAPS (type))
> + (with {
> +tree utype = unsigned_type_for (type);
> +  }
> +  (convert (negate (absu:utype @0
> +  (negate (abs @0)
> + )
>  )
>
>  /* -(type)!A -> (type)A - 1.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> new file mode 100644
> index 000..0f0e3170f8d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> @@ -0,0 +1,45 @@
> +/* { dg-options "-O2 -fno-signed-zeros -fdump-tree-phiopt" } */
> +int minus1(int a, int b)
> +{
> +  int c = a - b;
> +  if (c == 0) c = b - a;
> +  return c;
> +}
> +int minus2(int a, int b)
> +{
> +  int c = a - b;
> +  if (c != 0) c = b - a;
> +  return c;
> +}
> +int minus3(int a, int b)
> +{
> +  int c = a - b;
> +  if (c == 0) c = 0;
> +  else c = b - a;
> +  return c;
> +}
> +int minus4(int a, int b)
> +{
> +  int c;
> +  if (a == b) c = 0;
> +  else
> +c = b - a;
> +  return c;
> +}
> +int abs0(int a, int b)
> +{
> +  int c = a - b;
> +  if (c <= 0) c = b - a;
> +  return c;
> +}
> +int negabs(int a, int b)
> +{
> +  int c = a - b;
> +  if (c >= 0) c = b - a;
> +  return c;
> +}
> +
> +/* The above should be optimized at phiopt1 except for negabs which has to 
> wait
> +  until phiopt2 as -abs is not acceptable in early phiopt.  */
> +/* { dg-final { scan-tree-dump-times "if" 1  "phiopt1"  } } */
> +/* { dg-final { scan-tree-dump-not "if" "phiopt2" } } */
> --
> 2.39.3
>


Re: [PATCH] Use error_mark_node after error in convert

2023-10-23 Thread Richard Biener
On Mon, Oct 23, 2023 at 12:22 AM Andrew Pinski  wrote:
>
> While working on PR c/111903, I noticed that
> convert will convert integer_zero_node to the target
> type after an error instead of returning error_mark_node.
> From what I can tell this was the old way of not having
> error recovery, since other places in this file do return
> error_mark_node, and the places I am replacing date from
> when the file was imported into the repo (either via a gcc2 merge
> or earlier).
>
> I also had to update the objc front-end to allow for the error_mark_node
> change; I suspect you could hit the ICE without this change, though.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> * convert.cc (convert_to_pointer_1): Return error_mark_node
> after an error.
> (convert_to_real_1): Likewise.
> (convert_to_integer_1): Likewise.
> (convert_to_complex_1): Likewise.
>
> gcc/objc/ChangeLog:
>
> * objc-gnu-runtime-abi-01.cc (build_objc_method_call): Allow
> for error_operand after call to build_c_cast.
> * objc-next-runtime-abi-01.cc (build_objc_method_call): Likewise.
> * objc-next-runtime-abi-02.cc (build_v2_build_objc_method_call): 
> Likewise.
> ---
>  gcc/convert.cc   | 12 ++--
>  gcc/objc/objc-gnu-runtime-abi-01.cc  |  3 +++
>  gcc/objc/objc-next-runtime-abi-01.cc |  3 +++
>  gcc/objc/objc-next-runtime-abi-02.cc |  3 +++
>  4 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/convert.cc b/gcc/convert.cc
> index 5357609d8f0..ac6af7026a7 100644
> --- a/gcc/convert.cc
> +++ b/gcc/convert.cc
> @@ -96,7 +96,7 @@ convert_to_pointer_1 (tree type, tree expr, bool fold_p)
>
>  default:
>error ("cannot convert to a pointer type");
> -  return convert_to_pointer_1 (type, integer_zero_node, fold_p);
> +  return error_mark_node;
>  }
>  }
>
> @@ -332,11 +332,11 @@ convert_to_real_1 (tree type, tree expr, bool fold_p)
>  case POINTER_TYPE:
>  case REFERENCE_TYPE:
>error ("pointer value used where a floating-point was expected");
> -  return convert_to_real_1 (type, integer_zero_node, fold_p);
> +  return error_mark_node;
>
>  default:
>error ("aggregate value used where a floating-point was expected");
> -  return convert_to_real_1 (type, integer_zero_node, fold_p);
> +  return error_mark_node;
>  }
>  }
>
> @@ -959,7 +959,7 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
>
>  default:
>error ("aggregate value used where an integer was expected");
> -  return convert (type, integer_zero_node);
> +  return error_mark_node;
>  }
>  }
>
> @@ -1053,11 +1053,11 @@ convert_to_complex_1 (tree type, tree expr, bool 
> fold_p)
>  case POINTER_TYPE:
>  case REFERENCE_TYPE:
>error ("pointer value used where a complex was expected");
> -  return convert_to_complex_1 (type, integer_zero_node, fold_p);
> +  return error_mark_node;
>
>  default:
>error ("aggregate value used where a complex was expected");
> -  return convert_to_complex_1 (type, integer_zero_node, fold_p);
> +  return error_mark_node;
>  }
>  }
>
> diff --git a/gcc/objc/objc-gnu-runtime-abi-01.cc 
> b/gcc/objc/objc-gnu-runtime-abi-01.cc
> index fbf8307297a..6f45283b307 100644
> --- a/gcc/objc/objc-gnu-runtime-abi-01.cc
> +++ b/gcc/objc/objc-gnu-runtime-abi-01.cc
> @@ -700,6 +700,9 @@ build_objc_method_call (location_t loc, int super_flag, 
> tree method_prototype,
>
>lookup_object = build_c_cast (loc, rcv_p, lookup_object);
>
> +  if (error_operand_p (lookup_object))
> +return error_mark_node;
> +
>/* Use SAVE_EXPR to avoid evaluating the receiver twice.  */
>lookup_object = save_expr (lookup_object);
>
> diff --git a/gcc/objc/objc-next-runtime-abi-01.cc 
> b/gcc/objc/objc-next-runtime-abi-01.cc
> index 70ab5262e17..9e28976043e 100644
> --- a/gcc/objc/objc-next-runtime-abi-01.cc
> +++ b/gcc/objc/objc-next-runtime-abi-01.cc
> @@ -846,6 +846,9 @@ build_objc_method_call (location_t loc, int super_flag, 
> tree method_prototype,
>
>lookup_object = build_c_cast (loc, rcv_p, lookup_object);
>
> +  if (error_operand_p (lookup_object))
> +return error_mark_node;
> +
>/* Use SAVE_EXPR to avoid evaluating the receiver twice.  */
>lookup_object = save_expr (lookup_object);
>
> diff --git a/gcc/objc/objc-next-runtime-abi-02.cc 
> b/gcc/objc/objc-next-runtime-abi-02.cc
> index 6548c0078e0..723b47c9cf6 100644
> --- a/gcc/objc/objc-next-runtime-abi-02.cc
> +++ b/gcc/objc/objc-next-runtime-abi-02.cc
> @@ -1729,6 +1729,9 @@ build_v2_build_objc_method_call (int super, tree 
> method_prototype,
>
>lookup_object = build_c_cast (loc, rcv_p, lookup_object);
>
> +  if (error_operand_p (lookup_object))
> +return error_mark_node;
> +
>/* Use SAVE_EXPR to avoid evaluating the receiver twice.  */
>lookup_object = save_expr (lookup_object);
>
> --
> 2.39.

Re: [PATCH 1/5] Remove obsolete debugging formats from names list

2023-10-23 Thread Richard Biener
On Mon, Oct 23, 2023 at 2:56 AM Mark Harmstone  wrote:
>
> STABS and xcoff have been removed, but are still in debug_type_names,
> which ought to match debug_type_masks. This results in the following
> minor bug with GCC 13:
>
> $ x86_64-pc-linux-gnu-gcc -gvms -c tmp.c
> cc1: error: target system does not support the ‘dwarf-2’ debug format

OK for trunk and branch.

Richard.

> ---
>  gcc/opts.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 573dcf8e497..8015cb7556a 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -50,7 +50,7 @@ static void set_Wstrict_aliasing (struct gcc_options *opts, 
> int onoff);
>
>  const char *const debug_type_names[] =
>  {
> -  "none", "stabs", "dwarf-2", "xcoff", "vms", "ctf", "btf"
> +  "none", "dwarf-2", "vms", "ctf", "btf"
>  };
>
>  /* Bitmasks of fundamental debug info formats indexed by enum
> @@ -65,7 +65,7 @@ static uint32_t debug_type_masks[] =
>  /* Names of the set of debug formats requested by user.  Updated and accessed
> via debug_set_names.  */
>
> -static char df_set_names[sizeof "none stabs dwarf-2 xcoff vms ctf btf"];
> +static char df_set_names[sizeof "none dwarf-2 vms ctf btf"];
>
>  /* Get enum debug_info_type of the specified debug format, for error 
> messages.
> Can be used only for individual debug format types.  */
> --
> 2.41.0
>


[PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc

2023-10-23 Thread pan2 . li
From: Pan Li 

For the autovectorized trunc function, there is one step, shown below, that
uses MU (mask undisturbed) for the merge operand.

rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);

With MU, the masked-off elements of tmp (aka the dest register) are left
unchanged, but tmp is undefined at this point. This patch adjusts the
MU to MA.
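
For context, the difference is the RVV mask policy selected at vsetvli
time (a sketch; the operands are illustrative):

  vsetvli zero,zero,e32,m1,ta,mu  # mu: masked-off elements of vd preserved
  vfcvt.rtz.x.f.v v2,v1,v0.t      # inactive lanes of v2 keep stale bits

  vsetvli zero,zero,e32,m1,ta,ma  # ma: masked-off elements are don't-care
  vfcvt.rtz.x.f.v v2,v1,v0.t      # no dependency on v2's old contents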

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): Add insn type
arg.
(expand_vec_trunc): Take MA instead of MU for cvt_x_f_rtz.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 91ad6a61fa8..fb6a4e561db 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4144,12 +4144,20 @@ emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
 
 static void
 emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
- machine_mode vec_mode)
+ insn_type type, machine_mode vec_mode)
 {
-  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
   insn_code icode = code_for_pred (FIX, vec_mode);
 
-  emit_vlmax_insn (icode, UNARY_OP_TAMU, cvt_x_ops);
+  if (type & USE_VUNDEF_MERGE_P)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+  else
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
 }
 
 void
@@ -4285,7 +4293,7 @@ expand_vec_trunc (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
 
   /* Step-3: Convert to integer on mask, rounding to zero (aka truncate).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);
+  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, UNARY_OP_TAMA, vec_fp_mode);
 
   /* Step-4: Convert to floating-point on mask for the rint result.  */
   emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
-- 
2.34.1



Re: [PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc

2023-10-23 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-23 15:53
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc
From: Pan Li 
 
For the autovectorized trunc function, there is one step, shown below, that
uses MU (mask undisturbed) for the merge operand.

rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);

With MU, the masked-off elements of tmp (aka the dest register) are left
unchanged, but tmp is undefined at this point. This patch adjusts the
MU to MA.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): Add insn type
arg.
(expand_vec_trunc): Take MA instead of MU for cvt_x_f_rtz.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc | 16 
1 file changed, 12 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 91ad6a61fa8..fb6a4e561db 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4144,12 +4144,20 @@ emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
static void
emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
-   machine_mode vec_mode)
+   insn_type type, machine_mode vec_mode)
{
-  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
   insn_code icode = code_for_pred (FIX, vec_mode);
-  emit_vlmax_insn (icode, UNARY_OP_TAMU, cvt_x_ops);
+  if (type & USE_VUNDEF_MERGE_P)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+  else
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
}
void
@@ -4285,7 +4293,7 @@ expand_vec_trunc (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
   /* Step-3: Convert to integer on mask, rounding to zero (aka truncate).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);
+  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, UNARY_OP_TAMA, vec_fp_mode);
   /* Step-4: Convert to floating-point on mask for the rint result.  */
   emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
-- 
2.34.1
 
 


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Richard Biener
On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
>
>
>
> > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  wrote:
> >
> > On 2023-10-20 14:38, Qing Zhao wrote:
> >> How about the following:
> > >>   Add one more parameter to __builtin_dynamic_object_size(), i.e.
> >> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> >> When we see the structure field has counted_by attribute.
> >
> > Or maybe add a barrier preventing any assignments to array_annotated->foo 
> > from being reordered below the __bdos call? Basically an __asm__ with 
> > array_annotated->foo in the clobber list ought to do it I think.
>
> Maybe just adding the array_annotated->foo to the use list of the call to 
> __builtin_dynamic_object_size should be enough?
>
> But I am not sure how to implement this at the TREE level; is there a
> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
> counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos?
>
> This might be the simplest solution?

If the dynamic object size is derived from a field then I think you need to
put the "load" of that memory location at the point (as argument)
of the __bos call right at parsing time.  I know that's awkward because
you try to play tricks "discovering" that field only late, but that's not
going to work.

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.

Of course I guess __bos then behaves like sizeof ().

Richard.

>
> Qing
>
> >
> > It may not work for something like this though:
> >
> > static size_t
> > get_size_of (void *ptr)
> > {
> >  return __bdos (ptr, 1);
> > }
> >
> > void
> > foo (size_t sz)
> > {
> >  array_annotated = __builtin_malloc (sz);
> >  array_annotated = sz;
> >
> >  ...
> >  __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
> >  ...
> > }
> >
> > because the call to get_size_of () may not have been inlined that early.
> >
> > The more fool-proof alternative may be to put a compile time barrier right 
> > below the assignment to array_annotated->foo; I reckon you could do that 
> > early in the front end by marking the size identifier and then tracking 
> > assignments to that identifier.  That may have a slight runtime performance 
> > overhead since it may prevent even legitimate reordering.  I can't think of 
> > another alternative at the moment...
> >
> > Sid
>
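
For reference, the kind of annotated structure under discussion looks
roughly like this (field names are illustrative; the attribute spelling
follows the proposed "counted_by" patch series):

  struct annotated
  {
    size_t foo_count;   /* number of elements in foo[]  */
    int foo[] __attribute__ ((counted_by (foo_count)));
  };

  /* Intent: __builtin_dynamic_object_size (p->foo, 1) should return
     p->foo_count * sizeof (int), which is why stores to foo_count must
     not be reordered past the __bdos call.  */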


RE: [PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc

2023-10-23 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, October 23, 2023 3:56 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-23 15:53
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for merging undef tmp register for trunc
From: Pan Li mailto:pan2...@intel.com>>

For the autovectorized trunc function, there is one step, shown below, that
uses MU (mask undisturbed) for the merge operand.

rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);

With MU, the masked-off elements of tmp (aka the dest register) are left
unchanged, but tmp is undefined at this point. This patch adjusts the
MU to MA.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): Add insn type
arg.
(expand_vec_trunc): Take MA instead of MU for cvt_x_f_rtz.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
---
gcc/config/riscv/riscv-v.cc | 16 
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 91ad6a61fa8..fb6a4e561db 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4144,12 +4144,20 @@ emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
static void
emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
-   machine_mode vec_mode)
+   insn_type type, machine_mode vec_mode)
{
-  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
   insn_code icode = code_for_pred (FIX, vec_mode);
-  emit_vlmax_insn (icode, UNARY_OP_TAMU, cvt_x_ops);
+  if (type & USE_VUNDEF_MERGE_P)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+  else
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
}
void
@@ -4285,7 +4293,7 @@ expand_vec_trunc (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
   /* Step-3: Convert to integer on mask, rounding to zero (aka truncate).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);
+  emit_vec_cvt_x_f_rtz (tmp, op_1, mask, UNARY_OP_TAMA, vec_fp_mode);
   /* Step-4: Convert to floating-point on mask for the rint result.  */
   emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
--
2.34.1




Re: [PATCH] [PR111520] set hardcmp eh probs (was: rename make_eh_edges to make_eh_edge)

2023-10-23 Thread Richard Biener
On Sat, Oct 21, 2023 at 9:17 AM Alexandre Oliva  wrote:
>
> On Oct 20, 2023, Richard Biener  wrote:
>
> >> * tree-eh.h (make_eh_edges): Rename to...
> >> (make_eh_edge): ... this.
> >> * tree-eh.cc: Likewise.  Adjust all callers.
>
> Once the above goes in (it depends on the strub monster patch), the
> following one should apply as well.  Regstrapped on x86_64-linux-gnu.
> Ok to install?

OK.

> Set execution count of EH blocks, and probability of EH edges.
>
>
> for  gcc/ChangeLog
>
> PR tree-optimization/111520
> * gimple-harden-conditionals.cc
> (pass_harden_compares::execute): Set EH edge probability and
> EH block execution count.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/111520
> * g++.dg/torture/harden-comp-pr111520.cc: New.
> ---
>  gcc/gimple-harden-conditionals.cc  |   12 +++-
>  .../g++.dg/torture/harden-comp-pr111520.cc |   17 +
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc
>
> diff --git a/gcc/gimple-harden-conditionals.cc 
> b/gcc/gimple-harden-conditionals.cc
> index 1999e827a04ca..bded288985063 100644
> --- a/gcc/gimple-harden-conditionals.cc
> +++ b/gcc/gimple-harden-conditionals.cc
> @@ -580,11 +580,21 @@ pass_harden_compares::execute (function *fun)
>   if (throwing_compare_p)
> {
>   add_stmt_to_eh_lp (asgnck, lookup_stmt_eh_lp (asgn));
> - make_eh_edge (asgnck);
> + edge eh = make_eh_edge (asgnck);
> + /* This compare looks like it could raise an exception,
> +but it's dominated by the original compare, that
> +would raise an exception first, so the EH edge from
> +this one is never really taken.  */
> + eh->probability = profile_probability::never ();
> + if (eh->dest->count.initialized_p ())
> +   eh->dest->count += eh->count ();
> + else
> +   eh->dest->count = eh->count ();
>
>   edge ckeh;
>   basic_block nbb = split_edge (non_eh_succ_edge
> (gimple_bb (asgnck), &ckeh));
> + gcc_checking_assert (eh == ckeh);
>   gsi_split = gsi_start_bb (nbb);
>
>   if (dump_file)
> diff --git a/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc 
> b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc
> new file mode 100644
> index 0..b4381b4d84ec4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fharden-compares -fsignaling-nans -fnon-call-exceptions" } 
> */
> +
> +struct S
> +{
> +  S (bool);
> +  ~S ();
> +};
> +
> +float f;
> +
> +void
> +foo ()
> +{
> +  S a = 0;
> +  S b = f;
> +}
>
>
> --
> Alexandre Oliva, happy hacker        https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH 01/22] Add condition coverage profiling

2023-10-23 Thread Richard Biener
On Sat, 21 Oct 2023, Jørgen Kvalsvik wrote:

> On 05/10/2023 22:39, Jørgen Kvalsvik wrote:
> > On 05/10/2023 21:59, Jan Hubicka wrote:
> >>>
> >>> Like Wahlen et al this implementation records coverage in fixed-size
> >>> bitsets which gcov knows how to interpret. This is very fast, but
> >>> introduces a limit on the number of terms in a single boolean
> >>> expression, the number of bits in a gcov_unsigned_type (which is
> >>> typedef'd to uint64_t), so for most practical purposes this would be
> >>> acceptable. This limitation is in the implementation and not the
> >>> algorithm, so support for more conditions can be added by also
> >>> introducing arbitrary-sized bitsets.
> >>
> >> This should not be too hard to do - if the conditional is more complex you
> >> simply introduce more than one counter for it, right?
> >> How many times does this trigger on the GCC sources?
> > 
> > It shouldn't be, no. But when dynamic bitsets are on the table it would be
> > much better to length-encode in smaller multiples than the 64-bit counters.
> > Most expressions are small (<4 terms), so the savings would be substantial.
> > I opted for fixed-size bitsets to start with because they are much simpler
> > and do not introduce any branching or decisions in the instrumentation.
> 
> I just posted v6 of this patch, and the bitsets are still fixed size. I
> consider dynamic bitsets out of scope for this particular effort, although I
> think it might be worth pursuing later.
> 
> > 
> >>>
> >>> For space overhead, the instrumentation needs two accumulators
> >>> (gcov_unsigned_type) per condition in the program which will be written
> >>> to the gcov file. In addition, every function gets a pair of local
> >>> accumulators, but these accumulators are reused between conditions in the
> >>> same function.
> >>>
> >>> For time overhead, there is a zeroing of the local accumulators for
> >>> every condition and one or two bitwise operations on every edge taken
> >>> in an expression.
> >>>
> >>> In action it looks pretty similar to the branch coverage. The -g short
> >>> opt carries no significance, but was chosen because it was an available
> >>> option with the upper-case variant free too.
> >>>
> >>> gcov --conditions:
> >>>
> >>>  3:   17:void fn (int a, int b, int c, int d) {
> >>>  3:   18:    if ((a && (b || c)) && d)
> >>> conditions covered 3/8
> >>> condition  0 not covered (true)
> >>> condition  0 not covered (false)
> >>> condition  1 not covered (true)
> >>> condition  2 not covered (true)
> >>> condition  3 not covered (true)
> >> It seems understandable, but for bigger conditionals I guess it will be a
> >> bit hard to match condition numbers to the actual source
> >> code.  We could probably also show the conditions as ranges in the
> >> conditional?  I am adding David Malcolm to CC, he may have some ideas.
> >>
> >> I wonder how much this information is confused by early optimizations
> >> happening before coverage profiling?
> 
> Yes, but I could not figure out strong gcov mechanisms to make a truly better
> one, and the json + extra tooling is more in line with what you want for large
> conditionals, I think. I also specifically did one unit of information per
> line to play nicer with grep, so it currently looks like:
> 
> conditions covered 3/8
> condition  0 not covered (true false)
> ...
> 
> I think improving gcov's printing abilities would do wonders, but I couldn't
> find anything that would support it currently. Did I miss something?
> 
> >>>
> >>> Some expressions, mostly those without else-blocks, are effectively
> >>> "rewritten" in the CFG construction making the algorithm unable to
> >>> distinguish them:
> >>>
> >>> and.c:
> >>>
> >>>  if (a && b && c)
> >>>  x = 1;
> >>>
> >>> ifs.c:
> >>>
> >>>  if (a)
> >>>  if (b)
> >>>  if (c)
> >>>  x = 1;
> >>>
> >>> gcc will build the same graph for both these programs, and gcov will
> >>> report boths as 3-term expressions. It is vital that it is not
> >>> interpreted the other way around (which is consistent with the shape of
> >>> the graph) because otherwise the masking would be wrong for the and.c
> >>> program which is a more severe error. While surprising, users would
> >>> probably expect some minor rewriting of semantically-identical
> >>> expressions.
> >>>
> >>> and.c.gcov:
> >>>  #:    2:    if (a && b && c)
> >>> conditions covered 6/6
> >>>  #:    3:    x = 1;
> >>>
> >>> ifs.c.gcov:
> >>>  #:    2:    if (a)
> >>>  #:    3:    if (b)
> >>>  #:    4:    if (c)
> >>>  #:    5:    x = 1;
> >>> conditions covered 6/6
> >>
> >> Maybe one can use location information to distinguish those cases?
> >> Don't we store discriminator info about individual statements that is also
> >> used for
> >> auto-FDO?
> > 
> > That is one possibility, which I tried for a bit, but abandoned to focus on
> > getting the rest

Re: [PATCH] middle-end: don't keep .MEM guard nodes for PHI nodes who dominate loop [PR111860]

2023-10-23 Thread Richard Biener
On Fri, 20 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> The previous patch tried to remove PHI nodes that dominated the first loop;
> however, the correct fix is to only remove .MEM nodes.
> 
> This patch thus makes the condition a bit stricter and only tries to remove
> MEM phi nodes.
> 
> I couldn't figure out a way to easily determine if a particular PHI is vUSE
> related, so the patch does:
> 
> 1. check if the definition is a vDEF and not defined in main loop.
> 2. check if the definition is a PHI and not defined in main loop. 
> 3. check if the definition is a default definition.
> 
> For cases 2 and 3 we may misidentify the PHI; in both cases the value is
> defined outside of the loop version block, which also makes it OK to remove.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, 
> powerpc64le-unknown-linux-gnu,
> x86_64-none-linux-gnu and no issues.
> 
> Tested all default testsuites.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/111860
>   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>   Drop .MEM nodes only.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/111860
>   * gcc.dg/vect/pr111860-2.c: New test.
>   * gcc.dg/vect/pr111860-3.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111860-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr111860-2.c
> new file mode 100644
> index 
> ..07f64ffb5318c9d7817d46802d123cc9a2d65ec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111860-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fno-tree-sink -ftree-vectorize" } */
> +int buffer_ctrl_ctx_0, buffer_ctrl_p1, buffer_ctrl_cmd;
> +
> +int
> +buffer_ctrl (long ret, int i)
> +{
> +  switch (buffer_ctrl_cmd)
> +{
> +case 1:
> +  buffer_ctrl_ctx_0 = 0;
> +  for (; i; i++)
> + if (buffer_ctrl_p1)
> +   ret++;
> +}
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111860-3.c 
> b/gcc/testsuite/gcc.dg/vect/pr111860-3.c
> new file mode 100644
> index 
> ..07f64ffb5318c9d7817d46802d123cc9a2d65ec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111860-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fno-tree-sink -ftree-vectorize" } */
> +int buffer_ctrl_ctx_0, buffer_ctrl_p1, buffer_ctrl_cmd;
> +
> +int
> +buffer_ctrl (long ret, int i)
> +{
> +  switch (buffer_ctrl_cmd)
> +{
> +case 1:
> +  buffer_ctrl_ctx_0 = 0;
> +  for (; i; i++)
> + if (buffer_ctrl_p1)
> +   ret++;
> +}
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111860.c 
> b/gcc/testsuite/gcc.dg/vect/pr111860.c
> new file mode 100644
> index 
> ..36f0774601040418bc6b7f27c9425b2bf93b18cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111860.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +
> +int optimize_path_n, optimize_path_d;
> +int *optimize_path_d_0;
> +extern void path_threeOpt( long);
> +void optimize_path() {
> +  int i;
> +  long length;
> +  i = 0;
> +  for (; i <= optimize_path_n; i++)
> +optimize_path_d = 0;
> +  i = 0;
> +  for (; i < optimize_path_n; i++)
> +length += optimize_path_d_0[i];
> +  path_threeOpt(length);
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 1f7779b9834c3aef3c6a993fab916224fab03147..fc55278e63f7a48943fdc32c5e207110cf14507e
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1626,13 +1626,33 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
> edge temp_e = redirect_edge_and_branch (exit, new_preheader);
> flush_pending_stmts (temp_e);
>   }
> -
>/* Record the new SSA names in the cache so that we can skip 
> materializing
>them again when we fill in the rest of the LCSSA variables.  */
>for (auto phi : new_phis)
>   {
> tree new_arg = gimple_phi_arg (phi, 0)->def;
> new_phi_args.put (new_arg, gimple_phi_result (phi));

don't you want to skip this as well?

> +
> +   if (!SSA_VAR_P (new_arg))
> + continue;
> +   /* If the PHI MEM node dominates the loop then we shouldn't create
> +   a new LC-SSSA PHI for it in the intermediate block.   */
> +   gimple *def_stmt = SSA_NAME_DEF_STMT (new_arg);
> +   basic_block def_bb = gimple_bb (def_stmt);
> +   /* A MEM phi that consitutes a new DEF for the vUSE chain can either
> +  be a .VDEF or a PHI that operates on MEM.  */
> +   if (((gimple_vdef (def_stmt) || is_a  (def_stmt))
> +   /* And said definition must not be inside the main loop.  */
> +&& (!def_bb || !flow_bb_inside_loop_p (loop, def_bb)))
> +   /* Or we must be a parameter.  In the last two cases we may remove
> +  a non-MEM PHI node, but since they do

[PING ^1][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-23 Thread Ajit Agarwal



Ping ^1.

 Forwarded Message 
Subject: [PING ^0][PATCH v2] rs6000: Add new pass for replacement of contiguous 
addresses vector load lxv with lxvp
Date: Sun, 15 Oct 2023 17:43:24 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Segher Boessenkool , Kewen.Lin 
, Peter Bergner 

Hello All:

Please review.

Thanks & Regards
Ajit


 Forwarded Message 
Subject: [PATCH v2] rs6000: Add new pass for replacement of contiguous 
addresses vector load lxv with lxvp
Date: Sun, 8 Oct 2023 00:34:27 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Segher Boessenkool , Peter Bergner 
, Kewen.Lin 

Hello All:

This patch adds a new pass to replace contiguous-address vector loads (lxv)
with the MMA instruction lxvp. This patch addresses one regression failure
on the ARM architecture.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Add new pass for replacement of contiguous lxv with lxvp.

New pass to replace contiguous-address lxv loads with lxvp. This pass
is registered after the ree RTL pass.
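
An illustrative sketch of the intended replacement (Power10 assembly;
register numbers are made up, and lxvp requires an even/odd VSR pair):

  # before: two contiguous 16-byte loads
  lxv  32, 0(3)
  lxv  33, 16(3)

  # after: one paired load
  lxvp 32, 0(3)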

2023-10-07  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: Registered vecload pass.
* config/rs6000/rs6000-vecload-opt.cc: Add new pass.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload pass.
* config/rs6000/t-rs6000: Add new rule.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload.C: New test.
---
 gcc/config.gcc |   4 +-
 gcc/config/rs6000/rs6000-passes.def|   1 +
 gcc/config/rs6000/rs6000-protos.h  |   2 +
 gcc/config/rs6000/rs6000-vecload-opt.cc| 234 +
 gcc/config/rs6000/rs6000.cc|   3 +-
 gcc/config/rs6000/t-rs6000 |   4 +
 gcc/testsuite/g++.target/powerpc/vecload.C |  15 ++
 7 files changed, 260 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ee46d96bf62..482ab094b89 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -515,7 +515,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -552,7 +552,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
diff --git a/gcc/config/rs6000/rs6000-passes.def 
b/gcc/config/rs6000/rs6000-passes.def
index ca899d5f7af..9ecf8ce6a9c 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
  The power8 does not have instructions that automaticaly do the byte swaps
  for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+  INSERT_PASS_AFTER (pass_ree, 1, pass_analyze_vecload);
 
   /* Pass to do the PCREL_OPT optimization that combines the load of an
  external symbol's address along with a single load or store using that
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..9c44bae33d3 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -91,6 +91,7 @@ extern int mems_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
 extern bool quad_address_p (rtx, machine_mode, bool);
+extern bool mode_supports_dq_form (machine_mode);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
@@ -344,6 +345,7 @@ class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
 extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
+extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc 
b/gcc/config/rs6000/rs

[PATCH v2] gcc.c-torture/execute/builtins/fputs.c: fputs_unlocked prototype

2023-10-23 Thread Florian Weimer
Current glibc headers only declare fputs_unlocked for _GNU_SOURCE,
so define it to obtain an official prototype.

Add a fallback prototype declaration for other systems that do not
have fputs_unlocked.  This seems to be the most straightforward approach
to avoid an implicit function declaration, without reducing test
coverage or introducing ongoing maintenance requirements (e.g.,
FreeBSD added fputs_unlocked support fairly recently).
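
For illustration only (not part of the patch, and assuming a libc where
fputs_unlocked may be provided as a function-like macro), the
parenthesized declarator is what keeps the extra prototype safe:

#define _GNU_SOURCE /* Ask glibc for fputs_unlocked.  */
#include <stdio.h>

/* The parentheses around the name suppress expansion of any
   function-like macro named fputs_unlocked, so this declaration
   compiles whether or not the libc already provides one.  */
extern int (fputs_unlocked) (const char *, FILE *);

int
main (void)
{
  fputs_unlocked ("hello\n", stdout);
  return 0;
}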

gcc/testsuite/

* gcc.c-torture/execute/builtins/fputs.c (_GNU_SOURCE):
Define.
(fputs_unlocked): Declare.

---
 gcc/testsuite/gcc.c-torture/execute/builtins/fputs.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/fputs.c 
b/gcc/testsuite/gcc.c-torture/execute/builtins/fputs.c
index 93fa9736449..a94ea993364 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/fputs.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/fputs.c
@@ -5,9 +5,13 @@
 
Written by Kaveh R. Ghazi, 10/30/2000.  */
 
+#define _GNU_SOURCE /* For fputs_unlocked.  */
 #include <stdio.h>
 extern void abort(void);
 
+/* Not all systems have fputs_unlocked.  See fputs-lib.c.  */
+extern int (fputs_unlocked) (const char *, FILE *);
+
 int i;
 
 void

base-commit: 0e29c6f65523dad20068ba69cd03d8f6f82cab41



Re: [PATCH] gcc.c-torture/execute/builtins/fputs.c: Define _GNU_SOURCE

2023-10-23 Thread Florian Weimer
* Andrew Pinski:

> On Sun, Oct 22, 2023 at 12:47 AM Florian Weimer  wrote:
>>
>> Current glibc headers only declare fputs_unlocked for _GNU_SOURCE.
>> Defining the macro avoids an implicit function declaration.
>
> This does not help targets that don't use glibc though.
> Note that for the builtins testsuite there is a lib-fputs.c file which
> defines a fputs_unlocked, which is how it will link even if the libc
> does not define fputs_unlocked.

That's a good point.  I've sent a v2 which also adds a prototype
declaration in addition to _GNU_SOURCE.  I've thought about it for a bit
and it seems to be the least intrusive option.

Thanks,
Florian



[PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
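
As a rough illustration (hypothetical, not one of the new testcases, and
assuming -mavx512fp16 -mavx512vl), this is the shape of half-precision
select loop the new V4HF expanders let the vectorizer handle with
partial vectors:

/* Hypothetical example: a 4-element _Float16 select; the comparison
   maps to the new vec_cmpv4hfqi expander and the select to the
   matching vcond_mask pattern.  */
void
select4 (_Float16 *r, const _Float16 *a, const _Float16 *b)
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? a[i] : (_Float16) 0;
}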

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
V2HF/V2BF/V4HF/V4BFmode.
* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
(vcondv4hf): Ditto.
(vcondv4hi): Ditto.
(vconduv4hi): Ditto.
(vcond_mask_v4hi): Ditto.
(vcond_mask_qi): Ditto.
(vec_cmpv2hfqi): Ditto.
(vcondv2hf): Ditto.
(vcondv2hi): Ditto.
(vconduv2hi): Ditto.
(vcond_mask_v2hi): Ditto.
* config/i386/sse.md (vcond): Merge this with ..
(vcond): .. this into ..
(vcond): .. this,
and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: New test.
* gcc.target/i386/part-vect-vec_cmphf.c: New test.
---
 gcc/config/i386/i386-expand.cc|   4 +
 gcc/config/i386/mmx.md| 237 +-
 gcc/config/i386/sse.md|  25 +-
 .../g++.target/i386/part-vect-vcondhf.C   |  34 +++
 .../gcc.target/i386/part-vect-vec_cmphf.c |  26 ++
 5 files changed, 304 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1eae9d7c78c..9658f9c5a2d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -4198,6 +4198,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
   break;
 case E_V8QImode:
 case E_V4HImode:
+case E_V4HFmode:
+case E_V4BFmode:
 case E_V2SImode:
   if (TARGET_SSE4_1)
{
@@ -4207,6 +4209,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
   break;
 case E_V4QImode:
 case E_V2HImode:
+case E_V2HFmode:
+case E_V2BFmode:
   if (TARGET_SSE4_1)
{
  gen = gen_mmx_pblendvb_v4qi;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 491a0a51272..b9617e9d8c6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -61,6 +61,9 @@ (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
 (define_mode_iterator V_32 [V4QI V2HI V1SI V2HF V2BF])
 
 (define_mode_iterator V2FI_32 [V2HF V2BF V2HI])
+(define_mode_iterator V4FI_64 [V4HF V4BF V4HI])
+(define_mode_iterator V4F_64 [V4HF V4BF])
+(define_mode_iterator V2F_32 [V2HF V2BF])
 ;; 4-byte integer vector modes
 (define_mode_iterator VI_32 [V4QI V2HI])
 
@@ -1972,10 +1975,12 @@ (define_mode_attr mov_to_sse_suffix
   [(V2HF "d") (V4HF "q") (V2HI "d") (V4HI "q")])
 
 (define_mode_attr mmxxmmmode
-  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")])
+  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")
+   (V4HF "V8HF") (V4HI "V8HI") (V4BF "V8BF")])
 
 (define_mode_attr mmxxmmmodelower
-  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")])
+  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")
+   (V4HF "v8hf") (V4HI "v8hi") (V4BF "v8bf")])
 
 (define_expand "movd__to_sse"
   [(set (match_operand: 0 "register_operand")
@@ -2114,6 +2119,234 @@ (define_insn_and_split "*mmx_nabs2"
   [(set (match_dup 0)
(ior: (match_dup 1) (match_dup 2)))])
 
+;
+;;
+;; Parallel half-precision floating point comparisons
+;;
+;
+
+(define_expand "vec_cmpv4hfqi"
+  [(set (match_operand:QI 0 "register_operand")
+   (match_operator:QI 1 ""
+ [(match_operand:V4HF 2 "nonimmediate_operand")
+  (match_operand:V4HF 3 "nonimmediate_operand")]))]
+  "TARGET_MMX_WITH_SSE && TARGET_AVX512FP16 && TARGET_AVX512VL
+   && ix86_partial_vec_fp_math"
+{
+  rtx ops[4];
+  ops[3] = gen_reg_rtx (V8HFmode);
+  ops[2] = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (ops[3], operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (ops[2], operands[2]));
+  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
+  DONE;
+})
+
+(define_expand "vcondv4hf"
+  [(set (match_operand:V4FI_64 0 "register_operand")
+   (if_then_else:V4FI_64
+ (match_operator 3 ""
+   [(match_operand:V4HF 4 "nonimmediate_operand")
+(match_operand:V4HF 5 "nonimmediate_operand")])
+ (match_operand:V4FI_64 1 "general_operand")
+ (match_operand:V4FI_64 2 "general_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+  && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
+{
+  rtx ops[6];
+  ops[5] = gen_reg_rtx (V8HFmode);
+  ops[4] = gen_reg_rtx (V8HFmode);
+  ops[0] = gen_reg_rtx (mode);
+  ops[1] = lowpart_subreg (mode,
+  force_reg (mode, operands[1]),
+  mode);
+  ops[2] = lowpart_subreg (mode,
+  force_reg (mode, operands[2]),
+  mode);
+  ops

[PATCH] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

2023-10-23 Thread Juzhe-Zhong
ICE:

during RTL pass: vsetvl
: In function 'riscv_lms_f32':
:240:1: internal compiler error: in merge, at 
config/riscv/riscv-vsetvl.cc:1997
  240 | }

In general compatible_p (avl_equal_p) has:

if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
  return false;

Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructrions.

It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
And after the fusion, we will always use compatible_p to check whether the demand
is correct or not.
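
A reduced sketch of the problematic shape (hypothetical, not the PR
testcase): the VL result of a vsetvl configures the RVV instructions but
also feeds scalar code, which is exactly the non-RVV use that must block
the fusion:

#include "riscv_vector.h"

/* Hypothetical reduction: 'vl' configures the RVV load/store but is
   also consumed by the scalar pointer update and the loop branch,
   i.e. by non-RVV instructions.  */
void
copy_by_vl (float *x, unsigned long n)
{
  while (n > 0)
    {
      unsigned long vl = __riscv_vsetvl_e32m8 (n);
      vfloat32m8_t v = __riscv_vle32_v_f32m8 (x, vl);
      __riscv_vse32_v_f32m8 (x, v, vl);
      x += vl;   /* scalar use of VL */
      n -= vl;   /* scalar use of VL feeding the branch */
    }
}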

PR target/111927

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  23 ++
 .../gcc.target/riscv/rvv/vsetvl/pr111927.c| 243 ++
 2 files changed, 266 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 47b459fddd4..42295732ed7 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1541,6 +1541,29 @@ private:
   inline bool can_use_next_avl_p (const vsetvl_info &prev,
  const vsetvl_info &next)
   {
+/* Forbid the AVL/VL propagation if VL of NEXT is used
+   by non-RVV instructions.  This is because:
+
+bb 2:
+  scalar move (no AVL)
+bb 3:
+  vsetvl a5(VL), a4(AVL) ...
+  branch a5,zero
+
+   Since user vsetvl instruction is no side effect instruction
+   which should be placed in the correct and optimal location
+   of the program by the previous PASS, it is unreasonble that
+   VSETVL PASS tries to move it to another places if it used by
+   non-RVV instructions.
+
+   Note: We only forbid the cases that VL is used by the following
+   non-RVV instructions which will cause issues.  We don't forbid
+   other cases since it won't cause correctness issues and we still
+   more more demand info are fused backward.  The later LCM algorithm
+   should know the optimal location of the vsetvl.  */
+if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
+  return false;
+
 if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
   return true;
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
new file mode 100644
index 000..62f395fee33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
@@ -0,0 +1,243 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+#include <stdio.h>
+
+#define RISCV_MATH_LOOPUNROLL
+#define RISCV_MATH_VECTOR
+typedef  float float32_t;
+
+  typedef struct
+  {
+  uint16_t numTaps;/**< number of coefficients in the filter. */
+  float32_t *pState;   /**< points to the state variable array. The 
array is of length numTaps+blockSize-1. */
+  float32_t *pCoeffs;  /**< points to the coefficient array. The array 
is of length numTaps. */
+  float32_t mu;/**< step size that controls filter coefficient 
updates. */
+  } riscv_lms_instance_f32;
+
+
+void riscv_lms_f32(
+  const riscv_lms_instance_f32 * S,
+  const float32_t * pSrc,
+float32_t * pRef,
+float32_t * pOut,
+float32_t * pErr,
+uint32_t blockSize)
+{
+float32_t *pState = S->pState; /* State pointer */
+float32_t *pCoeffs = S->pCoeffs;   /* Coefficient pointer 
*/
+float32_t *pStateCurnt;/* Points to the 
current sample of the state */
+float32_t *px, *pb;/* Temporary pointers 
for state and coefficient buffers */
+float32_t mu = S->mu;  /* Adaptive factor */
+float32_t acc, e;  /* Accumulator, error */
+float32_t w;   /* Weight factor */
+uint32_t numTaps = S->numTaps; /* Number of filter 
coefficients in the filter */
+uint32_t tapCnt, blkCnt;   /* Loop counters */
+
+  /* Initializations of error,  difference, Coefficient update */
+  e = 0.0f;
+  w = 0.0f;
+
+  /* S->pState points to state array which contains previous frame (numTaps - 
1) samples */
+  /* pStateCurnt points to the location where the new input data should be 
written */
+  pStateCurnt = &(S->pState[(numTaps - 1U)]);
+
+  /* initialise loop count */
+  blkCnt = blockSize;
+
+  while (blkCnt > 0U)
+  {
+/* Copy the new input sample into the state buffer */
+*pStateCurnt++ = *pSrc++;
+
+/* Initialize pState pointer */
+px = pState;
+
+/* Initialize coefficient pointer */
+pb = pCoeffs;
+
+

Re: [PATCH] rust: build failure after NON_DEPENDENT_EXPR removal [PR111899]

2023-10-23 Thread Thomas Schwinge
Hi Patrick!

On 2023-10-20T13:36:30-0400, Patrick Palka  wrote:
> Built on x86_64-pc-linux-gnu, pushed to trunk as obvious (hopefully).
>
> -- >8 --
>
> This patch removes stray NON_DEPENDENT_EXPR checks following the removal
> of this tree code from the C++ FE.  (Since this restores the build I
> supppose it means the Rust FE never creates NON_DEPENDENT_EXPR trees in
> the first place, so no further analysis is needed.)

ACK, thanks!


For context: indeed, a non-trivial amount of C++ front end 'constexpr'
code was copied into the Rust front end, for implementing related Rust
functionality, mostly as part of the 2022 GSoC project
"Support for Constant Folding in Rust Frontend" (Faisal Abbas),
.

Yes, this should eventually be cleaned up (and merged with the original
C++ front end code, as much as feasible -- which I don't know whether or
to which extent it is).


Regards
 Thomas


>   PR rust/111899
>
> gcc/rust/ChangeLog:
>
>   * backend/rust-constexpr.cc (potential_constant_expression_1):
>   Remove NON_DEPENDENT_EXPR handling.
>   * backend/rust-tree.cc (mark_exp_read): Likewise.
>   (mark_use): Likewise.
>   (lvalue_kind): Likewise.
> ---
>  gcc/rust/backend/rust-constexpr.cc | 1 -
>  gcc/rust/backend/rust-tree.cc  | 3 ---
>  2 files changed, 4 deletions(-)
>
> diff --git a/gcc/rust/backend/rust-constexpr.cc 
> b/gcc/rust/backend/rust-constexpr.cc
> index b28fa27b2d0..a7ae4166ea0 100644
> --- a/gcc/rust/backend/rust-constexpr.cc
> +++ b/gcc/rust/backend/rust-constexpr.cc
> @@ -6151,7 +6151,6 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
>  case CLEANUP_POINT_EXPR:
>  case EXPR_STMT:
>  case PAREN_EXPR:
> -case NON_DEPENDENT_EXPR:
>/* For convenience.  */
>  case LOOP_EXPR:
>  case EXIT_EXPR:
> diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
> index 66e859cd70c..7040c75f825 100644
> --- a/gcc/rust/backend/rust-tree.cc
> +++ b/gcc/rust/backend/rust-tree.cc
> @@ -72,7 +72,6 @@ mark_exp_read (tree exp)
>  case ADDR_EXPR:
>  case INDIRECT_REF:
>  case FLOAT_EXPR:
> -case NON_DEPENDENT_EXPR:
>  case VIEW_CONVERT_EXPR:
>mark_exp_read (TREE_OPERAND (exp, 0));
>break;
> @@ -128,7 +127,6 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>switch (TREE_CODE (expr))
>  {
>  case COMPONENT_REF:
> -case NON_DEPENDENT_EXPR:
>recurse_op[0] = true;
>break;
>  case COMPOUND_EXPR:
> @@ -4520,7 +4518,6 @@ lvalue_kind (const_tree ref)
>lvalues.  */
>return (DECL_NONSTATIC_MEMBER_FUNCTION_P (ref) ? clk_none : 
> clk_ordinary);
>
> -case NON_DEPENDENT_EXPR:
>  case PAREN_EXPR:
>return lvalue_kind (TREE_OPERAND (ref, 0));
>
> --
> 2.42.0.411.g813d9a9188


[PATCH] LoongArch:Enable vcond_mask_mn expanders for SF/DF modes.

2023-10-23 Thread Jiahao Xu
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
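
For instance (a hypothetical reduction, not one of the new testcases), a
loop like this only becomes a vector compare plus masked select when a
floating-point vcond_mask expander is available:

/* Hypothetical example: the float comparison produces the mask that
   vcond_mask consumes; without an FP-mode pattern the vectorizer
   cannot emit the masked select.  */
void
select_pos (float *r, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? a[i] : 0.0f;
}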

gcc/ChangeLog:

* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
(vcond_mask_): this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vcond-1.c: New test.
* gcc.target/loongarch/vcond-2.c: Ditto.

Change-Id: If9716f356c0b83748a208235e835feb402b5c78f

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 442fda24606..ba2c5eec7d0 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -906,15 +906,15 @@ (define_expand "vcond"
 })
 
 ;; Same as vcond_
-(define_expand "vcond_mask_"
-  [(match_operand:ILASX 0 "register_operand")
-   (match_operand:ILASX 1 "reg_or_m1_operand")
-   (match_operand:ILASX 2 "reg_or_0_operand")
-   (match_operand:ILASX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LASX 0 "register_operand")
+   (match_operand:LASX 1 "reg_or_m1_operand")
+   (match_operand:LASX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LASX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+mode, operands);
   DONE;
 })
 
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index b4e92ae9c54..7e77ac4ad6a 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -644,15 +644,15 @@ (define_expand "vcond"
   DONE;
 })
 
-(define_expand "vcond_mask_"
-  [(match_operand:ILSX 0 "register_operand")
-   (match_operand:ILSX 1 "reg_or_m1_operand")
-   (match_operand:ILSX 2 "reg_or_0_operand")
-   (match_operand:ILSX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LSX 0 "register_operand")
+   (match_operand:LSX 1 "reg_or_m1_operand")
+   (match_operand:LSX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LSX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+  mode, operands);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/loongarch/vcond-1.c 
b/gcc/testsuite/gcc.target/loongarch/vcond-1.c
new file mode 100644
index 000..57064eac9dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vcond-1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-unroll-loops -fno-vect-cost-model 
-mlasx" } */
+
+#include <stdint.h>
+
+#define DEF_VCOND_VAR(DATA_TYPE, CMP_TYPE, COND, SUFFIX)   \
+  void __attribute__ ((noinline, noclone)) \
+  vcond_var_##CMP_TYPE##_##SUFFIX (DATA_TYPE *__restrict__ r,  \
+  DATA_TYPE *__restrict__ x,   \
+  DATA_TYPE *__restrict__ y,   \
+  CMP_TYPE *__restrict__ a,\
+  CMP_TYPE *__restrict__ b,\
+  int n)   \
+  {\
+for (int i = 0; i < n; i++)\
+  {\
+   DATA_TYPE xval = x[i], yval = y[i]; \
+   CMP_TYPE aval = a[i], bval = b[i];  \
+   r[i] = aval COND bval ? xval : yval;\
+  }\
+  }
+
+#define TEST_COND_VAR_SIGNED_ALL(T, COND, SUFFIX)  \
+  T (int8_t, int8_t, COND, SUFFIX) \
+  T (int16_t, int16_t, COND, SUFFIX)   \
+  T (int32_t, int32_t, COND, SUFFIX)   \
+  T (int64_t, int64_t, COND, SUFFIX)   \
+  T (float, int32_t, COND, SUFFIX##_float) \
+  T (double, int64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_UNSIGNED_ALL(T, COND, SUFFIX)\
+  T (uint8_t, uint8_t, COND, SUFFIX)   \
+  T (uint16_t, uint16_t, COND, SUFFIX) \
+  T (uint32_t, uint32_t, COND, SUFFIX) \
+  T (uint64_t, uint64_t, COND, SUFFIX) \
+  T (float, uint32_t, COND, SUFFIX##_float)\
+  T (double, uint64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_ALL(T, COND, SUFFIX) \
+  TEST_COND_VAR_SIGNED_ALL (T, COND, SUFFIX)   \
+  TEST_COND_VAR_UNSIGNED_ALL (T, COND, SUFFIX)
+
+#define TEST_VAR_ALL(T)\
+  TEST_COND_VAR_ALL (T, >, _gt)\
+  TEST_COND_VAR_ALL (T, <, _lt)\
+  TEST_COND_VAR_ALL (T, >=, _ge)   \
+  TEST_COND_VAR_ALL (T, <=, _le)   \
+  TEST_COND_VAR_ALL (T, ==, _eq)   \
+  TEST_COND_VAR_ALL (T, !=, _ne)
+
+TEST_VAR_

Re: [PATCH] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

2023-10-23 Thread Kito Cheng
A few minor comments:

On Mon, Oct 23, 2023 at 5:04 PM Juzhe-Zhong  wrote:
>
> ICE:
>
> during RTL pass: vsetvl
> : In function 'riscv_lms_f32':
> :240:1: internal compiler error: in merge, at 
> config/riscv/riscv-vsetvl.cc:1997
>   240 | }
>
> In general compatible_p (avl_equal_p) has:
>
> if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
>   return false;
>
> Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructrions.

instructrions -> instructions

>
> It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
> fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
> And after the fusion, we will always use compatible_p to check whether the
> demand
> is correct or not.
>
> PR target/111927
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc: Fix ICE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  |  23 ++
>  .../gcc.target/riscv/rvv/vsetvl/pr111927.c| 243 ++
>  2 files changed, 266 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 47b459fddd4..42295732ed7 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1541,6 +1541,29 @@ private:
>inline bool can_use_next_avl_p (const vsetvl_info &prev,
>   const vsetvl_info &next)
>{
> +/* Forbid the AVL/VL propagation if VL of NEXT is used
> +   by non-RVV instructions.  This is because:
> +
> +bb 2:
> +  scalar move (no AVL)

Could you add a few comments to mention this is prev

> +bb 3:
> +  vsetvl a5(VL), a4(AVL) ...

and this is next

> +  branch a5,zero
> +
> +   Since user vsetvl instruction is no side effect instruction
> +   which should be placed in the correct and optimal location
> +   of the program by the previous PASS, it is unreasonble that

unreasonble -> unreasonable

> +   VSETVL PASS tries to move it to another places if it used by
> +   non-RVV instructions.
> +
> +   Note: We only forbid the cases that VL is used by the following
> +   non-RVV instructions which will cause issues.  We don't forbid
> +   other cases since it won't cause correctness issues and we still
> +   more more demand info are fused backward.  The later LCM algorithm

more more -> more

> +   should know the optimal location of the vsetvl.  */
> +if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
> +  return false;
> +
>  if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
>return true;
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
> new file mode 100644
> index 000..62f395fee33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
> @@ -0,0 +1,243 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +#include <stdio.h>

Including stdio.h will cause multi-lib testing issues, and I don't see
any function or declaration from stdio.h being used in the file,
so I assume it is safe to remove.

and could you clean up the testcase? at least drop those unused #else parts?


[PATCH V2] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

2023-10-23 Thread Juzhe-Zhong
ICE:

during RTL pass: vsetvl
: In function 'riscv_lms_f32':
:240:1: internal compiler error: in merge, at 
config/riscv/riscv-vsetvl.cc:1997
  240 | }

In general compatible_p (avl_equal_p) has:

if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
  return false;

Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructions.

It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
And after the fusion, we will always use compatible_p to check whether the demand
is correct or not.

PR target/111927

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  23 +++
 .../gcc.target/riscv/rvv/vsetvl/pr111927.c| 170 ++
 2 files changed, 193 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 47b459fddd4..f3922a051c5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1541,6 +1541,29 @@ private:
   inline bool can_use_next_avl_p (const vsetvl_info &prev,
  const vsetvl_info &next)
   {
+/* Forbid the AVL/VL propagation if VL of NEXT is used
+   by non-RVV instructions.  This is because:
+
+bb 2:
+  PREV: scalar move (no AVL)
+bb 3:
+  NEXT: vsetvl a5(VL), a4(AVL) ...
+  branch a5,zero
+
+   Since user vsetvl instruction is no side effect instruction
+   which should be placed in the correct and optimal location
+   of the program by the previous PASS, it is unreasonable that
+   VSETVL PASS tries to move it to another places if it used by
+   non-RVV instructions.
+
+   Note: We only forbid the cases that VL is used by the following
+   non-RVV instructions which will cause issues.  We don't forbid
+   other cases since it won't cause correctness issues and we still
+   more demand info are fused backward.  The later LCM algorithm
+   should know the optimal location of the vsetvl.  */
+if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
+  return false;
+
 if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
   return true;
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
new file mode 100644
index 000..ab599add57f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
@@ -0,0 +1,170 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+#define RISCV_MATH_LOOPUNROLL
+#define RISCV_MATH_VECTOR
+typedef  float float32_t;
+
+  typedef struct
+  {
+  uint16_t numTaps;/**< number of coefficients in the filter. */
+  float32_t *pState;   /**< points to the state variable array. The 
array is of length numTaps+blockSize-1. */
+  float32_t *pCoeffs;  /**< points to the coefficient array. The array 
is of length numTaps. */
+  float32_t mu;/**< step size that controls filter coefficient 
updates. */
+  } riscv_lms_instance_f32;
+
+
+void riscv_lms_f32(
+  const riscv_lms_instance_f32 * S,
+  const float32_t * pSrc,
+float32_t * pRef,
+float32_t * pOut,
+float32_t * pErr,
+uint32_t blockSize)
+{
+float32_t *pState = S->pState; /* State pointer */
+float32_t *pCoeffs = S->pCoeffs;   /* Coefficient pointer 
*/
+float32_t *pStateCurnt;/* Points to the 
current sample of the state */
+float32_t *px, *pb;/* Temporary pointers 
for state and coefficient buffers */
+float32_t mu = S->mu;  /* Adaptive factor */
+float32_t acc, e;  /* Accumulator, error */
+float32_t w;   /* Weight factor */
+uint32_t numTaps = S->numTaps; /* Number of filter 
coefficients in the filter */
+uint32_t tapCnt, blkCnt;   /* Loop counters */
+
+  /* Initializations of error,  difference, Coefficient update */
+  e = 0.0f;
+  w = 0.0f;
+
+  /* S->pState points to state array which contains previous frame (numTaps - 
1) samples */
+  /* pStateCurnt points to the location where the new input data should be 
written */
+  pStateCurnt = &(S->pState[(numTaps - 1U)]);
+
+  /* initialise loop count */
+  blkCnt = blockSize;
+
+  while (blkCnt > 0U)
+  {
+/* Copy the new input sample into the state buffer */
+*pStateCurnt++ = *pSrc++;
+
+/* Initialize pState pointer */
+px = pState;
+
+/* Initialize coefficient pointer */
+pb = pCoeffs;
+
+/* 

[PATCH] LoongArch:Enable vcond_mask_mn expanders for SF/DF modes.

2023-10-23 Thread Jiahao Xu
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.

gcc/ChangeLog:

* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
(vcond_mask_): this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Ditto.

Change-Id: If9716f356c0b83748a208235e835feb402b5c78f

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 442fda24606..ba2c5eec7d0 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -906,15 +906,15 @@ (define_expand "vcond"
 })
 
 ;; Same as vcond_
-(define_expand "vcond_mask_"
-  [(match_operand:ILASX 0 "register_operand")
-   (match_operand:ILASX 1 "reg_or_m1_operand")
-   (match_operand:ILASX 2 "reg_or_0_operand")
-   (match_operand:ILASX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LASX 0 "register_operand")
+   (match_operand:LASX 1 "reg_or_m1_operand")
+   (match_operand:LASX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LASX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+mode, operands);
   DONE;
 })
 
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index b4e92ae9c54..7e77ac4ad6a 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -644,15 +644,15 @@ (define_expand "vcond"
   DONE;
 })
 
-(define_expand "vcond_mask_"
-  [(match_operand:ILSX 0 "register_operand")
-   (match_operand:ILSX 1 "reg_or_m1_operand")
-   (match_operand:ILSX 2 "reg_or_0_operand")
-   (match_operand:ILSX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LSX 0 "register_operand")
+   (match_operand:LSX 1 "reg_or_m1_operand")
+   (match_operand:LSX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LSX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+  mode, operands);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
new file mode 100644
index 000..ee9cb1a1fa7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-unroll-loops -fno-vect-cost-model 
-mlasx" } */
+
+#include <stdint.h>
+
+#define DEF_VCOND_VAR(DATA_TYPE, CMP_TYPE, COND, SUFFIX)   \
+  void __attribute__ ((noinline, noclone)) \
+  vcond_var_##CMP_TYPE##_##SUFFIX (DATA_TYPE *__restrict__ r,  \
+  DATA_TYPE *__restrict__ x,   \
+  DATA_TYPE *__restrict__ y,   \
+  CMP_TYPE *__restrict__ a,\
+  CMP_TYPE *__restrict__ b,\
+  int n)   \
+  {\
+for (int i = 0; i < n; i++)\
+  {\
+   DATA_TYPE xval = x[i], yval = y[i]; \
+   CMP_TYPE aval = a[i], bval = b[i];  \
+   r[i] = aval COND bval ? xval : yval;\
+  }\
+  }
+
+#define TEST_COND_VAR_SIGNED_ALL(T, COND, SUFFIX)  \
+  T (int8_t, int8_t, COND, SUFFIX) \
+  T (int16_t, int16_t, COND, SUFFIX)   \
+  T (int32_t, int32_t, COND, SUFFIX)   \
+  T (int64_t, int64_t, COND, SUFFIX)   \
+  T (float, int32_t, COND, SUFFIX##_float) \
+  T (double, int64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_UNSIGNED_ALL(T, COND, SUFFIX)\
+  T (uint8_t, uint8_t, COND, SUFFIX)   \
+  T (uint16_t, uint16_t, COND, SUFFIX) \
+  T (uint32_t, uint32_t, COND, SUFFIX) \
+  T (uint64_t, uint64_t, COND, SUFFIX) \
+  T (float, uint32_t, COND, SUFFIX##_float)\
+  T (double, uint64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_ALL(T, COND, SUFFIX) \
+  TEST_COND_VAR_SIGNED_ALL (T, COND, SUFFIX)   \
+  TEST_COND_VAR_UNSIGNED_ALL (T, COND, SUFFIX)
+
+#define TEST_VAR_ALL(T)\
+  TEST_COND_VAR_ALL (T, >, _gt)\
+  TEST_COND_VAR_ALL (T, <, _lt)  

Re: [PATCH V2] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

2023-10-23 Thread Kito Cheng
LGTM

Juzhe-Zhong wrote on Monday, 23 October 2023 at 17:41:

> ICE:
>
> during RTL pass: vsetvl
> : In function 'riscv_lms_f32':
> :240:1: internal compiler error: in merge, at
> config/riscv/riscv-vsetvl.cc:1997
>   240 | }
>
> In general compatible_p (avl_equal_p) has:
>
> if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
>   return false;
>
> Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructions.
>
> It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
> fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
> And after the fusion, we will always use compatible_p to check whether the
> demand
> is correct or not.
>
> PR target/111927
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc: Fix bug.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  |  23 +++
>  .../gcc.target/riscv/rvv/vsetvl/pr111927.c| 170 ++
>  2 files changed, 193 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 47b459fddd4..f3922a051c5 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1541,6 +1541,29 @@ private:
>inline bool can_use_next_avl_p (const vsetvl_info &prev,
>   const vsetvl_info &next)
>{
> +/* Forbid the AVL/VL propagation if VL of NEXT is used
> +   by non-RVV instructions.  This is because:
> +
> +bb 2:
> +  PREV: scalar move (no AVL)
> +bb 3:
> +  NEXT: vsetvl a5(VL), a4(AVL) ...
> +  branch a5,zero
> +
> +   Since user vsetvl instruction is no side effect instruction
> +   which should be placed in the correct and optimal location
> +   of the program by the previous PASS, it is unreasonable that
> +   VSETVL PASS tries to move it to another places if it used by
> +   non-RVV instructions.
> +
> +   Note: We only forbid the cases that VL is used by the following
> +   non-RVV instructions which will cause issues.  We don't forbid
> +   other cases since it won't cause correctness issues and we still
> +   more demand info are fused backward.  The later LCM algorithm
> +   should know the optimal location of the vsetvl.  */
> +if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
> +  return false;
> +
>  if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
>return true;
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
> new file mode 100644
> index 000..ab599add57f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
> @@ -0,0 +1,170 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +#define RISCV_MATH_LOOPUNROLL
> +#define RISCV_MATH_VECTOR
> +typedef  float float32_t;
> +
> +  typedef struct
> +  {
> +  uint16_t numTaps;/**< number of coefficients in the filter.
> */
> +  float32_t *pState;   /**< points to the state variable array.
> The array is of length numTaps+blockSize-1. */
> +  float32_t *pCoeffs;  /**< points to the coefficient array. The
> array is of length numTaps. */
> +  float32_t mu;/**< step size that controls filter
> coefficient updates. */
> +  } riscv_lms_instance_f32;
> +
> +
> +void riscv_lms_f32(
> +  const riscv_lms_instance_f32 * S,
> +  const float32_t * pSrc,
> +float32_t * pRef,
> +float32_t * pOut,
> +float32_t * pErr,
> +uint32_t blockSize)
> +{
> +float32_t *pState = S->pState; /* State pointer */
> +float32_t *pCoeffs = S->pCoeffs;   /* Coefficient
> pointer */
> +float32_t *pStateCurnt;/* Points to the
> current sample of the state */
> +float32_t *px, *pb;/* Temporary
> pointers for state and coefficient buffers */
> +float32_t mu = S->mu;  /* Adaptive factor
> */
> +float32_t acc, e;  /* Accumulator,
> error */
> +float32_t w;   /* Weight factor */
> +uint32_t numTaps = S->numTaps; /* Number of
> filter coefficients in the filter */
> +uint32_t tapCnt, blkCnt;   /* Loop counters */
> +
> +  /* Initializations of error,  difference, Coefficient update */
> +  e = 0.0f;
> +  w = 0.0f;
> +
> +  /* S->pState points to state array which contains previous frame
> (numTaps - 1) samples */
> +  /* pStateCurnt points to the location where the new input data should
> be written */
> +  pStateCurnt = &(S->pState[(numTaps - 1U)]);
> +
> +  /* init

[PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt

2023-10-23 Thread pan2 . li
From: Pan Li 

The vsetvl asm check is unnecessary for the vector convert. We
should focus on the constraint and leave the vsetvl check to the
vsetvl pass.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Remove the vsetvl
asm check from func body.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 3 +--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
index 762b1408994..7d66ed3e943 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
@@ -7,9 +7,8 @@
 /*
 ** test_int65_to_fp16:
 **   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
 **   vfncvt\.f\.x\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
 **   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
 **   ...
 */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
index 3180ba3612c..af08c51ef8b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
@@ -7,9 +7,8 @@
 /*
 ** test_uint65_to_fp16:
 **   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
 **   vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
 **   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
 **   ...
 */
-- 
2.34.1



Re: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt

2023-10-23 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-23 17:54
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt
From: Pan Li 
 
The vsetvl asm check is unnecessary for the vector convert. We
should focus on the constraint and leave the vsetvl check to the
vsetvl pass.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Remove the vsetvl
asm check from func body.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 3 +--
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
index 762b1408994..7d66ed3e943 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
@@ -7,9 +7,8 @@
/*
** test_int65_to_fp16:
**   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
**   vfncvt\.f\.x\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
**   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
**   ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
index 3180ba3612c..af08c51ef8b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
@@ -7,9 +7,8 @@
/*
** test_uint65_to_fp16:
**   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
**   vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
**   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
**   ...
*/
-- 
2.34.1
 
 


RE: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt

2023-10-23 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, October 23, 2023 5:57 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-23 17:54
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt
From: Pan Li <pan2...@intel.com>

The vsetvl asm check is unnecessary for the vector convert. We
should focus on the constraint and leave the vsetvl check to the
vsetvl pass.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Remove the vsetvl
asm check from func body.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 3 +--
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
index 762b1408994..7d66ed3e943 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
@@ -7,9 +7,8 @@
/*
** test_int65_to_fp16:
**   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
**   vfncvt\.f\.x\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
**   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
**   ...
*/
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
index 3180ba3612c..af08c51ef8b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
@@ -7,9 +7,8 @@
/*
** test_uint65_to_fp16:
**   ...
-**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
**   vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+
-**   vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+**   ...
**   vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
**   ...
*/
--
2.34.1





RE: [PATCH V2] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

2023-10-23 Thread Li, Pan2
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Monday, October 23, 2023 5:50 PM
To: Juzhe-Zhong 
Cc: GCC Patches ; Kito Cheng ; 
Jeff Law ; Robin Dapp 
Subject: Re: [PATCH V2] RISC-V: Fix ICE for the fusion case from vsetvl to 
scalar move[PR111927]

LGTM

Juzhe-Zhong <juzhe.zh...@rivai.ai> wrote on Monday, 23 October 2023 at 17:41:
ICE:

during RTL pass: vsetvl
: In function 'riscv_lms_f32':
:240:1: internal compiler error: in merge, at 
config/riscv/riscv-vsetvl.cc:1997
  240 | }

In general compatible_p (avl_equal_p) has:

if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
  return false;

Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructions.

It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
And after the fusion, we will always use compatible_p to check whether the demand
is correct or not.

PR target/111927

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  23 +++
 .../gcc.target/riscv/rvv/vsetvl/pr111927.c| 170 ++
 2 files changed, 193 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 47b459fddd4..f3922a051c5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1541,6 +1541,29 @@ private:
   inline bool can_use_next_avl_p (const vsetvl_info &prev,
  const vsetvl_info &next)
   {
+/* Forbid the AVL/VL propagation if VL of NEXT is used
+   by non-RVV instructions.  This is because:
+
+bb 2:
+  PREV: scalar move (no AVL)
+bb 3:
+  NEXT: vsetvl a5(VL), a4(AVL) ...
+  branch a5,zero
+
+   Since user vsetvl instruction is no side effect instruction
+   which should be placed in the correct and optimal location
+   of the program by the previous PASS, it is unreasonable that
+   VSETVL PASS tries to move it to another places if it used by
+   non-RVV instructions.
+
+   Note: We only forbid the cases that VL is used by the following
+   non-RVV instructions which will cause issues.  We don't forbid
+   other cases since it won't cause correctness issues and we still
+   more demand info are fused backward.  The later LCM algorithm
+   should know the optimal location of the vsetvl.  */
+if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
+  return false;
+
 if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
   return true;

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
new file mode 100644
index 000..ab599add57f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
@@ -0,0 +1,170 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+#define RISCV_MATH_LOOPUNROLL
+#define RISCV_MATH_VECTOR
+typedef  float float32_t;
+
+  typedef struct
+  {
+  uint16_t numTaps;/**< number of coefficients in the filter. */
+  float32_t *pState;   /**< points to the state variable array. The 
array is of length numTaps+blockSize-1. */
+  float32_t *pCoeffs;  /**< points to the coefficient array. The array 
is of length numTaps. */
+  float32_t mu;/**< step size that controls filter coefficient 
updates. */
+  } riscv_lms_instance_f32;
+
+
+void riscv_lms_f32(
+  const riscv_lms_instance_f32 * S,
+  const float32_t * pSrc,
+float32_t * pRef,
+float32_t * pOut,
+float32_t * pErr,
+uint32_t blockSize)
+{
+float32_t *pState = S->pState; /* State pointer */
+float32_t *pCoeffs = S->pCoeffs;   /* Coefficient pointer 
*/
+float32_t *pStateCurnt;/* Points to the 
current sample of the state */
+float32_t *px, *pb;/* Temporary pointers 
for state and coefficient buffers */
+float32_t mu = S->mu;  /* Adaptive factor */
+float32_t acc, e;  /* Accumulator, error */
+float32_t w;   /* Weight factor */
+uint32_t numTaps = S->numTaps; /* Number of filter 
coefficients in the filter */
+uint32_t tapCnt, blkCnt;   /* Loop counters */
+
+  /* Initializations of error,  difference, Coefficient update */
+  e = 0.0f;
+  w = 0.0f;
+
+  /* S->pState points to state array which contains previous frame (numTaps - 
1) samples */
+  /* pStateCurnt points to the location where the new input data should be 
written */
+  pStateCurnt = &(S

[PATCH] LoongArch:Enable vcond_mask_mn expanders for SF/DF modes.

2023-10-23 Thread Jiahao Xu
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.

gcc/ChangeLog:

* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
(vcond_mask_): this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Ditto.

Change-Id: If9716f356c0b83748a208235e835feb402b5c78f

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 442fda24606..ba2c5eec7d0 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -906,15 +906,15 @@ (define_expand "vcond"
 })
 
 ;; Same as vcond_
-(define_expand "vcond_mask_"
-  [(match_operand:ILASX 0 "register_operand")
-   (match_operand:ILASX 1 "reg_or_m1_operand")
-   (match_operand:ILASX 2 "reg_or_0_operand")
-   (match_operand:ILASX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LASX 0 "register_operand")
+   (match_operand:LASX 1 "reg_or_m1_operand")
+   (match_operand:LASX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LASX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+mode, operands);
   DONE;
 })
 
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index b4e92ae9c54..7e77ac4ad6a 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -644,15 +644,15 @@ (define_expand "vcond"
   DONE;
 })
 
-(define_expand "vcond_mask_"
-  [(match_operand:ILSX 0 "register_operand")
-   (match_operand:ILSX 1 "reg_or_m1_operand")
-   (match_operand:ILSX 2 "reg_or_0_operand")
-   (match_operand:ILSX 3 "register_operand")]
+(define_expand "vcond_mask_"
+  [(match_operand:LSX 0 "register_operand")
+   (match_operand:LSX 1 "reg_or_m1_operand")
+   (match_operand:LSX 2 "reg_or_0_operand")
+   (match_operand: 3 "register_operand")]
   "ISA_HAS_LSX"
 {
-  loongarch_expand_vec_cond_mask_expr (mode,
- mode, operands);
+  loongarch_expand_vec_cond_mask_expr (mode,
+  mode, operands);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
new file mode 100644
index 000..ee9cb1a1fa7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-unroll-loops -fno-vect-cost-model 
-mlasx" } */
+
+#include <stdint.h>
+
+#define DEF_VCOND_VAR(DATA_TYPE, CMP_TYPE, COND, SUFFIX)   \
+  void __attribute__ ((noinline, noclone)) \
+  vcond_var_##CMP_TYPE##_##SUFFIX (DATA_TYPE *__restrict__ r,  \
+  DATA_TYPE *__restrict__ x,   \
+  DATA_TYPE *__restrict__ y,   \
+  CMP_TYPE *__restrict__ a,\
+  CMP_TYPE *__restrict__ b,\
+  int n)   \
+  {\
+for (int i = 0; i < n; i++)\
+  {\
+   DATA_TYPE xval = x[i], yval = y[i]; \
+   CMP_TYPE aval = a[i], bval = b[i];  \
+   r[i] = aval COND bval ? xval : yval;\
+  }\
+  }
+
+#define TEST_COND_VAR_SIGNED_ALL(T, COND, SUFFIX)  \
+  T (int8_t, int8_t, COND, SUFFIX) \
+  T (int16_t, int16_t, COND, SUFFIX)   \
+  T (int32_t, int32_t, COND, SUFFIX)   \
+  T (int64_t, int64_t, COND, SUFFIX)   \
+  T (float, int32_t, COND, SUFFIX##_float) \
+  T (double, int64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_UNSIGNED_ALL(T, COND, SUFFIX)\
+  T (uint8_t, uint8_t, COND, SUFFIX)   \
+  T (uint16_t, uint16_t, COND, SUFFIX) \
+  T (uint32_t, uint32_t, COND, SUFFIX) \
+  T (uint64_t, uint64_t, COND, SUFFIX) \
+  T (float, uint32_t, COND, SUFFIX##_float)\
+  T (double, uint64_t, COND, SUFFIX##_double)
+
+#define TEST_COND_VAR_ALL(T, COND, SUFFIX) \
+  TEST_COND_VAR_SIGNED_ALL (T, COND, SUFFIX)   \
+  TEST_COND_VAR_UNSIGNED_ALL (T, COND, SUFFIX)
+
+#define TEST_VAR_ALL(T)\
+  TEST_COND_VAR_ALL (T, >, _gt)\
+  TEST_COND_VAR_ALL (T, <, _lt) 

[PATCH V12 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-23 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard:

This version of the patch uses ABI interfaces to remove redundant zero
and sign extensions.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version of the patch the following review comments are
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in the ABI interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using the ABI interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fixed aarch64 regression failures.

Please check whether it addresses the bootstrap failure with RISC-V.
Also please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see redundant zero and sign extensions; the
ree pass is improved to eliminate such redundant extensions using
defined ABI interfaces.

2023-10-23  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use of zero_extend and sign_extend
defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C: New test.
---
 gcc/ree.cc| 147 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 154 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..9fdc06562ad 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if the target's promoted function mode for MODE equals MODE.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
+  a) reg source operand is argument register and not return register.
+  b) mode of source and destination operand are different.
+  c) if not promoted REGNO of source and destination operand are same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and has
+   uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+   return false;
+
+  rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+  if (GET_CODE (PATTERN (use_insn)) == SET)
+   {
+ rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
+
+ if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
+ || GET_RTX_CLASS (code) == RTX_COMM_ARITH
+ || GET_RT

Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-10-23 Thread Andre Vieira (lists)
Ping for Jeff or another global maintainer to review the target agnostic 
bits of this, that's:

loop-doloop.cc
df-core.{c,h}

I do have a nitpick myself that I missed last time around:
  /* We expect the condition to be of the form (reg != 0)  */
  cond = XEXP (SET_SRC (cmp), 0);
- if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
+ if ((GET_CODE (cond) != NE && GET_CODE (cond) != GE)
+ || XEXP (cond, 1) != const0_rtx)
return 0;
}
Could do with updating the comment to reflect allowing >= now. But happy 
for you to change this once approved by a maintainer.
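
Concretely, something like this (a suggestion only):

	  /* We expect the condition to be of the form (reg != 0)
	     or (reg >= 0).  */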


Kind regards,
Andre

On 11/10/2023 12:34, Stamatis Markianos-Wright wrote:

Hi all,

On 28/09/2023 13:51, Andre Vieira (lists) wrote:

Hi,

On 14/09/2023 13:10, Kyrylo Tkachov via Gcc-patches wrote:

Hi Stam,





The arm parts look sensible but we'd need review for the df-core.h 
and df-core.cc changes.

Maybe Jeff can help or can recommend someone to take a look?


Just thought I'd do a follow-up "ping" on this :)



Thanks,
Kyrill



FWIW the changes LGTM; if we don't want these in df-core we can always 
implement the extra utility locally. It's really just a helper 
function to check if df_bb_regno_first_def_find and 
df_bb_regno_last_def_find yield the same result, meaning we only have 
a single definition.
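
For reference, the helper amounts to something like this (a sketch; the
actual name in the patch may differ):

  static bool
  df_bb_single_def_p (basic_block bb, unsigned int regno)
  {
    df_ref first = df_bb_regno_first_def_find (bb, regno);
    df_ref last = df_bb_regno_last_def_find (bb, regno);
    return first != NULL && first == last;
  }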


Kind regards,
Andre


Thanks,

Stam



[PATCH] tree-optimization/111917 - bogus IL after guard hoisting

2023-10-23 Thread Richard Biener
The unswitching code to hoist guards inserts conditions in wrong
places.  The following fixes this, simplifying code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111917
* tree-ssa-loop-unswitch.cc (hoist_guard): Always insert
new conditional after last stmt.

* gcc.dg/torture/pr111917.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111917.c | 23 +++
 gcc/tree-ssa-loop-unswitch.cc   |  5 +
 2 files changed, 24 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111917.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111917.c 
b/gcc/testsuite/gcc.dg/torture/pr111917.c
new file mode 100644
index 000..532e30200b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111917.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-funswitch-loops" } */
+
+long t;
+long a() {
+  long b = t, c = t;
+  for (; b < 31; b++)
+c <<= 1;
+  return c;
+}
+long t1;
+static
+int d() {
+  if (!t1)
+return 0;
+e:
+f:
+  for (; a();)
+;
+  goto f;
+  return 0;
+}
+int main() { d(); }
diff --git a/gcc/tree-ssa-loop-unswitch.cc b/gcc/tree-ssa-loop-unswitch.cc
index 619b50fb4bb..b4611ac8256 100644
--- a/gcc/tree-ssa-loop-unswitch.cc
+++ b/gcc/tree-ssa-loop-unswitch.cc
@@ -1455,10 +1455,7 @@ hoist_guard (class loop *loop, edge guard)
   cond_stmt = as_a <gcond *> (stmt);
   extract_true_false_edges_from_block (guard_bb, &te, &fe);
   /* Insert guard to PRE_HEADER.  */
-  if (!empty_block_p (pre_header))
-gsi = gsi_last_bb (pre_header);
-  else
-gsi = gsi_start_bb (pre_header);
+  gsi = gsi_last_bb (pre_header);
   /* Create copy of COND_STMT.  */
   new_cond_stmt = gimple_build_cond (gimple_cond_code (cond_stmt),
 gimple_cond_lhs (cond_stmt),
-- 
2.35.3


Re: [PATCH] vect: Don't set excess bits in uniform masks

2023-10-23 Thread Richard Biener
On Fri, 20 Oct 2023, Andrew Stubbs wrote:

> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
> mask enable extra lanes that were supposed to be unused and are therefore
> undefined.
> 
> Richi suggested an alternative approach involving narrower types and then a
> zero-extend to the actual mask type.  This solved the problem for the specific
> test case that I had, but I'm not sure if it would work with V2 and V4 modes
> (not that I've observed bad behaviour from them anyway, but still).  There
> were some other caveats involving "two-lane division" that I don't fully
> understand, so I went with the simpler implementation.
> 
> This patch does have the disadvantage of an additional "and" instruction in
> the non-constant case even for machines that don't need it. I'm not sure how
> to fix that without an additional target hook. (If GCC could use the 64-lane
> vectors more effectively without the assistance of artificially reduced sizes
> then this problem wouldn't exist.)
> 
> OK to commit?

-   convert_move (target, op0, 0);
+   rtx tmp = gen_reg_rtx (mode);
+   convert_move (tmp, op0, 0);
+
+   if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+ GET_MODE_PRECISION (mode)))

Usually this would be maybe_ne, but then ...

+ {
+   /* Ensure no excess bits are set.
+  GCN needs this, AVX does not.  */
+   expand_binop (mode, and_optab, tmp,
+ GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
+ .to_constant())) - 1),
+ target, true, OPTAB_DIRECT);

here you have .to_constant ().  I think with having an integer mode
we know subparts is constant so I'd prefer

auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
...

+ }
+   else
+ emit_move_insn (target, tmp);

note you need the emit_move_insn also for the expand_binop
path since it's not guaranteed that 'target' is used there.  Thus

  tmp = expand_binop (...)
  if (tmp != target)
emit_move_insn (...)

Otherwise looks good to me.  The and is needed on x86 for
two and four bit masks; it would be more efficient to use
smaller modes for the sign-extension I guess.
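
Putting the two comments together, the fixed expansion might read like
this (a sketch only, reusing the names from the quoted patch):

  rtx tmp = gen_reg_rtx (mode);
  convert_move (tmp, op0, 0);

  auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
  if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
    {
      /* Ensure no excess bits are set.  GCN needs this, AVX does not.  */
      tmp = expand_binop (mode, and_optab, tmp,
			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
			  target, true, OPTAB_DIRECT);
    }
  if (tmp != target)
    emit_move_insn (target, tmp);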

Thanks,
Richard.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-23 Thread Richard Biener
On Thu, 19 Oct 2023, Robin Dapp wrote:

> Ugh, I didn't push yet because with a rebased trunk I am
> seeing different behavior for some riscv testcases.
> 
> A reduction is not recognized because there is yet another
> "double use" occurrence in check_reduction_path.  I guess it's
> reasonable to loosen the restriction for conditional operations
> here as well.
> 
> The only change to v4 therefore is:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ebab1953b9c..64654a55e4c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4085,7 +4094,15 @@ pop:
> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> cnt++;
> -  if (cnt != 1)
> +
> +  bool cond_fn_p = op.code.is_internal_fn ()
> +   && (conditional_internal_fn_code (internal_fn (*code))
> +   != ERROR_MARK);
> +
> +  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +op1 twice (once as definition, once as else) in the same operation.
> +Allow this.  */
> +  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
> 
> Bootstrapped and regtested again on x86, aarch64 and power10.
> Testsuite on riscv unchanged.

Hmm, why opi == 1 only?  I think

# _1 = PHI <.., _4>
 _3 = .COND_ADD (_1, _2, _1);
 _4 = .COND_ADD (_3, _5, _3);

would be fine as well.  I think we want to simply ignore the 'else' value
of conditional internal functions.  I suppose we have unary, binary
and ternary conditional functions - I miss an internal_fn_else_index,
but I suppose it's always the last one?

I think a single use on .COND functions is also OK, even when on the
'else' value only?  But maybe that's not too important here.

Maybe

  gimple *op_use_stmt;
  unsigned cnt = 0;
  FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
if (.. op_use_stmt is conditional internal function ..)
  {
for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
  if (gimple_call_arg (call, j) == op.ops[opi])
cnt++;
  }
else if (!is_gimple_debug (op_use_stmt)
&& (*code != ERROR_MARK
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
cnt++;

?
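
Filled in, that could read as follows (illustrative only; imm_iter,
use_p, op, code and loop as in the surrounding function):

  gimple *op_use_stmt;
  unsigned cnt = 0;
  FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
    {
      gcall *call = dyn_cast <gcall *> (op_use_stmt);
      if (call
	  && gimple_call_internal_p (call)
	  && (conditional_internal_fn_code
		(gimple_call_internal_fn (call)) != ERROR_MARK))
	{
	  /* Count all uses except the trailing 'else' operand.  */
	  for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
	    if (gimple_call_arg (call, j) == op.ops[opi])
	      cnt++;
	}
      else if (!is_gimple_debug (op_use_stmt)
	       && (*code != ERROR_MARK
		   || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
	FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
	  cnt++;
    }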

> Regards
>  Robin
> 
> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD/COND_OP during
> ifcvt and adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (check_reduction_path): Don't count else operand in COND_OP.
>   (vect_is_simple_reduction): Ditto.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_OP handling.
>   (vectorizable_reduction): Don't count else operand in COND_OP.
>   (vect_transform_reduction): Add COND_OP handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 +++--
>  gcc/tree-vect-loop.cc | 168 ++
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 456 insertions(+), 51 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @

Re: [PATCH][WIP] libiberty: Support for relocation output

2023-10-23 Thread Jan Hubicka
> This patch teaches libiberty to output X86-64 Relocations.
Hello,
for actual patch submission you will need to add changelog :)
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index 86b7a27dc74..0bbaf4b489f 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -238,6 +238,7 @@ typedef struct
>  #define STT_NOTYPE 0 /* Symbol type is unspecified */
>  #define STT_OBJECT 1 /* Symbol is a data object */
>  #define STT_FUNC 2 /* Symbol is a code object */
> +#define STT_SECTION 3 /* Symbol is associate with a section */
Associated I guess.
>  #define STT_TLS 6 /* Thread local data object */
>  #define STT_GNU_IFUNC 10 /* Symbol is an indirect code object */
> 
> @@ -248,6 +249,63 @@ typedef struct
>  #define STV_DEFAULT 0 /* Visibility is specified by binding type */
>  #define STV_HIDDEN 2 /* Can only be seen inside current component */
> 
> +typedef struct
> +{
> +  unsigned char r_offset[4]; /* Address */
> +  unsigned char r_info[4];  /* relocation type and symbol index */
> +} Elf32_External_Rel;
> +
> +typedef struct
> +{
> +  unsigned char r_offset[8]; /* Address */
> +  unsigned char r_info[8]; /* Relocation type and symbol index */
> +} Elf64_External_Rel;
> +typedef struct
> +{
> +  unsigned char r_offset[4]; /* Address */
> +  unsigned char r_info[4];  /* Relocation type and symbol index */
> +  char r_addend[4]; /* Addend */
> +} Elf32_External_Rela;
> +typedef struct
> +{
> +  unsigned char r_offset[8]; /* Address */
> +  unsigned char r_info[8]; /* Relocation type and symbol index */
> +  unsigned char r_addend[8]; /* Addend */
> +} Elf64_External_Rela;
> +
> +/* How to extract and insert information held in the r_info field.  */
> +
> +#define ELF32_R_SYM(val) ((val) >> 8)
> +#define ELF32_R_TYPE(val) ((val) & 0xff)
> +#define ELF32_R_INFO(sym, type) (((sym) << 8) + ((type) & 0xff))
> +
> +#define ELF64_R_SYM(i) ((i) >> 32)
> +#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
> +#define ELF64_R_INFO(sym,type) ((((unsigned long) (sym)) << 32) + (type))
> +
> +/* AMD x86-64 relocations.  */
> +#define R_X86_64_NONE 0 /* No reloc */
> +#define R_X86_64_64 1 /* Direct 64 bit  */
> +#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
> +#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
> +#define R_X86_64_PLT32 4 /* 32 bit PLT address */
> +#define R_X86_64_COPY 5 /* Copy symbol at runtime */
> +#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
> +#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
> +#define R_X86_64_RELATIVE 8 /* Adjust by program base */
> +#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative
> +   offset to GOT */
> +#define R_X86_64_32 10 /* Direct 32 bit zero extended */
> +#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
> +#define R_X86_64_16 12 /* Direct 16 bit zero extended */

This will eventually need to go into a per-architecture table.
You support only those needed for Dwarf2out output, right?

I think we need Iant's opinion on this part of the patch (he is the
maintainer of simple-object) but to me it looks reasonable. For longer
term it will be necessary to think how to make this extensible to other
architectures without writing too much code.  (have some more
declarative way to specify relocations we output)
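
One possible shape for such a declarative description (purely
illustrative; none of these names exist in libiberty today):

  struct simple_object_reloc_info
  {
    const char *name;	/* e.g. "R_X86_64_PC32" */
    unsigned int type;	/* value stored in the r_info type field */
    unsigned int size;	/* size of the relocated field in bytes */
    int pcrel;		/* non-zero if PC-relative */
  };

  static const struct simple_object_reloc_info x86_64_relocs[] =
  {
    { "R_X86_64_64",   1, 8, 0 },
    { "R_X86_64_PC32", 2, 4, 1 },
    { "R_X86_64_32",  10, 4, 0 },
    { "R_X86_64_32S", 11, 4, 0 },
  };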

Honza


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Siddhesh Poyarekar

On 2023-10-23 03:57, Richard Biener wrote:

On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:





On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  wrote:

On 2023-10-20 14:38, Qing Zhao wrote:

How about the following:
   Add one more parameter to __builtin_dynamic_object_size(), i.e
__builtin_dynamic_object_size (_1,1,array_annotated->foo)?
When we see the structure field has counted_by attribute.


Or maybe add a barrier preventing any assignments to array_annotated->foo from 
being reordered below the __bdos call? Basically an __asm__ with 
array_annotated->foo in the clobber list ought to do it I think.


Maybe just adding the array_annotated->foo to the use list of the call to 
__builtin_dynamic_object_size should be enough?

But I am not sure how to implement this in the TREE level, is there a 
USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by 
field “array_annotated->foo” to the USE_LIST of the call to __bdos?

This might be the simplest solution?


If the dynamic object size is derived from a field then I think you need to
put the "load" of that memory location at the point (as argument)
of the __bos call right at parsing time.  I know that's awkward because
you try to play tricks "discovering" that field only late, but that's not
going to work.

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.


Maybe the trick then is to treat the size data as volatile?  That ought 
to discourage reordering and also prevent elimination of the "dead" store?
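
For concreteness, the pattern under discussion looks roughly like this
(a sketch; 'counted_by' is the attribute proposed in this series, not
something GCC accepts today):

  #include <stddef.h>

  struct annotated {
    size_t foo;					/* the size data */
    char array[] __attribute__ ((counted_by (foo)));
  };

  size_t
  g (struct annotated *p)
  {
    p->foo = 10;  /* must not be reordered below ...  */
    return __builtin_dynamic_object_size (p->array, 1);  /* ... this */
  }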


Thanks,
Sid


Re: [PATCH 10/11] aarch64: Fix branch-protection error message tests

2023-10-23 Thread Szabolcs Nagy
The 10/13/2023 11:29, Richard Earnshaw (lists) wrote:
> On 05/09/2023 16:00, Richard Sandiford via Gcc-patches wrote:
> > Szabolcs Nagy  writes:
> >> @@ -4,19 +4,19 @@ void __attribute__ ((target("branch-protection=leaf")))
> >>  foo1 ()
> >>  {
> >>  }
> >> -/* { dg-error {invalid protection type 'leaf' in 
> >> 'target\("branch-protection="\)' pragma or attribute} "" { target *-*-* } 
> >> 5 } */
> >> +/* { dg-error {invalid argument 'leaf' for 
> >> 'target\("branch-protection="\)'} "" { target *-*-* } 5 } */
> >>  /* { dg-error {pragma or attribute 'target\("branch-protection=leaf"\)' 
> >> is not valid} "" { target *-*-* } 5 } */
> 
> 'leaf' is really a modifier for the other branch protection strategies; 
> perhaps it would be better to describe it as that.

this error message is used for arbitrary strings, e.g.
branch-protection=foobar or branch-protection=bti+foo.

with further processing we can figure out that 'leaf'
is a valid modifier for pac-ret and change the error to

invalid placement of modifier 'leaf' in 'target("branch-protection=")'

otherwise fall back to

invalid argument 'foobar' for 'target("branch-protection=")'.

does that help?

(currently 'leaf' and 'b-key' are the only modifiers.)
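
for concreteness (examples taken from the tests quoted above; 'leaf'
and 'b-key' only modify 'pac-ret'):

  /* accepted */
  void __attribute__ ((target("branch-protection=pac-ret+leaf"))) f1 ();
  void __attribute__ ((target("branch-protection=bti+pac-ret+b-key"))) f2 ();
  /* rejected: modifier without a strategy to modify */
  void __attribute__ ((target("branch-protection=leaf"))) f3 ();
  /* rejected: 'none' can only appear alone */
  void __attribute__ ((target("branch-protection=none+pac-ret"))) f4 ();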

> But this brings up another issue/question.  If the compiler has been 
> configured with, say, '--enable-branch-protection=standard' or some other 
> variety, is there (or do we want) a way to extend that to leaf functions 
> without changing the underlying strategy?

there are several limitations in branch-protection handling,
i'm only fixing bugs and assumptions that don't work when arm
and aarch64 have different sets of branch-protection options.

i think it can be useful to add/remove branch-protection options
incrementally in cflags instead of having one string, but it's
not obvious to me how to get there.

> >>  void __attribute__ ((target("branch-protection=none+pac-ret")))
> >>  foo2 ()
> >>  {
> >>  }
> >> -/* { dg-error "unexpected 'pac-ret' after 'none'" "" { target *-*-* } 12 
> >> } */
> >> +/* { dg-error {argument 'none' can only appear alone in 
> >> 'target\("branch-protection="\)'} "" { target *-*-* } 12 } */
> 
> Or maybe better still: "branch protection strategies 'none' and 'pac-ret' are 
> incompatible".

i can make this change, but e.g.

in case of branch-protection=standard+bti+foo it would
say "'standard' and 'bti' are incompatible" which can be
surprising given that standard includes bti, meanwhile
"'standard' can only appear alone" explains the problem.

> But this is all a matter of taste.
> 
> However, this patch should be merged with the patch that changes the error 
> messages.  Or has that already gone in?

i can do that merge.


Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-23 Thread Richard Biener
On Mon, Oct 23, 2023 at 10:48 AM liuhongt  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.

vcond and vcondeq shouldn't be necessary if there's
vcond_mask and vcmp support which is the "modern"
way of handling vcond.  Unless the ISA really can do
compare and select with a single instruction.
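
Schematically, the "modern" pairing on the GIMPLE side is
(illustration only):

  mask_1 = a_2 < b_3;                         /* expands via vec_cmp */
  res_4 = VEC_COND_EXPR <mask_1, x_5, y_6>;   /* expands via vcond_mask */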

Richard.

> gcc/ChangeLog:
>
> PR target/103861
> * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
> V2HF/V2BF/V4HF/V4BFmode.
> * config/i386/mmx.md (vec_cmpv4hfqi): New expander.
> (vcondv4hf): Ditto.
> (vcondv4hi): Ditto.
> (vconduv4hi): Ditto.
> (vcond_mask_v4hi): Ditto.
> (vcond_mask_qi): Ditto.
> (vec_cmpv2hfqi): Ditto.
> (vcondv2hf): Ditto.
> (vcondv2hi): Ditto.
> (vconduv2hi): Ditto.
> (vcond_mask_v2hi): Ditto.
> * config/i386/sse.md (vcond): Merge this with ..
> (vcond): .. this into ..
> (vcond): .. this,
> and extend to V8BF/V16BF/V32BFmode.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/part-vect-vcondhf.C: New test.
> * gcc.target/i386/part-vect-vec_cmphf.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc|   4 +
>  gcc/config/i386/mmx.md| 237 +-
>  gcc/config/i386/sse.md|  25 +-
>  .../g++.target/i386/part-vect-vcondhf.C   |  34 +++
>  .../gcc.target/i386/part-vect-vec_cmphf.c |  26 ++
>  5 files changed, 304 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 1eae9d7c78c..9658f9c5a2d 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -4198,6 +4198,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
> rtx op_false)
>break;
>  case E_V8QImode:
>  case E_V4HImode:
> +case E_V4HFmode:
> +case E_V4BFmode:
>  case E_V2SImode:
>if (TARGET_SSE4_1)
> {
> @@ -4207,6 +4209,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
> rtx op_false)
>break;
>  case E_V4QImode:
>  case E_V2HImode:
> +case E_V2HFmode:
> +case E_V2BFmode:
>if (TARGET_SSE4_1)
> {
>   gen = gen_mmx_pblendvb_v4qi;
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 491a0a51272..b9617e9d8c6 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -61,6 +61,9 @@ (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
>  (define_mode_iterator V_32 [V4QI V2HI V1SI V2HF V2BF])
>
>  (define_mode_iterator V2FI_32 [V2HF V2BF V2HI])
> +(define_mode_iterator V4FI_64 [V4HF V4BF V4HI])
> +(define_mode_iterator V4F_64 [V4HF V4BF])
> +(define_mode_iterator V2F_32 [V2HF V2BF])
>  ;; 4-byte integer vector modes
>  (define_mode_iterator VI_32 [V4QI V2HI])
>
> @@ -1972,10 +1975,12 @@ (define_mode_attr mov_to_sse_suffix
>[(V2HF "d") (V4HF "q") (V2HI "d") (V4HI "q")])
>
>  (define_mode_attr mmxxmmmode
> -  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")])
> +  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")
> +   (V4HF "V8HF") (V4HI "V8HI") (V4BF "V8BF")])
>
>  (define_mode_attr mmxxmmmodelower
> -  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")])
> +  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")
> +   (V4HF "v8hf") (V4HI "v8hi") (V4BF "v8bf")])
>
>  (define_expand "movd__to_sse"
>[(set (match_operand: 0 "register_operand")
> @@ -2114,6 +2119,234 @@ (define_insn_and_split "*mmx_nabs<mode>2"
>    [(set (match_dup 0)
> 	(ior:<MODE> (match_dup 1) (match_dup 2)))])
>
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +;;
> +;; Parallel half-precision floating point comparisons
> +;;
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +(define_expand "vec_cmpv4hfqi"
> +  [(set (match_operand:QI 0 "register_operand")
> +   (match_operator:QI 1 ""
> + [(match_operand:V4HF 2 "nonimmediate_operand")
> +  (match_operand:V4HF 3 "nonimmediate_operand")]))]
> +  "TARGET_MMX_WITH_SSE && TARGET_AVX512FP16 && TARGET_AVX512VL
> +   && ix86_partial_vec_fp_math"
> +{
> +  rtx ops[4];
> +  ops[3] = gen_reg_rtx (V8HFmode);
> +  ops[2] = gen_reg_rtx (V8HFmode);
> +
> +  emit_insn (gen_movq_v4hf_to_sse (ops[3], operands[3]));
> +  emit_insn (gen_movq_v4hf_to_sse (ops[2], operands[2]));
> +  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
> +  DONE;
> +})
> +
> +(define_expand "vcondv4hf"
> +  [(set (match_operand:V4FI_64 0 "register_operand")
> +   (if_then_else:V4FI_64
> + (match_operator 3 ""
> +   [(match_operand:V4HF 4 "nonimmediate_operand")
> +(match_operand:V4HF 5 "nonimmediate_operand")])
> + (match_operand:V4FI_64 1 "general_operand")
> + (match_operand:V4FI_64 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Richard Biener
On Mon, Oct 23, 2023 at 1:27 PM Siddhesh Poyarekar  wrote:
>
> On 2023-10-23 03:57, Richard Biener wrote:
> > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
> >>> wrote:
> >>>
> >>> On 2023-10-20 14:38, Qing Zhao wrote:
>  How about the following:
> Add one more parameter to __builtin_dynamic_object_size(), i.e
>  __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>  When we see the structure field has counted_by attribute.
> >>>
> >>> Or maybe add a barrier preventing any assignments to array_annotated->foo 
> >>> from being reordered below the __bdos call? Basically an __asm__ with 
> >>> array_annotated->foo in the clobber list ought to do it I think.
> >>
> >> Maybe just adding the array_annotated->foo to the use list of the call to 
> >> __builtin_dynamic_object_size should be enough?
> >>
> >> But I am not sure how to implement this in the TREE level, is there a 
> >> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
> >> counted_by field “array_annotated->foo” to the USE_LIST of the call to 
> >> __bdos?
> >>
> >> This might be the simplest solution?
> >
> > If the dynamic object size is derived from a field then I think you need to
> > put the "load" of that memory location at the point (as argument)
> > of the __bos call right at parsing time.  I know that's awkward because
> > you try to play tricks "discovering" that field only late, but that's not
> > going to work.
> >
> > A related issue is that assignment to the field and storage allocation
> > are not tied together - if there's no use of the size data we might
> > remove the store of it as dead.
>
> Maybe the trick then is to treat the size data as volatile?  That ought
> to discourage reordering and also prevent elimination of the "dead" store?

But we are an optimizing compiler, not a static analysis machine, so I
fail to see how this is a useful suggestion.

I think Martin's suggestion to approach this as a language extension
is more useful and would make it easier to handle this?

Richard.

> Thanks,
> Sid


[PATCH] libgcc: make heap-based trampolines conditional on libc presence

2023-10-23 Thread Sergei Trofimovich
From: Sergei Trofimovich 

To build `libc` for a target one needs to build `gcc` without `libc`
support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
heap-based trampolines" added unconditional `libc` dependency and broke
libc-less `gcc` builds.

An example failure on `x86_64-unknown-linux-gnu`:

$ mkdir -p /tmp/empty
$ ../gcc/configure \
--disable-multilib \
--without-headers \
--with-newlib \
--enable-languages=c \
--disable-bootstrap \
--disable-gcov \
--disable-threads \
--disable-shared \
--disable-libssp \
--disable-libquadmath \
--disable-libgomp \
--disable-libatomic \
--with-build-sysroot=/tmp/empty
$ make
...
/tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ 
-B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem 
/usr/local/x86_64-pc-linux-gnu/include -isystem 
/usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 -O2  
-g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem 
./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -g 
-DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic 
-mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. -I../.././gcc 
-I/home/slyfox/dev/git/gcc/libgcc -I/home/slyfox/dev/git/gcc/libgcc/. 
-I/home/slyfox/dev/git/gcc/libgcc/../gcc 
-I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o 
heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c 
.../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden -DHIDE_EXPORTS
../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: No 
such file or directory
    3 | #include <unistd.h>
      |          ^~~~~~~~~~
compilation terminated.
make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] Error 1
make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
make[1]: *** [Makefile:13307: all-target-libgcc] Error 2

The change inhibits any heap-based trampoline code.

libgcc/

	* config/aarch64/heap-trampoline.c: Disable when libc is
	not present.
	* config/i386/heap-trampoline.c: Likewise.
---
 libgcc/config/aarch64/heap-trampoline.c | 5 +
 libgcc/config/i386/heap-trampoline.c| 5 +
 2 files changed, 10 insertions(+)

diff --git a/libgcc/config/aarch64/heap-trampoline.c 
b/libgcc/config/aarch64/heap-trampoline.c
index c8b83681ed7..f22233987ca 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -1,5 +1,8 @@
 /* Copyright The GNU Toolchain Authors. */
 
+/* libc is required to allocate trampolines.  */
+#ifndef inhibit_libc
+
 #include <unistd.h>
 #include <sys/mman.h>
 #include <stdint.h>
@@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
   tramp_ctrl_curr = prev;
 }
 }
+
+#endif /* !inhibit_libc */
diff --git a/libgcc/config/i386/heap-trampoline.c 
b/libgcc/config/i386/heap-trampoline.c
index 96e13bf828e..4b9f4365868 100644
--- a/libgcc/config/i386/heap-trampoline.c
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -1,5 +1,8 @@
 /* Copyright The GNU Toolchain Authors. */
 
+/* libc is required to allocate trampolines.  */
+#ifndef inhibit_libc
+
 #include <unistd.h>
 #include <sys/mman.h>
 #include <stdint.h>
@@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
   tramp_ctrl_curr = prev;
 }
 }
+
+#endif /* !inhibit_libc */
-- 
2.42.0



Re: [PATCH][WIP] dwarf2out: extend to output debug section directly to object file during debug_early phase

2023-10-23 Thread Jan Hubicka
Hello,
thanks for the patch.

Overall it looks in right direction except for the code duplication in
output_die and friends.
> +/* Given a die and id, produce the appropriate abbreviations
> +   directly to the LTO object file.  */
> +
> +static void
> +output_die_abbrevs_to_object_file(unsigned long abbrev_id, dw_die_ref
> abbrev)
> +{
> +  unsigned ix;
> +  dw_attr_node *a_attr;
> +
> +  output_data_uleb128_to_object_file(abbrev_id);
> +  output_data_uleb128_to_object_file(abbrev->die_tag);
> +
> +
> +  if (abbrev->die_child != NULL)
> +output_data_to_object_file(1,DW_children_yes);
> +  else
> +output_data_to_object_file(1,DW_children_no);
> +
> +  for (ix = 0; vec_safe_iterate (abbrev->die_attr, ix, &a_attr); ix++)
> +{
> +  output_data_uleb128_to_object_file(a_attr->dw_attr);
> +  output_value_format_to_object_file(a_attr);
> +  if (value_format (a_attr) == DW_FORM_implicit_const)
> + {
> +  if (AT_class (a_attr) == dw_val_class_file_implicit)
> +{
> +  int f = maybe_emit_file (a_attr->dw_attr_val.v.val_file);
> + output_data_sleb128_to_object_file(f);
> +}
> +  else
> +  output_data_sleb128_to_object_file(a_attr->dw_attr_val.v.val_int);
> + }
> +}
> +
> +  output_data_to_object_file (1, 0);
> +  output_data_to_object_file (1, 0);

So this basically renames dw2_asm_output_data to
output_data_to_object_file and similarly for other output functions.

What would be the main problems of making the dw2_asm_* functions do the
right thing when outputting to the object file?
Either by conditionals or turning them to virtual functions/hooks as
Richi suggested?
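
For the second option the idea would be roughly this (an illustrative
sketch; none of these types exist today):

  /* One set of output primitives per sink; dwarf2out calls through a
     pointer to the active implementation instead of branching on a
     flag at every call site.  */
  struct dw_output
  {
    virtual void data (unsigned int size, unsigned HOST_WIDE_INT value) = 0;
    virtual void data_uleb128 (unsigned HOST_WIDE_INT value) = 0;
    virtual void data_sleb128 (HOST_WIDE_INT value) = 0;
    /* ... one hook per dw2_asm_output_* primitive ... */
  };

  struct dw_asm_output : dw_output { /* emit assembly, as today */ };
  struct dw_object_output : dw_output { /* write bytes and relocations */ };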

It may be performance critical how quickly we spit out the bytecode.
In future we may templateize this, but right now it is likely premature
optimization.
> 
> +struct lto_simple_object
lto_simple_object is declared in lto frontend.  Why do you need to
duplicate it here?

It looks like adding relocations should be abstracted by lto API,
so you don't need to look inside this structure that is
lto/lto-object.cc only.

> +/* Output one line number table into the .debug_line section.  */
> +
> +static void
> +output_one_line_info_table (dw_line_info_table *table)
It is hard to tell from the diff.  Did you just move these functions
earlier in the source file?

Honza


Re: [PATCH] libgcc: make heap-based trampolines conditional on libc presence

2023-10-23 Thread Iain Sandoe
hi Sergei,

> On 23 Oct 2023, at 13:43, Sergei Trofimovich  wrote:
> 
> From: Sergei Trofimovich 
> 
> To build `libc` for a target one needs to build `gcc` without `libc`
> support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
> heap-based trampolines" added unconditional `libc` dependency and broke
> libc-less `gcc` builds.
> 
> An example failure on `x86_64-unknown-linux-gnu`:
> 
>$ mkdir -p /tmp/empty
>$ ../gcc/configure \
>--disable-multilib \
>--without-headers \
>--with-newlib \
>--enable-languages=c \
>--disable-bootstrap \
>--disable-gcov \
>--disable-threads \
>--disable-shared \
>--disable-libssp \
>--disable-libquadmath \
>--disable-libgomp \
>--disable-libatomic \
>--with-build-sysroot=/tmp/empty
>$ make
>...
>/tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ 
> -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem 
> /usr/local/x86_64-pc-linux-gnu/include -isystem 
> /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 -O2  
> -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
> -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem 
> ./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -g 
> -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic 
> -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. 
> -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc 
> -I/home/slyfox/dev/git/gcc/libgcc/. -I/home/slyfox/dev/git/gcc/libgcc/../gcc 
> -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o 
> heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c 
> .../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden 
> -DHIDE_EXPORTS
>../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: 
> No such file or directory
>    3 | #include <unistd.h>
>      |          ^~~~~~~~~~
>compilation terminated.
>make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] Error 
> 1
>make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
>make[1]: *** [Makefile:13307: all-target-libgcc] Error 2
> 
> The change inhibits any heap-based trampoline code.

That looks reasonable to me (I was considering using __has_include(), but the 
inhibit_libc is neater).

The fact that this first compiler is built without heap-trampoline support
would become relevant, I guess, if libc wanted to use them; it would need
another iteration.

so, it looks fine, but I cannot actually approve it.
Iain

> 
> libgcc/
> 
>   * config/aarch64/heap-trampoline.c: Disable when libc is
> not present.
>   * config/i386/heap-trampoline.c: Likewise.
> ---
> libgcc/config/aarch64/heap-trampoline.c | 5 +
> libgcc/config/i386/heap-trampoline.c| 5 +
> 2 files changed, 10 insertions(+)
> 
> diff --git a/libgcc/config/aarch64/heap-trampoline.c 
> b/libgcc/config/aarch64/heap-trampoline.c
> index c8b83681ed7..f22233987ca 100644
> --- a/libgcc/config/aarch64/heap-trampoline.c
> +++ b/libgcc/config/aarch64/heap-trampoline.c
> @@ -1,5 +1,8 @@
> /* Copyright The GNU Toolchain Authors. */
> 
> +/* libc is required to allocate trampolines.  */
> +#ifndef inhibit_libc
> +
> #include <unistd.h>
> #include <sys/mman.h>
> #include <stdint.h>
> @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
>   tramp_ctrl_curr = prev;
> }
> }
> +
> +#endif /* !inhibit_libc */
> diff --git a/libgcc/config/i386/heap-trampoline.c 
> b/libgcc/config/i386/heap-trampoline.c
> index 96e13bf828e..4b9f4365868 100644
> --- a/libgcc/config/i386/heap-trampoline.c
> +++ b/libgcc/config/i386/heap-trampoline.c
> @@ -1,5 +1,8 @@
> /* Copyright The GNU Toolchain Authors. */
> 
> +/* libc is required to allocate trampolines.  */
> +#ifndef inhibit_libc
> +
> #include <unistd.h>
> #include <sys/mman.h>
> #include <stdint.h>
> @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
>   tramp_ctrl_curr = prev;
> }
> }
> +
> +#endif /* !inhibit_libc */
> -- 
> 2.42.0
> 



Re: [PATCH][WIP] libiberty: Support for relocation output

2023-10-23 Thread Rishi Raj
On Mon, 23 Oct 2023 at 16:30, Jan Hubicka  wrote:

> > This patch teaches libiberty to output X86-64 Relocations.
> Hello,
> for actual patch submission you will need to add changelog :)
>
I know, right :).

> > diff --git a/libiberty/simple-object-elf.c
> b/libiberty/simple-object-elf.c
> > index 86b7a27dc74..0bbaf4b489f 100644
> > --- a/libiberty/simple-object-elf.c
> > +++ b/libiberty/simple-object-elf.c
> > @@ -238,6 +238,7 @@ typedef struct
> >  #define STT_NOTYPE 0 /* Symbol type is unspecified */
> >  #define STT_OBJECT 1 /* Symbol is a data object */
> >  #define STT_FUNC 2 /* Symbol is a code object */
> > +#define STT_SECTION 3 /* Symbol is associate with a section */
> Associated I guess.
> >  #define STT_TLS 6 /* Thread local data object */
> >  #define STT_GNU_IFUNC 10 /* Symbol is an indirect code object */
> >
> > @@ -248,6 +249,63 @@ typedef struct
> >  #define STV_DEFAULT 0 /* Visibility is specified by binding type */
> >  #define STV_HIDDEN 2 /* Can only be seen inside current component */
> >
> > +typedef struct
> > +{
> > +  unsigned char r_offset[4]; /* Address */
> > +  unsigned char r_info[4];  /* relocation type and symbol index */
> > +} Elf32_External_Rel;
> > +
> > +typedef struct
> > +{
> > +  unsigned char r_offset[8]; /* Address */
> > +  unsigned char r_info[8]; /* Relocation type and symbol index */
> > +} Elf64_External_Rel;
> > +typedef struct
> > +{
> > +  unsigned char r_offset[4]; /* Address */
> > +  unsigned char r_info[4];  /* Relocation type and symbol index */
> > +  char r_addend[4]; /* Addend */
> > +} Elf32_External_Rela;
> > +typedef struct
> > +{
> > +  unsigned char r_offset[8]; /* Address */
> > +  unsigned char r_info[8]; /* Relocation type and symbol index */
> > +  unsigned char r_addend[8]; /* Addend */
> > +} Elf64_External_Rela;
> > +
> > +/* How to extract and insert information held in the r_info field.  */
> > +
> > +#define ELF32_R_SYM(val) ((val) >> 8)
> > +#define ELF32_R_TYPE(val) ((val) & 0xff)
> > +#define ELF32_R_INFO(sym, type) (((sym) << 8) + ((type) & 0xff))
> > +
> > +#define ELF64_R_SYM(i) ((i) >> 32)
> > +#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
> > +#define ELF64_R_INFO(sym,type) ((((unsigned long) (sym)) << 32) + (type))
> > +
> > +/* AMD x86-64 relocations.  */
> > +#define R_X86_64_NONE 0 /* No reloc */
> > +#define R_X86_64_64 1 /* Direct 64 bit  */
> > +#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
> > +#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
> > +#define R_X86_64_PLT32 4 /* 32 bit PLT address */
> > +#define R_X86_64_COPY 5 /* Copy symbol at runtime */
> > +#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
> > +#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
> > +#define R_X86_64_RELATIVE 8 /* Adjust by program base */
> > +#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative
> > +   offset to GOT */
> > +#define R_X86_64_32 10 /* Direct 32 bit zero extended */
> > +#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
> > +#define R_X86_64_16 12 /* Direct 16 bit zero extended */
>
> This will eventually need to go into a per-architecture table.
> You support only those needed for Dwarf2out output, right?
>
Yeah, as of now.

>
> I think we need Iant's opinion on thi part of patch (he is the
> maintainer of simple-object) but to me it looks reasonable. For longer
> term it will be necessary to think how to make this extensible to other
> architectures without writing too much code.  (have some more
> declarative way to specify relocations we output)
>
Makes sense.

>
> Honza
>


Re: [PATCH][WIP] dwarf2out: extend to output debug section directly to object file during debug_early phase

2023-10-23 Thread Rishi Raj
On Mon, 23 Oct 2023 at 18:18, Jan Hubicka  wrote:

> Hello,
> thanks for the patch.
>
> Overall it looks in right direction except for the code duplication in
> output_die and friends.
> > +/* Given a die and id, produce the appropriate abbreviations
> > +   directly to lto object file */
> > +
> > +static void
> > +output_die_abbrevs_to_object_file(unsigned long abbrev_id, dw_die_ref
> > abbrev)
> > +{
> > +  unsigned ix;
> > +  dw_attr_node *a_attr;
> > +
> > +  output_data_uleb128_to_object_file(abbrev_id);
> > +  output_data_uleb128_to_object_file(abbrev->die_tag);
> > +
> > +
> > +  if (abbrev->die_child != NULL)
> > +output_data_to_object_file(1,DW_children_yes);
> > +  else
> > +output_data_to_object_file(1,DW_children_no);
> > +
> > +  for (ix = 0; vec_safe_iterate (abbrev->die_attr, ix, &a_attr); ix++)
> > +{
> > +  output_data_uleb128_to_object_file(a_attr->dw_attr);
> > +  output_value_format_to_object_file(a_attr);
> > +  if (value_format (a_attr) == DW_FORM_implicit_const)
> > + {
> > +  if (AT_class (a_attr) == dw_val_class_file_implicit)
> > +{
> > +  int f = maybe_emit_file (a_attr->dw_attr_val.v.val_file);
> > + output_data_sleb128_to_object_file(f);
> > +}
> > +  else
> > +  output_data_sleb128_to_object_file(a_attr->dw_attr_val.v.val_int);
> > + }
> > +}
> > +
> > +  output_data_to_object_file (1, 0);
> > +  output_data_to_object_file (1, 0);
>
> So this basically renames dw2_asm_output_data to
> output_data_to_object_file and similarly for other output functions.
>
> What would be main problems of making dw2_asm_* functions to do the
> right thing when outputting to object file?
> Either by conditionals or turning them to virtual functions/hooks as
> Richi suggested?
>
I think it's doable via conditionals. Can you explain the second approach
in more detail?


>
> It may be performance critical how quickly we spit out the bytecode.
> In future we may templateize this, but right now it is likely premature
> optimization.
>
Cool.

> >
> > +struct lto_simple_object
> lto_simple_object is declared in lto frontend.  Why do you need to
> duplicate it here?
>
> It looks like adding relocations should be abstracted by lto API,
> so you don't need to look inside this structure that is
> lto/lto-object.cc only.
>
I should have taken this approach, but instead I exposed simple objects to
dwarf2out; that's the reason for duplicating the above struct. I will take
care of this while refactoring and abstracting it via the lto API.


>
> > +/* Output one line number table into the .debug_line section.  */
> > +
> > +static void
> > +output_one_line_info_table (dw_line_info_table *table)
> It is hard to tell from the diff.  Did you just move these functions
> earlier in the source file?
>
Yeah. I will refactor dwarf2out soon to clear up these confusions.

-- 
Rishi


>
> Honza
>


[pushed] configure, libquadmath: Remove unintended AC_CHECK_LIBM [PR111928]

2023-10-23 Thread Iain Sandoe
This is a partial reversion of r14-4825-g6a6d3817afa02b to remove an
unintended change.

Tested with x86_64-linux X arm-none-eabi (and x86_64-darwin X arm-none-eabi)
and with a native x86_64-darwin bootstrap.  Also reported by the OP to fix
the issue.  Pushed to trunk; apologies for the breakage,
Iain

--- 8< ---

This was a rebase error, that managed to pass testing on Darwin and
Linux (but fails on bare metal).

PR libquadmath/111928

libquadmath/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove AC_CHECK_LIBM.

Signed-off-by: Iain Sandoe 
---
 libquadmath/Makefile.in  |   1 -
 libquadmath/configure| 147 +--
 libquadmath/configure.ac |   2 -
 3 files changed, 2 insertions(+), 148 deletions(-)

diff --git a/libquadmath/Makefile.in b/libquadmath/Makefile.in
index 068af559457..dbcafb57e5b 100644
--- a/libquadmath/Makefile.in
+++ b/libquadmath/Makefile.in
@@ -355,7 +355,6 @@ INSTALL_SCRIPT = @INSTALL_SCRIPT@
 INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
 LD = @LD@
 LDFLAGS = @LDFLAGS@
-LIBM = @LIBM@
 LIBOBJS = @LIBOBJS@
 LIBS = @LIBS@
 LIBTOOL = @LIBTOOL@
diff --git a/libquadmath/configure b/libquadmath/configure
index 5bd9a070fdc..fd527458285 100755
--- a/libquadmath/configure
+++ b/libquadmath/configure
@@ -644,7 +644,6 @@ LIBQUAD_USE_SYMVER_GNU_FALSE
 LIBQUAD_USE_SYMVER_GNU_TRUE
 LIBQUAD_USE_SYMVER_FALSE
 LIBQUAD_USE_SYMVER_TRUE
-LIBM
 toolexeclibdir
 toolexecdir
 MAINT
@@ -10922,7 +10921,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10925 "configure"
+#line 10924 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11028,7 +11027,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11031 "configure"
+#line 11030 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12261,148 +12260,6 @@ esac
 
 
 
-LIBM=
-case $host in
-*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-pw32* | *-*-darwin*)
-  # These system don't have libm, or don't need it
-  ;;
-*-ncr-sysv4.3*)
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mwvalidcheckl in 
-lmw" >&5
-$as_echo_n "checking for _mwvalidcheckl in -lmw... " >&6; }
-if ${ac_cv_lib_mw__mwvalidcheckl+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lmw  $LIBS"
-if test x$gcc_no_link = xyes; then
-  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
-fi
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char _mwvalidcheckl ();
-int
-main ()
-{
-return _mwvalidcheckl ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_lib_mw__mwvalidcheckl=yes
-else
-  ac_cv_lib_mw__mwvalidcheckl=no
-fi
-rm -f core conftest.err conftest.$ac_objext \
-conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_mw__mwvalidcheckl" 
>&5
-$as_echo "$ac_cv_lib_mw__mwvalidcheckl" >&6; }
-if test "x$ac_cv_lib_mw__mwvalidcheckl" = xyes; then :
-  LIBM="-lmw"
-fi
-
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cos in -lm" >&5
-$as_echo_n "checking for cos in -lm... " >&6; }
-if ${ac_cv_lib_m_cos+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lm  $LIBS"
-if test x$gcc_no_link = xyes; then
-  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
-fi
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-/* Override any GCC internal prototype to avoid an error.
-   Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char cos ();
-int
-main ()
-{
-return cos ();
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_lib_m_cos=yes
-else
-  ac_cv_lib_m_cos=no
-fi
-rm -f core conftest.err conftest.$ac_objext \
-conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_m_cos" >&5
-$as_echo "$ac_cv_lib_m_cos" >&6; }
-if test "x$ac_cv_lib_m_cos" = xyes; then :
-  LIBM="$LIBM -lm"
-fi
-
-  ;;
-*)
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cos in -lm" >&5
-$as_echo_n "checking for cos in -lm... " >&6; }
-if ${ac_cv_lib_m_cos+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  ac_check_lib_save_LIBS=$LIBS
-LIBS="-lm  $LIBS"
-if test x$gcc_no_link = xyes; then
-  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
-fi
-cat confdefs.h - <<_AC

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Siddhesh Poyarekar

On 2023-10-23 08:34, Richard Biener wrote:

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.


Maybe the trick then is to treat the size data as volatile?  That ought
to discourage reordering and also prevent elimination of the "dead" store?


But we are an optimizing compiler, not a static analysis machine, so I
fail to see how this is a useful suggestion.


Sorry I didn't meant to suggest doing this in the middle-end.


I think Martin's suggestion to approach this as a language extension
is more useful and would make it easier to handle this?


I think handling for this (e.g. treating any storage allocated for the 
size member in the struct as volatile to prevent reordering or 
elimination) would have to be implemented in the front-end, regardless 
of whether it is a language extension or as a gcc attribute.  How would 
making it a language extension vs a gcc attribute make it different?


Thanks,
Sid


[SH][committed] Fix PR 111001

2023-10-23 Thread Oleg Endo
The attached patch fixes PR 111001.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.
Bootstrapped on GCC-13 sh4-linux by Adrian.

Cheers,
Oleg

gcc/ChangeLog:

PR target/111001
* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
Skip over nop move insns.

From 4414818f4e5de54ea3c353e2ebb2e79a89ae211b Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Mon, 23 Oct 2023 22:08:37 +0900
Subject: [PATCH] SH: Fix PR 111001

gcc/ChangeLog:

	PR target/111001
	* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
	Skip over nop move insns.
---
 gcc/config/sh/sh_treg_combine.cc |  9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh_treg_combine.cc b/gcc/config/sh/sh_treg_combine.cc
index f6553c0..685ca54 100644
--- a/gcc/config/sh/sh_treg_combine.cc
+++ b/gcc/config/sh/sh_treg_combine.cc
@@ -731,9 +731,16 @@ sh_treg_combine::record_set_of_reg (rtx reg, rtx_insn *start_insn,
 	  new_entry.cstore_type = cstore_inverted;
 	}
   else if (REG_P (new_entry.cstore.set_src ()))
 	{
-	  // If it's a reg-reg copy follow the copied reg.
+	  // If it's a reg-reg copy follow the copied reg, but ignore
+	  // nop copies of the reg onto itself.
+	  if (REGNO (new_entry.cstore.set_src ()) == REGNO (reg))
+	{
+	  i = prev_nonnote_nondebug_insn_bb (i);
+	  continue;
+	}
+
 	  new_entry.cstore_reg_reg_copies.push_back (new_entry.cstore);
 	  reg = new_entry.cstore.set_src ();
 	  i = new_entry.cstore.insn;
 
--
libgit2 1.3.2



[PATCH] ipa/111914 - perform parameter init after remapping types

2023-10-23 Thread Richard Biener
The following addresses a mismatch in SSA name vs. symbol when
we emit a dummy assignment when not optimizing.  The temporary
we create is not remapped by initialize_inlined_parameters because
we have no easy way to get at it.  The following instead emits
the additional statement after we have remapped the type of
the replacement variable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR ipa/111914
* tree-inline.cc (setup_one_parameter): Move code emitting
a dummy load when not optimizing ...
(initialize_inlined_parameters): ... here to after when
we remapped the parameter type.

* gcc.dg/pr111914.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr111914.c | 14 ++
 gcc/tree-inline.cc  | 26 ++
 2 files changed, 32 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr111914.c

diff --git a/gcc/testsuite/gcc.dg/pr111914.c b/gcc/testsuite/gcc.dg/pr111914.c
new file mode 100644
index 000..05804bddcb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111914.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+__attribute__((always_inline))
+static inline void f(int n, int (*a())[n])
+{
+  /* Unused 'a'.  */
+}
+
+void g(void)
+{
+  int (*a())[1];
+  f(1, a);
+}
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index d63060c9429..69387b525f9 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -3578,9 +3578,7 @@ setup_one_parameter (copy_body_data *id, tree p, tree 
value, tree fn,
 
   STRIP_USELESS_TYPE_CONVERSION (rhs);
 
-  /* If we are in SSA form properly remap the default definition
- or assign to a dummy SSA name if the parameter is unused and
-we are not optimizing.  */
+  /* If we are in SSA form properly remap the default definition.  */
   if (gimple_in_ssa_p (cfun) && is_gimple_reg (p))
{
  if (def)
@@ -3590,11 +3588,6 @@ setup_one_parameter (copy_body_data *id, tree p, tree 
value, tree fn,
  SSA_NAME_IS_DEFAULT_DEF (def) = 0;
  set_ssa_default_def (cfun, var, NULL);
}
- else if (!optimize)
-   {
- def = make_ssa_name (var);
- init_stmt = gimple_build_assign (def, rhs);
-   }
}
   else if (!is_empty_type (TREE_TYPE (var)))
 init_stmt = gimple_build_assign (var, rhs);
@@ -3653,6 +3646,23 @@ initialize_inlined_parameters (copy_body_data *id, 
gimple *stmt,
  && SSA_NAME_VAR (*defp) == var)
TREE_TYPE (*defp) = TREE_TYPE (var);
}
+ /* When not optimizing and the parameter is unused, assign to
+a dummy SSA name.  Do this after remapping the type above.  */
+ else if (!optimize
+  && is_gimple_reg (p)
+  && i < gimple_call_num_args (stmt))
+   {
+ tree val = gimple_call_arg (stmt, i);
+ if (val != error_mark_node)
+   {
+ if (!useless_type_conversion_p (TREE_TYPE (p),
+ TREE_TYPE (val)))
+   val = force_value_to_type (TREE_TYPE (p), val);
+ def = make_ssa_name (var);
+ gimple *init_stmt = gimple_build_assign (def, val);
+ insert_init_stmt (id, bb, init_stmt);
+   }
+   }
}
 }
 
-- 
2.35.3


[PATCH] tree-optimization/111915 - mixing grouped and non-grouped accesses

2023-10-23 Thread Richard Biener
The change to allow SLP of non-grouped accesses failed to check
for the case of mixing with grouped accesses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111915
* tree-vect-slp.cc (vect_build_slp_tree_1): Check all
accesses are either grouped or not.

* gcc.dg/vect/pr111915.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr111915.c | 12 
 gcc/tree-vect-slp.cc |  3 +++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr111915.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr111915.c 
b/gcc/testsuite/gcc.dg/vect/pr111915.c
new file mode 100644
index 000..8614bac519c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111915.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-vrp -fno-tree-dominator-opts 
-fno-tree-ccp" } */
+
+void
+foo (int * __restrict a, int * __restrict b, int * __restrict w)
+{
+  for (int i = 0; i < 16; ++i)
+{
+  *a += w[2*i+0];
+  *b += w[2*i&1];
+}
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 24bf6582f8d..5eb310eceaf 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1297,6 +1297,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
|| rhs_code == INDIRECT_REF
|| rhs_code == COMPONENT_REF
|| rhs_code == MEM_REF)))
+ || (ldst_p
+ && (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ != STMT_VINFO_GROUPED_ACCESS (first_stmt_info)))
  || (ldst_p
  && (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
  != STMT_VINFO_GATHER_SCATTER_P (first_stmt_info)))
-- 
2.35.3


Backport PR106878 fixes to GCC 12

2023-10-23 Thread Alex Coplan
Hi,

I'd like to submit the attached three patches from Jakub for backporting to GCC 
12.
These are the backports proposed at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106878#c18 i.e.:

r13-2658-g645ef01a463f15fc230e2155719c7a12cec89acf without the gimple verifier 
changes.
r13-2709-g9ac9fde961f76879f0379ff3b2494a2f9ac915f7
r13-2891-gcb8f25c5dc9f6d5207c826c2dafe25f68458ceaf

For the first patch, I was thinking of backporting it with the cover letter as
follows:

Disallow pointer operands for |, ^ and partly & [PR106878]

This is a backport of 645ef01a463f15fc230e2155719c7a12cec89acf without the
changes to verify_gimple_assign_binary from the original patch.  The original
cover letter for the patch is as follows:

My change to match.pd (that added the two simplifications this patch
touches) results in more |/^/& assignments with pointer arguments,
but since r12-1608 we reject pointer operands for BIT_NOT_EXPR.

Disallowing them for BIT_NOT_EXPR and allowing for BIT_{IOR,XOR,AND}_EXPR
leads to a match.pd maintenance nightmare (see one of the patches in the
PR), so either we want to allow pointer operand on BIT_NOT_EXPR (but then
we run into issues e.g. with the ranger which expects it can emulate
BIT_NOT_EXPR ~X as - 1 - X which doesn't work for pointers which don't
support MINUS_EXPR), or the following patch disallows pointer arguments
for all of BIT_{IOR,XOR,AND}_EXPR with the exception of BIT_AND_EXPR
with INTEGER_CST last operand (for simpler pointer realignment).
I had to tweak one reassoc optimization and the two match.pd
simplifications.
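
(To make the retained exception concrete, here is a small example of my
own, not taken from the patch: bitwise AND of a pointer with an integer
constant is how pointer realignment is expressed, and it is the one
pointer case that stays allowed; |, ^ and pointer-with-pointer & now have
to go through pointer-sized integers.)

#include <stdint.h>

int *
align_down (int *p)
{
  /* Pointer realignment; GIMPLE may represent this as pointer &
     constant, the one bitwise form the patch keeps allowing.  */
  return (int *) ((uintptr_t) p & ~(uintptr_t) 15);
}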

2022-09-14  Jakub Jelinek  

PR tree-optimization/106878
* match.pd ((type) X op CST -> (type) (X op ((type-x) CST)),
(type) (((type2) X) op Y) -> (X op (type) Y)): Punt for
POINTER_TYPE_P or OFFSET_TYPE.
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): For
pointers cast them to pointer sized integers first.

* gcc.c-torture/compile/pr106878.c: New test.

--

The other two patches can then be simple cherry picks.  (Or we could squash them
into a single patch, if that's deemed preferable).

I've bootstrapped and tested these on top of the GCC 12 branch on both
x86_64-linux-gnu and aarch64-linux-gnu, and there were no regressions.

OK for the GCC 12 branch?

Thanks,
Alex
commit 557c126f9fbdcde256f134d4ed34ff305387fd41
Author: Jakub Jelinek 
Date:   Wed Sep 14 11:36:36 2022

Disallow pointer operands for |, ^ and partly & [PR106878]

This is a backport of 645ef01a463f15fc230e2155719c7a12cec89acf without the
changes to verify_gimple_assign_binary from the original patch.  The 
original
cover letter for the patch is as follows:

My change to match.pd (that added the two simplifications this patch
touches) results in more |/^/& assignments with pointer arguments,
but since r12-1608 we reject pointer operands for BIT_NOT_EXPR.

Disallowing them for BIT_NOT_EXPR and allowing for BIT_{IOR,XOR,AND}_EXPR
leads to a match.pd maintenance nightmare (see one of the patches in the
PR), so either we want to allow pointer operand on BIT_NOT_EXPR (but then
we run into issues e.g. with the ranger which expects it can emulate
BIT_NOT_EXPR ~X as - 1 - X which doesn't work for pointers which don't
support MINUS_EXPR), or the following patch disallows pointer arguments
for all of BIT_{IOR,XOR,AND}_EXPR with the exception of BIT_AND_EXPR
with INTEGER_CST last operand (for simpler pointer realignment).
I had to tweak one reassoc optimization and the two match.pd
simplifications.

2022-09-14  Jakub Jelinek  

PR tree-optimization/106878
* match.pd ((type) X op CST -> (type) (X op ((type-x) CST)),
(type) (((type2) X) op Y) -> (X op (type) Y)): Punt for
POINTER_TYPE_P or OFFSET_TYPE.
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): For
pointers cast them to pointer sized integers first.

* gcc.c-torture/compile/pr106878.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index a9aae484b2b..cbae09dfb28 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1663,6 +1663,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && (int_fits_type_p (@1, TREE_TYPE (@0))
 || tree_nop_conversion_p (TREE_TYPE (@0), type)))
|| types_match (@0, @1))
+   && !POINTER_TYPE_P (TREE_TYPE (@0))
+   && TREE_CODE (TREE_TYPE (@0)) != OFFSET_TYPE
/* ???  This transform conflicts with fold-const.cc doing
  Convert (T)(x & c) into (T)x & (T)c, if c is an integer
  constants (if x has signed type, the sign bit cannot be set
@@ -1699,7 +1701,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (GIMPLE
&& TREE_CODE (@1) != INTEGER_CST
&& tree_nop_conversion_p (type, TREE_TYPE (@2))
-   && types_match (type, @0))
+   && types_match (type, @0)
+   && !POINTER_TYPE_P (TREE_TYPE (@0))
+   && TREE

[PATCH] tree-optimization/111916 - SRA of BIT_FIELD_REF of constant pool entries

2023-10-23 Thread Richard Biener
The following adjusts a leftover BIT_FIELD_REF special-casing to only
cover the cases general code doesn't handle.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111916
* tree-sra.cc (sra_modify_assign): Do not lower all
BIT_FIELD_REF reads that are sra_handled_bf_read_p.

* gcc.dg/torture/pr111916.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111916.c | 16 
 gcc/tree-sra.cc |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111916.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111916.c 
b/gcc/testsuite/gcc.dg/torture/pr111916.c
new file mode 100644
index 000..2873045aaa4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111916.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+
+#pragma pack(1)
+struct A {
+  int b : 4;
+  int c : 11;
+  int d : 2;
+  int e : 5;
+} f;
+int main()
+{
+  struct A g = {1, 1, 1, 1};
+  while (!g.b)
+f = g;
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index f8dff8b27d7..b985dee6964 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -4275,7 +4275,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
*gsi)
 
   if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (lhs) == REALPART_EXPR
   || TREE_CODE (rhs) == IMAGPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR
-  || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF)
+  || (TREE_CODE (rhs) == BIT_FIELD_REF && !sra_handled_bf_read_p (rhs))
+  || TREE_CODE (lhs) == BIT_FIELD_REF)
 {
   modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (stmt),
  gsi, false);
-- 
2.35.3


Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-23 Thread Mark Hatle

Not sure if this will work, but there is a strverscmp function in libiberty 
that I think will work.

So the microblaze version compare could be done as:

#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp(VA, VB)

(I've not tried this, just remembered doing something similar in the past.)
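
For instance, a quick sanity check of the idea (my own sketch, not tested
against the patch; strverscmp orders embedded numbers by value rather than
lexically, and needs _GNU_SOURCE on glibc):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

int main (void)
{
  /* Unlike strcasecmp, strverscmp ranks "v10.0" above "v6.00.a".  */
  printf ("%d\n", strverscmp ("v10.0", "v6.00.a") > 0);  /* prints 1 */
  return 0;
}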

--Mark

On 10/23/23 12:48 AM, Neal Frager wrote:

There is a microblaze cpu version 10.0 included in versal. If the
minor version is only a single digit, then the version comparison
will fail as version 10.0 will appear as 100 compared to version
6.00 or 8.30 which will calculate to values 600 and 830.

The issue can be seen when using the '-mcpu=10.0' option.

With this fix, versions with a single digit minor number such as
10.0 will be calculated as greater than versions with a smaller
major version number, but with two minor version digits.

By applying this fix, several incorrect warning messages will no
longer be printed when building the versal plm application, such
as the warning message below:

warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or greater

Signed-off-by: Neal Frager 
---
  gcc/config/microblaze/microblaze.cc | 164 +---
  1 file changed, 76 insertions(+), 88 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.cc 
b/gcc/config/microblaze/microblaze.cc
index c9f6c4198cf..6e1555f6eb3 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,8 +56,6 @@
  /* This file should be included last.  */
  #include "target-def.h"
  
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)

-
  /* Classifies an address.
  
  ADDRESS_INVALID

@@ -1297,12 +1295,73 @@ microblaze_expand_block_move (rtx dest, rtx src, rtx 
length, rtx align_rtx)
return false;
  }
  
+/*  Convert a version number of the form "vX.YY.Z" to an integer encoding

+for easier range comparison.  */
+static int
+microblaze_version_to_int (const char *version)
+{
+  const char *p, *v;
+  const char *tmpl = "vXX.YY.Z";
+  int iver1 =0, iver2 =0, iver3 =0;
+
+  p = version;
+  v = tmpl;
+
+  while (*p)
+{
+  if (*v == 'X')
+   {   /* Looking for major  */
+ if (*p == '.')
+   *v++;
+ else
+   {
+ if (!(*p >= '0' && *p <= '9'))
+   return -1;
+ iver1 += (int) (*p - '0');
+ iver1 *= 1000;
+   }
+   }
+  else if (*v == 'Y')
+   {   /* Looking for minor  */
+ if (!(*p >= '0' && *p <= '9'))
+   return -1;
+ iver2 += (int) (*p - '0');
+ iver2 *= 10;
+   }
+  else if (*v == 'Z')
+   {   /* Looking for compat  */
+ if (!(*p >= 'a' && *p <= 'z'))
+   return -1;
+ iver3 = (int) (*p - 'a');
+   }
+  else
+   {
+ if (*p != *v)
+   return -1;
+   }
+
+  v++;
+  p++;
+}
+
+  if (*p)
+return -1;
+
+  return iver1 + iver2 + iver3;
+}
+
  static bool
  microblaze_rtx_costs (rtx x, machine_mode mode, int outer_code 
ATTRIBUTE_UNUSED,
  int opno ATTRIBUTE_UNUSED, int *total,
  bool speed ATTRIBUTE_UNUSED)
  {
int code = GET_CODE (x);
+  int ver, ver_int;
+
+  if (microblaze_select_cpu == NULL)
+microblaze_select_cpu = MICROBLAZE_DEFAULT_CPU;
+
+  ver_int = microblaze_version_to_int (microblaze_select_cpu);
  
switch (code)

  {
@@ -1345,8 +1404,8 @@ microblaze_rtx_costs (rtx x, machine_mode mode, int 
outer_code ATTRIBUTE_UNUSED,
{
if (TARGET_BARREL_SHIFT)
  {
-   if (MICROBLAZE_VERSION_COMPARE (microblaze_select_cpu, "v5.00.a")
-   >= 0)
+   ver = ver_int - microblaze_version_to_int("v5.00.a");
+   if (ver >= 0)
  *total = COSTS_N_INSNS (1);
else
  *total = COSTS_N_INSNS (2);
@@ -1407,8 +1466,8 @@ microblaze_rtx_costs (rtx x, machine_mode mode, int 
outer_code ATTRIBUTE_UNUSED,
  }
else if (!TARGET_SOFT_MUL)
  {
-   if (MICROBLAZE_VERSION_COMPARE (microblaze_select_cpu, "v5.00.a")
-   >= 0)
+   ver = ver_int - microblaze_version_to_int("v5.00.a");
+   if (ver >= 0)
  *total = COSTS_N_INSNS (1);
else
  *total = COSTS_N_INSNS (3);
@@ -1681,72 +1740,13 @@ function_arg_partial_bytes (cumulative_args_t cum_v,
return 0;
  }
  
-/*  Convert a version number of the form "vX.YY.Z" to an integer encoding

-for easier range comparison.  */
-static int
-microblaze_version_to_int (const char *version)
-{
-  const char *p, *v;
-  const char *tmpl = "vXX.YY.Z";
-  int iver = 0;
-
-  p = version;
-  v = tmpl;
-
-  while (*p)
-{
-  if (*v == 'X')
-   {   /* Looking for major  */
-  if (*p == '.')
-{
-  v++;
-}
-  else
-{
- if (!(*p >= 

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-23 Thread Bernhard Reutner-Fischer
On Mon, 23 Oct 2023 12:16:18 +0530
Ajit Agarwal  wrote:

> Hello All:
> 
> Addressed below review comments in the version 11 of the patch.
> Please review and please let me know if its ok for trunk.

s/satisified/satisfied/

> > As said, I don't see why the below was not cleaned up before the V1 
> > submission.
> > Iff it breaks when manually CSEing, I'm curious why?

The function below looks identical in v12 of the patch.
Why didn't you use common subexpressions?

> >   
> >>> +/* Return TRUE if reg source operand of zero_extend is argument registers
> >>> +   and not return registers and source and destination operand are same
> >>> +   and mode of source and destination operand are not same.  */
> >>> +
> >>> +static bool
> >>> +abi_extension_candidate_p (rtx_insn *insn)
> >>> +{
> >>> +  rtx set = single_set (insn);
> >>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
> >>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
> >>> +
> >>> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> >>> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO 
> >>> (orig_src)))  
> >>> +return false;
> >>> +
> >>> +  /* Mode of destination and source should be different.  */
> >>> +  if (dst_mode == GET_MODE (orig_src))
> >>> +return false;
> >>> +
> >>> +  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
> >>> +  bool promote_p = abi_target_promote_function_mode (mode);
> >>> +
> >>> +  /* REGNO of source and destination should be same if not
> >>> +  promoted.  */
> >>> +  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
> >>> +return false;
> >>> +
> >>> +  return true;
> >>> +}
> >>> +  
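
For reference, the duplication being pointed at: orig_src already holds
XEXP (SET_SRC (set), 0), so the requested cleanup would look roughly like
this sketch:

  machine_mode mode = GET_MODE (orig_src);
  bool promote_p = abi_target_promote_function_mode (mode);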


> > 
> > As said, please also rephrase the above (and everything else if it 
> > obviously looks akin the above).

thanks


Re: [PATCH] libcpp: Improve the diagnostic for poisoned identifiers [PR36887]

2023-10-23 Thread David Malcolm
On Wed, 2023-09-20 at 00:12 -0400, Lewis Hyatt wrote:
> Hello-
> 
> This patch implements the PR's request to add more information to the
> diagnostic issued for using a poisoned identifier. Bootstrapped +
> regtested
> all languages on x86-64 Linux. Does it look OK please? Thanks!

Thanks!

Patch looks good to me; please go ahead and push it.

Dave



[x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-23 Thread Roger Sayle

The eagle-eyed may have spotted that my recent testcases for DImode shifts
on x86_64 included -mno-stv in the dg-options.  This is because the
Scalar-To-Vector (STV) pass currently transforms these shifts to use
SSE vector operations, producing larger code even with -Os.  The issue
is that compute_convert_gain currently underestimates the size of
instructions required for interunit moves, which is corrected with the
patch below.

For the simple test case:

unsigned long long shl1(unsigned long long x) { return x << 1; }

without this patch, GCC -m32 -Os -mavx2 currently generates:

shl1:   push   %ebp  // 1 byte
mov%esp,%ebp // 2 bytes
vmovq  0x8(%ebp),%xmm0   // 5 bytes
pop%ebp  // 1 byte
vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
vmovd  %xmm0,%eax// 4 bytes
vpextrd $0x1,%xmm0,%edx  // 6 bytes
ret  // 1 byte  = 24 bytes total

with this patch, we now generate the shorter

shl1:   push   %ebp // 1 byte
mov%esp,%ebp// 2 bytes
mov0x8(%ebp),%eax   // 3 bytes
mov0xc(%ebp),%edx   // 3 bytes
pop%ebp // 1 byte
add%eax,%eax// 2 bytes
adc%edx,%edx// 2 bytes
ret // 1 byte  = 15 bytes total

Benchmarking using CSiBE, shows that this patch saves 1361 bytes
when compiling with -m32 -Os, and saves 172 bytes when compiling
with -Os.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-23  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate values (sizes) for inter-unit moves with -Os.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index cead397..6fac67e 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -752,11 +752,33 @@ general_scalar_chain::compute_convert_gain ()
 fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
 
   /* Cost the integer to sse and sse to integer moves.  */
-  cost += n_sse_to_integer * ix86_cost->sse_to_integer;
-  /* ???  integer_to_sse but we only have that in the RA cost table.
- Assume sse_to_integer/integer_to_sse are the same which they
- are at the moment.  */
-  cost += n_integer_to_sse * ix86_cost->sse_to_integer;
+  if (!optimize_function_for_size_p (cfun))
+{
+  cost += n_sse_to_integer * ix86_cost->sse_to_integer;
+  /* ???  integer_to_sse but we only have that in the RA cost table.
+  Assume sse_to_integer/integer_to_sse are the same which they
+ are at the moment.  */
+  cost += n_integer_to_sse * ix86_cost->sse_to_integer;
+}
+  else if (TARGET_64BIT || smode == SImode)
+{
+  cost += n_sse_to_integer * COSTS_N_BYTES (4);
+  cost += n_integer_to_sse * COSTS_N_BYTES (4);
+}
+  else if (TARGET_SSE4_1)
+{
+  /* vmovd (4 bytes) + vpextrd (6 bytes).  */
+  cost += n_sse_to_integer * COSTS_N_BYTES (10);
+  /* vmovd (4 bytes) + vpinsrd (6 bytes).  */
+  cost += n_integer_to_sse * COSTS_N_BYTES (10);
+}
+  else
+{
+  /* movd (4 bytes) + psrlq (5 bytes) + movd (4 bytes).  */
+  cost += n_sse_to_integer * COSTS_N_BYTES (13);
+  /* movd (4 bytes) + movd (4 bytes) + unpckldq (4 bytes).  */
+  cost += n_integer_to_sse * COSTS_N_BYTES (12);
+}
 
   if (dump_file)
 fprintf (dump_file, "  Registers conversion cost: %d\n", cost);


Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-23 Thread Frager, Neal
Hi Mark,

> Le 23 oct. 2023 à 16:07, Hatle, Mark  a écrit :
> 
> Not sure if this will work, but there is a strverscmp function in libiberty 
> that I think will work.
> 
> So the microblaze version compare could be done as:
> 
> #define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp(VA, VB)
> 
> (I've not tried this, just remembered doing something similar in the past.)
> 
> --Mark

Thank you for the good idea.  I will have a look.  The current version of the 
patch I submitted basically came from the meta-xilinx gcc patch 0024.  If there 
is already a way to version compare, we probably never should have implemented 
our own routine to begin with.

I will check this out, and submit a v2 for this patch, if it works.

Best regards,
Neal Frager
AMD


> 
>> On 10/23/23 12:48 AM, Neal Frager wrote:
>> There is a microblaze cpu version 10.0 included in versal. If the
>> minor version is only a single digit, then the version comparison
>> will fail as version 10.0 will appear as 100 compared to version
>> 6.00 or 8.30 which will calculate to values 600 and 830.
>> The issue can be seen when using the '-mcpu=10.0' option.
>> With this fix, versions with a single digit minor number such as
>> 10.0 will be calculated as greater than versions with a smaller
>> major version number, but with two minor version digits.
>> By applying this fix, several incorrect warning messages will no
>> longer be printed when building the versal plm application, such
>> as the warning message below:
>> warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or 
>> greater
>> Signed-off-by: Neal Frager 
>> ---
>>  gcc/config/microblaze/microblaze.cc | 164 +---
>>  1 file changed, 76 insertions(+), 88 deletions(-)
>> diff --git a/gcc/config/microblaze/microblaze.cc 
>> b/gcc/config/microblaze/microblaze.cc
>> index c9f6c4198cf..6e1555f6eb3 100644
>> --- a/gcc/config/microblaze/microblaze.cc
>> +++ b/gcc/config/microblaze/microblaze.cc
>> @@ -56,8 +56,6 @@
>>  /* This file should be included last.  */
>>  #include "target-def.h"
>>  -#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
>> -
>>  /* Classifies an address.
>>ADDRESS_INVALID
>> @@ -1297,12 +1295,73 @@ microblaze_expand_block_move (rtx dest, rtx src, rtx 
>> length, rtx align_rtx)
>>return false;
>>  }
>>  +/*  Convert a version number of the form "vX.YY.Z" to an integer encoding
>> +for easier range comparison.  */
>> +static int
>> +microblaze_version_to_int (const char *version)
>> +{
>> +  const char *p, *v;
>> +  const char *tmpl = "vXX.YY.Z";
>> +  int iver1 =0, iver2 =0, iver3 =0;
>> +
>> +  p = version;
>> +  v = tmpl;
>> +
>> +  while (*p)
>> +{
>> +  if (*v == 'X')
>> +{/* Looking for major  */
>> +  if (*p == '.')
>> +*v++;
>> +  else
>> +{
>> +  if (!(*p >= '0' && *p <= '9'))
>> +return -1;
>> +  iver1 += (int) (*p - '0');
>> +  iver1 *= 1000;
>> +}
>> +}
>> +  else if (*v == 'Y')
>> +{/* Looking for minor  */
>> +  if (!(*p >= '0' && *p <= '9'))
>> +return -1;
>> +  iver2 += (int) (*p - '0');
>> +  iver2 *= 10;
>> +}
>> +  else if (*v == 'Z')
>> +{/* Looking for compat  */
>> +  if (!(*p >= 'a' && *p <= 'z'))
>> +return -1;
>> +  iver3 = (int) (*p - 'a');
>> +}
>> +  else
>> +{
>> +  if (*p != *v)
>> +return -1;
>> +}
>> +
>> +  v++;
>> +  p++;
>> +}
>> +
>> +  if (*p)
>> +return -1;
>> +
>> +  return iver1 + iver2 + iver3;
>> +}
>> +
>>  static bool
>>  microblaze_rtx_costs (rtx x, machine_mode mode, int outer_code 
>> ATTRIBUTE_UNUSED,
>>int opno ATTRIBUTE_UNUSED, int *total,
>>bool speed ATTRIBUTE_UNUSED)
>>  {
>>int code = GET_CODE (x);
>> +  int ver, ver_int;
>> +
>> +  if (microblaze_select_cpu == NULL)
>> +microblaze_select_cpu = MICROBLAZE_DEFAULT_CPU;
>> +
>> +  ver_int = microblaze_version_to_int (microblaze_select_cpu);
>>  switch (code)
>>  {
>> @@ -1345,8 +1404,8 @@ microblaze_rtx_costs (rtx x, machine_mode mode, int 
>> outer_code ATTRIBUTE_UNUSED,
>>{
>>  if (TARGET_BARREL_SHIFT)
>>{
>> -if (MICROBLAZE_VERSION_COMPARE (microblaze_select_cpu, "v5.00.a")
>> ->= 0)
>> +ver = ver_int - microblaze_version_to_int("v5.00.a");
>> +if (ver >= 0)
>>*total = COSTS_N_INSNS (1);
>>  else
>>*total = COSTS_N_INSNS (2);
>> @@ -1407,8 +1466,8 @@ microblaze_rtx_costs (rtx x, machine_mode mode, int 
>> outer_code ATTRIBUTE_UNUSED,
>>}
>>  else if (!TARGET_SOFT_MUL)
>>{
>> -if (MICROBLAZE_VERSION_COMPARE (microblaze_select_cpu, "v5.00.a")
>> ->= 0)
>> +ver = ver_int - microblaze_version_to_int("v5.00.a");
>> +if (ver >= 0)
>>*total = COSTS_N_INSNS (1);
>>  else
>>*total = COSTS_N_INSNS (3

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 3:57 AM, Richard Biener  
> wrote:
> 
> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  wrote:
>>> 
>>> On 2023-10-20 14:38, Qing Zhao wrote:
 How about the following:
  Add one more parameter to __builtin_dynamic_object_size(), i.e
 __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
 When we see the structure field has counted_by attribute.
>>> 
>>> Or maybe add a barrier preventing any assignments to array_annotated->foo 
>>> from being reordered below the __bdos call? Basically an __asm__ with 
>>> array_annotated->foo in the clobber list ought to do it I think.
>> 
>> Maybe just adding the array_annotated->foo to the use list of the call to 
>> __builtin_dynamic_object_size should be enough?
>> 
>> But I am not sure how to implement this in the TREE level, is there a 
>> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
>> counted_by field “array_annotated->foo” to the USE_LIST of the call to 
>> __bdos?
>> 
>> This might be the simplest solution?
> 
> If the dynamic object size is derived of a field then I think you need to
> put the "load" of that memory location at the point (as argument)
> of the __bos call right at parsing time.  I know that's awkward because
> you try to play tricks "discovering" that field only late, but that's not
> going to work.

Is it better to do this at the gimplification phase instead of in the FE?

VLA decls are handled in the gimplification phase; the size calculation and
the call to alloca are all generated during this phase (gimplify_vla_decl).

For __bdos calls, we can add an additional argument if the object’s first
argument’s type includes the counted_by attribute, i.e.:

*** During gimplification:
For a call to __builtin_dynamic_object_size (ptr, type), check whether the
type of ptr includes the counted_by attribute; if so, change the call to
__builtin_dynamic_object_size (ptr, type, counted_by field).

Then the correct data dependence should be represented well in the IR.

** During the object size phase:

The call to __builtin_dynamic_object_size will become an expression that
includes the counted_by field, or -1/0 when we cannot decide the size; the
correct data dependence will be kept even after the call to
__builtin_dynamic_object_size is gone.
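
As a concrete sketch of the proposed transform (the struct shape and the
names are my own, following the examples used elsewhere in this thread,
and the counted_by attribute itself is still only proposed):

  #include <stddef.h>

  struct annotated {
    size_t foo;                 /* the count */
    char array[] __attribute__ ((counted_by (foo)));
  };

  /* What the user writes:
       n = __builtin_dynamic_object_size (p->array, 1);
     What gimplification would rewrite it to, making the load of the
     counted_by field an explicit argument:
       n = __builtin_dynamic_object_size (p->array, 1, p->foo);  */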


> 
> A related issue is that assignment to the field and storage allocation
> are not tied together

Yes, this is different from VLA, in which the size assignment and the storage
allocation are generated and tied together by the compiler.

For the flexible array member, the storage allocation and the size assignment
are all done by the user, so we need to clarify this requirement in the
document to guide users to write correct code.  And we might also need to
provide tools (warnings and a sanitizer option) to help users catch such
coding errors.

> - if there's no use of the size data we might
> remove the store of it as dead.

Yes, when __bdos cannot decide the size, we need to remove the dead store to 
the field.
I guess that the compiler should be able to do this automatically?

thanks.

Qing
> 
> Of course I guess __bos then behaves like sizeof ().
> 
> Richard.
> 
>> 
>> Qing
>> 
>>> 
>>> It may not work for something like this though:
>>> 
>>> static size_t
>>> get_size_of (void *ptr)
>>> {
>>> return __bdos (ptr, 1);
>>> }
>>> 
>>> void
>>> foo (size_t sz)
>>> {
>>> array_annotated = __builtin_malloc (sz);
>>> array_annotated->foo = sz;
>>> 
>>> ...
>>> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
>>> ...
>>> }
>>> 
>>> because the call to get_size_of () may not have been inlined that early.
>>> 
>>> The more fool-proof alternative may be to put a compile time barrier right 
>>> below the assignment to array_annotated->foo; I reckon you could do that 
>>> early in the front end by marking the size identifier and then tracking 
>>> assignments to that identifier.  That may have a slight runtime performance 
>>> overhead since it may prevent even legitimate reordering.  I can't think of 
>>> another alternative at the moment...
>>> 
>>> Sid
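
(For reference, a minimal sketch of the clobber-barrier idea above; it
assumes the proposed counted_by attribute, the names are hypothetical, and
the empty asm emits no code, it only constrains ordering:)

  #include <stddef.h>

  struct annotated {
    size_t foo;
    char array[] __attribute__ ((counted_by (foo)));
  };

  size_t
  store_count_then_size (struct annotated *p, size_t sz)
  {
    p->foo = sz;
    /* Compiler barrier: the store to p->foo must be complete here and
       cannot be reordered below the __bdos call.  */
    __asm__ volatile ("" : "+m" (p->foo));
    return __builtin_dynamic_object_size (p->array, 1);
  }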



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 8:34 AM, Richard Biener  
> wrote:
> 
> On Mon, Oct 23, 2023 at 1:27 PM Siddhesh Poyarekar  
> wrote:
>> 
>> On 2023-10-23 03:57, Richard Biener wrote:
>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
 
 
 
> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
> wrote:
> 
> On 2023-10-20 14:38, Qing Zhao wrote:
>> How about the following:
>>   Add one more parameter to __builtin_dynamic_object_size(), i.e
>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>> When we see the structure field has counted_by attribute.
> 
> Or maybe add a barrier preventing any assignments to array_annotated->foo 
> from being reordered below the __bdos call? Basically an __asm__ with 
> array_annotated->foo in the clobber list ought to do it I think.
 
 Maybe just adding the array_annotated->foo to the use list of the call to 
 __builtin_dynamic_object_size should be enough?
 
 But I am not sure how to implement this in the TREE level, is there a 
 USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
 counted_by field “array_annotated->foo” to the USE_LIST of the call to 
 __bdos?
 
 This might be the simplest solution?
>>> 
>>> If the dynamic object size is derived of a field then I think you need to
>>> put the "load" of that memory location at the point (as argument)
>>> of the __bos call right at parsing time.  I know that's awkward because
>>> you try to play tricks "discovering" that field only late, but that's not
>>> going to work.
>>> 
>>> A related issue is that assignment to the field and storage allocation
>>> are not tied together - if there's no use of the size data we might
>>> remove the store of it as dead.
>> 
>> Maybe the trick then is to treat the size data as volatile?  That ought
>> to discourage reordering and also prevent elimination of the "dead" store?
> 
> But we are an optimizing compiler, not a static analysis machine, so I
> fail to see how this is a useful suggestion.
> 
> I think Martins suggestion to approach this as a language extension
> is more useful and would make it easier to handle this?

I agree that making this as a language extension is a better and cleaner 
approach.

As we discussed before, the major issues with the language extension approach 
are:
1. Harder to be adopted by the existing source code due to the potential 
ABI/API change.
2. Much more effort and much longer time to be accepted.

In addition to the above issues, I guess the same issue exists even with a
language extension, since for a FAM it’s the user (not the compiler) who
allocates the storage for the FAM.  (Should we also move this into the
compiler for the language extension? Then the existing source code would
need to be changed a lot to adopt the new language extension.)

As a result, the size and the storage allocation cannot be guaranteed to be
tied together either.

Qing

> 
> Richard.
> 
>> Thanks,
>> Sid



Re: [PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string

2023-10-23 Thread David Malcolm
On Wed, Jan 25, 2023 at 4:09 PM Ben Boeckel via Gcc  wrote:
>
> This simplifies the interface for other UTF-8 validity detections when a
> simple "yes" or "no" answer is sufficient.
>
> libcpp/
>
> * charset.cc: Add `_cpp_valid_utf8_str` which determines whether
> a C string is valid UTF-8 or not.
> * internal.h: Add prototype for `_cpp_valid_utf8_str`.
>
> Signed-off-by: Ben Boeckel 

[going through patches in patchwork]

What's the status of this patch; did this ever get committed?

I see that Jason preapproved this via his review of "[PATCH v3 2/3]
libcpp: add a function to determine UTF-8 validity of a C string"

Thanks
Dave



Re: [PATCH][WIP] dwarf2out: extend to output debug section directly to object file during debug_early phase

2023-10-23 Thread Jan Hubicka
> > > +  output_data_to_object_file (1, 0);
> > > +  output_data_to_object_file (1, 0);
> >
> > So this basically renames dw2_asm_output_data to
> > output_data_to_object_file and similarly for other output functions.
> >
> > What would be main problems of making dw2_asm_* functions to do the
> > right thing when outputting to object file?
> > Either by conditionals or turning them to virtual functions/hooks as
> > Richi suggested?
> >
> I think it's doable via conditionals. Can you explain the second approach
> in more detail?

Basically you want to have output functions
like dw2_asm_output_data to do the right thing and either store
it to the LTO simple object section or the assembly file.
So either we can add conditionals to every dw2_asm_* function needed
of the form
  if (outputting_to_lto)
 ... new code ...
  else
 ... existing code ...

Or have a virtual table with two different dw2_asm implementations.
Older GCC code uses hooks which is essencially a structure holding
function pointers, mostly because it was implemented before we converted
source base to C++. Some newer code uses virtual functions for this.
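
A skeletal illustration of the hook-style variant (all names here are
hypothetical, just to show the shape of the indirection):

  /* One interface, two implementations: textual assembler output
     vs. writing bytes into the LTO simple-object section.  */
  struct dw2_output_hooks
  {
    virtual void output_data (int size, unsigned long value,
                              const char *comment) = 0;
    virtual ~dw2_output_hooks () {}
  };

  struct dw2_asm_output final : dw2_output_hooks
  {
    void output_data (int size, unsigned long value,
                      const char *comment) override
    { /* ... existing dw2_asm_output_data body ... */ }
  };

  struct dw2_object_output final : dw2_output_hooks
  {
    void output_data (int size, unsigned long value,
                      const char *comment) override
    { /* ... emit bytes and relocations via the lto API ... */ }
  };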
> > > +struct lto_simple_object
> > lto_simple_object is declared in lto frontend.  Why do you need to
> > duplicate it here?
> >
> > It looks like adding relocations should be abstracted by lto API,
> > so you don't need to look inside this structure that is
> > lto/lto-object.cc only.
> >
> I should have taken this approach, but instead, I exposed simple objects to
> dwarf2out.
> That's the reason to duplicate the above struct. I will take care of this
> while refactoring
> and abstracting it by lto API

Yep, this should not be hard to do.

Thanks for all the work!
Honza
> 
> 
> >
> > > +/* Output one line number table into the .debug_line section.  */
> > > +
> > > +static void
> > > +output_one_line_info_table (dw_line_info_table *table)
> > It is hard to tell from the diff.  Did you just moved these functions
> > earlier in source file?
> >
> Yeah. I will refactor the dwarf2out soon to clear these confusions.
> 
> -- 
> Rishi
> 
> 
> >
> > Honza
> >


Re: [PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string

2023-10-23 Thread Jason Merrill

On 10/23/23 11:16, David Malcolm wrote:

On Wed, Jan 25, 2023 at 4:09 PM Ben Boeckel via Gcc  wrote:


This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

 * charset.cc: Add `_cpp_valid_utf8_str` which determines whether
 a C string is valid UTF-8 or not.
 * internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 


[going through patches in patchwork]

What's the status of this patch; did this ever get committed?


It was superseded.

Jason



Re: [PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string

2023-10-23 Thread David Malcolm
On Mon, 2023-10-23 at 11:24 -0400, Jason Merrill wrote:
> On 10/23/23 11:16, David Malcolm wrote:
> > On Wed, Jan 25, 2023 at 4:09 PM Ben Boeckel via Gcc
> >  wrote:
> > > 
> > > This simplifies the interface for other UTF-8 validity detections
> > > when a
> > > simple "yes" or "no" answer is sufficient.
> > > 
> > > libcpp/
> > > 
> > >  * charset.cc: Add `_cpp_valid_utf8_str` which determines
> > > whether
> > >  a C string is valid UTF-8 or not.
> > >  * internal.h: Add prototype for `_cpp_valid_utf8_str`.
> > > 
> > > Signed-off-by: Ben Boeckel 
> > 
> > [going through patches in patchwork]
> > 
> > What's the status of this patch; did this ever get committed?
> 
> It was superseded.

Thanks; closed out in patchwork.

Dave



Re: [PATCH] rust: build failure after NON_DEPENDENT_EXPR removal [PR111899]

2023-10-23 Thread Jason Merrill

On 10/23/23 05:10, Thomas Schwinge wrote:

Hi Patrick!

On 2023-10-20T13:36:30-0400, Patrick Palka  wrote:

Built on x86_64-pc-linux-gnu, pushed to trunk as obvious (hopefully).

-- >8 --

This patch removes stray NON_DEPENDENT_EXPR checks following the removal
of this tree code from the C++ FE.  (Since this restores the build I
supppose it means the Rust FE never creates NON_DEPENDENT_EXPR trees in
the first place, so no further analysis is needed.)


ACK, thanks!


For context: indeed, a non-trivial amount of C++ front end 'constexpr'
code was copied into the Rust front end, for implementing related Rust
functionality, mostly as part of the 2022 GSoC project
"Support for Constant Folding in Rust Frontend" (Faisal Abbas),
.

Yes, this should eventually be cleaned up (and merged with the original
C++ front end code, as much as feasible -- which I don't know whether or
to which extent it is).


It would be nice to move a lot of the constexpr code into the 
middle-end, but I expect that would be a significant project.


Jason



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Richard Biener



> Am 23.10.2023 um 16:56 schrieb Qing Zhao :
> 
> 
> 
>> On Oct 23, 2023, at 3:57 AM, Richard Biener  
>> wrote:
>> 
>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
 wrote:
 
 On 2023-10-20 14:38, Qing Zhao wrote:
> How about the following:
> Add one more parameter to __builtin_dynamic_object_size(), i.e
> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> When we see the structure field has counted_by attribute.
 
 Or maybe add a barrier preventing any assignments to array_annotated->foo 
 from being reordered below the __bdos call? Basically an __asm__ with 
 array_annotated->foo in the clobber list ought to do it I think.
>>> 
>>> Maybe just adding the array_annotated->foo to the use list of the call to 
>>> __builtin_dynamic_object_size should be enough?
>>> 
>>> But I am not sure how to implement this in the TREE level, is there a 
>>> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
>>> counted_by field “array_annotated->foo” to the USE_LIST of the call to 
>>> __bdos?
>>> 
>>> This might be the simplest solution?
>> 
>> If the dynamic object size is derived of a field then I think you need to
>> put the "load" of that memory location at the point (as argument)
>> of the __bos call right at parsing time.  I know that's awkward because
>> you try to play tricks "discovering" that field only late, but that's not
>> going to work.
> 
> Is it better to do this at the gimplification phase instead of in the FE?
> 
> VLA decls are handled in the gimplification phase; the size calculation and
> the call to alloca are all generated during this phase (gimplify_vla_decl).
> 
> For __bdos calls, we can add an additional argument if the object’s first
> argument’s type includes the counted_by attribute, i.e.:
> 
> *** During gimplification:
> For a call to __builtin_dynamic_object_size (ptr, type), check whether the
> type of ptr includes the counted_by attribute; if so, change the call to
> __builtin_dynamic_object_size (ptr, type, counted_by field).
> 
> Then the correct data dependence should be represented well in the IR.
> 
> ** During the object size phase:
> 
> The call to __builtin_dynamic_object_size will become an expression that
> includes the counted_by field, or -1/0 when we cannot decide the size; the
> correct data dependence will be kept even after the call to
> __builtin_dynamic_object_size is gone.

But the whole point of the BOS pass is to derive information that is not 
available at parsing time, and that’s the cases you are after.  The case where 
the connection to the field with the length is apparent during parsing is easy 
- you simply insert a load of the value before the BOS call.  For the late case 
there’s no way to invent data flow dependence without inadvertently pessimizing 
optimization.

Richard 

> 
>> 
>> A related issue is that assignment to the field and storage allocation
>> are not tied together
> 
> Yes, this is different from VLA, in which the size assignment and the
> storage allocation are generated and tied together by the compiler.
> 
> For the flexible array member, the storage allocation and the size
> assignment are all done by the user, so we need to clarify this requirement
> in the document to guide users to write correct code.  And we might also
> need to provide tools (warnings and a sanitizer option) to help users catch
> such coding errors.
> 
>> - if there's no use of the size data we might
>> remove the store of it as dead.
> 
> Yes, when __bdos cannot decide the size, we need to remove the dead store to 
> the field.
> I guess that the compiler should be able to do this automatically?
> 
> thanks.
> 
> Qing
>> 
>> Of course I guess __bos then behaves like sizeof ().
>> 
>> Richard.
>> 
>>> 
>>> Qing
>>> 
 
 It may not work for something like this though:
 
 static size_t
 get_size_of (void *ptr)
 {
 return __bdos (ptr, 1);
 }
 
 void
 foo (size_t sz)
 {
 array_annotated = __builtin_malloc (sz);
 array_annotated->foo = sz;
 
 ...
 __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
 ...
 }
 
 because the call to get_size_of () may not have been inlined that early.
 
 The more fool-proof alternative may be to put a compile time barrier right 
 below the assignment to array_annotated->foo; I reckon you could do that 
 early in the front end by marking the size identifier and then tracking 
 assignments to that identifier.  That may have a slight runtime 
 performance overhead since it may prevent even legitimate reordering.  I 
 can't think of another alternative at the moment...
 
 Sid
> 


[PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-23 Thread Robin Dapp
The attached patch introduces a VCOND_MASK_LEN, which helps with the riscv
cases that were broken before and leaves x86, aarch64 and power unchanged
in bootstrap and testsuites.

I only went with the minimal number of new match.pd patterns and did not
try stripping the length of a COND_LEN_OP in order to simplify the
associated COND_OP.

An important part that I'm not sure how to handle properly is -
when we have a constant immediate length of e.g. 16 and the hardware
also operates on 16 units, vector length masking is actually
redundant and the vcond_mask_len can be reduced to a vec_cond.
For those (if_then_else unsplit) we have a large number of combine
patterns that fuse instruction which do not correspond to ifns
(like widening operations but also more complex ones).

Currently I achieve this in a most likely wrong way:

  auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
  bool full_len = len && known_eq (sz.coeffs[0], ilen);
  if (!len || full_len)
 "vec_cond"
  else
 "vcond_mask_len"

Another thing not done in this patch:  For vcond_mask we only expect
register operands as mask and force it to a register.  For a vcond_mask_len
that results from a simplification with all-one or all-zero mask we
could allow constant immediate vectors and expand them to simple
len moves in the backend.

Regards
 Robin
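
(For readers new to the ifn, my reading of the intended elementwise
semantics, modeled in scalar C; bias is 0 or -1 as elsewhere in the
COND_LEN machinery:)

  void
  vcond_mask_len_model (int *out, const _Bool *mask, const int *then_op,
                        const int *else_op, int n, int len, int bias)
  {
    for (int i = 0; i < n; i++)
      out[i] = (mask[i] && i < len + bias) ? then_op[i] : else_op[i];
  }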

From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result int a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the masks is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternary operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type):
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md | 20 +
 gcc/config/riscv/riscv-protos.h |  4 ++
 gcc/doc/md.texi |  9 
 gcc/gimple-match-exports.cc | 20 +++--
 gcc/gimple-match.h  | 78 -
 gcc/internal-fn.cc  | 41 +
 gcc/internal-fn.def |  2 +
 gcc/match.pd| 74 +++
 gcc/optabs.def  |  1 +
 9 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..27a71bc1ef9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,26 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode><vm>"
+  [(match_operand:V_VLS 0 "register_operand")
+(match_operand:<VM> 3 "register_operand")
+(match_operand:V_VLS 1 "nonmemory_operand")
+(match_operand:V_VLS 2 "register_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
+operands[3]};
+riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+ riscv_vector::MERGE_OP_REAL_ELSE, ops,
+ operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 11:57 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 23.10.2023 um 16:56 schrieb Qing Zhao :
>> 
>> 
>> 
>>> On Oct 23, 2023, at 3:57 AM, Richard Biener  
>>> wrote:
>>> 
 On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
 
 
 
> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
> wrote:
> 
> On 2023-10-20 14:38, Qing Zhao wrote:
>> How about the following:
>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>> When we see the structure field has counted_by attribute.
> 
> Or maybe add a barrier preventing any assignments to array_annotated->foo 
> from being reordered below the __bdos call? Basically an __asm__ with 
> array_annotated->foo in the clobber list ought to do it I think.
 
 Maybe just adding the array_annotated->foo to the use list of the call to 
 __builtin_dynamic_object_size should be enough?
 
 But I am not sure how to implement this in the TREE level, is there a 
 USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
 counted_by field “array_annotated->foo” to the USE_LIST of the call to 
 __bdos?
 
 This might be the simplest solution?
>>> 
>>> If the dynamic object size is derived of a field then I think you need to
>>> put the "load" of that memory location at the point (as argument)
>>> of the __bos call right at parsing time.  I know that's awkward because
>>> you try to play tricks "discovering" that field only late, but that's not
>>> going to work.
>> 
>> Is it better to do this at the gimplification phase instead of in the FE?
>> 
>> VLA decls are handled in the gimplification phase; the size calculation
>> and the call to alloca are all generated during this phase
>> (gimplify_vla_decl).
>> 
>> For __bdos calls, we can add an additional argument if the object’s first
>> argument’s type includes the counted_by attribute, i.e.:
>> 
>> *** During gimplification:
>> For a call to __builtin_dynamic_object_size (ptr, type), check whether
>> the type of ptr includes the counted_by attribute; if so, change the call
>> to __builtin_dynamic_object_size (ptr, type, counted_by field).
>> 
>> Then the correct data dependence should be represented well in the IR.
>> 
>> ** During the object size phase:
>> 
>> The call to __builtin_dynamic_object_size will become an expression that
>> includes the counted_by field, or -1/0 when we cannot decide the size;
>> the correct data dependence will be kept even after the call to
>> __builtin_dynamic_object_size is gone.
> 
> But the whole point of the BOS pass is to derive information that is not 
> available at parsing time, and that’s the cases you are after.  The case 
> where the connection to the field with the length is apparent during parsing 
> is easy - you simply insert a load of the value before the BOS call.

Yes, this is true.
I prefer to implement this in the gimplification phase since I am more
familiar with the code there.  (I think that implementing it in
gimplification should be very similar to implementing it in the FE? Or am I
missing anything here?)

Joseph, if we implement this in the FE, where in the FE should I look?

Thanks a lot for the help.

Qing

>  For the late case there’s no way to invent data flow dependence without 
> inadvertently pessimizing optimization.
> 
> Richard 
> 
>> 
>>> 
>>> A related issue is that assignment to the field and storage allocation
>>> are not tied together
>> 
>> Yes, this is different from VLA, in which, the size assignment and the 
>> storage allocation are generated and tied together by the compiler.
>> 
>> For the flexible array member, the storage allocation and the size 
>> assignment are all done by the user. So, We need to clarify such requirement 
>>  in the document to guide user to write correct code.  And also, we might 
>> need to provide tools (warnings and sanitizer option) to help users to catch 
>> such coding error.
>> 
>>> - if there's no use of the size data we might
>>> remove the store of it as dead.
>> 
>> Yes, when __bdos cannot decide the size, we need to remove the dead store to 
>> the field.
>> I guess that the compiler should be able to do this automatically?
>> 
>> thanks.
>> 
>> Qing
>>> 
>>> Of course I guess __bos then behaves like sizeof ().
>>> 
>>> Richard.
>>> 
 
 Qing
 
> 
> It may not work for something like this though:
> 
> static size_t
> get_size_of (void *ptr)
> {
> return __bdos (ptr, 1);
> }
> 
> void
> foo (size_t sz)
> {
> array_annotated = __builtin_malloc (sz);
> array_annotated->foo = sz;
> 
> ...
> __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
> ...
> }
> 
> because the call to get_size_of () may not have been inlined that early.
> 
> The more fool-proof alternative may be to put a compile time barrier 
> right below the assignment to array_annotated->fo

Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-23 Thread Michael Eager

On 10/22/23 22:48, Neal Frager wrote:

There is a microblaze cpu version 10.0 included in versal. If the
minor version is only a single digit, then the version comparison
will fail as version 10.0 will appear as 100 compared to version
6.00 or 8.30 which will calculate to values 600 and 830.

The issue can be seen when using the '-mcpu=10.0' option.

With this fix, versions with a single digit minor number such as
10.0 will be calculated as greater than versions with a smaller
major version number, but with two minor version digits.

By applying this fix, several incorrect warning messages will no
longer be printed when building the versal plm application, such
as the warning message below:

warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or greater

Signed-off-by: Neal Frager 
---
  gcc/config/microblaze/microblaze.cc | 164 +---
  1 file changed, 76 insertions(+), 88 deletions(-)


Please add a test case.

--
Michael Eager


Re: [PATCH] libgcc: make heap-based trampolines conditional on libc presence

2023-10-23 Thread Sergei Trofimovich
On Mon, 23 Oct 2023 13:54:01 +0100
Iain Sandoe  wrote:

> hi Sergei,
> 
> > On 23 Oct 2023, at 13:43, Sergei Trofimovich  wrote:
> > 
> > From: Sergei Trofimovich 
> > 
> > To build `libc` for a target one needs to build `gcc` without `libc`
> > support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
> > heap-based trampolines" added unconditional `libc` dependency and broke
> > libc-less `gcc` builds.
> > 
> > An example failure on `x86_64-unknown-linux-gnu`:
> > 
> >$ mkdir -p /tmp/empty
> >$ ../gcc/configure \
> >--disable-multilib \
> >--without-headers \
> >--with-newlib \
> >--enable-languages=c \
> >--disable-bootstrap \
> >--disable-gcov \
> >--disable-threads \
> >--disable-shared \
> >--disable-libssp \
> >--disable-libquadmath \
> >--disable-libgomp \
> >--disable-libatomic \
> >--with-build-sysroot=/tmp/empty
> >$ make
> >...
> >/tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ 
> > -B/usr/local/x86_64-pc-linux-gnu/bin/ -B/usr/local/x86_64-pc-linux-gnu/lib/ 
> > -isystem /usr/local/x86_64-pc-linux-gnu/include -isystem 
> > /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 
> > -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
> > -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem 
> > ./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk 
> > -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic 
> > -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. 
> > -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc 
> > -I/home/slyfox/dev/git/gcc/libgcc/. 
> > -I/home/slyfox/dev/git/gcc/libgcc/../gcc 
> > -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o 
> > heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c 
> > .../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden 
> > -DHIDE_EXPORTS
> >../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: 
> > No such file or directory
> >3 | #include <unistd.h>
> >  |  ^~~~~~~~~~
> >compilation terminated.
> >make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] 
> > Error 1
> >make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
> >make[1]: *** [Makefile:13307: all-target-libgcc] Error 2
> > 
> > The change inhibits any heap-based trampoline code.  
> 
> That looks reasonable to me (I was considering using __has_include(), but the 
> inhibit_libc is neater).
> 
The fact that this first compiler is built without heap-trampoline support
would become relevant, I guess, if libc wanted to use them; it would then
need another iteration.
> 
> so, it looks fine, but I cannot actually approve it.

Sounds good. Let's wait for others to chime in. Maybe Richard? :)

AFAIU libcs (like `glibc`) try hard not to use link tests and rely mainly
on the preprocessor and code generation to specifically accommodate this
case. Maybe there is a way to pass the support flag to libc without
relying on code presence in libgcc.

Otherwise we could use __builtin_trap() as an implementation for exposed
symbols.
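
Something along these lines, as a rough sketch of that fallback (the
signature of the _created function is my assumption; only _deleted is
visible in the patch context above):

  #else  /* inhibit_libc */

  /* Keep the symbols exported, but trap if a nested function pointer
     is ever materialized without libc support.  */
  void
  __builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
  {
    (void) chain; (void) func; (void) dst;
    __builtin_trap ();
  }

  void
  __builtin_nested_func_ptr_deleted (void)
  {
    __builtin_trap ();
  }

  #endif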

> 
> > 
> > libgcc/
> > 
> > * libgcc/config/aarch64/heap-trampoline.c: Disable when libc is
> >   not present.
> > ---
> > libgcc/config/aarch64/heap-trampoline.c | 5 +
> > libgcc/config/i386/heap-trampoline.c| 5 +
> > 2 files changed, 10 insertions(+)
> > 
> > diff --git a/libgcc/config/aarch64/heap-trampoline.c 
> > b/libgcc/config/aarch64/heap-trampoline.c
> > index c8b83681ed7..f22233987ca 100644
> > --- a/libgcc/config/aarch64/heap-trampoline.c
> > +++ b/libgcc/config/aarch64/heap-trampoline.c
> > @@ -1,5 +1,8 @@
> > /* Copyright The GNU Toolchain Authors. */
> > 
> > +/* libc is required to allocate trampolines.  */
> > +#ifndef inhibit_libc
> > +
> > #include 
> > #include 
> > #include 
> > @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
> >   tramp_ctrl_curr = prev;
> > }
> > }
> > +
> > +#endif /* !inhibit_libc */
> > diff --git a/libgcc/config/i386/heap-trampoline.c 
> > b/libgcc/config/i386/heap-trampoline.c
> > index 96e13bf828e..4b9f4365868 100644
> > --- a/libgcc/config/i386/heap-trampoline.c
> > +++ b/libgcc/config/i386/heap-trampoline.c
> > @@ -1,5 +1,8 @@
> > /* Copyright The GNU Toolchain Authors. */
> > 
> > +/* libc is required to allocate trampolines.  */
> > +#ifndef inhibit_libc
> > +
> > #include 
> > #include 
> > #include 
> > @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
> >   tramp_ctrl_curr = prev;
> > }
> > }
> > +
> > +#endif /* !inhibit_libc */
> > -- 
> > 2.42.0
> >   
> 


-- 

  Sergei


Re: [PATCH v23 31/33] libstdc++: Optimize std::is_pointer compilation performance

2023-10-23 Thread Patrick Palka
On Sun, 22 Oct 2023, Ken Matsui wrote:

> Hi Patrick,
> 
> There is an issue with the code in
> libstdc++-v3/include/bits/cpp_type_traits.h. Specifically, Clang 16
> does not accept the code, while Clang 17 does. Given that we aim to
> support the last two versions of Clang, we need to ensure that Clang
> 16 accepts this code. Can you please advise on the best course of
> action regarding this matter?

The following workaround seems to make Clang happy:

#include <type_traits>

template<typename _Tp>
struct __is_pointer : std::bool_constant<__is_pointer(_Tp)> { };

> 
> https://godbolt.org/z/PbxhYcb7q
> 
> Sincerely,
> Ken Matsui
> 
> On Fri, Oct 20, 2023 at 7:12 AM Ken Matsui  wrote:
> >
> > This patch optimizes the compilation performance of std::is_pointer
> > by dispatching to the new __is_pointer built-in trait.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/cpp_type_traits.h (__is_pointer): Use __is_pointer
> > built-in trait.
> > * include/std/type_traits (is_pointer): Likewise. Optimize its
> > implementation.
> > (is_pointer_v): Likewise.
> >
> > Co-authored-by: Jonathan Wakely 
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/bits/cpp_type_traits.h |  8 
> >  libstdc++-v3/include/std/type_traits| 44 +
> >  2 files changed, 44 insertions(+), 8 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
> > b/libstdc++-v3/include/bits/cpp_type_traits.h
> > index 4312f32a4e0..246f2cc0b17 100644
> > --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> > +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> > @@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >//
> >// Pointer types
> >//
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > +  template<typename _Tp>
> > +struct __is_pointer : __truth_type<__is_pointer(_Tp)>
> > +{
> > +  enum { __value = __is_pointer(_Tp) };
> > +};
> > +#else
> >   template<typename _Tp>
> >  struct __is_pointer
> >  {
> > @@ -376,6 +383,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >enum { __value = 1 };
> >typedef __true_type __type;
> >  };
> > +#endif
> >
> >//
> >// An arithmetic type is an integer type or a floating point type
> > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> > index 0641ecfdf2b..75a94cb8d7e 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  : public true_type { };
> >  #endif
> >
> > -  template<typename>
> > -struct __is_pointer_helper
> > +  /// is_pointer
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > +  template<typename _Tp>
> > +struct is_pointer
> > +: public __bool_constant<__is_pointer(_Tp)>
> > +{ };
> > +#else
> > +  template<typename _Tp>
> > +struct is_pointer
> >  : public false_type { };
> >
> >    template<typename _Tp>
> > -struct __is_pointer_helper<_Tp*>
> > +struct is_pointer<_Tp*>
> >  : public true_type { };
> >
> > -  /// is_pointer
> >    template<typename _Tp>
> > -struct is_pointer
> > -: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
> > -{ };
> > +struct is_pointer<_Tp* const>
> > +: public true_type { };
> > +
> > +  template<typename _Tp>
> > +struct is_pointer<_Tp* volatile>
> > +: public true_type { };
> > +
> > +  template<typename _Tp>
> > +struct is_pointer<_Tp* const volatile>
> > +: public true_type { };
> > +#endif
> >
> >/// is_lvalue_reference
> >template
> > @@ -3252,8 +3266,22 @@ template 
> >inline constexpr bool is_array_v<_Tp[_Num]> = true;
> >  #endif
> >
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > +#else
> >  template <typename _Tp>
> > -  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
> > +  inline constexpr bool is_pointer_v = false;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp*> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* const> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > +#endif
> > +
> >  template <typename _Tp>
> >inline constexpr bool is_lvalue_reference_v = false;
> >  template <typename _Tp>
> > --
> > 2.42.0
> >
> 
> 

Re: [PATCH v24 33/33] libstdc++: Optimize std::is_invocable compilation performance

2023-10-23 Thread Patrick Palka
On Fri, Oct 20, 2023 at 12:22 PM Ken Matsui  wrote:
>
> This patch optimizes the compilation performance of std::is_invocable
> by dispatching to the new __is_invocable built-in trait.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_invocable): Use __is_invocable
> built-in trait.

Nice!  We should use the trait directly in is_invocable_v too.
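
Something along these lines, as a sketch (setting aside the completeness
static_asserts that come up later in the thread):

template<typename _Fn, typename... _ArgTypes>
  inline constexpr bool is_invocable_v = __is_invocable(_Fn, _ArgTypes...);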

> * testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
> the new error from __is_invocable.
> * testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.
>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits| 6 ++
>  .../testsuite/20_util/is_invocable/incomplete_args_neg.cc   | 1 +
>  .../testsuite/20_util/is_invocable/incomplete_neg.cc| 1 +
>  3 files changed, 8 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 75a94cb8d7e..91851b78c7e 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -3167,9 +3167,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  using invoke_result_t = typename invoke_result<_Fn, _Args...>::type;
>
>/// std::is_invocable
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
> +  template<typename _Fn, typename... _ArgTypes>
> +struct is_invocable
> +: public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
> +#else
>    template<typename _Fn, typename... _ArgTypes>
>  struct is_invocable
>  : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, void>::type
> +#endif
>  {
>static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
> "_Fn must be a complete class or an unbounded array");
> diff --git 
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> index 34d1d9431d1..3f9e5274f3c 100644
> --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> @@ -18,6 +18,7 @@
>  // .
>
>  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> +// { dg-prune-output "invalid use of incomplete type" }
>
>  #include 
>
> diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> index e1e54d25ee5..92af48c48b6 100644
> --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> @@ -18,6 +18,7 @@
>  // .
>
>  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> +// { dg-prune-output "invalid use of incomplete type" }
>
>  #include 
>
> --
> 2.42.0
>



Re: [PATCH v23 31/33] libstdc++: Optimize std::is_pointer compilation performance

2023-10-23 Thread Ken Matsui
On Mon, Oct 23, 2023 at 10:00 AM Patrick Palka  wrote:

> On Sun, 22 Oct 2023, Ken Matsui wrote:
>
> > Hi Patrick,
> >
> > There is an issue with the code in
> > libstdc++-v3/include/bits/cpp_type_traits.h. Specifically, Clang 16
> > does not accept the code, while Clang 17 does. Given that we aim to
> > support the last two versions of Clang, we need to ensure that Clang
> > 16 accepts this code. Can you please advise on the best course of
> > action regarding this matter?
>
> The following workaround seems to make Clang happy:
>
> #include <type_traits>
>
> template<typename _Tp>
> struct __is_pointer : std::bool_constant<bool(__is_pointer(_Tp))> { };
>

Ooh, this makes sense. Thank you!


> >
> > https://godbolt.org/z/PbxhYcb7q
> >
> > Sincerely,
> > Ken Matsui
> >
> > On Fri, Oct 20, 2023 at 7:12 AM Ken Matsui  wrote:
> > >
> > > This patch optimizes the compilation performance of std::is_pointer
> > > by dispatching to the new __is_pointer built-in trait.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/cpp_type_traits.h (__is_pointer): Use
> __is_pointer
> > > built-in trait.
> > > * include/std/type_traits (is_pointer): Likewise. Optimize its
> > > implementation.
> > > (is_pointer_v): Likewise.
> > >
> > > Co-authored-by: Jonathan Wakely 
> > > Signed-off-by: Ken Matsui 
> > > ---
> > >  libstdc++-v3/include/bits/cpp_type_traits.h |  8 
> > >  libstdc++-v3/include/std/type_traits| 44 +
> > >  2 files changed, 44 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h
> b/libstdc++-v3/include/bits/cpp_type_traits.h
> > > index 4312f32a4e0..246f2cc0b17 100644
> > > --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> > > +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> > > @@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> > >//
> > >// Pointer types
> > >//
> > > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > > +  template
> > > +struct __is_pointer : __truth_type<__is_pointer(_Tp)>
> > > +{
> > > +  enum { __value = __is_pointer(_Tp) };
> > > +};
> > > +#else
> > >template
> > >  struct __is_pointer
> > >  {
> > > @@ -376,6 +383,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> > >enum { __value = 1 };
> > >typedef __true_type __type;
> > >  };
> > > +#endif
> > >
> > >//
> > >// An arithmetic type is an integer type or a floating point type
> > > diff --git a/libstdc++-v3/include/std/type_traits
> b/libstdc++-v3/include/std/type_traits
> > > index 0641ecfdf2b..75a94cb8d7e 100644
> > > --- a/libstdc++-v3/include/std/type_traits
> > > +++ b/libstdc++-v3/include/std/type_traits
> > > @@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >  : public true_type { };
> > >  #endif
> > >
> > > -  template
> > > -struct __is_pointer_helper
> > > +  /// is_pointer
> > > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > > +  template
> > > +struct is_pointer
> > > +: public __bool_constant<__is_pointer(_Tp)>
> > > +{ };
> > > +#else
> > > +  template
> > > +struct is_pointer
> > >  : public false_type { };
> > >
> > >template
> > > -struct __is_pointer_helper<_Tp*>
> > > +struct is_pointer<_Tp*>
> > >  : public true_type { };
> > >
> > > -  /// is_pointer
> > >template
> > > -struct is_pointer
> > > -: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
> > > -{ };
> > > +struct is_pointer<_Tp* const>
> > > +: public true_type { };
> > > +
> > > +  template
> > > +struct is_pointer<_Tp* volatile>
> > > +: public true_type { };
> > > +
> > > +  template
> > > +struct is_pointer<_Tp* const volatile>
> > > +: public true_type { };
> > > +#endif
> > >
> > >/// is_lvalue_reference
> > >template
> > > @@ -3252,8 +3266,22 @@ template 
> > >inline constexpr bool is_array_v<_Tp[_Num]> = true;
> > >  #endif
> > >
> > > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > > +template 
> > > +  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > > +#else
> > >  template 
> > > -  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
> > > +  inline constexpr bool is_pointer_v = false;
> > > +template 
> > > +  inline constexpr bool is_pointer_v<_Tp*> = true;
> > > +template 
> > > +  inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > +template 
> > > +  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > +template 
> > > +  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > +#endif
> > > +
> > >  template 
> > >inline constexpr bool is_lvalue_reference_v = false;
> > >  template 
> > > --
> > > 2.42.0
> > >
> >
> >


Re: [PATCH v24 33/33] libstdc++: Optimize std::is_invocable compilation performance

2023-10-23 Thread Ken Matsui
On Mon, Oct 23, 2023 at 10:05 AM Patrick Palka  wrote:

> On Fri, Oct 20, 2023 at 12:22 PM Ken Matsui  wrote:
> >
> > This patch optimizes the compilation performance of std::is_invocable
> > by dispatching to the new __is_invocable built-in trait.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/type_traits (is_invocable): Use __is_invocable
> > built-in trait.
>
> Nice!  We should use the trait directly in is_invocable_v too.
>

Thank you! But we want to take account of the static_asserts in is_invocable,
so I think we cannot use the built-in directly?


> > * testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
> > the new error from __is_invocable.
> > * testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/std/type_traits| 6 ++
> >  .../testsuite/20_util/is_invocable/incomplete_args_neg.cc   | 1 +
> >  .../testsuite/20_util/is_invocable/incomplete_neg.cc| 1 +
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/libstdc++-v3/include/std/type_traits
> b/libstdc++-v3/include/std/type_traits
> > index 75a94cb8d7e..91851b78c7e 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -3167,9 +3167,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  using invoke_result_t = typename invoke_result<_Fn, _Args...>::type;
> >
> >/// std::is_invocable
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
> > +  template
> > +struct is_invocable
> > +: public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
> > +#else
> >template
> >  struct is_invocable
> >  : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>,
> void>::type
> > +#endif
> >  {
> >
> static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
> > "_Fn must be a complete class or an unbounded array");
> > diff --git
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> > index 34d1d9431d1..3f9e5274f3c 100644
> > --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> > +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> > @@ -18,6 +18,7 @@
> >  // .
> >
> >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> > +// { dg-prune-output "invalid use of incomplete type" }
> >
> >  #include 
> >
> > diff --git
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> > index e1e54d25ee5..92af48c48b6 100644
> > --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> > +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> > @@ -18,6 +18,7 @@
> >  // .
> >
> >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> > +// { dg-prune-output "invalid use of incomplete type" }
> >
> >  #include 
> >
> > --
> > 2.42.0
> >
>
>


Re: [PATCH v24 33/33] libstdc++: Optimize std::is_invocable compilation performance

2023-10-23 Thread Patrick Palka
On Mon, 23 Oct 2023, Ken Matsui wrote:

> On Mon, Oct 23, 2023 at 10:05 AM Patrick Palka  wrote:
>   On Fri, Oct 20, 2023 at 12:22 PM Ken Matsui  wrote:
>   >
>   > This patch optimizes the compilation performance of std::is_invocable
>   > by dispatching to the new __is_invocable built-in trait.
>   >
>   > libstdc++-v3/ChangeLog:
>   >
>   >         * include/std/type_traits (is_invocable): Use __is_invocable
>   >         built-in trait.
> 
>   Nice!  We should use the trait directly in is_invocable_v too.
> 
> 
> Thank you! But we want to take account of static_assert’s in is_invocable, so 
> I think we cannot use the built-in directly?

Good point, I guess that's a great reason to improve the diagnostic
that check_trait_type emits: it'd speed up the class template version
because we could get rid of the static_asserts (without regressing
diagnostic quality), and it'd speed up the variable template version
because we could use the built-in directly there.

Your patch LGTM as is though, that could be a follow-up if anything.

> 
> 
>   >         * testsuite/20_util/is_invocable/incomplete_args_neg.cc: 
> Handle
>   >         the new error from __is_invocable.
>   >         * testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.
>   >
>   > Signed-off-by: Ken Matsui 
>   > ---
>   >  libstdc++-v3/include/std/type_traits                        | 6 
> ++
>   >  .../testsuite/20_util/is_invocable/incomplete_args_neg.cc   | 1 +
>   >  .../testsuite/20_util/is_invocable/incomplete_neg.cc        | 1 +
>   >  3 files changed, 8 insertions(+)
>   >
>   > diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
>   > index 75a94cb8d7e..91851b78c7e 100644
>   > --- a/libstdc++-v3/include/std/type_traits
>   > +++ b/libstdc++-v3/include/std/type_traits
>   > @@ -3167,9 +3167,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   >      using invoke_result_t = typename invoke_result<_Fn, 
> _Args...>::type;
>   >
>   >    /// std::is_invocable
>   > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
>   > +  template
>   > +    struct is_invocable
>   > +    : public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
>   > +#else
>   >    template
>   >      struct is_invocable
>   >      : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, 
> void>::type
>   > +#endif
>   >      {
>   >        
> static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
>   >         "_Fn must be a complete class or an unbounded array");
>   > diff --git 
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
>   > index 34d1d9431d1..3f9e5274f3c 100644
>   > --- 
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
>   > +++ 
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
>   > @@ -18,6 +18,7 @@
>   >  // .
>   >
>   >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
>   > +// { dg-prune-output "invalid use of incomplete type" }
>   >
>   >  #include 
>   >
>   > diff --git 
> a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc 
> b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
>   > index e1e54d25ee5..92af48c48b6 100644
>   > --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
>   > +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
>   > @@ -18,6 +18,7 @@
>   >  // .
>   >
>   >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
>   > +// { dg-prune-output "invalid use of incomplete type" }
>   >
>   >  #include 
>   >
>   > --
>   > 2.42.0
>   >
> 
> 
> 

Re: [PATCH v24 33/33] libstdc++: Optimize std::is_invocable compilation performance

2023-10-23 Thread Ken Matsui
On Mon, Oct 23, 2023 at 10:39 AM Patrick Palka  wrote:
>
> On Mon, 23 Oct 2023, Ken Matsui wrote:
>
> > On Mon, Oct 23, 2023 at 10:05 AM Patrick Palka  wrote:
> >   On Fri, Oct 20, 2023 at 12:22 PM Ken Matsui  
> > wrote:
> >   >
> >   > This patch optimizes the compilation performance of 
> > std::is_invocable
> >   > by dispatching to the new __is_invocable built-in trait.
> >   >
> >   > libstdc++-v3/ChangeLog:
> >   >
> >   > * include/std/type_traits (is_invocable): Use __is_invocable
> >   > built-in trait.
> >
> >   Nice!  We should use the trait directly in is_invocable_v too.
> >
> >
> > Thank you! But we want to take account of static_assert’s in is_invocable, 
> > so I think we cannot use the built-in directly?
>
> Good point, I guess that's a great reason to improvement the diagnostic
> that check_trait_type emits: it'd speed up the class template version
> because we could get rid of the static_asserts (without regressing
> diagnostic quality), and it'd speed up the variable template version
> because we could use the built-in directly there.
>
> Your patch LGTM as is though, that could be a follow-up if anything.
>

Thank you! I will also work on this after the other built-in traits are done!

> >
> >
> >   > * testsuite/20_util/is_invocable/incomplete_args_neg.cc: 
> > Handle
> >   > the new error from __is_invocable.
> >   > * testsuite/20_util/is_invocable/incomplete_neg.cc: 
> > Likewise.
> >   >
> >   > Signed-off-by: Ken Matsui 
> >   > ---
> >   >  libstdc++-v3/include/std/type_traits| 6 
> > ++
> >   >  .../testsuite/20_util/is_invocable/incomplete_args_neg.cc   | 1 +
> >   >  .../testsuite/20_util/is_invocable/incomplete_neg.cc| 1 +
> >   >  3 files changed, 8 insertions(+)
> >   >
> >   > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> >   > index 75a94cb8d7e..91851b78c7e 100644
> >   > --- a/libstdc++-v3/include/std/type_traits
> >   > +++ b/libstdc++-v3/include/std/type_traits
> >   > @@ -3167,9 +3167,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   >  using invoke_result_t = typename invoke_result<_Fn, 
> > _Args...>::type;
> >   >
> >   >/// std::is_invocable
> >   > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
> >   > +  template
> >   > +struct is_invocable
> >   > +: public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
> >   > +#else
> >   >template
> >   >  struct is_invocable
> >   >  : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, 
> > void>::type
> >   > +#endif
> >   >  {
> >   >
> > static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
> >   > "_Fn must be a complete class or an unbounded array");
> >   > diff --git 
> > a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc 
> > b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> >   > index 34d1d9431d1..3f9e5274f3c 100644
> >   > --- 
> > a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> >   > +++ 
> > b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
> >   > @@ -18,6 +18,7 @@
> >   >  // .
> >   >
> >   >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> >   > +// { dg-prune-output "invalid use of incomplete type" }
> >   >
> >   >  #include 
> >   >
> >   > diff --git 
> > a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc 
> > b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> >   > index e1e54d25ee5..92af48c48b6 100644
> >   > --- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> >   > +++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
> >   > @@ -18,6 +18,7 @@
> >   >  // .
> >   >
> >   >  // { dg-error "must be a complete class" "" { target *-*-* } 0 }
> >   > +// { dg-prune-output "invalid use of incomplete type" }
> >   >
> >   >  #include 
> >   >
> >   > --
> >   > 2.42.0
> >   >
> >
> >
> >


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Martin Uecker
On Mon, 23 Oct 2023 at 16:37, Qing Zhao wrote:
> 
> > On Oct 23, 2023, at 11:57 AM, Richard Biener  
> > wrote:
> > 
> > 
> > 
> > > Am 23.10.2023 um 16:56 schrieb Qing Zhao :
> > > 
> > > 
> > > 
> > > > On Oct 23, 2023, at 3:57 AM, Richard Biener 
> > > >  wrote:
> > > > 
> > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  
> > > > > wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar 
> > > > > >  wrote:
> > > > > > 
> > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > How about the following:
> > > > > > > Add one more parameter to __builtin_dynamic_object_size(), i.e
> > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > When we see the structure field has counted_by attribute.
> > > > > > 
> > > > > > Or maybe add a barrier preventing any assignments to 
> > > > > > array_annotated->foo from being reordered below the __bdos call? 
> > > > > > Basically an __asm__ with array_annotated->foo in the clobber list 
> > > > > > ought to do it I think.
> > > > > 
> > > > > Maybe just adding the array_annotated->foo to the use list of the 
> > > > > call to __builtin_dynamic_object_size should be enough?
> > > > > 
> > > > > But I am not sure how to implement this in the TREE level, is there a 
> > > > > USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
> > > > > counted_by field “array_annotated->foo” to the USE_LIST of the call 
> > > > > to __bdos?
> > > > > 
> > > > > This might be the simplest solution?
> > > > 
> > > > If the dynamic object size is derived of a field then I think you need 
> > > > to
> > > > put the "load" of that memory location at the point (as argument)
> > > > of the __bos call right at parsing time.  I know that's awkward because
> > > > you try to play tricks "discovering" that field only late, but that's 
> > > > not
> > > > going to work.
> > > 
> > > Is it better to do this at gimplification phase instead of FE? 
> > > 
> > > VLA decls are handled in gimplification phase, the size calculation and 
> > > call to alloca are all generated during this phase. (gimplify_vla_decl).
> > > 
> > > For __bdos calls, we can add an additional argument if the object’s first 
> > > argument’s type include the counted_by attribute, i.e
> > > 
> > > ***During gimplification, 
> > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > Check whether the type of ptr includes counted_by attribute, if so, 
> > > change the call to
> > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > 
> > > Then the correct data dependence should be represented well in the IR.
> > > 
> > > **During object size phase,
> > > 
> > > The call to __builtin_dynamic_object_size will become an expression 
> > > includes the counted_by field or -1/0 when we cannot decide the size, the 
> > > correct data dependence will be kept even the call to 
> > > __builtin_dynamic_object_size is gone. 
> > 
> > But the whole point of the BOS pass is to derive information that is not 
> > available at parsing time, and that’s the cases you are after.  The case 
> > where the connection to the field with the length is apparent during 
> > parsing is easy - you simply insert a load of the value before the BOS call.
> 
> Yes, this is true. 
> I prefer to implement this in gimplification phase since I am more familiar 
> with the code there.. (I think that implementing it in gimplification should 
> be very similar as implementing it in FE? Or do I miss anything here?)
> 
> Joseph, if implement this in FE, where in the FE I should look at? 
> 

We should aim for a good integration with the BDOS pass, so
that it can propagate the information further, e.g. the 
following should work:

struct { int L; char buf[] __counted_by(L) } x;
x.L = N;
x.buf = ...;
char *p = &x->f;
__bdos(p) -> N

So we need to be smart on how we provide the size
information for x->f to the backend. 

This would also be desirable for the language extension. 

Martin


> Thanks a lot for the help.
> 
> Qing
> 
> >  For the late case there’s no way to invent data flow dependence without 
> > inadvertently pessimizing optimization.
> > 
> > Richard 
> > 
> > > 
> > > > 
> > > > A related issue is that assignment to the field and storage allocation
> > > > are not tied together
> > > 
> > > Yes, this is different from VLA, in which, the size assignment and the 
> > > storage allocation are generated and tied together by the compiler.
> > > 
> > > For the flexible array member, the storage allocation and the size 
> > > assignment are all done by the user. So, We need to clarify such 
> > > requirement  in the document to guide user to write correct code.  And 
> > > also, we might need to provide tools (warnings and sanitizer option) to 
> > > help users to catch such coding error.
> > > 
> > > > - if there's no use of the size data we might
> > > > remove the store of it as dead.
> > > 
> > > Yes, when __bdos cannot decide 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Joseph Myers
On Mon, 23 Oct 2023, Qing Zhao wrote:

> I prefer to implement this in gimplification phase since I am more 
> familiar with the code there.. (I think that implementing it in 
> gimplification should be very similar as implementing it in FE? Or do I 
> miss anything here?)
> 
> Joseph, if implement this in FE, where in the FE I should look at? 

I tend to think that gimplification time is appropriate for adding this 
dependency, but if you wish to rewrite a built-in function call in the 
front end before then, it could be done in build_function_call_vec.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 20, 2023, at 3:54 PM, Martin Uecker  wrote:
> 
> On Fri, 20 Oct 2023 at 18:48, Qing Zhao wrote:
>> 
>>> On Oct 20, 2023, at 2:34 PM, Kees Cook  wrote:
>>> 
>>> On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
 On Thu, 19 Oct 2023 at 16:33 -0700, Kees Cook wrote:
> On Wed, Oct 18, 2023 at 09:11:43PM +, Qing Zhao wrote:
>> As I replied to Martin in another email, I plan to do the following to 
>> resolve this issue:
>> 
>> 1. No specification for signed or unsigned for counted_by field.
>> 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases 
>> when the size of the counted-by is not positive.
> 
> I don't understand why this needs to be a runtime sanitizer. The
> signedness is known at compile time, so I would expect a -W option.
 
 The signedness of the type but not of the value.
 
 But I would not want to have a warning for signed 
 counter  types by default because I would prefer
 to use signed types (for various reasons including
 better overflow detection).
 
> Or
> do you mean you'd split up -fsanitize=bounds between unsigned and signed
> indexes? I'd find that kind of awkward for the kernel... but I feel like
> I've misunderstood something. :)
> 
> -Kees
 
 The idea would be to detect at run-time the case
 if  x->buf  is used at a time where   x->counter 
 is negative and also when x->counter * sizeof(x->buf[0])
 overflows or is too big.
 
 This would be similar to
 
 int a[n];
 
 where it is detected at run-time if n is not-positive.
>>> 
>>> Right. I guess what I mean to say is that I would expect this case to
>>> already be caught by -fsanitize=bounds -- I don't see a reason to add an
>>> additional sanitizer option.
>>> 
>>> struct foo {
>>> int count;
>>> int array[] __counted_by(count);
>>> };
>>> 
>>> foo->count = 5;
>>> foo->array[0] = 1;  // ok
>>> foo->array[10] = 1; // -fsanitize=bounds will catch this
>>> foo->array[-10] = 1;// -fsanitize=bounds will catch this too
>>> 
>>> 
>> 
>> just checked this testing case with my GCC, and YES, -fsanitize=bounds 
>> indeed caught this error:
>> 
>> ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
>> ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]’
>> 
> 
> Yes, but I thought we were discussing the case where count is
> set to a negative value:
> 
> foo->count = -1;
> int x = foo->array[3]; // UBSan should diagnose this
> 
> And also the case when foo->array becomes too big.

Oops, yes, you are right. 

Thanks.

Qing
> 
> Martin



Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-23 Thread Patrick O'Neill

Hi Lehua,

This patch causes a build failure with newlib 4.1.0 with -march=rv64gv_zbb.

I've creduced the failure here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111941

Thanks,
Patrick

On 10/19/23 20:58, Lehua Ding wrote:

Committed, thanks Patrick and Juzhe.

On 2023/10/20 2:04, Patrick O'Neill wrote:

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer 
-finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer 
-finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick's CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your
github so that we can see the full run results?


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub

juzhe.zh...@rivai.ai

    *From:* Lehua Ding 
    *Date:* 2023-10-19 16:33
    *To:* gcc-patches 
    *CC:* juzhe.zhong ; kito.cheng
    ; rdapp.gcc
    ; palmer ;
    jeffreyalaw ; lehua.ding
    
    *Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
    This patch refactors and cleanups the vsetvl pass in order to make
    the code
    easier to modify and understand. This patch does several things:
    1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3
    only maintain
       and modify this virtual CFG. Phase 4 performs insertion,
    modification and
       deletion of vsetvl insns based on the virtual CFG. The Basic
    block in the
       virtual CFG is called vsetvl_block_info and the vsetvl
    information inside
       is called vsetvl_info.
    2. Combine Phase 1 and 2 into a single Phase 1 and unified the
    demand system,
       this Phase only fuse local vsetvl info in forward direction.
    3. Refactor Phase 3, change the logic for determining whether to
    uplift vsetvl
       info to a pred basic block to a more unified method that there
    is a vsetvl
       info in the vsetvl defintion reaching in compatible with it.
    4. Place all modification operations to the RTL in Phase 4 and
    Phase 5.
       Phase 4 is responsible for inserting, modifying and deleting 
vsetvl

       instructions based on fully optimized vsetvl infos. Phase 5
    removes the avl
       operand from the RVV instruction and removes the unused dest
    operand
       register from the vsetvl insns.
    These modifications resulted in some testcases needing to be
    updated. The reasons
    for updating are summarized below:
    1. more optimized
vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
       vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
    2. less unnecessary fusion
    avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
    3. local fuse direction (backward -> forward)
       scalar_move-1.c/
    4. add some bugfix testcases.
       pr111037-3.c/pr111037-4.c
       avl_single-89.c
    PR target/111037
    PR target/111234
    PR target/111725
    Lehua Ding (11):
      RISC-V: P1: Refactor
    avl_info/vl_vtype_info/vector_insn_info/vector_block_info
      RISC-V: P2: Refactor and cleanup demand system
      RISC-V: P3: Refactor vector_infos_manager
      RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
      RISC-V: P5: combine phase 1 and 2
      RISC-V: P6: Add computing reaching definition data flow
      RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
      RISC-V: P8: Refactor emit-vsetvl phase and delete post 
optimization

      RISC-V: P9: Cleanup and reorganize helper functions
      RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
      RISC-V: P11: Adjust and add testcases
    gcc/config/riscv/riscv-vsetvl.cc  | 6502 
+++--

    gcc/config/riscv/riscv-vsetvl.def |  641 +-
    gcc/config/riscv/riscv-vsetvl.h   |  488 --
    gcc/config/riscv/t-riscv  |    2 +-
    .../gcc.target/riscv/rvv/base/scalar_move-1.c

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Martin Uecker
On Mon, 23 Oct 2023 at 20:06 +0200, Martin Uecker wrote:
> On Mon, 23 Oct 2023 at 16:37, Qing Zhao wrote:
> > 
> > > On Oct 23, 2023, at 11:57 AM, Richard Biener  
> > > wrote:
> > > 
> > > 
> > > 
> > > > Am 23.10.2023 um 16:56 schrieb Qing Zhao :
> > > > 
> > > > 
> > > > 
> > > > > On Oct 23, 2023, at 3:57 AM, Richard Biener 
> > > > >  wrote:
> > > > > 
> > > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  
> > > > > > wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar 
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > > How about the following:
> > > > > > > > Add one more parameter to __builtin_dynamic_object_size(), i.e
> > > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > > When we see the structure field has counted_by attribute.
> > > > > > > 
> > > > > > > Or maybe add a barrier preventing any assignments to 
> > > > > > > array_annotated->foo from being reordered below the __bdos call? 
> > > > > > > Basically an __asm__ with array_annotated->foo in the clobber 
> > > > > > > list ought to do it I think.
> > > > > > 
> > > > > > Maybe just adding the array_annotated->foo to the use list of the 
> > > > > > call to __builtin_dynamic_object_size should be enough?
> > > > > > 
> > > > > > But I am not sure how to implement this in the TREE level, is there 
> > > > > > a USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add 
> > > > > > the counted_by field “array_annotated->foo” to the USE_LIST of the 
> > > > > > call to __bdos?
> > > > > > 
> > > > > > This might be the simplest solution?
> > > > > 
> > > > > If the dynamic object size is derived of a field then I think you 
> > > > > need to
> > > > > put the "load" of that memory location at the point (as argument)
> > > > > of the __bos call right at parsing time.  I know that's awkward 
> > > > > because
> > > > > you try to play tricks "discovering" that field only late, but that's 
> > > > > not
> > > > > going to work.
> > > > 
> > > > Is it better to do this at gimplification phase instead of FE? 
> > > > 
> > > > VLA decls are handled in gimplification phase, the size calculation and 
> > > > call to alloca are all generated during this phase. (gimplify_vla_decl).
> > > > 
> > > > For __bdos calls, we can add an additional argument if the object’s 
> > > > first argument’s type include the counted_by attribute, i.e
> > > > 
> > > > ***During gimplification, 
> > > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > > Check whether the type of ptr includes counted_by attribute, if so, 
> > > > change the call to
> > > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > > 
> > > > Then the correct data dependence should be represented well in the IR.
> > > > 
> > > > **During object size phase,
> > > > 
> > > > The call to __builtin_dynamic_object_size will become an expression 
> > > > includes the counted_by field or -1/0 when we cannot decide the size, 
> > > > the correct data dependence will be kept even the call to 
> > > > __builtin_dynamic_object_size is gone. 
> > > 
> > > But the whole point of the BOS pass is to derive information that is not 
> > > available at parsing time, and that’s the cases you are after.  The case 
> > > where the connection to the field with the length is apparent during 
> > > parsing is easy - you simply insert a load of the value before the BOS 
> > > call.
> > 
> > Yes, this is true. 
> > I prefer to implement this in gimplification phase since I am more familiar 
> > with the code there.. (I think that implementing it in gimplification 
> > should be very similar as implementing it in FE? Or do I miss anything 
> > here?)
> > 
> > Joseph, if implement this in FE, where in the FE I should look at? 
> > 
> 
> We should aim for a good integration with the BDOS pass, so
> that it can propagate the information further, e.g. the 
> following should work:
> 
> struct { int L; char buf[] __counted_by(L) } x;
> x.L = N;
> x.buf = ...;
> char *p = &x->f;
> __bdos(p) -> N
> 
> So we need to be smart on how we provide the size
> information for x->f to the backend. 

To follow up on this. I do not think we should change the
builtin in the FE or gimplification. Instead, we want 
to change the field access and compute the size there. 

In my toy patch I then made this have a VLA type that 
encodes the size.  Here, this would need to be done 
differently.

But still, what we are missing in both cases
is a proper way to pass the information down to BDOS.

For VLAs this works because BDOS can see the size of
the definition.  For calls to allocation functions
it is read from an attribute. 

But I am not sure what would be the best way to encode
this information so that BDOS can later access it.

Martin




> 
> This would also be desirable for the language extension. 
> 
> Martin
> 
> 
Thanks a lot for the help.

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-23 Thread Vineet Gupta




On 10/22/23 23:46, Ajit Agarwal wrote:

Hello All:

Addressed below review comments in the version 11 of the patch.
Please review and please let me know if its ok for trunk.

Thanks & Regards
Ajit


Again you are not paying attention to prior comments about fixing your 
submission practice, and like some of the prior reviewers I'm starting to 
get tired, despite the potentially good technical content.


1. The commentary above is NOT part of the changelog. Either use a separate 
cover letter or add patch version change history between two "---" lines 
just before the start of the code diff. And keep accumulating those as you 
post new versions. See [1]. This is so reviewers know what changed over 
10 months, and it automatically gets dropped when the patch is eventually 
applied/merged into the tree.


2. Acknowledge (even if it is yes) each and every comment of the 
reviewers explicitly inline below. That ensures you don't miss 
addressing a change since this forces one to think about each of them.


I do have some technical comments which I'll follow up with later.
Just a short summary: v10 indeed bootstraps risc-v, but I don't see 
any improvements at all - as in, whenever the abi interfaces code identifies 
an extension (I saw a missing definition), it is not able to eliminate 
any extensions despite the patch.


-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632180.html



On 22/10/23 12:56 am, rep.dot@gmail.com wrote:

On 21 October 2023 01:56:16 CEST, Vineet Gupta  wrote:

On 10/19/23 23:50, Ajit Agarwal wrote:

Hello All:

This version 9 of the patch uses abi interfaces to remove zero and sign 
extension elimination.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 9) of the patch following review comments are 
incorporated.

a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.

As has been the trend in the past, I don't think all the review comments have been 
addressed.

And apart from that, may I ask if this is just me, or does anybody else think 
that it might be worthwhile to actually read a patch before (re-)posting?

Seeing e.g. the proposed abi_extension_candidate_p as written in a first POC 
would deserve some manual CSE, if nothing more than for clarity and conciseness?

Just curious from a meta perspective..

And:


ree: Improve ree pass for rs6000 target using defined abi interfaces

mentioning powerpc like this, and then changing generic code could be 
interpreted as misleading, IMHO.


For rs6000 target we see redundant zero and sign extension and done
to improve ree pass to eliminate such redundant zero and sign extension
using defined ABI interfaces.

Mentioning powerpc in the body as one of the affected target(s) is of course 
fine.



   +/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */

, false otherwise.

But I'm not a native speaker



+/* Return TRUE if the candidate insn is zero extend and regno is
+   a return registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (/*rtx_insn *insn, */int regno)

Leftover debug comment.


+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?
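
For illustration, the kind of manual CSE I mean would look roughly like
the following - an untested sketch only, folding the repeated
SET_SRC/XEXP/REGNO/GET_MODE lookups of the function quoted below into
locals:

static bool
abi_extension_candidate_p (rtx_insn *insn)
{
  rtx set = single_set (insn);
  rtx dest = SET_DEST (set);
  rtx orig_src = XEXP (SET_SRC (set), 0);
  unsigned int src_regno = REGNO (orig_src);
  machine_mode src_mode = GET_MODE (orig_src);

  if (!FUNCTION_ARG_REGNO_P (src_regno)
      || abi_extension_candidate_return_reg_p (src_regno))
    return false;

  /* Mode of destination and source should be different.  */
  if (GET_MODE (dest) == src_mode)
    return false;

  /* REGNO of source and destination should be same if not promoted.  */
  if (!abi_target_promote_function_mode (src_mode)
      && REGNO (dest) != src_regno)
    return false;

  return true;
}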


+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO (orig_src)))

On top, debug leftover.


+return false;
+
+  /* Mode of destination and source should be different.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* REGNO of source and destination should be same if not
+  promoted.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+

As said, please also rephrase the above (and everything else if it obviously 
looks akin to the above).

The rest, mentioned below,  should largely be covered by following the coding 
convention.


+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */

Singular register.


+
+static bool
+abi_extension_candidate_argno_p (/*rtx_code code, */int regno)

Debug leftover.
I would probably have inlined this function manually, with a respe

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 2:06 PM, Martin Uecker  wrote:
> 
> On Mon, 23 Oct 2023 at 16:37, Qing Zhao wrote:
>> 
>>> On Oct 23, 2023, at 11:57 AM, Richard Biener  
>>> wrote:
>>> 
>>> 
>>> 
 Am 23.10.2023 um 16:56 schrieb Qing Zhao :
 
 
 
> On Oct 23, 2023, at 3:57 AM, Richard Biener  
> wrote:
> 
>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
>>> wrote:
>>> 
>>> On 2023-10-20 14:38, Qing Zhao wrote:
 How about the following:
 Add one more parameter to __builtin_dynamic_object_size(), i.e
 __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
 When we see the structure field has counted_by attribute.
>>> 
>>> Or maybe add a barrier preventing any assignments to 
>>> array_annotated->foo from being reordered below the __bdos call? 
>>> Basically an __asm__ with array_annotated->foo in the clobber list 
>>> ought to do it I think.
>> 
>> Maybe just adding the array_annotated->foo to the use list of the call 
>> to __builtin_dynamic_object_size should be enough?
>> 
>> But I am not sure how to implement this in the TREE level, is there a 
>> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
>> counted_by field “array_annotated->foo” to the USE_LIST of the call to 
>> __bdos?
>> 
>> This might be the simplest solution?
> 
> If the dynamic object size is derived of a field then I think you need to
> put the "load" of that memory location at the point (as argument)
> of the __bos call right at parsing time.  I know that's awkward because
> you try to play tricks "discovering" that field only late, but that's not
> going to work.
 
 Is it better to do this at gimplification phase instead of FE? 
 
 VLA decls are handled in gimplification phase, the size calculation and 
 call to alloca are all generated during this phase. (gimplify_vla_decl).
 
 For __bdos calls, we can add an additional argument if the object’s first 
 argument’s type include the counted_by attribute, i.e
 
 ***During gimplification, 
 For a call to __builtin_dynamic_object_size (ptr, type)
 Check whether the type of ptr includes counted_by attribute, if so, change 
 the call to
 __builtin_dynamic_object_size (ptr, type, counted_by field)
 
 Then the correct data dependence should be represented well in the IR.
 
 **During object size phase,
 
 The call to __builtin_dynamic_object_size will become an expression 
 includes the counted_by field or -1/0 when we cannot decide the size, the 
 correct data dependence will be kept even the call to 
 __builtin_dynamic_object_size is gone. 
>>> 
>>> But the whole point of the BOS pass is to derive information that is not 
>>> available at parsing time, and that’s the cases you are after.  The case 
>>> where the connection to the field with the length is apparent during 
>>> parsing is easy - you simply insert a load of the value before the BOS call.
>> 
>> Yes, this is true. 
>> I prefer to implement this in gimplification phase since I am more familiar 
>> with the code there.. (I think that implementing it in gimplification should 
>> be very similar as implementing it in FE? Or do I miss anything here?)
>> 
>> Joseph, if implement this in FE, where in the FE I should look at? 
>> 
> 
> We should aim for a good integration with the BDOS pass, so
> that it can propagate the information further, e.g. the 
> following should work:
> 
> struct { int L; char buf[] __counted_by(L) } x;
> x.L = N;
> x.buf = ...;
> char *p = &x->f;
Should the above line be: 
char *p = &x.buf
?
> __bdos(p) -> N
> 
> So we need to be smart on how we provide the size
> information for x->f to the backend. 

Do you have any other suggestion here?

(Right now, what we’d like to do is to add one more argument for the function 
__bdos as
 __bdos (p, type, x.L))
> 
> This would also be desirable for the language extension. 

Yes.

Qing
> 
> Martin
> 
> 
>> Thanks a lot for the help.
>> 
>> Qing
>> 
>>> For the late case there’s no way to invent data flow dependence without 
>>> inadvertently pessimizing optimization.
>>> 
>>> Richard 
>>> 
 
> 
> A related issue is that assignment to the field and storage allocation
> are not tied together
 
 Yes, this is different from VLA, in which, the size assignment and the 
 storage allocation are generated and tied together by the compiler.
 
 For the flexible array member, the storage allocation and the size 
 assignment are all done by the user. So, We need to clarify such 
 requirement  in the document to guide user to write correct code.  And 
 also, we might need to provide tools (warnings and sanitizer option) to 
 help users to catch such coding error.
 
>>

Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-23 Thread Frager, Neal



> On 23 Oct 2023, at 18:40, Michael Eager wrote:
> 
> On 10/22/23 22:48, Neal Frager wrote:
>> There is a microblaze cpu version 10.0 included in versal. If the
>> minor version is only a single digit, then the version comparison
>> will fail as version 10.0 will appear as 100 compared to version
>> 6.00 or 8.30 which will calculate to values 600 and 830.
>> The issue can be seen when using the '-mcpu=10.0' option.
>> With this fix, versions with a single digit minor number such as
>> 10.0 will be calculated as greater than versions with a smaller
>> major version number, but with two minor version digits.
>> By applying this fix, several incorrect warning messages will no
>> longer be printed when building the versal plm application, such
>> as the warning message below:
>> warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or 
>> greater
>> Signed-off-by: Neal Frager 
>> ---
>>  gcc/config/microblaze/microblaze.cc | 164 +---
>>  1 file changed, 76 insertions(+), 88 deletions(-)
> 
> Please add a test case.
> 
> --
> Michael Eager

Hi Michael,

Would you mind helping me understand how to make a gcc test case for this patch?

This patch does not change the resulting binaries of a microblaze gcc build.  
The output will be the same with or without the patch, so I do not have 
anything in the binary itself to verify.

All that happens is that false warning messages will no longer be printed when 
building with ‘-mcpu=10.0’.  Is there a way to test for warning messages?
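
(For example, would something along these lines be the right approach?
Just a sketch - the exact dg directives and warning text would need to
be confirmed:)

/* { dg-do compile } */
/* { dg-options "-mcpu=10.0 -mxl-multiply-high" } */

int f (int a, int b)
{
  return a * b;
}

/* Check that the bogus version warning is gone.  */
/* { dg-bogus "can be used only with" "" { target *-*-* } 0 } */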

In any case, please do not commit v1 of this patch.  I am going to work on 
making a v2 based on Mark’s feedback.

Thanks for your help!

Best regards,
Neal Frager
AMD



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Siddhesh Poyarekar

On 2023-10-23 14:06, Martin Uecker wrote:

We should aim for a good integration with the BDOS pass, so
that it can propagate the information further, e.g. the
following should work:

struct { int L; char buf[] __counted_by(L) } x;
x.L = N;
x.buf = ...;
char *p = &x->f;
__bdos(p) -> N

So we need to be smart on how we provide the size
information for x->f to the backend.

This would also be desirable for the language extension.


This is essentially why there need to be frontend rules constraining 
reordering and reachability semantics of x.L, thus restricting DSE and 
reordering for it.  This is not really a __bdos/__bos question, because 
that bit is trivial; if the structure is visible, the value is simply 
x.L.  This is also why adding a reference to x.L in __bos/__bdos is not 
sufficient or even possible in, e.g. the above case you note.
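
(Concretely: once only the pointer remains visible, as in this
illustrative snippet,

  char *p = x.buf;
  n = __bdos (p, 0);   /* no x.L visible at this call site */

there is nothing the parser could attach an extra x.L argument to.)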


Thanks,
Sid


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Martin Uecker
On Mon, 23 Oct 2023 at 14:43 -0400, Siddhesh Poyarekar wrote:
> On 2023-10-23 14:06, Martin Uecker wrote:
> > We should aim for a good integration with the BDOS pass, so
> > that it can propagate the information further, e.g. the
> > following should work:
> > 
> > struct { int L; char buf[] __counted_by(L) } x;
> > x.L = N;
> > x.buf = ...;
> > char *p = &x->f;
> > __bdos(p) -> N
> > 
> > So we need to be smart on how we provide the size
> > information for x->f to the backend.
> > 
> > This would also be desirable for the language extension.
> 
> This is essentially why there need to be frontend rules constraining 
> reordering and reachability semantics of x.L, thus restricting DSE and 
> reordering for it. 

Yes, this too.

>  This is not really a __bdos/__bos question, because 
> that bit is trivial; if the structure is visible, the value is simply 
> x.L.  This is also why adding a reference to x.L in __bos/__bdos is not 
> sufficient or even possible in, e.g. the above case you note.

The value x.L may change over time. I would argue that it needs
to be the value of x.L at the time when x.buf (not x->f, sorry) 
is accessed.  So the FE needs to evaluate x.L when x.buf is
accessed and store the value somewhere where __bdos can find
it later.  Storing it in the type information would make sense.
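
For example (using the __counted_by syntax from above; "allocate" is a
placeholder, and the comments describe the intended semantics, not
current behavior):

struct s { int L; char buf[] __counted_by(L); } *x = allocate (100);
x->L = 100;
char *p = x->buf;    /* x->L must be read here: size is 100 */
x->L = 5;            /* must not retroactively shrink p's size */
n = __bdos (p, 0);   /* should still give 100 */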

But I am not sure how to do this in the best way so that this 
information is not removed later when not used explicitly
before __bdos tries to look at it.

Martin





Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 2:31 PM, Martin Uecker  wrote:
> 
> On Mon, 23 Oct 2023 at 20:06 +0200, Martin Uecker wrote:
>> On Mon, 23 Oct 2023 at 16:37, Qing Zhao wrote:
>>> 
 On Oct 23, 2023, at 11:57 AM, Richard Biener  
 wrote:
 
 
 
> Am 23.10.2023 um 16:56 schrieb Qing Zhao :
> 
> 
> 
>> On Oct 23, 2023, at 3:57 AM, Richard Biener  
>> wrote:
>> 
>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  
 wrote:
 
 On 2023-10-20 14:38, Qing Zhao wrote:
> How about the following:
> Add one more parameter to __builtin_dynamic_object_size(), i.e
> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> When we see the structure field has counted_by attribute.
 
 Or maybe add a barrier preventing any assignments to 
 array_annotated->foo from being reordered below the __bdos call? 
 Basically an __asm__ with array_annotated->foo in the clobber list 
 ought to do it I think.
>>> 
>>> Maybe just adding the array_annotated->foo to the use list of the call 
>>> to __builtin_dynamic_object_size should be enough?
>>> 
>>> But I am not sure how to implement this in the TREE level, is there a 
>>> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
>>> counted_by field “array_annotated->foo” to the USE_LIST of the call to 
>>> __bdos?
>>> 
>>> This might be the simplest solution?
>> 
>> If the dynamic object size is derived of a field then I think you need to
>> put the "load" of that memory location at the point (as argument)
>> of the __bos call right at parsing time.  I know that's awkward because
>> you try to play tricks "discovering" that field only late, but that's not
>> going to work.
> 
> Is it better to do this at gimplification phase instead of FE? 
> 
> VLA decls are handled in gimplification phase, the size calculation and 
> call to alloca are all generated during this phase. (gimplify_vla_decl).
> 
> For __bdos calls, we can add an additional argument if the object’s first 
> argument’s type include the counted_by attribute, i.e
> 
> ***During gimplification, 
> For a call to __builtin_dynamic_object_size (ptr, type)
> Check whether the type of ptr includes counted_by attribute, if so, 
> change the call to
> __builtin_dynamic_object_size (ptr, type, counted_by field)
> 
> Then the correct data dependence should be represented well in the IR.
> 
> **During object size phase,
> 
> The call to __builtin_dynamic_object_size will become an expression 
> includes the counted_by field or -1/0 when we cannot decide the size, the 
> correct data dependence will be kept even the call to 
> __builtin_dynamic_object_size is gone. 
 
 But the whole point of the BOS pass is to derive information that is not 
 available at parsing time, and that’s the cases you are after.  The case 
 where the connection to the field with the length is apparent during 
 parsing is easy - you simply insert a load of the value before the BOS 
 call.
>>> 
>>> Yes, this is true. 
>>> I prefer to implement this in gimplification phase since I am more familiar 
>>> with the code there.. (I think that implementing it in gimplification 
>>> should be very similar as implementing it in FE? Or do I miss anything 
>>> here?)
>>> 
>>> Joseph, if implement this in FE, where in the FE I should look at? 
>>> 
>> 
>> We should aim for a good integration with the BDOS pass, so
>> that it can propagate the information further, e.g. the 
>> following should work:
>> 
>> struct { int L; char buf[] __counted_by(L) } x;
>> x.L = N;
>> x.buf = ...;
>> char *p = &x->f;
>> __bdos(p) -> N
>> 
>> So we need to be smart on how we provide the size
>> information for x->f to the backend. 
> 
> To follow up on this. I do not think we should change the
> builtin in the FE or gimplification. Instead, we want 
> to change the field access and compute the size there. 
Could you please clarify this? What do you mean by "change the field access 
and compute the size there"?
> 
> In my toy patch I then made this have a VLA type that 
> encodes the size.  Here, this would need to be done 
> differently.
> 
> But still, what we are missing in both cases
> is a proper way to pass the information down to BDOS.

What’s the issue with adding a new argument (x.L) to the BDOS call? What’s 
missing with this approach?

> 
> For VLAs this works because BDOS can see the size of
> the definition.  For calls to allocation functions
> it is read from an attribute. 

You mean for VLA, BDOS sees the size of the definition from the attribute for 
the allocation function?
Yes, that’s the case for VLA. 

For VLA, the size computation and 

[PATCH v3] gcc: Introduce -fhardened

2023-10-23 Thread Marek Polacek
On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
> >
> > On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> > > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, 
> > > > > powerpc64le-unknown-linux-gnu,
> > > > > and aarch64-unknown-linux-gnu; ok for trunk?
> > > > >
> > > > > -- >8 --
> > > > > In 
> > > > > I proposed -fhardened, a new umbrella option that enables a 
> > > > > reasonable set
> > > > > of hardening flags.  The read of the room seems to be that the option
> > > > > would be useful.  So here's a patch implementing that option.
> > > > >
> > > > > Currently, -fhardened enables:
> > > > >
> > > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > > >   -D_GLIBCXX_ASSERTIONS
> > > > >   -ftrivial-auto-var-init=pattern
> 
> I think =zero is much better here given the overhead is way
> cheaper and pointers get more reliable behavior.

Ok, changed now.
 
> > > > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > > > >   -fstack-protector-strong
> > > > >   -fstack-clash-protection
> > > > >   -fcf-protection=full (x86 GNU/Linux only)
> > > > >
> > > > > -fhardened will not override options that were specified on the 
> > > > > command line
> > > > > (before or after -fhardened).  For example,
> > > > >
> > > > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > > >
> > > > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > > >
> > > > >   -fhardened -fstack-protector
> > > > >
> > > > > will not enable -fstack-protector-strong.
> > > > >
> > > > > In DW_AT_producer it is reflected only as -fhardened; it doesn't 
> > > > > expand
> > > > > to anything.  I think we need a better way to show what it actually
> > > > > enables.
> > > >
> > > > I do think we need to find a solution here for asserting
> > > > compliance.
> > >
> > > Fair enough.
> > >
> > > > Maybe we can have -Whardened that will diagnose any altering of
> > > > -fhardened by other options on the command-line or by missed target
> > > > implementations?  People might for example use -fstack-protector
> > > > but don't really want to make protection lower than requested with 
> > > > -fhardened.
> > > >
> > > > Any such conflict is much less apparent than when you use the
> > > > flags -fhardened composes.
> > >
> > > How about: --help=hardened says which options -fhardened attempts to
> > > enable, and -Whardened warns when it didn't enable an option?  E.g.,
> > >
> > >   -fstack-protector -fhardened -Whardened
> > >
> > > would say that it didn't enable -fstack-protector-strong because
> > > -fstack-protector was specified on the command line?
> > >
> > > If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
> > > list -z now, likewise for -z relro.
> > >
> > > Unclear if -Whardened should be enabled by default, but probably yes?
> >
> > Here's v2 which adds -Whardened (enabled by default).
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> I think it's OK but I'd like to see a second ACK here.  

Thanks!

> Can you see how our
> primary and secondary targets (+ host OS) behave here?

That's very reasonable.  I tried to build gcc on Compile Farm 119 (AIX) but
that fails with:

ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
make[2]: *** [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: all] 
Error 1
make[2]: Leaving directory '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'

and I tried Darwin (104) and that fails with

*** Configuration aarch64-apple-darwin21.6.0 not supported

Is anyone else able to build gcc on those machines, or test the attached
patch?

> I think the
> documentation should elaborate a bit on expectations for non-Linux/GNU
> targets, specifically I think the default configuration for a target should
> with -fhardened _not_ have any -Whardened diagnostics.  Maybe we can
> have a testcase for this?

Sorry, I'm not sure how to test that.  I suppose if -fhardened enables
something not supported on those systems, and it's something for which
we have a configure test, then we shouldn't warn.  This is already the
case for -pie, -z relro, and -z now.  

Should the docs say something like the following for features without
configure checks?

@option{-fhardened} can, on certain systems, attempt to enable features
not supported on that particular system.  In that case, it's possible to
prevent the warning using the @option{-Wno-hardened} option.

I've added a line saying:

+This option is intended to be used in production builds, not merely
+in debug builds.

-- >8 --
In 
I proposed -fhardened, a new umbrella option that enables 

Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-23 Thread Michael Eager

On 10/23/23 11:37, Frager, Neal wrote:





On 23 Oct 2023 at 18:40, Michael Eager wrote:

On 10/22/23 22:48, Neal Frager wrote:

There is a microblaze cpu version 10.0 included in versal. If the
minor version is only a single digit, then the version comparison
will fail as version 10.0 will appear as 100 compared to version
6.00 or 8.30 which will calculate to values 600 and 830.
The issue can be seen when using the '-mcpu=10.0' option.
With this fix, versions with a single digit minor number such as
10.0 will be calculated as greater than versions with a smaller
major version number, but with two minor version digits.
By applying this fix, several incorrect warning messages will no
longer be printed when building the versal plm application, such
as the warning message below:
warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or greater
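(A worked illustration of the mis-ordering, not part of the patch text: "6.00"
encodes to 600 and "8.30" to 830, but "10.0" encodes to 100, so v10.0 compares
as older than v6.00 and version checks like the one above misfire.)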
Signed-off-by: Neal Frager 
---
  gcc/config/microblaze/microblaze.cc | 164 +---
  1 file changed, 76 insertions(+), 88 deletions(-)


Please add a test case.

--
Michael Eager


Hi Michael,

Would you mind helping me understand how to make a gcc test case for this patch?

This patch does not change the resulting binaries of a microblaze gcc build.  
The output will be the same with or without the patch, so I do not have 
anything in the binary itself to verify.

All that happens is false warning messages will not be printed when building 
with ‘-mcpu=10.0’.  Is there a way to test for warning messages?

In any case, please do not commit v1 of this patch.  I am going to work on 
making a v2 based on Mark’s feedback.


You can create a test case which passes the -mcpu=10.0 and other options
to GCC and verify that the message is not generated after the patch is
applied.

You can make all GCC warnings into errors with the "-Werror" option.
This means that the compile will fail if the warning is issued.

Take a look at gcc/testsuite/gcc.target/aarch64/bti-1.c for an example
of using { dg-options "" } to specify command line options.

There is a test suite option (dg-warning) which checks that a particular
source line generates a warning message, but it isn't clear whether it
is possible to check that a warning is not issued.
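
Something along these lines might work as a starting point (an untested
sketch: a dg-bogus directive with line number 0 matches diagnostics that
carry no line number, which is how warnings about option combinations are
emitted):

  /* { dg-do compile } */
  /* { dg-options "-mcpu=10.0 -mxl-multiply-high" } */

  int mul_high (int a, int b)
  {
    return a * b;
  }

  /* Check that the bogus version warning is not emitted anywhere.  */
  /* { dg-bogus "can be used only with" "" { target *-*-* } 0 } */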

--
Michael Eager


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Martin Uecker
On Monday, 23.10.2023 at 19:00 +0000, Qing Zhao wrote:
> 
> > On Oct 23, 2023, at 2:31 PM, Martin Uecker  wrote:
> > 
> > On Monday, 23.10.2023 at 20:06 +0200, Martin Uecker wrote:
> > > On Monday, 23.10.2023 at 16:37 +0000, Qing Zhao wrote:
> > > > 
> > > > > On Oct 23, 2023, at 11:57 AM, Richard Biener 
> > > > >  wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On 23.10.2023 at 16:56, Qing Zhao wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Oct 23, 2023, at 3:57 AM, Richard Biener 
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > > On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao 
> > > > > > > >  wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar 
> > > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > On 2023-10-20 14:38, Qing Zhao wrote:
> > > > > > > > > > How about the following:
> > > > > > > > > > Add one more parameter to __builtin_dynamic_object_size(), 
> > > > > > > > > > i.e
> > > > > > > > > > __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
> > > > > > > > > > When we see the structure field has counted_by attribute.
> > > > > > > > > 
> > > > > > > > > Or maybe add a barrier preventing any assignments to 
> > > > > > > > > array_annotated->foo from being reordered below the __bdos 
> > > > > > > > > call? Basically an __asm__ with array_annotated->foo in the 
> > > > > > > > > clobber list ought to do it I think.
> > > > > > > > 
> > > > > > > > Maybe just adding the array_annotated->foo to the use list of 
> > > > > > > > the call to __builtin_dynamic_object_size should be enough?
> > > > > > > > 
> > > > > > > > But I am not sure how to implement this in the TREE level, is 
> > > > > > > > there a USE_LIST/CLOBBER_LIST for each call?  Then I can just 
> > > > > > > > simply add the counted_by field “array_annotated->foo” to the 
> > > > > > > > USE_LIST of the call to __bdos?
> > > > > > > > 
> > > > > > > > This might be the simplest solution?
> > > > > > > 
> > > > > > > If the dynamic object size is derived from a field then I think you 
> > > > > > > need to
> > > > > > > put the "load" of that memory location at the point (as argument)
> > > > > > > of the __bos call right at parsing time.  I know that's awkward 
> > > > > > > because
> > > > > > > you try to play tricks "discovering" that field only late, but 
> > > > > > > that's not
> > > > > > > going to work.
> > > > > > 
> > > > > > Is it better to do this at gimplification phase instead of FE? 
> > > > > > 
> > > > > > VLA decls are handled in gimplification phase, the size calculation 
> > > > > > and call to alloca are all generated during this phase. 
> > > > > > (gimplify_vla_decl).
> > > > > > 
> > > > > > For __bdos calls, we can add an additional argument if the object’s 
> > > > > > first argument’s type includes the counted_by attribute, i.e
> > > > > > 
> > > > > > ***During gimplification, 
> > > > > > For a call to __builtin_dynamic_object_size (ptr, type)
> > > > > > Check whether the type of ptr includes counted_by attribute, if so, 
> > > > > > change the call to
> > > > > > __builtin_dynamic_object_size (ptr, type, counted_by field)
> > > > > > 
> > > > > > Then the correct data dependence should be represented well in the 
> > > > > > IR.
> > > > > > 
> > > > > > **During object size phase,
> > > > > > 
> > > > > > The call to __builtin_dynamic_object_size will become an expression 
> > > > > > that includes the counted_by field, or -1/0 when we cannot decide 
> > > > > > the size; the correct data dependence will be kept even after the 
> > > > > > call to __builtin_dynamic_object_size is gone. 
> > > > > 
> > > > > But the whole point of the BOS pass is to derive information that is 
> > > > > not available at parsing time, and those are the cases you are after.  
> > > > > The case where the connection to the field with the length is 
> > > > > apparent during parsing is easy - you simply insert a load of the 
> > > > > value before the BOS call.
> > > > 
> > > > Yes, this is true. 
> > > > I prefer to implement this in the gimplification phase since I am more 
> > > > familiar with the code there. (I think that implementing it in 
> > > > gimplification should be very similar to implementing it in the FE? Or 
> > > > do I miss anything here?)
> > > > 
> > > > Joseph, if we implement this in the FE, where should I look? 
> > > > 
> > > 
> > > We should aim for a good integration with the BDOS pass, so
> > > that it can propagate the information further, e.g. the 
> > > following should work:
> > > 
> > > struct { int L; char buf[] __counted_by(L); } x;
> > > x.L = N;
> > > x.buf = ...;
> > > char *p = x.buf;
> > > __bdos(p) -> N
> > > 
> > > So we need to be smart on how we provide the size
> > > information for x.buf to the backend. 
> > 
> > To follow up on this. I do not think we should change the
> > builtin in the FE or gimplification. Instead, we want 
> > to change the field access and compute the 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 2:43 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-23 14:06, Martin Uecker wrote:
>> We should aim for a good integration with the BDOS pass, so
>> that it can propagate the information further, e.g. the
>> following should work:
>> struct { int L; char buf[] __counted_by(L); } x;
>> x.L = N;
>> x.buf = ...;
>> char *p = x.buf;
>> __bdos(p) -> N
>> So we need to be smart on how we provide the size
>> information for x.buf to the backend.
>> This would also be desirable for the language extension.
> 
> This is essentially why there need to be frontend rules constraining 
> reordering and reachability semantics of x.L, thus restricting DSE and 
> reordering for it.

My understanding is that restricting DSE and reordering should be done through 
proper data flow information: with a new argument added to the BDOS call, the 
correct data flow information can be maintained, and then the DSE and 
reordering will not happen. 
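
Roughly, in hypothetical pseudo-GIMPLE (a sketch of the intent, not actual
compiler output):

  p->foo = 10;
  _1 = &p->array;
  sz = __builtin_dynamic_object_size (_1, 1);      /* no dependence on p->foo */

versus, with the proposed third argument:

  p->foo = 10;
  _2 = p->foo;                                     /* explicit load of the counter */
  _1 = &p->array;
  sz = __builtin_dynamic_object_size (_1, 1, _2);  /* now depends on the store */

Since the load _2 is ordered after the store to p->foo, passes can no longer
sink the store below the size computation.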

I don’t quite understand what kind of frontend rules should be added to 
constrain reordering and reachability semantics. Can you explain this a little 
bit more? Do you mean adding some rules or requirements to the new attribute 
that the users of the attribute should follow in the source code? 

>  This is not really a __bdos/__bos question, because that bit is trivial; if 
> the structure is visible, the value is simply x.L.  This is also why adding a 
> reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. 
> the above case you note.

I am a little confused here: are we discussing how to resolve the potential 
reordering issue of the following:

"
struct annotated {
  size_t foo;
  char array[] __attribute__((counted_by (foo)));
};

  p->foo = 10;
  size = __builtin_dynamic_object_size (p->array,1);
"?

Or a bigger issue?

Qing

> 
> Thanks,
> Sid



Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Kees Cook
On Fri, Oct 20, 2023 at 09:54:05PM +0200, Martin Uecker wrote:
> On Friday, 20.10.2023 at 18:48 +0000, Qing Zhao wrote:
> > 
> > > On Oct 20, 2023, at 2:34 PM, Kees Cook  wrote:
> > > 
> > > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > > On Thursday, 19.10.2023 at 16:33 -0700, Kees Cook wrote:
> > > > > On Wed, Oct 18, 2023 at 09:11:43PM +, Qing Zhao wrote:
> > > > > > As I replied to Martin in another email, I plan to do the following 
> > > > > > to resolve this issue:
> > > > > > 
> > > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the 
> > > > > > cases when the size of the counted-by is not positive.
> > > > > 
> > > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > > signedness is known at compile time, so I would expect a -W option.
> > > > 
> > > > The signedness of the type but not of the value.
> > > > 
> > > > But I would not want to have a warning for signed 
> > > > counter  types by default because I would prefer
> > > > to use signed types (for various reasons including
> > > > better overflow detection).
> > > > 
> > > > > Or
> > > > > do you mean you'd split up -fsanitize=bounds between unsigned and 
> > > > > signed
> > > > > indexes? I'd find that kind of awkward for the kernel... but I feel 
> > > > > like
> > > > > I've misunderstood something. :)
> > > > > 
> > > > > -Kees
> > > > 
> > > > The idea would be to detect at run-time the case
> > > > if  x->buf  is used at a time where   x->counter 
> > > > is negative and also when x->counter * sizeof(x->buf[0])
> > > > overflows or is too big.
> > > > 
> > > > This would be similar to
> > > > 
> > > > int a[n];
> > > > 
> > > > where it is detected at run-time if n is non-positive.
> > > 
> > > Right. I guess what I mean to say is that I would expect this case to
> > > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > > additional sanitizer option.
> > > 
> > > struct foo {
> > >   int count;
> > >   int array[] __counted_by(count);
> > > };
> > > 
> > >   foo->count = 5;
> > >   foo->array[0] = 1;  // ok
> > >   foo->array[10] = 1; // -fsanitize=bounds will catch this
> > >   foo->array[-10] = 1;// -fsanitize=bounds will catch this too
> > > 
> > > 
> > 
> > just checked this test case with my GCC, and YES, -fsanitize=bounds 
> > indeed caught this error:
> > 
> > ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> > ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]'
> > 
> 
> Yes, but I thought we were discussing the case where count is
> set to a negative value:
> 
> foo->count = -1;
> int x = foo->array[3]; // UBSan should diagnose this

Oh right, I keep thinking about it backwards.

Yeah, we can't trap the "count" assignment, because it may be getting used
for other purposes. But yeah, access to "array" should trap if "count"
is negative.
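
I.e., something like this (a sketch of the semantics under discussion, not of
current behavior):

  struct foo {
    int count;
    int array[] __counted_by(count);
  };

  foo->count = -1;    // can't trap here; the store itself may be legitimate
  foo->array[0] = 1;  // but -fsanitize=bounds could trap here, since the
                      // object can hold no elements while count is negative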

> And also the case when foo->array becomes too big.

How do you mean?

-- 
Kees Cook


Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Martin Uecker
On Monday, 23.10.2023 at 12:52 -0700, Kees Cook wrote:
> On Fri, Oct 20, 2023 at 09:54:05PM +0200, Martin Uecker wrote:
> > On Friday, 20.10.2023 at 18:48 +0000, Qing Zhao wrote:
> > > 
> > > > On Oct 20, 2023, at 2:34 PM, Kees Cook  wrote:
> > > > 
> > > > On Fri, Oct 20, 2023 at 11:50:11AM +0200, Martin Uecker wrote:
> > > > > On Thursday, 19.10.2023 at 16:33 -0700, Kees Cook wrote:
> > > > > > On Wed, Oct 18, 2023 at 09:11:43PM +, Qing Zhao wrote:
> > > > > > > As I replied to Martin in another email, I plan to do the 
> > > > > > > following to resolve this issue:
> > > > > > > 
> > > > > > > 1. No specification for signed or unsigned for counted_by field.
> > > > > > > 2. Add a sanitizer option -fsanitize=counted-by-bound to catch 
> > > > > > > the cases when the size of the counted-by is not positive.
> > > > > > 
> > > > > > I don't understand why this needs to be a runtime sanitizer. The
> > > > > > signedness is known at compile time, so I would expect a -W option.
> > > > > 
> > > > > The signedness of the type but not of the value.
> > > > > 
> > > > > But I would not want to have a warning for signed 
> > > > > counter  types by default because I would prefer
> > > > > to use signed types (for various reasons including
> > > > > better overflow detection).
> > > > > 
> > > > > > Or
> > > > > > do you mean you'd split up -fsanitize=bounds between unsigned and 
> > > > > > signed
> > > > > > indexes? I'd find that kind of awkward for the kernel... but I feel 
> > > > > > like
> > > > > > I've misunderstood something. :)
> > > > > > 
> > > > > > -Kees
> > > > > 
> > > > > The idea would be to detect at run-time the case
> > > > > if  x->buf  is used at a time where   x->counter 
> > > > > is negative and also when x->counter * sizeof(x->buf[0])
> > > > > overflows or is too big.
> > > > > 
> > > > > This would be similar to
> > > > > 
> > > > > int a[n];
> > > > > 
> > > > > where it is detected at run-time if n is non-positive.
> > > > 
> > > > Right. I guess what I mean to say is that I would expect this case to
> > > > already be caught by -fsanitize=bounds -- I don't see a reason to add an
> > > > additional sanitizer option.
> > > > 
> > > > struct foo {
> > > > int count;
> > > > int array[] __counted_by(count);
> > > > };
> > > > 
> > > > foo->count = 5;
> > > > foo->array[0] = 1;  // ok
> > > > foo->array[10] = 1; // -fsanitize=bounds will catch this
> > > > foo->array[-10] = 1;// -fsanitize=bounds will catch this too
> > > > 
> > > > 
> > > 
> > > just checked this test case with my GCC, and YES, -fsanitize=bounds 
> > > indeed caught this error:
> > > 
> > > ttt_1.c:31:12: runtime error: index 10 out of bounds for type 'char [*]'
> > > ttt_1.c:32:12: runtime error: index -10 out of bounds for type 'char [*]'
> > > 
> > 
> > Yes, but I thought we were discussing the case where count is
> > set to a negative value:
> > 
> > foo->count = -1;
> > int x = foo->array[3]; // UBSan should diagnose this
> 
> Oh right, I keep thinking about it backwards.
> 
> Yeah, we can't trap the "count" assignment, because it may be getting used
> for other purposes. But yeah, access to "array" should trap if "count"
> is negative.
> 
> > And also the case when foo->array becomes too big.
> 
> How do you mean?

count * sizeof(member) could overflow or otherwise be
bigger than allowed.
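
For example (hypothetical numbers, assuming a 32-bit size_t):

  struct foo {
    int count;
    long long array[] __counted_by(count);
  };

  f->count = 0x20000000;  /* 2^29 elements: a perfectly valid int */
  /* But 2^29 * sizeof(long long) = 2^32, which wraps to 0 in a 32-bit
     size_t, so a size derived as count * sizeof(f->array[0]) would be
     bogus; a run-time check could catch this too.  */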

Martin




RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-10-23 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it
> > > > didn't make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that
> > > > the secondary exits are all wired to the start of the new loop
> > > > preheader when doing epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept intact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the two
> > > > loops any PHI nodes that escape the first loop into the second loop if
> > > > flow_loops is specified.  The reason for this conditionality is that
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > > connected.  Because of this propagation of only used PHI nodes,
> > > > get_current_def can be used to easily find the previous definitions.
> > > > However, live statements that are not used inside the loop itself are
> > > > not propagated (since if unused, the moment we add the guard in between
> > > > the two loops the value across the bypass edge can be wrong if the loop
> > > > has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form and have a correct DOM
> > > > tree, the moment we add the guard block we will change the dominators
> > > > again.  To deal with this, slpeel_tree_duplicate_loop_to_edge_cfg can
> > > > optionally return the blocks to update without having to recompute the
> > > > list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> > > > have an incorrect VUSES chain for the secondary exits as it anticipates
> > > > the final result after the VDEFs have been moved.  This will thus be
> > > > corrected once the code motion is applied.
> > > >
> > > > Lastly, by doing things this way we can remove the helper functions
> > > > that previously did lock-step iterations to update things as they went
> > > > along.
> > > >
> > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
Continuing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-loop-distribution.cc (copy_loop_before): Pass
> > > > 	flow_loops = false.
> > > > 	* tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when
> > > > 	exit==null.
> > > > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > > 	additional assert.
> > > > 	(vect_set_loop_condition_normal): Skip modifying loop IV for
> > > > 	multiple exits.
> > > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > > 	peeling.
> > > > 	(slpeel_can_duplicate_loop_p): Likewise.
> > > > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > > > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > > > 	(find_guard_arg): Update for new peeling code.
> > > > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > > > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > > 	checks.
> > > > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > 	non_break_control_flow and early_breaks.
> > > > 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > > > 	multiple exits and VLA.
> > > > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> 

Re: [PATCH v23 02/33] c-family, c++: Look up built-in traits via identifier node

2023-10-23 Thread Jason Merrill

On 10/20/23 09:53, Ken Matsui wrote:

Since RID_MAX soon reaches 255 and all built-in traits are used approximately
once in a C++ translation unit, this patch removes all RID values for built-in


These two lines are too long; please wrap at 75 columns so they don't go 
over 80 when git log adds 4 spaces at the beginning.



traits and uses the identifier node to look up the specific trait.  Rather
than holding traits as keywords, we set all trait identifiers as cik_trait,
which is a new cp_identifier_kind.  As cik_reserved_for_udlit was unused and
cp_identifier_kind is 3 bits, we replaced the unused field with the new
cik_trait.  Also, the later patch handles a subsequent token to the built-in
identifier so that we accept the use of non-function-like built-in trait
identifiers.

  /* True if this identifier is for any operator name (including
-   conversions).  Value 4, 5, 6 or 7.  */
+   conversions).  Value 4, 5, or 6.  */
  #define IDENTIFIER_ANY_OP_P(NODE) \
-  (IDENTIFIER_KIND_BIT_2 (NODE))
+  (IDENTIFIER_KIND_BIT_2 (NODE) && !IDENTIFIER_TRAIT_P (NODE))

...

+/* True if this identifier is the name of a built-in trait.  */
+#define IDENTIFIER_TRAIT_P(NODE)   \
+  (IDENTIFIER_KIND_BIT_0 (NODE)\
+   && IDENTIFIER_KIND_BIT_1 (NODE) \
+   && IDENTIFIER_KIND_BIT_2 (NODE))


The other macros use &, not &&; we might as well stay consistent with 
that pattern.
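
I.e., something like this (a sketch following the pattern of the neighboring
macros):

  /* True if this identifier is the name of a built-in trait.  */
  #define IDENTIFIER_TRAIT_P(NODE)	\
    (IDENTIFIER_KIND_BIT_0 (NODE)	\
     & IDENTIFIER_KIND_BIT_1 (NODE)	\
     & IDENTIFIER_KIND_BIT_2 (NODE))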


Jason



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-23 Thread Qing Zhao


> On Oct 23, 2023, at 3:37 PM, Martin Uecker  wrote:
> 
> On Monday, 23.10.2023 at 19:00 +0000, Qing Zhao wrote:
>> 
>>> On Oct 23, 2023, at 2:31 PM, Martin Uecker wrote:
>>> 
>>> On Monday, 23.10.2023 at 20:06 +0200, Martin Uecker wrote:
>>>> On Monday, 23.10.2023 at 16:37 +0000, Qing Zhao wrote:
>>>>> 
>>>>>> On Oct 23, 2023, at 11:57 AM, Richard Biener wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 23.10.2023 at 16:56, Qing Zhao wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 23, 2023, at 3:57 AM, Richard Biener wrote:
>>>>>>>> 
>>>>>>>>> On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar wrote:
>>>>>>>>>> 
>>>>>>>>>> On 2023-10-20 14:38, Qing Zhao wrote:
>>>>>>>>>>> How about the following:
>>>>>>>>>>> Add one more parameter to __builtin_dynamic_object_size(), i.e
>>>>>>>>>>> __builtin_dynamic_object_size (_1,1,array_annotated->foo)?
>>>>>>>>>>> When we see the structure field has counted_by attribute.
>>>>>>>>>> 
>>>>>>>>>> Or maybe add a barrier preventing any assignments to 
>>>>>>>>>> array_annotated->foo from being reordered below the __bdos call? 
>>>>>>>>>> Basically an __asm__ with array_annotated->foo in the clobber list 
>>>>>>>>>> ought to do it I think.
>>>>>>>>> 
>>>>>>>>> Maybe just adding the array_annotated->foo to the use list of the 
>>>>>>>>> call to __builtin_dynamic_object_size should be enough?
>>>>>>>>> 
>>>>>>>>> But I am not sure how to implement this in the TREE level, is there a 
>>>>>>>>> USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the 
>>>>>>>>> counted_by field “array_annotated->foo” to the USE_LIST of the call 
>>>>>>>>> to __bdos?
>>>>>>>>> 
>>>>>>>>> This might be the simplest solution?
>>>>>>>> 
>>>>>>>> If the dynamic object size is derived from a field then I think you 
>>>>>>>> need to put the "load" of that memory location at the point (as 
>>>>>>>> argument) of the __bos call right at parsing time.  I know that's 
>>>>>>>> awkward because you try to play tricks "discovering" that field only 
>>>>>>>> late, but that's not going to work.
>>>>>>> 
>>>>>>> Is it better to do this at gimplification phase instead of FE? 
>>>>>>> 
>>>>>>> VLA decls are handled in gimplification phase, the size calculation and 
>>>>>>> call to alloca are all generated during this phase. (gimplify_vla_decl).
>>>>>>> 
>>>>>>> For __bdos calls, we can add an additional argument if the object’s 
>>>>>>> first argument’s type includes the counted_by attribute, i.e
>>>>>>> 
>>>>>>> ***During gimplification, 
>>>>>>> For a call to __builtin_dynamic_object_size (ptr, type)
>>>>>>> Check whether the type of ptr includes counted_by attribute, if so, 
>>>>>>> change the call to
>>>>>>> __builtin_dynamic_object_size (ptr, type, counted_by field)
>>>>>>> 
>>>>>>> Then the correct data dependence should be represented well in the IR.
>>>>>>> 
>>>>>>> **During object size phase,
>>>>>>> 
>>>>>>> The call to __builtin_dynamic_object_size will become an expression 
>>>>>>> that includes the counted_by field, or -1/0 when we cannot decide the 
>>>>>>> size; the correct data dependence will be kept even after the call to 
>>>>>>> __builtin_dynamic_object_size is gone. 
>>>>>> 
>>>>>> But the whole point of the BOS pass is to derive information that is not 
>>>>>> available at parsing time, and those are the cases you are after.  The 
>>>>>> case where the connection to the field with the length is apparent 
>>>>>> during parsing is easy - you simply insert a load of the value before 
>>>>>> the BOS call.
>>>>> 
>>>>> Yes, this is true. 
>>>>> I prefer to implement this in the gimplification phase since I am more 
>>>>> familiar with the code there. (I think that implementing it in 
>>>>> gimplification should be very similar to implementing it in the FE? Or 
>>>>> do I miss anything here?)
>>>>> 
>>>>> Joseph, if we implement this in the FE, where should I look? 
>>>>> 
>>>> 
>>>> We should aim for a good integration with the BDOS pass, so
>>>> that it can propagate the information further, e.g. the 
>>>> following should work:
>>>> 
>>>> struct { int L; char buf[] __counted_by(L); } x;
>>>> x.L = N;
>>>> x.buf = ...;
>>>> char *p = x.buf;
>>>> __bdos(p) -> N
>>>> 
>>>> So we need to be smart on how we provide the size
>>>> information for x.buf to the backend. 
>>> 
>>> To follow up on this. I do not think we should change the
>>> builtin in the FE or gimplification. Instead, we want 
>>> to change the field access and compute the size there. 
>> Could you please clarify this? What do you mean by
>> "change the field access and compute the size there"?
> 
> I think the FE should essentially give the
> type
> 
> char [x.L]
> 
> to x.buf;
> 
> If the type (or its size) could be preserved
> at this point so that it can be later
> discovered by __bdos, then it could know 
> the size and propagate it further.
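
For comparison, a minimal sketch of the VLA case, where __bdos can already see
the size from the definition because the type carries it:

  #include <stddef.h>

  void g (int n)
  {
    char buf[n];          /* the type char[n] itself records the size */
    char *p = buf;
    size_t s = __builtin_dynamic_object_size (p, 1);  /* should yield n */
    (void) s;
  }

The suggestion above would give x.buf a similarly size-carrying type so that
BDOS can discover and propagate it.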

Currently, we already store t
