date:20240129

[PATCH v5 4/5] LoongArch: Added support for loading __tls_get_addr symbol address using call36.

2024-01-29 Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: 
New test.
---
 gcc/config/loongarch/loongarch.cc | 22 ++-
 ...icit-relocs-medium-call36-auto-tls-ld-gd.c |  5 +
 2 files changed, 21 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c
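
In the medium code model with explicit relocs, when HAVE_AS_SUPPORT_CALL36 is set this patch emits the call to __tls_get_addr through gen_call_value_internal instead of a pcalau12i followed by gen_call_value_internal_1, and the new test expects a `pcaddu18i $r1,%call36(__tls_get_addr)` sequence. As a rough illustration of how one pcaddu18i/jirl pair can reach the callee, the sketch below splits a PC-relative byte offset between the two immediates. It is a sketch under stated assumptions, not the assembler's or linker's code: it assumes pcaddu18i adds its sign-extended 20-bit immediate shifted left by 18 to its own address, and jirl adds its sign-extended 16-bit immediate shifted left by 2 to its base register. The helpers split_call36 and join_call36 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: split OFFSET (callee address minus the address of
   the pcaddu18i) into the two immediates, assuming
     pcaddu18i rd, si20       ->  rd = pc + (si20 << 18)
     jirl      ra, rd, offs16 ->  target = rd + (offs16 << 2)
   Returns false if OFFSET is misaligned or out of reach.  */
static bool
split_call36 (int64_t offset, int32_t *si20, int32_t *offs16)
{
  /* Reject misaligned offsets; the coarse bound keeps the arithmetic
     below far away from int64_t overflow.  */
  if (offset % 4 != 0
      || offset < -((int64_t) 1 << 40) || offset > ((int64_t) 1 << 40))
    return false;

  /* jirl supplies the low 18 bits, sign-extended.  */
  uint64_t low18 = (uint64_t) offset & 0x3ffff;
  int64_t lo = (int64_t) (low18 ^ 0x20000) - 0x20000;

  /* pcaddu18i supplies the rest, an exact multiple of 2^18.  */
  int64_t hi = (offset - lo) / ((int64_t) 1 << 18);
  if (hi < -((int64_t) 1 << 19) || hi > ((int64_t) 1 << 19) - 1)
    return false;

  *si20 = (int32_t) hi;
  *offs16 = (int32_t) (lo / 4);
  return true;
}

/* Offset reached by the pair for the given immediates.  */
static int64_t
join_call36 (int32_t si20, int32_t offs16)
{
  return (int64_t) si20 * ((int64_t) 1 << 18) + (int64_t) offs16 * 4;
}
```

Under these assumptions the pair reaches offsets of roughly ±2^37 bytes (about ±128 GiB) from the pcaddu18i, which is why no separate address register has to be set up before the call.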

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 684ae81870c..564de9c2642 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2807,17 +2807,27 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
         case CMODEL_MEDIUM:
           {
-            rtx reg = gen_reg_rtx (Pmode);
             if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
               {
-                emit_insn (gen_pcalau12i (Pmode, reg, loongarch_tls_symbol));
-                rtx call = gen_call_value_internal_1 (Pmode, v0, reg,
-                                                      loongarch_tls_symbol,
-                                                      const0_rtx);
-                insn = emit_call_insn (call);
+                rtx call;
+
+                if (HAVE_AS_SUPPORT_CALL36)
+                  call = gen_call_value_internal (v0, loongarch_tls_symbol,
+                                                  const0_rtx);
+                else
+                  {
+                    rtx reg = gen_reg_rtx (Pmode);
+                    emit_insn (gen_pcalau12i (Pmode, reg,
+                                             loongarch_tls_symbol));
+                    call = gen_call_value_internal_1 (Pmode, v0, reg,
+                                                      loongarch_tls_symbol,
+                                                      const0_rtx);
+                  }
+                insn = emit_call_insn (call);
               }
             else
               {
+                rtx reg = gen_reg_rtx (Pmode);
                 emit_move_insn (reg, loongarch_tls_symbol);
                 insn = emit_call_insn (gen_call_value_internal (v0,
                                                                 reg,
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c
new file mode 100644
index 000..d1a4820834c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=medium -fplt" } */
+/* { dg-final { scan-assembler "pcaddu18i\t\\\$r1,%call36\\\(__tls_get_addr\\\)" { target { tls_native && loongarch_call36_support } } } } */
+
+#include "./explicit-relocs-auto-tls-ld-gd.c"
-- 
2.39.3

[PATCH v5 0/5] When cmodel=extreme, add macro implementation and fix problems with explicit relocs implementation.

2024-01-29 Lulu Cheng

With -mcmodel=extreme, the symbol address is obtained through a sequence
of four instructions, so errors may occur during linking in some cases.
Xi Ruoyao's patch (5/5) fixes this problem.

https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model


v4 -> v5:
  1. Modify code format.
  2. Add Xi Ruoyao's patch implementing
     '-mcmodel=extreme -mexplicit-relocs=always'.

v3 -> v4:
  1. Add macro support for TLS symbols.
  2. Add support for loading the __tls_get_addr symbol address using call36.
  3. Merge template got_load_tls_{ld/gd/le/ie}.
  4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.


v2 -> v3:
  1. Modify the detection rules of a test case.

v1 -> v2:
  1. Use temporarily allocated registers as intermediate registers to
     implement the extreme macro.
  2. Fix bugs in the v1 test cases.



Lulu Cheng (4):
  LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.
  LoongArch: Add the macro implementation of mcmodel=extreme.
  LoongArch: Enable explicit reloc for extreme TLS GD/LD with
    -mexplicit-relocs=auto.
  LoongArch: Added support for loading __tls_get_addr symbol address
    using call36.

Xi Ruoyao (1):
  LoongArch: Don't split the instructions containing relocs for extreme
    code model.

 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch.cc | 265 ++
 gcc/config/loongarch/loongarch.md | 125 ++---
 gcc/config/loongarch/predicates.md|  12 +
 .../loongarch/cmodel-extreme-mi-thunk-1.C |  11 +
 .../loongarch/cmodel-extreme-mi-thunk-2.C |   6 +
 .../loongarch/cmodel-extreme-mi-thunk-3.C |   6 +
 .../gcc.target/loongarch/attr-model-5.c   |   8 +
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 ++
 .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 +
 .../explicit-relocs-extreme-auto-tls-ld-gd.c  |   5 +
 .../explicit-relocs-medium-auto-tls-ld-gd.c   |   5 +
 ...icit-relocs-medium-call36-auto-tls-ld-gd.c |   5 +
 .../loongarch/func-call-extreme-1.c   |  14 +-
 .../loongarch/func-call-extreme-2.c   |  29 +-
 .../loongarch/func-call-extreme-3.c   |   2 +-
 .../loongarch/func-call-extreme-4.c   |   2 +-
 .../loongarch/func-call-extreme-5.c   |   7 +
 .../loongarch/func-call-extreme-6.c   |   7 +
 .../gcc.target/loongarch/tls-extreme-macro.c  |  35 +++
 20 files changed, 375 insertions(+), 195 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-1.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-2.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-3.C
 create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-model-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-extreme-macro.c

-- 
2.39.3

[PATCH v5 2/5] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-29 Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, a non-zero addend in "la.local $rd,$rt,sym+addend"
is not allowed.
(loongarch_load_tls): Add macro support for the extreme code model.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New 
predicate.
(symbolic_off64_or_reg_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch.cc | 110 +++---
 gcc/config/loongarch/loongarch.md |  48 +++-
 gcc/config/loongarch/predicates.md|  12 ++
 .../gcc.target/loongarch/attr-model-5.c   |   8 ++
 .../loongarch/func-call-extreme-5.c   |   7 ++
 .../loongarch/func-call-extreme-6.c   |   7 ++
 .../gcc.target/loongarch/tls-extreme-macro.c  |  35 ++
 8 files changed, 184 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-model-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-extreme-macro.c
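
The loongarch_symbolic_constant_p hunk below moves SYMBOL_PCREL64 ahead of SYMBOL_PCREL, so a non-zero offset on a PCREL64 symbol is rejected unless explicit relocs are used for it, and otherwise falls through to the signed 32-bit range check that GAS imposes. A hypothetical C model of just that decision; the enum, the function name, and the boolean parameter standing in for loongarch_explicit_relocs_p (SYMBOL_PCREL64) are illustrative, not GCC code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the two symbol types in the hunk.  */
enum model_symbol_type { MODEL_SYMBOL_PCREL, MODEL_SYMBOL_PCREL64 };

/* Hypothetical model: may a symbol of TYPE carry the non-zero OFFSET?  */
static bool
model_offset_ok (enum model_symbol_type type, bool explicit_relocs_for_pcrel64,
                 int64_t offset)
{
  switch (type)
    {
    case MODEL_SYMBOL_PCREL64:
      /* Without explicit relocs the address would come from the
         "la.local $rd,$rt,sym+addend" macro, where a non-zero addend
         is not allowed.  */
      if (!explicit_relocs_for_pcrel64)
        return false;
      /* fall through */
    case MODEL_SYMBOL_PCREL:
      /* GAS rejects offsets outside [-2^31, 2^31-1]; equivalent to
         sext_hwi (offset, 32) == offset.  */
      return offset >= INT32_MIN && offset <= INT32_MAX;
    }
  return false;
}
```

The fall-through mirrors the shape of the patch: PCREL64 only adds a precondition and then shares the PCREL range test.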

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 9ffc92afead..1fdfda9af01 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -222,4 +222,5 @@ extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool);
 extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool);
 extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode);
 extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type);
+extern bool loongarch_symbol_extreme_p (enum loongarch_symbol_type);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 7b4edf1c1fd..a0c14f908a8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1935,8 +1935,13 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)
  relocations.  */
   switch (*symbol_type)
 {
-case SYMBOL_PCREL:
 case SYMBOL_PCREL64:
+  /* When the code model is extreme, the non-zero offset situation
+has not been handled well, so it is disabled here now.  */
+  if (!loongarch_explicit_relocs_p (SYMBOL_PCREL64))
+   return false;
+/* fall through */
+case SYMBOL_PCREL:
   /* GAS rejects offsets outside the range [-2^31, 2^31-1].  */
   return sext_hwi (INTVAL (offset), 32) == INTVAL (offset);
 
@@ -2739,9 +2744,15 @@ static GTY (()) rtx loongarch_tls_symbol;
 /* Load an entry for a TLS access.  */
 
 static rtx
-loongarch_load_tls (rtx dest, rtx sym)
+loongarch_load_tls (rtx dest, rtx sym, enum loongarch_symbol_type type)
 {
-  return gen_load_tls (Pmode, dest, sym);
+  /* TLS LE gets a 32 or 64 bit offset here, so one register can do it.  */
+  if (type == SYMBOL_TLS_LE)
+return gen_load_tls (Pmode, dest, sym);
+
+  return loongarch_symbol_extreme_p (type)
+? gen_movdi_symbolic_off64 (dest, sym, gen_reg_rtx (DImode))
+: gen_load_tls (Pmode, dest, sym);
 }
 
 /* Return an instruction sequence that calls __tls_get_addr.  SYM is
@@ -2773,8 +2784,6 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   if (TARGET_CMODEL_EXTREME)
{
- gcc_assert (TARGET_EXPLICIT_RELOCS);
-
  rtx tmp1 = gen_reg_rtx (Pmode);
  emit_insn (gen_tls_low (Pmode, tmp1, gen_rtx_REG (Pmode, 0), loc));
  emit_insn (gen_lui_h_lo20 (tmp1, tmp1, loc));
@@ -2785,7 +2794,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
emit_insn (gen_tls_low (Pmode, a0, high, loc));
 }
   else
-emit_insn (loongarch_load_tls (a0, loc));
+emit_insn (loongarch_load_tls (a0, loc, type));
 
   if (flag_plt)
 {
@@ -2852,22 +2861,28 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
case CMODEL_EXTREME:

[PATCH v5 3/5] LoongArch: Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

2024-01-29 Lulu Cheng

Binutils does not support relaxing the four-instruction sequences used
to obtain symbol addresses in the extreme code model.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the symbol's code model is extreme and -mexplicit-relocs=auto is in
effect, the macro instruction for loading the symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New 
test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New 
test.
---
 gcc/config/loongarch/loongarch.cc | 19 +--
 .../explicit-relocs-extreme-auto-tls-ld-gd.c  |  5 +
 .../explicit-relocs-medium-auto-tls-ld-gd.c   |  5 +
 3 files changed, 19 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index a0c14f908a8..684ae81870c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1971,6 +1971,10 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
   if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
 return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
 
+  /* The linker don't know how to relax accesses in extreme code model.  */
+  if (loongarch_symbol_extreme_p (type))
+return true;
+
   switch (type)
 {
   case SYMBOL_TLS_IE:
@@ -1982,11 +1986,6 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
   does not relax 64-bit pc-relative accesses as at now.  */
return true;
   case SYMBOL_GOT_DISP:
-   /* The linker don't know how to relax GOT accesses in extreme
-  code model.  */
-   if (TARGET_CMODEL_EXTREME)
- return true;
-
/* If we are performing LTO for a final link, and we have the
   linker plugin so we know the resolution of the symbols, then
   all GOT references are binding to external symbols or
@@ -2776,7 +2775,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
+  if (loongarch_explicit_relocs_p (type))
 {
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
@@ -2809,7 +2808,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
case CMODEL_MEDIUM:
{
  rtx reg = gen_reg_rtx (Pmode);
- if (TARGET_EXPLICIT_RELOCS)
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
  emit_insn (gen_pcalau12i (Pmode, reg, loongarch_tls_symbol));
  rtx call = gen_call_value_internal_1 (Pmode, v0, reg,
@@ -2845,7 +2844,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
case CMODEL_NORMAL:
case CMODEL_MEDIUM:
{
- if (TARGET_EXPLICIT_RELOCS)
+ if (loongarch_explicit_relocs_p (SYMBOL_GOT_DISP))
{
  rtx high = gen_reg_rtx (Pmode);
  loongarch_emit_move (high,
@@ -2939,7 +2938,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tmp1 = gen_reg_rtx (Pmode);
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
  dest = gen_reg_rtx (Pmode);
- if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+ if (loongarch_explicit_relocs_p (SYMBOL_TLS_IE))
{
  tmp3 = gen_reg_rtx (Pmode);
  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (tmp2));
@@ -2996,7 +2995,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
  dest = gen_reg_rtx (Pmode);
 
- if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+ if (loongarch_explicit_relocs_p (SYMBOL_TLS_LE))
{
  tmp3 = gen_reg_rtx (Pmode);
  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (tmp2));
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
new file mode 100644
index 000..35bd4570a9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } */
+/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } */
+
+#include "./explicit-relocs-auto-tls-ld-gd.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-r

[PATCH v5 5/5] LoongArch: Don't split the instructions containing relocs for extreme code model.

2024-01-29 Lulu Cheng

From: Xi Ruoyao 

The ABI requires the pcalau12i/addi.d/lu32i.d/lu52i.d instructions
that address a symbol to be adjacent.  So model them as "one large
instruction", i.e. a single define_insn, with two output registers.
The real address is the sum of these two registers.

The advantage of this approach is that the RTL passes can still use
ldx/stx instructions to skip an addi.d instruction.
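
To make the "two output registers" idea concrete, here is a small C simulation of the four-instruction sequence, written as a sketch under assumptions. The instruction semantics follow our reading of the LoongArch reference manual: pcalau12i yields the 4 KiB page of PC plus (hi20 << 12), addi.d from $zero sign-extends lo12, lu32i.d replaces bits 63..32 with the sign-extended lo20, and lu52i.d replaces bits 63..52 with hi12. The split_pcrel64 helper picks one set of immediates for which part1 + part2 equals the symbol address; it is derived from those semantics only and is not the psABI's relocation arithmetic. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of V (BITS < 64).  */
static int64_t
sext_bits (uint64_t v, unsigned bits)
{
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  v &= (sign << 1) - 1;
  return (int64_t) (v ^ sign) - (int64_t) sign;
}

/* Immediates of the hypothetical sequence
     pcalau12i part1, hi20
     addi.d    part2, $zero, lo12
     lu32i.d   part2, lo20
     lu52i.d   part2, part2, hi12  */
struct pcrel64_imms { int32_t hi20, lo12, lo20, hi12; };

/* pcalau12i: 4 KiB page of (pc + (hi20 << 12)).  */
static uint64_t
sim_pcalau12i (uint64_t pc, int32_t hi20)
{
  return (pc + (uint64_t) ((int64_t) hi20 * 4096)) & ~(uint64_t) 0xfff;
}

/* addi.d from $zero, then lu32i.d, then lu52i.d.  */
static uint64_t
sim_part2 (struct pcrel64_imms m)
{
  uint64_t r = (uint64_t) (int64_t) m.lo12;
  r = (r & UINT64_C (0xffffffff)) | ((uint64_t) (int64_t) m.lo20 << 32);
  r = (r & UINT64_C (0x000fffffffffffff)) | ((uint64_t) (m.hi12 & 0xfff) << 52);
  return r;
}

/* Choose immediates so that part1 + part2 == sym (mod 2^64).  */
static struct pcrel64_imms
split_pcrel64 (uint64_t pc, uint64_t sym)
{
  struct pcrel64_imms m;
  m.lo12 = (int32_t) sext_bits (sym, 12);
  /* addi.d fixes bits 31..0 of part2.  */
  uint64_t low32 = (uint64_t) (int64_t) m.lo12 & UINT64_C (0xffffffff);
  /* pcalau12i must supply the rest of the low 32 bits; a multiple of 4096.  */
  uint32_t delta = (uint32_t) (sym - low32 - (pc & ~(uint64_t) 0xfff));
  m.hi20 = (int32_t) sext_bits (delta >> 12, 20);
  /* What remains is exactly bits 63..32 of part2.  */
  uint64_t upper = (sym - low32 - sim_pcalau12i (pc, m.hi20)) >> 32;
  m.lo20 = (int32_t) sext_bits (upper, 20);
  m.hi12 = (int32_t) sext_bits (upper >> 20, 12);
  return m;
}

/* The address the two registers describe.  */
static uint64_t
materialize_pcrel64 (uint64_t pc, struct pcrel64_imms m)
{
  return sim_pcalau12i (pc, m.hi20) + sim_part2 (m);
}
```

Because only the sum part1 + part2 is meaningful, a memory access can use the two registers directly as base and index with ldx/stx, which is the flexibility the description relies on.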

gcc/ChangeLog:

* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr.  Emit a REG_EQUAL
note to allow CSE of the __tls_get_addr address.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code.  If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
---
 gcc/config/loongarch/loongarch.cc | 131 ++
 gcc/config/loongarch/loongarch.md |  20 +++
 .../loongarch/cmodel-extreme-mi-thunk-1.C |  11 ++
 .../loongarch/cmodel-extreme-mi-thunk-2.C |   6 +
 .../loongarch/cmodel-extreme-mi-thunk-3.C |   6 +
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 +++
 .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 +
 .../loongarch/func-call-extreme-1.c   |  14 +-
 .../loongarch/func-call-extreme-2.c   |  29 +---
 .../loongarch/func-call-extreme-3.c   |   2 +-
 .../loongarch/func-call-extreme-4.c   |   2 +-
 11 files changed, 154 insertions(+), 92 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-1.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-2.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/cmodel-extreme-mi-thunk-3.C
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 564de9c2642..89dd33553da 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2737,7 +2737,7 @@ loongarch_add_offset (rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (Pmode, reg, offset);
 }
 
-/* The __tls_get_attr symbol.  */
+/* The __tls_get_addr symbol.  */
 static GTY (()) rtx loongarch_tls_symbol;
 
 /* Load an entry for a TLS access.  */
@@ -2777,20 +2777,22 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   if (loongarch_explicit_relocs_p (type))
 {
-  /* Split tls symbol to high and low.  */
-  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
-  high = loongarch_force_temporary (tmp, high);
-
   if (TARGET_CMODEL_EXTREME)
{
- rtx tmp1 = gen_reg_rtx (Pmode);
- emit_insn (gen_tls_low (Pmode, tmp1, gen_rtx_REG (Pmode, 0), loc));
- emit_insn (gen_lui_h_lo20 (tmp1, tmp1, loc));
- emit_insn (gen_lui_h_hi12 (tmp1, tmp1, loc));
- emit_move_insn (a0, gen_rtx_PLUS (Pmode, high, tmp1));
+ rtx part1 = gen_reg_rtx (Pmode);
+ rtx part2 = gen_reg_rtx (Pmode);
+
+ emit_insn (gen_la_pcrel64_two_parts (part1, part2, loc));
+ emit_move_insn (a0, gen_rtx_PLUS (Pmode, part1, part2));
}
   else
-   emit_insn (gen_tls_low (Pmode, a0, high, loc));
+   {
+ /* Split tls symbol to high and low.  */
+ rtx high = g

[PATCH] tree-ssa-strlen: Fix pdata->maxlen computation [PR110603]

2024-01-29 Jakub Jelinek

Hi!

On the following testcase we emit an invalid range of [2, 1] due to
UB in the source.  Older VRP code silently swapped the boundaries and
made a [1, 2] range out of it, but newer code just ICEs on it.

pdata->minlen is 2 because we see a memcpy in this case setting both
elements of the array to non-zero values, so strlen (a) can't be smaller
than 2.  pdata->maxlen is 1 because in a char a[2] array without UB
there can be at most 1 non-zero character, as the buffer also needs
room for the '\0' terminator.

IMHO we shouldn't create invalid ranges like that, and even creating
a [1, 2] range for that case looks wrong to me, so the following patch
simply doesn't set maxlen to the array size - 1 in that case, matching
what will really happen at runtime when such UB is triggered (strlen
will be at least 2, perhaps more, or it will crash).
This is what the second hunk of the patch does.
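
The second hunk's rule can be modelled in a few lines of C. This is a hypothetical sketch, not the tree-ssa-strlen.cc code: struct strlen_range and model_strlen_range are illustrative names, SIZE_MAX plays the role of build_all_ones_cst (size_type_node), and OFF is assumed to be smaller than the array size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds for strlen (base + off), where BASE is a char array of
   ARRAY_SIZE bytes and at least NONZERO_MIN bytes from OFF on are
   known to be non-zero.  */
struct strlen_range { size_t minlen, maxlen; };

static struct strlen_range
model_strlen_range (size_t nonzero_min, size_t array_size, size_t off)
{
  struct strlen_range r;
  r.minlen = nonzero_min;
  /* Longest nul-terminated string that fits: leave room for the '\0'.  */
  r.maxlen = array_size - (off + 1);
  if (r.maxlen < r.minlen)
    /* Only reachable with UB in the source (e.g. char a[2] filled with
       "12").  Keep minlen and give up on the upper bound rather than
       producing an invalid range such as [2, 1].  */
    r.maxlen = SIZE_MAX;
  return r;
}
```

For the char a[2] case the model keeps minlen = 2 and leaves the upper bound unknown instead of producing [2, 1]; with 20 known non-zero bytes in a char a[42] it gives [20, 41].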

The first hunk fixes a thinko that is fortunately harmless.
If the strlen pass knows the string length (i.e. the get_string_length
function returns non-NULL), we take a different path; we only get to
this code if all we know is that there is a certain number of non-zero
characters, but not what follows them: further non-zero characters,
zero termination, or either of those.
If we know exactly how many non-zero characters there are, such as
char a[42];
...
  memcpy (a, "01234567890123456789", 20);
then we take an earlier if for the INTEGER_CST case and correctly set
just pdata->minlen to 20, but if we have something like
  int len;
  ...
  if (len < 15 || len > 32) return;
  memcpy (a, "0123456789012345678901234567890123456789", len);
then we have a [15, 32] range for nonzero_chars, and we correctly set
pdata->minlen to 15 but also incorrectly set pdata->maxlen to 32.  That
is not what the above implies; it only means that in some cases we know
there are at least 32 non-zero characters, followed by something we
don't know.  There is no guarantee that a '\0' comes right after them,
so the upper bound of 32 means nothing.
This is harmless, just confusing, because the code a few lines later
fortunately overwrites the incorrect pdata->maxlen value with something
different (either the array length - 1, all ones, etc.).
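
The point about the [15, 32] case can be checked with ordinary, well-defined C: knowing that the first LEN bytes are non-zero bounds strlen from below, but the bytes after them decide the actual length. A small sketch (the helper name and the 42-byte buffer are illustrative only):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN (<= 40) non-zero bytes to the front of a 42-byte buffer whose
   other bytes are FILLER, except the last one, which is always '\0'.  */
static size_t
strlen_after_copy (size_t len, char filler)
{
  char a[42];
  memset (a, filler, sizeof a - 1);
  a[sizeof a - 1] = '\0';
  memcpy (a, "0123456789012345678901234567890123456789", len);
  return strlen (a);
}
```

With a zero filler the length is exactly LEN; with a non-zero filler it is 41 for every LEN in [15, 32], so the upper end of the nonzero_chars range is not an upper bound on strlen.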

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-29  Jakub Jelinek  

PR tree-optimization/110603
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
setting of pdata->maxlen to vr.upper_bound (which is unconditionally
overwritten anyway).  Avoid creating invalid range with minlen
larger than maxlen.  Formatting fix.

* gcc.c-torture/compile/pr110603.c: New test.

--- gcc/tree-ssa-strlen.cc.jj   2024-01-03 11:51:32.664715465 +0100
+++ gcc/tree-ssa-strlen.cc  2024-01-27 13:32:25.506401969 +0100
@@ -1228,7 +1228,6 @@ get_range_strlen_dynamic (tree src, gimp
{
  tree type = vr.type ();
  pdata->minlen = wide_int_to_tree (type, vr.lower_bound ());
- pdata->maxlen = wide_int_to_tree (type, vr.upper_bound ());
}
}
  else
@@ -1253,9 +1252,21 @@ get_range_strlen_dynamic (tree src, gimp
{
  ++off;   /* Increment for the terminating nul.  */
  tree toffset = build_int_cst (size_type_node, off);
- pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node, size,
-  toffset);
- pdata->maxbound = pdata->maxlen;
+ pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
+  size, toffset);
+ if (tree_int_cst_lt (pdata->maxlen, pdata->minlen))
+   /* This can happen when triggering UB, when base is an
+  array which is known to be filled with at least size
+  non-zero bytes.  E.g. for
+  char a[2]; memcpy (a, "12", sizeof a);
+  We don't want to create an invalid range [2, 1]
+  where 2 comes from the number of non-zero bytes and
+  1 from longest valid zero-terminated string that can
+  be stored in such an array, so pick just one of
+  those, pdata->minlen.  See PR110603.  */
+   pdata->maxlen = build_all_ones_cst (size_type_node);
+ else
+   pdata->maxbound = pdata->maxlen;
}
  else  
pdata->maxlen = build_all_ones_cst (size_type_node);
--- gcc/testsuite/gcc.c-torture/compile/pr110603.c.jj   2024-01-27 13:37:29.375194755 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr110603.c  2024-01-27 13:37:03.104558479 +0100
@@ -0,0 +1,16 @@
+/* PR tree-optimization/110603 */
+
+typedef __SIZE_TYPE__ size_t;
+void *memcpy (void *, const void *, size_t);
+in

[PATCH v5 1/5] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-29 Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd): Delete.
(@load_tls): New template.
(@got_load_tls_ld): Delete.
(@got_load_tls_le): Delete.
(@got_load_tls_ie): Delete.
---
 gcc/config/loongarch/loongarch.cc | 47 +---
 gcc/config/loongarch/loongarch.md | 59 ---
 2 files changed, 30 insertions(+), 76 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index b494040d165..7b4edf1c1fd 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2736,36 +2736,12 @@ loongarch_add_offset (rtx temp, rtx reg, HOST_WIDE_INT offset)
 /* The __tls_get_attr symbol.  */
 static GTY (()) rtx loongarch_tls_symbol;
 
-/* Load an entry from the GOT for a TLS GD access.  */
+/* Load an entry for a TLS access.  */
 
 static rtx
-loongarch_got_load_tls_gd (rtx dest, rtx sym)
+loongarch_load_tls (rtx dest, rtx sym)
 {
-  return gen_got_load_tls_gd (Pmode, dest, sym);
-}
-
-/* Load an entry from the GOT for a TLS LD access.  */
-
-static rtx
-loongarch_got_load_tls_ld (rtx dest, rtx sym)
-{
-  return gen_got_load_tls_ld (Pmode, dest, sym);
-}
-
-/* Load an entry from the GOT for a TLS IE access.  */
-
-static rtx
-loongarch_got_load_tls_ie (rtx dest, rtx sym)
-{
-  return gen_got_load_tls_ie (Pmode, dest, sym);
-}
-
-/* Add in the thread pointer for a TLS LE access.  */
-
-static rtx
-loongarch_got_load_tls_le (rtx dest, rtx sym)
-{
-  return gen_got_load_tls_le (Pmode, dest, sym);
+  return gen_load_tls (Pmode, dest, sym);
 }
 
 /* Return an instruction sequence that calls __tls_get_addr.  SYM is
@@ -2809,14 +2785,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
emit_insn (gen_tls_low (Pmode, a0, high, loc));
 }
   else
-{
-  if (type == SYMBOL_TLSLDM)
-   emit_insn (loongarch_got_load_tls_ld (a0, loc));
-  else if (type == SYMBOL_TLSGD)
-   emit_insn (loongarch_got_load_tls_gd (a0, loc));
-  else
-   gcc_unreachable ();
-}
+emit_insn (loongarch_load_tls (a0, loc));
 
   if (flag_plt)
 {
@@ -2953,10 +2922,10 @@ loongarch_legitimize_tls_address (rtx loc)
  /* la.tls.ie; tp-relative add.  */
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
  tmp1 = gen_reg_rtx (Pmode);
+ tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
  dest = gen_reg_rtx (Pmode);
  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
- tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
  tmp3 = gen_reg_rtx (Pmode);
  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (tmp2));
  high = loongarch_force_temporary (tmp3, high);
@@ -2979,7 +2948,7 @@ loongarch_legitimize_tls_address (rtx loc)
emit_insn (gen_ld_from_got (Pmode, tmp1, high, tmp2));
}
  else
-   emit_insn (loongarch_got_load_tls_ie (tmp1, loc));
+   emit_insn (loongarch_load_tls (tmp1, tmp2));
  emit_insn (gen_add3_insn (dest, tmp1, tp));
}
   break;
@@ -3011,11 +2980,11 @@ loongarch_legitimize_tls_address (rtx loc)
 
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
  tmp1 = gen_reg_rtx (Pmode);
+ tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
  dest = gen_reg_rtx (Pmode);
 
  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
- tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
  tmp3 = gen_reg_rtx (Pmode);
  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (tmp2));
  high = loongarch_force_temporary (tmp3, high);
@@ -3043,7 +3012,7 @@ loongarch_legitimize_tls_address (rtx loc)
}
}
  else
-   emit_insn (loongarch_got_load_tls_le (tmp1, loc));
+   emit_insn (loongarch_load_tls (tmp1, tmp2));
  emit_insn (gen_add3_insn (dest, tmp1, tp));
}
   break;
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index dda3cdf8be5..231c6568c85 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -51,10 +51,7 @@ (define_c_enum "unspec" [
   UNSPEC_BITREV_8B
 
   ;; TLS
-  UNSPEC_TLS_GD
-  UNSPEC_TLS_LD
-  UNSPEC_TLS_LE
-  UNSPEC_TLS_IE
+  UNSPEC_TLS
 
   ;; Stack tie
   UNSPEC_TIE
@@ -2701,45 +2698,33 @@ (define_insn "store_word"
 
 ;; Thread-Local Storage
 
-(define_insn "@got_load_tls_gd"
+(define_insn "@load

Re: [PATCH] tree-ssa-strlen: Fix pdata->maxlen computation [PR110603]

2024-01-29 Richard Biener

On Mon, 29 Jan 2024, Jakub Jelinek wrote:

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

Richard.

> 2024-01-29  Jakub Jelinek  
> 
>   PR tree-optimization/110603
>   * tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
>   setting of pdata->maxlen to vr.upper_bound (which is unconditionally
>   overwritten anyway).  Avoid creating invalid range with minlen
>   larger than maxlen.  Formatting fix.
> 
>   * gcc.c-torture/compile/pr110603.c: New test.
> 
> --- gcc/tree-ssa-strlen.cc.jj 2024-01-03 11:51:32.664715465 +0100
> +++ gcc/tree-ssa-strlen.cc2024-01-27 13:32:25.506401969 +0100
> @@ -1228,7 +1228,6 @@ get_range_strlen_dynamic (tree src, gimp
>   {
> tree type = vr.type ();
> pdata->minlen = wide_int_to_tree (type, vr.lower_bound ());
> -   pdata->maxlen = wide_int_to_tree (type, vr.upper_bound ());
>   }
>   }
> else
> @@ -1253,9 +1252,21 @@ get_range_strlen_dynamic (tree src, gimp
>              {
>                ++off;   /* Increment for the terminating nul.  */
>                tree toffset = build_int_cst (size_type_node, off);
> -              pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node, size,
> -                                           toffset);
> -              pdata->maxbound = pdata->maxlen;
> +              pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
> +                                           size, toffset);
> +              if (tree_int_cst_lt (pdata->maxlen, pdata->minlen))
> +                /* This can happen when triggering UB, when base is an
> +                   array which is known to be filled with at least size
> +                   non-zero bytes.  E.g. for
> +                   char a[2]; memcpy (a, "12", sizeof a);
> +                   We don't want to create an invalid range [2, 1]
> +                   where 2 comes from the number of non-zero bytes and
> +                   1 from longest valid zero-terminated string that can
> +                   be stored in such an array, so pick just one of
> +                   those, pdata->minlen.  See PR110603.  */
> +                pdata->maxlen = build_all_ones_cst (size_type_node);
> +              else
> +                pdata->maxbound = pdata->maxlen;
>              }
>            else
>              pdata->maxlen = build_all_ones_cst (size_type_node);
> --- gcc/testsuite/gcc.c-torture/compile/pr110603.c.jj 2024-01-27 13:37:29.375194755 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr110603.c 2024-01-
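The PR110603 comment in the hunk above describes a range sanity check. A minimal standalone C sketch of the same idea, assuming a hypothetical `struct len_range` standing in for the `pdata` fields and `SIZE_MAX` standing in for `build_all_ones_cst (size_type_node)`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pdata->minlen/maxlen/maxbound.  */
struct len_range
{
  size_t minlen;    /* number of leading non-zero bytes known */
  size_t maxlen;    /* SIZE_MAX means "no useful upper bound" */
  size_t maxbound;
};

/* Derive the upper bound from an array of SIZE bytes starting at offset
   OFF: the longest nul-terminated string that fits is SIZE - (OFF + 1).
   If that is below MINLEN, the program already has UB (no room for the
   nul), so drop the upper bound instead of building [minlen, smaller].  */
static void
set_maxlen_from_array (struct len_range *r, size_t size, size_t off)
{
  ++off;  /* Account for the terminating nul.  */
  assert (off <= size);
  r->maxlen = size - off;
  if (r->maxlen < r->minlen)
    r->maxlen = SIZE_MAX;
  else
    r->maxbound = r->maxlen;
}
```

For `char a[2]; memcpy (a, "12", sizeof a);` the minimum is 2 and the computed maximum would be 1, so the sketch keeps only the minimum, as the patch does.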

[PATCH] aarch64: Ensure iterator validity when updating debug uses [PR113616]

2024-01-29 Thread Alex Coplan

Hi,

The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop.  In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.

This patch fixes that by introducing a general mechanism to avoid this
sort of problem.  We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value.  It then
pre-computes the next iterator value at each iteration, so it doesn't
matter if the original iterator got invalidated during the loop body; we
can still move safely to the next iteration.

We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
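A minimal C++ sketch of the idea, written against `std::list` rather than RTL-SSA use lists. The names mirror the patch, but this is an illustration under those assumptions, not the iterator-utils.h implementation. The wrapper caches the successor before the loop body runs, so the body may erase the element it is visiting:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Wraps an iterator and remembers the next position up front, so the
// loop body may invalidate the current iterator (but not the next one).
template <typename Iter>
class safe_iterator
{
public:
  safe_iterator (Iter it, Iter end) : m_current (it), m_next (it), m_end (end)
  {
    if (m_current != m_end)
      m_next = std::next (m_current);
  }

  typename std::iterator_traits<Iter>::reference operator* () const
  {
    return *m_current;
  }

  safe_iterator &operator++ ()
  {
    // Advance via the cached successor, never the (possibly dead) current.
    m_current = m_next;
    if (m_current != m_end)
      m_next = std::next (m_current);
    return *this;
  }

  bool operator!= (const safe_iterator &other) const
  {
    return m_current != other.m_current;
  }

private:
  Iter m_current;
  Iter m_next;
  Iter m_end;
};

// Adapts a container so that range-based for loops use safe_iterators.
template <typename Container>
class safe_range
{
public:
  using iter = typename Container::iterator;
  explicit safe_range (Container &c) : m_c (c) {}
  safe_iterator<iter> begin () const { return { m_c.begin (), m_c.end () }; }
  safe_iterator<iter> end () const { return { m_c.end (), m_c.end () }; }

private:
  Container &m_c;
};

template <typename Container>
safe_range<Container>
iterate_safely (Container &c)
{
  return safe_range<Container> (c);
}

// Drop every even value while iterating, much as resetting a debug use
// removes it from its use list mid-loop.  Returns the number of visits.
int
remove_evens (std::list<int> &uses)
{
  int visited = 0;
  for (int &u : iterate_safely (uses))
    {
      ++visited;
      int value = u;  // copy before the node is destroyed
      if (value % 2 == 0)
        uses.remove (value);
    }
  return visited;
}
```

Caching the successor tolerates removal of the current element, which is what resetting a debug use does; it would not tolerate removal of the next element.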

We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616.  While doing this, I
remembered that cleanup_tombstones () had the same problem.  I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.

While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast to as_a,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.

gcc/testsuite/ChangeLog:

PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 932a6398ae3..22ed95eb743 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1480,7 +1480,7 @@ fixup_debug_uses_trailing_add (obstack_watermark &attempt,
   def_info *def = defs[0];
 
   if (auto set = safe_dyn_cast (def->prev_def ()))
-for (auto use : set->debug_insn_uses ())
+for (auto use : iterate_safely (set->debug_insn_uses ()))
   if (*use->insn () > *pair_dst)
// DEF is getting re-ordered above USE, fix up USE accordingly.
fixup_debug_use (attempt, use, def, base, wb_offset);
@@ -1544,13 +1544,16 @@ fixup_debug_uses (obstack_watermark &attempt,
   auto def = memory_access (insns[0]->defs ());
   auto last_def = memory_access (insns[1]->defs ());
   for (; def != last_def; def = def->next_def ())
-   for (auto use : as_a (def)->debug_insn_uses ())
- {
-   if (dump_file)
- fprintf (dump_file, "  i%d: resetting debug use of mem\n",
-  use->insn ()->uid ());
-   reset_debug_use (use);
- }
+   {
+ auto set = as_a (def);
+ for (auto use : iterate_safely (set->debug_insn_uses ()))
+   {
+ if (dump_file)
+   fprintf (dump_file, "  i%d: resetting debug use of mem\n",
+use->insn ()->uid ());
+ reset_debug_use (use);
+   }
+   }
 }
 
   // Now let's take care of register uses, starting with debug uses
@@ -1577,7 +1580,7 @@ fixup_debug_uses (obstack_watermark &attempt,
 
   // Now that we've characterized the defs involved, go through the
   // debug uses and determine how to update them (if needed).
-  for (auto use : set->debug_insn_uses ())
+  for (auto use : iterate_safely (set->debug_insn_uses ()))
{
  if (*pair_dst < *use->insn () && defs[1])
// We're re-ordering defs[1] above a previous use of the
@@ -1609,7 +1612,7 @@ fixup_debug_uses (obstack_watermark &attempt,
 
   // We have a def in insns[1] which isn't def'd by the first insn.
   // Look to the previous def and see if it has any debug uses.
-  for (auto use : prev_set->debug_insn_uses ())
+  for (auto use : iterate_safely (prev_set->debug_insn_uses ()))
if (*pair_dst < *use->insn ())
  // We're ordering DEF above a previous use of the same register.
  update_debug_use (use, def, writeback_pat);
@@ -1622,7 +1625,8 @@ fixup_debug_uses (obstack_watermark &attempt,
   // second writeback def which need re-parenting: do that.
   auto def = find_access (insns[1]->defs (), base_regno);
   gcc_assert (def);
-

[PATCH] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-01-29 Thread Jin Ma

When '%ld' is used to print a 'long long int' variable, 'fprintf'
produces garbled output on 32-bit systems, resulting in an incorrect
instruction being generated, such as 'th.lwib a1,(a0),-16,4294967295'.
The following error then occurs during compilation:

Assembler messages:
Error: improper immediate value (18446744073709551615)

gcc/ChangeLog:

* config/riscv/thead.cc (th_print_operand_address): Change %ld
to %lld.
---
 gcc/config/riscv/thead.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2955bc5f8a9..9ee6444b627 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -1141,7 +1141,7 @@ th_print_operand_address (FILE *file, machine_mode mode, rtx x)
   return true;
 
 case ADDRESS_REG_WB:
-  fprintf (file, "(%s),%ld,%u", reg_names[REGNO (addr.reg)],
+  fprintf (file, "(%s),%lld,%u", reg_names[REGNO (addr.reg)],
   INTVAL (addr.offset) >> addr.shift, addr.shift);
return true;
 
-- 
2.17.1

Re: [PATCH] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-01-29 Thread Andrew Pinski

On Mon, Jan 29, 2024 at 1:21 AM Jin Ma  wrote:
>
> When using  '%ld' to print 'long long int' variable, 'fprintf' will
> produce messy output on a 32-bit system, in an incorrect instruction
> being generated, such as 'th.lwib a1,(a0),-16,4294967295'. And the
> following error occurred during compilation:
>
> Assembler messages:
> Error: improper immediate value (18446744073709551615)
>
> gcc/ChangeLog:
>
> * config/riscv/thead.cc (th_print_operand_address): Change %ld
> to %lld.
> ---
>  gcc/config/riscv/thead.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> index 2955bc5f8a9..9ee6444b627 100644
> --- a/gcc/config/riscv/thead.cc
> +++ b/gcc/config/riscv/thead.cc
> @@ -1141,7 +1141,7 @@ th_print_operand_address (FILE *file, machine_mode mode, rtx x)
>return true;
>
>  case ADDRESS_REG_WB:
> -  fprintf (file, "(%s),%ld,%u", reg_names[REGNO (addr.reg)],
> +  fprintf (file, "(%s),%lld,%u", reg_names[REGNO (addr.reg)],
>INTVAL (addr.offset) >> addr.shift, addr.shift);


This is wrong, you should instead use HOST_WIDE_INT_PRINT_DEC or
HOST_WIDE_INT_PRINT_UNSIGNED.
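HOST_WIDE_INT_PRINT_DEC is GCC's host-dependent format string for a HOST_WIDE_INT. Standard C offers the same pattern through the `<inttypes.h>` macros; the sketch below uses `int64_t` as a stand-in for HOST_WIDE_INT and a hypothetical helper `format_wb_operand`. Passing a 64-bit value for a 32-bit `%ld` makes `fprintf` read the wrong bytes for later arguments, which is consistent with the garbled shift value in the report. Note the spaces between the string literals and the macro: in C++11 a macro name glued to a literal is lexed as a literal suffix, and GCC warns about it with -Wliteral-suffix.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a writeback address operand as
   "(reg),imm,shift".  PRId64 expands to the right length modifier for
   int64_t on this host ("ld" or "lld"), much as HOST_WIDE_INT_PRINT_DEC
   does for HOST_WIDE_INT inside GCC.  */
static int
format_wb_operand (char *buf, size_t n, const char *reg, int64_t imm,
                   unsigned shift)
{
  assert (reg != NULL && strlen (reg) > 0);
  return snprintf (buf, n, "(%s),%" PRId64 ",%u", reg, imm, shift);
}
```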

Thanks,
Andrew Pinski

> return true;
>
> --
> 2.17.1
>

Re:[PATCH] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-01-29 Thread Jin Ma

>On Mon, Jan 29, 2024 at 1:21 AM Jin Ma  wrote:
>>
>> When using  '%ld' to print 'long long int' variable, 'fprintf' will
>> produce messy output on a 32-bit system, in an incorrect instruction
>> being generated, such as 'th.lwib a1,(a0),-16,4294967295'. And the
>> following error occurred during compilation:
>>
>> Assembler messages:
>> Error: improper immediate value (18446744073709551615)
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/thead.cc (th_print_operand_address): Change %ld
>> to %lld.
>> ---
>>  gcc/config/riscv/thead.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
>> index 2955bc5f8a9..9ee6444b627 100644
>> --- a/gcc/config/riscv/thead.cc
>> +++ b/gcc/config/riscv/thead.cc
>> @@ -1141,7 +1141,7 @@ th_print_operand_address (FILE *file, machine_mode mode, rtx x)
>>return true;
>>
>>  case ADDRESS_REG_WB:
>> -  fprintf (file, "(%s),%ld,%u", reg_names[REGNO (addr.reg)],
>> +  fprintf (file, "(%s),%lld,%u", reg_names[REGNO (addr.reg)],
>>INTVAL (addr.offset) >> addr.shift, addr.shift);


>This is wrong, you should instead use HOST_WIDE_INT_PRINT_DEC or
>HOST_WIDE_INT_PRINT_UNSIGNED.

>Thanks,
>Andrew Pinski

Yes, thank you very much for your guidance. It will be better to
use HOST_WIDE_INT_PRINT_DEC. I will make changes later :)

BR

Jin

>> return true;

Re: [patch][v2] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-29 Thread Andrew Stubbs


On 25/01/2024 15:11, Tobias Burnus wrote:

Updated patch enclosed.

Tobias Burnus wrote:
I have now run the attached script and the result running yesterday's 
build with both my patch and your patch applied.


(And the now committed gcn-hsa.h patch)

Now the result with the testscript is:

* fiji, gfx1030, gfx1100 work, except for "error: '-mxnack=on' is 
incompatible with ..."
(and link errors for fiji as libgomp is not built, which makes the
testing a tad less reliable but should be fine).


* (default)/gfx900/gfx906/gfx908: Works, except for -mxnack=on/any due 
to .target / -mattr= mismatch


* gfx90a: simply works

OK for mainline?

Tobias

PS: For the test script, see previous email in the thread; for the 
output of that script, see attachment.


PPS: I hope I got everything right.


OK.

Andrew

[PATCH v2] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-01-29 Thread Jin Ma

When '%ld' is used to print a 'long long int' variable, 'fprintf'
produces garbled output on 32-bit systems, resulting in an incorrect
instruction being generated, such as 'th.lwib a1,(a0),-16,4294967295'.
The following error then occurs during compilation:

Assembler messages:
Error: improper immediate value (18446744073709551615)

gcc/ChangeLog:

* config/riscv/thead.cc (th_print_operand_address): Use
HOST_WIDE_INT_PRINT_DEC instead of %ld.
---
 gcc/config/riscv/thead.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2955bc5f8a9..e4b8c37bc28 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -1141,7 +1141,7 @@ th_print_operand_address (FILE *file, machine_mode mode, rtx x)
   return true;
 
 case ADDRESS_REG_WB:
-  fprintf (file, "(%s),%ld,%u", reg_names[REGNO (addr.reg)],
+  fprintf (file, "(%s)," HOST_WIDE_INT_PRINT_DEC ",%u", reg_names[REGNO (addr.reg)],
   INTVAL (addr.offset) >> addr.shift, addr.shift);
return true;
 
-- 
2.17.1

Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

2024-01-29 Thread Andrew Stubbs


On 26/01/2024 16:45, Tobias Burnus wrote:

Hi,

Thomas Schwinge wrote:
amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to 
the docs

...
Further down in that file, we state:
 @anchor{amdgcn-x-amdhsa}
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
 Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]


LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:


Testing with the system compilers here, llvm-mc-14.0.6 also fails while 
llvm-mc-15.0.7 accepts it.



Which version of LLVM should we be recommending?


>= LLVM 15, I think. How about the following wording? It still mentions
LLVM 13.0.1 for those who really need it, but for the default
setup it requires 15+.


OK.

Andrew

[committed] libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100

2024-01-29 Thread Tobias Burnus


This fixes an obvious and stupid copy'n'paste bug of mine in
the OpenMP declare variant used for two testcases, fixing:

FAIL: libgomp.c/declare-variant-4-gfx1030.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx1030 \\(\\);"
FAIL: libgomp.c/declare-variant-4-gfx1100.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx1100 \\(\\);"

Committed as obvious as r14-8488-gcb366731e767e2

Tobias
commit cb366731e767e2dec158c8c4a495fe2ccbd550ff
Author: Tobias Burnus 
Date:   Mon Jan 29 11:06:15 2024 +0100

libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4.h: Use gfx1100/gfx1030
function not gfx90a for gfx1100/gfx1030 context selector.

Signed-off-by: Tobias Burnus 

diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h
index 393a5e295cc..d2e9194bf5b 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-4.h
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h
@@ -58,8 +58,8 @@ gfx1100 (void)
 #pragma omp declare variant(gfx906) match(device = {isa("gfx906")})
 #pragma omp declare variant(gfx908) match(device = {isa("gfx908")})
 #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")})
-#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")})
-#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")})
+#pragma omp declare variant(gfx1030) match(device = {isa("gfx1030")})
+#pragma omp declare variant(gfx1100) match(device = {isa("gfx1100")})
 __attribute__ ((noipa))
 int
 f (void)

Re: [PATCH] x86: Generate .cfi_undefined for unsaved callee-saved registers

2024-01-29 Thread Jakub Jelinek

On Sat, Jan 27, 2024 at 12:41:24PM -0800, H.J. Lu wrote:
> When assembler directives for DWARF frame unwind are enabled, generate
> the .cfi_undefined directive for unsaved callee-saved registers which
> have been used in the function.
> 
> gcc/
> 
>   PR target/38534
>   * config/i386/i386.cc (ix86_post_cfi_startproc): New.
>   (TARGET_ASM_POST_CFI_STARTPROC): Likewise.
> 
> gcc/testsuite/
> 
>   PR target/38534
>   * gcc.target/i386/no-callee-saved-19.c: New test.
>   * gcc.target/i386/no-callee-saved-20.c: Likewise.
>   * gcc.target/i386/pr38534-7.c: Likewise.
>   * gcc.target/i386/pr38534-8.c: Likewise.

This only works for -fdwarf2-cfi-asm, but doesn't work for
-fno-dwarf2-cfi-asm.  I think we need something that will work for both.

So, I'd say we want to add support for REG_CFA_UNDEFINED note, emit those
notes on some frame related insn in the prologue during prologue expansion
in pro_and_epilogue pass and handle that in dwarf2cfi.cc pass.

One question is where those should be emitted.  Emitting them right
at the start of the function has the advantage that they can be emitted in
the CIE for all FDEs of noreturn functions (or with the new attribute).  But
the disadvantage is, of course, that it will make e.g. the debugging experience
worse even in the prologues of functions where the callee-saved registers
which the current function doesn't actually save aren't modified yet.
E.g. for the cases where callee saved registers are saved to memory or
registers I think dwarf2cfi.cc attempts to optimize and move the .cfi_*
directives or .eh_frame record later into the function as long as the
corresponding original register isn't modified yet.  Perhaps that should
be done also for the undefined case, ideally by using the same dwarf2cfi.cc
code.  So just perhaps at the start of the function read in the
REG_CFA_UNDEFINED notes for all the ever modified callee saved registers
which won't be actually saved and turn that into similar record like for
the saving into stack or other regs, just noting it is undefined instead
and have it pushed later as much as possible.

Jakub

Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-29 Thread Jakub Jelinek

On Sat, Jan 27, 2024 at 07:00:03AM -0800, H.J. Lu wrote:
> On Sat, Jan 27, 2024 at 6:09 AM Jakub Jelinek  wrote:
> >
> > On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> > > @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
> > >   function is marked as noreturn in the IR output, which leads the
> > >   incompatible attribute error in LTO1.  */
> > >bool has_no_callee_saved_registers
> > > -= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> > > += ((optimize
> > > + && !optimize_debug
> >
> > Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for
> > optimize_debug?
> > I mean, aren't the options not restored yet when this function is called
> > (i.e. remain in whatever state they were in the previous function or
> > global state)?
> 
> store_parm_decls is called when parsing a function.  store_parm_decls
> calls allocate_struct_function which calls
> 
>   invoke_set_current_function_hook (fndecl);
> 
> which has
> 
>  /* Change optimization options if needed.  */
>   if (optimization_current_node != opts)
> {
>   optimization_current_node = opts;
>   cl_optimization_restore (&global_options, &global_options_set,
>TREE_OPTIMIZATION (opts));
> }
> 
>   targetm.set_current_function (fndecl);
> 
> which calls ix86_set_current_function after global_options
> has been updated.   ix86_set_func_type is called from
> ix86_set_current_function.

Sorry, you're right, I just saw option restore later in 
ix86_set_current_function
and missed that it is target option restore only.

> > Also, why check "noreturn" attribute rather than
> > TREE_THIS_VOLATILE (fndecl)?
> >
> 
> The comment above this code says
> 
>  NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
>  function.  The local-pure-const pass turns an interrupt function
>  into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
>  the local-pure-const pass is run after ix86_set_func_type is called.
>  When the local-pure-const pass is enabled for LTO, the interrupt
>  function is marked as noreturn in the IR output, which leads the
>  incompatible attribute error in LTO1.

So in that case, I think it would be best to test
  TREE_THIS_VOLATILE (fndecl)
  && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
  && ...
because if it doesn't have noreturn attribute, it will not have
TREE_THIS_VOLATILE set, and TREE_THIS_VOLATILE is much cheaper to test than
looking up an attribute.
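The ordering argument relies on `&&` short-circuiting: when the cheap flag is clear, the attribute list is never walked. A standalone C sketch with hypothetical stand-ins (`struct fn_decl` for the FUNCTION_DECL, `has_attribute` for `lookup_attribute`) that counts how often the slow path runs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a FUNCTION_DECL: a cheap flag bit plus a
   NULL-terminated list of attribute names that must be searched.  */
struct fn_decl
{
  bool this_volatile;        /* analogous to TREE_THIS_VOLATILE */
  const char *const *attrs;  /* analogous to DECL_ATTRIBUTES */
};

static int lookup_count;     /* how many times the slow path ran */

/* Analogous to lookup_attribute: a linear walk over the list.  */
static bool
has_attribute (const struct fn_decl *fn, const char *name)
{
  ++lookup_count;
  for (const char *const *a = fn->attrs; a && *a; ++a)
    if (strcmp (*a, name) == 0)
      return true;
  return false;
}

/* Cheap flag first; the attribute is still checked because the flag
   alone may have been set by other passes (e.g. local-pure-const).  */
static bool
is_declared_noreturn (const struct fn_decl *fn)
{
  return fn->this_volatile && has_attribute (fn, "noreturn");
}
```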

Jakub

Re: [PATCH] vect: Tighten vect_determine_precisions_from_range [PR113281]

2024-01-29 Thread Richard Biener

On Sat, Jan 27, 2024 at 4:44 PM Richard Sandiford
 wrote:
>
> This was another PR caused by the way that
> vect_determine_precisions_from_range handles shifts.  We tried to
> narrow 32768 >> x to a 16-bit shift based on range information for
> the inputs and outputs, with vect_recog_over_widening_pattern
> (after PR110828) adjusting the shift amount.  But this doesn't
> work for the case where x is in [16, 31], since then 32-bit
> 32768 >> x is a well-defined zero, whereas no well-defined
> 16-bit 32768 >> y will produce 0.
>
> We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
> but since vect_determine_precisions_from_range was never really
> supposed to rely on fix-ups, it seems better to fix that instead.
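The arithmetic behind this can be checked in plain C. In this sketch (function names are hypothetical), the 32-bit shift gives a well-defined zero for shift amounts 16..31, a 16-bit shift needs the `x < 16` guard mentioned above to match it, and `can_narrow_shift` encodes the rule the ChangeLog describes: narrowing needs at least one more bit of precision than the maximum shift amount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar semantics: a 32-bit logical right shift, defined for x < 32.  */
static uint32_t
shr32 (uint32_t v, unsigned x)
{
  assert (x < 32);
  return v >> x;
}

/* A 16-bit shift is only defined for x < 16, so matching the 32-bit
   result for larger x needs an explicit fix-up (hypothetical).  */
static uint16_t
shr16_guarded (uint16_t v, unsigned x)
{
  return x < 16 ? (uint16_t) (v >> x) : 0;
}

/* PREC bits suffice for the shift only if PREC exceeds MAX_SHIFT.  */
static bool
can_narrow_shift (unsigned prec, unsigned max_shift)
{
  return prec >= max_shift + 1;
}
```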
>
> The patch also makes the code more selective about which codes
> can be narrowed based on input and output ranges.  This showed
> that vect_truncatable_operation_p was missing cases for
> BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
> (equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
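An operation is truncatable when the low N bits of its result depend only on the low N bits of its inputs. For BIT_NOT_EXPR and NEGATE_EXPR this follows from the identities quoted above; a small C check (the helper name is hypothetical) exercises them for 32-to-16-bit truncation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For one 32-bit input, check that computing ~x and -x in 32 bits and
   then truncating gives the same 16 bits as truncating first, and that
   ~x == x ^ -1 and -x == ~x + 1 hold.  */
static bool
not_neg_truncatable (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  bool not_ok = (uint16_t) ~x == (uint16_t) ~lo;
  bool neg_ok = (uint16_t) -x == (uint16_t) -lo;
  bool ids_ok = (~x == (x ^ UINT32_MAX)) && (-x == ~x + 1u);
  return not_ok && neg_ok && ids_ok;
}
```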
>
> pr113281-1.c is the original testcase.  pr113281-[23].c failed
> before the patch due to overly optimistic narrowing.  pr113281-[45].c
> previously passed and are meant to protect against accidental
> optimisation regressions.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR target/113281
> * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
> workaround for right shifts.
> (vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
> (vect_determine_precisions_from_range): Be more selective about
> which codes can be narrowed based on their input and output ranges.
> For shifts, require at least one more bit of precision than the
> maximum shift amount.
>
> gcc/testsuite/
> PR target/113281
> * gcc.dg/vect/pr113281-1.c: New test.
> * gcc.dg/vect/pr113281-2.c: Likewise.
> * gcc.dg/vect/pr113281-3.c: Likewise.
> * gcc.dg/vect/pr113281-4.c: Likewise.
> * gcc.dg/vect/pr113281-5.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 +++
>  gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +
>  gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 +++
>  gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 ++
>  gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 
>  gcc/tree-vect-patterns.cc  | 144 +
>  6 files changed, 305 insertions(+), 66 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-5.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
> new file mode 100644
> index 000..6df4231cb5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
> @@ -0,0 +1,17 @@
> +#include "tree-vect.h"
> +
> +unsigned char a;
> +
> +int main() {
> +  check_vect ();
> +
> +  short b = a = 0;
> +  for (; a != 19; a++)
> +    if (a)
> +      b = 32872 >> a;
> +
> +  if (b == 0)
> +    return 0;
> +  else
> +    return 1;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
> new file mode 100644
> index 000..3a1170c28b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
> @@ -0,0 +1,50 @@
> +/* { dg-do compile } */
> +
> +#define N 128
> +
> +short x[N];
> +short y[N];
> +
> +void
> +f1 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= y[i];
> +}
> +
> +void
> +f2 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= (y[i] < 32 ? y[i] : 32);
> +}
> +
> +void
> +f3 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= (y[i] < 31 ? y[i] : 31);
> +}
> +
> +void
> +f4 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= (y[i] & 31);
> +}
> +
> +void
> +f5 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= 0x8000 >> y[i];
> +}
> +
> +void
> +f6 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= 0x8000 >> (y[i] & 31);
> +}
> +
> +/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
> new file mode 100644
> index 000..5982dd2d16f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +
> +#define N 128
> +
> +short x[N];
> +short y[N];
> +
> +void
> +f1 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= (y[i] < 30 ? y[i] : 30);
> +}
> +
> +void
> +f2 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= ((y[i] & 15) + 2);
> +}
> +
> +void
> +f3 (void)
> +{
> +  for (int i = 0; i < N; ++i)
> +    x[i] >>= (y[i] < 16 ?

Re: [wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-29 Thread Andrew Stubbs


On 26/01/2024 17:06, Tobias Burnus wrote:

Mention that gfx1030/gfx1100 are now supported.

As noted in another thread, LLVM 15's assembler is now required, whereas
LLVM 13.0.1 was sufficient before. (Alternatively, disabling gfx1100
support would also work.) Hence the added link to the install documentation.


Comments, suggestions?


I'm happy with the technical correctness of this, but I'm uncertain whether
"which required an update of the default build requirements" is the sort
of wording we like in the changelog.


Perhaps like this?

  Initial support for the AMD Radeon gfx1030 (RDNA2) and
  gfx1100 (RDNA3) devices has been added.  LLVM 15+
  (assembler and linker) is required to support gfx1100.

Andrew

[PATCH] middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs

2024-01-29 Thread Richard Biener

The following extends .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers.  This can avoid some ICEs later and create
better code.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR middle-end/113622
* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
Also allow DECL_HARD_REGISTER variables.

* gcc.target/i386/pr113622-1.c: New testcase.
---
 gcc/gimple-isel.cc |  3 ++-
 gcc/testsuite/gcc.target/i386/pr113622-1.c | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-1.c

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 7e2392ecd38..e94f292dd38 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
   machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
   machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
 
-  if (auto_var_in_fn_p (view_op0, fun->decl)
+  if ((auto_var_in_fn_p (view_op0, fun->decl)
+  || DECL_HARD_REGISTER (view_op0))
  && !TREE_ADDRESSABLE (view_op0)
  && ((!is_extract && can_vec_set_var_idx_p (outermode))
  || (is_extract
diff --git a/gcc/testsuite/gcc.target/i386/pr113622-1.c b/gcc/testsuite/gcc.target/i386/pr113622-1.c
new file mode 100644
index 000..2d6cb3c89a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113622-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -w" } */
+
+typedef float __attribute__ ((vector_size (64))) vec;
+register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1");
+
+void
+test (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = a[i] < b[i] ? 0.1 : 0.2;
+}
-- 
2.35.3

[patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Tobias Burnus


Andrew wrote off list:
  "Vector reductions don't work on RDNA, as is, but they're
   supposed to be disabled by the insn condition"

This patch disables "fold_left_plus_", which is about
vectorization and in the code path shown in the backtrace.
I can also confirm manually that it fixes the ICE I saw and
also the ICE for the testfile that Richard's PR shows at the
end of his backtrace.  (-O3 is needed to trigger the ICE.)

OK for mainline?

Tobias

* * *

PS: We could add testcase(s) that is/are explicitly compiled with
gfx1100 and/or gfx1030 + '-O3' to ensure that this gets tested
with AMDGPU enabled, but I am not sure whether it is really worthwhile.


PPS: Running the testsuite, I see the following fails with
gfx1100 offloading:

FAIL: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
Excess errors:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
  .amdhsa_next_free_vgpr 516
                         ^~~
[Obviously, likewise for libgomp.c++/../libgomp.c-c++-common/for-5.c]

FAIL: libgomp.c/pr104783-2.c execution test
FAIL: libgomp.c/pr104783.c execution test
(The .log unfortunately does not show more details)

FAIL: libgomp.fortran/optional-map.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/optional-map.f90   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -g  (test for excess errors)
Same 'out of range' as above.

* * *

Manual testing shows for the two execution fails:

Memory access fault by GPU node-1 (Agent handle: 0x8d1aa0) on address (nil). Reason: Page not present or supervisor privilege.

Interestingly, it only fails with -O1 or higher; for -O0 it works.

Tobias

gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

gcc/ChangeLog:

	PR target/113615
	* config/gcn/gcn-valu.md (fold_left_plus_): Only
	define for !TARGET_RDNA2_PLUS.

Signed-off-by: Tobias Burnus 

 gcc/config/gcn/gcn-valu.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index cd027f8b369..23b441f8e8b 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4274,7 +4274,8 @@ (define_expand "fold_left_plus_"
  [(match_operand: 0 "register_operand")
   (match_operand: 1 "gcn_alu_operand")
   (match_operand:V_FP 2 "gcn_alu_operand")]
-  "can_create_pseudo_p ()
+  "!TARGET_RDNA2_PLUS
+   && can_create_pseudo_p ()
    && (flag_openacc || flag_openmp
        || flag_associative_math)"
   {

Re: [PATCH] middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 11:24:58AM +0100, Richard Biener wrote:
> The following expands .VEC_SET and .VEC_EXTRACT instruction selection
> to global hard registers, not only automatic variables (possibly)
> promoted to registers.  This can avoid some ICEs later and create
> better code.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR middle-end/113622
>   * gimple-isel.cc (gimple_expand_vec_set_extract_expr):
>   Also allow DECL_HARD_REGISTER variables.
> 
>   * gcc.target/i386/pr113622-1.c: New testcase.
> ---
>  gcc/gimple-isel.cc |  3 ++-
>  gcc/testsuite/gcc.target/i386/pr113622-1.c | 12 
>  2 files changed, 14 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-1.c
> 
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 7e2392ecd38..e94f292dd38 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
>machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
>machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
>  
> -  if (auto_var_in_fn_p (view_op0, fun->decl)
> +  if ((auto_var_in_fn_p (view_op0, fun->decl)
> +|| DECL_HARD_REGISTER (view_op0))
> && !TREE_ADDRESSABLE (view_op0)
> && ((!is_extract && can_vec_set_var_idx_p (outermode))
> || (is_extract

All we know here from the earlier checks is DECL_P (view_op0), but
DECL_HARD_REGISTER uses VAR_DECL_CHECK, shouldn't this be
   || (VAR_P (view_op0) && DECL_HARD_REGISTER (view_op0)))
instead?
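The concern is that DECL_HARD_REGISTER is only valid on VAR_DECLs (it goes through VAR_DECL_CHECK), so the node kind must be tested first. A standalone C sketch of that guard pattern, with a hypothetical `struct decl` and accessors standing in for GCC's tree macros:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for tree codes and a DECL node.  */
enum decl_code { VAR_DECL_CODE, PARM_DECL_CODE, RESULT_DECL_CODE };

struct decl
{
  enum decl_code code;
  bool hard_register;   /* only meaningful for variables */
};

/* Like DECL_HARD_REGISTER with tree checking enabled: applying it to a
   non-variable is a bug and aborts.  */
static bool
decl_hard_register (const struct decl *d)
{
  assert (d->code == VAR_DECL_CODE);
  return d->hard_register;
}

/* Guarded form, as suggested in the review: test the VAR_P equivalent
   first so the accessor never sees e.g. a PARM_DECL.  */
static bool
is_hard_register_var (const struct decl *d)
{
  return d->code == VAR_DECL_CODE && decl_hard_register (d);
}
```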

> diff --git a/gcc/testsuite/gcc.target/i386/pr113622-1.c b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> new file mode 100644
> index 000..2d6cb3c89a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512f -w" } */
> +
> +typedef float __attribute__ ((vector_size (64))) vec;
> +register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1");

I'd feel better if this used, say, zmm5, zmm6, zmm7 or something similar
so that it doesn't clash with some of the implicitly used SSE
registers, but on the other hand still fits into the 8 SSE registers
which ia32 has access to.

> +
> +void
> +test (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +c[i] = a[i] < b[i] ? 0.1 : 0.2;
> +}

Otherwise LGTM.

Jakub

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Sat, Jan 27, 2024 at 07:10:55AM -0800, H.J. Lu wrote:
> For function symbol reference in readonly data section, instead of putting
> it in .data.rel.ro or .rodata.cst section, call function_rodata_section to
> get the read-only or relocated read-only data section associated with the
> function DECL so that the COMDAT section will be used for a COMDAT function
> symbol.

I have to admit I still don't understand what the linker doesn't like on
what GCC emits and why references to the public symbols at the start of
comdat sections are ok in .text but not in .data.rel.ro but are in .data
or .rodata sections (or what the exact rules are, see also what we emit on
__attribute__((noinline, noipa)) inline void foo () {}
void bar () { foo (); } void (*p) () = foo; void (*const q) () = foo; void (*const *r) () = &q;
).
I've always thought that the problematic references are when something
references non-public symbols in comdat sections, especially not at their
start, because if linker selects some comdat section(s) from some other
TU, there is no guarantee e.g. the code is identical (just in valid program
should behave the same) and if such reference comes from other comdat that
is kept or from non-comdat sections, the question is what should be
referenced.

But in this case, I believe we are referencing the function at the start of
a code comdat section.

Now, in my limited understanding what the patch does is totally wrong
for multiple reasons.  On the first testcase it changes
-        .section        .data.rel.ro.local,"aw"
+        .section        .data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE,"awG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
         .align 8
 .LC0:
         .quad   _ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE
Now, I believe such a .data.rel.ro.local.* section is normally
used for .data.rel.ro.local constants from the referenced function,
if we have some relocatable constant needed in that function we
emit those there.
If linker picks up the comdat from current TU, it will be all fine,
sure, but if it picks up the comdat from another TU, the
.data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5* section
there might not be present or might contain some unrelated stuff.
Given the handling of (const (plus (symbol_ref) (const_int))), we
also don't know whether the section holds a reference to the start,
or to some other offset of it, how many etc.
And, we reference a non-public symbol (.LC0) from a non-comdat section
to a comdat section.

If I'm wrong on this, please try to explain.

Jakub

Re: [PATCH] jit: Ensure ssize_t is defined.

2024-01-29 Thread Iain Sandoe

Hi David,

I guess the solution here depends on the scope over which we expect
the header to be used.

> On 28 Jan 2024, at 23:13, Iain Sandoe  wrote:
>> On 28 Jan 2024, at 21:25, Eric Gallager  wrote:
>> On Sun, Jan 28, 2024 at 6:45 AM Iain Sandoe  wrote:
>>> 
>>> Tested on i686, x86_64 Darwin, x86_64 Linux,
>>> OK for trunk?
>>> 
>>> --- 8< ---
>>> 
>>> On some targets it seems that ssize_t is not defined by any of the
>>> headers transitively included by .  This leads to a bootstrap
>>> fail when jit is enabled.
>>> 
>>> The fix proposed here is to include sys/types.h when it is available
>>> since that is where Posix specifies that ssize_t is defined.
>>> 
>>> gcc/jit/ChangeLog:
>>> 
>>>   * libgccjit.h: Conditionally include  where it is
>>>   available to ensure declaration of ssize_t.
>>> 
>>> Signed-off-by: Iain Sandoe 
>>> ---
>>> gcc/jit/libgccjit.h | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
>>> index 235cab053e0..db4f27a48bf 100644
>>> --- a/gcc/jit/libgccjit.h
>>> +++ b/gcc/jit/libgccjit.h
>>> @@ -21,6 +21,9 @@ along with GCC; see the file COPYING3.  If not see
>>> #define LIBGCCJIT_H
>>> 
>>> #include 
>>> +#if __has_include()
>> 
>> Is __has_include() something that we can use unconditionally?
> 
> Hmm.. maybe we cannot, it seems it was introduced in gcc-4.9 and we only ask
> for 4.8, IIRC.
> 
> I guess HAVE_SYS_TYPES_H might be an alternative (I’ll have to retest)

Answering my own question: no, that is not going to work either, since the
header is installed and config.h is not.

I guess the question is: “is this header ever [meaningfully] consumed by a
compiler other than the current GCC that it supports”?

e.g. do we expect to be able to build libgccjit with clang in a
“--disable-bootstrap” configuration and have that work?

The fallback is
#ifdef __APPLE__
# include   /* For ssize_t.  */
#endif

(which I will test on a number of platform versions).

Since this breaks bootstrap at stage 2 on the affected platform versions,
we need some fix.
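A minimal sketch of a guard that combines both ideas, assuming the header may be consumed by compilers with or without __has_include. The test for __has_include sits in its own #if because a preprocessor that lacks the operator cannot parse a __has_include (<...>) call, even on the short-circuited side of &&. The helper signed_length is hypothetical and only exists to show ssize_t in use. On a compiler that lacks __has_include and is not targeting Darwin, nothing is included and ssize_t stays undeclared, which is the gap the __APPLE__ fallback leaves open.

```c
#include <assert.h>
#include <stddef.h>

#if defined __has_include
/* Evaluate __has_include only in a nested #if, so that a preprocessor
   without the operator never has to parse the call.  */
# if __has_include (<sys/types.h>)
#  include <sys/types.h>   /* POSIX declares ssize_t here.  */
# endif
#elif defined (__APPLE__)
# include <sys/types.h>    /* The Darwin-only fallback discussed above.  */
#endif

/* Hypothetical helper: return the length of S as a signed ssize_t.  */
static ssize_t
signed_length (const char *s)
{
  ssize_t n = 0;

  assert (s != NULL);
  while (s[n] != '\0')
    n++;
  return n;
}
```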

thanks
Iain

[PATCH] RISC-V: Fix VSETVL PASS compile-time issue

2024-01-29 Thread Juzhe-Zhong

The compile time issue was discovered in SPEC 2017 wrf:

We used time and -ftime-report to profile the compilation of SPEC 2017 wrf.

Before this patch (Lazy vsetvl):

scheduling         : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
machine dep reorg  : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)
real    13m27.074s
user    13m19.539s
sys     0m5.180s

Simple vsetvl:

machine dep reorg  :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)
real    6m5.780s
user    6m2.396s
sys     0m2.373s

The "machine dep reorg" entry is the compile time of the VSETVL pass
(424 seconds), which accounts for 53% of the compilation time, much more
than scheduling.

After investigation, the critical path of the VSETVL pass is
compute_lcm_local_properties, which is called in every iteration of
phase 2 (earliest fusion) and phase 3 (global lcm).

This patch optimizes the code of compute_lcm_local_properties to reduce
the compilation time.
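The diff below replaces a walk over every register set in m_reg_def_loc with a single bitmap_bit_p membership test. A minimal C sketch of that change, using a hypothetical fixed-size bitmap in place of GCC's sbitmap: both functions answer the same question, but the scan costs time proportional to the bitmap size for each expression, while the lookup is constant time. Because the query runs for every expression in every block on every iteration of phases 2 and 3, the per-query saving is multiplied many times over.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical fixed-size bitmap standing in for GCC's sbitmap.  */
#define NBITS 256
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define NWORDS ((NBITS + WORD_BITS - 1) / WORD_BITS)

struct bitmap
{
  unsigned long w[NWORDS];
};

static void
bitmap_set (struct bitmap *b, unsigned bit)
{
  assert (bit < NBITS);
  b->w[bit / WORD_BITS] |= 1UL << (bit % WORD_BITS);
}

static bool
bitmap_test (const struct bitmap *b, unsigned bit)
{
  assert (bit < NBITS);
  return (b->w[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1UL;
}

/* Old shape: walk the bitmap bit by bit and compare each set bit with
   REGNO.  Cost grows with the size of the bitmap.  */
static bool
reg_defined_by_scan (const struct bitmap *b, unsigned regno)
{
  for (unsigned bit = 0; bit < NBITS; bit++)
    if (bitmap_test (b, bit) && bit == regno)
      return true;
  return false;
}

/* New shape: one constant-time membership test.  */
static bool
reg_defined_by_lookup (const struct bitmap *b, unsigned regno)
{
  return bitmap_test (b, regno);
}
```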

After this patch:

scheduling         : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
machine dep reorg  :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)
real    7m25.374s
user    7m20.116s
sys     0m3.795s

The improvement from this patch is substantial: the lazy VSETVL pass goes
from 424s (53%) to 80s (18%), which is now less time than scheduling.

Tested on both RV32 and RV64 with no regressions.  OK for trunk?
 
PR target/113495

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
(pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
(pre_vsetvl::compute_transparent): New function.
(pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.

---
 gcc/config/riscv/riscv-vsetvl.cc | 184 ++-
 1 file changed, 60 insertions(+), 124 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d7b40a5c813..cec862329c5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -599,14 +599,6 @@ extract_single_source (set_info *set)
   return first_insn;
 }
 
-static insn_info *
-extract_single_source (def_info *def)
-{
-  if (!def)
-return nullptr;
-  return extract_single_source (dyn_cast (def));
-}
-
 static bool
 same_equiv_note_p (set_info *set1, set_info *set2)
 {
@@ -2374,6 +2366,7 @@ public:
   }
 
   void compute_vsetvl_def_data ();
+  void compute_transparent (const bb_info *);
   void compute_lcm_local_properties ();
 
   void fuse_local_vsetvl_info ();
@@ -2452,20 +2445,16 @@ pre_vsetvl::compute_vsetvl_def_data ()
{
  for (unsigned i = 0; i < m_vsetvl_def_exprs.length (); i += 1)
{
- const vsetvl_info &info = *m_vsetvl_def_exprs[i];
- if (!info.has_nonvlmax_reg_avl ())
-   continue;
- unsigned int regno;
- sbitmap_iterator sbi;
- EXECUTE_IF_SET_IN_BITMAP (m_reg_def_loc[bb->index ()], 0, regno,
-   sbi)
-   if (regno == REGNO (info.get_avl ()))
- {
-   bitmap_set_bit (m_kill[bb->index ()], i);
-   bitmap_set_bit (def_loc[bb->index ()],
-   get_expr_index (m_vsetvl_def_exprs,
-   m_unknow_info));
- }
+ auto *info = m_vsetvl_def_exprs[i];
+ if (info->has_nonvlmax_reg_avl ()
+ && bitmap_bit_p (m_reg_def_loc[bb->index ()],
+  REGNO (info->get_avl (
+   {
+ bitmap_set_bit (m_kill[bb->index ()], i);
+ bitmap_set_bit (def_loc[bb->index ()],
+ get_expr_index (m_vsetvl_def_exprs,
+ m_unknow_info));
+   }
}
  continue;
}
@@ -2516,6 +2505,36 @@ pre_vsetvl::compute_vsetvl_def_data ()
   sbitmap_vector_free (m_kill);
 }
 
+/* Subroutine of compute_lcm_local_properties which Compute local transparent
+   BB. Note that the compile time is very sensitive to compute_transparent and
+   compute_lcm_local_properties, any change of these 2 functions should be
+   aware of the compile time changing of the program which has a large number of
+   blocks, e.g SPEC 2017 wrf.
+
+   Current compile time profile of SPEC 2017 wrf:
+
+ 1. scheduling - 27%
+ 2. machine dep reorg (VSETVL PASS) - 18%
+
+   VSETVL pass should not spend more time than scheduling in compilation.  */
+void
+pre_vsetvl::compute_transparent (const bb_info *bb)
+{
+  int num_exprs = m_exprs.length ();
+  unsigned bb_index = bb->index ();
+  for (int i = 0; i < num_exprs; i++)
+{
+  auto *info = m_exprs[i];
+  if (info->has_nonvlmax_reg_avl ()
+ && bitmap_bit_p (m_reg_def_loc[bb_index], REGNO (

Re: [patch] nvptx.opt: Add sm_89 and sm_90a to -march-map=

2024-01-29 Thread Thomas Schwinge

Hi Tobias!

On 2024-01-20T10:57:29+0100, Tobias Burnus  wrote:
> Stumbled over this as we recently got a sm_89 card.
>
> -march-map= is mostly a future-proof way for users to always get the
> best code generation for a specific card, without needing to know
> which GCC version added support for which -march=sm_... (or -misa=sm_...;
> those are aliases).
>
> sm_89 was added in CUDA 11.8 (PTX ISA 7.8) and sm_90a in CUDA 12.0 (PTX
> ISA 8.0), but that's just FYI: -march-map=sm_xx with xx >= 80 maps to
> -march=sm_80 and implies -mptx=7.0 (i.e. PTX ISA 7.0, added in CUDA
> 11.0); hence, any CUDA 11.0+ will do.
>
> OK for mainline?

OK, thanks.


Regards
 Thomas


> nvptx.opt: Add sm_89 and sm_90a to -march-map=
>
> The -march-map= option maps the compute capability to the closest
> lower compute capability that has been implemented; for sm_89 and
> sm_90a, which were previously missing, that is currently -march=sm_80
> (alias -misa=sm_80).
>
> gcc/ChangeLog:
>
>   * config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.
>
> Signed-off-by: Tobias Burnus 
>
> diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
> index 09d75fca037..deb006663d7 100644
> --- a/gcc/config/nvptx/nvptx.opt
> +++ b/gcc/config/nvptx/nvptx.opt
> @@ -108,9 +108,15 @@ Target RejectNegative Alias(misa=,sm_80)
>  march-map=sm_87
>  Target RejectNegative Alias(misa=,sm_80)
>  
> +march-map=sm_89
> +Target RejectNegative Alias(misa=,sm_80)
> +
>  march-map=sm_90
>  Target RejectNegative Alias(misa=,sm_80)
>  
> +march-map=sm_90a
> +Target RejectNegative Alias(misa=,sm_80)
> +
>  Enum
>  Name(ptx_version) Type(int)
>  Known PTX ISA versions (for use with the -mptx= option):
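The "closest lower implemented capability" rule can be sketched in C as follows. This only models the semantics: nvptx.opt spells each mapping out as a separate Alias entry rather than computing it, and the implemented_sm table is an assumption for illustration (only the sm_80 target for sm_87, sm_89, sm_90 and sm_90a comes from the patch). The hypothetical map_sm ignores suffixes such as the 'a' in sm_90a, which is how sm_90a ends up on the same base ISA as sm_90.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed, illustrative list of capabilities that have their own
   -misa= value, in ascending order.  Not copied from nvptx.opt.  */
static const int implemented_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Return the closest implemented capability that does not exceed the
   one named by NAME (e.g. "sm_89"), or -1 if NAME does not parse or is
   below every implemented entry.  A trailing suffix such as the 'a' of
   "sm_90a" is left unread by sscanf and therefore ignored.  */
static int
map_sm (const char *name)
{
  int requested;
  int best = -1;

  assert (name != NULL);
  if (sscanf (name, "sm_%d", &requested) != 1)
    return -1;
  for (size_t i = 0; i < sizeof implemented_sm / sizeof implemented_sm[0]; i++)
    if (implemented_sm[i] <= requested)
      best = implemented_sm[i];
  return best;
}
```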

Re: [PATCH] middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs

2024-01-29 Thread Richard Biener

On Mon, 29 Jan 2024, Jakub Jelinek wrote:

> On Mon, Jan 29, 2024 at 11:24:58AM +0100, Richard Biener wrote:
> > The following expands .VEC_SET and .VEC_EXTRACT instruction selection
> > to global hard registers, not only automatic variables (possibly)
> > promoted to registers.  This can avoid some ICEs later and create
> > better code.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR middle-end/113622
> > * gimple-isel.cc (gimple_expand_vec_set_extract_expr):
> > Also allow DECL_HARD_REGISTER variables.
> > 
> > * gcc.target/i386/pr113622-1.c: New testcase.
> > ---
> >  gcc/gimple-isel.cc |  3 ++-
> >  gcc/testsuite/gcc.target/i386/pr113622-1.c | 12 
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-1.c
> > 
> > diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> > index 7e2392ecd38..e94f292dd38 100644
> > --- a/gcc/gimple-isel.cc
> > +++ b/gcc/gimple-isel.cc
> > @@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
> >machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
> >machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
> >  
> > -  if (auto_var_in_fn_p (view_op0, fun->decl)
> > +  if ((auto_var_in_fn_p (view_op0, fun->decl)
> > +  || DECL_HARD_REGISTER (view_op0))
> >   && !TREE_ADDRESSABLE (view_op0)
> >   && ((!is_extract && can_vec_set_var_idx_p (outermode))
> >   || (is_extract
> 
> All we know here from the earlier checks is DECL_P (view_op0), but
> DECL_HARD_REGISTER uses VAR_DECL_CHECK, shouldn't this be
>  || (VAR_P (view_op0) && DECL_HARD_REGISTER (view_op0)))
> instead?

Ah, yeah - will fix.

> > diff --git a/gcc/testsuite/gcc.target/i386/pr113622-1.c b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> > new file mode 100644
> > index 000..2d6cb3c89a8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512f -w" } */
> > +
> > +typedef float __attribute__ ((vector_size (64))) vec;
> > +register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1");
> 
> I'd feel better if this used say zmm5, zmm6, zmm7 or something similar
> so that it doesn't clash with some of the implicitly used SSE
> registers, but on the other side still fit into 8 SSE registers
> which ia32 has access to.

OK, will adjust.

Thanks,
Richard.

> > +
> > +void
> > +test (void)
> > +{
> > +  for (int i = 0; i < 8; i++)
> > +c[i] = a[i] < b[i] ? 0.1 : 0.2;
> > +}
> 
> Otherwise LGTM.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] middle-end/113622 - handle store with variable index to register

2024-01-29 Thread Richard Biener

The following implements storing to a non-MEM_P with a variable
offset.  We usually avoid this by forcing expansion to memory but
this doesn't work for hard register variables.  The solution is
to spill and operate on the stack.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

I realize the flow is a bit awkward, but short of duplicating a lot
of code I can't see a better way.  Forcing some lowering on GIMPLE
(creating the copy there) might be another way.  But then we
could possibly lower the whole vector indexing in a different way
in the first place ...
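C cannot express a value that has no address, so the following sketch models the hard register as a small struct accessed through a pointer; the names vec2 and store_lane_via_spill are hypothetical. It only illustrates the order of operations the patch uses in expand_assignment: copy the register into a stack temporary, do the variable-index store in memory, then copy the temporary back.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 2 x double value standing in for a vector that lives
   in a hard register and therefore cannot be indexed in place.  */
typedef struct
{
  double lane[2];
} vec2;

/* Store X into lane IDX of *REG, where IDX is only known at run time.  */
static void
store_lane_via_spill (vec2 *reg, int idx, double x)
{
  double stemp[2];                            /* stack temporary */

  assert (idx >= 0 && idx < 2);
  memcpy (stemp, reg->lane, sizeof stemp);    /* like emit_move_insn (stemp, to_rtx) */
  stemp[idx] = x;                             /* variable-offset store in memory */
  memcpy (reg->lane, stemp, sizeof stemp);    /* like emit_move_insn (old_to_rtx, stemp) */
}
```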

Thanks,
Richard.

PR middle-end/113622
* expr.cc (expand_assignment): Spill hard registers if
we index them with a variable offset.

* gcc.target/i386/pr113622-2.c: New testcase.
* gcc.target/i386/pr113622-3.c: Likewise.
---
 gcc/expr.cc| 23 +++---
 gcc/testsuite/gcc.target/i386/pr113622-2.c | 12 +++
 gcc/testsuite/gcc.target/i386/pr113622-3.c | 12 +++
 3 files changed, 44 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-3.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ee822c11dce..f13f07a2324 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6061,6 +6061,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
to_rtx = adjust_address (to_rtx, BLKmode, 0);
}
  
+  rtx stemp = NULL_RTX, old_to_rtx = NULL_RTX;
   if (offset != 0)
{
  machine_mode address_mode;
@@ -6070,9 +6071,22 @@ expand_assignment (tree to, tree from, bool nontemporal)
{
  /* We can get constant negative offsets into arrays with broken
 user code.  Translate this to a trap instead of ICEing.  */
- gcc_assert (TREE_CODE (offset) == INTEGER_CST);
- expand_builtin_trap ();
- to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+ if (TREE_CODE (offset) == INTEGER_CST)
+   {
+ expand_builtin_trap ();
+ to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+   }
+ /* Else spill for variable offset to the destination.  We expect
+to run into this only for hard registers.  */
+ else
+   {
+ gcc_assert (DECL_HARD_REGISTER (tem));
+ stemp = assign_stack_temp (GET_MODE (to_rtx),
+GET_MODE_SIZE (GET_MODE (to_rtx)));
+ emit_move_insn (stemp, to_rtx);
+ old_to_rtx = to_rtx;
+ to_rtx = stemp;
+   }
}
 
  offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode, EXPAND_SUM);
@@ -6305,6 +6319,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
  bitregion_start, bitregion_end,
  mode1, from, get_alias_set (to),
  nontemporal, reversep);
+ /* Move the temporary storage back to the non-MEM_P.  */
+ if (stemp)
+   emit_move_insn (old_to_rtx, stemp);
}
 
   if (result)
diff --git a/gcc/testsuite/gcc.target/i386/pr113622-2.c b/gcc/testsuite/gcc.target/i386/pr113622-2.c
new file mode 100644
index 000..7bcc12af27e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113622-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse -w" } */
+
+typedef double __attribute__ ((vector_size (16))) vec;
+register vec a asm("xmm5"), b asm("xmm6"), c asm("xmm7");
+
+void
+test (void)
+{
+  for (int i = 0; i < 2; i++)
+c[i] = a[i] < b[i] ? 0.1 : 0.2;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr113622-3.c b/gcc/testsuite/gcc.target/i386/pr113622-3.c
new file mode 100644
index 000..ca79d4ac901
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113622-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse" } */
+
+typedef double __attribute__ ((vector_size (16))) vec;
+
+void
+test (void)
+{
+  register vec a asm("xmm5"), b asm("xmm6"), c asm("xmm7");
+  for (int i = 0; i < 2; i++)
+c[i] = a[i] < b[i] ? 0.1 : 0.2;
+}
-- 
2.35.3

Re: [PATCH] middle-end/113622 - handle store with variable index to register

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 01:05:51PM +0100, Richard Biener wrote:
> The following implements storing to a non-MEM_P with a variable
> offset.  We usually avoid this by forcing expansion to memory but
> this doesn't work for hard register variables.  The solution is
> to spill and operate on the stack.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> I realize the flow is a bit awkward, but short of duplicating a lot
> of code I can't see a better way.  Forcing some lowering on GIMPLE
> (creating the copy there) might be another way.  But then we
> could possibly lower the whole vector indexing in a different way
> in the first place ...
> 
> Thanks,
> Richard.
> 
>   PR middle-end/113622
>   * expr.cc (expand_assignment): Spill hard registers if
>   we index them with a variable offset.
> 
>   * gcc.target/i386/pr113622-2.c: New testcase.
>   * gcc.target/i386/pr113622-3.c: Likewise.

Ok, thanks.

Jakub

Re: [PATCH] middle-end/113622 - handle store with variable index to register

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 01:17:16PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 29, 2024 at 01:05:51PM +0100, Richard Biener wrote:
> > The following implements storing to a non-MEM_P with a variable
> > offset.  We usually avoid this by forcing expansion to memory but
> > this doesn't work for hard register variables.  The solution is
> > to spill and operate on the stack.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> > 
> > I realize the flow is a bit awkward, but short of duplicating a lot
> > of code I can't see a better way.  Forcing some lowering on GIMPLE
> > (creating the copy there) might be another way.  But then we
> > could possibly lower the whole vector indexing in a different way
> > in the first place ...
> > 
> > Thanks,
> > Richard.
> > 
> > PR middle-end/113622
> > * expr.cc (expand_assignment): Spill hard registers if
> > we index them with a variable offset.
> > 
> > * gcc.target/i386/pr113622-2.c: New testcase.
> > * gcc.target/i386/pr113622-3.c: Likewise.
> 
> Ok, thanks.

Actually, better to do
gcc_assert (VAR_P (tem) && DECL_HARD_REGISTER (tem));
Again, nothing guarantees tem is a VAR_DECL.  With tree checking it would
ICE either way: on DECL_HARD_REGISTER (tem) being false for a VAR_DECL, or
in the tree check itself if tem is not a VAR_DECL.  But with release
checking, say, it will do something weird.
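A minimal C sketch of why the guard matters, using hypothetical stand-ins (struct node, var_hard_register, is_hard_register_var) rather than GCC's real tree macros. The accessor checks the node kind with assert, roughly as tree checking does with VAR_DECL_CHECK; when NDEBUG is defined, as in a release-style build, that check disappears and a non-variable node would silently have an unrelated union member read. Testing the kind first, as in VAR_P (tem) && DECL_HARD_REGISTER (tem), means the accessor never sees the wrong kind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for tree codes; not GCC's real definitions.  */
enum node_kind { KIND_VAR, KIND_PARM };

struct node
{
  enum node_kind kind;
  union
  {
    struct { bool hard_register; } var;   /* only valid for KIND_VAR */
    struct { int arg_index; } parm;       /* only valid for KIND_PARM */
  } u;
};

/* Checked accessor, similar in spirit to a tree-checking macro: it
   aborts when applied to the wrong kind, unless NDEBUG removes the
   check.  */
static bool
var_hard_register (const struct node *n)
{
  assert (n->kind == KIND_VAR);
  return n->u.var.hard_register;
}

/* Guarded predicate: the kind test short-circuits before the
   accessor is reached.  */
static bool
is_hard_register_var (const struct node *n)
{
  return n->kind == KIND_VAR && var_hard_register (n);
}
```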

Jakub

Re: [PATCH v2] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-01-29 Thread Kito Cheng

LGTM

On Mon, Jan 29, 2024 at 17:57, Jin Ma wrote:

> When '%ld' is used to print a 'long long int' variable, 'fprintf'
> produces garbled output on a 32-bit system, resulting in an incorrect
> instruction such as 'th.lwib a1,(a0),-16,4294967295'. The following
> error then occurs during compilation:
>
> Assembler messages:
> Error: improper immediate value (18446744073709551615)
>
> gcc/ChangeLog:
>
> * config/riscv/thead.cc (th_print_operand_address): Change %ld
> to HOST_WIDE_INT_PRINT_DEC.
> ---
>  gcc/config/riscv/thead.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> index 2955bc5f8a9..e4b8c37bc28 100644
> --- a/gcc/config/riscv/thead.cc
> +++ b/gcc/config/riscv/thead.cc
> @@ -1141,7 +1141,7 @@ th_print_operand_address (FILE *file, machine_mode mode, rtx x)
>return true;
>
>  case ADDRESS_REG_WB:
> -  fprintf (file, "(%s),%ld,%u", reg_names[REGNO (addr.reg)],
> +  fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", reg_names[REGNO (addr.reg)],
>INTVAL (addr.offset) >> addr.shift, addr.shift);
> return true;
>
> --
> 2.17.1
>
>
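The shape of the bad output can be reasoned out: on a 32-bit little-endian host, %ld consumes only the low 32 bits of the 64-bit offset (0xfffffff0, printed as -16), and the following %u then picks up the high word, 0xffffffff, printed as 4294967295. That is undefined behavior, so the exact output is not guaranteed, but it is consistent with the instruction quoted above. The patch uses GCC's HOST_WIDE_INT_PRINT_DEC; this standalone sketch uses the standard PRId64 from <inttypes.h> as an analogue, and format_wb_address is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a write-back address operand as "(reg),offset,shift".
   OFFSET is 64-bit even on 32-bit hosts, so the format uses PRId64
   rather than %ld, keeping the varargs in step with the format.  */
static int
format_wb_address (char *buf, size_t size, const char *reg,
                   int64_t offset, unsigned shift)
{
  assert (buf != NULL && reg != NULL);
  return snprintf (buf, size, "(%s),%" PRId64 ",%u", reg, offset, shift);
}
```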

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs


On 29/01/2024 10:34, Tobias Burnus wrote:

Andrew wrote off list:
   "Vector reductions don't work on RDNA, as is, but they're
    supposed to be disabled by the insn condition"

This patch disables "fold_left_plus_", which is about
vectorization and in the code path shown in the backtrace.
I can also confirm manually that it fixes the ICE I saw and
also the ICE for the testfile that Richard's PR shows at the
end of his backtrace.  (-O3 is needed to trigger the ICE.)

OK for mainline?


OK.


Tobias

* * *

PS: We could add testcase(s) that is/are explicitly compiled with
gfx1100 and/or gfx1030 + '-O3' to ensure that this gets tested
with AMDGPU enabled, but I am not sure whether it is really worthwhile.


PPS: Running the testsuite, I see the following fails with
gfx1100 offloading:

FAIL: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
Excess errors:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
   .amdhsa_next_free_vgpr    516
                             ^~~
[Obviously, likewise for libgomp.c++/../libgomp.c-c++-common/for-5.c]

FAIL: libgomp.c/pr104783-2.c execution test
FAIL: libgomp.c/pr104783.c execution test
(The .log unfortunately does not show more details)

FAIL: libgomp.fortran/optional-map.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/optional-map.f90   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -g  (test for excess errors)
Same 'out of range' as above.

* * *

Manual testing shows for the two execution fails:

Memory access fault by GPU node-1 (Agent handle: 0x8d1aa0) on address (nil). Reason: Page not present or supervisor privilege.

Interestingly, it only fails with -O1 or higher, for -O0 it works.

Tobias


Hmm, supposedly there are 768 registers allocated in groups of 12, on 
gfx1100 (8 on other devices), which number you have to double on 
wavefrontsize64 because that field actually counts the number of 32-lane 
registers. The ISA can only actually reference 256 registers, so the 
limit here should be 512. (The remaining registers are intended for 
other wavefronts to use.)


But 256 is not divisible by 12, and it looks like we've rounded up. I 
guess we need to set the limit at 252 (504), for gfx1100.
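The arithmetic above can be captured in a few lines. The constants (256 addressable VGPRs, allocation granules of 12 on gfx1100 and 8 elsewhere, doubling for wave64 because the field counts 32-lane registers) are taken from this message and not checked against AMD documentation; max_next_free_vgpr is a hypothetical helper.

```c
#include <assert.h>

/* Largest value a kernel may put in .amdhsa_next_free_vgpr, assuming
   256 addressable VGPRs, allocation in blocks of GRANULE registers,
   and a field that counts 32-lane registers (hence x2 for wave64).  */
static unsigned
max_next_free_vgpr (unsigned granule, int wave64)
{
  const unsigned addressable = 256;
  unsigned usable;

  assert (granule > 0);
  usable = addressable / granule * granule;   /* round down to a granule */
  return wave64 ? 2 * usable : usable;
}
```

With a granule of 12 this gives 252 registers, or 504 for wave64, so the 516 in the failing assembly is over the limit; with a granule of 8 the full 256 (512) remains usable.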


for-5.c is a register allocation nightmare!

Andrew

Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-29 Thread Prathamesh Kulkarni

On Sat, 27 Jan 2024 at 21:19, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The test passes -mlittle-endian option but doesn't have target check
> > for aarch64_little_endian and thus fails to compile on
> > aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian
> > target check, which makes it unsupported on the target.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > PR112950: Add aarch64_little_endian target check for dupq_5.c
> >
> > gcc/testsuite/ChangeLog:
> >   PR target/112950
> >   * gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
> >   aarch64_little_endian target check.
>
> If we add this requirement, then there's no need to pass -mlittle-endian
> in the dg-options.
>
> But dupq_6.c (the corresponding big-endian test) has:
>
>   /* To avoid needing big-endian header files.  */
>   #pragma GCC aarch64 "arm_sve.h"
>
> instead of:
>
>   #include 
>
> Could you do the same thing here?
That worked, thanks! And it also makes dupq_5.c pass on aarch64_be-linux-gnu.

Thanks,
Prathamesh

>
> Thanks,
> Richard
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > index 6ae8d4c60b2..1990412d0e5 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mlittle-endian" } */
> > +/* { dg-require-effective-target aarch64_little_endian } */
> >
> >  #include 
> >
PR112950: Use #pragma GCC for including arm_sve.h. 

gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
and instead use #pragma GCC for including arm_sve.h.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
index 6ae8d4c60b2..e88477b6379 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
 
-#include 
+#pragma GCC aarch64 "arm_sve.h"
 
 svint32_t
 dupq (int x1, int x2, int x3, int x4)

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Tobias Burnus


Andrew Stubbs wrote:

/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
   .amdhsa_next_free_vgpr    516
                             ^~~
[Obviously, likewise for libgomp.c++/..
Hmm, supposedly there are 768 registers allocated in groups of 12, on 
gfx1100 (8 on other devices), which number you have to double on 
wavefrontsize64 because that field actually counts the number of 
32-lane registers. The ISA can only actually reference 256 registers, 
so the limit here should be 512. (The remaining registers are intended 
for other wavefronts to use.)


But 256 is not divisible by 12, and it looks like we've rounded up. I 
guess we need to set the limit at 252 (504), for gfx1100.


BTW: The LLVM source code has,
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1066

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 512;
  if (!isGFX10Plus(*STI))
    return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 1536 : 768;
  return IsWave32 ? 1024 : 512;
}


Tobias

Re: [PATCH] aarch64: Ensure iterator validity when updating debug uses [PR113616]

2024-01-29 Thread Richard Sandiford

Alex Coplan  writes:
> Hi,
>
> The fix for PR113089 introduced range-based for loops over the
> debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
> debug insn, the use would get removed from the use list, and thus we
> would end up using an invalidated iterator in the next iteration of the
> loop.  In practice this means we end up terminating the loop
> prematurely, and hence ICE as in PR113089 since there are debug uses
> that we failed to fix up.
>
> This patch fixes that by introducing a general mechanism to avoid this
> sort of problem.  We introduce a safe_iterator to iterator-utils.h which
> wraps an iterator, and also holds the end iterator value.  It then
> pre-computes the next iterator value at all iterations, so it doesn't
> matter if the original iterator got invalidated during the loop body, we
> can still move safely to the next iteration.
>
> We introduce an iterate_safely helper which effectively adapts a
> container such as iterator_range into a container of safe_iterators over
> the original iterator type.
>
> We then use iterate_safely around all loops over debug_insn_uses () in
> the aarch64 ldp/stp pass to fix PR113616.  While doing this, I
> remembered that cleanup_tombstones () had the same problem.  I
> previously worked around this locally by manually maintaining the next
> nondebug insn, so this patch also refactors that loop to use the new
> iterate_safely helper.
>
> While doing that I noticed that a couple of cases in cleanup_tombstones
> could be converted from using dyn_cast to as_a,
> which should be safe because there are no clobbers of mem in RTL-SSA, so
> all defs of memory should be set_infos.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113616
>   * config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
>   Use iterate_safely when iterating over debug uses.
>   (fixup_debug_uses): Likewise.
>   (ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
>   over nondebug insns instead of manually maintaining the next insn.
>   * iterator-utils.h (class safe_iterator): New.
>   (iterate_safely): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113616
>   * gcc.c-torture/compile/pr113616.c: New test.

OK, thanks.

Richard
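The idea behind safe_iterator can be shown without C++ templates. In this C sketch (struct use and remove_stale_uses are hypothetical, not RTL-SSA types), the successor is read before the loop body may unlink the current node. Because unlinking also clears the removed node's next pointer, a naive u = u->next step would stop the walk early, which matches the premature loop termination described in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list of debug uses.  */
struct use
{
  int uid;
  int is_stale;
  struct use *next;
};

/* Unlink every stale use from *HEAD.  NEXT is computed before the body
   can modify U, so removing U never derails the iteration.  */
static void
remove_stale_uses (struct use **head)
{
  struct use **link;

  assert (head != NULL);
  link = head;
  for (struct use *u = *head, *next; u != NULL; u = next)
    {
      next = u->next;            /* precompute the successor */
      if (u->is_stale)
        {
          *link = next;          /* unlink U from the list */
          u->next = NULL;        /* model the removed node's links being reset */
        }
      else
        link = &u->next;
    }
}
```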

> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 932a6398ae3..22ed95eb743 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -1480,7 +1480,7 @@ fixup_debug_uses_trailing_add (obstack_watermark &attempt,
>def_info *def = defs[0];
>  
>if (auto set = safe_dyn_cast (def->prev_def ()))
> -for (auto use : set->debug_insn_uses ())
> +for (auto use : iterate_safely (set->debug_insn_uses ()))
>if (*use->insn () > *pair_dst)
>   // DEF is getting re-ordered above USE, fix up USE accordingly.
>   fixup_debug_use (attempt, use, def, base, wb_offset);
> @@ -1544,13 +1544,16 @@ fixup_debug_uses (obstack_watermark &attempt,
>auto def = memory_access (insns[0]->defs ());
>auto last_def = memory_access (insns[1]->defs ());
>for (; def != last_def; def = def->next_def ())
> - for (auto use : as_a (def)->debug_insn_uses ())
> -   {
> - if (dump_file)
> -   fprintf (dump_file, "  i%d: resetting debug use of mem\n",
> -use->insn ()->uid ());
> - reset_debug_use (use);
> -   }
> + {
> +   auto set = as_a (def);
> +   for (auto use : iterate_safely (set->debug_insn_uses ()))
> + {
> +   if (dump_file)
> + fprintf (dump_file, "  i%d: resetting debug use of mem\n",
> +  use->insn ()->uid ());
> +   reset_debug_use (use);
> + }
> + }
>  }
>  
>// Now let's take care of register uses, starting with debug uses
> @@ -1577,7 +1580,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>  
>// Now that we've characterized the defs involved, go through the
>// debug uses and determine how to update them (if needed).
> -  for (auto use : set->debug_insn_uses ())
> +  for (auto use : iterate_safely (set->debug_insn_uses ()))
>   {
> if (*pair_dst < *use->insn () && defs[1])
>   // We're re-ordering defs[1] above a previous use of the
> @@ -1609,7 +1612,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>  
>// We have a def in insns[1] which isn't def'd by the first insn.
>// Look to the previous def and see if it has any debug uses.
> -  for (auto use : prev_set->debug_insn_uses ())
> +  for (auto use : iterate_safely (prev_set->debug_insn_uses ()))
>   if (*pair_dst < *use->insn ())
> // We're ordering DEF above a previous use of the same register.
> update_debug_use (use, def, writeback_pat);
> @@ -1622,7 +1625,8 @@ fixup_debug_use
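The patch above wraps the debug-use loops in iterate_safely, presumably because resetting or updating a debug use can unlink it from the list that is being walked. The underlying idea can be sketched in plain C; struct use_node, push_front and reset_uses_after are hypothetical and unrelated to GCC's iterator-utils.h or rtl-ssa classes. The essential step is reading the successor before the loop body gets a chance to free the current node:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node standing in for a debug use.  */
struct use_node
{
  int insn_uid;
  struct use_node *next;
};

/* Prepend a node for UID to HEAD and return the new head.  */
static struct use_node *
push_front (struct use_node *head, int uid)
{
  struct use_node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->insn_uid = uid;
  n->next = head;
  return n;
}

/* Unlink and free every node whose insn_uid is greater than LIMIT and
   return how many were removed.  NEXT is read before the body runs, so
   freeing CUR does not break the walk; this is the property that a
   "safe" iteration provides.  */
static int
reset_uses_after (struct use_node **head, int limit)
{
  int removed = 0;
  struct use_node **link = head;   /* Pointer that currently refers to CUR.  */
  for (struct use_node *cur = *head, *next; cur != NULL; cur = next)
    {
      next = cur->next;            /* Capture the successor first.  */
      if (cur->insn_uid > limit)
        {
          *link = next;            /* Unlink CUR ...  */
          free (cur);              /* ... and only then release it.  */
          removed++;
        }
      else
        link = &cur->next;
    }
  return removed;
}
```

Starting from the list 1, 5, 2, 7, reset_uses_after (&head, 3) frees the nodes for 5 and 7, returns 2 and leaves 1, 2. A naive loop that advanced with cur = cur->next after free (cur) would read freed memory, which is the class of bug a safe iterator avoids.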

Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-29 Thread Richard Sandiford

Prathamesh Kulkarni  writes:
> On Sat, 27 Jan 2024 at 21:19, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > Hi,
>> > The test passes -mlittle-endian option but doesn't have target check
>> > for aarch64_little_endian and thus fails to compile on
>> > aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian
>> > target check, which makes it unsupported on the target.
>> > OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> > PR112950: Add aarch64_little_endian target check for dupq_5.c
>> >
>> > gcc/testsuite/ChangeLog:
>> >   PR target/112950
>> >   * gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
>> >   aarch64_little_endian target check.
>>
>> If we add this requirement, then there's no need to pass -mlittle-endian
>> in the dg-options.
>>
>> But dupq_6.c (the corresponding big-endian test) has:
>>
>>   /* To avoid needing big-endian header files.  */
>>   #pragma GCC aarch64 "arm_sve.h"
>>
>> instead of:
>>
>>   #include <arm_sve.h>
>>
>> Could you do the same thing here?
> That worked, thanks! And it also makes dupq_5.c pass on aarch64_be-linux-gnu.
>
> Thanks,
> Prathamesh
>
>>
>> Thanks,
>> Richard
>>
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
>> > index 6ae8d4c60b2..1990412d0e5 100644
>> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
>> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
>> > @@ -1,5 +1,6 @@
>> >  /* { dg-do compile } */
>> >  /* { dg-options "-O2 -mlittle-endian" } */
>> > +/* { dg-require-effective-target aarch64_little_endian } */
>> >
>> >  #include <arm_sve.h>
>> >
>
> PR112950: Use #pragma GCC for including arm_sve.h. 
>
> gcc/testsuite/ChangeLog:
>   PR target/112950
>   * gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
>   and instead use #pragma GCC for including arm_sve.h.

OK, thanks.

Richard

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> index 6ae8d4c60b2..e88477b6379 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mlittle-endian" } */
>  
> -#include <arm_sve.h>
> +#pragma GCC aarch64 "arm_sve.h"
>  
>  svint32_t
>  dupq (int x1, int x2, int x3, int x4)

Re: [PATCH v1] RISC-V: Bugfix for vls integer mode calling convention

2024-01-29 Thread Kito Cheng

> @@ -4868,6 +4968,63 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
>GEN_INT (offset2;
>  }
>
> +static rtx
> +riscv_pass_vls_aggregate_in_gpr_or_fpr (struct riscv_arg_info *info,
> +   machine_mode mode, unsigned gpr_base,
> +   unsigned fpr_base)

I tried a few more Clang and GCC code-generation cases and found that VLS
vectors are always passed in GPRs and never in FPRs, so I think I should
update the psABI rather than fix this on the GCC side.

> @@ -4997,9 +5170,7 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
>info->gpr_offset = cum->num_gprs;
>info->fpr_offset = cum->num_fprs;
>
> -  /* When disable vector_abi or scalable vector argument is anonymous, this
> - argument is passed by reference.  */
> -  if (riscv_v_ext_mode_p (mode) && (!riscv_vector_abi || !named))
> +  if (riscv_mode_pass_by_reference_p (mode, named))

Keeping this as it is is fine, since riscv_vector_abi is gone.

>  return NULL_RTX;
>
>if (named)

[PATCH v3] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-29 Thread H.J. Lu

Changes in v3:

1. Add the TREE_THIS_VOLATILE check to minimize noreturn attribute lookup.

Changes in v2:

1. Lookup noreturn attribute first.
2. Use __attribute__((noreturn, optimize("-Og"))) in pr38534-6.c.


Save callee-saved registers in noreturn functions for -O0/-Og so that
the debugger can restore callee-saved registers in the caller's frame.

Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute
lookup.
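The v3 condition in ix86_set_func_type can be read as a pure predicate over a handful of inputs. The C sketch below models it with a hypothetical struct fn_flags in place of GCC's trees and global options; it is an illustration, not GCC code. The cheap TREE_THIS_VOLATILE stand-in is tested first so that && short-circuits before the stand-in for the attribute lookup:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the inputs the v3 condition consults; these are
   not GCC's real data structures.  */
struct fn_flags
{
  bool this_volatile;      /* TREE_THIS_VOLATILE (fndecl) */
  bool has_noreturn_attr;  /* lookup_attribute ("noreturn", ...) */
  int optimize;            /* optimize (0 for -O0) */
  bool optimize_debug;     /* optimize_debug (set for -Og) */
  bool nothrow;            /* TREE_NOTHROW (fndecl) */
  bool flag_exceptions;    /* flag_exceptions */
  bool no_csr_type_attr;   /* "no_callee_saved_registers" on the type */
};

/* Mirror of the v3 expression: only an optimized, non-debug, noreturn
   function that cannot throw skips saving callee-saved registers,
   unless the type attribute requests it explicitly.  */
static bool
has_no_callee_saved_registers (const struct fn_flags *f)
{
  return (f->this_volatile          /* Cheap check first ...  */
          && f->has_noreturn_attr   /* ... then the attribute lookup.  */
          && f->optimize
          && !f->optimize_debug
          && (f->nothrow || !f->flag_exceptions))
         || f->no_csr_type_attr;
}
```

With these definitions, a nothrow noreturn function at -O2 yields true, while the same function at -O0 (optimize == 0) or at -Og (optimize nonzero but optimize_debug set) yields false, so its callee-saved registers are saved for the debugger.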

gcc/

PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Save
callee-saved registers in noreturn functions for -O0/-Og.

gcc/testsuite/

PR target/38534
* gcc.target/i386/pr38534-5.c: New file.
* gcc.target/i386/pr38534-6.c: Likewise.
---
 gcc/config/i386/i386-options.cc   | 12 +++
 gcc/testsuite/gcc.target/i386/pr38534-5.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-6.c | 26 +++
 3 files changed, 60 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-6.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 473f5359fc9..8f5ce817630 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3381,9 +3381,10 @@ static void
 ix86_set_func_type (tree fndecl)
 {
   /* No need to save and restore callee-saved registers for a noreturn
- function with nothrow or compiled with -fno-exceptions.
+ function with nothrow or compiled with -fno-exceptions unless when
+ compiling with -O0 or -Og.
 
- NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
+ NB: Can't use just TREE_THIS_VOLATILE to check if this is a noreturn
  function.  The local-pure-const pass turns an interrupt function
  into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
  the local-pure-const pass is run after ix86_set_func_type is called.
@@ -3391,8 +3392,11 @@ ix86_set_func_type (tree fndecl)
  function is marked as noreturn in the IR output, which leads the
  incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
-   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+= ((TREE_THIS_VOLATILE (fndecl)
+   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
+   && optimize
+   && !optimize_debug
+   && (TREE_NOTHROW (fndecl) || !flag_exceptions))
|| lookup_attribute ("no_callee_saved_registers",
TYPE_ATTRIBUTES (TREE_TYPE (fndecl;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-5.c b/gcc/testsuite/gcc.target/i386/pr38534-5.c
new file mode 100644
index 000..91c0c0f8c59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-5.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-6.c b/gcc/testsuite/gcc.target/i386/pr38534-6.c
new file mode 100644
index 000..cf1463a9c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn, optimize("-Og")))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
-- 
2.43.0

Re: [PATCH v3] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 05:26:05AM -0800, H.J. Lu wrote:
> Changes in v3:
> 
> 1. Add the TREE_THIS_VOLATILE check to minimize noreturn attribute lookup.
> 
> Changes in v2:
> 
> 1. Lookup noreturn attribute first.
> 2. Use __attribute__((noreturn, optimize("-Og"))) in pr38534-6.c.
> 
> 
> Save callee-saved registers in noreturn functions for -O0/-Og so that
> debugger can restore callee-saved registers in caller's frame.
> 
> Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute
> lookup.
> 
> gcc/
> 
>   PR target/38534
>   * config/i386/i386-options.cc (ix86_set_func_type): Save
>   callee-saved registers in noreturn functions for -O0/-Og.
> 
> gcc/testsuite/
> 
>   PR target/38534
>   * gcc.target/i386/pr38534-5.c: New file.
>   * gcc.target/i386/pr38534-6.c: Likewise.

Ok, thanks.

Jakub

Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 2:11 AM Jakub Jelinek  wrote:
>
> On Sat, Jan 27, 2024 at 07:00:03AM -0800, H.J. Lu wrote:
> > On Sat, Jan 27, 2024 at 6:09 AM Jakub Jelinek  wrote:
> > >
> > > On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> > > > @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
> > > >   function is marked as noreturn in the IR output, which leads the
> > > >   incompatible attribute error in LTO1.  */
> > > >bool has_no_callee_saved_registers
> > > > -= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> > > > += ((optimize
> > > > + && !optimize_debug
> > >
> > > Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for
> > > optimize_debug?
> > > I mean, aren't the options not restored yet when this function is called
> > > (i.e. remain in whatever state they were in the previous function or
> > > global state)?
> >
> > store_parm_decls is called when parsing a function.  store_parm_decls
> > calls allocate_struct_function which calls
> >
> >   invoke_set_current_function_hook (fndecl);
> >
> > which has
> >
> >  /* Change optimization options if needed.  */
> >   if (optimization_current_node != opts)
> > {
> >   optimization_current_node = opts;
> >   cl_optimization_restore (&global_options, &global_options_set,
> >TREE_OPTIMIZATION (opts));
> > }
> >
> >   targetm.set_current_function (fndecl);
> >
> > which calls ix86_set_current_function after global_options
> > has been updated.   ix86_set_func_type is called from
> > ix86_set_current_function.
>
> Sorry, you're right, I just saw option restore later in 
> ix86_set_current_function
> and missed that it is target option restore only.
>
> > > Also, why check "noreturn" attribute rather than
> > > TREE_THIS_VOLATILE (fndecl)?
> > >
> >
> > The comments above this code has
> >
> >  NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
> >  function.  The local-pure-const pass turns an interrupt function
> >  into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
> >  the local-pure-const pass is run after ix86_set_func_type is called.
> >  When the local-pure-const pass is enabled for LTO, the interrupt
> >  function is marked as noreturn in the IR output, which leads the
> >  incompatible attribute error in LTO1.
>
> So in that case, I think it would be best to test
>   TREE_THIS_VOLATILE (fndecl)
>   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
>   && ...
> because if it doesn't have noreturn attribute, it will not have
> TREE_THIS_VOLATILE set and TREE_THIS_VOLATILE is much cheaper to test than
> looking an attribute.
>

Fixed in the v3 patch:

https://patchwork.sourceware.org/project/gcc/list/?series=30308

Thanks.

-- 
H.J.

Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-29 Thread Matthieu Longo


Hi Richard,

Please find below the new patch where I addressed your comments and 
updated the changelog.


The rev16 pattern was no longer recognised: a change in the bswap tree
pass introduced a new GIMPLE form that the final transformation pass
into assembly did not recognise.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933
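Both RTL shapes discussed in this thread compute the same 32-bit value: a byte swap inside each 16-bit halfword, which is what REV16 does. As a quick sanity check in C (assuming a GCC- or Clang-compatible compiler for __builtin_bswap32; the function names are hypothetical), the mask-and-shift form, using the 0xff00ff00 / 0x00ff00ff masks from the old arm_rev16si2 expander, agrees with bswap followed by a rotate by 16:

```c
#include <assert.h>
#include <stdint.h>

/* Mask-and-shift form, roughly the shape arm_rev16si2_alt1 and
   *arm_rev16si2_alt2 match.  */
static uint32_t
rev16_masks (uint32_t x)
{
  return ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
}

/* The newer shape: a full byte swap followed by a rotate by 16.  For a
   32-bit value, rotating left or right by 16 gives the same result.  */
static uint32_t
rev16_bswap_rotate (uint32_t x)
{
  uint32_t b = __builtin_bswap32 (x);
  return (b << 16) | (b >> 16);
}
```

For example, both functions map 0x11223344 to 0x22114433, which is why a single arm_rev16si2 pattern matching (rotate (bswap x) 16) can replace the old expander.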

gcc/ChangeLog:

PR target/108933
* config/arm/arm.md (arm_rev16si2): Convert to define_insn.
Correct generated RTL.
(arm_rev16si2_alt1): Correctly handle conditional execution.
(arm_rev16si2_alt2): Likewise.

gcc/testsuite/ChangeLog:

PR target/108933
* gcc.target/arm/rev16.c: Moved to...
* gcc.target/arm/rev16_1.c: ...here.
* gcc.target/arm/rev16_2.c: New test to check that rev16 is
emitted.

On 2024-01-22 16:25, Richard Earnshaw (lists) wrote:

On 22/01/2024 12:18, Matthieu Longo wrote:

rev16 pattern was not recognised anymore as a change in the bswap tree
pass was introducing a new GIMPLE form, not recognized by the assembly
final transformation pass.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

     PR target/108933
     * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
   a bswap + rotate by 16 bits into rev16


ChangeLog entries need to be written as sentences, so start with a capital letter and end 
with a full stop; continuation lines should start in column 8 (one hard tab, don't use 
spaces).  But in this case, "New pattern." is sufficient.



gcc/testsuite/ChangeLog:

     PR target/108933
     * gcc.target/arm/rev16.c: Moved to...
     * gcc.target/arm/rev16_1.c: ...here.
     * gcc.target/arm/rev16_2.c: New test to check that rev16 is
   emitted.



+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+ (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 code (but 
not thumb1) we also have to handle conditional execution: we need to have '%?' 
in the output template at the point where a condition code might be needed.  
That means we need separate output templates for all three alternatives (as we 
need a 16-bit variant for thumb2 that's conditional and a 16-bit for thumb1 
that isn't).  See the output of arm_rev16 for a guide of what is really needed.

I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are incorrect 
in this regard as well; that will need fixing.

I also see that arm_rev16si2 currently expands to the alt1 variant above; given 
that the preferred canonical form would now appear to use bswap + rotate, we 
should change that as well.  In fact, we can merge your new pattern with the 
expand entirely and eliminate the need to call gen_arm_rev16si2_alt1.  
Something like:

(define_insn "arm_rev16si2"
   [(set (match_operand:SI 0 "s_register_operand")
 (rotate:SI (bswap:SI (match_operand:SI 1 "s_register_operand")) 
(const_int 16))]
   "arm_arch6"
   "@
   rev16...
   ...


R.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4a98f2d7b6251da940806b26d4c310a7f7af927b..5816409f86f1106b410c5e21d77e599b485f85f2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12578,7 +12578,10 @@ (define_insn "arm_rev16si2_alt1"
   "arm_arch6
&& aarch_rev16_shleft_mask_imm_p (operands[3], SImode)
&& aarch_rev16_shright_mask_imm_p (operands[2], SImode)"
-  "rev16\\t%0, %1"
+  "@
+   rev16\t%0, %1
+   rev16%?\t%0, %1
+   rev16%?\t%0, %1"
   [(set_attr "arch" "t1,t2,32")
(set_attr "length" "2,2,4")
(set_attr "type" "rev")]
@@ -12595,22 +12598,28 @@ (define_insn "*arm_rev16si2_alt2"
   "arm_arch6
&& aarch_rev16_shleft_mask_imm_p (operands[3], SImode)
&& aarch_rev16_shright_mask_imm_p (operands[2], SImode)"
-  "rev16\\t%0, %1"
+  "@
+   rev16\t%0, %1
+   rev16%?\t%0, %1
+   rev16%?\t%0, %1"
   [(set_attr "arch" "t1,t2,32")
(set_attr "length" "2,2,4")
(set_attr "type" "rev")]
 )
 
-(define_expand "arm_rev16si2"
-  [(set (match_operand:SI 0 "s_register_operand")
-   (bswap:SI (match_operand:SI 1 "s_register_operand")))]
+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "arm_rev16si2"
+  [(set (match_operand:SI 0 "register_operand" "=l,l,r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,l,r"))
+   (const_int 16)))]
   "arm_arch6"
-  {
-rtx left = gen_int_mode (HOST_WIDE_INT_C (0xff00ff00ff00ff00), SImode);
-rtx right = gen_int_mode (HOST_WIDE_INT_C (0xff00ff00ff00ff), SImode);
-emit_insn (gen_arm_rev16si2_alt1 (operands[0], operands[1], right, left));
-DONE;
-  }
+  "@
+   rev16\t%0, %1
+   rev16

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 3:03 AM Jakub Jelinek  wrote:
>
> On Sat, Jan 27, 2024 at 07:10:55AM -0800, H.J. Lu wrote:
> > For function symbol reference in readonly data section, instead of putting
> > it in .data.rel.ro or .rodata.cst section, call function_rodata_section to
> > get the read-only or relocated read-only data section associated with the
> > function DECL so that the COMDAT section will be used for a COMDAT function
> > symbol.
>
> I have to admit I still don't understand what the linker doesn't like on
> what GCC emits and why references to the public symbols at the start of
> comdat sections are ok in .text but not in .data.rel.ro but are in .data
> or .rodata sections (or what the exact rules are, see also what we emit on
> __attribute__((noinline, noipa)) inline void foo () {}
> void bar () { foo (); } void (*p) () = foo; void (*const q) () = foo; void (*const *r) () = &q;
> ).
> I've always thought that the problematic references are when something
> references non-public symbols in comdat sections, especially not at their
> start, because if linker selects some comdat section(s) from some other
> TU, there is no guarantee e.g. the code is identical (just in valid program
> should behave the same) and if such reference comes from other comdat that
> is kept or from non-comdat sections, the question is what should be
> referenced.
>
> But in this case, I believe we are referencing the function at the start of
> a code comdat section.
>
> Now, in my limited understanding what the patch does is totally wrong
> for multiple reasons.  On the first testcase it changes
> -   .section    .data.rel.ro.local,"aw"
> +   .section    .data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE,"awG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
> .align 8
>  .LC0:
> .quad   _ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE
> Now, I believe such a .data.rel.ro.local.* section is normally
> used for .data.rel.ro.local constants from the referenced function,
> if we have some relocatable constant needed in that function we
> emit those there.
> If linker picks up the comdat from current TU, it will be all fine,
> sure, but if it picks up the comdat from another TU, the
> .data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5* section
> there might not be present or might contain some unrelated stuff.
> Given the handling of (const (plus (symbol_ref) (const_int)), we
> also don't know whether the section holds a reference to the start,
> or to some other offset of it, how many etc.
> And, we reference a non-public symbol (.LC0) from non-comdat section
> to a comdat section.

TARGET_ASM_SELECT_RTX_SECTION is for constants in RTL.
Such a constant should be referenced via a non-public label, which can't
be used by other TUs.  The same section can contain other constants.
If there is a COMDAT issue, the linker will catch it.

> If I'm wrong on this, please try to explain.
>
> Jakub
>


-- 
H.J.

[PATCH][libsanitizer]: Sync fixes for asan interceptors from upstream [PR112644]

2024-01-29 Thread Tamar Christina

Hi All,

This cherry-picks and squashes the differences between commits

d3e5c20ab846303874a2a25e5877c72271fc798b..76e1e45922e6709392fb82aac44bebe3dbc2ea63
from LLVM upstream in compiler-rt/lib/hwasan/, keeping only the changes
relevant to GCC.

This is required to fix the linked PR.

As mentioned in the PR, the last sync brought in a bug from upstream[1] where
operations became non-recoverable, and as a result the AArch64 tests started
failing.  This cherry-picks the fix; minor follow-up updates to GCC are needed
after this to fix the affected cases.

[1] https://github.com/llvm/llvm-project/pull/74000

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

libsanitizer/ChangeLog:

PR sanitizer/112644
* hwasan/hwasan_interceptors.cpp (ACCESS_MEMORY_RANGE,
HWASAN_READ_RANGE, HWASAN_WRITE_RANGE, COMMON_SYSCALL_PRE_READ_RANGE,
COMMON_SYSCALL_PRE_WRITE_RANGE, COMMON_INTERCEPTOR_WRITE_RANGE,
COMMON_INTERCEPTOR_READ_RANGE): Make recoverable.

--- inline copy of patch -- 
diff --git a/libsanitizer/hwasan/hwasan_interceptors.cpp b/libsanitizer/hwasan/hwasan_interceptors.cpp
index d9237cf9b8e3bf982cf213123ef22e73ec027c9e..96df4dd0c24d7d3db28fa2557cf63da0f295e33f 100644
--- a/libsanitizer/hwasan/hwasan_interceptors.cpp
+++ b/libsanitizer/hwasan/hwasan_interceptors.cpp
@@ -36,16 +36,16 @@ struct HWAsanInterceptorContext {
   const char *interceptor_name;
 };
 
-#  define ACCESS_MEMORY_RANGE(ctx, offset, size, access)\
-do {\
-  __hwasan::CheckAddressSized((uptr)offset, \
-  size);\
+#  define ACCESS_MEMORY_RANGE(offset, size, access)   \
+do {  \
+  __hwasan::CheckAddressSized((uptr)offset, \
+size);\
 } while (0)
 
-#  define HWASAN_READ_RANGE(ctx, offset, size) \
-ACCESS_MEMORY_RANGE(ctx, offset, size, AccessType::Load)
-#  define HWASAN_WRITE_RANGE(ctx, offset, size) \
-ACCESS_MEMORY_RANGE(ctx, offset, size, AccessType::Store)
+#  define HWASAN_READ_RANGE(offset, size) \
+ACCESS_MEMORY_RANGE(offset, size, AccessType::Load)
+#  define HWASAN_WRITE_RANGE(offset, size) \
+ACCESS_MEMORY_RANGE(offset, size, AccessType::Store)
 
 #  if !SANITIZER_APPLE
 #define HWASAN_INTERCEPT_FUNC(name)                                        \
@@ -74,9 +74,8 @@ struct HWAsanInterceptorContext {
 
 #  if HWASAN_WITH_INTERCEPTORS
 
-#define COMMON_SYSCALL_PRE_READ_RANGE(p, s) __hwasan_loadN((uptr)p, (uptr)s)
-#define COMMON_SYSCALL_PRE_WRITE_RANGE(p, s) \
-  __hwasan_storeN((uptr)p, (uptr)s)
+#define COMMON_SYSCALL_PRE_READ_RANGE(p, s) HWASAN_READ_RANGE(p, s)
+#define COMMON_SYSCALL_PRE_WRITE_RANGE(p, s) HWASAN_WRITE_RANGE(p, s)
 #define COMMON_SYSCALL_POST_READ_RANGE(p, s) \
   do {   \
 (void)(p);   \
@@ -91,10 +90,10 @@ struct HWAsanInterceptorContext {
 #include "sanitizer_common/sanitizer_syscalls_netbsd.inc"
 
 #define COMMON_INTERCEPTOR_WRITE_RANGE(ctx, ptr, size) \
-  HWASAN_WRITE_RANGE(ctx, ptr, size)
+  HWASAN_WRITE_RANGE(ptr, size)
 
 #define COMMON_INTERCEPTOR_READ_RANGE(ctx, ptr, size) \
-  HWASAN_READ_RANGE(ctx, ptr, size)
+  HWASAN_READ_RANGE(ptr, size)
 
 #define COMMON_INTERCEPTOR_ENTER(ctx, func, ...) \
   HWAsanInterceptorContext _ctx = {#func};   \

[PATCH]AArch64: relax cbranch tests to accept inverted branches [PR113502]

2024-01-29 Thread Tamar Christina

Hi All,

Recently something in the mid-end started inverting branches, by inverting
both the condition and the branch targets.

While this is fine, it makes the output hard to test.  In RTL I disable
scheduling and BB reordering to prevent this, but in GIMPLE there seems to be
nothing I can do.  __builtin_expect seems to have no effect on the change; I
suspect this happens during expand, where conditions can be flipped in
compare_and_branch regardless of probability.

Since the mid-end has plenty of correctness tests, this weakens the backend
tests to just check that a correct-looking sequence is emitted.
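The relaxation relies on two small regular-expression constructs: alternation in b.(any|none) and an optional character in cbn?z. check-function-bodies compares body lines using Tcl regular expressions; as a rough illustration, assuming POSIX extended regular expressions from <regex.h> behave the same for these constructs, the sketch below anchors approximations of the old and new expected lines (the whitespace between mnemonic and operands is written as [[:space:]]+, an assumption of this sketch) and checks which assembler lines they accept:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Approximations of the old and relaxed expected lines, anchored at
   both ends.  */
static const char *const OLD_BANY = "^b\\.any[[:space:]]+\\.L[0-9]+$";
static const char *const NEW_BANY = "^b\\.(any|none)[[:space:]]+\\.L[0-9]+$";
static const char *const OLD_CBNZ = "^cbnz[[:space:]]+x[0-9]+, \\.L[0-9]+$";
static const char *const NEW_CBNZ = "^cbn?z[[:space:]]+x[0-9]+, \\.L[0-9]+$";

/* Return true if LINE matches the POSIX extended regular expression RE.  */
static bool
matches (const char *re, const char *line)
{
  regex_t rx;
  if (regcomp (&rx, re, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&rx, line, 0, NULL, 0) == 0;
  regfree (&rx);
  return ok;
}
```

For instance, NEW_BANY accepts both "b.any\t.L4" and "b.none\t.L4", while OLD_BANY accepts only the first; likewise NEW_CBNZ accepts both cbz and cbnz but still rejects a malformed mnemonic such as cbnnz.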

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR testsuite/113502
* gcc.target/aarch64/sve/vect-early-break-cbranch.c: Ignore exact 
branch.
* gcc.target/aarch64/vect-early-break-cbranch.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
index d15053553f94e7dce3540e21f0c1f0d39ea4f289..d7cef1105410be04ed67d1d3b800746267f205a8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
@@ -9,7 +9,7 @@ int b[N] = {0};
 ** ...
 ** cmpgt   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   \.L[0-9]+
+** b.(any|none)\.L[0-9]+
 ** ...
 */
 void f1 ()
@@ -26,7 +26,7 @@ void f1 ()
 ** ...
 ** cmpge   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   \.L[0-9]+
+** b.(any|none)\.L[0-9]+
 ** ...
 */
 void f2 ()
@@ -43,7 +43,7 @@ void f2 ()
 ** ...
 ** cmpeq   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   \.L[0-9]+
+** b.(any|none)\.L[0-9]+
 ** ...
 */
 void f3 ()
@@ -60,7 +60,7 @@ void f3 ()
 ** ...
 ** cmpne   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   \.L[0-9]+
+** b.(any|none)\.L[0-9]+
 ** ...
 */
 void f4 ()
@@ -77,7 +77,7 @@ void f4 ()
 ** ...
 ** cmplt   p[0-9]+.s, p7/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   .L[0-9]+
+** b.(any|none).L[0-9]+
 ** ...
 */
 void f5 ()
@@ -94,7 +94,7 @@ void f5 ()
 ** ...
 ** cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
 ** ptest   p[0-9]+, p[0-9]+.b
-** b.any   \.L[0-9]+
+** b.(any|none)\.L[0-9]+
 ** ...
 */
 void f6 ()
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
index a5e7b94827dd70240d754a834f1d11750a9c27a9..673b781eb6d092f6311409797b20a971f4fae247 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -15,7 +15,7 @@ int b[N] = {0};
 ** cmgtv[0-9]+.4s, v[0-9]+.4s, #0
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f1 ()
@@ -34,7 +34,7 @@ void f1 ()
 ** cmgev[0-9]+.4s, v[0-9]+.4s, #0
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f2 ()
@@ -53,7 +53,7 @@ void f2 ()
 ** cmeqv[0-9]+.4s, v[0-9]+.4s, #0
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f3 ()
@@ -72,7 +72,7 @@ void f3 ()
 ** cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f4 ()
@@ -91,7 +91,7 @@ void f4 ()
 ** cmltv[0-9]+.4s, v[0-9]+.4s, #0
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f5 ()
@@ -110,7 +110,7 @@ void f5 ()
 ** cmlev[0-9]+.4s, v[0-9]+.4s, #0
 ** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
 ** fmovx[0-9]+, d[0-9]+
-** cbnzx[0-9]+, \.L[0-9]+
+** cbn?z   x[0-9]+, \.L[0-9]+
 ** ...
 */
 void f6 ()

[PATCH]middle-end: check memory accesses in the destination block [PR113588].

2024-01-29 Thread Tamar Christina

Hi All,

When analyzing loads for early break, the intention was always that, for the
exit where statements get moved to, we only check the loads that can be
reached from the condition.

However, the main loop checks all loads but skips the destination BB.  As a
result we never actually check the loads reachable from the COND in the last
BB unless this BB was also the exit chosen by the vectorizer.

This leads us to incorrectly vectorize the loop in the PR and, in doing so,
access memory out of bounds.
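The second call in main of vect-early-break_109-pr113588.c places the two-byte string {1, 0} in the last two bytes before a PROT_NONE page. As a rough illustration (foo_counting and vector_load_crosses_page are hypothetical helpers, not vectorizer code, and 16 bytes is only an example vector width), the scalar loop reads just those two bytes, whereas one 16-byte vector load from the same address would reach into the protected page, which is the kind of out-of-bounds access described above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same loop as foo () in the new tests, but also reports through
   BYTES_READ how many bytes of S were dereferenced.  */
static unsigned long
foo_counting (const char *s, unsigned long n, size_t *bytes_read)
{
  unsigned long len = 0;
  size_t reads = 0;
  /* The comma expression counts each *s++ before testing it; as in the
     original loop, the load happens before the n-- check.  */
  while ((++reads, *s++) && n--)
    ++len;
  *bytes_read = reads;
  return len;
}

/* True if a VF-byte load starting OFFSET bytes into a PGSZ-byte page
   extends into the next page.  */
static bool
vector_load_crosses_page (size_t offset, size_t vf, size_t pgsz)
{
  return offset + vf > pgsz;
}
```

For example, foo_counting on {1, 0} with n = 1000 returns 1 after reading 2 bytes, while vector_load_crosses_page (4096 - 2, 16, 4096) is true for a 4096-byte page. Note also that because the load precedes n--, the scalar loop reads n + 1 bytes of a longer string before stopping.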

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/113588
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences_1): New.
(vect_analyze_data_ref_dependence): Use it.
(vect_analyze_early_break_dependences): Update comments.

gcc/testsuite/ChangeLog:

PR tree-optimization/113588
* gcc.dg/vect/vect-early-break_108-pr113588.c: New test.
* gcc.dg/vect/vect-early-break_109-pr113588.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
new file mode 100644
index ..e488619c9aac41fafbcf479818392a6bb7c6924f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+int foo (const char *s, unsigned long n)
+{
+ unsigned long len = 0;
+ while (*s++ && n--)
+   ++len;
+ return len;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
new file mode 100644
index ..488c19d3ede809631d1a7ede0e7f7bcdc7a1ae43
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
@@ -0,0 +1,44 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target mmap } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include "tree-vect.h"
+
+__attribute__((noipa))
+int foo (const char *s, unsigned long n)
+{
+ unsigned long len = 0;
+ while (*s++ && n--)
+   ++len;
+ return len;
+}
+
+int main()
+{
+
+  check_vect ();
+
+  long pgsz = sysconf (_SC_PAGESIZE);
+  void *p = mmap (NULL, pgsz * 3, PROT_READ|PROT_WRITE,
+                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
+  if (p == MAP_FAILED)
+    return 0;
+  mprotect (p, pgsz, PROT_NONE);
+  mprotect (p+2*pgsz, pgsz, PROT_NONE);
+  char *p1 = p + pgsz;
+  p1[0] = 1;
+  p1[1] = 0;
+  foo (p1, 1000);
+  p1 = p + 2*pgsz - 2;
+  p1[0] = 1;
+  p1[1] = 0;
+  foo (p1, 1000);
+  return 0;
+}
+
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index f592aeb8028afd4fd70e2175104efab2a2c0d82e..52cef242a7ce5d0e525bff639fa1dc2f0a6f30b9 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -619,10 +619,69 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
-/* Funcion vect_analyze_early_break_dependences.
+/* Function vect_analyze_early_break_dependences_1
 
-   Examime all the data references in the loop and make sure that if we have
-   mulitple exits that we are able to safely move stores such that they become
+   Helper function of vect_analyze_early_break_dependences which performs safety
+   analysis for load operations in an early break.  */
+
+static opt_result
+vect_analyze_early_break_dependences_1 (data_reference *dr_ref, gimple *stmt)
+{
+  /* We currently only support statically allocated objects due to
+     not having first-faulting loads support or peeling for
+     alignment support.  Compute the size of the referenced object
+     (it could be dynamically allocated).  */
+  tree obj = DR_BASE_ADDRESS (dr_ref);
+  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "early breaks only supported on statically"
+                         " allocated objects.\n");
+      return opt_result::failure_at (stmt,
+                                     "can't safely apply code motion to "
+                                     "dependencies of %G to vectorize "
+                                     "the early exit.\n", stmt);
+    }
+
+  tree refop = TREE_OPERAND (obj, 0);
+  tree refbase = get_base_address (refop);
+  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"early bre

Re: [PATCH][libsanitizer]: Sync fixes for asan interceptors from upstream [PR112644]

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 03:03:46PM +, Tamar Christina wrote:
> Hi All,
> 
> This cherry-picks and squashes the differences between commits
> 
> d3e5c20ab846303874a2a25e5877c72271fc798b..76e1e45922e6709392fb82aac44bebe3dbc2ea63
> from LLVM upstream from compiler-rt/lib/hwasan/ to GCC on the changes relevant
> for GCC.
> 
> This is required to fix the linked PR.
> 
> As mentioned in the PR the last sync brought in a bug from upstream[1] where
> operations became non-recoverable and as such the tests in AArch64 started
> failing.  This cherry picks the fix and there are minor updates needed to GCC
> after this to fix the cases.
> 
> [1] https://github.com/llvm/llvm-project/pull/74000
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> libsanitizer/ChangeLog:
> 
>   PR sanitizer/112644
>   * hwasan/hwasan_interceptors.cpp (ACCESS_MEMORY_RANGE,
>   HWASAN_READ_RANGE, HWASAN_WRITE_RANGE, COMMON_SYSCALL_PRE_READ_RANGE,
>   COMMON_SYSCALL_PRE_WRITE_RANGE, COMMON_INTERCEPTOR_WRITE_RANGE,
>   COMMON_INTERCEPTOR_READ_RANGE): Make recoverable.

The normal ChangeLog entry for this would be:
	Cherry-pick llvm-project revision XYZ and UVW.

Ok for trunk with that changed.

Jakub

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao

Thank you!

Joseph and Richard,  could you also comment on this?

> On Jan 28, 2024, at 5:09 AM, Martin Uecker  wrote:
> 
> On Friday, 26.01.2024 at 14:33 +, Qing Zhao wrote:
>> 
>>> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
>>> 
>>> 
>>> I haven't looked at the patch, but it sounds you give the result
>>> the wrong type. Then patching up all use cases instead of the
>>> type seems wrong.
>> 
>> Yes, this is for resolving a very early gimplification issue as I reported last Nov:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
>> 
>> Since no-one responded at that time, I fixed the issue by replacing the ARRAY_REF
>> with a pointer indirection:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
>> 
>> The reason for such a change is: returning a flexible array member TYPE is not allowed
>> by the C language (our gimplification follows this rule), so we have to return
>> a pointer TYPE instead.
>> 
>> **The new internal function
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, ACCESS_MODE, INDEX)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the "REF_TO_OBJ", the same as the 1st argument;
>> 
>> Both the return type and the type of the first argument of this function have been converted
>> from the incomplete array type to the corresponding pointer type.
>> 
>> As a result, the original ARRAY_REF was converted to an INDIRECT_REF, and the original INDEX
>> of the ARRAY_REF was lost in that conversion.  In order to keep the INDEX for bound sanitizer
>> instrumentation, I added the 6th argument “INDEX”.
>> 
>> What’s your comment and suggestion on this solution?
> 
> I am not entirely sure but changing types in the FE seems
> problematic because this breaks language semantics. And
> then adding special code everywhere to treat it specially
> in the FE does not seem a good way forward.
> 
> If I understand correctly, returning an incomplete array 
> type is not allowed and then fails during gimplification.

Yes, this is the problem in gimplification. 

> So I would suggest to make it return a pointer to the 
> incomplete array (and not the element type)

for the following:

struct annotated {
  unsigned int size;
  int array[] __attribute__((counted_by (size)));
};

  struct annotated * p = ….
  p->array[9] = 0;

The IL for the above array reference p->array[9] is:

1. If the return type is the original incomplete array type,

.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;

(this triggered the gimplification failure since the return type cannot be an
incomplete type).

2. When the return type is changed to a pointer to the element type of the
incomplete array (the current patch), the original array reference naturally
becomes an indirect reference through the pointer:

*(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) = 0;

Since the original array reference becomes an indirect reference through the
pointer to the element type, the INDEX info is mixed into the OFFSET of the
indirect reference and lost, so I added the 6th argument to the routine
.ACCESS_WITH_SIZE to record the INDEX.
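In plain C terms, assuming a 4-byte `int` as the `+ 36` above implies, the pointer form carries only the byte offset `9 * sizeof (int)`. The index can be recovered only by dividing by the element size, and only when the offset consists of that single index term. The helpers below are hypothetical illustrations, not front-end code.

```c
#include <stddef.h>

/* Byte offset that the pointer form *(base + offset) carries for
   the array reference base[index].  */
size_t
element_byte_offset (size_t index, size_t elem_size)
{
  return index * elem_size;
}

/* Getting the index back requires the element size, and it only works
   if the offset is made up of that single index term.  The patch
   instead records INDEX explicitly as the 6th argument.  */
size_t
index_from_byte_offset (size_t byte_offset, size_t elem_size)
{
  return byte_offset / elem_size;
}
```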

3. With your suggestion, the return type is changed to a pointer to the
incomplete array.
I just tried this change to the result type:

--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
   tree counted_by_type)
 {
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
-  tree result_type = build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
+  tree result_type = build_pointer_type (TREE_TYPE (ref));

Then, I got the following FE errors:

test.c:10:11: error: invalid use of flexible array member
   10 |   p->array[9] = 0;

The reason for the error is: when the original array_ref becomes an
indirect_ref through the pointer to the incomplete array, the TYPE_SIZE_UNIT (type)
needed to compute the OFFSET from the pointer is invalid, since the type is an
incomplete array.  As a result, the OFFSET cannot be computed for the indirect_ref.

It looks like this approach has even more issues.

> but then wrap
> it with an indirection when inserting this code in the FE
> so that the full replacement has the correct type again
> (of the incomplete array).

I don’t quite understand the above; could you please explain this in more
detail? (If possible, could you please use the above small example?)
Thanks.
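At the C source level, the suggestion can be sketched as follows. Return a pointer to the whole incomplete array (`int (*)[]`) and immediately dereference it. The expression that replaces `p->array` then has type `int[]` again, and `[9]` stays an ordinary array subscript. This only illustrates the types involved. `access_with_size` is a hypothetical ordinary function standing in for `.ACCESS_WITH_SIZE`, and the sketch says nothing about how the C front end would build the trees.

```c
#include <stdlib.h>

struct annotated {
  unsigned int size;
  int array[];          /* counted_by (size) omitted in this sketch */
};

/* Allocate an object whose flexible array member holds N ints.  */
struct annotated *
alloc_annotated (unsigned int n)
{
  struct annotated *p = malloc (sizeof *p + n * sizeof (int));
  if (p)
    p->size = n;
  return p;
}

/* Hypothetical stand-in for the internal call: returns a pointer to
   the whole flexible array member, of type int (*)[].  */
static int (*access_with_size (struct annotated *p))[]
{
  return &p->array;
}

void
store_ninth (struct annotated *p)
{
  /* The indirection gives the expression the incomplete array type
     int[] again, so this is an ordinary subscript with index 9.  */
  (*access_with_size (p))[9] = 0;

  /* Writing access_with_size (p)[9] instead would need sizeof (int[])
     to scale the index, which an incomplete type does not have, so C
     rejects it.  */
}
```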

> 
> 
> Alternatively, one could allow this during gimplification
> or add some conversion.

Allowing this in gimplification might trigger some other issues.  I guess that
adding a conversion at the end of the FE or at the beginning of gimplification
might be better.

i.e., in the FE, still keep the original incomplete array type as the return type
for the routine .ACCESS_WITH_SIZ

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs


On 29/01/2024 12:50, Tobias Burnus wrote:

Andrew Stubbs wrote:

/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
   .amdhsa_next_free_vgpr    516
                             ^~~
[Obviously, likewise for libgomp.c++/..
Hmm, supposedly there are 768 registers allocated in groups of 12, on 
gfx1100 (8 on other devices), which number you have to double on 
wavefrontsize64 because that field actually counts the number of 
32-lane registers. The ISA can only actually reference 256 registers, 
so the limit here should be 512. (The remaining registers are intended 
for other wavefronts to use.)


But 256 is not divisible by 12, and it looks like we've rounded up. I 
guess we need to set the limit at 252 (504), for gfx1100.
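The limit arithmetic above is written out below as a small sketch. The function names are hypothetical and are not taken from GCC's gcn backend or from LLVM. The granule sizes and the 256-register ISA limit are the figures quoted in this thread.

```c
/* Largest VGPR count a kernel can use: the ISA can encode 256 VGPRs,
   and registers are allocated in granules (12 on gfx1100, 8 on other
   devices per this thread), so round 256 down to a whole granule.  */
unsigned
max_usable_vgprs (unsigned granule)
{
  const unsigned isa_limit = 256;
  return isa_limit / granule * granule;
}

/* .amdhsa_next_free_vgpr counts 32-lane registers, so the value is
   doubled for wavefront size 64.  */
unsigned
next_free_vgpr_limit (unsigned granule, int wave64)
{
  unsigned n = max_usable_vgprs (granule);
  return wave64 ? 2 * n : n;
}
```

A granule of 12 gives 252 registers, i.e. 504 for wave64. A granule of 8 gives the full 256 (512).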


BTW: The LLVM source code has,
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1066

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
   if (STI->getFeatureBits().test(FeatureGFX90AInsts))
     return 512;
   if (!isGFX10Plus(*STI))
     return 256;
   bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
   if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
     return IsWave32 ? 1536 : 768;
   return IsWave32 ? 1024 : 512;
}


That matches what we have in libgomp.

LLVM must have another configuration somewhere for how many registers it 
can actually use in code (the ISA can encode 256, but that doesn't mean 
it should always do so). This may be a moot point because allowing too 
many registers limits how many threads can run in parallel, so they may 
have chosen to impose an artificial limit at all times.


In GCC, non-kernel functions are limited to 24 registers (for maximum 
occupancy -- we could probably increase that 50% on "GFX11Full" 
devices), but the kernel entry point is permitted to go crazy.


Andrew

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Martin Uecker

On Monday, 29.01.2024 at 15:09 +, Qing Zhao wrote:
> Looks like even more issues with this approach.

Yes, but only because the following is missing:

> 
> 
> > but then wrap
> > it with an indirection when inserting this code in the FE
> > so that the full replacement has the correct type again
> > (of the incomplete array).
> 
> I don’t quite understand the above, could you please explain this in more 
> details? (If possible, could you plea

Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-29 Thread Thomas Schwinge

Hi Tobias!

On 2024-01-23T10:55:16+0100, Tobias Burnus  wrote:
> Slightly changed patch:
>
> nvptx_attach_host_thread_to_device now fails again with an error for 
> CUDA_ERROR_DEINITIALIZED, except for GOMP_OFFLOAD_fini_device.
>
> I think it makes more sense that way.

Agreed.

> Tobias Burnus wrote:
>> Testing showed that the libgomp.c/target-52.c failed with:
>>
>> libgomp: cuCtxGetDevice error: unknown cuda error
>>
>> libgomp: device finalization failed
>>
>> This testcase uses OMP_DISPLAY_ENV=true and 
>> OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only 
>> fails if dg-set-target-env-var is honored.
>>
>> If both env vars are set, the device initialization occurs earlier as 
>> OMP_DEFAULT_DEVICE is shown due to the display-env env var and its 
>> value (when target-offload-var is 'mandatory') might be either 
>> 'omp_invalid_device' or '0'.
>>
>> It turned out that this had an effect on device finalization, which 
>> caused CUDA to stop earlier than expected. This patch now handles this 
>> case gracefully. For details, see the commit log message in the 
>> attached patch and/or the PR.

> plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
>
> The following issue was found when running libgomp.c/target-52.c with
> nvptx offloading when the dg-set-target-env-var was honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|

> The issue
> occurred for both -foffload=disable and with offloading configured when
> an nvidia device is available.
>
> At the end of the program, the offloading parts are shutdown via two means:
> The callback registered via 'atexit (gomp_target_fini)' and - via code
> generated in mkoffload, the '__attribute__((destructor)) fini' function
> that calls GOMP_offload_unregister_ver.
>
> In normal processing, first gomp_target_fini is called - which then sets
> GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
> but that's then because the state is GOMP_DEVICE_FINALIZED.
> If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
> the call omp_display_env already invokes gomp_init_targets_once, i.e. it
> occurs earlier than usual and is invoked via __attribute__((constructor))
> initialize_env.
>
> For some unknown reasons, while this does not have an effect on the
> order of the called plugin functions for initialization, it changes the
> order of function calls for shutting down. Namely, when the two environment
> variables are set, GOMP_offload_unregister_ver is called now before
> gomp_target_fini.

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?

I think I agree that, defensively, we should behave correctly in libgomp
finalization, no matter in which order these calls occur.

> And it seems as if CUDA regards a call to cuModuleUnload
> (or unloading the last module?) as indication that the device context should
> be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
> CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
 if (image->target_data == target_data)
   {
 *prev_p = image->next;
-   if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
+   CUresult r;
+   r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
+   GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, cuda_error (r));
+   if (r != CUDA_SUCCESS)
  ret = false;
+   CUdevice dev_;
+   r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev_);
+   GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, cuda_error (r));
+   GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, dev->dev);
+   assert (dev_ == dev->dev);
free (image->fns);
free (image);
break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?

Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or (b) the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which Nvidia Driver version?  The la

Re: [PATCH]AArch64: relax cbranch tests to accepted inverted branches [PR113502]

2024-01-29 Thread Richard Sandiford

Tamar Christina  writes:
> Hi All,
>
> Recently something in the midend had started inverting the branches by inverting
> the condition and the branches.
>
> While this is fine, it makes it hard to actually test.  In RTL I disable
> scheduling and BB reordering to prevent this.  But in GIMPLE there seems to be
> nothing I can do.  __builtin_expect seems to have no impact on the change since
> I suspect this is happening during expand where conditions can be flipped
> regardless of probability during compare_and_branch.
>
> Since the mid-end has plenty of correctness tests, this weakens the backend
> tests to just check that a correct looking sequence is emitted.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/113502
>   * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Ignore exact branch.
>   * gcc.target/aarch64/vect-early-break-cbranch.c: Likewise.

OK I guess, since I agree that the "polarity" of the branch isn't really the
thing that we're trying to test.  But the fact that even __builtin_expect
doesn't work seems like a bug.  Do we have a PR for that?  Might be worth
filing one (for GCC 15+) if we don't.

Thanks,
Richard

>
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> index d15053553f94e7dce3540e21f0c1f0d39ea4f289..d7cef1105410be04ed67d1d3b800746267f205a8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> @@ -9,7 +9,7 @@ int b[N] = {0};
>  **   ...
>  **   cmpgt   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   \.L[0-9]+
> +**   b.(any|none)\.L[0-9]+
>  **   ...
>  */
>  void f1 ()
> @@ -26,7 +26,7 @@ void f1 ()
>  **   ...
>  **   cmpge   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   \.L[0-9]+
> +**   b.(any|none)\.L[0-9]+
>  **   ...
>  */
>  void f2 ()
> @@ -43,7 +43,7 @@ void f2 ()
>  **   ...
>  **   cmpeq   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   \.L[0-9]+
> +**   b.(any|none)\.L[0-9]+
>  **   ...
>  */
>  void f3 ()
> @@ -60,7 +60,7 @@ void f3 ()
>  **   ...
>  **   cmpne   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   \.L[0-9]+
> +**   b.(any|none)\.L[0-9]+
>  **   ...
>  */
>  void f4 ()
> @@ -77,7 +77,7 @@ void f4 ()
>  **   ...
>  **   cmplt   p[0-9]+.s, p7/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   .L[0-9]+
> +**   b.(any|none).L[0-9]+
>  **   ...
>  */
>  void f5 ()
> @@ -94,7 +94,7 @@ void f5 ()
>  **   ...
>  **   cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>  **   ptest   p[0-9]+, p[0-9]+.b
> -**   b.any   \.L[0-9]+
> +**   b.(any|none)\.L[0-9]+
>  **   ...
>  */
>  void f6 ()
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> index a5e7b94827dd70240d754a834f1d11750a9c27a9..673b781eb6d092f6311409797b20a971f4fae247 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -15,7 +15,7 @@ int b[N] = {0};
>  **   cmgtv[0-9]+.4s, v[0-9]+.4s, #0
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f1 ()
> @@ -34,7 +34,7 @@ void f1 ()
>  **   cmgev[0-9]+.4s, v[0-9]+.4s, #0
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f2 ()
> @@ -53,7 +53,7 @@ void f2 ()
>  **   cmeqv[0-9]+.4s, v[0-9]+.4s, #0
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f3 ()
> @@ -72,7 +72,7 @@ void f3 ()
>  **   cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f4 ()
> @@ -91,7 +91,7 @@ void f4 ()
>  **   cmltv[0-9]+.4s, v[0-9]+.4s, #0
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f5 ()
> @@ -110,7 +110,7 @@ void f5 ()
>  **   cmlev[0-9]+.4s, v[0-9]+.4s, #0
>  **   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
>  **   fmovx[0-9]+, d[0-9]+
> -**   cbnzx[0-9]+, \.L[0-9]+
> +**   cbn?z   x[0-9]+, \.L[0-9]+
>  **   ...
>  */
>  void f6 ()
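The relaxed patterns accept either branch polarity. check-function-bodies itself matches these lines with Tcl regular expressions. The sketch below uses POSIX `<regex.h>` instead, which behaves the same way for patterns this simple. It assumes a tab or spaces between the mnemonic and its operands, and `asm_line_matches` is a hypothetical helper.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Relaxed patterns in the spirit of the patch, anchored to a whole line.  */
static const char CBRANCH_SVE[] =
  "^b.(any|none)[[:space:]]+\\.L[0-9]+$";
static const char CBRANCH_ADVSIMD[] =
  "^cbn?z[[:space:]]+x[0-9]+, \\.L[0-9]+$";

/* Return true if LINE matches the POSIX extended regex PATTERN.  */
bool
asm_line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Both `cbz` and `cbnz` on an X register now satisfy the Advanced SIMD pattern, and both `b.any` and `b.none` satisfy the SVE one. A different condition such as `b.first` is still rejected.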

[PATCH] x86: Generate REG_CFA_UNDEFINED for unsaved callee-saved registers

2024-01-29 Thread H.J. Lu

Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers which
have been used in the function to an instruction in prologue.
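dwarf2cfi.cc turns each such REG_CFA_UNDEFINED note into a DW_CFA_undefined call frame instruction. GCC emits it either directly or through the assembler's `.cfi_undefined` directive. The instruction tells the unwinder that the register has no recoverable value in the caller's frame. In the CFI byte stream it is the opcode 0x07 followed by the DWARF register number as an unsigned LEB128. The sketch below only illustrates that encoding. The helper names are hypothetical and do not reflect how GCC's output routines are structured.

```c
#include <stddef.h>
#include <stdint.h>

/* DWARF call frame instruction opcode for DW_CFA_undefined.  */
enum { CFA_UNDEFINED_OPCODE = 0x07 };

/* Append VALUE to OUT as unsigned LEB128; return the bytes written.  */
size_t
encode_uleb128 (uint64_t value, uint8_t *out)
{
  size_t n = 0;
  do
    {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;   /* continuation bit: more bytes follow */
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* Encode "DW_CFA_undefined REG" into OUT; return its length.  */
size_t
encode_cfa_undefined (unsigned reg, uint8_t *out)
{
  out[0] = CFA_UNDEFINED_OPCODE;
  return 1 + encode_uleb128 (reg, out + 1);
}
```

For example, DWARF register 3 on x86-64 is %rbx, so marking it undefined yields the bytes 0x07 0x03.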

gcc/

PR target/38534
* dwarf2cfi.cc (add_cfi_undefined): New.
(dwarf2out_frame_debug_cfa_undefined): Likewise.
(dwarf2out_frame_debug): Handle REG_CFA_UNDEFINED.
* reg-notes.def (REG_CFA_UNDEFINED): New.
* config/i386/i386.cc (ix86_expand_prologue): Attach
REG_CFA_UNDEFINED notes for unsaved callee-saved registers
which have been used in the function to an instruction in
prologue.

gcc/testsuite/

PR target/38534
* gcc.target/i386/no-callee-saved-19.c: New test.
* gcc.target/i386/no-callee-saved-20.c: Likewise.
* gcc.target/i386/pr38534-7.c: Likewise.
* gcc.target/i386/pr38534-8.c: Likewise.
---
 gcc/config/i386/i386.cc   | 20 +++
 gcc/dwarf2cfi.cc  | 55 +++
 gcc/reg-notes.def |  4 ++
 .../gcc.target/i386/no-callee-saved-19.c  | 17 ++
 .../gcc.target/i386/no-callee-saved-20.c  | 12 
 gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 ++
 gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +
 7 files changed, 139 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..6ec87b6a16f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9304,6 +9304,26 @@ ix86_expand_prologue (void)
  combined with prologue modifications.  */
   if (TARGET_SEH)
 emit_insn (gen_prologue_use (stack_pointer_rtx));
+
+  if (cfun->machine->call_saved_registers
+  != TYPE_NO_CALLEE_SAVED_REGISTERS)
+return;
+
+  insn = get_insns ();
+  if (!insn)
+return;
+
+  /* Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers
+ which have been used in the function to an instruction in prologue.
+   */
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (df_regs_ever_live_p (i)
+   && !fixed_regs[i]
+   && !call_used_regs[i]
+   && !STACK_REGNO_P (i)
+   && !MMX_REGNO_P (i))
+  add_reg_note (insn, REG_CFA_UNDEFINED,
+   gen_rtx_REG (word_mode, i));
 }
 
 /* Emit code to restore REG using a POP or POPP insn.  */
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 1231b5bb5f0..12862ed1070 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -517,6 +517,17 @@ add_cfi_restore (unsigned reg)
   add_cfi (cfi);
 }
 
+static void
+add_cfi_undefined (unsigned reg)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi->dw_cfi_opc = DW_CFA_undefined;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+
+  add_cfi (cfi);
+}
+
 /* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
that the register column is no longer saved.  */
 
@@ -1532,6 +1543,37 @@ dwarf2out_frame_debug_cfa_restore (rtx reg, bool emit_cfi)
 }
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_UNDEFINED
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_undefined (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+
+  rtx span = targetm.dwarf_register_span (reg);
+  if (!span)
+{
+  unsigned int regno = dwf_regno (reg);
+  add_cfi_undefined (regno);
+}
+  else
+{
+  /* We have a PARALLEL describing where the contents of REG live.
+     Mark the register for each piece of the PARALLEL as undefined.  */
+  gcc_assert (GET_CODE (span) == PARALLEL);
+
+  const int par_len = XVECLEN (span, 0);
+  for (int par_index = 0; par_index < par_len; par_index++)
+   {
+ reg = XVECEXP (span, 0, par_index);
+ gcc_assert (REG_P (reg));
+ unsigned int regno = dwf_regno (reg);
+ add_cfi_undefined (regno);
+   }
+}
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE.
 
??? Perhaps we should note in the CIE where windows are saved (instead
@@ -2326,6 +2368,19 @@ dwarf2out_frame_debug (rtx_insn *insn)
handled_one = true;
break;
 
+  case REG_CFA_UNDEFINED:
+   n = XEXP (note, 0);
+   if (n == nullptr)
+ {
+   n = PATTERN (insn);
+   if (GET_CODE (n) == PARALLEL)
+ n = XVECEXP (n, 0, 0);
+   n = XEXP (n, 0);
+ }
+   dwarf2out_frame_debug_cfa_undefined (n);
+   handled_one = true;
+   break;
+
   case REG_CFA_SET_VDRAP:
n = XEXP (note, 0);
if (REG_P (n))
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 5b878fb2a1c..8a78ebb6864 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -152,6 +152,10 @@ REG_CFA_NOTE (CFA_EXPRESSION)
the given register.  */
 REG_CFA_NOTE (CFA_VAL_EXPRESSION)
 
+/* Attached to in

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao

An update on the kernel building with my version 4 patch.

Kees reported two FE issues with the current version 4 patch:

1. The operator “typeof” does not return the correct type for a->array;
2. The operator “&” does not return the correct address for a->array;

I fixed both in my local repository.

With these additional fixes, the kernel with counted-by annotations can be built
successfully.
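For context, the check that broke is of the kind sketched below. An array and a pointer to its first element have different types, so `__builtin_types_compatible_p` applied to their `typeof`s tells an array apart from a pointer. In the reproducer quoted further down, the v4 patch made `typeof (a->array)` collapse to the pointer type, so the comparison printed 1. `SAME_TYPE`, `IS_ARRAY` and `struct flex` are illustrative names, and the kernel's own macros differ in detail. The `counted_by` attribute is left out so the sketch builds with any GCC or Clang.

```c
/* Nonzero if A and B have compatible types (GNU extension builtins).  */
#define SAME_TYPE(a, b) \
  __builtin_types_compatible_p (__typeof__ (a), __typeof__ (b))

/* An array's type differs from the type of &a[0]; a pointer's does not.  */
#define IS_ARRAY(a) (!SAME_TYPE ((a), &(a)[0]))

struct flex {
  int size;
  int array[];   /* counted_by (size) would go here */
};

/* Expected to be 1: typeof (f->array) is int[], &f->array[0] is int *.  */
int
flex_array_is_array (struct flex *f)
{
  return IS_ARRAY (f->array);
}

/* Expected to be 0: both sides have type int *.  */
int
pointer_is_array (int *q)
{
  return IS_ARRAY (q);
}
```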

And then, Kees reported one behavioral issue with the current counted-by:

When the counted-by value is below zero, my current patch

A. Does not report any warning for it.
B. Accepts the negative value as a wrapped size.

i.e. for:

struct foo {
	signed char size;
	unsigned char array[] __counted_by(size);
} *a;

...
a->size = -3;
report(__builtin_dynamic_object_size(a->array, 1));

this reports 253, rather than 0.
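The 253 is what a negative `signed char` count becomes when it is reinterpreted as an unsigned byte count: -3 wraps to 256 - 3. The sketch below reproduces that arithmetic and, for contrast, the clamp-to-zero result that was expected. Whether GCC computes the size in exactly this way is an assumption, and both helper names are hypothetical.

```c
#include <stddef.h>

/* Size in bytes of unsigned char array[count] when a negative
   signed char count is reinterpreted as unsigned: -3 wraps to 253.  */
size_t
wrapped_size (signed char count)
{
  return (size_t) (unsigned char) count * sizeof (unsigned char);
}

/* Treating a negative count as an empty array instead, giving the
   0 that was expected.  */
size_t
clamped_size (signed char count)
{
  return count < 0 ? 0 : (size_t) count * sizeof (unsigned char);
}
```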

And the array-bounds sanitizer doesn’t catch a negative bound either.

a->size = -3;
report(a->array[1]); // does not trap


So, my question is:

 How should we handle a negative counted-by value?

 My approach is:

   I think that this is a user error, and the compiler needs to issue a warning
at run time about it.

I have one remaining patch that has not been finished yet:

6  Emit warnings when the user breaks the requirements for the new counted_by
attribute
  compilation time: -Wcounted-by
  run time: -fsanitizer=counted-by
 * The initialization of the size field should be done before the first
reference to the FAM field.
 * The array has at least the # of elements specified by the size field all the
time during the program.
 * The value of counted-by should not be negative.

Let me know your comment and suggestions.

Thanks

Qing

> On Jan 25, 2024, at 3:11 PM, Qing Zhao  wrote:
> 
> Thanks a lot for the testing.
> 
> Yes, I can reproduce the issue with the following small example:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <stddef.h>
> 
> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
> 
> struct untracked {
>   int size;
>   int array[] __attribute__((counted_by (size)));
> } *a;
> struct untracked * alloc_buf (int index)
> {
>  struct untracked *p;
>  p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
>                                        (offsetof (struct untracked, array[0])
>                                         + (index) * sizeof (int))));
>  p->size = index;
>  return p;
> }
> 
> int main()
> {
>  a = alloc_buf(10);
>  printf ("same_type is %d\n",
>          (__builtin_types_compatible_p (typeof (a->array), typeof (&(a->array)[0]))));
>  return 0;
> }
> 
> 
> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
> same_type is 1
> 
> It looks like the “typeof” operator needs to be handled specially in the C FE
> for the new internal function .ACCESS_WITH_SIZE.
> 
> (I have already handled the “offsetof” operator specially in the C FE.)
> 
> Will fix this issue.
> 
> Thanks.
> 
> Qing
> 
>> On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
>> 
>> On Wed, Jan 24, 2024 at 12:29:51AM +, Qing Zhao wrote:
>>> This is the 4th version of the patch.
>> 
>> Thanks very much for this!
>> 
>> I tripped over an unexpected behavioral change that the Linux kernel
>> depends on:
>> 
>> __builtin_types_compatible_p() no longer treats an array marked with
>> counted_by as different from that array's decayed pointer. Specifically,
>> the kernel uses these macros:
>> 
>> 
>> /*
>> * Force a compilation error if condition is true, but also produce a
>> * result (of value 0 and type int), so the expression can be used
>> * e.g. in a structure initializer (or where-ever else comma expressions
>> * aren't permitted).
>> */
>> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>> 
>> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>> 
>> /* &a[0] degrades to a pointer: a different type from an array */
>> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
>> 
>> 
>> This gets used in various places to make sure we're dealing with an
>> array for a macro:
>> 
>> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
>> 
>> 
>> So this builds:
>> 
>> struct untracked {
>>   int size;
>>   int array[];
>> } *a;
>> 
>> __must_be_array(a->array)
>> => 0 (as expected)
>> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
>> => 0 (as expected, array vs decayed array pointer)
>> 
>> 
>> But if counted_by is added, we get a build failure:
>> 
>> struct tracked {
>>   int size;
>>   int array[] __counted_by(size);
>> } *b;
>> 
>> __must_be_array(b->array)
>> => build failure (not expected)
>> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
>> => 1 (not expected, both pointers?)
>> 
>> 
>> 
>> 
>> -- 
>> Kees Cook
>

Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-29 Thread Richard Earnshaw


On 29/01/2024 14:14, Matthieu Longo wrote:

Hi Richard,

Please find below the new patch where I addressed your comments and 
updated the changelog.


The rev16 pattern was no longer recognised because a change in the bswap
tree pass introduced a new GIMPLE form that was not recognised by the
final transformation into assembly.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933
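For reference, REV16 reverses the byte order within each 16-bit halfword of a 32-bit register, which is the same as a full 32-bit byte swap followed by a rotate by 16; that is why (rotate (bswap x) 16) is an equivalent form for the pattern to match. A plain C sketch of the equivalence (the function names are illustrative; `__builtin_bswap32` is the GCC/Clang builtin):

```c
#include <stdint.h>

/* Reference semantics of REV16: swap the two bytes of each halfword.  */
static uint32_t rev16_ref(uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* Rotate left; for n == 16, left and right rotation coincide.  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* The (rotate (bswap x) 16) form matched by the new pattern.  */
static uint32_t bswap_rotate16(uint32_t x)
{
  return rotl32(__builtin_bswap32(x), 16);
}
```

For example, both functions map 0x11223344 to 0x22114433.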

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (arm_rev16si2): Convert to define_insn.
 Correct generated RTL.
 (arm_rev16si2_alt1): Correctly handle conditional execution.
 (arm_rev16si2_alt2): Likewise.

gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
 emitted.


Thanks.  I've tweaked the commit message very slightly and pushed this.

Could you please prepare backports for gcc-11 thru 13?  It should just 
be a matter of cherry-picking the commit.


R.



On 2024-01-22 16:25, Richard Earnshaw (lists) wrote:

On 22/01/2024 12:18, Matthieu Longo wrote:

The rev16 pattern was no longer recognised because a change in the bswap
tree pass introduced a new GIMPLE form that was not recognised by the
final transformation into assembly.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
   a bswap + rotate by 16 bits into rev16


ChangeLog entries need to be written as sentences, so start with a 
capital letter and end with a full stop; continuation lines should 
start in column 8 (one hard tab, don't use spaces).  But in this case, 
"New pattern." is sufficient.




gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
   emitted.



+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+        (rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+                   (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 
code (but not thumb1) we also have to handle conditional execution: we 
need to have '%?' in the output template at the point where a 
condition code might be needed.  That means we need separate output 
templates for all three alternatives (as we need a 16-bit variant for 
thumb2 that's conditional and a 16-bit for thumb1 that isn't).  See 
the output of arm_rev16 for a guide of what is really needed.


I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are 
incorrect in this regard as well; that will need fixing.


I also see that arm_rev16si2 currently expands to the alt1 variant 
above; given that the preferred canonical form would now appear to use 
bswap + rotate, we should change that as well.  In fact, we can merge 
your new pattern with the expand entirely and eliminate the need to 
call gen_arm_rev16si2_alt1.  Something like:


(define_insn "arm_rev16si2"
   [(set (match_operand:SI 0 "s_register_operand")
         (rotate:SI (bswap:SI (match_operand:SI 1 "s_register_operand"))
                    (const_int 16)))]
   "arm_arch6"
   "@
   rev16...
   ...


R.

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 06:36:47AM -0800, H.J. Lu wrote:
> TARGET_ASM_SELECT_RTX_SECTION is for constants in RTL.
> It should have a non-public label reference which can't be used
> by other TUs.  The same section can contain other constants.
> If there is a COMDAT issue, the linker will catch it.

Let me try to explain with a short assembly snippet what I believe your patch is
doing and what I'm afraid of.  I believe that when we need to emit
an RTL constant such as foo, foo+1 or foo+2 (where foo is defined in a comdat
section), your patch puts those constants into a .data.rel.ro.local section
determined by the decl that is referenced, rather than into the plain
.data.rel.ro.local section.
Now, when first_tu.o wins and emits the qux comdat, it will contain
the .data.rel.ro.local.foo section that the bar function refers to, but
second_tu.o wants to refer to different offsets from the same function, and
its copy of that section is discarded.

I simply believe the constants need to be in a section based on what refers
to those symbols, not on the value of those constants, and that is what we
used to do before your patch (and I'd like to understand what's wrong with
what GCC emits and why).

first_tu.s:

	.section	.text.foo,"axG",@progbits,qux,comdat
	.p2align 4
	.type	foo, @function
foo:
	xorl	%eax, %eax
	ret
	.size	foo, .-foo
	.text
	.p2align 4
	.type	bar, @function
bar:
	movq	.LC0(%rip), %xmm0
	ret
	.size	bar, .-bar
	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
	.align 8
.LC0:
	.quad	foo

second_tu.s:

	.section	.text.foo,"axG",@progbits,qux,comdat
	.p2align 4
	.type	foo, @function
foo:
	xorl	%eax, %eax
	ret
	.size	foo, .-foo
	.text
	.p2align 4
	.type	baz, @function
baz:
	movq	.LC0(%rip), %xmm0
	ret
	.size	baz, .-baz
	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
	.align 8
.LC0:
	.quad	foo+1
	.text
	.p2align 4
	.type	corge, @function
corge:
	movq	.LC1(%rip), %xmm0
	ret
	.size	corge, .-corge
	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
	.align 8
.LC1:
	.quad	foo+2
gcc -shared -o test.so first_tu.s second_tu.s
`.data.rel.ro.local.foo' referenced in section `.text' of /tmp/cceeUWyH.o: defined in discarded section `.data.rel.ro.local.foo[qux]' of /tmp/cceeUWyH.o
`.data.rel.ro.local.foo' referenced in section `.text' of /tmp/cceeUWyH.o: defined in discarded section `.data.rel.ro.local.foo[qux]' of /tmp/cceeUWyH.o
collect2: error: ld returned 1 exit status

Jakub
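The failure can be reproduced with a toy model of COMDAT group resolution: the first object (in link order) that defines a group keeps it, later copies of the group are discarded, and a reference from a surviving section to a label in a discarded section is an error. This is a simplified sketch, not how ld is implemented; `struct toy_section`, `toy_kept`, `toy_reference_ok` and `jakub_example` are hypothetical names, and the table encodes the first_tu.s/second_tu.s example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: one input section of an object file.  */
struct toy_section {
  const char *object;  /* input object, e.g. "first_tu.o" */
  const char *name;    /* section name */
  const char *group;   /* COMDAT group signature, or NULL if none */
};

/* SECS is in link order.  A grouped section survives only if its object
   is the first one that defines the same group.  */
static bool toy_kept(const struct toy_section *secs, size_t i)
{
  if (secs[i].group == NULL)
    return true;
  for (size_t j = 0; j <= i; j++)
    if (secs[j].group != NULL && strcmp(secs[j].group, secs[i].group) == 0)
      return strcmp(secs[j].object, secs[i].object) == 0;
  return true;  /* not reached: j == i always matches */
}

/* A reference from section FROM to a local label in section TO is an
   error when FROM survives but TO was discarded.  */
static bool toy_reference_ok(const struct toy_section *secs,
                             size_t from, size_t to)
{
  return toy_kept(secs, to) || !toy_kept(secs, from);
}

/* first_tu.s and second_tu.s from the example, in link order.  */
static const struct toy_section jakub_example[] = {
  { "first_tu.o",  ".text.foo",              "qux" },
  { "first_tu.o",  ".text",                  NULL  },  /* bar */
  { "first_tu.o",  ".data.rel.ro.local.foo", "qux" },  /* .LC0 */
  { "second_tu.o", ".text.foo",              "qux" },
  { "second_tu.o", ".text",                  NULL  },  /* baz, corge */
  { "second_tu.o", ".data.rel.ro.local.foo", "qux" },  /* .LC0, .LC1 */
};
```

In this model bar's reference (index 1 to index 2) resolves, while baz and corge (index 4) refer into second_tu.o's discarded copy (index 5), matching the two ld errors.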

Re: [PATCH] x86: Generate .cfi_undefined for unsaved callee-saved registers

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 2:08 AM Jakub Jelinek  wrote:
>
> On Sat, Jan 27, 2024 at 12:41:24PM -0800, H.J. Lu wrote:
> > When assembler directives for DWARF frame unwind are enabled, generate
> > the .cfi_undefined directive for unsaved callee-saved registers which
> > have been used in the function.
> >
> > gcc/
> >
> >   PR target/38534
> >   * config/i386/i386.cc (ix86_post_cfi_startproc): New.
> >   (TARGET_ASM_POST_CFI_STARTPROC): Likewise.
> >
> > gcc/testsuite/
> >
> >   PR target/38534
> >   * gcc.target/i386/no-callee-saved-19.c: New test.
> >   * gcc.target/i386/no-callee-saved-20.c: Likewise.
> >   * gcc.target/i386/pr38534-7.c: Likewise.
> >   * gcc.target/i386/pr38534-8.c: Likewise.
>
> This only works for -fdwarf2-cfi-asm, but doesn't work for
> -fno-dwarf2-cfi-asm.  I think we need something that will work for both.

-fno-dwarf2-cfi-asm stops generating all CFI directives.

> So, I'd say we want to add support for REG_CFA_UNDEFINED note, emit those
> notes on some frame related insn in the prologue during prologue expansion
> in pro_and_epilogue pass and handle that in dwarf2cfi.cc pass.

It is a good idea.  Here is the patch:

https://patchwork.sourceware.org/project/gcc/list/?series=30314

> One question is where those should be emitted.  Emitting them right
> at the start of the function has an advantage that it can be emitted in
> CIE for all FDEs of noreturn functions (or with the new attribute).  But
> disadvantage is of course that it will make e.g. debugging experience worse
> even in the prologues of functions where the callee saved registers which
> current function actually doesn't save aren't modified yet.
> E.g. for the cases where callee saved registers are saved to memory or
> registers I think dwarf2cfi.cc attempts to optimize and move the .cfi_*
> directives or .eh_frame record later into the function as long as the
> corresponding original register isn't modified yet.  Perhaps that should
> be done also for the undefined case, ideally by using the same dwarf2cfi.cc
> code.  So just perhaps at the start of the function read in the
> REG_CFA_UNDEFINED notes for all the ever modified callee saved registers
> which won't be actually saved and turn that into similar record like for
> the saving into stack or other regs, just noting it is undefined instead
> and have it pushed later as much as possible.

My patch doesn't implement this optimization.

-- 
H.J.

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao



> On Jan 29, 2024, at 10:50 AM, Martin Uecker  wrote:
> 
> Am Montag, dem 29.01.2024 um 15:09 + schrieb Qing Zhao:
>> Thank you!
>> 
>> Joseph and Richard,  could you also comment on this?
>> 
>>> On Jan 28, 2024, at 5:09 AM, Martin Uecker  wrote:
>>> 
>>> Am Freitag, dem 26.01.2024 um 14:33 + schrieb Qing Zhao:
 
> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
> 
> 
> I haven't looked at the patch, but it sounds you give the result
> the wrong type. Then patching up all use cases instead of the
> type seems wrong.
 
 Yes, this is to resolve a very early gimplification issue that I reported
 last November:
 https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
 
 Since no-one responded at that time, I fixed the issue by replacing the
 ARRAY_REF with a pointer indirection:
 https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
 
 The reason for this change is that returning a flexible array member TYPE
 is not allowed by the C language (our gimplification follows this rule),
 so we have to return a pointer TYPE instead.
 
 **The new internal function
 
 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, ACCESS_MODE, INDEX)
 
 INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
 
 which returns "REF_TO_OBJ", the same as the 1st argument.
 
 Both the return type and the type of the first argument of this function
 have been converted from the incomplete array type to the corresponding
 pointer type.
 
 As a result, the original ARRAY_REF was converted to an INDIRECT_REF, and
 the original INDEX of the ARRAY_REF was lost in that conversion.  In order
 to keep the INDEX for bounds sanitizer instrumentation, I added the 6th
 argument “INDEX”.
 
 What’s your comment and suggestion on this solution?
>>> 
>>> I am not entirely sure but changing types in the FE seems
>>> problematic because this breaks language semantics. And
>>> then adding special code everywhere to treat it specially
>>> in the FE does not seem a good way forward.
>>> 
>>> If I understand correctly, returning an incomplete array 
>>> type is not allowed and then fails during gimplification.
>> 
>> Yes, this is the problem in gimplification. 
>> 
>>> So I would suggest to make it return a pointer to the 
>>> incomplete array (and not the element type)
>> 
>> 
>> for the following:
>> 
>> struct annotated {
>>  unsigned int size;
>>  int array[] __attribute__((counted_by (size)));
>> };
>> 
>>  struct annotated * p = ….
>>  p->array[9] = 0;
>> 
>> The IL for the above array reference p->array[9] is:
>> 
>> 1. If the return type is the original incomplete array type, 
>> 
>> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;
>> 
>> (this triggered the gimplification failure since the return type cannot be
>> an incomplete type).
>> 
>> 2. When the return type is changed to a pointer to the element type of the
>> incomplete array (the current patch), the original array reference naturally
>> becomes an indirect reference through the pointer:
>> 
>> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) = 0;
>> 
>> Since the original array reference becomes an indirect reference through the
>> pointer to the element type, the INDEX info is folded into the OFFSET of the
>> indirect reference and lost, so I added the 6th argument to .ACCESS_WITH_SIZE
>> to record the INDEX.
>> 
>> 3. With your suggestion, the return type is changed to a pointer to the
>> incomplete array.  I just tried this by changing the result type:
>> 
>> 
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
>>   tree counted_by_type)
>> {
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>> -  tree result_type = build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
>> +  tree result_type = build_pointer_type (TREE_TYPE (ref));
>> 
>> Then, I got the following FE errors:
>> 
>> test.c:10:11: error: invalid use of flexible array member
>>   10 |   p->array[9] = 0;
>> 
>> The reason for the error is that when the original array_ref becomes an
>> indirect_ref through the pointer to the incomplete array, the
>> TYPE_SIZE_UNIT (type) needed to compute the OFFSET from the pointer is
>> invalid, since the type is an incomplete array.
>> As a result, the OFFSET cannot be computed for the indirect_ref.
>> 
>> It looks like this approach has even more issues.
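The 36 in option 2 is the element index scaled to bytes, 9 * sizeof (int), assuming a 4-byte int as the dump implies. A tiny sketch of why the bounds check wants the index itself: the check compares an element index with the counted_by value, while option 2 only leaves a byte offset in the IL. `fam_byte_offset` and `fam_index_in_bounds` are hypothetical helpers, not the sanitizer's code.

```c
#include <stddef.h>

/* Byte offset produced when p->array[index] is lowered to pointer
   arithmetic on the element type.  */
static size_t fam_byte_offset(size_t index, size_t elem_size)
{
  return index * elem_size;
}

/* The comparison a counted_by bounds check needs: element index against
   the counted_by value.  Once the index has been folded into a byte
   offset (possibly combined with other offsets) it is no longer directly
   available, hence the extra INDEX argument.  */
static int fam_index_in_bounds(size_t index, unsigned int count)
{
  return index < count;
}
```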
> 
> Yes, but only because the following is missing:
> 
>> 
>> 
>>> but then wrap
>>> it with an indirection when inserting this code in the FE
>>> so that the full replacement has the correct type again
>>> (of the incomplete array).
>> 
>> I don’t quite understand the ab

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 8:03 AM Jakub Jelinek  wrote:
>
> On Mon, Jan 29, 2024 at 06:36:47AM -0800, H.J. Lu wrote:
> > TARGET_ASM_SELECT_RTX_SECTION is for constants in RTL.
> > It should have a non-public label reference which can't be used
> > by other TUs.  The same section can contain other constants.
> > If there is a COMDAT issue, the linker will catch it.
>
> Let me try to explain with a short assembly snippet what I believe your patch is
> doing and what I'm afraid of.  I believe that when we need to emit
> an RTL constant such as foo, foo+1 or foo+2 (where foo is defined in a comdat
> section), your patch puts those constants into a .data.rel.ro.local section
> determined by the decl that is referenced, rather than into the plain
> .data.rel.ro.local section.
> Now, when first_tu.o wins and emits the qux comdat, it will contain
> the .data.rel.ro.local.foo section that the bar function refers to, but
> second_tu.o wants to refer to different offsets from the same function, and
> its copy of that section is discarded.
>
> I simply believe the constants need to be in a section based on what refers
> to those symbols, not on the value of those constants, and that is what we
> used to do before your patch (and I'd like to understand what's wrong with
> what GCC emits and why).
>
> first_tu.s:
> 
> 	.section	.text.foo,"axG",@progbits,qux,comdat
> 	.p2align 4
> 	.type	foo, @function
> foo:
> 	xorl	%eax, %eax
> 	ret
> 	.size	foo, .-foo
> 	.text
> 	.p2align 4
> 	.type	bar, @function
> bar:
> 	movq	.LC0(%rip), %xmm0
> 	ret
> 	.size	bar, .-bar
> 	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
> 	.align 8
> .LC0:
> 	.quad	foo
>
> second_tu.s:
> 
> 	.section	.text.foo,"axG",@progbits,qux,comdat
> 	.p2align 4
> 	.type	foo, @function
> foo:
> 	xorl	%eax, %eax
> 	ret
> 	.size	foo, .-foo
> 	.text
> 	.p2align 4
> 	.type	baz, @function
> baz:
> 	movq	.LC0(%rip), %xmm0
> 	ret

I don't think this is valid.  We can't reference a non-public
symbol outside of a COMDAT group.  It is OK to reference
foo or foo + 1, but not .LC0.

> 	.size	baz, .-baz
> 	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
> 	.align 8
> .LC0:
> 	.quad	foo+1
> 	.text
> 	.p2align 4
> 	.type	corge, @function
> corge:
> 	movq	.LC1(%rip), %xmm0
> 	ret
> 	.size	corge, .-corge
> 	.section	.data.rel.ro.local.foo,"awG",@progbits,qux,comdat
> 	.align 8
> .LC1:
> 	.quad	foo+2
> gcc -shared -o test.so first_tu.s second_tu.s
> `.data.rel.ro.local.foo' referenced in section `.text' of /tmp/cceeUWyH.o: defined in discarded section `.data.rel.ro.local.foo[qux]' of /tmp/cceeUWyH.o
> `.data.rel.ro.local.foo' referenced in section `.text' of /tmp/cceeUWyH.o: defined in discarded section `.data.rel.ro.local.foo[qux]' of /tmp/cceeUWyH.o
> collect2: error: ld returned 1 exit status
>
> Jakub
>


-- 
H.J.

Re: [PATCH] x86: Generate REG_CFA_UNDEFINED for unsaved callee-saved registers

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 08:00:26AM -0800, H.J. Lu wrote:
> Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers which
> have been used in the function to an instruction in prologue.
> 
> gcc/
> 
>   PR target/38534
>   * dwarf2cfi.cc (add_cfi_undefined): New.
>   (dwarf2out_frame_debug_cfa_undefined): Likewise.
>   (dwarf2out_frame_debug): Handle REG_CFA_UNDEFINED.
>   * reg-notes.def (REG_CFA_UNDEFINED): New.
>   * config/i386/i386.cc (ix86_expand_prologue): Attach
>   REG_CFA_UNDEFINED notes for unsaved callee-saved registers
>   which have been used in the function to an instruction in
>   prologue.
> 
> gcc/testsuite/
> 
>   PR target/38534
>   * gcc.target/i386/no-callee-saved-19.c: New test.
>   * gcc.target/i386/no-callee-saved-20.c: Likewise.
>   * gcc.target/i386/pr38534-7.c: Likewise.
>   * gcc.target/i386/pr38534-8.c: Likewise.
> ---
>  gcc/config/i386/i386.cc   | 20 +++
>  gcc/dwarf2cfi.cc  | 55 +++
>  gcc/reg-notes.def |  4 ++
>  .../gcc.target/i386/no-callee-saved-19.c  | 17 ++
>  .../gcc.target/i386/no-callee-saved-20.c  | 12 
>  gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 ++
>  gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +
>  7 files changed, 139 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b3e7c74846e..6ec87b6a16f 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -9304,6 +9304,26 @@ ix86_expand_prologue (void)
>       combined with prologue modifications.  */
>    if (TARGET_SEH)
>      emit_insn (gen_prologue_use (stack_pointer_rtx));
> +
> +  if (cfun->machine->call_saved_registers
> +      != TYPE_NO_CALLEE_SAVED_REGISTERS)
> +    return;
> +
> +  insn = get_insns ();
> +  if (!insn)
> +    return;

You can't attach the notes to a random instruction that happens to be first
in the function.
1) it needs to be a real instruction, not a note
2) it needs to be RTX_FRAME_RELATED_P
3) if it is RTX_FRAME_RELATED_P, but doesn't contain any previous REG_CFA_*
   notes:
   3a) if it has REG_FRAME_RELATED_EXPR note, then I believe just that
   note argument is processed instead of the instruction pattern and
   I think REG_CFA_* notes which precede REG_FRAME_RELATED_EXPR are
   processed, but REG_CFA_* notes after it are not; so adding
   REG_CFA_UNDEFINED notes at least if the adding is after the existing
   notes instead of before them may be problematic
   3b) if it has neither REG_CFA_* nor REG_FRAME_RELATED_EXPR notes, then
   normally the pattern of the insn would be processed in dwarf2cfi.
   But with the REG_CFA_* notes that part will be ignored.

> --- a/gcc/dwarf2cfi.cc
> +++ b/gcc/dwarf2cfi.cc
> @@ -517,6 +517,17 @@ add_cfi_restore (unsigned reg)
>    add_cfi (cfi);
>  }
>  

Function comment missing.

> +static void
> +add_cfi_undefined (unsigned reg)
> +{
> +  dw_cfi_ref cfi = new_cfi ();
> +
> +  cfi->dw_cfi_opc = DW_CFA_undefined;
> +  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
> +
> +  add_cfi (cfi);
> +}
> +
>  /* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
>     that the register column is no longer saved.  */

Jakub
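For reference, per the DWARF specification the CFI entry that `add_cfi_undefined` creates is encoded as the one-byte opcode DW_CFA_undefined (0x07) followed by the DWARF register number as an unsigned LEB128; the assembler's `.cfi_undefined` directive produces the same encoding. A minimal standalone encoder, independent of GCC's own output routines (`uleb128_encode`, `encode_cfa_undefined` and `DEMO_DW_CFA_undefined` are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DW_CFA_undefined 0x07  /* DWARF opcode value of DW_CFA_undefined */

/* Write VALUE to BUF as unsigned LEB128; returns the number of bytes.  */
static size_t uleb128_encode(uint64_t value, uint8_t *buf)
{
  size_t n = 0;
  do
    {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;  /* more bytes follow */
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* Encode one DW_CFA_undefined instruction for register REG into BUF,
   which must have room for at least 11 bytes; returns its length.  */
static size_t encode_cfa_undefined(uint64_t reg, uint8_t *buf)
{
  buf[0] = DEMO_DW_CFA_undefined;
  return 1 + uleb128_encode(reg, buf + 1);
}
```

For example, register 3 (%rbx in the x86-64 DWARF numbering) encodes as the two bytes 0x07 0x03, and register 200 as 0x07 0xc8 0x01.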

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 08:23:21AM -0800, H.J. Lu wrote:
> > baz:
> > movq.LC0(%rip), %xmm0
> > ret
> 
> I don't think this is valid.  We can't reference a non-public
> symbol outside of a COMDAT group.  It is OK to reference
> foo or foo + 1, but not .LC0.

But that is exactly what your patch does, e.g. on the first testcase:
--- pr113617-1a.s   2024-01-29 11:29:55.831512974 +0100
+++ pr113617-1a.s   2024-01-29 11:30:04.335394116 +0100
@@ -51,28 +-51,28 @@
 	.section	.text._ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_,"axG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
 	.align 2
 	.p2align 4
 	.type	_ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_, @function
 _ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_:
 	pushq	%r15
 	leaq	_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE(%rip), %rax
 	leaq	_ZN3vtk6detail3smp23ExecuteFunctorSTDThreadINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvPvxxx(%rip), %r15
 	pushq	%r14
 	movq	%rax, %xmm1
 	pushq	%r13
 	pushq	%r12
 	movq	%rdx, %r12
 	pushq	%rbp
 	movq	%r8, %rbp
 	pushq	%rbx
 	movq	%rcx, %rbx
 	subq	$40, %rsp
 	movl	For_threadNumber(%rip), %esi
 	movq	.LC0(%rip), %xmm0
 	leaq	31(%rsp), %r13
 	punpcklqdq	%xmm1, %xmm0
 	movq	%r13, %rdi
 	movaps	%xmm0, (%rsp)
 	call	_ZN3vtk6detail3smp16vtkSMPThreadPoolC1Ei@PLT
 	movq	(%rsp), %r14
 	.p2align 4,,10
 	.p2align 3
@@ -191,9 +191,9 @@ vtkConstrainedSmoothingFilterRequestData
 	.size	For_threadNumber, 4
 For_threadNumber:
 	.zero	4
-	.section	.data.rel.ro.local,"aw"
+	.section	.data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE,"awG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
 	.align 8
 .LC0:
 	.quad	_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE
-	.ident	"GCC: (GNU) 14.0.1 20240127 (experimental)"
+	.ident	"GCC: (GNU) 14.0.1 20240129 (experimental)"
 	.section	.note.GNU-stack,"",@progbits

Jakub

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 8:34 AM Jakub Jelinek  wrote:
>
> On Mon, Jan 29, 2024 at 08:23:21AM -0800, H.J. Lu wrote:
> > > baz:
> > > movq.LC0(%rip), %xmm0
> > > ret
> >
> > I don't think this is valid.  We can't reference a non-public
> > symbol outside of a COMDAT group.  It is OK to reference
> > foo or foo + 1, but not .LC0.
>
> But that is exactly what your patch does, e.g. on the first testcase:
> --- pr113617-1a.s   2024-01-29 11:29:55.831512974 +0100
> +++ pr113617-1a.s   2024-01-29 11:30:04.335394116 +0100
> @@ -51,28 +-51,28 @@
>  	.section	.text._ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_,"axG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
>  	.align 2
>  	.p2align 4
>  	.type	_ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_, @function
>  _ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_:
>  	pushq	%r15
>  	leaq	_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE(%rip), %rax
>  	leaq	_ZN3vtk6detail3smp23ExecuteFunctorSTDThreadINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvPvxxx(%rip), %r15
>  	pushq	%r14
>  	movq	%rax, %xmm1
>  	pushq	%r13
>  	pushq	%r12
>  	movq	%rdx, %r12
>  	pushq	%rbp
>  	movq	%r8, %rbp
>  	pushq	%rbx
>  	movq	%rcx, %rbx
>  	subq	$40, %rsp
>  	movl	For_threadNumber(%rip), %esi
>  	movq	.LC0(%rip), %xmm0
>  	leaq	31(%rsp), %r13
>  	punpcklqdq	%xmm1, %xmm0
>  	movq	%r13, %rdi
>  	movaps	%xmm0, (%rsp)
>  	call	_ZN3vtk6detail3smp16vtkSMPThreadPoolC1Ei@PLT
>  	movq	(%rsp), %r14
>  	.p2align 4,,10
>  	.p2align 3
> @@ -191,9 +191,9 @@ vtkConstrainedSmoothingFilterRequestData
>  	.size	For_threadNumber, 4
>  For_threadNumber:
>  	.zero	4
> -	.section	.data.rel.ro.local,"aw"
> +	.section	.data.rel.ro.local._ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE,"awG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
>  	.align 8
>  .LC0:
>  	.quad	_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE
> -	.ident	"GCC: (GNU) 14.0.1 20240127 (experimental)"
> +	.ident	"GCC: (GNU) 14.0.1 20240129 (experimental)"
>  	.section	.note.GNU-stack,"",@progbits
>
> Jakub
>

In this case, these are internal to the same comdat group:

	.section	.text._ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_,"axG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
	.align 2
	.p2align 4
	.type	_ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_, @function
_ZN3vtk6detail3smp15vtkSMPToolsImplILi1EE3ForINS1_27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EvxxxRT_:
.LFB27:
	.cfi_startproc
...
	movq	.LC0(%rip), %xmm0
...
	.section	.text._ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE,"axG",@progbits,_ZN26vtkStaticCellLinksTemplateIxE18ThreadedBuildLinksExxP12vtkCellArray,comdat
	.p2align 4
	.type	_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE, @function
_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE:
.LFB34:
	.cfi_startproc
	xorl	%eax, %eax
	ret
	.cfi_endproc
...
	.section	.data.rel.ro.

[committed] MAINTAINERS: Update my email address

2024-01-29 Thread Kwok Cheung Yeung

I have committed this to update my work email address in MAINTAINERS
(but forgot to change my git user.email first - oops!).


Thanks

Kwok Yeung
From f3fdaa3eecd155dbdc78c1ec9a259dfa4e379ea4 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 29 Jan 2024 16:40:49 +
Subject: [PATCH] MAINTAINERS: Update my work email address

* MAINTAINERS: Update my work email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b11ddbc069..9d92be1f301 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -730,7 +730,7 @@ Canqun Yang 

 Fei Yang   
 Jeffrey Yasskin
 Joey Ye
-Kwok Cheung Yeung  
+Kwok Cheung Yeung  
 Greta Yorsh
 David Yuste
 Adhemerval Zanella 
-- 
2.34.1

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> In this case, these are internal to the same comdat group:

But that is only by accident, no?
I mean, what if you need to refer to such a symbol from a non-comdat
function, or from a comdat function in a different comdat group, and RA
decides it wants the constant in memory rather than in the code?
Your patch uses
  if (decl)
    return targetm.asm_out.function_rodata_section (decl, ???);
and default_function_rodata_section only looks at the comdat group of the
passed-in decl.  But the decl here is what the constant refers to, not
the function that refers to it.

Jakub

Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-29 Thread Tobias Burnus


Hi Thomas,

Thomas Schwinge wrote:

On 2024-01-23T10:55:16+0100, Tobias Burnus  wrote:

plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

The following issue was found when running libgomp.c/target-52.c with
nvptx offloading when the dg-set-target-env-var was honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|


I think we recently fixed a surprisingly high number of issues that we 
didn't see before but were clearly preexisting for quite a while. 
(Mostly for AMDGPU but still.)


But I concur that this one is a more tricky one.


For some unknown reasons, while this does not have an effect on the
order of the called plugin functions for initialization, it changes the
order of function calls for shutting down. Namely, when the two environment
variables are set, GOMP_offload_unregister_ver is called now before
gomp_target_fini.

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?


Maybe, maybe not. First, it does not seem to occur elsewhere, but maybe
that's because remote setting of environment variables does not work
with DejaGNU and most code was run that way. And secondly, I have no
idea how 'atexit' and destructors are implemented internally.



And it seems as if CUDA regards a call to cuModuleUnload
(or unloading the last module?) as indication that the device context should
be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

 --- libgomp/plugin/plugin-nvptx.c
 +++ libgomp/plugin/plugin-nvptx.c
 @@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
  if (image->target_data == target_data)
{
*prev_p = image->next;
 -  if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 +  CUresult r;
 +  r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
 +  GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, cuda_error (r));
 +  if (r != CUDA_SUCCESS)
  ret = false;
 +  CUdevice dev_;
 +  r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev_);
 +  GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, cuda_error (r));
 +  GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, dev->dev);
 +  assert (dev_ == dev->dev);
free (image->fns);
free (image);
break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

 GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
 GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
 GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?


I cluttered the plugin with "printf" debugging; hence, no other code
is calling *into* the run-time library as far as I can see.

But now I will try it with vanilla code and your patch applied.

Result for target-52.c with the env vars set:

DEBUG: GOMP_offload_unregister_ver dev=0; state=1
DEBUG: gomp_unload_image_from_device
DEBUG GOMP_OFFLOAD_unload_image, 0, 196609
GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0
DEBUG: gomp_target_fini; dev=0, state=1
DEBUG  0
DEBUG: nvptx_attach_host_thread_to_device - 0
DEBUG: ERROR nvptx_attach_host_thread_to_device - 0

libgomp: cuCtxGetDevice error: unknown cuda error

Hence: calling cuCtxGetDevice immediately after
unloading the image does not fail.

But calling it later, via gomp_target_fini
→ GOMP_OFFLOAD_fini_device → nvptx_attach_host_thread_to_device,
does fail.

I have attached my printf patch for reference.

* * *


Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which Nvidia Driver version?  The latest I've now tested are:

 Driver Version: 525.147.05   CUDA Version: 12.0
 Driver Version: 535.154.05   CUDA Version: 12.2


My laptop has:

NVIDIA-SMI 545.29.06  Driver Version: 545.29.06    CUDA Version: 12.3


I'd like to please defer that one until we understand the actual origin
of the misbehavior.
(I think that the patch still makes sense, but first finding out what goes
wrong is fine nonetheless.)

When reading the code, the following was observed in addition:
When gomp_fini_device is called, it invokes goacc_fini_a

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Kees Cook

On Mon, Jan 29, 2024 at 04:00:20PM +, Qing Zhao wrote:
> An update on the kernel building with my version 4 patch.
> 
> Kees reported two FE issues with the current version 4 patch:
> 
> 1. The operator “typeof” cannot return correct type for a->array;
> 2. The operator “&” cannot return correct address for a->array;
> 
> I fixed both in my local repository. 
> 
> With these additional fixes, the kernel with counted-by annotations can be built
> successfully.

Thanks for the fixes!

> 
> And then, Kees reported one behavioral issue with the current counted-by:
> 
> When the counted-by value is below zero, my current patch 
> 
> A. Didn’t report any warning for it.
> B. Accepted the negative value as a wrapped size.
> 
> i.e. for:
> 
> struct foo {
> signed char size;
> unsigned char array[] __counted_by(size);
> } *a;
> 
> ...
> a->size = -3;
> report(__builtin_dynamic_object_size(a->array, 1));
> 
> this reports 253, rather than 0.
> 
> And the array-bounds sanitizer doesn’t catch negative index bounds either.
> 
> a->size = -3;
> report(a->array[1]); // does not trap
> 
> 
> So, my questions are:
> 
>  How should we handle the negative counted-by value?

Treat it as always 0-bounded: count < 0 ? 0 : count

> 
>  My approach is:
> 
>    I think that this is a user error; the compiler needs to issue a warning
> at run time about this user error.
> 
> Since I have one remaining patch that has not been finished yet:
> 
> 6  Emit warnings when the user breaks the requirements for the new counted_by
> attribute
>   compilation time: -Wcounted-by
>   run time: -fsanitize=counted-by
>  * The initialization of the size field should be done before the first
> reference to the FAM field.

I would hope that regular compile-time warnings would catch this.

>  * the array has at least # of elements specified by the size field at
> all times during the program.
>  * the value of counted-by should not be negative.

This seems reasonable for a very strict program, but it won't work for
the kernel as-is: a negative "count" is sometimes used to carry failure
details back to other users of the structure. This could be refactored in
the kernel, but I'd prefer that even without -fsanitize=counted-by the
runtime behaviors will be "safe".

It does not seem sensible to me that adding a buffer size validation
primitive to GCC will result in conditions where a size calculation
will wrap around. I prefer no surprises. :)

> Let me know your comment and suggestions.

Clang has implemented the safety logic I'd prefer:

* __bdos will report 0 for any sizing where the "counted_by" count
  variable is negative. Effectively, the count variable is always
  processed as: count < 0 ? 0 : count

  struct foo {
    int count;
    short array[] __counted_by(count);
  } *p;

  __bdos(p->array, 1) ==> sizeof(*p->array) * (count < 0 ? 0 : count)

  The logic for this is that __bdos can be _certain_ that the size is 0
  when the count variable is pathological.

* -fsanitize=array-bounds similarly treats count as above, so that:

  printf("%d\n", p->array[index]); ==> trap when index > (count < 0 ? 0 : count)

  Same logic for the sanitizer: any access to the array when count is
  invalid means the access is invalid and must be trapped.

This means that software can run safely even in pathological conditions.
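A minimal C sketch of the clamping rule described above, in plain C only (the helper names are hypothetical; this is neither the GCC nor the Clang implementation). It also shows how a signed char count of -3 becomes 253 once it is read as an unsigned char, matching the value in the report above. Note that the sketch treats index == count as out of bounds, since the valid indices of a counted_by array run from 0 to count - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the element count after applying the rule
   "count < 0 ? 0 : count".  */
static size_t
effective_count (long count)
{
  return count < 0 ? 0 : (size_t) count;
}

/* Size that __bdos (p->array, 1) would report under that rule.  */
static size_t
clamped_array_size (long count, size_t elem_size)
{
  assert (elem_size > 0);
  return elem_size * effective_count (count);
}

/* Nonzero when p->array[index] is out of bounds and should trap.
   Valid indices are 0 .. count - 1, so index == count traps too.  */
static int
access_is_oob (long index, long count)
{
  return index < 0 || (size_t) index >= effective_count (count);
}

/* A signed char count of -3 read back as unsigned char gives the
   wrapped size of 253 (with one-byte elements).  */
static size_t
wrapped_array_size (signed char count)
{
  return (unsigned char) count;
}
```

With a count of -3, wrapped_array_size gives 253 while clamped_array_size gives 0, and access_is_oob (1, -3) is nonzero, so an access like a->array[1] would trap.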

-Kees

-- 
Kees Cook

Re: [PATCH] Fortran: use name of array component in runtime error message [PR30802]

2024-01-29 Thread Harald Anlauf


Am 28.01.24 um 22:43 schrieb Steve Kargl:

On Sun, Jan 28, 2024 at 08:56:24PM +0100, Harald Anlauf wrote:


Am 28.01.24 um 12:39 schrieb Mikael Morin:

Le 24/01/2024 à 22:39, Harald Anlauf a écrit :

Dear all,

this patch is actually only a followup fix to generate the proper name
of an array reference in derived-type components for the runtime error
message generated for the bounds-checking code.  Without the proper
part ref, not only a user may get confused: I was, too...

The testcase is compile-only, as it is only important to check the
strings used in the error messages.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?



the change proper looks good, and is an improvement.  But I'm a little
concerned by the production of references like in the test x1%vv%z which
could be confusing and is strictly speaking invalid fortran (multiple
non-scalar components).  Did you consider generating x1%vv(?,?)%zz or
x1%vv(...)%z or similar?


yes, that seems very reasonable, given that this is what NAG does.

We also have spurious %_data in some error messages that I'll try
to get rid of.



I haven't looked at the patch, but sometimes (if not always) things
like _data are marked with attr.artificial.  You might see if this
will help with suppressing spurious messages.


I was talking about the generated format strings of runtime error
messages.

program p
  implicit none
  type t
 real :: zzz(10) = 42
  end type t
  class(t), allocatable :: xx(:)
  integer :: j
  j = 0
  allocate (t :: xx(1))
  print *, xx(1)% zzz(j)
end

This is generating the following error at runtime since at least gcc-7:

Fortran runtime error: Index '0' of dimension 1 of array 'xx%_data%zzz'
below lower bound of 1

I believe you were recalling bogus warnings at compile time.
There are no warnings here, and there shouldn't.

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
>
> On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > In this case, these are internal to the same comdat group:
>
> But that is only by accident, no?

This may be by luck.  I don't know if gcc checks it when
generating such references.

> I mean, if you need to refer to such a symbol from
> non-comdat function or comdat function in a different comdat group
> and RA decides it wants the constant in memory rather than code?
> Your patch uses
>   if (decl)
> return targetm.asm_out.function_rodata_section (decl, ???);
> and default_function_rodata_section only looks at comdat group of the
> passed in decl.  But the decl here is what the constant refers to, not
> who is referring to it.
>
> Jakub
>


-- 
H.J.

[pushed] c++: local class in generic lambda [PR113544]

2024-01-29 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

My earlier commit r14-278-gd60cbbfaa9a3ad was a start toward better
handling of local classes in generic lambdas, but isn't actually useful by
itself and breaks this testcase, so let's revert it for now.

PR c++/113544

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Don't partially instantiate.
(tsubst_stmt): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-nested3.C: New test.
---
 gcc/cp/pt.cc   | 14 +-
 .../g++.dg/cpp1y/lambda-generic-nested3.C  | 11 +++
 2 files changed, 16 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-nested3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f5bf159a879..fb2448a26e9 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -12226,8 +12226,7 @@ instantiate_class_template (tree type)
 return error_mark_node;
 
   if (COMPLETE_OR_OPEN_TYPE_P (type)
-  || (uses_template_parms (type)
- && !TYPE_FUNCTION_SCOPE_P (type)))
+  || uses_template_parms (type))
 return type;
 
   /* Figure out which template is being instantiated.  */
@@ -18893,7 +18892,10 @@ tsubst_stmt (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 
 case TAG_DEFN:
   tmp = tsubst (TREE_TYPE (t), args, complain, NULL_TREE);
-  if (CLASS_TYPE_P (tmp))
+  if (dependent_type_p (tmp))
+   /* This is a partial instantiation, try again when full.  */
+   add_stmt (build_min (TAG_DEFN, tmp));
+  else if (CLASS_TYPE_P (tmp))
{
  /* Local classes are not independent templates; they are
 instantiated along with their containing function.  And this
@@ -18902,12 +18904,6 @@ tsubst_stmt (tree t, tree args, tsubst_flags_t complain, tree in_decl)
  /* Closures are handled by the LAMBDA_EXPR.  */
  gcc_assert (!LAMBDA_TYPE_P (TREE_TYPE (t)));
  complete_type (tmp);
- if (dependent_type_p (tmp))
-   {
- /* This is a partial instantiation, try again when full.  */
- add_stmt (build_min (TAG_DEFN, tmp));
- break;
-   }
  tree save_ccp = current_class_ptr;
  tree save_ccr = current_class_ref;
  for (tree fld = TYPE_FIELDS (tmp); fld; fld = DECL_CHAIN (fld))
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-nested3.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-nested3.C
new file mode 100644
index 000..27655274a87
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-nested3.C
@@ -0,0 +1,11 @@
+// PR c++/113544
+// { dg-do compile { target c++14 } }
+
+template
+void f() {
+  [](auto parm) {
+struct type : decltype(parm) { };
+  };
+}
+
+template void f();

base-commit: f3fdaa3eecd155dbdc78c1ec9a259dfa4e379ea4
-- 
2.39.3

[PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-01-29 Thread Kwok Cheung Yeung


Can you please also update the comments to talk about hashtab instead of splay?



Hello

This version has the comments updated and removes a stray 'volatile' in 
the #ifdefed out code.


Thanks

Kwok
From 5737298f4f5e5471667b05e207b22c9c91b94ca0 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 29 Jan 2024 17:40:04 +
Subject: [PATCH 1/2] openmp: Change to using a hashtab to lookup offload
 target addresses for indirect function calls

A splay-tree was previously used to look up equivalent target addresses
for a given host address on offload targets. However, as splay-trees can
modify their structure on lookup, they are not suitable for concurrent
access from separate teams/threads without some form of locking.  This
patch changes the lookup data structure to a hashtab instead, which does
not have these issues.

The call to build_indirect_map to initialize the data structure is now
called from just the first thread of the first team to avoid redundant
calls to this function.
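To make the concurrency argument above concrete, here is a minimal C sketch of a build-once, read-only lookup table using open addressing with linear probing. It is not the libgomp code (which uses its own hashtab.h); the names addr_pair, addr_table, addr_table_build and addr_table_lookup are hypothetical, and the sketch assumes, like the patch, a NULL-terminated map of host/target pointer pairs. Because a lookup only reads the slots, any number of threads can call it concurrently once the build has finished, which is the property a splay tree lacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host->target pair, modelled on the patch's indirect_map_t.  */
struct addr_pair
{
  void *host_addr;   /* NULL marks an empty slot.  */
  void *target_addr;
};

struct addr_table
{
  size_t mask;              /* Slot count - 1; the slot count is a power of two.  */
  struct addr_pair *slots;
};

/* Multiplicative hash of a pointer value, reduced to a slot index.  */
static size_t
hash_addr (const void *p, size_t mask)
{
  uintptr_t x = (uintptr_t) p >> 4;  /* Drop low bits that are usually zero.  */
  return (size_t) (x * (uintptr_t) 2654435761u) & mask;
}

/* Build the table once from a NULL-terminated array laid out as
   host0, target0, host1, target1, ..., NULL.  Returns 0 on success.  */
static int
addr_table_build (struct addr_table *t, void **map)
{
  size_t n = 0, nslots = 1;
  for (void **e = map; *e; e += 2)
    n++;
  while (nslots < 2 * n)      /* Keep the load factor at or below 1/2.  */
    nslots <<= 1;
  t->slots = calloc (nslots, sizeof *t->slots);
  if (!t->slots)
    return -1;
  t->mask = nslots - 1;
  for (void **e = map; *e; e += 2)
    {
      size_t i = hash_addr (e[0], t->mask);
      while (t->slots[i].host_addr)   /* Linear probing.  */
        i = (i + 1) & t->mask;
      t->slots[i].host_addr = e[0];
      t->slots[i].target_addr = e[1];
    }
  return 0;
}

/* Read-only lookup: it never modifies the table, so concurrent lookups
   need no lock once the build is done.  There is always an empty slot,
   so the probe loop terminates.  */
static void *
addr_table_lookup (const struct addr_table *t, const void *host)
{
  assert (t->slots != NULL);
  size_t i = hash_addr (host, t->mask);
  while (t->slots[i].host_addr)
    {
      if (t->slots[i].host_addr == host)
        return t->slots[i].target_addr;
      i = (i + 1) & t->mask;
    }
  return NULL;  /* Unknown address: this sketch's choice, not libgomp's.  */
}
```

In this sketch the build must complete before the first lookup; the patch arranges for build_indirect_map to be called from just the first thread of the first team.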

2024-01-29  Kwok Cheung Yeung  

libgomp/
* config/accel/target-indirect.c: Include string.h and hashtab.h.
Remove include of splay-tree.h.  Update comments.
(splay_tree_prefix, splay_tree_c): Delete.
(struct indirect_map_t): New.
(hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
(GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier.
(USE_SPLAY_TREE_LOOKUP): Rename to...
(USE_HASHTAB_LOOKUP): ...this.
(indirect_map, indirect_array): Delete.
(indirect_htab): New.
(build_indirect_map): Remove locking.  Build indirect map using
hashtab.
(GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
address.
(GOMP_target_map_indirect_ptr): Remove volatile qualifier.
* config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
from first thread of first team only.
* config/nvptx/team.c (gomp_nvptx_main): Likewise.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
Add missing break statements.
---
 libgomp/config/accel/target-indirect.c| 83 ++-
 libgomp/config/gcn/team.c |  7 +-
 libgomp/config/nvptx/team.c   |  9 +-
 .../declare-target-indirect-2.c   | 14 ++--
 4 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
index c60fd547cb6..cfef1ddbc49 100644
--- a/libgomp/config/accel/target-indirect.c
+++ b/libgomp/config/accel/target-indirect.c
@@ -25,60 +25,73 @@
.  */
 
 #include 
+#include 
 #include "libgomp.h"
 
-#define splay_tree_prefix indirect
-#define splay_tree_c
-#include "splay-tree.h"
+struct indirect_map_t
+{
+  void *host_addr;
+  void *target_addr;
+};
+
+typedef struct indirect_map_t *hash_entry_type;
+
+static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
+static inline void htab_free (void *ptr) { free (ptr); }
+
+#include "hashtab.h"
+
+static inline hashval_t
+htab_hash (hash_entry_type element)
+{
+  return hash_pointer (element->host_addr);
+}
 
-volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+static inline bool
+htab_eq (hash_entry_type x, hash_entry_type y)
+{
+  return x->host_addr == y->host_addr;
+}
 
-/* Use a splay tree to lookup the target address instead of using a
-   linear search.  */
-#define USE_SPLAY_TREE_LOOKUP
+void **GOMP_INDIRECT_ADDR_MAP = NULL;
 
-#ifdef USE_SPLAY_TREE_LOOKUP
+/* Use a hashtab to lookup the target address instead of using a linear
+   search.  */
+#define USE_HASHTAB_LOOKUP
 
-static struct indirect_splay_tree_s indirect_map;
-static indirect_splay_tree_node indirect_array = NULL;
+#ifdef USE_HASHTAB_LOOKUP
 
-/* Build the splay tree used for host->target address lookups.  */
+static htab_t indirect_htab = NULL;
+
+/* Build the hashtab used for host->target address lookups.  */
 
 void
 build_indirect_map (void)
 {
   size_t num_ind_funcs = 0;
-  volatile void **map_entry;
-  static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
+  void **map_entry;
 
   if (!GOMP_INDIRECT_ADDR_MAP)
 return;
 
-  gomp_mutex_lock (&lock);
-
-  if (!indirect_array)
+  if (!indirect_htab)
 {
   /* Count the number of entries in the NULL-terminated address map.  */
   for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
   map_entry += 2, num_ind_funcs++);
 
-  /* Build splay tree for address lookup.  */
-  indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
-  indirect_splay_tree_node array = indirect_array;
+  /* Build hashtab for address lookup.  */
+  indirect_htab = htab_create (num_ind_funcs);
   map_entry = GOMP_INDIRECT_ADDR_MAP;
 
-  for (int i = 0; i < num_ind_funcs; i++, array++)
+  for (int i = 0; i < num_ind_funcs; i++, map_entry += 2)
{
- i

[Committed] RISC-V: Add require-effective-target to pr113429 testcase

2024-01-29 Thread Patrick O'Neill


Committed.

Thanks for catching this.
Patrick

On 1/28/24 19:41, juzhe.zh...@rivai.ai wrote:

ok



juzhe.zh...@rivai.ai

*From:* Patrick O'Neill 
*Date:* 2024-01-27 10:50
*To:* gcc-patches 
*CC:* juzhe.zhong ; Patrick O'Neill

*Subject:* [PATCH] RISC-V: Add require-effective-target to
pr113429 testcase
The pr113429 testcase fails with newlib spike runs. Adding
require-effective-target rv64 and riscv_v fixes the issue.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr113429.c: Add
require-effective-target rv64 and riscv_v
Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc newlib spike.
---
gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
index 05c3eeecb94..a7f5db616d8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -1,5 +1,7 @@
/* { dg-do run } */
/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-require-effective-target riscv_v } */
long a;
int b, c, d, e, f, g;
-- 
2.34.1

[COMMITTED] bpf: emit empty epilogues in naked functions

2024-01-29 Thread Jose E. Marchesi

This patch fixes the BPF backend to not generate `exit' (return)
instructions in epilogues of functions that are declared as naked via
the corresponding compiler attribute.  Having extra exit instructions
upsets the kernel BPF verifier.

Tested in bpf-unknown-none target in x86_64-linux-gnu host.

gcc/ChangeLog

* config/bpf/bpf.cc (bpf_expand_epilogue): Do not emit a return
instruction in naked function epilogues.

gcc/testsuite/ChangeLog

* gcc.target/bpf/naked-1.c: Update test to not expect an exit
instruction in naked function.
* gcc.target/bpf/naked-2.c: New test.
---
 gcc/config/bpf/bpf.cc  |  5 ++---
 gcc/testsuite/gcc.target/bpf/naked-1.c |  1 -
 gcc/testsuite/gcc.target/bpf/naked-2.c | 10 ++
 3 files changed, 12 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/naked-2.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 9af1728d852..d6ca47eeecb 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -420,9 +420,8 @@ bpf_expand_epilogue (void)
   /* See note in bpf_expand_prologue for an explanation on why we are
  not restoring callee-saved registers in BPF.  */
 
-  /* If we ever need to do anything else than just generating a return
- instruction here, please mind the `naked' function attribute.  */
-
+  if (lookup_attribute ("naked", DECL_ATTRIBUTES (cfun->decl)) != NULL_TREE)
+    return;
   emit_jump_insn (gen_exit ());
 }
 
diff --git a/gcc/testsuite/gcc.target/bpf/naked-1.c b/gcc/testsuite/gcc.target/bpf/naked-1.c
index cbbc4c51697..dc8ac2619cc 100644
--- a/gcc/testsuite/gcc.target/bpf/naked-1.c
+++ b/gcc/testsuite/gcc.target/bpf/naked-1.c
@@ -9,4 +9,3 @@ int __attribute__((naked)) foo()
   __asm__ volatile ("@ naked");
 }
 /* { dg-final { scan-assembler "\t@ naked" } } */
-/* { dg-final { scan-assembler "\texit\n" } } */
diff --git a/gcc/testsuite/gcc.target/bpf/naked-2.c b/gcc/testsuite/gcc.target/bpf/naked-2.c
new file mode 100644
index 000..25aebf84755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/naked-2.c
@@ -0,0 +1,10 @@
+/* Verify that __attribute__((naked)) produces functions without implicit
+   `exit' instructions in the epilogue.  */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+int __attribute__((naked)) foo()
+{
+  __asm__ volatile ("exit");
+}
+/* { dg-final { scan-assembler-times "\texit" 1 } } */
-- 
2.30.2

Re: [PATCH] testsuite: no dfp run without dfprt

2024-01-29 Thread Alexandre Oliva

On Jan 24, 2024, Jeff Law  wrote:

> OK

Thanks.  FTR, there were typos (s/ṕ/p/g) and a missing entry for
builtin-snan-1.c in the ChangeLog entries, that the ChangeLog checker
kindly pointed out.  Fixed below, just pushed along with the
otherwise-unchanged patch as r14-8505.

for  gcc/testsuite/ChangeLog

* c-c++-common/dfp/pr36800.c: Drop dg-do overrider.
* c-c++-common/dfp/pr39034.c: Likewise.
* c-c++-common/dfp/pr39035.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
* gcc.dg/dfp/builtin-snan-1.c: Likewise.
* gcc.dg/dfp/builtin-tgmath-dfp.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-4.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-5.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-6.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-7.c: Likewise.
* gcc.dg/dfp/pr108068.c: Likewise.
* gcc.dg/dfp/pr97439.c: Likewise.
* g++.dg/compat/decimal/pass-1_main.C: Require dfprt.
* g++.dg/compat/decimal/pass-2_main.C: Likewise.
* g++.dg/compat/decimal/pass-3_main.C: Likewise.
* g++.dg/compat/decimal/pass-4_main.C: Likewise.
* g++.dg/compat/decimal/pass-5_main.C: Likewise.
* g++.dg/compat/decimal/pass-6_main.C: Likewise.
* g++.dg/compat/decimal/return-1_main.C: Likewise.
* g++.dg/compat/decimal/return-2_main.C: Likewise.
* g++.dg/compat/decimal/return-3_main.C: Likewise.
* g++.dg/compat/decimal/return-4_main.C: Likewise.
* g++.dg/compat/decimal/return-5_main.C: Likewise.
* g++.dg/compat/decimal/return-6_main.C: Likewise.
* g++.dg/eh/dfp-1.C: Likewise.
* g++.dg/eh/dfp-2.C: Likewise.
* g++.dg/eh/dfp-saves-aarch64.C: Likewise.
* gcc.c-torture/execute/pr80692.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

Re: _GLIBCXX_DEBUG_BACKTRACE broken

2024-01-29 Thread François Dumont


I had missed it, thanks.

Here is a patch to fix debug mode doc then.

libstdc++: Fix _GLIBCXX_DEBUG_BACKTRACE macro documentation

libstdc++-v3/ChangeLog:

    * doc/xml/manual/debug_mode.xml: Link against libstdc++exp.a to use
    _GLIBCXX_DEBUG_BACKTRACE macro.

Ok to commit ?

François

On 29/01/2024 11:10, Jonathan Wakely wrote:

On Mon, 29 Jan 2024 at 06:13, François Dumont  wrote:

Hi

I'm trying to use _GLIBCXX_DEBUG_BACKTRACE to debug some crash in debug
mode.

So I built the library with --enable-libstdcxx-backtrace=yes

But when I build any test I have:

/usr/bin/ld: /tmp/cctvPvlb.o: in function
`__gnu_debug::_Error_formatter::_Error_formatter(char const*, unsigned
int, char const*)':
/home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/formatter.h:597:
undefined reference to `__glibcxx_backtrace_create_state'
/usr/bin/ld:
/home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/formatter.h:598:
undefined reference to `__glibcxx_backtrace_full'

-lstdc++_libbacktrace does not help as it cannot find it.

You need to use -lstdc++exp instead, as documented at
https://gcc.gnu.org/gcc-14/changes.html#libstdcxx

I changed this with
https://gcc.gnu.org/g:b96b554592c5cbb6a2c1797ffcb5706fd295f4fd
diff --git a/libstdc++-v3/doc/xml/manual/debug_mode.xml b/libstdc++-v3/doc/xml/manual/debug_mode.xml
index dadc0cd1bb4..ac15ef6f6d0 100644
--- a/libstdc++-v3/doc/xml/manual/debug_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/debug_mode.xml
@@ -165,8 +165,8 @@ which always works correctly.
   It requires that you configure libstdc++ build with
   --enable-libstdcxx-backtrace=yes.
   Use -D_GLIBCXX_DEBUG_BACKTRACE to activate it.
-  You'll then have to link with libstdc++_libbacktrace static library
-  (-lstdc++_libbacktrace) to build your application.
+  You'll then have to link against libstdc++exp static library
+  (-lstdc++exp) to build your application.
 
 
 Using a Specific Debug Container

Re: [PATCH] aarch64: enforce lane checking for intrinsics

2024-01-29 Thread Alexandre Oliva

On Jan 23, 2024, Richard Sandiford  wrote:

> Performing the check in expand is itself wrong

*nod*

> So I think we should enforce the immediate range within the frontend
> instead, via TARGET_CHECK_BUILTIN_CALL.

Sounds good.  Can that accommodate the existing uses in always_inline
wrappers?

> Unfortunately that isn't suitable for stage 4 though.

ACK.  Is there a partial implementation of that?  I might get a chance
to take it to completion, even if it doesn't make gcc 14.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

Re: _GLIBCXX_DEBUG_BACKTRACE broken

2024-01-29 Thread Jonathan Wakely

On Mon, 29 Jan 2024 at 18:30, François Dumont  wrote:
>
> I had missed it, thanks.
>
> Here is a patch to fix debug mode doc then.
>
> libstdc++: Fix _GLIBCXX_DEBUG_BACKTRACE macro documentation
>
> libstdc++-v3/ChangeLog:
>
>  * doc/xml/manual/debug_mode.xml: Link against libstdc++exp.a to use
>  _GLIBCXX_DEBUG_BACKTRACE macro.
>
> Ok to commit ?

Yes please - thanks.


>
> François
>
> On 29/01/2024 11:10, Jonathan Wakely wrote:
> > On Mon, 29 Jan 2024 at 06:13, François Dumont  wrote:
> >> Hi
> >>
> >> I'm trying to use _GLIBCXX_DEBUG_BACKTRACE to debug some crash in debug
> >> mode.
> >>
> >> So I built the library with --enable-libstdcxx-backtrace=yes
> >>
> >> But when I build any test I have:
> >>
> >> /usr/bin/ld: /tmp/cctvPvlb.o: in function
> >> `__gnu_debug::_Error_formatter::_Error_formatter(char const*, unsigned
> >> int, char const*)':
> >> /home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/formatter.h:597:
> >> undefined reference to `__glibcxx_backtrace_create_state'
> >> /usr/bin/ld:
> >> /home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/formatter.h:598:
> >> undefined reference to `__glibcxx_backtrace_full'
> >>
> >> -lstdc++_libbacktrace does not help as it cannot find it.
> > You need to use -lstdc++exp instead, as documented at
> > https://gcc.gnu.org/gcc-14/changes.html#libstdcxx
> >
> > I changed this with
> > https://gcc.gnu.org/g:b96b554592c5cbb6a2c1797ffcb5706fd295f4fd
> >

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao



> On Jan 29, 2024, at 12:25 PM, Kees Cook  wrote:
> 
> On Mon, Jan 29, 2024 at 04:00:20PM +, Qing Zhao wrote:
>> An update on the kernel building with my version 4 patch.
>> 
>> Kees reported two FE issues with the current version 4 patch:
>> 
>> 1. The operator “typeof” cannot return correct type for a->array;
>> 2. The operator “&” cannot return correct address for a->array;
>> 
>> I fixed both in my local repository. 
>> 
>> With these additional fixes, the kernel with counted-by annotations can be built
>> successfully.
> 
> Thanks for the fixes!
> 
>> 
>> And then, Kees reported one behavioral issue with the current counted-by:
>> 
>> When the counted-by value is below zero, my current patch 
>> 
>> A. Didn’t report any warning for it.
>> B. Accepted the negative value as a wrapped size.
>> 
>> i.e. for:
>> 
>> struct foo {
>> signed char size;
>> unsigned char array[] __counted_by(size);
>> } *a;
>> 
>> ...
>> a->size = -3;
>> report(__builtin_dynamic_object_size(a->array, 1));
>> 
>> this reports 253, rather than 0.
>> 
>> And the array-bounds sanitizer doesn’t catch negative index bounds either.
>> 
>> a->size = -3;
>> report(a->array[1]); // does not trap
>> 
>> 
>> So, my questions are:
>> 
>> How should we handle the negative counted-by value?
> 
> Treat it as always 0-bounded: count < 0 ? 0 : count

Then the size of the object is 0?

> 
>> 
>> My approach is:
>> 
>>   I think that this is a user error; the compiler needs to issue a warning
>> at run time about this user error.
>> 
>> Since I have one remaining patch that has not been finished yet:
>> 
>> 6  Emit warnings when the user breaks the requirements for the new counted_by
>> attribute
>>  compilation time: -Wcounted-by
>>  run time: -fsanitize=counted-by
>> * The initialization of the size field should be done before the first
>> reference to the FAM field.
> 
> I would hope that regular compile-time warnings would catch this.
If the value is known at compile time, a compile-time warning should catch it.

> 
>> * the array has at least # of elements specified by the size field at
>> all times during the program.
>> * the value of counted-by should not be negative.
> 
> This seems reasonable for a very strict program, but it won't work for
> the kernel as-is: a negative "count" is sometimes used to carry failure
> details back to other users of the structure. This could be refactored in
> the kernel, but I'd prefer that even without -fsanitize=counted-by the
> runtime behaviors will be "safe".

So, in the kernel’s source code, for example:

struct foo {
  int count;
  short array[] __counted_by(count);
};

The field “count” will be used for two purposes:
A. As the counted_by for the “array” when its value > 0;
B. As an errno when its value < 0; in that case, the size of “array” is zero.

Is the understanding correct?

Is this done to save space?  (Curious -:)


> 
> It does not seem sensible to me that adding a buffer size validation
> primitive to GCC will result in conditions where a size calculation
> will wrap around. I prefer no surprises. :)

I guess there might be a bug here.
> 
>> Let me know your comment and suggestions.
> 
> Clang has implemented the safety logic I'd prefer:
> 
> * __bdos will report 0 for any sizing where the "counted_by" count
>  variable is negative. Effectively, the count variable is always
>  processed as: count < 0 ? 0 : count
> 
>  struct foo {
> int count;
> short array[] __counted_by(count);
>  } *p;
> 
>  __bdos(p->array, 1) ==> sizeof(*p->array) * (count < 0 ? 0 : count)

NOTE: __bdos will use the value 0 as UNKNOWN_SIZE for a MINIMUM SIZE query, i.e.:

size_t __builtin_object_size (const void * ptr, int type)

will return 0 as UNKNOWN_SIZE when type is 2 or 3.

So, I am wondering: should the 0 here be UNKNOWN_SIZE or a size of 0?

I guess it should be UNKNOWN_SIZE? (i.e., -1 for the MAXIMUM type, 0 for the MINIMUM
type).

i.e., when the value of “count” is 0 or negative, __bdos will return
UNKNOWN_SIZE. Is this correct?

> 
>  The logic for this is that __bdos can be _certain_ that the size is 0
>  when the count variable is pathological.


> 
> * -fsanitize=array-bounds similarly treats count as above, so that:
> 
>  printf("%d\n", p->array[index]); ==> trap when index >= (count < 0 ? 0 : count)
> 
>  Same logic for the sanitizer: any access to the array when count is
>  invalid means the access is invalid and must be trapped.

Okay, so when the value of “count” is 0 or negative, the bounds sanitizer will report
an out-of-bounds access (or trap) for any access to the array.
This should be reasonable.

Qing


> 
> 
> This means that software can run safely even in pathological conditions.
> 
> -Kees
> 
> -- 
> Kees Cook

[PATCH] RISC-V: Fix rvv intrinsic pragma tests dejagnu selector

2024-01-29 Thread Edwin Lu

Adding RVV-related flags (e.g. --param=riscv-autovec-preference) to
non-vector targets bypassed the dejagnu skip test directive. Change the
target selector to skip the tests if RVV is not enabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-1.c: Change selector.
* gcc.target/riscv/rvv/base/pragma-2.c: Ditto.
* gcc.target/riscv/rvv/base/pragma-3.c: Ditto.

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
index 2eef9e1e1a8..a072bdd47bf 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! riscv_xtheadvector } } } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */
 
 void foo0 () {__rvv_bool64_t t;}
 void foo1 () {__rvv_bool32_t t;}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
index fd2aa3066cd..fc1bb13c53d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */
 
 #pragma riscv intrinsic "vector"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
index 96a0e051a29..45580bb2faa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */
 
 #pragma riscv intrinsic "report-error" /* { dg-error {unknown '#pragma riscv intrinsic' option 'report-error'} } */
-- 
2.34.1

Re: [PATCH] x86: Generate REG_CFA_UNDEFINED for unsaved callee-saved registers

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 8:30 AM Jakub Jelinek  wrote:
>
> On Mon, Jan 29, 2024 at 08:00:26AM -0800, H.J. Lu wrote:
> > Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers which
> > have been used in the function to an instruction in prologue.
> >
> > gcc/
> >
> >   PR target/38534
> >   * dwarf2cfi.cc (add_cfi_undefined): New.
> >   (dwarf2out_frame_debug_cfa_undefined): Likewise.
> >   (dwarf2out_frame_debug): Handle REG_CFA_UNDEFINED.
> >   * reg-notes.def (REG_CFA_UNDEFINED): New.
> >   * config/i386/i386.cc (ix86_expand_prologue): Attach
> >   REG_CFA_UNDEFINED notes for unsaved callee-saved registers
> >   which have been used in the function to an instruction in
> >   prologue.
> >
> > gcc/testsuite/
> >
> >   PR target/38534
> >   * gcc.target/i386/no-callee-saved-19.c: New test.
> >   * gcc.target/i386/no-callee-saved-20.c: Likewise.
> >   * gcc.target/i386/pr38534-7.c: Likewise.
> >   * gcc.target/i386/pr38534-8.c: Likewise.
> > ---
> >  gcc/config/i386/i386.cc   | 20 +++
> >  gcc/dwarf2cfi.cc  | 55 +++
> >  gcc/reg-notes.def |  4 ++
> >  .../gcc.target/i386/no-callee-saved-19.c  | 17 ++
> >  .../gcc.target/i386/no-callee-saved-20.c  | 12 
> >  gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 ++
> >  gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +
> >  7 files changed, 139 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b3e7c74846e..6ec87b6a16f 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -9304,6 +9304,26 @@ ix86_expand_prologue (void)
> >   combined with prologue modifications.  */
> >if (TARGET_SEH)
> >  emit_insn (gen_prologue_use (stack_pointer_rtx));
> > +
> > +  if (cfun->machine->call_saved_registers
> > +  != TYPE_NO_CALLEE_SAVED_REGISTERS)
> > +return;
> > +
> > +  insn = get_insns ();
> > +  if (!insn)
> > +return;
>
> You can't attach the notes to a random instruction that happens to be first
> in the function.
> 1) it needs to be a real instruction, not a note

Will fix it.

> 2) it needs to be RTX_FRAME_RELATED_P

This should work:

  insn = nullptr;
  rtx_insn *next;
  for (next = get_insns (); next; next = NEXT_INSN (next))
    {
      if (!RTX_FRAME_RELATED_P (next))
        continue;
      insn = next;
    }

  if (!insn)
    return;

> 3) if it is RTX_FRAME_RELATED_P, but doesn't contain any previous REG_CFA_*
>notes:
>3a) if it has REG_FRAME_RELATED_EXPR note, then I believe just that
>note argument is processed instead of the instruction pattern and
>I think REG_CFA_* notes which precede REG_FRAME_RELATED_EXPR are
>processed, but REG_CFA_* notes after it are not; so adding
>REG_CFA_UNDEFINED notes at least if the adding is after the existing
>notes instead of before them may be problematic

Since a register note is added to the head of the notes list:

/* Add register note with kind KIND and datum DATUM to INSN.  */

void
add_reg_note (rtx insn, enum reg_note kind, rtx datum)
{
  REG_NOTES (insn) = alloc_reg_note (kind, datum, REG_NOTES (insn));
}

it isn't an issue.

>3b) if it has neither REG_CFA_* nor REG_FRAME_RELATED_EXPR notes, then
>normally the pattern of the insn would be processed in dwarf2cfi.
>But with the REG_CFA_* notes that part will be ignored.
>
> > --- a/gcc/dwarf2cfi.cc
> > +++ b/gcc/dwarf2cfi.cc
> > @@ -517,6 +517,17 @@ add_cfi_restore (unsigned reg)
> >add_cfi (cfi);
> >  }
> >
>
> Function comment missing.

Will fix it in the v3 patch.

Thanks.

> > +static void
> > +add_cfi_undefined (unsigned reg)
> > +{
> > +  dw_cfi_ref cfi = new_cfi ();
> > +
> > +  cfi->dw_cfi_opc = DW_CFA_undefined;
> > +  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
> > +
> > +  add_cfi (cfi);
> > +}
> > +
> >  /* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
> > that the register column is no longer saved.  */
>
> Jakub
>


-- 
H.J.

[PATCH v2] x86: Generate REG_CFA_UNDEFINED for unsaved callee-saved registers

2024-01-29 Thread H.J. Lu

Changes in v2:

1. Add REG_CFA_UNDEFINED notes to a frame-related instruction in prologue.
2. Add comments for add_cfi_undefined.

---
Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers which
have been used in the function to a frame-related instruction in prologue.

gcc/

PR target/38534
* dwarf2cfi.cc (add_cfi_undefined): New.
(dwarf2out_frame_debug_cfa_undefined): Likewise.
(dwarf2out_frame_debug): Handle REG_CFA_UNDEFINED.
* reg-notes.def (REG_CFA_UNDEFINED): New.
* config/i386/i386.cc (ix86_expand_prologue): Attach
REG_CFA_UNDEFINED notes for unsaved callee-saved registers
which have been used in the function to a frame-related
instruction in prologue.

gcc/testsuite/

PR target/38534
* gcc.target/i386/no-callee-saved-19.c: New test.
* gcc.target/i386/no-callee-saved-20.c: Likewise.
* gcc.target/i386/pr38534-7.c: Likewise.
* gcc.target/i386/pr38534-8.c: Likewise.
---
 gcc/config/i386/i386.cc   | 29 ++
 gcc/dwarf2cfi.cc  | 58 +++
 gcc/reg-notes.def |  4 ++
 .../gcc.target/i386/no-callee-saved-19.c  | 17 ++
 .../gcc.target/i386/no-callee-saved-20.c  | 12 
 gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 ++
 gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +
 7 files changed, 151 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..4b7026f3ab4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9304,6 +9304,35 @@ ix86_expand_prologue (void)
  combined with prologue modifications.  */
   if (TARGET_SEH)
 emit_insn (gen_prologue_use (stack_pointer_rtx));
+
+  if (cfun->machine->call_saved_registers
+  != TYPE_NO_CALLEE_SAVED_REGISTERS)
+return;
+
+  /* Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers
+ which have been used in the function to a frame-related instruction
+ in prologue.  */
+
+  insn = nullptr;
+  rtx_insn *next;
+  for (next = get_insns (); next; next = NEXT_INSN (next))
+{
+  if (!RTX_FRAME_RELATED_P (next))
+   continue;
+  insn = next;
+}
+
+  if (!insn)
+return;
+
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (df_regs_ever_live_p (i)
+   && !fixed_regs[i]
+   && !call_used_regs[i]
+   && !STACK_REGNO_P (i)
+   && !MMX_REGNO_P (i))
+  add_reg_note (insn, REG_CFA_UNDEFINED,
+   gen_rtx_REG (word_mode, i));
 }
 
 /* Emit code to restore REG using a POP or POPP insn.  */
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 1231b5bb5f0..9ba0ac07ee7 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -517,6 +517,20 @@ add_cfi_restore (unsigned reg)
   add_cfi (cfi);
 }
 
+/* Add DW_CFA_undefined either to the current insn stream or to a vector,
+   or both.  */
+
+static void
+add_cfi_undefined (unsigned reg)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi->dw_cfi_opc = DW_CFA_undefined;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+
+  add_cfi (cfi);
+}
+
 /* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
that the register column is no longer saved.  */
 
@@ -1532,6 +1546,37 @@ dwarf2out_frame_debug_cfa_restore (rtx reg, bool emit_cfi)
 }
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_UNDEFINED
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_undefined (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+
+  rtx span = targetm.dwarf_register_span (reg);
+  if (!span)
+{
+  unsigned int regno = dwf_regno (reg);
+  add_cfi_undefined (regno);
+}
+  else
+{
+  /* We have a PARALLEL describing where the contents of REG live.
+Mark each piece of the PARALLEL as undefined.  */
+  gcc_assert (GET_CODE (span) == PARALLEL);
+
+  const int par_len = XVECLEN (span, 0);
+  for (int par_index = 0; par_index < par_len; par_index++)
+   {
+ reg = XVECEXP (span, 0, par_index);
+ gcc_assert (REG_P (reg));
+ unsigned int regno = dwf_regno (reg);
+ add_cfi_undefined (regno);
+   }
+}
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE.
 
??? Perhaps we should note in the CIE where windows are saved (instead
@@ -2326,6 +2371,19 @@ dwarf2out_frame_debug (rtx_insn *insn)
handled_one = true;
break;
 
+  case REG_CFA_UNDEFINED:
+   n = XEXP (note, 0);
+   if (n == nullptr)
+ {
+   n = PATTERN (insn);
+   if (GET_CODE (n) == PARALLEL)
+ n = XVECEXP (n, 0, 0);
+   n = XEXP (n, 0);
+ }
+

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Kees Cook

On Mon, Jan 29, 2024 at 07:32:06PM +, Qing Zhao wrote:
> 
> 
> > On Jan 29, 2024, at 12:25 PM, Kees Cook  wrote:
> > 
> > On Mon, Jan 29, 2024 at 04:00:20PM +, Qing Zhao wrote:
> >> An update on the kernel building with my version 4 patch.
> >> 
> >> Kees reported two FE issues with the current version 4 patch:
> >> 
> >> 1. The operator “typeof” cannot return correct type for a->array;
> >> 2. The operator “&” cannot return correct address for a->array;
> >> 
> >> I fixed both in my local repository. 
> >> 
> >> With these additional fix.  Kernel with counted-by annotation can be built 
> >> successfully. 
> > 
> > Thanks for the fixes!
> > 
> >> 
> >> And then, Kees reported one behavioral issue with the current counted-by:
> >> 
> >> When the counted-by value is below zero, my current patch 
> >> 
> >> A. Didn’t report any warning for it.
> >> B. Accepted the negative value as a wrapped size.
> >> 
> >> i.e. for:
> >> 
> >> struct foo {
> >> signed char size;
> >> unsigned char array[] __counted_by(size);
> >> } *a;
> >> 
> >> ...
> >> a->size = -3;
> >> report(__builtin_dynamic_object_size(a->array, 1));
> >> 
> >> this reports 253, rather than 0.
> >> 
> >> And the array-bounds sanitizer doesn’t catch negative index bounds
> >> either.
> >> 
> >> a->size = -3;
> >> report(a->array[1]); // does not trap
> >> 
> >> 
> >> So, my questions are:
> >> 
> >> How should we handle the negative counted-by value?
> > 
> > Treat it as always 0-bounded: count < 0 ? 0 : count
> 
> Then the size of the object is 0?

That would be the purpose, yes. It's possible something else has
happened, but it would mean "the array contents should not be accessed
(since we don't have a valid size)".

> 
> > 
> >> 
> >> My approach is:
> >> 
> >>   I think that this is a user error; the compiler needs to issue a warning
> >> at runtime about this user error.
> >> 
> >> Since I have one remaining patch that has not been finished yet:
> >> 
> >> 6  Emit warnings when the user breaks the requirements for the new
> >> counted_by attribute
> >>  compilation time: -Wcounted-by
> >>  run time: -fsanitizer=counted-by
> >> * The initialization to the size field should be done before the first 
> >> reference to the FAM field.
> > 
> > I would hope that regular compile-time warnings would catch this.
> If the value is known at compile time, then a compile-time warning should catch it.
> 
> > 
> >> * the array has at least the number of elements specified by the size field
> >> at all times during the program.
> >> * the value of counted-by should not be negative.
> > 
> > This seems reasonable for a very strict program, but it won't work for
> > the kernel as-is: a negative "count" is sometimes used to carry failure
> > details back to other users of the structure. This could be refactored in
> > the kernel, but I'd prefer that even without -fsanitizer=counted-by the
> > runtime behaviors will be "safe".
> 
> So, in the kernel’s source code, for example:
> 
> struct foo {
>   int count;
>   short array[] __counted_by(count);
> };
> 
> The field “count” will be used for two purposes:
> A. As the counted_by for the “array” when its value > 0;
> B. As an errno when its value < 0; in that case, the size of “array” is zero.
> 
> Is the understanding correct?

Yes.

> Is doing this for saving space?  (Curious -:)

It seems so, yes.

> > It does not seem sensible to me that adding a buffer size validation
> > primitive to GCC will result in conditions where a size calculation
> > will wrap around. I prefer no surprises. :)
> 
> There might be a bug here, I guess.
> > 
> >> Let me know your comment and suggestions.
> > 
> > Clang has implemented the safety logic I'd prefer:
> > 
> > * __bdos will report 0 for any sizing where the "counted_by" count
> >  variable is negative. Effectively, the count variable is always
> >  processed as: count < 0 ? 0 : count
> > 
> >  struct foo {
> > int count;
> > short array[] __counted_by(count);
> >  } *p;
> > 
> >  __bdos(p->array, 1) ==> sizeof(*p->array) * (count < 0 ? 0 : count)
> 
> NOTE: __bdos will use the value 0 as UNKNOWN_SIZE for a MINIMUM SIZE query, i.e.:
> 
> size_t __builtin_object_size (const void * ptr, int type)
> 
> will return 0 as UNKNOWN_SIZE when type is 2 or 3.
> 
> So, I am wondering: should the 0 here be UNKNOWN_SIZE or a size of 0?
> 
> I guess it should be UNKNOWN_SIZE? (i.e., -1 for the MAXIMUM type, 0 for the
> MINIMUM type).
> 
> i.e., when the value of “count” is 0 or negative, __bdos will return
> UNKNOWN_SIZE. Is this correct?

I would suggest that a negative count should always return 0. The size
isn't "unknown", the "count" has been clamped to 0 to avoid surprises,
so the result is as if the "count" had a zero value.

> Okay, so when the value of “count” is 0 or negative, the bounds sanitizer will report
> an out-of-bounds access (or trap) for any access to the array.
> This should be reasonable.

Thanks! And with __bdos() following this logic there won't be a disconnect
between the two. i.e

Re: Fix ICE with -g and -std=c23 when forming composite types [PR113438]

2024-01-29 Thread Joseph Myers

On Sat, 27 Jan 2024, Martin Uecker wrote:

> Debug output ICEs when we do not set TYPE_STUB_DECL, fix this.
> 
> 
> Fix ICE with -g and -std=c23 when forming composite types [PR113438]
> 
> Set TYPE_STUB_DECL to an artificial decl when creating a new structure
> as a composite type.
> 
> PR c/113438
> 
> gcc/c/
> * c-typeck.cc (composite_type_internal): Set TYPE_STUB_DECL.
> 
> gcc/testsuite/
> * gcc.dg/pr113438.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Joseph Myers

On Mon, 29 Jan 2024, Qing Zhao wrote:

> Thank you!
> 
> Joseph and Richard,  could you also comment on this?

I think Martin's suggestions are reasonable.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] Fortran: Mark internal symbols as artificial [PR88009,PR68800]

2024-01-29 Thread Bernhard Reutner-Fischer

On Wed, 17 Nov 2021 21:32:14 +0100
Harald Anlauf  wrote:

> Do you have testcases/reproducers demonstrating that the patch actually
> fixes the issues you're describing?

I believe that marking artificial symbols as such is obvious and i did
use the existing tests to verify that the changes do not regress but
behave as intended. I did check that the memory leak in
gfc_find_derived_vtab is fixed with the patch.

Ok for stage 1 if the rebased regression test passes?

thanks

> 
> Am 17.11.21 um 09:12 schrieb Bernhard Reutner-Fischer via Gcc-patches:
> > On Tue, 16 Nov 2021 21:46:32 +0100
> > Harald Anlauf via Fortran  wrote:
> >  
> >> Hi Bernhard,
> >>
> >> I'm trying to understand your patch.  What does it really try to solve?  
> >
> > Compiler generated symbols should be marked artificial.
> > The fix for PR88009 ( f8add009ce300f24b75e9c2e2cc5dd944a020c28 ,
> > r9-5194 ) added artificial just to the _final component and left out all 
> > the rest.
> > Note that the majority of compiler generated symbols in class.c
> > already had artificial set properly.
> > The proposed patch amends the other generated symbols to be marked
> > artificial, too.
> >
> > The other parts fix memory leaks.
> >  
> >>
> >> PR88009 is closed and seems to have nothing to do with this.  
> >
> > Well it marked only _final as artificial and forgot to adjust the
> > others as well.
> > We can remove the reference to PR88009 if you prefer?
> >
> > thanks!  
> >>
> >> Harald
> >>
> >> Am 14.11.21 um 23:17 schrieb Bernhard Reutner-Fischer via Fortran:  
> >>> Hi!
> >>>
> >>> Amend fix for PR88009 to mark all these class components as artificial.
> >>>
> >>> gcc/fortran/ChangeLog:
> >>>
> >>>   * class.c (gfc_build_class_symbol, 
> >>> generate_finalization_wrapper,
> >>>   (gfc_find_derived_vtab, find_intrinsic_vtab): Use stringpool for
> >>>   names. Mark internal symbols as artificial.
> >>>   * decl.c (gfc_match_decl_type_spec, gfc_match_end): Fix
> >>>   indentation.
> >>>   (gfc_match_derived_decl): Fix indentation. Check extension level
> >>>   before incrementing refs counter.
> >>>   * parse.c (parse_derived): Fix style.
> >>>   * resolve.c (resolve_global_procedure): Likewise.
> >>>   * symbol.c (gfc_check_conflict): Do not ignore artificial 
> >>> symbols.
> >>>   (gfc_add_flavor): Reorder condition, cheapest first.
> >>>   (gfc_new_symbol, gfc_get_sym_tree,
> >>>   generate_isocbinding_symbol): Fix style.
> >>>   * trans-expr.c (gfc_trans_subcomponent_assign): Remove
> >>>   restriction on !artificial.
> >>>   * match.c (gfc_match_equivalence): Special-case CLASS_DATA for
> >>>   warnings.
> >>>
> >>> ---
> >>> gfc_match_equivalence(), too, should not bail-out early on the first
> >>> error but should diagnose all errors. I.e. not goto cleanup but set
> >>> err=true and continue in order to diagnose all constraints of a
> >>> statement. Maybe Sandra or somebody else will eventually find time to
> >>> tweak that.
> >>>
> >>> I think it also plugs a very minor leak of name in gfc_find_derived_vtab
> >>> so i also tagged it [PR68800]. At least that was the initial
> >>> motivation to look at that spot.
> >>> We were doing
> >>> -  name = xasprintf ("__vtab_%s", tname);
> >>> ...
> >>> gfc_set_sym_referenced (vtab);
> >>> - name = xasprintf ("__vtype_%s", tname);
> >>>
> >>> Bootstrapped and regtested without regressions on x86_64-unknown-linux.
> >>> Ok for trunk?
> >>>  
> >>
> >>  
> >
> >  
>

Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-01-29 Thread Patrick Palka

On Fri, 26 Jan 2024, Jason Merrill wrote:

> On 1/26/24 17:11, Jason Merrill wrote:
> > On 1/26/24 16:52, Jason Merrill wrote:
> > > On 1/25/24 14:18, Patrick Palka wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > OK for trunk/13?  This isn't a very satisfactory fix, but at least
> > > > it safely fixes these testcases I guess.  Note that there's
> > > > implementation disagreement about the second testcase, GCC always
> > > > accepted it but Clang/MSVC/icc reject it.
> > > 
> > > Because of trying to initialize int& from {c}; removing the extra braces
> > > makes it work everywhere.
> > > 
> > > https://eel.is/c++draft/dcl.init#list-3.10 says that we always generate a
> > > prvalue in this case, so perhaps we shouldn't recalculate if the
> > > initializer is an init-list?
> > 
> > ...but it seems bad to silently bind a const int& to a prvalue instead of
> > directly to the reference returned by the operator, as clang does if we add
> > const to the second testcase, so I think there's a defect in the standard
> > here.
> 
> Perhaps bullet 3.9 should change to "...its referenced type is
> reference-related to E or scalar, ..."
> 
> > Maybe for now also disable the maybe_valid heuristics in the case of an
> > init-list?
> > 
> > > The first testcase is special because it's a C-style cast; seems like the
> > > maybe_valid = false heuristics should be disabled if c_cast_p.

Thanks a lot for the pointers.  IIUC c_cast_p and LOOKUP_SHORTCUT_BAD_CONVS
should already be mutually exclusive, since the latter is set only when
computing argument conversions, so it shouldn't be necessary to check c_cast_p.

I suppose we could disable the heuristic for init-lists, but after some
digging I noticed that the heuristics were originally in the same spot they
are now until r5-601-gd02f620dc0bb3b moved them to get checked after
the recursive recalculation case in reference_binding, returning a bad
conversion instead of NULL.  (Then in r13-1755-g68f37670eff0b872 I moved
them back; IIRC that's why I felt confident that moving the checks was safe.)
Thus we didn't always accept the second testcase, we only started doing so in
GCC 5: https://godbolt.org/z/6nsEW14fh (sorry for missing this and saying we
always accepted it)

And indeed the current order of checks seems consistent with that of
[dcl.init.ref]/5.  So I wonder if we don't instead want to "complete"
the NULL-to-bad-conversion adjustment in r5-601-gd02f620dc0bb3b and
do:

gcc/cp/ChangeLog:

* call.cc (reference_binding): Set bad_p according to
maybe_valid_p in the recursive case as well.  Remove
redundant gcc_assert.

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 9de0d77c423..c4158b2af37 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2033,8 +2033,8 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
   sflags, complain);
if (!new_second)
  return bad_direct_conv ? bad_direct_conv : nullptr;
+   t->bad_p = !maybe_valid_p;
conv = merge_conversion_sequences (t, new_second);
-   gcc_assert (maybe_valid_p || conv->bad_p);
return conv;
  }
 }

This'd mean we'd go back to rejecting the second testcase (only the
call, not the direct-init, interestingly enough), but that seems to be
the correct behavior anyway IIUC.  The testsuite is otherwise happy
with this change.

Re: [PATCH] Fortran: use name of array component in runtime error message [PR30802]

2024-01-29 Thread Harald Anlauf


Am 29.01.24 um 18:25 schrieb Harald Anlauf:

I was talking about the generated format strings of runtime error
messages.

program p
   implicit none
   type t
  real :: zzz(10) = 42
   end type t
   class(t), allocatable :: xx(:)
   integer :: j
   j = 0
   allocate (t :: xx(1))
   print *, xx(1)% zzz(j)
end

This is generating the following error at runtime since at least gcc-7:

Fortran runtime error: Index '0' of dimension 1 of array 'xx%_data%zzz'
below lower bound of 1


Of course this is easily suppressed by:

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 1e0d698a949..fa0e00a28a6 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4054,7 +4054,8 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
{
  if (ref->type == REF_ARRAY && &ref->u.ar == ar)
break;
- if (ref->type == REF_COMPONENT)
+ if (ref->type == REF_COMPONENT
+ && strcmp (ref->u.c.component->name, "_data") != 0)
{
  strcat (var_name, "%%");
  strcat (var_name, ref->u.c.component->name);


I have been contemplating generating the full chain of references as
suggested by Mikael and supported by NAG.  The main issue is: how do I
easily generate that call?

gfc_trans_runtime_check is a vararg function, but what I would rather
have is a function that takes either a (chained?) list of trees or
an array of trees holding the (co-)indices of the reference.

Is there an example, or a recommendation which variant to prefer?

Thanks,
Harald

[PATCH] c++: unifying INTEGER_CST parm with type-dep arg [PR113644]

2024-01-29 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Here when trying to unify P=42 A=T::value we ICE due to the latter's
empty type, which same_type_p dislikes.  If the argument has empty type
then it can't be an INTEGER_CST, so unification should fail.

PR c++/113644

gcc/cp/ChangeLog:

* pt.cc (unify) : Handle NULL_TREE type.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype30.C: New test.
---
 gcc/cp/pt.cc  |  3 ++-
 gcc/testsuite/g++.dg/template/nontype30.C | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/nontype30.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 74013533b0f..0d8dbc68962 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24953,7 +24953,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   /* Type INTEGER_CST can come from ordinary constant template args.  */
 case INTEGER_CST:
 case REAL_CST:
-  if (!same_type_p (TREE_TYPE (parm), TREE_TYPE (arg)))
+  if (TREE_TYPE (arg) == NULL_TREE
+ || !same_type_p (TREE_TYPE (parm), TREE_TYPE (arg)))
return unify_template_argument_mismatch (explain_p, parm, arg);
   while (CONVERT_EXPR_P (arg))
arg = TREE_OPERAND (arg, 0);
diff --git a/gcc/testsuite/g++.dg/template/nontype30.C b/gcc/testsuite/g++.dg/template/nontype30.C
new file mode 100644
index 000..926a7726547
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/nontype30.C
@@ -0,0 +1,13 @@
+// PR c++/113644
+
+template struct A { };
+
+template void f(A<42>);
+template void f(A);
+
+struct B { static const int value = 42; };
+
+int main() {
+  A<42> a;
+  f(a); // { dg-error "ambiguous" }
+}
-- 
2.43.0.440.gb50a608ba2

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 9:34 AM H.J. Lu  wrote:
>
> On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
> >
> > On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > > In this case, these are internal to the same comdat group:
> >
> > But that is only by accident, no?
>
> This may be by luck.  I don't know if gcc checks it when
> generating such references.
>
> > I mean, if you need to refer to such a symbol from
> > non-comdat function or comdat function in a different comdat group
> > and RA decides it wants the constant in memory rather than code?
> > Your patch uses
> >   if (decl)
> > return targetm.asm_out.function_rodata_section (decl, ???);
> > and default_function_rodata_section only looks at comdat group of the
> > passed in decl.  But the decl here is what the constant refers to, not
> > who is referring it.

LRA puts a function symbol reference in a constant pool via

#0  force_const_mem (in_mode=E_DImode, x=0x7fffe9e7e000)
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/varasm.cc:3951
#1  0x01833870 in curr_insn_transform (check_only_p=false)
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:4473
#2  0x01836eae in lra_constraints (first_p=true)
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:5462
#3  0x0181fcf1 in lra (f=0x0, verbose=5)
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra.cc:2442
#4  0x017c8828 in do_reload ()
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:5973
#5  0x017c8d25 in (anonymous namespace)::pass_reload::execute (
this=0x48d8730)
at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:6161

for

(gdb) call debug_rtx (curr_insn)
(insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
(vec_concat:V2DI (symbol_ref:DI
("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
[flags 0x3] )
(reg/f:DI 109))) 7521 {vec_concatv2di}
 (expr_list:REG_DEAD (reg/f:DI 110)
(expr_list:REG_DEAD (reg/f:DI 109)
(expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI
("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
[flags 0x3] )
(symbol_ref:DI
("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE")
[flags 0x3] ))
(nil)
(gdb)

CONST_POOL_OK_P doesn't check whether this is safe for function
symbols. Here is a patch to add the check.

-- 
H.J.
From 1947920740e48cdc8076299f8cc58e797ec39a7c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 29 Jan 2024 12:53:32 -0800
Subject: [PATCH] lra: Add const_pool_reference_ok

LRA may put a function symbol reference in

(gdb) call debug_rtx (curr_insn)
(insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
(vec_concat:V2DI (symbol_ref:DI ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE") [flags 0x3] )
(reg/f:DI 109))) 7521 {vec_concatv2di}
 (expr_list:REG_DEAD (reg/f:DI 110)
(expr_list:REG_DEAD (reg/f:DI 109)
(expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE") [flags 0x3] )
(symbol_ref:DI ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE") [flags 0x3] ))
(nil)
(gdb)

in the constant pool.  But this isn't safe when the referenced function
symbol isn't public and is in a different COMDAT group from the function
body containing the current instruction.

Add const_pool_reference_ok to check if a function symbol can be forced
into the constant pool.

	PR rtl-optimization/113617
	* lra-constraints.cc (const_pool_reference_ok): New.
	(CONST_POOL_OK_P): Use.
---
 gcc/lra-constraints.cc | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0ae81c1ff9c..59e6944c245 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -925,6 +925,35 @@ operands_match_p (rtx x, rtx y, int y_hard_regno)
   return true;
 }
 
+/* Return true if the symbol X can be referenced in the function
+   FUNC_DECL.  */
+
+static bool
+const_pool_reference_ok (tree func_decl, rtx x)
+{
+  /* It is OK if there is no COMDAT or X isn'

Re: [PATCH] Fortran: Mark internal symbols as artificial [PR88009,PR68800]

2024-01-29 Thread Harald Anlauf


Am 29.01.24 um 21:45 schrieb Bernhard Reutner-Fischer:

On Wed, 17 Nov 2021 21:32:14 +0100
Harald Anlauf  wrote:


Do you have testcases/reproducers demonstrating that the patch actually
fixes the issues you're describing?


I believe that marking artificial symbols as such is obvious and i did
use the existing tests to verify that the changes do not regress but
behave as intended. I did check that the memory leak in
gfc_find_derived_vtab is fixed with the patch.

Ok for stage 1 if the rebased regression test passes?

thanks



Am 17.11.21 um 09:12 schrieb Bernhard Reutner-Fischer via Gcc-patches:

On Tue, 16 Nov 2021 21:46:32 +0100
Harald Anlauf via Fortran  wrote:


Hi Bernhard,

I'm trying to understand your patch.  What does it really try to solve?


Compiler generated symbols should be marked artificial.
The fix for PR88009 ( f8add009ce300f24b75e9c2e2cc5dd944a020c28 ,
r9-5194 ) added artificial just to the _final component and left out all the 
rest.
Note that the majority of compiler generated symbols in class.c
already had artificial set properly.
The proposed patch amends the other generated symbols to be marked
artificial, too.

The other parts fix memory leaks.



PR88009 is closed and seems to have nothing to do with this.


Well it marked only _final as artificial and forgot to adjust the
others as well.
We can remove the reference to PR88009 if you prefer?

thanks!


Harald

Am 14.11.21 um 23:17 schrieb Bernhard Reutner-Fischer via Fortran:

Hi!

Amend fix for PR88009 to mark all these class components as artificial.

gcc/fortran/ChangeLog:

   * class.c (gfc_build_class_symbol, generate_finalization_wrapper,
   (gfc_find_derived_vtab, find_intrinsic_vtab): Use stringpool for
   names. Mark internal symbols as artificial.
   * decl.c (gfc_match_decl_type_spec, gfc_match_end): Fix
   indentation.
   (gfc_match_derived_decl): Fix indentation. Check extension level
   before incrementing refs counter.
   * parse.c (parse_derived): Fix style.
   * resolve.c (resolve_global_procedure): Likewise.
   * symbol.c (gfc_check_conflict): Do not ignore artificial symbols.
   (gfc_add_flavor): Reorder condition, cheapest first.
   (gfc_new_symbol, gfc_get_sym_tree,
   generate_isocbinding_symbol): Fix style.
   * trans-expr.c (gfc_trans_subcomponent_assign): Remove
   restriction on !artificial.
   * match.c (gfc_match_equivalence): Special-case CLASS_DATA for
   warnings.

---
gfc_match_equivalence(), too, should not bail out early on the first
error but should diagnose all errors. I.e. not goto cleanup but set
err=true and continue in order to diagnose all constraints of a
statement. Maybe Sandra or somebody else will eventually find time to
tweak that.

I think it also plugs a very minor leak of name in gfc_find_derived_vtab
so i also tagged it [PR68800]. At least that was the initial
motivation to look at that spot.
We were doing
-  name = xasprintf ("__vtab_%s", tname);
...
 gfc_set_sym_referenced (vtab);
- name = xasprintf ("__vtype_%s", tname);

Bootstrapped and regtested without regressions on x86_64-unknown-linux.
Ok for trunk?

Can you please post the patch here so that we can review it?
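The leak mentioned in the quoted notes comes from assigning a second `xasprintf` result to `name` without freeing the first. The ChangeLog says the patch switches these names to the string pool. Below is a minimal sketch of why interning avoids the leak, using a hypothetical `string_pool` class built on `std::set` (not gfortran's actual implementation): each distinct name is stored once, the pool owns it, and callers never free anything.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical string pool: each distinct string is stored once and lives
// as long as the pool, so callers get a stable pointer and never free it.
class string_pool
{
public:
  const char *intern (const std::string &s)
  {
    // std::set never relocates its elements, so c_str () stays valid.
    return pool_.insert (s).first->c_str ();
  }

  std::size_t size () const { return pool_.size (); }

private:
  std::set<std::string> pool_;
};
```

With this design, building `"__vtab_" + tname` and then `"__vtype_" + tname` yields two pooled strings and nothing to leak; interning the same name twice returns the same pointer.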

Re: [PATCH] aarch64: enforce lane checking for intrinsics

2024-01-29 Thread Richard Sandiford

Alexandre Oliva  writes:
> On Jan 23, 2024, Richard Sandiford  wrote:
>
>> Performing the check in expand is itself wrong
>
> *nod*
>
>> So I think we should enforce the immediate range within the frontend
>> instead, via TARGET_CHECK_BUILTIN_CALL.
>
> Sounds good.  Can that accommodate the existing uses in always_inline
> wrappers?

No, I don't think so.  We'd probably need to move them to
directly-defined builtins (i.e. defined via handle_arm_neon_h,
rather than at start-up).

>> Unfortunately that isn't suitable for stage 4 though.
>
> ACK.  Is there a partial implementation of that?  I might get a chance
> to take it to completion, even if it doesn't make gcc 14.

Not that I know of, sorry.

Thanks,
Richard
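For context, the range being enforced is simple arithmetic: a vector of `vector_bits` bits with `element_bits`-bit elements has `vector_bits / element_bits` lanes, so valid lane immediates are 0 to lanes - 1. A minimal sketch of such a check follows. The function names and message text are hypothetical; this is not the signature of the `TARGET_CHECK_BUILTIN_CALL` hook, only the validation it would have to perform.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: number of lanes in a vector of VECTOR_BITS bits
// whose elements are ELEMENT_BITS bits wide.
constexpr unsigned lane_count (unsigned vector_bits, unsigned element_bits)
{
  return vector_bits / element_bits;
}

// Hypothetical front-end style check of a lane immediate.  Returns an
// empty string if LANE is valid, otherwise an error message.
std::string check_lane_immediate (long long lane, unsigned vector_bits,
                                  unsigned element_bits)
{
  // Precondition: the vector holds a whole number of elements.
  assert (element_bits != 0 && vector_bits % element_bits == 0);
  long long max_lane
    = static_cast<long long> (lane_count (vector_bits, element_bits)) - 1;
  if (lane < 0 || lane > max_lane)
    return "lane " + std::to_string (lane) + " out of range 0 - "
           + std::to_string (max_lane);
  return "";
}
```

For example, a 128-bit vector of 32-bit elements accepts lanes 0 to 3, so lane 4 is rejected when the call is checked rather than later at expand time.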

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 1:00 PM H.J. Lu  wrote:
>
> On Mon, Jan 29, 2024 at 9:34 AM H.J. Lu  wrote:
> >
> > On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
> > >
> > > On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > > > In this case, these are internal to the same comdat group:
> > >
> > > But that is only by accident, no?
> >
> > This may be by luck.  I don't know if gcc checks it when
> > generating such references.
> >
> > > I mean, if you need to refer to such a symbol from
> > > non-comdat function or comdat function in a different comdat group
> > > and RA decides it wants the constant in memory rather than code?
> > > Your patch uses
> > >   if (decl)
> > > return targetm.asm_out.function_rodata_section (decl, ???);
> > > and default_function_rodata_section only looks at comdat group of the
> > > passed in decl.  But the decl here is what the constant refers to, not
> > > who is referring it.
>
> LRA puts a function symbol reference in a constant pool via
>
> #0  force_const_mem (in_mode=E_DImode, x=0x7fffe9e7e000)
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/varasm.cc:3951
> #1  0x01833870 in curr_insn_transform (check_only_p=false)
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:4473
> #2  0x01836eae in lra_constraints (first_p=true)
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:5462
> #3  0x0181fcf1 in lra (f=0x0, verbose=5)
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra.cc:2442
> #4  0x017c8828 in do_reload ()
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:5973
> #5  0x017c8d25 in (anonymous namespace)::pass_reload::execute (
> this=0x48d8730)
> at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:6161
>
> for
>
> (gdb) call debug_rtx (curr_insn)
> (insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
> (vec_concat:V2DI (symbol_ref:DI
> ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> [flags 0x3] )
> (reg/f:DI 109))) 7521 {vec_concatv2di}
>  (expr_list:REG_DEAD (reg/f:DI 110)
> (expr_list:REG_DEAD (reg/f:DI 109)
> (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI
> ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> [flags 0x3] )
> (symbol_ref:DI
> ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE")
> [flags 0x3] ))
> (nil)
> (gdb)
>
> CONST_POOL_OK_P doesn't check if it is safe to do so for function
> symbols.   Here is a patch to add the check.
>
> --
> H.J.

On the other hand, does C++ even allow access to non-public members
from different classes?  So my patch should be safe, and the linker should
catch all invalid comdat usages like this bug.

-- 
H.J.

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 1:22 PM H.J. Lu  wrote:
>
> On Mon, Jan 29, 2024 at 1:00 PM H.J. Lu  wrote:
> >
> > On Mon, Jan 29, 2024 at 9:34 AM H.J. Lu  wrote:
> > >
> > > On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
> > > >
> > > > On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > > > > In this case, these are internal to the same comdat group:
> > > >
> > > > But that is only by accident, no?
> > >
> > > This may be by luck.  I don't know if gcc checks it when
> > > generating such references.
> > >
> > > > I mean, if you need to refer to such a symbol from
> > > > non-comdat function or comdat function in a different comdat group
> > > > and RA decides it wants the constant in memory rather than code?
> > > > Your patch uses
> > > >   if (decl)
> > > > return targetm.asm_out.function_rodata_section (decl, ???);
> > > > and default_function_rodata_section only looks at comdat group of the
> > > > passed in decl.  But the decl here is what the constant refers to, not
> > > > who is referring it.
> >
> > LRA puts a function symbol reference in a constant pool via
> >
> > #0  force_const_mem (in_mode=E_DImode, x=0x7fffe9e7e000)
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/varasm.cc:3951
> > #1  0x01833870 in curr_insn_transform (check_only_p=false)
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:4473
> > #2  0x01836eae in lra_constraints (first_p=true)
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:5462
> > #3  0x0181fcf1 in lra (f=0x0, verbose=5)
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra.cc:2442
> > #4  0x017c8828 in do_reload ()
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:5973
> > #5  0x017c8d25 in (anonymous namespace)::pass_reload::execute (
> > this=0x48d8730)
> > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:6161
> >
> > for
> >
> > (gdb) call debug_rtx (curr_insn)
> > (insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
> > (vec_concat:V2DI (symbol_ref:DI
> > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > [flags 0x3] )
> > (reg/f:DI 109))) 7521 {vec_concatv2di}
> >  (expr_list:REG_DEAD (reg/f:DI 110)
> > (expr_list:REG_DEAD (reg/f:DI 109)
> > (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI
> > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > [flags 0x3] )
> > (symbol_ref:DI
> > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE")
> > [flags 0x3] ))
> > (nil)
> > (gdb)
> >
> > CONST_POOL_OK_P doesn't check if it is safe to do so for function
> > symbols.   Here is a patch to add the check.
> >
> > --
> > H.J.
>
> On the other hand, does C++ even allow access to non-public members
> from different classes?  So my patch should be safe and linker should
> catch all invalid comdat usages like this bug.

A function accesses a function symbol defined in a comdat group.
If the function symbol is public, any comdat definition of the same group
signature should provide the function definition.  If the function symbol
is private to the comdat group, only functions in the same comdat
group can access the private function symbol.  If a function in a different
comdat group accesses a private symbol, it is a compiler bug, and
the linker may catch it, as in this case.

-- 
H.J.
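The access rule H.J. states can be written as a small predicate. This is a minimal sketch with a hypothetical `symbol_model` type (not a GCC data structure); it encodes only the cases described in the message: public symbols, or symbols outside any COMDAT group, may be referenced from anywhere, while a symbol private to a group may only be referenced from the same group.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a referenced symbol; not a GCC type.
struct symbol_model
{
  std::string comdat_group;  // empty: not in any COMDAT group
  bool is_public;            // visible outside its COMDAT group
};

// May a function in COMDAT group FROM_GROUP (empty if none) reference
// TARGET, following the rule described in the message above?
bool can_reference (const std::string &from_group, const symbol_model &target)
{
  // Outside any group, or public: every copy of the group provides the
  // definition, so whichever copy the linker keeps will satisfy the reference.
  if (target.comdat_group.empty () || target.is_public)
    return true;
  // Private to a group: the linker keeps only one copy of each COMDAT group,
  // so a reference from outside may point into a discarded copy.
  return from_group == target.comdat_group;
}
```

In the failing case from this thread, a constant-pool entry referring to a private `_M_manager`-style symbol must therefore live in the same group as that symbol.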

Re: [PATCH][libsanitizer]: Sync fixes for asan interceptors from upstream [PR112644]

2024-01-29 Thread Andrew Pinski

On Mon, Jan 29, 2024 at 7:04 AM Tamar Christina  wrote:
>
> Hi All,
>
> This cherry-picks and squashes the differences between commits
>
> d3e5c20ab846303874a2a25e5877c72271fc798b..76e1e45922e6709392fb82aac44bebe3dbc2ea63
> from LLVM upstream from compiler-rt/lib/hwasan/ to GCC on the changes relevant
> for GCC.
>
> This is required to fix the linked PR.
>
> As mentioned in the PR the last sync brought in a bug from upstream[1] where
> operations became non-recoverable and as such the tests in AArch64 started
> failing.  This cherry picks the fix and there are minor updates needed to GCC
> after this to fix the cases.
>
> [1] https://github.com/llvm/llvm-project/pull/74000
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Thanks for handling this, though I wonder how this slipped through
testing upstream in LLVM. I see they added some new testcases for
this. I know GCC's sanitizer testsuite is slightly different from
LLVM's. Is it the case that GCC has more tests in this area? Is someone
upstreaming GCC's testcases in this area to LLVM, so that future merges
won't bring in regressions like this?

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> libsanitizer/ChangeLog:
>
> PR sanitizer/112644
> * hwasan/hwasan_interceptors.cpp (ACCESS_MEMORY_RANGE,
> HWASAN_READ_RANGE, HWASAN_WRITE_RANGE, COMMON_SYSCALL_PRE_READ_RANGE,
> COMMON_SYSCALL_PRE_WRITE_RANGE, COMMON_INTERCEPTOR_WRITE_RANGE,
> COMMON_INTERCEPTOR_READ_RANGE): Make recoverable.
>
> --- inline copy of patch --
> diff --git a/libsanitizer/hwasan/hwasan_interceptors.cpp b/libsanitizer/hwasan/hwasan_interceptors.cpp
> index d9237cf9b8e3bf982cf213123ef22e73ec027c9e..96df4dd0c24d7d3db28fa2557cf63da0f295e33f 100644
> --- a/libsanitizer/hwasan/hwasan_interceptors.cpp
> +++ b/libsanitizer/hwasan/hwasan_interceptors.cpp
> @@ -36,16 +36,16 @@ struct HWAsanInterceptorContext {
>const char *interceptor_name;
>  };
>
> -#  define ACCESS_MEMORY_RANGE(ctx, offset, size, access)\
> -do {\
> -  __hwasan::CheckAddressSized((uptr)offset, \
> -  size);\
> +#  define ACCESS_MEMORY_RANGE(offset, size, access) \
> +do { \
> +  __hwasan::CheckAddressSized access>((uptr)offset, \
> +size); \
>  } while (0)
>
> -#  define HWASAN_READ_RANGE(ctx, offset, size) \
> -ACCESS_MEMORY_RANGE(ctx, offset, size, AccessType::Load)
> -#  define HWASAN_WRITE_RANGE(ctx, offset, size) \
> -ACCESS_MEMORY_RANGE(ctx, offset, size, AccessType::Store)
> +#  define HWASAN_READ_RANGE(offset, size) \
> +ACCESS_MEMORY_RANGE(offset, size, AccessType::Load)
> +#  define HWASAN_WRITE_RANGE(offset, size) \
> +ACCESS_MEMORY_RANGE(offset, size, AccessType::Store)
>
>  #  if !SANITIZER_APPLE
>  #define HWASAN_INTERCEPT_FUNC(name) \
> @@ -74,9 +74,8 @@ struct HWAsanInterceptorContext {
>
>  #  if HWASAN_WITH_INTERCEPTORS
>
> -#define COMMON_SYSCALL_PRE_READ_RANGE(p, s) __hwasan_loadN((uptr)p, (uptr)s)
> -#define COMMON_SYSCALL_PRE_WRITE_RANGE(p, s) \
> -  __hwasan_storeN((uptr)p, (uptr)s)
> +#define COMMON_SYSCALL_PRE_READ_RANGE(p, s) HWASAN_READ_RANGE(p, s)
> +#define COMMON_SYSCALL_PRE_WRITE_RANGE(p, s) HWASAN_WRITE_RANGE(p, s)
>  #define COMMON_SYSCALL_POST_READ_RANGE(p, s) \
>do {   \
>  (void)(p);   \
> @@ -91,10 +90,10 @@ struct HWAsanInterceptorContext {
>  #include "sanitizer_common/sanitizer_syscalls_netbsd.inc"
>
>  #define COMMON_INTERCEPTOR_WRITE_RANGE(ctx, ptr, size) \
> -  HWASAN_WRITE_RANGE(ctx, ptr, size)
> +  HWASAN_WRITE_RANGE(ptr, size)
>
>  #define COMMON_INTERCEPTOR_READ_RANGE(ctx, ptr, size) \
> -  HWASAN_READ_RANGE(ctx, ptr, size)
> +  HWASAN_READ_RANGE(ptr, size)
>
>  #define COMMON_INTERCEPTOR_ENTER(ctx, func, ...) \
>HWAsanInterceptorContext _ctx = {#func};   \
>
>
>
>
> --
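The distinction this sync restores, recoverable versus non-recoverable checks, can be sketched as follows. Everything here is hypothetical (`error_action`, `check_range`, and the counter are illustrative names, not the HWASan runtime API). Real HWASan compares the pointer's tag with the tags stored in shadow memory; the sketch simply lets the caller say whether an access is bad. In recover mode a bad access is reported and execution continues, which is what tests expecting several reports depend on.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical model of the two error-handling modes.
enum class error_action { abort_on_error, recover };

// Counts reports so the behaviour can be observed.
static int report_count = 0;

// Hypothetical stand-in for a range check: the caller says whether the
// access is bad instead of comparing pointer and memory tags.
template <error_action Action>
void check_range (bool access_is_bad)
{
  if (!access_is_bad)
    return;
  ++report_count;
  std::fprintf (stderr, "report: tag mismatch (sketch)\n");
  if (Action == error_action::abort_on_error)
    std::abort ();  // Non-recoverable: stop at the first error.
  // Recoverable: return and let the program continue, so later errors
  // are still reached and reported.
}
```

Two bad accesses checked in recover mode yield two reports; with `abort_on_error` the process would stop at the first one.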

[PATCH v3] x86: Generate REG_CFA_UNDEFINED for unsaved callee-saved registers

2024-01-29 Thread H.J. Lu

Changes in v3:

1. Fix a typo in REG_CFA_UNDEFINED note comment.
2. Replace assemble with compile in tests and remove -save-temps since
".cfi_undefined regno" is generated now.

Changes in v2:

1. Add REG_CFA_UNDEFINED notes to a frame-related instruction in prologue.
2. Add comments for add_cfi_undefined.

---
Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers which
have been used in the function to a frame-related instruction in prologue.
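The register selection in the i386.cc hunk below can be modelled outside GCC. This is a minimal sketch under stated assumptions: the hypothetical `hard_reg_info` fields stand in for `df_regs_ever_live_p`, `fixed_regs` and `call_used_regs`; the `STACK_REGNO_P`/`MMX_REGNO_P` exclusions are omitted; and the patch applies this only to functions with no callee-saved registers preserved. The sketch returns directives in the form `.cfi_undefined N`, with N a DWARF register number.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-register facts standing in for df_regs_ever_live_p,
// fixed_regs and call_used_regs in the real patch.
struct hard_reg_info
{
  unsigned dwarf_regno;  // DWARF register number used in CFI
  bool ever_live;        // the function uses the register
  bool fixed;            // never allocated (e.g. the stack pointer)
  bool call_used;        // caller-saved under the ABI
};

// Return one .cfi_undefined directive for each callee-saved register that
// the function uses but does not save.
std::vector<std::string>
cfi_undefined_directives (const std::vector<hard_reg_info> &regs)
{
  std::vector<std::string> out;
  for (const hard_reg_info &r : regs)
    if (r.ever_live && !r.fixed && !r.call_used)
      out.push_back (".cfi_undefined " + std::to_string (r.dwarf_regno));
  return out;
}
```

On x86-64, for instance, %rbx is DWARF register 3 and %rbp is 6, so a function using both without saving them would get `.cfi_undefined 3` and `.cfi_undefined 6`.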

gcc/

PR target/38534
* dwarf2cfi.cc (add_cfi_undefined): New.
(dwarf2out_frame_debug_cfa_undefined): Likewise.
(dwarf2out_frame_debug): Handle REG_CFA_UNDEFINED.
* reg-notes.def (REG_CFA_UNDEFINED): New.
* config/i386/i386.cc (ix86_expand_prologue): Attach
REG_CFA_UNDEFINED notes for unsaved callee-saved registers
which have been used in the function to a frame-related
instruction in prologue.

gcc/testsuite/

PR target/38534
* gcc.target/i386/no-callee-saved-19.c: New test.
* gcc.target/i386/no-callee-saved-20.c: Likewise.
* gcc.target/i386/pr38534-7.c: Likewise.
* gcc.target/i386/pr38534-8.c: Likewise.
---
 gcc/config/i386/i386.cc   | 29 ++
 gcc/dwarf2cfi.cc  | 58 +++
 gcc/reg-notes.def |  4 ++
 .../gcc.target/i386/no-callee-saved-19.c  | 17 ++
 .../gcc.target/i386/no-callee-saved-20.c  | 12 
 gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 ++
 gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +
 7 files changed, 151 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..4b7026f3ab4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9304,6 +9304,35 @@ ix86_expand_prologue (void)
  combined with prologue modifications.  */
   if (TARGET_SEH)
 emit_insn (gen_prologue_use (stack_pointer_rtx));
+
+  if (cfun->machine->call_saved_registers
+  != TYPE_NO_CALLEE_SAVED_REGISTERS)
+return;
+
+  /* Attach REG_CFA_UNDEFINED notes for unsaved callee-saved registers
+ which have been used in the function to a frame-related instruction
+ in prologue.  */
+
+  insn = nullptr;
+  rtx_insn *next;
+  for (next = get_insns (); next; next = NEXT_INSN (next))
+{
+  if (!RTX_FRAME_RELATED_P (next))
+   continue;
+  insn = next;
+}
+
+  if (!insn)
+return;
+
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (df_regs_ever_live_p (i)
+   && !fixed_regs[i]
+   && !call_used_regs[i]
+   && !STACK_REGNO_P (i)
+   && !MMX_REGNO_P (i))
+  add_reg_note (insn, REG_CFA_UNDEFINED,
+   gen_rtx_REG (word_mode, i));
 }
 
 /* Emit code to restore REG using a POP or POPP insn.  */
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 1231b5bb5f0..9ba0ac07ee7 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -517,6 +517,20 @@ add_cfi_restore (unsigned reg)
   add_cfi (cfi);
 }
 
+/* Add DW_CFA_undefined either to the current insn stream or to a vector,
+   or both.  */
+
+static void
+add_cfi_undefined (unsigned reg)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi->dw_cfi_opc = DW_CFA_undefined;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+
+  add_cfi (cfi);
+}
+
 /* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
that the register column is no longer saved.  */
 
@@ -1532,6 +1546,37 @@ dwarf2out_frame_debug_cfa_restore (rtx reg, bool emit_cfi)
 }
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_UNDEFINED
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_undefined (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+
+  rtx span = targetm.dwarf_register_span (reg);
+  if (!span)
+{
+  unsigned int regno = dwf_regno (reg);
+  add_cfi_undefined (regno);
+}
+  else
+{
+  /* We have a PARALLEL describing where the contents of REG live.
+     Mark each piece of the PARALLEL as undefined.  */
+  gcc_assert (GET_CODE (span) == PARALLEL);
+
+  const int par_len = XVECLEN (span, 0);
+  for (int par_index = 0; par_index < par_len; par_index++)
+   {
+ reg = XVECEXP (span, 0, par_index);
+ gcc_assert (REG_P (reg));
+ unsigned int regno = dwf_regno (reg);
+ add_cfi_undefined (regno);
+   }
+}
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE.
 
??? Perhaps we should note in the CIE where windows are saved (instead
@@ -2326,6 +2371,19 @@ dwarf2out_frame_debug (rtx_insn *insn)
handled_one = true;
break;
 
+  case REG_CFA_UNDEFINED:
+   n = XEXP (note, 0);
+   if (n =
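For reference, each `DW_CFA_undefined` queued by `add_cfi_undefined` is encoded in the call frame information as the one-byte opcode 0x07 followed by the register number as an unsigned LEB128 value. The sketch below shows that encoding, emitting one instruction per register as `dwarf2out_frame_debug_cfa_undefined` does for each piece of a multi-register span. The helper names are hypothetical; GCC's own output goes through its dwarf2out machinery or `.cfi_undefined` directives.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DW_CFA_undefined opcode value from the DWARF specification.
constexpr std::uint8_t dw_cfa_undefined = 0x07;

// Append VALUE as unsigned LEB128: 7 bits per byte, low bits first,
// high bit set on every byte except the last.
void append_uleb128 (std::vector<std::uint8_t> &out, std::uint64_t value)
{
  do
    {
      std::uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      out.push_back (byte);
    }
  while (value != 0);
}

// Encode one DW_CFA_undefined per DWARF register number.
std::vector<std::uint8_t>
encode_cfa_undefined (const std::vector<unsigned> &dwarf_regnos)
{
  std::vector<std::uint8_t> out;
  for (unsigned regno : dwarf_regnos)
    {
      out.push_back (dw_cfa_undefined);
      append_uleb128 (out, regno);
    }
  return out;
}
```

Register 3 encodes as the bytes 07 03; a register number of 200 needs two LEB128 bytes, C8 01.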

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 1:42 PM H.J. Lu  wrote:
>
> On Mon, Jan 29, 2024 at 1:22 PM H.J. Lu  wrote:
> >
> > On Mon, Jan 29, 2024 at 1:00 PM H.J. Lu  wrote:
> > >
> > > On Mon, Jan 29, 2024 at 9:34 AM H.J. Lu  wrote:
> > > >
> > > > On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
> > > > >
> > > > > On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > > > > > In this case, these are internal to the same comdat group:
> > > > >
> > > > > But that is only by accident, no?
> > > >
> > > > This may be by luck.  I don't know if gcc checks it when
> > > > generating such references.
> > > >
> > > > > I mean, if you need to refer to such a symbol from
> > > > > non-comdat function or comdat function in a different comdat group
> > > > > and RA decides it wants the constant in memory rather than code?
> > > > > Your patch uses
> > > > >   if (decl)
> > > > > return targetm.asm_out.function_rodata_section (decl, ???);
> > > > > and default_function_rodata_section only looks at comdat group of the
> > > > > passed in decl.  But the decl here is what the constant refers to, not
> > > > > who is referring it.
> > >
> > > LRA puts a function symbol reference in a constant pool via
> > >
> > > #0  force_const_mem (in_mode=E_DImode, x=0x7fffe9e7e000)
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/varasm.cc:3951
> > > #1  0x01833870 in curr_insn_transform (check_only_p=false)
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:4473
> > > #2  0x01836eae in lra_constraints (first_p=true)
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:5462
> > > #3  0x0181fcf1 in lra (f=0x0, verbose=5)
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra.cc:2442
> > > #4  0x017c8828 in do_reload ()
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:5973
> > > #5  0x017c8d25 in (anonymous namespace)::pass_reload::execute (
> > > this=0x48d8730)
> > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:6161
> > >
> > > for
> > >
> > > (gdb) call debug_rtx (curr_insn)
> > > (insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
> > > (vec_concat:V2DI (symbol_ref:DI
> > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > > [flags 0x3] )
> > > (reg/f:DI 109))) 7521 {vec_concatv2di}
> > >  (expr_list:REG_DEAD (reg/f:DI 110)
> > > (expr_list:REG_DEAD (reg/f:DI 109)
> > > (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI
> > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > > [flags 0x3] )
> > > (symbol_ref:DI
> > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE")
> > > [flags 0x3] ))
> > > (nil)
> > > (gdb)
> > >
> > > CONST_POOL_OK_P doesn't check if it is safe to do so for function
> > > symbols.   Here is a patch to add the check.
> > >
> > > --
> > > H.J.
> >
> > On the other hand, does C++ even allow access to non-public members
> > from different classes?  So my patch should be safe and linker should
> > catch all invalid comdat usages like this bug.
>
> A function accesses a function symbol defined in a comdat group.
> If the function symbol is public, any comdat definition of the same group
> signature should provide the function definition.  If the function symbol
> is private to the comdat group, only functions in the same comdat
> group can access the private function symbol.  If a function in a different
> comdat group accesses a private symbol, it is a compiler bug and
> link may catch it like in this case.
>

My patch simply puts the constant pool of the function symbol reference
in the same comdat group as the function definition.  I believe it is the
right thing to do.
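On ELF targets with GNU as, putting data into a COMDAT group uses the `G` section flag, a group signature, and the `comdat` linkage keyword, so the pool entry is kept or discarded together with the rest of that group. A minimal sketch that builds such a directive (x86 syntax, with `@progbits`); the `.rodata.` naming scheme and the helper are assumptions for illustration, not what varasm.cc emits. Which function's group should own the entry (the referencing function, as Jakub argues, or the referenced one) is exactly what this thread debates, so the sketch takes the group signature as a parameter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the .section directive for a read-only constant that
// must be kept or discarded together with COMDAT group GROUP.  "a" marks the
// section allocatable; "G" makes it a member of the named group.
std::string comdat_rodata_section (const std::string &suffix,
                                   const std::string &group)
{
  if (group.empty ())
    return ".section .rodata." + suffix + ",\"a\",@progbits";
  return ".section .rodata." + suffix + ",\"aG\",@progbits," + group
         + ",comdat";
}
```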

-- 
H.J.

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread H.J. Lu

On Mon, Jan 29, 2024 at 2:01 PM H.J. Lu  wrote:
>
> On Mon, Jan 29, 2024 at 1:42 PM H.J. Lu  wrote:
> >
> > On Mon, Jan 29, 2024 at 1:22 PM H.J. Lu  wrote:
> > >
> > > On Mon, Jan 29, 2024 at 1:00 PM H.J. Lu  wrote:
> > > >
> > > > On Mon, Jan 29, 2024 at 9:34 AM H.J. Lu  wrote:
> > > > >
> > > > > On Mon, Jan 29, 2024 at 9:00 AM Jakub Jelinek  wrote:
> > > > > >
> > > > > > On Mon, Jan 29, 2024 at 08:45:45AM -0800, H.J. Lu wrote:
> > > > > > > In this case, these are internal to the same comdat group:
> > > > > >
> > > > > > But that is only by accident, no?
> > > > >
> > > > > This may be by luck.  I don't know if gcc checks it when
> > > > > generating such references.
> > > > >
> > > > > > I mean, if you need to refer to such a symbol from
> > > > > > non-comdat function or comdat function in a different comdat group
> > > > > > and RA decides it wants the constant in memory rather than code?
> > > > > > Your patch uses
> > > > > >   if (decl)
> > > > > > return targetm.asm_out.function_rodata_section (decl, ???);
> > > > > > and default_function_rodata_section only looks at comdat group of the
> > > > > > passed in decl.  But the decl here is what the constant refers to, not
> > > > > > who is referring it.
> > > >
> > > > LRA puts a function symbol reference in a constant pool via
> > > >
> > > > #0  force_const_mem (in_mode=E_DImode, x=0x7fffe9e7e000)
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/varasm.cc:3951
> > > > #1  0x01833870 in curr_insn_transform (check_only_p=false)
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:4473
> > > > #2  0x01836eae in lra_constraints (first_p=true)
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra-constraints.cc:5462
> > > > #3  0x0181fcf1 in lra (f=0x0, verbose=5)
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/lra.cc:2442
> > > > #4  0x017c8828 in do_reload ()
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:5973
> > > > #5  0x017c8d25 in (anonymous namespace)::pass_reload::execute (
> > > > this=0x48d8730)
> > > > at /export/gnu/import/git/gitlab/x86-gcc-test/gcc/ira.cc:6161
> > > >
> > > > for
> > > >
> > > > (gdb) call debug_rtx (curr_insn)
> > > > (insn 12 57 15 2 (set (reg:V2DI 101 [ _16 ])
> > > > (vec_concat:V2DI (symbol_ref:DI
> > > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > > > [flags 0x3] )
> > > > (reg/f:DI 109))) 7521 {vec_concatv2di}
> > > >  (expr_list:REG_DEAD (reg/f:DI 110)
> > > > (expr_list:REG_DEAD (reg/f:DI 109)
> > > > (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI
> > > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx10_M_managerERNS_9_Any_dataERKSI_NS_18_Manager_operationE")
> > > > [flags 0x3] )
> > > > (symbol_ref:DI
> > > > ("_ZN4blah17_Function_handlerIFvvENS_5_BindIFPFvPvxxxEPN3vtk6detail3smp27vtkSMPTools_FunctorInternalIN12_GLOBAL__N_19CountUsesIxEELb0EEExxx9_M_invokeERKNS_9_Any_dataE")
> > > > [flags 0x3] ))
> > > > (nil)
> > > > (gdb)
> > > >
> > > > CONST_POOL_OK_P doesn't check if it is safe to do so for function
> > > > symbols.   Here is a patch to add the check.
> > > >
> > > > --
> > > > H.J.
> > >
> > > On the other hand, does C++ even allow access to non-public members
> > > from different classes?  So my patch should be safe and linker should
> > > catch all invalid comdat usages like this bug.
> >
> > A function accesses a function symbol defined in a comdat group.
> > If the function symbol is public, any comdat definition of the same group
> > signature should provide the function definition.  If the function symbol
> > is private to the comdat group, only functions in the same comdat
> > group can access the private function symbol.  If a function in a different
> > comdat group accesses a private symbol, it is a compiler bug and
> > link may catch it like in this case.
> >
>
> My patch simply puts the constant pool of the function symbol reference
> in the same comdat group as the function definition.  I believe it is the
> right thing to do.

If we are concerned that not all comdat definitions provide such a constant
pool, we can change LRA to only allow such a constant pool when it is safe
to do so.

-- 
H.J.

Re: [PATCH] Fortran: Mark internal symbols as artificial [PR88009,PR68800]

2024-01-29 Thread rep . dot . nop

On 29 January 2024 22:06:04 CET, Harald Anlauf  wrote:
>Am 29.01.24 um 21:45 schrieb Bernhard Reutner-Fischer:
>> On Wed, 17 Nov 2021 21:32:14 +0100
>> Harald Anlauf  wrote:
>> 
>>> Do you have testcases/reproducers demonstrating that the patch actually
>>> fixes the issues you're describing?
>> 
>> I believe that marking artificial symbols as such is obvious and i did
>> use the existing tests to verify that the changes do not regress but
>> behave as intended. I did check that the memory leak in
>> gfc_find_derived_vtab is fixed with the patch.
>> 
>> Ok for stage 1 if the rebased regression test passes?
>> 
>> thanks
>> 
>>> 
>>> Am 17.11.21 um 09:12 schrieb Bernhard Reutner-Fischer via Gcc-patches:
 On Tue, 16 Nov 2021 21:46:32 +0100
 Harald Anlauf via Fortran  wrote:
 
> Hi Bernhard,
> 
> I'm trying to understand your patch.  What does it really try to solve?
 
 Compiler generated symbols should be marked artificial.
 The fix for PR88009 ( f8add009ce300f24b75e9c2e2cc5dd944a020c28 ,
 r9-5194 ) added artificial just to the _final component and left out all 
 the rest.
 Note that the majority of compiler generated symbols in class.c
 already had artificial set properly.
 The proposed patch amends the other generated symbols to be marked
 artificial, too.
 
 The other parts fix memory leaks.
 
> 
> PR88009 is closed and seems to have nothing to do with this.
 
 Well it marked only _final as artificial and forgot to adjust the
 others as well.
 We can remove the reference to PR88009 if you prefer?
 
 thanks!
> 
> Harald
> 
> Am 14.11.21 um 23:17 schrieb Bernhard Reutner-Fischer via Fortran:
>> Hi!
>> 
>> Amend fix for PR88009 to mark all these class components as artificial.
>> 
>> gcc/fortran/ChangeLog:
>> 
>>* class.c (gfc_build_class_symbol, generate_finalization_wrapper,
>>(gfc_find_derived_vtab, find_intrinsic_vtab): Use stringpool for
>>names. Mark internal symbols as artificial.
>>* decl.c (gfc_match_decl_type_spec, gfc_match_end): Fix
>>indentation.
>>(gfc_match_derived_decl): Fix indentation. Check extension level
>>before incrementing refs counter.
>>* parse.c (parse_derived): Fix style.
>>* resolve.c (resolve_global_procedure): Likewise.
>>* symbol.c (gfc_check_conflict): Do not ignore artificial symbols.
>>(gfc_add_flavor): Reorder condition, cheapest first.
>>(gfc_new_symbol, gfc_get_sym_tree,
>>generate_isocbinding_symbol): Fix style.
>>* trans-expr.c (gfc_trans_subcomponent_assign): Remove
>>restriction on !artificial.
>>* match.c (gfc_match_equivalence): Special-case CLASS_DATA for
>>warnings.
>> 
>> ---
>> gfc_match_equivalence(), too, should not bail out early on the first
>> error but should diagnose all errors, i.e. not goto cleanup but set
>> err=true and continue in order to diagnose all constraints of a
>> statement. Maybe Sandra or somebody else will eventually find time to
>> tweak that.
>> 
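The suggestion above, to keep checking after the first violated constraint instead of jumping to cleanup, could look roughly like the following C sketch. It is not gfc_match_equivalence: the struct equiv_object type, its fields and check_equivalence are hypothetical, and only two of the standard's EQUIVALENCE restrictions (no dummy arguments, no POINTER objects) are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical EQUIVALENCE object carrying two properties that the
   Fortran standard forbids for equivalence objects.  */
struct equiv_object
{
  const char *name;
  bool is_dummy_arg;
  bool is_pointer;
};

/* Check every constraint for every object, report each violation and
   return the number of errors, instead of stopping at the first one
   (the "goto cleanup" pattern).  */
static int
check_equivalence (const struct equiv_object *objs, int n)
{
  int errors = 0;

  assert (n == 0 || objs != NULL);
  for (int i = 0; i < n; i++)
    {
      if (objs[i].is_dummy_arg)
        {
          fprintf (stderr, "error: '%s' is a dummy argument\n",
                   objs[i].name);
          errors++;   /* Record the error and keep checking.  */
        }
      if (objs[i].is_pointer)
        {
          fprintf (stderr, "error: '%s' has the POINTER attribute\n",
                   objs[i].name);
          errors++;
        }
    }
  return errors;
}
```

Given an object a that is a dummy argument and an object b that is a pointer, both problems are reported in one pass and the function returns 2, whereas an early goto would have stopped after diagnosing a.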
>> I think it also plugs a very minor leak of name in gfc_find_derived_vtab,
>> so I also tagged it [PR68800]. At least that was the initial
>> motivation to look at that spot.
>> We were doing
>> -  name = xasprintf ("__vtab_%s", tname);
>> ...
>>  gfc_set_sym_referenced (vtab);
>> - name = xasprintf ("__vtype_%s", tname);
>> 
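In the hunk above, each xasprintf call returns a fresh heap buffer, so if nothing frees name in between, assigning the second result to it loses the "__vtab_" string. The ChangeLog's remedy is to use the stringpool for names. The sketch below models that idea with a tiny intern table; intern, vtab_name and the fixed-size pool are hypothetical illustrations, not gfortran's stringpool API. The pool owns every string, so a variable holding a name can be overwritten freely.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string pool: each distinct string is stored once and is
   owned by the pool for the whole run, so callers never free it.  */
#define POOL_MAX 64
static char *pool[POOL_MAX];
static int pool_len;

static const char *
intern (const char *s)
{
  for (int i = 0; i < pool_len; i++)
    if (strcmp (pool[i], s) == 0)
      return pool[i];                 /* Already interned: reuse it.  */

  assert (pool_len < POOL_MAX);
  size_t len = strlen (s) + 1;
  char *copy = malloc (len);
  assert (copy != NULL);
  memcpy (copy, s, len);
  pool[pool_len++] = copy;
  return copy;
}

/* Build "<prefix><tname>" in a stack buffer and intern it.  Unlike
   reassigning the result of a heap-allocating formatter, overwriting
   the returned pointer leaks nothing.  */
static const char *
vtab_name (const char *prefix, const char *tname)
{
  char buf[128];
  snprintf (buf, sizeof buf, "%s%s", prefix, tname);
  return intern (buf);
}
```

A usage such as name = vtab_name ("__vtab_", tname); followed later by name = vtab_name ("__vtype_", tname); leaves both strings reachable through the pool, and asking for the same name twice returns the same pointer.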
>> Bootstrapped and regtested without regressions on x86_64-unknown-linux.
>> Ok for trunk?
>> 
> 
> 
 
 
>>> 
>> 
>> 
>
>Can you please post the patch here so that we can review it?
>

I'm very sorry that I failed to provide an explicit reference; the initial
patch was submitted here:
https://inbox.sourceware.org/fortran/2024231748.376086cd@nbbrfq/
I will follow up with a rebased, tested patch ASAP, on a weekend or during
a vacation.

But just to give context, that's what I was talking about.
thanks

PS: IMHO a strcmp(..,"_data") is inappropriate: you should check
attr.artificial instead, and the path exploder that gives hints should
ideally deal with this transparently.
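The PS argues for testing attr.artificial rather than comparing names. A simplified C sketch of the difference follows; struct sym and struct sym_attr are hypothetical types that only mimic the shape of gfortran's symbol attributes (the real gfc_symbol carries many more fields). The name test recognizes exactly one generated component, while the flag covers every symbol the compiler marked as artificial.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified symbol attributes.  */
struct sym_attr
{
  unsigned artificial : 1;   /* Set when the compiler generated the symbol.  */
};

struct sym
{
  const char *name;
  struct sym_attr attr;
};

/* Fragile: recognizes one specific generated name only.  */
static bool
is_generated_by_name (const struct sym *s)
{
  assert (s != NULL);
  return strcmp (s->name, "_data") == 0;
}

/* Robust: asks the flag set when the symbol was created, so other
   generated components (for example _vptr) are covered without
   maintaining a list of names.  */
static bool
is_generated_by_flag (const struct sym *s)
{
  assert (s != NULL);
  return s->attr.artificial;
}
```

With this design the check stays correct when new compiler-generated components are added, provided every creation site sets the flag, which is what the patch under discussion aims for.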

Re: [PATCH v4 0/4] New attribute "counted_by" to annotate bounds for C99 FAM (PR108896)

2024-01-29 Thread Qing Zhao




> On Jan 29, 2024, at 3:35 PM, Joseph Myers  wrote:
> 
> On Mon, 29 Jan 2024, Qing Zhao wrote:
> 
>> Thank you!
>> 
>> Joseph and Richard,  could you also comment on this?
> 
> I think Martin's suggestions are reasonable.

Okay, I will update the patches based on this approach. 

Thanks a lot for the comment.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
>

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 02:01:56PM -0800, H.J. Lu wrote:
> > A function accesses a function symbol defined in a comdat group.
> > If the function symbol is public, any comdat definition of the same group
> > signature should provide the function definition.  If the function symbol
> > is private to the comdat group, only functions in the same comdat
> > group can access the private function symbol.  If a function in a different
> > comdat group accesses a private symbol, it is a compiler bug and
> > the linker may catch it, as in this case.
> >
> 
> My patch simply puts the constant pool of the function symbol reference
> in the same comdat group as the function definition.  I believe it is the
> right thing to do.

I disagree; I think we should use something like
  if (current_function_decl)
    return targetm.asm_out.function_rodata_section (current_function_decl,
                                                    true);

Obviously, for non-reloc or non-pic, we don't want an unconditional
  if (current_function_decl)
    return targetm.asm_out.function_rodata_section (current_function_decl,
                                                    false);
that would kill mergeable sections, so perhaps
  if (current_function_decl
      && reloc
      && DECL_COMDAT_GROUP (current_function_decl))
    return targetm.asm_out.function_rodata_section (current_function_decl,
                                                    false);
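The final condition above can be summarized as a small decision function. This is an illustrative model only: enum pool_section and choose_pool_section are hypothetical, with booleans standing in for current_function_decl, reloc and DECL_COMDAT_GROUP (current_function_decl); it does not call any real GCC target hook.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of where a constant-pool entry would land.  */
enum pool_section
{
  MERGEABLE_RODATA,   /* Shared, mergeable read-only data.  */
  FUNCTION_RODATA     /* Per-function rodata that follows the
                         function's comdat group.  */
};

/* Redirect only when all three conditions hold; every other constant
   keeps the default (mergeable) placement.  */
static enum pool_section
choose_pool_section (bool in_function, bool reloc, bool fn_in_comdat_group)
{
  if (in_function && reloc && fn_in_comdat_group)
    return FUNCTION_RODATA;
  return MERGEABLE_RODATA;
}
```

Keeping the default branch for everything else is the point of the extra conditions: an unconditional redirect would also move relocation-free constants out of the mergeable sections.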

Jakub

Re: [PATCH] Handle function symbol reference in readonly data section

2024-01-29 Thread Jakub Jelinek

On Mon, Jan 29, 2024 at 11:22:44PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 29, 2024 at 02:01:56PM -0800, H.J. Lu wrote:
> > > A function accesses a function symbol defined in a comdat group.
> > > If the function symbol is public, any comdat definition of the same group
> > > signature should provide the function definition.  If the function symbol
> > > is private to the comdat group, only functions in the same comdat
> > > group can access the private function symbol.  If a function in a different
> > > comdat group accesses a private symbol, it is a compiler bug and
> > > the linker may catch it, as in this case.
> > >
> > 
> > My patch simply puts the constant pool of the function symbol reference
> > in the same comdat group as the function definition.  I believe it is the
> > right thing to do.
> 
> I disagree; I think we should use something like
>   if (current_function_decl)

Or perhaps && DECL_COMDAT_GROUP (current_function_decl) added here as well,
just to make it change things less often.

>     return targetm.asm_out.function_rodata_section (current_function_decl,
>                                                     true);
> 
> Obviously, for non-reloc or non-pic, we don't want an unconditional
>   if (current_function_decl)
>     return targetm.asm_out.function_rodata_section (current_function_decl,
>                                                     false);
> that would kill mergeable sections, so perhaps
>   if (current_function_decl
>       && reloc
>       && DECL_COMDAT_GROUP (current_function_decl))
>     return targetm.asm_out.function_rodata_section (current_function_decl,
>                                                     false);

Jakub
