[gcc/devel/omp/gcc-13] gensupport: drop suppport for define_cond_exec from compact syntac

2024-08-29 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:737b8f383563e5d1b10b85a7bc93ce359111be88

commit 737b8f383563e5d1b10b85a7bc93ce359111be88
Author: Tamar Christina 
Date:   Tue Jun 20 23:31:40 2023 +0100

gensupport: drop suppport for define_cond_exec from compact syntac

define_cond_exec does not support the special @@ syntax
and so can't support {@.  As such just remove support
for it.

gcc/ChangeLog:

PR bootstrap/110324
* gensupport.cc (convert_syntax): Explicitly check for RTX code.

Diff:
---
 gcc/gensupport.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 980b49cd4814..e39e6dacce25 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -878,7 +878,8 @@ convert_syntax (rtx x, file_location loc)
   const char *templ;
   vec_conlist tconvec, convec, attrvec;
 
-  templ_index = GET_CODE (x) == DEFINE_INSN ? 3 : 2;
+  templ_index = 3;
+  gcc_assert (GET_CODE (x) == DEFINE_INSN);
 
   templ = XTMPL (x, templ_index);
 
@@ -1053,7 +1054,6 @@ process_rtx (rtx desc, file_location loc)
   break;
 
 case DEFINE_COND_EXEC:
-  convert_syntax (desc, loc);
   queue_pattern (desc, &define_cond_exec_tail, loc);
   break;


[gcc r15-3381] amdgcn: remove gfx803 "Fiji" support

2024-09-02 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:57af0022073f11bc300709b3717069f6d616c6ac

commit r15-3381-g57af0022073f11bc300709b3717069f6d616c6ac
Author: Andrew Stubbs 
Date:   Mon Aug 5 15:14:17 2024 +

amdgcn: remove gfx803 "Fiji" support

The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and
hasn't worked properly with the drivers since about ROCm 4.

This patch removes the device from GCC options and documentation, and 
removes
the direct mentions from the internals.

The TARGET_GCN3 support in the back-end is now unused and can be removed 
(in a
follow-up patch).

gcc/ChangeLog:

* config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks.
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative.
(NO_XNACK): Likewise.
(NO_SRAM_ECC): Likewise.
(ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC.
* config/gcn/gcn-opts.h (enum processor_type): Remove 
PROCESSOR_FIJI.
(TARGET_FIJI): Delete.
* config/gcn/gcn.cc (gcn_option_override): Remove Fiji.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise.
* config/gcn/gcn.opt (gpu_type): Likewise.
(march, mtune): Change default to PROCESSOR_VEGA10.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete.
(copy_early_debug_info): Remove elf_flags_actual.
Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally.
(get_arch): Remove Fiji.
(main): Remove gfx803.
* config/gcn/t-omp-device
(omp-device-properties-gcn): Remove fiji and gfx803.
* doc/install.texi (amdgcn*-*-*): Remove fiji and special 
instructions.
* doc/invoke.texi: Remove fiji.

libgomp/ChangeLog:

* libgomp.texi: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4-fiji.c: Removed.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.

Diff:
---
 gcc/config.gcc|  2 +-
 gcc/config/gcn/gcn-hsa.h  | 13 +
 gcc/config/gcn/gcn-opts.h |  2 --
 gcc/config/gcn/gcn.cc | 19 ---
 gcc/config/gcn/gcn.h  |  7 +--
 gcc/config/gcn/gcn.opt|  7 ++-
 gcc/config/gcn/mkoffload.cc   | 17 +++--
 gcc/config/gcn/t-omp-device   |  2 +-
 gcc/doc/install.texi  |  8 +---
 gcc/doc/invoke.texi   |  5 -
 libgomp/libgomp.texi  |  3 +--
 libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c  | 11 ---
 .../testsuite/libgomp.c/declare-variant-4-gfx803.c| 10 --
 libgomp/testsuite/libgomp.c/declare-variant-4.h   | 12 
 14 files changed, 19 insertions(+), 99 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 08291f4b6e07..f09ce9f63a01 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4618,7 +4618,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c 
| gfx1030 | gfx1036 | gfx1100 | gfx1103)
+   "" | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | 
gfx1030 | gfx1036 | gfx1100 | gfx1103)
# OK
;;
*)
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 032205550755..7a1bfad49cad 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -79,21 +79,18 @@ extern unsigned int gcn_local_sym_hash (const char *name);
default; however, when debugging symbols are turned on, mkoffload.cc
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
-   GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
-   supported in LLVM >= 18.  */
-#define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
-"!march=*|march=*:--amdhsa-code-object-version=4"
+   GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.  */
+#define ABI_VERSION_SPEC "--amdhsa-code-object-version=4"
 
 /* Note that the XNACK and SRAM-ECC settings must match those in mkoffload.cc
as the latter creates new ELF object file when debugging is enabled and
the ELF flags (e_flags) 

[gcc r15-3382] amdgcn: Remove TARGET_GCN3

2024-09-02 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:023641d97c5139bfcf8d468442a4e9782e90a467

commit r15-3382-g023641d97c5139bfcf8d468442a4e9782e90a467
Author: Andrew Stubbs 
Date:   Tue Aug 6 15:37:36 2024 +

amdgcn: Remove TARGET_GCN3

The only GCN3 ISA device was remove (Fiji, gfx803) so all the GCN3-specific
code and features can be removed from the back-end.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3.
(TARGET_GCN3): Delete.
(TARGET_GCN3_PLUS): Delete.
(TARGET_M0_LDS_LIMIT): Delete.
* config/gcn/gcn-valu.md
(gather_insn_1offset): Remove TARGET_GCN3 from 
conditions.
(*_dpp_shr_): Likewise.
* config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5.
(gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature.
(gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3.

Diff:
---
 gcc/config/gcn/gcn-opts.h  |  6 --
 gcc/config/gcn/gcn-valu.md | 12 
 gcc/config/gcn/gcn.cc  | 16 ++--
 gcc/config/gcn/gcn.h   |  4 +---
 4 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index a896a80cd0a0..6f5969d7bc8d 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -44,7 +44,6 @@ enum processor_type
 /* Set in gcn_option_override.  */
 extern enum gcn_isa {
   ISA_UNKNOWN,
-  ISA_GCN3,
   ISA_GCN5,
   ISA_RDNA2,
   ISA_RDNA3,
@@ -52,8 +51,6 @@ extern enum gcn_isa {
   ISA_CDNA2
 } gcn_isa;
 
-#define TARGET_GCN3 (gcn_isa == ISA_GCN3)
-#define TARGET_GCN3_PLUS (gcn_isa >= ISA_GCN3)
 #define TARGET_GCN5 (gcn_isa == ISA_GCN5)
 #define TARGET_GCN5_PLUS (gcn_isa >= ISA_GCN5)
 #define TARGET_CDNA1 (gcn_isa == ISA_CDNA1)
@@ -65,7 +62,6 @@ extern enum gcn_isa {
 #define TARGET_RDNA3 (gcn_isa == ISA_RDNA3)
 
 
-#define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
 #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
 
 #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
@@ -92,8 +88,6 @@ enum hsaco_attr_type
 #define TARGET_11BIT_GLOBAL_OFFSET TARGET_RDNA2_PLUS
 /* The work item details are all encoded into v0.  */
 //#define TARGET_PACKED_WORK_ITEMS TARGET_PACKED_WORK_ITEMS
-/* m0 must be initialized in order to use LDS.  */
-//#define TARGET_M0_LDS_LIMIT TARGET_M0_LDS_LIMIT
 /* CDNA2 load/store costs are reduced.
  * TODO: what does this mean?  */
 #define TARGET_CDNA2_MEM_COSTS TARGET_CDNA2_PLUS
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index b24cf9be32ef..54f4b14d4f21 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1156,10 +1156,9 @@
   (mem:BLK (scratch))]
  UNSPEC_GATHER))]
   "(AS_FLAT_P (INTVAL (operands[3]))
-&& ((TARGET_GCN3 && INTVAL(operands[2]) == 0)
-   || ((unsigned HOST_WIDE_INT)INTVAL(operands[2]) < 0x1000)))
-|| (AS_GLOBAL_P (INTVAL (operands[3]))
-   && (((unsigned HOST_WIDE_INT)INTVAL(operands[2]) + 0x1000) < 0x2000))"
+&& ((unsigned HOST_WIDE_INT)INTVAL(operands[2]) < 0x1000))
+   || (AS_GLOBAL_P (INTVAL (operands[3]))
+   && (((unsigned HOST_WIDE_INT)INTVAL(operands[2]) + 0x1000) < 0x2000))"
   {
 addr_space_t as = INTVAL (operands[3]);
 const char *glc = INTVAL (operands[4]) ? " glc" : "";
@@ -4297,10 +4296,7 @@
   (match_operand:V_1REG 2 "register_operand" "v")
   (match_operand:SI 3 "const_int_operand""n")]
  REDUC_UNSPEC))]
-  ; GCN3 requires a carry out, GCN5 not
-  "!(TARGET_GCN3 && SCALAR_INT_MODE_P (mode)
- &&  == UNSPEC_PLUS_DPP_SHR)
-   && TARGET_DPP_FULL"
+  "TARGET_DPP_FULL"
   {
 return gcn_expand_dpp_shr_insn (mode, "",
, INTVAL (operands[3]));
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 89aab6fe8e43..fd2b86085749 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -68,7 +68,7 @@ static bool ext_gcn_constants_init = 0;
 
 /* Holds the ISA variant, derived from the command line parameters.  */
 
-enum gcn_isa gcn_isa = ISA_GCN3;   /* Default to GCN3.  */
+enum gcn_isa gcn_isa = ISA_GCN5;   /* Default to GCN5.  */
 
 /* Reserve this much space for LDS (for propagating variables from
worker-single mode to worker-partitioned mode), per workgroup.  Global
@@ -3556,17 +3556,6 @@ gcn_expand_prologue ()
   /* Ensure that the scheduler doesn't do anything unexpected.  */
   emit_insn (gen_blockage ());
 
-  if (TARGET_M0_LDS_LIMIT)
-  {
-/* m0 is initialized for the usual LDS DS and FLAT memory case.
-   The low-part is the address of the topmost addressable byte, which is
-   size-1.  The high-part is an offset and should be zero.  */
-emit_move_insn (gen_rtx_REG (SImode, M0_REG),
-   gen_int_mode (LDS_SIZE, SImode));
-
-emit_insn (gen_prologue_use (gen_rtx_REG (SImode, M0_REG)));
-  }
-
   if (cfun &&

[gcc r15-3383] amdgcn: Remove TARGET_GCN5_PLUS

2024-09-02 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:b9bf0c3f54d4e36ca40598600d6e87107204c4c6

commit r15-3383-gb9bf0c3f54d4e36ca40598600d6e87107204c4c6
Author: Andrew Stubbs 
Date:   Tue Aug 6 16:00:21 2024 +

amdgcn: Remove TARGET_GCN5_PLUS

Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so
we can make that code unconditional, and remove all the "else" cases.

The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS,
TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also
redundant and can be made unconditional.

The naming of the "gcc_version" attribute has been confusing since the 
"rdna"
attribute was added and this makes it worse, so it has been renamed to 
"cdna".

The add-with-carry assembler mnemonics no longer have two forms, so '%^' 
can be
removed.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete.
(TARGET_GLOBAL_ADDRSPACE): Delete.
(TARGET_FLAT_OFFSETS): Delete.
(TARGET_EXPLICIT_CARRY): Delete.
(TARGET_MULTIPLY_IMMEDIATE): Delete.
* config/gcn/gcn-valu.md (*mov): Rename "gcn_version" to 
"cdna".
(*mov_4reg): Likewise.
(@mov_sgprbase): Likwise.
(gather_insn_1offset): Likewise.
(gather_insn_1offset_ds): Likewise.
(gather_insn_2offsets): Likewise.
(scatter_insn_1offset): Likewise.
(scatter_insn_1offset_ds): Likewise.
(scatter_insn_2offsets): Likewise.
(gather_insn_1offset): Remove TARGET_FLAT_OFFSETS
conditionals.
(scatter_insn_1offset): Likewise.
(scatter_insn_1offset): Likewise.
(add3): Use "_co" instead of "%^".
(add3_dup): Likewise.
(add3_vcc): Likewise.
(add3_vcc_dup): Likewise.
(addc3): Likewise.
(sub3): Likewise.
(sub3_vcc): Likewise.
(subc3): Likewise.
(*plus_carry_dpp_shr_): Likewise.
(*plus_carry_in_dpp_shr_): Likewise.
* config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS
conditionals.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also.
(print_operand): Remove "%^" operand code.
Remove TARGET_GLOBAL_ADDRSPACE assertion.
* config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional.
* config/gcn/gcn.md (gcn_version): Rename attribute ...
(cdna): ... to this, and remove the gcn3 and gcn5 values.
(enabled): Replace old "gcn_version" logic with new "cdna" logic.
(*mov_insn): Rename "gcn_version" to "cdna".
(*movti_insn): Likewise.
(addsi3): Use "_co" instead of "%^".
(addsi3_scalar_carry): Likewise.
(addsi3_scalar_carry_cst): Likewise.
(addcsi3_scalar): Likewise.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(subsi3): Likewise.
(mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions.
(mulsi3_highpart_reg): Remove "gcn_version" attribute.
(muldi3): Likewise.
(atomic_fetch_): Likewise.
(atomic_): Likewise.
(sync_compare_and_swap_insn): Likewise.
(atomic_load): Likewise.
(atomic_store): Likewise.
(atomic_exchange): Likewise.
(mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and
"gcn_version".
(mulsidi3): Likewise.
(mulsidi3_imm): Likewise.

Diff:
---
 gcc/config/gcn/gcn-opts.h  |  9 --
 gcc/config/gcn/gcn-valu.md | 72 +++-
 gcc/config/gcn/gcn.cc  | 36 +++---
 gcc/config/gcn/gcn.h   |  3 +-
 gcc/config/gcn/gcn.md  | 74 +++---
 5 files changed, 59 insertions(+), 135 deletions(-)

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 6f5969d7bc8d..76f50ab9364f 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -52,7 +52,6 @@ extern enum gcn_isa {
 } gcn_isa;
 
 #define TARGET_GCN5 (gcn_isa == ISA_GCN5)
-#define TARGET_GCN5_PLUS (gcn_isa >= ISA_GCN5)
 #define TARGET_CDNA1 (gcn_isa == ISA_CDNA1)
 #define TARGET_CDNA1_PLUS (gcn_isa >= ISA_CDNA1)
 #define TARGET_CDNA2 (gcn_isa == ISA_CDNA2)
@@ -74,16 +73,12 @@ enum hsaco_attr_type
   HSACO_ATTR_DEFAULT
 };
 
-/* There are global address instructions.  */
-#define TARGET_GLOBAL_ADDRSPACE TARGET_GCN5_PLUS
 /* Device has an AVGPR register file.  */
 #define TARGET_AVGPRS TARGET_CDNA1_PLUS
 /* There are load/store instructions for AVGPRS.  */
 #define TARGET_AVGPR_

[gcc r15-1705] amdgcn: Fix RDNA V32 permutations [PR115640]

2024-06-28 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:ef0b30212f7756db15d7507bfd871bf377d7d648

commit r15-1705-gef0b30212f7756db15d7507bfd871bf377d7d648
Author: Andrew Stubbs 
Date:   Fri Jun 28 10:47:50 2024 +

amdgcn: Fix RDNA V32 permutations [PR115640]

There was an off-by-one error in the RDNA validation check, plus I forgot to
allow for two-to-one permute-and-merge operations.

PR target/115640

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Modify RDNA 
checks.

Diff:
---
 gcc/config/gcn/gcn.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190..aab9b59c519 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5134,7 +5134,7 @@ gcn_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
  Reject permutations that cross the boundary.  */
   if (TARGET_RDNA2_PLUS)
 for (unsigned int i = 0; i < nelt; i++)
-  if (i < 31 ? perm[i] > 31 : perm[i] < 32)
+  if (i < 32 ? (perm[i] % nelt) > 31 : (perm[i] % nelt) < 32)
return false;
 
   /* All vector permutations are possible on other architectures,


[gcc r15-1747] libgomp: change alloc-pinned tests failure mode

2024-07-01 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:90efaebf95c93244f6b1eda5cb8724e52047cecd

commit r15-1747-g90efaebf95c93244f6b1eda5cb8724e52047cecd
Author: Andrew Stubbs 
Date:   Wed Jun 12 08:43:53 2024 +

libgomp: change alloc-pinned tests failure mode

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it 
is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.

Diff:
---
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 20 ++--
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 20 ++--
 2 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
index 4185accf2e6..672f2453a78 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 
-/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu 
} } */
+/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } 
*/
 
 /* Test that pinned memory works.  */
 
@@ -19,7 +19,10 @@
   struct rlimit limit; \
   if (getrlimit (RLIMIT_MEMLOCK, &limit) \
   || limit.rlim_cur <= SIZE) \
-fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+{ \
+  fprintf (stderr, "insufficient lockable memory; please increase 
ulimit\n"); \
+  abort (); \
+} \
   }
 
 int
@@ -44,18 +47,7 @@ get_pinned_mem ()
   abort ();
 }
 #else
-#define PAGE_SIZE 1024 /* unknown */
-#define CHECK_SIZE(SIZE) { \
-  fprintf (stderr, "OS unsupported\n"); \
-  abort (); \
-  }
-#define EXPECT_OMP_NULL_ALLOCATOR
-
-int
-get_pinned_mem ()
-{
-  return 0;
-}
+#error "OS unsupported"
 #endif
 
 static void
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
index 0b9c11d0315..b6d1d83fb6f 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 
-/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu 
} } */
+/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } 
*/
 
 /* Test that pinned memory works (pool_size code path).  */
 
@@ -19,7 +19,10 @@
   struct rlimit limit; \
   if (getrlimit (RLIMIT_MEMLOCK, &limit) \
   || limit.rlim_cur <= SIZE) \
-fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+{ \
+  fprintf (stderr, "insufficient lockable memory; please increase 
ulimit\n"); \
+  abort (); \
+} \
   }
 
 int
@@ -44,18 +47,7 @@ get_pinned_mem ()
   abort ();
 }
 #else
-#define PAGE_SIZE 1024 /* unknown */
-#define CHECK_SIZE(SIZE) { \
-  fprintf (stderr, "OS unsupported\n"); \
-  abort (); \
-  }
-#define EXPECT_OMP_NULL_ALLOCATOR
-
-int
-get_pinned_mem ()
-{
-  return 0;
-}
+#error "OS unsupported"
 #endif
 
 static void


[gcc r15-1748] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-07-01 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:64001441ec99b80e457188ce50bb6c59c757d3c6

commit r15-1748-g64001441ec99b80e457188ce50bb6c59c757d3c6
Author: Andrew Stubbs 
Date:   Wed Jun 12 11:09:33 2024 +

libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the 
"ompx"
namespace and an independent enum baseline of 200 (selected to not clash 
with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature 
is
for use by the (planned) -foffload-memory=pinned feature.

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for 
the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 

Diff:
---
 gcc/fortran/openmp.cc  |  11 +-
 .../gfortran.dg/gomp/allocate-pinned-1.f90 |  16 +++
 libgomp/allocator.c| 115 +++--
 libgomp/env.c  |   1 +
 libgomp/libgomp.texi   |   7 +-
 libgomp/omp.h.in   |   1 +
 libgomp/omp_lib.f90.in |   2 +
 libgomp/omp_lib.h.in   |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c   | 100 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c   | 102 ++
 .../testsuite/libgomp.fortran/alloc-pinned-1.f90   |  16 +++
 11 files changed, 336 insertions(+), 37 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 9b30a108560..333f0c7fe7f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7423,8 +7423,9 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, 
gfc_namespace *ns,
 }
 
 /* Assume that a constant expression in the range 1 (omp_default_mem_alloc)
-   to 8 (omp_thread_mem_alloc) range is fine.  The original symbol name is
-   already lost during matching via gfc_match_expr.  */
+   to 8 (omp_thread_mem_alloc) range, or 200 (ompx_gnu_pinned_mem_alloc) is
+   fine.  The original symbol name is already lost during matching via
+   gfc_match_expr.  */
 static bool
 is_predefined_allocator (gfc_expr *expr)
 {
@@ -7433,8 +7434,10 @@ is_predefined_allocator (gfc_expr *expr)
  && expr->ts.type == BT_INTEGER
  && expr->ts.kind == gfc_c_intptr_kind
  && expr->expr_type == EXPR_CONSTANT
- && mpz_sgn (expr->value.integer) > 0
- && mpz_cmp_si (expr->value.integer, 8) <= 0);
+ && ((mpz_sgn (expr->value.integer) > 0
+  && mpz_cmp_si (expr->value.integer, 8) <= 0)
+ || (mpz_cmp_si (expr->value.integer, 200) >= 0
+ && mpz_cmp_si (expr->value.integer, 200) <= 0)));
 }
 
 /* Resolve declarative ALLOCATE statement. Note: Common block vars only appear
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
new file mode 100644
index 000..0e6619b7853
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
@@ -0,0 +1,16 @@
+! Test that the ompx_gnu_pinned_mem_alloc is accepted by the parser
+
+module m
+use iso_c_binding
+integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_gnu_pinned_mem_alloc = 200
+end
+
+subroutine f ()
+

[gcc r15-1769] amdgcn: invent target feature flags

2024-07-02 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:68e034920bab9abd547503967f73b81cc37cfbf4

commit r15-1769-g68e034920bab9abd547503967f73b81cc37cfbf4
Author: Andrew Stubbs 
Date:   Fri Jun 28 15:13:59 2024 +

amdgcn: invent target feature flags

This is a first step towards having a device table so we can add new devices
more easily.  It'll also make it easier to remove the deprecated GCN3 bits.

The patch should not change the behaviour of anything.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New.
(TARGET_AVGPRS): New.
(TARGET_AVGPR_MEMOPS): New.
(TARGET_AVGPR_COMBINED): New.
(TARGET_FLAT_OFFSETS): New.
(TARGET_11BIT_GLOBAL_OFFSET): New.
(TARGET_CDNA2_MEM_COSTS): New.
(TARGET_WAVE64_COMPAT): New.
(TARGET_DPP_FULL): New.
(TARGET_DPP16): New.
(TARGET_DPP8): New.
(TARGET_AVGPR_CDNA1_NOPS): New.
(TARGET_VGPR_GRANULARITY): New.
(TARGET_ARCHITECTED_FLAT_SCRATCH): New.
(TARGET_EXPLICIT_CARRY): New.
(TARGET_MULTIPLY_IMMEDIATE): New.
(TARGET_SDWA): New.
(TARGET_WBINVL1_CACHE): New.
(TARGET_GLn_CACHE): New.
* config/gcn/gcn-valu.md (throughout): Change TARGET_GCN*,
TARGET_CDNA* and TARGET_RDNA* to use TARGET_ instead.
* config/gcn/gcn.cc (throughout): Likewise.
* config/gcn/gcn.md (throughout): Likewise.

Diff:
---
 gcc/config/gcn/gcn-opts.h  | 44 ++
 gcc/config/gcn/gcn-valu.md | 28 +++---
 gcc/config/gcn/gcn.cc  | 76 +++--
 gcc/config/gcn/gcn.md  | 94 --
 4 files changed, 155 insertions(+), 87 deletions(-)

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 1091035a69a..24e856bc0c3 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -80,4 +80,48 @@ enum hsaco_attr_type
   HSACO_ATTR_DEFAULT
 };
 
+/* There are global address instructions.  */
+#define TARGET_GLOBAL_ADDRSPACE TARGET_GCN5_PLUS
+/* Device has an AVGPR register file.  */
+#define TARGET_AVGPRS TARGET_CDNA1_PLUS
+/* There are load/store instructions for AVGPRS.  */
+#define TARGET_AVGPR_MEMOPS TARGET_CDNA2_PLUS
+/* AVGPRS may have their own register file, or be combined with VGPRS.  */
+#define TARGET_AVGPR_COMBINED TARGET_CDNA2_PLUS
+/* flat_load/store allows offsets.  */
+#define TARGET_FLAT_OFFSETS TARGET_GCN5_PLUS
+/* global_load/store has reduced offset.  */
+#define TARGET_11BIT_GLOBAL_OFFSET TARGET_RDNA2_PLUS
+/* The work item details are all encoded into v0.  */
+//#define TARGET_PACKED_WORK_ITEMS TARGET_PACKED_WORK_ITEMS
+/* m0 must be initialized in order to use LDS.  */
+//#define TARGET_M0_LDS_LIMIT TARGET_M0_LDS_LIMIT
+/* CDNA2 load/store costs are reduced.
+ * TODO: what does this mean?  */
+#define TARGET_CDNA2_MEM_COSTS TARGET_CDNA2_PLUS
+/* Wave32 devices running in wave64 compatibility mode.  */
+#define TARGET_WAVE64_COMPAT TARGET_RDNA2_PLUS
+/* RDNA devices have different DPP with reduced capabilities.  */
+#define TARGET_DPP_FULL !TARGET_RDNA2_PLUS
+#define TARGET_DPP16 TARGET_RDNA2_PLUS
+#define TARGET_DPP8 TARGET_RDNA2_PLUS
+/* Device requires CDNA1-style manually inserted wait states for AVGPRs.  */
+#define TARGET_AVGPR_CDNA1_NOPS TARGET_CDNA1
+/* The metadata on different devices need different granularity.  */
+#define TARGET_VGPR_GRANULARITY \
+  (TARGET_RDNA3 ? 12 \
+   : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 \
+   : 4)
+/* This mostly affects the metadata.  */
+#define TARGET_ARCHITECTED_FLAT_SCRATCH TARGET_RDNA3
+/* Assembler uses s_add_co not just s_add.  */
+#define TARGET_EXPLICIT_CARRY TARGET_GCN5_PLUS
+/* mulsi3 permits immediate.  */
+#define TARGET_MULTIPLY_IMMEDIATE TARGET_GCN5_PLUS
+/* Device has Sub-DWord Addressing instrucions.  */
+#define TARGET_SDWA (!TARGET_RDNA3)
+/* Different devices uses different cache control instructions.  */
+#define TARGET_WBINVL1_CACHE (!TARGET_RDNA2_PLUS)
+#define TARGET_GLn_CACHE TARGET_RDNA2_PLUS
+
 #endif
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index e8381d28c1b..b24cf9be32e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -983,7 +983,7 @@
(match_operand 2 "immediate_operand")]
   "MODE_VF (mode) < MODE_VF (mode)
&& mode == mode
-   && (!TARGET_RDNA2_PLUS || MODE_VF (mode) <= 32)"
+   && (!TARGET_WAVE64_COMPAT || MODE_VF (mode) <= 32)"
   {
 int numlanes = GET_MODE_NUNITS (mode);
 int firstlane = INTVAL (operands[2]) * numlanes;
@@ -1167,7 +1167,7 @@
 static char buf[200];
 if (AS_FLAT_P (as))
   {
-   if (TARGET_GCN5_PLUS)
+   if (TARGET_FLAT_OFFSETS)
  sprintf (buf, "flat_load%%o0\t%%0, %%1 offset:%%2%s\;s_waitcnt\t0",
   glc);
else
@@ -1290,7 +1290,7 @@
  UNSPEC_SCATTER))]
   "(AS_

[gcc r14-9593] amdgcn: Clean up device memory in gcn-run

2024-03-21 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:c3fb8a4d150586459a9fa177cb2aeeac5e4c0464

commit r14-9593-gc3fb8a4d150586459a9fa177cb2aeeac5e4c0464
Author: Andrew Stubbs 
Date:   Wed Mar 20 12:49:24 2024 +

amdgcn: Clean up device memory in gcn-run

gcc/ChangeLog:

* config/gcn/gcn-run.cc (main): Add an hsa_memory_free calls for 
each
device_malloc call.

Diff:
---
 gcc/config/gcn/gcn-run.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-run.cc b/gcc/config/gcn/gcn-run.cc
index d45ff3e6c2b..2f3ed2d41d2 100644
--- a/gcc/config/gcn/gcn-run.cc
+++ b/gcc/config/gcn/gcn-run.cc
@@ -755,7 +755,13 @@ main (int argc, char *argv[])
 
   /* Clean shut down.  */
   XHSA (hsa_fns.hsa_memory_free_fn (kernargs),
-   "Clean up device memory");
+   "Clean up device kernargs memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (args),
+   "Clean up device args memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (heap),
+   "Clean up device heap memory");
+  XHSA (hsa_fns.hsa_memory_free_fn (stack),
+   "Clean up device stack memory");
   XHSA (hsa_fns.hsa_executable_destroy_fn (executable),
"Clean up GCN executable");
   XHSA (hsa_fns.hsa_queue_destroy_fn (queue),


[gcc r14-9594] amdgcn: Ensure gfx11 is running in cumode

2024-03-21 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:69dc2dc7e0e853856b84b1bcc89d0241d8a570aa

commit r14-9594-g69dc2dc7e0e853856b84b1bcc89d0241d8a570aa
Author: Andrew Stubbs 
Date:   Mon Mar 4 15:48:47 2024 +

amdgcn: Ensure gfx11 is running in cumode

CUmode "on" is the setting for compatibility with GCN and CDNA devices.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode.

Diff:
---
 gcc/config/gcn/gcn-hsa.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 9cf181f52a4..c75256dbac3 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
  "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"


[gcc r14-9595] amdgcn: Comment correction

2024-03-21 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:a2fe34e0b993d5fb879d75ddb42b24b45c4b7242

commit r14-9595-ga2fe34e0b993d5fb879d75ddb42b24b45c4b7242
Author: Andrew Stubbs 
Date:   Mon Mar 4 15:52:00 2024 +

amdgcn: Comment correction

The location of the marker was changed, but the comment wasn't updated.
Fixed now.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction.

Diff:
---
 gcc/config/gcn/gcn.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index bc076d1120d..fca001811e5 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4932,8 +4932,8 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx 
/*subtarget */ ,
   }
 case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
   {
-   /* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
-  whether it was the first call.  */
+   /* Stash a marker in the unused upper 16 bits of QUEUE_PTR_ARG to
+  indicate whether it was the first call.  */
rtx result = gen_reg_rtx (BImode);
emit_move_insn (result, const0_rtx);
if (cfun->machine->args.reg[QUEUE_PTR_ARG] >= 0)


[gcc r14-9621] vect: more oversized bitmask fixups

2024-03-22 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:e4e02c07d93559a037608c73e8153549b5104fbb

commit r14-9621-ge4e02c07d93559a037608c73e8153549b5104fbb
Author: Andrew Stubbs 
Date:   Fri Mar 15 14:21:15 2024 +

vect: more oversized bitmask fixups

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the 
mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps (some cases were already
handled).

gcc/ChangeLog:

* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
bitmasks.
(do_compare_and_jump): Remove now-redundant similar code.
* internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
bitmasks.
(add_mask_and_len_args): Likewise.

Diff:
---
 gcc/dojump.cc  | 34 ++
 gcc/internal-fn.cc | 26 ++
 2 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..5f74b696b41 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code 
code, int unsignedp,
}
}
 
+  /* For boolean vectors with less than mode precision
+make sure to fill padding with consistent values.  */
+  if (val
+ && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+ && SCALAR_INT_MODE_P (mode))
+   {
+ auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+ if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+   {
+ op0 = expand_binop (mode, and_optab, op0,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ op1 = expand_binop (mode, and_optab, op1,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+   }
+   }
+
   emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
   if_true_label, prob);
 }
@@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
rtx_code signed_code,
   machine_mode mode;
   int unsignedp;
   enum rtx_code code;
-  unsigned HOST_WIDE_INT nunits;
 
   /* Don't crash if the comparison was erroneous.  */
   op0 = expand_normal (treeop0);
@@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
rtx_code signed_code,
   emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
   op1 = new_op1;
 }
-  /* For boolean vectors with less than mode precision
- make sure to fill padding with consistent values.  */
-  else if (VECTOR_BOOLEAN_TYPE_P (type)
-  && SCALAR_INT_MODE_P (mode)
-  && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
-  && maybe_ne (GET_MODE_PRECISION (mode), nunits))
-{
-  gcc_assert (code == EQ || code == NE);
-  op0 = expand_binop (mode, and_optab, op0,
- GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
- true, OPTAB_WIDEN);
-  op1 = expand_binop (mode, and_optab, op1,
- GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
- true, OPTAB_WIDEN);
-}
 
   do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
   ((mode == BLKmode)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
unsigned int noutputs,
   && SSA_NAME_IS_DEFAULT_DEF (rhs)
   && VAR_P (SSA_NAME_VAR (rhs)))
create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
+  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+  && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+   TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+   {
+ /* Ensure that the vector bitmasks do not have excess bits.  */
+ int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+ rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
+   }
   else
create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type)

[gcc r14-9623] amdgcn: Add gfx1103 target

2024-03-22 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:1bf18629c54adf4893c8db5227a36e1952ee69a3

commit r14-9623-g1bf18629c54adf4893c8db5227a36e1952ee69a3
Author: Andrew Stubbs 
Date:   Fri Mar 15 14:26:15 2024 +

amdgcn: Add gfx1103 target

Add support for the gfx1103 RDNA3 APU integrated graphics devices.  The ROCm
documentation warns that these may not be supported, but it seems to work
at least partially.

gcc/ChangeLog:

* config.gcc (amdgcn): Add gfx1103 entries.
* config/gcn/gcn-hsa.h (NO_XNACK): Likewise.
(gcn_local_sym_hash): Likewise.
* config/gcn/gcn-opts.h (enum processor_type): Likewise.
(TARGET_GFX1103): New macro.
* config/gcn/gcn.cc (gcn_option_override): Handle gfx1103.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
(gcn_hsa_declare_function_name): Use TARGET_RDNA3, not just gfx1100.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1103__.
* config/gcn/gcn.opt: Add gfx1103.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1103): New.
(main): Handle gfx1103.
* config/gcn/t-omp-device: Add gfx1103 isa.
* doc/install.texi (amdgcn): Add gfx1103.
* doc/invoke.texi (-march): Likewise.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1103.
(gcn_gfx1103_s): New.
(isa_hsa_name): Handle gfx1103.
(isa_code): Likewise.
(max_isa_vgprs): Likewise.

Diff:
---
 gcc/config.gcc  |  4 ++--
 gcc/config/gcn/gcn-hsa.h|  6 +++---
 gcc/config/gcn/gcn-opts.h   |  4 +++-
 gcc/config/gcn/gcn.cc   | 14 --
 gcc/config/gcn/gcn.h|  2 ++
 gcc/config/gcn/gcn.opt  |  3 +++
 gcc/config/gcn/mkoffload.cc |  5 +
 gcc/config/gcn/t-omp-device |  2 +-
 gcc/doc/install.texi| 13 +++--
 gcc/doc/invoke.texi |  3 +++
 libgomp/plugin/plugin-gcn.c | 10 +-
 11 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 040afabd9ec..87a5c92b6e3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4560,7 +4560,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 
| gfx1100)
+   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 
| gfx1100 | gfx1103)
# OK
;;
*)
@@ -4576,7 +4576,7 @@ case "${target}" in
TM_MULTILIB_CONFIG=
;;
xdefault | xyes)
-   TM_MULTILIB_CONFIG=`echo 
"gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed 
"s/${with_arch},\?//;s/,$//"`
+   TM_MULTILIB_CONFIG=`echo 
"gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1103" | sed 
"s/${with_arch},\?//;s/,$//"`
;;
*)
TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index c75256dbac3..ac32b8a328f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -90,7 +90,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
the ELF flags (e_flags) of that generated file must be identical to those
generated by the compiler.  */
 
-#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;" \
+#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;march=gfx1103:;" \
 /* These match the defaults set in gcn.cc.  */ \
 
"!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
 #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
@@ -106,8 +106,8 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" ABI_VERSION_SPEC "} " \
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
- "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
- "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
+ 
"%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+cumode} 
" \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 6be2c9204fa..285746f7f4d 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -26,7 +26,8 @@ enum processor_type
   PROCESSOR_GFX908,
   PROCESSOR_GFX90a,
   PROCESSOR_GFX1030,
-  PROCESSOR_GFX1100
+  PROCESSOR_GFX1100,
+  PROCESSOR_GFX1103
 };
 
 #define TARGET_FIJI (gcn_arch == PROCESSOR_FIJI)
@@ -36,6 +

[gcc r14-9626] amdgcn: Prefer V32 on RDNA devices

2024-03-22 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:6dedafe166cc02ae87b6a0699ad61ce3ffc46803

commit r14-9626-g6dedafe166cc02ae87b6a0699ad61ce3ffc46803
Author: Andrew Stubbs 
Date:   Thu Feb 22 11:41:19 2024 +

amdgcn: Prefer V32 on RDNA devices

We run these devices in wavefrontsize64 for compatibility, but they actually
only have 32-lane vectors, natively.  If the upper part of a V64 is masked
off (as it is in V32) then RDNA devices will skip execution of the upper 
part
for most operations, so this adjustment shouldn't leave too much 
performance on
the table.  One exception is memory instructions, so full wavefrontsize32
support would be better.

The advantage is that we avoid the missing V64 operations (such as permute 
and
vec_extract).

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 
on
RDNA devices.

Diff:
---
 gcc/config/gcn/gcn.cc | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 498146dcde9..efb73af50c4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
 static machine_mode
 gcn_vectorize_preferred_simd_mode (scalar_mode mode)
 {
+  /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
+ (in particular, permute operations are only available for cases that don't
+ span the 32-lane boundary).
+
+ From the RDNA3 manual: "Hardware may choose to skip either half if the
+ EXEC mask for that half is all zeros...". This means that preferring
+ 32-lanes is a good stop-gap until we have proper wave32 support.  */
+  if (TARGET_RDNA2_PLUS)
+switch (mode)
+  {
+  case E_QImode:
+   return V32QImode;
+  case E_HImode:
+   return V32HImode;
+  case E_SImode:
+   return V32SImode;
+  case E_DImode:
+   return V32DImode;
+  case E_SFmode:
+   return V32SFmode;
+  case E_DFmode:
+   return V32DFmode;
+  default:
+   return word_mode;
+  }
+
   switch (mode)
 {
 case E_QImode:


[gcc r14-9627] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-03-22 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:e194503b6f2cf5f1b819f4a8af9d16311a07e4f5

commit r14-9627-ge194503b6f2cf5f1b819f4a8af9d16311a07e4f5
Author: Andrew Stubbs 
Date:   Wed Mar 6 15:54:46 2024 +

amdgcn: Adjust GFX10/GFX11 cache coherency

The RDNA devices have different cache architectures to the CDNA devices, and
the differences go deeper than just the assembler mnemonics.

I believe this patch is correct according to the documentation in the LLVM
AMDGPU user guide (the ISA manual is less instructive), but I hadn't 
observed
any real problems before (or after).

gcc/ChangeLog:

* config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA.
(atomic_load): Adjust RDNA cache settings.
(atomic_store): Likewise.
(atomic_exchange): Likewise.

Diff:
---
 gcc/config/gcn/gcn.md | 86 ---
 1 file changed, 55 insertions(+), 31 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 3b51453aaca..574c2f87e8c 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1960,11 +1960,19 @@
 (define_insn "*memory_barrier"
   [(set (match_operand:BLK 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
-  ""
-  "{buffer_wbinvl1_vol|buffer_gl0_inv}"
+  "!TARGET_RDNA2_PLUS"
+  "buffer_wbinvl1_vol"
   [(set_attr "type" "mubuf")
(set_attr "length" "4")])
 
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0)
+   (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  "TARGET_RDNA2_PLUS"
+  "buffer_gl1_inv\;buffer_gl0_inv"
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])
+
 ; FIXME: These patterns have been disabled as they do not seem to work
 ; reliably - they can cause hangs or incorrect results.
 ; TODO: flush caches according to memory model
@@ -2094,9 +2102,13 @@
  case 0:
return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)";
  case 1:
-   return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0";
+   return (TARGET_RDNA2 /* Not GFX11.  */
+   ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0"
+   : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0");
  case 2:
-   return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)";
+   return (TARGET_RDNA2 /* Not GFX11.  */
+   ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)"
+   : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)");
  }
break;
   case MEMMODEL_CONSUME:
@@ -2108,15 +2120,21 @@
return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;"
   "s_dcache_wb_vol";
  case 1:
-   return (TARGET_RDNA2_PLUS
+   return (TARGET_RDNA2
+   ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;"
+ "buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
- "buffer_gl0_inv"
+ "buffer_gl1_inv\;buffer_gl0_inv"
: "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
  "buffer_wbinvl1_vol");
  case 2:
-   return (TARGET_RDNA2_PLUS
+   return (TARGET_RDNA2
+   ? "global_load%o0\t%0, %A1%O1 glc 
dlc\;s_waitcnt\tvmcnt(0)\;"
+ "buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
- "buffer_gl0_inv"
+ "buffer_gl1_inv\;buffer_gl0_inv"
: "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
  "buffer_wbinvl1_vol");
  }
@@ -2130,15 +2148,21 @@
return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;"
   "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
  case 1:
-   return (TARGET_RDNA2_PLUS
-   ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
- "s_waitcnt\t0\;buffer_gl0_inv"
+   return (TARGET_RDNA2
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 
glc dlc\;"
+ "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
+   : TARGET_RDNA3
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 
glc\;"
+ "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
: "buffer_wbinvl1_vol\;flat_load%o0\t%0, %A1%O1 glc\;"
  "s_waitcnt\t0\;buffer_wbinvl1_vol");
  case 2:
-   return (TARGET_RDNA2_PLUS
-   ? "buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
- "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+   return (TARGET_RDNA2
+   ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, 
%A1%O1 glc dlc\;"
+  

[gcc r15-2822] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:6f71e050a51378e1811b90fe9c16cd37bf4c48ec

commit r15-2822-g6f71e050a51378e1811b90fe9c16cd37bf4c48ec
Author: Andrew Stubbs 
Date:   Thu Aug 8 13:12:43 2024 +

amdgcn: Re-enable trampolines

The stacks are executable since the reverse-offload features were added, so
trampolines actually do work.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines.

Diff:
---
 gcc/config/gcn/gcn.cc | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 00f2978559bd..b22132de6ab7 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3799,11 +3799,6 @@ gcn_asm_trampoline_template (FILE *f)
 static void
 gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
-  // FIXME
-  if (TARGET_GCN5_PLUS)
-sorry ("nested function trampolines not supported on GCN5 due to"
-   " non-executable stacks");
-
   emit_block_move (m_tramp, assemble_trampoline_template (),
   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);


[gcc r15-2835] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:715317331994d3d69395056f77bfe7ac613af009

commit r15-2835-g715317331994d3d69395056f77bfe7ac613af009
Author: Andrew Stubbs 
Date:   Wed Aug 7 15:35:18 2024 +

amdgcn: Fix VGPR max count

The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means 
the
maximum usable number of registers is 252.  This patch prevents the compiler
from exceeding this artifical limit.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers
remaining after maximum allocation using TARGET_VGPR_GRANULARITY.

Diff:
---
 gcc/config/gcn/gcn.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b22132de6ab7..0725d15c8ed0 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2493,6 +2493,13 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass,
 static void
 gcn_conditional_register_usage (void)
 {
+  /* Some architectures have a register allocation granularity that does not
+ permit use of the full register count.  */
+  for (int i = 256 - (256 % TARGET_VGPR_GRANULARITY);
+   i < 256;
+   i++)
+fixed_regs[VGPR_REGNO (i)] = call_used_regs[VGPR_REGNO (i)] = 1;
+
   if (!cfun || !cfun->machine)
 return;


[gcc/devel/omp/gcc-14] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:ffc69480ea3eca06f6e445025b839d23848ee148

commit ffc69480ea3eca06f6e445025b839d23848ee148
Author: Andrew Stubbs 
Date:   Thu Aug 8 13:12:43 2024 +

amdgcn: Re-enable trampolines

The stacks are executable since the reverse-offload features were added, so
trampolines actually do work.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines.

(cherry picked from commit 6f71e050a51378e1811b90fe9c16cd37bf4c48ec)

Diff:
---
 gcc/config/gcn/gcn.cc | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190c..4c212e248764 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3798,10 +3798,6 @@ gcn_asm_trampoline_template (FILE *f)
 static void
 gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
-  if (TARGET_GCN5_PLUS)
-sorry ("nested function trampolines not supported on GCN5 due to"
-   " non-executable stacks");
-
   emit_block_move (m_tramp, assemble_trampoline_template (),
   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);


[gcc/devel/omp/gcc-14] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:6d3c68ff05cf2b681c68db8dd0e2936cc34f2c40

commit 6d3c68ff05cf2b681c68db8dd0e2936cc34f2c40
Author: Andrew Stubbs 
Date:   Wed Aug 7 15:35:18 2024 +

amdgcn: Fix VGPR max count

The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means 
the
maximum usable number of registers is 252.  This patch prevents the compiler
from exceeding this artifical limit.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers
remaining after maximum allocation using TARGET_VGPR_GRANULARITY.

(cherry picked from commit 715317331994d3d69395056f77bfe7ac613af009)

Diff:
---
 gcc/config/gcn/gcn.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 4c212e248764..e5fc05afbf86 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2492,6 +2492,16 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass,
 static void
 gcn_conditional_register_usage (void)
 {
+  /* Some architectures have a register allocation granularity that does not
+ permit use of the full register count.  */
+  int vgpr_block_size = (TARGET_RDNA3 ? 12
+: TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8
+: 4);
+  for (int i = 256 - (256 % vgpr_block_size);
+   i < 256;
+   i++)
+fixed_regs[VGPR_REGNO (i)] = call_used_regs[VGPR_REGNO (i)] = 1;
+
   if (!cfun || !cfun->machine)
 return;


[gcc r15-2846] amdgcn: Add padding to trampoline

2024-08-09 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:b5a09a68bf0feaf0b0678d8f3433f776238d3896

commit r15-2846-gb5a09a68bf0feaf0b0678d8f3433f776238d3896
Author: Andrew Stubbs 
Date:   Fri Aug 9 11:45:42 2024 +

amdgcn: Add padding to trampoline

This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c).

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align.
* config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40.

Diff:
---
 gcc/config/gcn/gcn.cc | 1 +
 gcc/config/gcn/gcn.h  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 0725d15c8ed0..17316a7ddb84 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3794,6 +3794,7 @@ gcn_asm_trampoline_template (FILE *f)
   asm_fprintf (f, "\ts_mov_b32\ts%i, 0x\n", CC_SAVE_REG);
   asm_fprintf (f, "\ts_mov_b32\ts%i, 0x\n", CC_SAVE_REG + 1);
   asm_fprintf (f, "\ts_setpc_b64\ts[%i:%i]\n", CC_SAVE_REG, CC_SAVE_REG + 1);
+  asm_fprintf (f, "\t.align 8\n");
 }
 
 /* Implement TARGET_TRAMPOLINE_INIT.
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index e3bfd29c17d2..bd2afa61c10b 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -831,7 +831,7 @@ enum gcn_builtin_codes
 #define PROFILE_BEFORE_PROLOGUE 0
 
 /* Trampolines */
-#define TRAMPOLINE_SIZE 36
+#define TRAMPOLINE_SIZE 40  /* 36 + 4 padding for alignment.  */
 #define TRAMPOLINE_ALIGNMENT 64
 
 /* MD Optimization.


[gcc r15-4519] amdgcn: silence warning

2024-10-21 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:0b6d94ce72b2f35dbee7c42774d6972671c86f97

commit r15-4519-g0b6d94ce72b2f35dbee7c42774d6972671c86f97
Author: Andrew Stubbs 
Date:   Mon Sep 16 12:31:59 2024 +

amdgcn: silence warning

FIRST_SGPR_REG is register zero so the compiler always claims this 
comparison
is redundant.  It's right, of course, but I'd have preferred to keep the
comparison for completeness.  Probably the "correct" solution is to use an 
enum
for these values.

gcc/ChangeLog:

* config/gcn/gcn.h (SGPR_REGNO_P): Silence warning.

Diff:
---
 gcc/config/gcn/gcn.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 1a4631dd39f6..faefe68cdfa9 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -191,7 +191,7 @@ STATIC_ASSERT (LAST_AVGPR_REG + 1 - FIRST_AVGPR_REG == 256);
 #define HARD_FRAME_POINTER_IS_ARG_POINTER   0
 #define HARD_FRAME_POINTER_IS_FRAME_POINTER 0
 
-#define SGPR_REGNO_P(N)((N) >= FIRST_SGPR_REG && (N) <= 
LAST_SGPR_REG)
+#define SGPR_REGNO_P(N)(/*(N) >= FIRST_SGPR_REG &&*/ (N) <= 
LAST_SGPR_REG)
 #define VGPR_REGNO_P(N)((N) >= FIRST_VGPR_REG && (N) <= 
LAST_VGPR_REG)
 #define AVGPR_REGNO_P(N)((N) >= FIRST_AVGPR_REG && (N) <= 
LAST_AVGPR_REG)
 #define SSRC_REGNO_P(N)((N) <= SCC_REG && (N) != VCCZ_REG)


[gcc r15-4540] amdgcn: Refactor device settings into a def file

2024-10-22 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:a6b26e5ea09779bf276dff52a6692f3bb655d230

commit r15-4540-ga6b26e5ea09779bf276dff52a6692f3bb655d230
Author: Andrew Stubbs 
Date:   Tue Sep 17 15:26:04 2024 +

amdgcn: Refactor device settings into a def file

Almost all device-specific settings are now centralised into gcn-devices.def
for the compiler, mkoffload, and libgomp.  No longer will we have to touch 
10
files in multiple places just to add another device without any exotic
features.  (New ISAs and devices with incompatible metadata will continue to
need a bit more.)

In order to remove the device-specific conditionals in the code a new value
HSACO_ATTR_UNSUPPORTED has been added, indicating that the assembler will
reject any setting of that option.

This incorporates some of Tobias's patch from March 2024.

Co-Authored-By: Tobias Burnus 

gcc/ChangeLog:

* config.gcc (amdgcn): Add gcn-device-macros.h to tm_file.
Add gcn-tables.opt to extra_options.
* config/gcn/gcn-hsa.h (NO_XNACK): Delete.
(NO_SRAM_ECC): Delete.
(SRAMOPT): Move definition to generated file gcn-device-macros.h.
(XNACKOPT): Likewise.
(ASM_SPEC): Redefine using generated values from 
gcn-device-macros.h.
* config/gcn/gcn-opts.h
(enum processor_type): Generate from gcn-devices.def.
(TARGET_VEGA10): Delete.
(TARGET_VEGA20): Delete.
(TARGET_GFX908): Delete.
(TARGET_GFX90a): Delete.
(TARGET_GFX90c): Delete.
(TARGET_GFX1030): Delete.
(TARGET_GFX1036): Delete.
(TARGET_GFX1100): Delete.
(TARGET_GFX1103): Delete.
(TARGET_XNACK): Redefine to allow for HSACO_ATTR_UNSUPPORTED.
(enum hsaco_attr_type): Add HSACO_ATTR_UNSUPPORTED.
(TARGET_TGSPLIT): New define.
* config/gcn/gcn.cc (gcn_devices): New constant table.
(gcn_option_override): Rework to use gcn_devices table.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
(gcn_hsa_declare_function_name): Rework using TARGET_* macros.
* config/gcn/gcn.h (gcn_devices): Declare struct and table.
(TARGET_CPU_CPP_BUILTINS): Rework using gcn_devices.
* config/gcn/gcn.opt: Move enum data to generated file 
gcn-tables.opt.
Use new names for the default values.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX900): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX906): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX908): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90a): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90c): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1030): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1036): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1100): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1103): Delete.
(enum elf_arch_code): Define using gcn-devices.def.
(get_arch): Rework using gcn-devices.def.
(main): Rework using gcn-devices.def
* config/gcn/t-gcn-hsa (gcn-tables.opt): Generate file.
(gcn-device-macros.h): Generate file.
* config/gcn/t-omp-device: Generate isa list from gcn-devices.def.
* config/gcn/gcn-devices.def: New file.
* config/gcn/gcn-tables.opt: New file.
* config/gcn/gcn-tables.opt.urls: New file.
* config/gcn/gen-gcn-device-macros.awk: New file.
* config/gcn/gen-opt-tables.awk: New file.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Generate from 
gcn-devices.def.
(gcn_gfx803_s): Delete.
(gcn_gfx900_s): Delete.
(gcn_gfx906_s): Delete.
(gcn_gfx908_s): Delete.
(gcn_gfx90a_s): Delete.
(gcn_gfx90c_s): Delete.
(gcn_gfx1030_s): Delete.
(gcn_gfx1036_s): Delete.
(gcn_gfx1100_s): Delete.
(gcn_gfx1103_s): Delete.
(gcn_isa_name_len): Delete.
(isa_hsa_name): Rename ...
(isa_name): ... to this, and rework using gcn-devices.def.
(isa_gcc_name): Delete.
(isa_code): Rework using gcn-devices.def.
(max_isa_vgprs): Rework using gcn-devices.def.
(isa_matches_agent): Update isa_name usage.
(GOMP_OFFLOAD_init_device): Improve diagnostic using the name.

Diff:
---
 gcc/config.gcc   |   3 +-
 gcc/config/gcn/gcn-devices.def   | 143 +++
 gcc/config/gcn/gcn-hsa.h |  23 ++---
 gcc/config/gcn/gcn-opts.h|  31 +++
 gcc/config/gcn/gcn-tables.opt|  52 +++
 gcc/config/gcn/gcn-tables.opt.urls   |   2 +
 gcc/config/gcn/gcn.cc| 132 ++-

[gcc r15-4989] openmp: Fix signed/unsigned warning

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:345eb9b795d9728733bd0e472529e259ce796ff6

commit r15-4989-g345eb9b795d9728733bd0e472529e259ce796ff6
Author: Andrew Stubbs 
Date:   Wed Nov 6 17:50:00 2024 +

openmp: Fix signed/unsigned warning

My previous patch broke things when building with Werror.

gcc/ChangeLog:

* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.

Diff:
---
 gcc/omp-general.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 1ae575ee181f..72fb7f92ff70 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1005,7 +1005,7 @@ omp_max_vf (bool offload)
   for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
{
  if (startswith (c, "amdgcn"))
-   return ordered_max (64, omp_max_vf (false));
+   return ordered_max (poly_uint64 (64), omp_max_vf (false));
  else if ((c = strchr (c, ':')))
c++;
}


[gcc r15-4987] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:2a2e6e9894f42fef9315aaad80c36843718ca0cb

commit r15-4987-g2a2e6e9894f42fef9315aaad80c36843718ca0cb
Author: Andrew Stubbs 
Date:   Fri Nov 1 15:00:25 2024 +

openmp: Add IFN_GOMP_MAX_VF

Delay omp_max_vf call until after the host and device compilers have 
diverged
so that the max_vf value can be tuned exactly right on both variants.

This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.

gcc/ChangeLog:

* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.

Diff:
---
 gcc/internal-fn.cc  |  8 
 gcc/internal-fn.def |  1 +
 gcc/omp-expand.cc   | 30 ++
 gcc/omp-offload.cc  |  3 +++
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 1b3fe7be0479..0ee5f5bc7c55 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -510,6 +510,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *)
 
 /* This should get expanded in omp_device_lower pass.  */
 
+static void
+expand_GOMP_MAX_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
 static void
 expand_GOMP_TARGET_REV (internal_fn, gcall *)
 {
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d4559382711..c3d0efc0f2c3 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -465,6 +465,7 @@ DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | 
ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_EXIT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_MAX_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, 
NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ORDERED_PRED, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VOTE_ANY, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index b0f9d375b6c7..80fb1843445d 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -229,15 +229,29 @@ omp_adjust_chunk_size (tree chunk_size, bool 
simd_schedule, bool offload)
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf (offload);
-  if (known_eq (vf, 1U))
-return chunk_size;
-
+  tree vf;
   tree type = TREE_TYPE (chunk_size);
-  chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size,
-   build_int_cst (type, vf - 1));
-  return fold_build2 (BIT_AND_EXPR, type, chunk_size,
- build_int_cst (type, -vf));
+
+  if (offload)
+{
+  cfun->curr_properties &= ~PROP_gimple_lomp_dev;
+  vf = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_MAX_VF,
+unsigned_type_node, 0);
+  vf = fold_convert (type, vf);
+}
+  else
+{
+  poly_uint64 vf_num = omp_max_vf (false);
+  if (known_eq (vf_num, 1U))
+   return chunk_size;
+  vf = build_int_cst (type, vf_num);
+}
+
+  tree vf_minus_one = fold_build2 (MINUS_EXPR, type, vf,
+  build_int_cst (type, 1));
+  tree negative_vf = fold_build1 (NEGATE_EXPR, type, vf);
+  chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size, vf_minus_one);
+  return fold_build2 (BIT_AND_EXPR, type, chunk_size, negative_vf);
 }
 
 /* Collect additional arguments needed to emit a combined
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 25ce8133fe5e..372b019f9d60 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2754,6 +2754,9 @@ execute_omp_device_lower ()
  case IFN_GOMP_SIMT_VF:
rhs = build_int_cst (type, vf);
break;
+ case IFN_GOMP_MAX_VF:
+   rhs = build_int_cst (type, omp_max_vf (false));
+   break;
  case IFN_GOMP_SIMT_ORDERED_PRED:
rhs = vf == 1 ? integer_zero_node : NULL_TREE;
if (rhs || !lhs)


[gcc r15-4988] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:d334f729e53867b838e867375b3f475ba793d96e

commit r15-4988-gd334f729e53867b838e867375b3f475ba793d96e
Author: Andrew Stubbs 
Date:   Wed Nov 6 12:26:08 2024 +

openmp: Add testcases for omp_max_vf

Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, 
when
offloading is enabled ("target" directives are present), and is inactive
otherwise.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/gomp/max_vf-1.c   | 37 ++
 libgomp/testsuite/libgomp.c/max_vf-1.c | 47 ++
 libgomp/testsuite/libgomp.c/max_vf-2.c | 21 +++
 3 files changed, 105 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c 
b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
new file mode 100644
index ..0513aae226ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
@@ -0,0 +1,37 @@
+/* Test that omp parallel simd schedule uses the correct max_vf for the
+   host system, when no target directives are present.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp" } */
+
+/* Fix a max_vf size so we can scan for it.
+{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 1024
+int a[N], b[N], c[N];
+
+void
+f2 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd: static, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Make sure the max_vf is inlined as a number.
+   Hopefully there are no unrelated uses of these numbers ...
+{ dg-final { scan-tree-dump-times {\* 16} 2 "ompexp" { target { x86_64-*-* } } 
} }
+{ dg-final { scan-tree-dump-times {\+ 16} 1 "ompexp" { target { x86_64-*-* } } 
} } */
+
+void
+f3 (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd : dynamic, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Make sure the max_vf is inlined as a number.
+{ dg-final { scan-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 16, 0\);} 1 "ompexp" { 
target { x86_64-*-* } } } } */
diff --git a/libgomp/testsuite/libgomp.c/max_vf-1.c 
b/libgomp/testsuite/libgomp.c/max_vf-1.c
new file mode 100644
index ..be900c565a37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/max_vf-1.c
@@ -0,0 +1,47 @@
+/* Test that omp parallel simd schedule uses the correct max_vf for the
+   host system, when target directives are present.  */
+
+/* { dg-require-effective-target offloading_enabled } */
+
+/* { dg-do link } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp 
-foffload=-fdump-tree-optimized" } */
+
+/* Fix a max_vf size so we can scan for it.
+{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 1024
+int a[N], b[N], c[N];
+
+/* Test both static schedules and inline target directives.  */
+void
+f2 (void)
+{
+  int i;
+  #pragma omp target parallel for simd schedule (simd: static, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Test both dynamic schedules and declare target functions.  */
+#pragma omp declare target
+void
+f3 (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd : dynamic, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+#pragma omp end declare target
+
+/* Make sure that the max_vf is used as an IFN.
+{ dg-final { scan-tree-dump-times {GOMP_MAX_VF} 2 "ompexp" { target { 
x86_64-*-* i?86-*-* } } } } */
+
+/* Make sure the max_vf is passed as a temporary variable.
+{ dg-final { scan-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, D\.[0-9]*, 0\);} 1 
"ompexp" { target { x86_64-*-* i?86-*-* } } } } */
+
+/* Test SIMD offload devices
+{ dg-final { scan-offload-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 64, 0\);} 1 
"optimized" { target { offload_gcn } } } } 
+{ dg-final { scan-offload-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 7, 0\);} 1 "optimized" 
{ target { offload_nvptx } } } } */
+
+int main() {}
diff --git a/libgomp/testsuite/libgomp.c/max_vf-2.c 
b/libgomp/testsuite/libgomp.c/max_vf-2.c
new file mode 100644
index ..91744c309df8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/max_vf-2.c
@@ -0,0 +1,21 @@
+/* Ensure that the default safelen is set correctly for the larger of the host
+   and offload device, to prevent defeating the vectorizer.  */
+ 
+/* { dg-require-effective-target offloading_enabled } */
+
+/* { dg-do link } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-omplower" } */
+
+int f(float *a, float *b, int n)
+{
+  float sum = 0;
+  #pragma omp target teams distribute parallel for simd map(tofrom: sum) 
reduction(+:sum)
+  for (int i = 0; i < n; i++)
+sum += a[i] * b[i];
+  return sum;
+}
+
+/* Make sure that 

[gcc r15-4985] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:5c9de3df8547682bfb3d484d7d28a27776bf979c

commit r15-4985-g5c9de3df8547682bfb3d484d7d28a27776bf979c
Author: Andrew Stubbs 
Date:   Mon Oct 21 12:29:54 2024 +

openmp: Tune omp_max_vf for offload targets

If requested, return the vectorization factor appropriate for the offload
device, if any.

This change gives a significant speedup in the BabelStream "dot" benchmark 
on
amdgcn.

The omp_adjust_chunk_size usecase is set "false", for now, but I intend to
change that in a follow-up patch.

Note that NVPTX SIMT offload does not use this code-path.

gcc/ChangeLog:

* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set
omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.

Diff:
---
 gcc/gimple-loop-versioning.cc |  2 +-
 gcc/omp-expand.cc |  2 +-
 gcc/omp-general.cc| 17 +++--
 gcc/omp-general.h |  2 +-
 gcc/omp-low.cc|  3 ++-
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 107b00200247..2968c929d04a 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -554,7 +554,7 @@ loop_versioning::loop_versioning (function *fn)
  handled efficiently by scalar code.  omp_max_vf calculates the
  maximum number of bytes in a vector, when such a value is relevant
  to loop optimization.  */
-  m_maximum_scale = estimated_poly_value (omp_max_vf ());
+  m_maximum_scale = estimated_poly_value (omp_max_vf (false));
   m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE);
 }
 
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index b0b4ddf5dbc8..907fd46a5b26 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -212,7 +212,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf ();
+  poly_uint64 vf = omp_max_vf (false);
   if (known_eq (vf, 1U))
 return chunk_size;
 
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index f74b9bf5e96c..1ae575ee181f 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -987,10 +987,11 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void 
*data)
   return NULL_TREE;
 }
 
-/* Return maximum possible vectorization factor for the target.  */
+/* Return maximum possible vectorization factor for the target, or for
+   the OpenMP offload target if one exists.  */
 
 poly_uint64
-omp_max_vf (void)
+omp_max_vf (bool offload)
 {
   if (!optimize
   || optimize_debug
@@ -999,6 +1000,18 @@ omp_max_vf (void)
  && OPTION_SET_P (flag_tree_loop_vectorize)))
 return 1;
 
+  if (ENABLE_OFFLOADING && offload)
+{
+  for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
+   {
+ if (startswith (c, "amdgcn"))
+   return ordered_max (64, omp_max_vf (false));
+ else if ((c = strchr (c, ':')))
+   c++;
+   }
+  /* Otherwise, fall through to host VF.  */
+}
+
   auto_vector_modes modes;
   targetm.vectorize.autovectorize_vector_modes (&modes, true);
   if (!modes.is_empty ())
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index f37781316269..70f78d2055b7 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -162,7 +162,7 @@ extern void omp_extract_for_data (gomp_for *for_stmt, 
struct omp_for_data *fd,
  struct omp_for_data_loop *loops);
 extern gimple *omp_build_barrier (tree lhs);
 extern tree find_combined_omp_for (tree *, int *, void *);
-extern poly_uint64 omp_max_vf (void);
+extern poly_uint64 omp_max_vf (bool);
 extern int omp_max_simt_vf (void);
 extern const char *omp_context_name_list_prop (tree);
 extern void omp_construct_traits_to_codes (tree, int, enum tree_code *);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 44c4310075bf..70a2c108fbca 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4589,7 +4589,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context 
*ctx,
 {
   if (known_eq (sctx->max_vf, 0U))
 {
-  sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
+  sctx->max_vf = (sctx->is_simt ? omp_max_simt_vf ()
+ : omp_max_vf (omp_maybe_offloaded_ctx (ctx)));
   if (maybe_gt (sctx->max_vf, 1U))
{
  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),


[gcc r15-4986] openmp: use offload max_vf for chunk_size

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:896c6c28939f0b1eb6582231d24ea07ce01d071e

commit r15-4986-g896c6c28939f0b1eb6582231d24ea07ce01d071e
Author: Andrew Stubbs 
Date:   Fri Nov 1 13:53:34 2024 +

openmp: use offload max_vf for chunk_size

The chunk size for SIMD loops should be right for the current device; too 
big
allocates too much memory, too small is inefficient.  Getting it wrong 
doesn't
actually break anything though.

This patch attempts to choose the optimal setting based on the context.  
Both
host-fallback and device will get the same chunk size, but device 
performance
is the most important in this case.

gcc/ChangeLog:

* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call 
to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.

Diff:
---
 gcc/omp-expand.cc | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 907fd46a5b26..b0f9d375b6c7 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -127,6 +127,23 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Return true is REGION is or is contained within an offload region.  */
+
+static bool
+is_in_offload_region (struct omp_region *region)
+{
+  gimple *entry_stmt = last_nondebug_stmt (region->entry);
+  if (is_gimple_omp (entry_stmt)
+  && is_gimple_omp_offloaded (entry_stmt))
+return true;
+  else if (region->outer)
+return is_in_offload_region (region->outer);
+  else
+return (lookup_attribute ("omp declare target",
+ DECL_ATTRIBUTES (current_function_decl))
+   != NULL);
+}
+
 /* Given two blocks PAR_ENTRY_BB and WS_ENTRY_BB such that WS_ENTRY_BB
is the immediate dominator of PAR_ENTRY_BB, return true if there
are no data dependencies that would prevent expanding the parallel
@@ -207,12 +224,12 @@ workshare_safe_to_combine_p (basic_block ws_entry_bb)
presence (SIMD_SCHEDULE).  */
 
 static tree
-omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
+omp_adjust_chunk_size (tree chunk_size, bool simd_schedule, bool offload)
 {
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf (false);
+  poly_uint64 vf = omp_max_vf (offload);
   if (known_eq (vf, 1U))
 return chunk_size;
 
@@ -228,7 +245,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
expanded.  */
 
 static vec *
-get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
+get_ws_args_for (gimple *par_stmt, gimple *ws_stmt, bool offload)
 {
   tree t;
   location_t loc = gimple_location (ws_stmt);
@@ -270,7 +287,7 @@ get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
   if (fd.chunk_size)
{
  t = fold_convert_loc (loc, long_integer_type_node, fd.chunk_size);
- t = omp_adjust_chunk_size (t, fd.simd_schedule);
+ t = omp_adjust_chunk_size (t, fd.simd_schedule, offload);
  ws_args->quick_push (t);
}
 
@@ -366,7 +383,8 @@ determine_parallel_type (struct omp_region *region)
 
   region->is_combined_parallel = true;
   region->inner->is_combined_parallel = true;
-  region->ws_args = get_ws_args_for (par_stmt, ws_stmt);
+  region->ws_args = get_ws_args_for (par_stmt, ws_stmt,
+is_in_offload_region (region));
 }
 }
 
@@ -3929,6 +3947,7 @@ expand_omp_for_generic (struct omp_region *region,
   tree *counts = NULL;
   int i;
   bool ordered_lastprivate = false;
+  bool offload = is_in_offload_region (region);
 
   gcc_assert (!broken_loop || !in_combined_parallel);
   gcc_assert (fd->iter_type == long_integer_type_node
@@ -4196,7 +4215,7 @@ expand_omp_for_generic (struct omp_region *region,
  if (fd->chunk_size)
{
  t = fold_convert (fd->iter_type, fd->chunk_size);
- t = omp_adjust_chunk_size (t, fd->simd_schedule);
+ t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
  if (sched_arg)
{
  if (fd->ordered)
@@ -4240,7 +4259,7 @@ expand_omp_for_generic (struct omp_region *region,
{
  tree bfn_decl = builtin_decl_explicit (start_fn);
  t = fold_convert (fd->iter_type, fd->chunk_size);
- t = omp_adjust_chunk_size (t, fd->simd_schedule);
+ t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
  if (sched_arg)
t = build_call_expr (bfn_decl, 10, t5, t0, t1, t2, sched_arg,
 t, t3, t4, reductions, mem);
@@ -5937,7 +5956,8 @@ expand_omp_for_static_chunk (struct om

[gcc r15-5010] openmp: Fix max_vf testcases with -march=cascadelake

2024-11-07 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:4e91d0587200cf801b42abd74a837e0b3ce635d5

commit r15-5010-g4e91d0587200cf801b42abd74a837e0b3ce635d5
Author: Andrew Stubbs 
Date:   Thu Nov 7 11:23:41 2024 +

openmp: Fix max_vf testcases with -march=cascadelake

Apparently we need to explicitly disable AVX, not just enabled SSE, to
guarentee the 16-lane vectors we need for the pattern match.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.

Diff:
---
 gcc/testsuite/gcc.dg/gomp/max_vf-1.c   | 2 +-
 libgomp/testsuite/libgomp.c/max_vf-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c 
b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
index 0513aae226ce..d4617940eb29 100644
--- a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
@@ -5,7 +5,7 @@
 /* { dg-options "-fopenmp -O2 -fdump-tree-ompexp" } */
 
 /* Fix a max_vf size so we can scan for it.
-{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+{ dg-additional-options "-msse2 -mno-avx" { target { x86_64-*-* i?86-*-* } } } 
*/
 
 #define N 1024
 int a[N], b[N], c[N];
diff --git a/libgomp/testsuite/libgomp.c/max_vf-1.c 
b/libgomp/testsuite/libgomp.c/max_vf-1.c
index be900c565a37..9c8d5dc0af97 100644
--- a/libgomp/testsuite/libgomp.c/max_vf-1.c
+++ b/libgomp/testsuite/libgomp.c/max_vf-1.c
@@ -7,7 +7,7 @@
 /* { dg-options "-fopenmp -O2 -fdump-tree-ompexp 
-foffload=-fdump-tree-optimized" } */
 
 /* Fix a max_vf size so we can scan for it.
-{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+{ dg-additional-options "-msse2 -mno-avx" { target { x86_64-*-* i?86-*-* } } } 
*/
 
 #define N 1024
 int a[N], b[N], c[N];


[gcc r15-5459] amdgcn: Fix build failure (PR117657)

2024-11-19 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:234da38a0e68a204a59562fcca2aa6d297bc21ed

commit r15-5459-g234da38a0e68a204a59562fcca2aa6d297bc21ed
Author: Andrew Stubbs 
Date:   Tue Nov 19 12:01:22 2024 +

amdgcn: Fix build failure (PR117657)

The last patch did the right thing to the wrong parameter, which caused a 
build
failure in Newlib.  This patch fixes it.

gcc/ChangeLog:

PR target/117657
* config/gcn/gcn-valu.md (mask_gather_load): Fix bug in
maskload else patch.

Diff:
---
 gcc/config/gcn/gcn-valu.md | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index ce7a68f0e2d3..f7ed0b825a16 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4038,16 +4038,17 @@
 if (GET_MODE (addr) == mode)
   emit_insn (gen_gather_insn_1offset_exec (operands[0], addr,
 const0_rtx, const0_rtx,
+const0_rtx,
 gcn_gen_undef
(mode),
-operands[0], exec));
+exec));
 else
   emit_insn (gen_gather_insn_2offsets_exec (operands[0], operands[1],
  addr, const0_rtx,
- const0_rtx,
+ const0_rtx, const0_rtx,
  gcn_gen_undef
(mode),
- operands[0], exec));
+ exec));
 DONE;
   })


[gcc/devel/omp/gcc-14] openmp: Fix error reporting in parsing of C++ OpenMP to/from clause

2024-12-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:9e3aeec74092e91b7f66d2cc5dc5885ef728d5b6

commit 9e3aeec74092e91b7f66d2cc5dc5885ef728d5b6
Author: Kwok Cheung Yeung 
Date:   Mon Oct 7 16:19:39 2024 +0100

openmp: Fix error reporting in parsing of C++ OpenMP to/from clause

The final 'else' when checking the motion modifiers is nested one level
too deep.

This patch should be folded into "OpenMP: Enable 'declare mapper' mappers 
for
'target update' directives" when pushing upstream.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_from_to): Move an "else" clause to
a higher nesting level.

Diff:
---
 gcc/cp/ChangeLog.omp |  6 ++
 gcc/cp/parser.cc | 20 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp
index f0b50bea7f9f..1136e686b18c 100644
--- a/gcc/cp/ChangeLog.omp
+++ b/gcc/cp/ChangeLog.omp
@@ -1,3 +1,9 @@
+2024-12-06  Andrew Stubbs  
+   Kwok Cheung Yeung  
+
+   * parser.cc (cp_parser_omp_clause_from_to): Move an "else" clause to
+   a higher nesting level.
+
 2024-05-15  Jakub Jelinek  
 
* semantics.cc (finish_omp_clauses): Diagnose grainsize
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4157d912039c..f52446c5e46a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -42058,16 +42058,16 @@ cp_parser_omp_clause_from_to (cp_parser *parser, enum 
omp_clause_code kind,
  mapper_modifier = true;
  pos += 3;
}
- else
-   {
- cp_parser_error (parser, "% or % clause with "
-  "modifier other than % or %");
- cp_parser_skip_to_closing_parenthesis (parser,
-/*recovering=*/true,
-/*or_comma=*/false,
-/*consume_paren=*/true);
- return list;
-   }
+   }
+  else
+   {
+ cp_parser_error (parser, "% or % clause with "
+  "modifier other than % or %");
+ cp_parser_skip_to_closing_parenthesis (parser,
+/*recovering=*/true,
+/*or_comma=*/false,
+/*consume_paren=*/true);
+ return list;
}
 }


[gcc r16-134] OpenMP, GCN: Add interop-hsa testcase

2025-04-25 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:8d84ea28510054fbbb8a2b7441916bd75e29163f

commit r16-134-g8d84ea28510054fbbb8a2b7441916bd75e29163f
Author: Andrew Stubbs 
Date:   Thu Apr 24 16:50:08 2025 +

OpenMP, GCN: Add interop-hsa testcase

This testcase ensures that the interop HSA support is sufficient to run
a kernel manually on the same device.

libgomp/ChangeLog:

* testsuite/libgomp.c/interop-hsa.c: New test.

Diff:
---
 libgomp/testsuite/libgomp.c/interop-hsa.c | 203 ++
 1 file changed, 203 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/interop-hsa.c 
b/libgomp/testsuite/libgomp.c/interop-hsa.c
new file mode 100644
index ..cf8bc90bb9c0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/interop-hsa.c
@@ -0,0 +1,203 @@
+/* { dg-additional-options "-ldl" } */
+/* { dg-require-effective-target offload_device_gcn } */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../../../include/hsa.h"
+#include "../../config/gcn/libgomp-gcn.h"
+
+#define STACKSIZE (100 * 1024)
+#define HEAPSIZE (10 * 1024 * 1024)
+#define ARENASIZE HEAPSIZE
+
+/* This code fragment must be optimized or else the host-fallback kernel has
+ * invalid ASM inserts.  The rest of the file can be compiled safely at -O0.  
*/
+#pragma omp declare target
+uintptr_t __attribute__((optimize("O1")))
+get_kernel_ptr ()
+{
+  uintptr_t val;
+  if (!omp_is_initial_device ())
+/* "main._omp_fn.0" is the name GCC gives the first OpenMP target
+ * region in the "main" function.
+ * The ".kd" suffix is added by the LLVM assembler when it creates the
+ * kernel meta-data, and this is what we need to launch a kernel.  */
+asm ("s_getpc_b64 %0\n\t"
+"s_add_u32 %L0, %L0, main._omp_fn.0.kd@rel32@lo+4\n\t"
+"s_addc_u32 %H0, %H0, main._omp_fn.0.kd@rel32@hi+4"
+: "=Sg"(val));
+  return val;
+}
+#pragma omp end declare target
+
+int
+main(int argc, char** argv)
+{
+
+  /* Load the HSA runtime DLL.  */
+  void *hsalib = dlopen ("libhsa-runtime64.so.1", RTLD_LAZY);
+  assert (hsalib);
+
+  hsa_status_t (*hsa_signal_create) (hsa_signal_value_t initial_value,
+uint32_t num_consumers,
+const hsa_agent_t *consumers,
+hsa_signal_t *signal)
+= dlsym (hsalib, "hsa_signal_create");
+  assert (hsa_signal_create);
+
+  uint64_t (*hsa_queue_load_write_index_relaxed) (const hsa_queue_t *queue)
+= dlsym (hsalib, "hsa_queue_load_write_index_relaxed");
+  assert (hsa_queue_load_write_index_relaxed);
+
+  void (*hsa_signal_store_relaxed) (hsa_signal_t signal,
+   hsa_signal_value_t value)
+= dlsym (hsalib, "hsa_signal_store_relaxed");
+  assert (hsa_signal_store_relaxed);
+
+  hsa_signal_value_t (*hsa_signal_wait_relaxed) (hsa_signal_t signal,
+hsa_signal_condition_t 
condition,
+hsa_signal_value_t 
compare_value,
+uint64_t timeout_hint,
+hsa_wait_state_t 
wait_state_hint)
+= dlsym (hsalib, "hsa_signal_wait_relaxed");
+  assert (hsa_signal_wait_relaxed);
+
+  void (*hsa_queue_store_write_index_relaxed) (const hsa_queue_t *queue,
+  uint64_t value)
+= dlsym (hsalib, "hsa_queue_store_write_index_relaxed");
+  assert (hsa_queue_store_write_index_relaxed);
+
+  hsa_status_t (*hsa_signal_destroy) (hsa_signal_t signal)
+= dlsym (hsalib, "hsa_signal_destroy");
+  assert (hsa_signal_destroy);
+
+  /* Set up the device data environment.  */
+  int test_data_value = 0;
+#pragma omp target enter data map(test_data_value)
+
+  /* Get the interop details.  */
+  int device_num = omp_get_default_device();
+  hsa_agent_t *gpu_agent;
+  hsa_queue_t *hsa_queue = NULL;
+
+  omp_interop_t interop = omp_interop_none;
+#pragma omp interop init(target, targetsync, prefer_type("hsa"): interop) 
device(device_num)
+  assert (interop != omp_interop_none);
+
+  omp_interop_rc_t retcode;
+  omp_interop_fr_t fr = omp_get_interop_int (interop, omp_ipr_fr_id, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (fr == omp_ifr_hsa);
+
+  gpu_agent = omp_get_interop_ptr(interop, omp_ipr_device, &retcode);
+  assert (retcode == omp_irc_success);
+
+  hsa_queue = omp_get_interop_ptr(interop, omp_ipr_targetsync, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (hsa_queue);
+
+  /* Call an offload kernel via OpenMP/libgomp.
+   *
+   * This kernel serves two purposes:
+   *   1) Lookup the device-side load-address of itself (thus avoiding the
+   *   need to access the libgomp internals).
+   *   2) Count how many times it is called.
+   * We then call it once using OpenMP, and once manually, and check
+   * the counter reads "2".  */
+  uint64_t kernel_object = 

[gcc/devel/omp/gcc-14] OpenMP, GCN: Add interop-hsa testcase

2025-04-28 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:33e01148ab0ed5fba2a5ac380bbba2e90629d7fd

commit 33e01148ab0ed5fba2a5ac380bbba2e90629d7fd
Author: Andrew Stubbs 
Date:   Thu Apr 24 16:50:08 2025 +

OpenMP, GCN: Add interop-hsa testcase

This testcase ensures that the interop HSA support is sufficient to run
a kernel manually on the same device.

libgomp/ChangeLog:

* testsuite/libgomp.c/interop-hsa.c: New test.

(cherry picked from commit 8d84ea28510054fbbb8a2b7441916bd75e29163f)

Diff:
---
 libgomp/testsuite/libgomp.c/interop-hsa.c | 203 ++
 1 file changed, 203 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/interop-hsa.c 
b/libgomp/testsuite/libgomp.c/interop-hsa.c
new file mode 100644
index ..cf8bc90bb9c0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/interop-hsa.c
@@ -0,0 +1,203 @@
+/* { dg-additional-options "-ldl" } */
+/* { dg-require-effective-target offload_device_gcn } */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../../../include/hsa.h"
+#include "../../config/gcn/libgomp-gcn.h"
+
+#define STACKSIZE (100 * 1024)
+#define HEAPSIZE (10 * 1024 * 1024)
+#define ARENASIZE HEAPSIZE
+
+/* This code fragment must be optimized or else the host-fallback kernel has
+ * invalid ASM inserts.  The rest of the file can be compiled safely at -O0.  
*/
+#pragma omp declare target
+uintptr_t __attribute__((optimize("O1")))
+get_kernel_ptr ()
+{
+  uintptr_t val;
+  if (!omp_is_initial_device ())
+/* "main._omp_fn.0" is the name GCC gives the first OpenMP target
+ * region in the "main" function.
+ * The ".kd" suffix is added by the LLVM assembler when it creates the
+ * kernel meta-data, and this is what we need to launch a kernel.  */
+asm ("s_getpc_b64 %0\n\t"
+"s_add_u32 %L0, %L0, main._omp_fn.0.kd@rel32@lo+4\n\t"
+"s_addc_u32 %H0, %H0, main._omp_fn.0.kd@rel32@hi+4"
+: "=Sg"(val));
+  return val;
+}
+#pragma omp end declare target
+
+int
+main(int argc, char** argv)
+{
+
+  /* Load the HSA runtime DLL.  */
+  void *hsalib = dlopen ("libhsa-runtime64.so.1", RTLD_LAZY);
+  assert (hsalib);
+
+  hsa_status_t (*hsa_signal_create) (hsa_signal_value_t initial_value,
+uint32_t num_consumers,
+const hsa_agent_t *consumers,
+hsa_signal_t *signal)
+= dlsym (hsalib, "hsa_signal_create");
+  assert (hsa_signal_create);
+
+  uint64_t (*hsa_queue_load_write_index_relaxed) (const hsa_queue_t *queue)
+= dlsym (hsalib, "hsa_queue_load_write_index_relaxed");
+  assert (hsa_queue_load_write_index_relaxed);
+
+  void (*hsa_signal_store_relaxed) (hsa_signal_t signal,
+   hsa_signal_value_t value)
+= dlsym (hsalib, "hsa_signal_store_relaxed");
+  assert (hsa_signal_store_relaxed);
+
+  hsa_signal_value_t (*hsa_signal_wait_relaxed) (hsa_signal_t signal,
+hsa_signal_condition_t 
condition,
+hsa_signal_value_t 
compare_value,
+uint64_t timeout_hint,
+hsa_wait_state_t 
wait_state_hint)
+= dlsym (hsalib, "hsa_signal_wait_relaxed");
+  assert (hsa_signal_wait_relaxed);
+
+  void (*hsa_queue_store_write_index_relaxed) (const hsa_queue_t *queue,
+  uint64_t value)
+= dlsym (hsalib, "hsa_queue_store_write_index_relaxed");
+  assert (hsa_queue_store_write_index_relaxed);
+
+  hsa_status_t (*hsa_signal_destroy) (hsa_signal_t signal)
+= dlsym (hsalib, "hsa_signal_destroy");
+  assert (hsa_signal_destroy);
+
+  /* Set up the device data environment.  */
+  int test_data_value = 0;
+#pragma omp target enter data map(test_data_value)
+
+  /* Get the interop details.  */
+  int device_num = omp_get_default_device();
+  hsa_agent_t *gpu_agent;
+  hsa_queue_t *hsa_queue = NULL;
+
+  omp_interop_t interop = omp_interop_none;
+#pragma omp interop init(target, targetsync, prefer_type("hsa"): interop) 
device(device_num)
+  assert (interop != omp_interop_none);
+
+  omp_interop_rc_t retcode;
+  omp_interop_fr_t fr = omp_get_interop_int (interop, omp_ipr_fr_id, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (fr == omp_ifr_hsa);
+
+  gpu_agent = omp_get_interop_ptr(interop, omp_ipr_device, &retcode);
+  assert (retcode == omp_irc_success);
+
+  hsa_queue = omp_get_interop_ptr(interop, omp_ipr_targetsync, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (hsa_queue);
+
+  /* Call an offload kernel via OpenMP/libgomp.
+   *
+   * This kernel serves two purposes:
+   *   1) Lookup the device-side load-address of itself (thus avoiding the
+   *   need to access the libgomp internals).
+   *   2) Count how many times it is called.
+   * We then call it once using OpenMP, and once manually, an

[gcc/devel/omp/gcc-14] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

2025-04-28 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:0572eb1918b4de3a27a24cf0a21c9b71aea7c5f7

commit 0572eb1918b4de3a27a24cf0a21c9b71aea7c5f7
Author: Tobias Burnus 
Date:   Wed Mar 26 11:27:56 2025 +0100

libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

Note that this commit also updates the API interface to OpenMP 6.0;
while 5.1 and 5.2 use 'int *' for the the ret_code argument,
OpenMP 6.0 changed this to omp_interop_rc_t *; this enum also exists in
OpenMP 5.1. However, C++ does not like this change such that unless NULL
is passed (i.e. the argument is ignored), OpenMP 5.x and 6.x are not
compatible.

Note that GCC's omp.h already follows OpenMP 6.0 and is now in sync with
the documentation.

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.1): Add @ref to offload-target specifics
for 'interop'.
(OpenMP 6.0): Mark dispatch's interop clause as implemented.
(omp_get_interop_int, omp_get_interop_str,
omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to
Offload-Target Specifics; change ret_code argument type to
'omp_interop_rc_t *'.
(Offload-Target Specifics): Document the supported OpenMP
interop foreign runtimes on AMD and Nvidia GPUs.

(cherry picked from commit 2e7c1b589bc58be0e155098cf87d8535d41adeab)

Diff:
---
 libgomp/libgomp.texi | 170 ---
 1 file changed, 161 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e075cf1cfb98..42dabfc80562 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -313,7 +313,7 @@ The OpenMP 4.5 specification is fully supported.
   clauses @tab N @tab
 @item Indirect calls to the device version of a procedure or function in
   @code{target} regions @tab Y @tab
-@item @code{interop} directive @tab N @tab
+@item @code{interop} directive @tab Y @tab Cf. @ref{Offload-Target Specifics}
 @item @code{omp_interop_t} object support in runtime routines @tab Y @tab
 @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
 @item Extensions to the @code{atomic} directive @tab Y @tab
@@ -516,7 +516,7 @@ Technical Report (TR) 12 is the second preview for OpenMP 
6.0.
 @item Extension of @code{interop} operation of @code{append_args}, allowing all
   modifiers of the @code{init} clause
   @tab N @tab
-@item @code{interop} clause to @code{dispatch} @tab N @tab
+@item @code{interop} clause to @code{dispatch} @tab Y @tab
 @item @code{message} and @code{severity} clauses to @code{parallel} directive
   @tab N @tab
 @item @code{self} clause to @code{requires} directive @tab N @tab
@@ -2945,7 +2945,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const 
omp_interop_t interop,
-   omp_interop_property_t property_id, int 
*ret_code)}
+   omp_interop_property_t property_id, 
omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -2959,7 +2959,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, 
@ref{omp_get_interop_rc_desc}
+@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, 
@ref{omp_get_interop_rc_desc},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2,
@@ -2990,7 +2991,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{void *omp_get_interop_ptr(const 
omp_interop_t interop,
-   omp_interop_property_t property_id, int 
*ret_code)}
+   omp_interop_property_t property_id, 
omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -3004,7 +3005,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, 
@ref{omp_get_interop_rc_desc}
+@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, 
@ref{omp_get_interop_rc_desc},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3,
@@ -3034,7 +3036,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_str(const 
omp_interop_t interop,
-   omp_interop_property_t property_id, int 
*ret_code)}
+   omp_interop_property_t property_id, 
omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -3048,7 +3050,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp

[gcc/devel/omp/gcc-14] libgomp.texi: For HIP interop, mention cpp defines to set

2025-04-28 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:fd91f571f6e986f84f09f35139ff0650caa669d6

commit fd91f571f6e986f84f09f35139ff0650caa669d6
Author: Tobias Burnus 
Date:   Thu Apr 17 10:21:05 2025 +0200

libgomp.texi: For HIP interop, mention cpp defines to set

The HIP header files recognize the used compiler, defaulting to either AMD
or Nvidia/CUDA; thus, the alternative way of explicitly defining a macro is
less prominently documented. With GCC, the user has to define the
preprocessor macro manually. Hence, as a service to the user, mention
__HIP_PLATFORM_AMD__ and __HIP_PLATFORM_NVIDIA__ in the interop 
documentation,
even though it has only indirectly to do with GCC and its interop support.

Note to commit-log readers, only: For Fortran, the hipfort modules can be
used; when compiling the hipfort package (defaults to use gfortran), it
generates the module (*.mod) files in include/hipfort/{amdgcn,nvidia}/ such
that the choice is made by setting the respective include path.

libgomp/ChangeLog:

* libgomp.texi (gcn interop, nvptx interop): For HIP with C/C++, add
a note about setting a preprocessor define.

(cherry picked from commit 4bff3f0b89af9a9aad69b8f85859c0a3667533ae)

Diff:
---
 libgomp/libgomp.texi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 42dabfc80562..32d651498afa 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6828,6 +6828,9 @@ or string (str) data type, call 
@code{omp_get_interop_int},
 Note that @code{device_num} is the OpenMP device number
 while @code{device} is the HIP device number or HSA device handle.
 
+When using HIP with C and C++, the @code{__HIP_PLATFORM_AMD__} preprocessor
+macro must be defined before including the HIP header files.
+
 For the API routine call, add the prefix @code{omp_ipr_} to the property name;
 for instance:
 @smallexample
@@ -6990,6 +6993,9 @@ or string (str) data type, call 
@code{omp_get_interop_int},
 Note that @code{device_num} is the OpenMP device number while @code{device}
 is the CUDA, CUDA Driver, or HIP device number.
 
+When using HIP with C and C++, the @code{__HIP_PLATFORM_NVIDIA__} preprocessor
+macro must be defined before including the HIP header files.
+
 For the API routine call, add the prefix @code{omp_ipr_} to the property name;
 for instance:
 @smallexample


[gcc/devel/omp/gcc-14] OpenMP: Silence uninitialized variable warning in C++ front end.

2025-04-28 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:a9f1e49fa11de85cdc55ee22ea3c021157e07719

commit a9f1e49fa11de85cdc55ee22ea3c021157e07719
Author: Sandra Loosemore 
Date:   Sat Feb 22 16:54:50 2025 +

OpenMP: Silence uninitialized variable warning in C++ front end.

There's no actual problem with the code here, just a false-positive
warning emitted by some older GCC versions.

gcc/cp/ChangeLog
* parser.cc (cp_finish_omp_declare_variant): Initialize
append_args_last.

(cherry picked from commit c978965b445079abbb88c22ba74de1e26e9f5b81)

Diff:
---
 gcc/cp/parser.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 2faf82ef12d4..747209fc77f1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -51270,7 +51270,7 @@ cp_finish_omp_declare_variant (cp_parser *parser, 
cp_token *pragma_tok,
   location_t varid_loc = make_location (caret_loc, start_loc, finish_loc);
 
   tree append_args_tree = NULL_TREE;
-  tree append_args_last;
+  tree append_args_last = NULL_TREE;
   bool has_match = false, has_adjust_args = false;
   location_t adjust_args_loc = UNKNOWN_LOCATION;
   location_t append_args_loc = UNKNOWN_LOCATION;