[gcc/devel/omp/gcc-13] gensupport: drop suppport for define_cond_exec from compact syntac
https://gcc.gnu.org/g:737b8f383563e5d1b10b85a7bc93ce359111be88 commit 737b8f383563e5d1b10b85a7bc93ce359111be88 Author: Tamar Christina Date: Tue Jun 20 23:31:40 2023 +0100 gensupport: drop suppport for define_cond_exec from compact syntac define_cond_exec does not support the special @@ syntax and so can't support {@. As such just remove support for it. gcc/ChangeLog: PR bootstrap/110324 * gensupport.cc (convert_syntax): Explicitly check for RTX code. Diff: --- gcc/gensupport.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc index 980b49cd4814..e39e6dacce25 100644 --- a/gcc/gensupport.cc +++ b/gcc/gensupport.cc @@ -878,7 +878,8 @@ convert_syntax (rtx x, file_location loc) const char *templ; vec_conlist tconvec, convec, attrvec; - templ_index = GET_CODE (x) == DEFINE_INSN ? 3 : 2; + templ_index = 3; + gcc_assert (GET_CODE (x) == DEFINE_INSN); templ = XTMPL (x, templ_index); @@ -1053,7 +1054,6 @@ process_rtx (rtx desc, file_location loc) break; case DEFINE_COND_EXEC: - convert_syntax (desc, loc); queue_pattern (desc, &define_cond_exec_tail, loc); break;
[gcc r15-3381] amdgcn: remove gfx803 "Fiji" support
https://gcc.gnu.org/g:57af0022073f11bc300709b3717069f6d616c6ac commit r15-3381-g57af0022073f11bc300709b3717069f6d616c6ac Author: Andrew Stubbs Date: Mon Aug 5 15:14:17 2024 + amdgcn: remove gfx803 "Fiji" support The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is now unused and can be removed (in a follow-up patch). gcc/ChangeLog: * config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks. * config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative. (NO_XNACK): Likewise. (NO_SRAM_ECC): Likewise. (ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC. * config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI. (TARGET_FIJI): Delete. * config/gcn/gcn.cc (gcn_option_override): Remove Fiji. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise. * config/gcn/gcn.opt (gpu_type): Likewise. (march, mtune): Change default to PROCESSOR_VEGA10. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete. (copy_early_debug_info): Remove elf_flags_actual. Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally. (get_arch): Remove Fiji. (main): Remove gfx803. * config/gcn/t-omp-device (omp-device-properties-gcn): Remove fiji and gfx803. * doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions. * doc/invoke.texi: Remove fiji. libgomp/ChangeLog: * libgomp.texi: Remove fiji and gfx803. * testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803. * testsuite/libgomp.c/declare-variant-4-fiji.c: Removed. * testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed. Diff: --- gcc/config.gcc| 2 +- gcc/config/gcn/gcn-hsa.h | 13 + gcc/config/gcn/gcn-opts.h | 2 -- gcc/config/gcn/gcn.cc | 19 --- gcc/config/gcn/gcn.h | 7 +-- gcc/config/gcn/gcn.opt| 7 ++- gcc/config/gcn/mkoffload.cc | 17 +++-- gcc/config/gcn/t-omp-device | 2 +- gcc/doc/install.texi | 8 +--- gcc/doc/invoke.texi | 5 - libgomp/libgomp.texi | 3 +-- libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c | 11 --- .../testsuite/libgomp.c/declare-variant-4-gfx803.c| 10 -- libgomp/testsuite/libgomp.c/declare-variant-4.h | 12 14 files changed, 19 insertions(+), 99 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 08291f4b6e07..f09ce9f63a01 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4618,7 +4618,7 @@ case "${target}" in for which in arch tune; do eval "val=\$with_$which" case ${val} in - "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | gfx1030 | gfx1036 | gfx1100 | gfx1103) + "" | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | gfx1030 | gfx1036 | gfx1100 | gfx1103) # OK ;; *) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index 032205550755..7a1bfad49cad 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -79,21 +79,18 @@ extern unsigned int gcn_local_sym_hash (const char *name); default; however, when debugging symbols are turned on, mkoffload.cc writes a new AMD GPU object file and the ABI version needs to be the same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5. - GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5. - Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer - supported in LLVM >= 18. */ -#define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \ -"!march=*|march=*:--amdhsa-code-object-version=4" + GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5. */ +#define ABI_VERSION_SPEC "--amdhsa-code-object-version=4" /* Note that the XNACK and SRAM-ECC settings must match those in mkoffload.cc as the latter creates new ELF object file when debugging is enabled and the ELF flags (e_flags)
[gcc r15-3382] amdgcn: Remove TARGET_GCN3
https://gcc.gnu.org/g:023641d97c5139bfcf8d468442a4e9782e90a467 commit r15-3382-g023641d97c5139bfcf8d468442a4e9782e90a467 Author: Andrew Stubbs Date: Tue Aug 6 15:37:36 2024 + amdgcn: Remove TARGET_GCN3 The only GCN3 ISA device was remove (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS_LIMIT): Delete. * config/gcn/gcn-valu.md (gather_insn_1offset): Remove TARGET_GCN3 from conditions. (*_dpp_shr_): Likewise. * config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5. (gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature. (gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3. Diff: --- gcc/config/gcn/gcn-opts.h | 6 -- gcc/config/gcn/gcn-valu.md | 12 gcc/config/gcn/gcn.cc | 16 ++-- gcc/config/gcn/gcn.h | 4 +--- 4 files changed, 7 insertions(+), 31 deletions(-) diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index a896a80cd0a0..6f5969d7bc8d 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -44,7 +44,6 @@ enum processor_type /* Set in gcn_option_override. */ extern enum gcn_isa { ISA_UNKNOWN, - ISA_GCN3, ISA_GCN5, ISA_RDNA2, ISA_RDNA3, @@ -52,8 +51,6 @@ extern enum gcn_isa { ISA_CDNA2 } gcn_isa; -#define TARGET_GCN3 (gcn_isa == ISA_GCN3) -#define TARGET_GCN3_PLUS (gcn_isa >= ISA_GCN3) #define TARGET_GCN5 (gcn_isa == ISA_GCN5) #define TARGET_GCN5_PLUS (gcn_isa >= ISA_GCN5) #define TARGET_CDNA1 (gcn_isa == ISA_CDNA1) @@ -65,7 +62,6 @@ extern enum gcn_isa { #define TARGET_RDNA3 (gcn_isa == ISA_RDNA3) -#define TARGET_M0_LDS_LIMIT (TARGET_GCN3) #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3) #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) @@ -92,8 +88,6 @@ enum hsaco_attr_type #define TARGET_11BIT_GLOBAL_OFFSET TARGET_RDNA2_PLUS /* The work item details are all encoded into v0. */ //#define TARGET_PACKED_WORK_ITEMS TARGET_PACKED_WORK_ITEMS -/* m0 must be initialized in order to use LDS. */ -//#define TARGET_M0_LDS_LIMIT TARGET_M0_LDS_LIMIT /* CDNA2 load/store costs are reduced. * TODO: what does this mean? */ #define TARGET_CDNA2_MEM_COSTS TARGET_CDNA2_PLUS diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index b24cf9be32ef..54f4b14d4f21 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -1156,10 +1156,9 @@ (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_FLAT_P (INTVAL (operands[3])) -&& ((TARGET_GCN3 && INTVAL(operands[2]) == 0) - || ((unsigned HOST_WIDE_INT)INTVAL(operands[2]) < 0x1000))) -|| (AS_GLOBAL_P (INTVAL (operands[3])) - && (((unsigned HOST_WIDE_INT)INTVAL(operands[2]) + 0x1000) < 0x2000))" +&& ((unsigned HOST_WIDE_INT)INTVAL(operands[2]) < 0x1000)) + || (AS_GLOBAL_P (INTVAL (operands[3])) + && (((unsigned HOST_WIDE_INT)INTVAL(operands[2]) + 0x1000) < 0x2000))" { addr_space_t as = INTVAL (operands[3]); const char *glc = INTVAL (operands[4]) ? " glc" : ""; @@ -4297,10 +4296,7 @@ (match_operand:V_1REG 2 "register_operand" "v") (match_operand:SI 3 "const_int_operand""n")] REDUC_UNSPEC))] - ; GCN3 requires a carry out, GCN5 not - "!(TARGET_GCN3 && SCALAR_INT_MODE_P (mode) - && == UNSPEC_PLUS_DPP_SHR) - && TARGET_DPP_FULL" + "TARGET_DPP_FULL" { return gcn_expand_dpp_shr_insn (mode, "", , INTVAL (operands[3])); diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 89aab6fe8e43..fd2b86085749 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -68,7 +68,7 @@ static bool ext_gcn_constants_init = 0; /* Holds the ISA variant, derived from the command line parameters. */ -enum gcn_isa gcn_isa = ISA_GCN3; /* Default to GCN3. */ +enum gcn_isa gcn_isa = ISA_GCN5; /* Default to GCN5. */ /* Reserve this much space for LDS (for propagating variables from worker-single mode to worker-partitioned mode), per workgroup. Global @@ -3556,17 +3556,6 @@ gcn_expand_prologue () /* Ensure that the scheduler doesn't do anything unexpected. */ emit_insn (gen_blockage ()); - if (TARGET_M0_LDS_LIMIT) - { -/* m0 is initialized for the usual LDS DS and FLAT memory case. - The low-part is the address of the topmost addressable byte, which is - size-1. The high-part is an offset and should be zero. */ -emit_move_insn (gen_rtx_REG (SImode, M0_REG), - gen_int_mode (LDS_SIZE, SImode)); - -emit_insn (gen_prologue_use (gen_rtx_REG (SImode, M0_REG))); - } - if (cfun &&
[gcc r15-3383] amdgcn: Remove TARGET_GCN5_PLUS
https://gcc.gnu.org/g:b9bf0c3f54d4e36ca40598600d6e87107204c4c6 commit r15-3383-gb9bf0c3f54d4e36ca40598600d6e87107204c4c6 Author: Andrew Stubbs Date: Tue Aug 6 16:00:21 2024 + amdgcn: Remove TARGET_GCN5_PLUS Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also redundant and can be made unconditional. The naming of the "gcc_version" attribute has been confusing since the "rdna" attribute was added and this makes it worse, so it has been renamed to "cdna". The add-with-carry assembler mnemonics no longer have two forms, so '%^' can be removed. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete. (TARGET_GLOBAL_ADDRSPACE): Delete. (TARGET_FLAT_OFFSETS): Delete. (TARGET_EXPLICIT_CARRY): Delete. (TARGET_MULTIPLY_IMMEDIATE): Delete. * config/gcn/gcn-valu.md (*mov): Rename "gcn_version" to "cdna". (*mov_4reg): Likewise. (@mov_sgprbase): Likwise. (gather_insn_1offset): Likewise. (gather_insn_1offset_ds): Likewise. (gather_insn_2offsets): Likewise. (scatter_insn_1offset): Likewise. (scatter_insn_1offset_ds): Likewise. (scatter_insn_2offsets): Likewise. (gather_insn_1offset): Remove TARGET_FLAT_OFFSETS conditionals. (scatter_insn_1offset): Likewise. (scatter_insn_1offset): Likewise. (add3): Use "_co" instead of "%^". (add3_dup): Likewise. (add3_vcc): Likewise. (add3_vcc_dup): Likewise. (addc3): Likewise. (sub3): Likewise. (sub3_vcc): Likewise. (subc3): Likewise. (*plus_carry_dpp_shr_): Likewise. (*plus_carry_in_dpp_shr_): Likewise. * config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS conditionals. (gcn_addr_space_legitimate_address_p): Likewise. (gcn_addr_space_legitimize_address): Likewise. (gcn_expand_scalar_to_vector_address): Likewise. (print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also. (print_operand): Remove "%^" operand code. Remove TARGET_GLOBAL_ADDRSPACE assertion. * config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional. * config/gcn/gcn.md (gcn_version): Rename attribute ... (cdna): ... to this, and remove the gcn3 and gcn5 values. (enabled): Replace old "gcn_version" logic with new "cdna" logic. (*mov_insn): Rename "gcn_version" to "cdna". (*movti_insn): Likewise. (addsi3): Use "_co" instead of "%^". (addsi3_scalar_carry): Likewise. (addsi3_scalar_carry_cst): Likewise. (addcsi3_scalar): Likewise. (addcsi3_scalar_zero): Likewise. (addptrdi3): Likewise. (subsi3): Likewise. (mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions. (mulsi3_highpart_reg): Remove "gcn_version" attribute. (muldi3): Likewise. (atomic_fetch_): Likewise. (atomic_): Likewise. (sync_compare_and_swap_insn): Likewise. (atomic_load): Likewise. (atomic_store): Likewise. (atomic_exchange): Likewise. (mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and "gcn_version". (mulsidi3): Likewise. (mulsidi3_imm): Likewise. Diff: --- gcc/config/gcn/gcn-opts.h | 9 -- gcc/config/gcn/gcn-valu.md | 72 +++- gcc/config/gcn/gcn.cc | 36 +++--- gcc/config/gcn/gcn.h | 3 +- gcc/config/gcn/gcn.md | 74 +++--- 5 files changed, 59 insertions(+), 135 deletions(-) diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 6f5969d7bc8d..76f50ab9364f 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -52,7 +52,6 @@ extern enum gcn_isa { } gcn_isa; #define TARGET_GCN5 (gcn_isa == ISA_GCN5) -#define TARGET_GCN5_PLUS (gcn_isa >= ISA_GCN5) #define TARGET_CDNA1 (gcn_isa == ISA_CDNA1) #define TARGET_CDNA1_PLUS (gcn_isa >= ISA_CDNA1) #define TARGET_CDNA2 (gcn_isa == ISA_CDNA2) @@ -74,16 +73,12 @@ enum hsaco_attr_type HSACO_ATTR_DEFAULT }; -/* There are global address instructions. */ -#define TARGET_GLOBAL_ADDRSPACE TARGET_GCN5_PLUS /* Device has an AVGPR register file. */ #define TARGET_AVGPRS TARGET_CDNA1_PLUS /* There are load/store instructions for AVGPRS. */ #define TARGET_AVGPR_
[gcc r15-1705] amdgcn: Fix RDNA V32 permutations [PR115640]
https://gcc.gnu.org/g:ef0b30212f7756db15d7507bfd871bf377d7d648 commit r15-1705-gef0b30212f7756db15d7507bfd871bf377d7d648 Author: Andrew Stubbs Date: Fri Jun 28 10:47:50 2024 + amdgcn: Fix RDNA V32 permutations [PR115640] There was an off-by-one error in the RDNA validation check, plus I forgot to allow for two-to-one permute-and-merge operations. PR target/115640 gcc/ChangeLog: * config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Modify RDNA checks. Diff: --- gcc/config/gcn/gcn.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index d6531f55190..aab9b59c519 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -5134,7 +5134,7 @@ gcn_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, Reject permutations that cross the boundary. */ if (TARGET_RDNA2_PLUS) for (unsigned int i = 0; i < nelt; i++) - if (i < 31 ? perm[i] > 31 : perm[i] < 32) + if (i < 32 ? (perm[i] % nelt) > 31 : (perm[i] % nelt) < 32) return false; /* All vector permutations are possible on other architectures,
[gcc r15-1747] libgomp: change alloc-pinned tests failure mode
https://gcc.gnu.org/g:90efaebf95c93244f6b1eda5cb8724e52047cecd commit r15-1747-g90efaebf95c93244f6b1eda5cb8724e52047cecd Author: Andrew Stubbs Date: Wed Jun 12 08:43:53 2024 + libgomp: change alloc-pinned tests failure mode The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLog: * testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to dg-skip-if. Correct spelling mistake. Abort on insufficient lockable memory. Use #error on non-linux hosts. * testsuite/libgomp.c/alloc-pinned-2.c: Likewise. Diff: --- libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 20 ++-- libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 20 ++-- 2 files changed, 12 insertions(+), 28 deletions(-) diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c index 4185accf2e6..672f2453a78 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c @@ -1,6 +1,6 @@ /* { dg-do run } */ -/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ +/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } */ /* Test that pinned memory works. */ @@ -19,7 +19,10 @@ struct rlimit limit; \ if (getrlimit (RLIMIT_MEMLOCK, &limit) \ || limit.rlim_cur <= SIZE) \ -fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \ +{ \ + fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \ + abort (); \ +} \ } int @@ -44,18 +47,7 @@ get_pinned_mem () abort (); } #else -#define PAGE_SIZE 1024 /* unknown */ -#define CHECK_SIZE(SIZE) { \ - fprintf (stderr, "OS unsupported\n"); \ - abort (); \ - } -#define EXPECT_OMP_NULL_ALLOCATOR - -int -get_pinned_mem () -{ - return 0; -} +#error "OS unsupported" #endif static void diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c index 0b9c11d0315..b6d1d83fb6f 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c @@ -1,6 +1,6 @@ /* { dg-do run } */ -/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ +/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } */ /* Test that pinned memory works (pool_size code path). */ @@ -19,7 +19,10 @@ struct rlimit limit; \ if (getrlimit (RLIMIT_MEMLOCK, &limit) \ || limit.rlim_cur <= SIZE) \ -fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \ +{ \ + fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \ + abort (); \ +} \ } int @@ -44,18 +47,7 @@ get_pinned_mem () abort (); } #else -#define PAGE_SIZE 1024 /* unknown */ -#define CHECK_SIZE(SIZE) { \ - fprintf (stderr, "OS unsupported\n"); \ - abort (); \ - } -#define EXPECT_OMP_NULL_ALLOCATOR - -int -get_pinned_mem () -{ - return 0; -} +#error "OS unsupported" #endif static void
[gcc r15-1748] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc
https://gcc.gnu.org/g:64001441ec99b80e457188ce50bb6c59c757d3c6 commit r15-1748-g64001441ec99b80e457188ce50bb6c59c757d3c6 Author: Andrew Stubbs Date: Wed Jun 12 11:09:33 2024 + libgomp, openmp: Add ompx_gnu_pinned_mem_alloc This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx" namespace and an independent enum baseline of 200 (selected to not clash with other known implementations). The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait. One motivation for having this feature is for use by the (planned) -foffload-memory=pinned feature. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Update valid ranges to incorporate ompx_gnu_pinned_mem_alloc. libgomp/ChangeLog: * allocator.c (ompx_gnu_min_predefined_alloc): New. (ompx_gnu_max_predefined_alloc): New. (predefined_alloc_mapping): Rename to ... (predefined_omp_alloc_mapping): ... this. (predefined_ompx_gnu_alloc_mapping): New. (_Static_assert): Adjust for the new name, and add a new assert for the new table. (predefined_allocator_p): New. (predefined_alloc_mapping): New. (omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc. Use predefined_allocator_p and predefined_alloc_mapping. (omp_free): Likewise. (omp_alligned_calloc): Likewise. (omp_realloc): Likewise. * env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc. * libgomp.texi: Document ompx_gnu_pinned_mem_alloc. * omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc. * omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc. * omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc. * testsuite/libgomp.c/alloc-pinned-5.c: New test. * testsuite/libgomp.c/alloc-pinned-6.c: New test. * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-pinned-1.f90: New test. Co-Authored-By: Thomas Schwinge Diff: --- gcc/fortran/openmp.cc | 11 +- .../gfortran.dg/gomp/allocate-pinned-1.f90 | 16 +++ libgomp/allocator.c| 115 +++-- libgomp/env.c | 1 + libgomp/libgomp.texi | 7 +- libgomp/omp.h.in | 1 + libgomp/omp_lib.f90.in | 2 + libgomp/omp_lib.h.in | 2 + libgomp/testsuite/libgomp.c/alloc-pinned-5.c | 100 ++ libgomp/testsuite/libgomp.c/alloc-pinned-6.c | 102 ++ .../testsuite/libgomp.fortran/alloc-pinned-1.f90 | 16 +++ 11 files changed, 336 insertions(+), 37 deletions(-) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 9b30a108560..333f0c7fe7f 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -7423,8 +7423,9 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns, } /* Assume that a constant expression in the range 1 (omp_default_mem_alloc) - to 8 (omp_thread_mem_alloc) range is fine. The original symbol name is - already lost during matching via gfc_match_expr. */ + to 8 (omp_thread_mem_alloc) range, or 200 (ompx_gnu_pinned_mem_alloc) is + fine. The original symbol name is already lost during matching via + gfc_match_expr. */ static bool is_predefined_allocator (gfc_expr *expr) { @@ -7433,8 +7434,10 @@ is_predefined_allocator (gfc_expr *expr) && expr->ts.type == BT_INTEGER && expr->ts.kind == gfc_c_intptr_kind && expr->expr_type == EXPR_CONSTANT - && mpz_sgn (expr->value.integer) > 0 - && mpz_cmp_si (expr->value.integer, 8) <= 0); + && ((mpz_sgn (expr->value.integer) > 0 + && mpz_cmp_si (expr->value.integer, 8) <= 0) + || (mpz_cmp_si (expr->value.integer, 200) >= 0 + && mpz_cmp_si (expr->value.integer, 200) <= 0))); } /* Resolve declarative ALLOCATE statement. Note: Common block vars only appear diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90 new file mode 100644 index 000..0e6619b7853 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90 @@ -0,0 +1,16 @@ +! Test that the ompx_gnu_pinned_mem_alloc is accepted by the parser + +module m +use iso_c_binding +integer, parameter :: omp_allocator_handle_kind = c_intptr_t +integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_gnu_pinned_mem_alloc = 200 +end + +subroutine f () +
[gcc r15-1769] amdgcn: invent target feature flags
https://gcc.gnu.org/g:68e034920bab9abd547503967f73b81cc37cfbf4 commit r15-1769-g68e034920bab9abd547503967f73b81cc37cfbf4 Author: Andrew Stubbs Date: Fri Jun 28 15:13:59 2024 + amdgcn: invent target feature flags This is a first step towards having a device table so we can add new devices more easily. It'll also make it easier to remove the deprecated GCN3 bits. The patch should not change the behaviour of anything. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New. (TARGET_AVGPRS): New. (TARGET_AVGPR_MEMOPS): New. (TARGET_AVGPR_COMBINED): New. (TARGET_FLAT_OFFSETS): New. (TARGET_11BIT_GLOBAL_OFFSET): New. (TARGET_CDNA2_MEM_COSTS): New. (TARGET_WAVE64_COMPAT): New. (TARGET_DPP_FULL): New. (TARGET_DPP16): New. (TARGET_DPP8): New. (TARGET_AVGPR_CDNA1_NOPS): New. (TARGET_VGPR_GRANULARITY): New. (TARGET_ARCHITECTED_FLAT_SCRATCH): New. (TARGET_EXPLICIT_CARRY): New. (TARGET_MULTIPLY_IMMEDIATE): New. (TARGET_SDWA): New. (TARGET_WBINVL1_CACHE): New. (TARGET_GLn_CACHE): New. * config/gcn/gcn-valu.md (throughout): Change TARGET_GCN*, TARGET_CDNA* and TARGET_RDNA* to use TARGET_ instead. * config/gcn/gcn.cc (throughout): Likewise. * config/gcn/gcn.md (throughout): Likewise. Diff: --- gcc/config/gcn/gcn-opts.h | 44 ++ gcc/config/gcn/gcn-valu.md | 28 +++--- gcc/config/gcn/gcn.cc | 76 +++-- gcc/config/gcn/gcn.md | 94 -- 4 files changed, 155 insertions(+), 87 deletions(-) diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 1091035a69a..24e856bc0c3 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -80,4 +80,48 @@ enum hsaco_attr_type HSACO_ATTR_DEFAULT }; +/* There are global address instructions. */ +#define TARGET_GLOBAL_ADDRSPACE TARGET_GCN5_PLUS +/* Device has an AVGPR register file. */ +#define TARGET_AVGPRS TARGET_CDNA1_PLUS +/* There are load/store instructions for AVGPRS. */ +#define TARGET_AVGPR_MEMOPS TARGET_CDNA2_PLUS +/* AVGPRS may have their own register file, or be combined with VGPRS. */ +#define TARGET_AVGPR_COMBINED TARGET_CDNA2_PLUS +/* flat_load/store allows offsets. */ +#define TARGET_FLAT_OFFSETS TARGET_GCN5_PLUS +/* global_load/store has reduced offset. */ +#define TARGET_11BIT_GLOBAL_OFFSET TARGET_RDNA2_PLUS +/* The work item details are all encoded into v0. */ +//#define TARGET_PACKED_WORK_ITEMS TARGET_PACKED_WORK_ITEMS +/* m0 must be initialized in order to use LDS. */ +//#define TARGET_M0_LDS_LIMIT TARGET_M0_LDS_LIMIT +/* CDNA2 load/store costs are reduced. + * TODO: what does this mean? */ +#define TARGET_CDNA2_MEM_COSTS TARGET_CDNA2_PLUS +/* Wave32 devices running in wave64 compatibility mode. */ +#define TARGET_WAVE64_COMPAT TARGET_RDNA2_PLUS +/* RDNA devices have different DPP with reduced capabilities. */ +#define TARGET_DPP_FULL !TARGET_RDNA2_PLUS +#define TARGET_DPP16 TARGET_RDNA2_PLUS +#define TARGET_DPP8 TARGET_RDNA2_PLUS +/* Device requires CDNA1-style manually inserted wait states for AVGPRs. */ +#define TARGET_AVGPR_CDNA1_NOPS TARGET_CDNA1 +/* The metadata on different devices need different granularity. */ +#define TARGET_VGPR_GRANULARITY \ + (TARGET_RDNA3 ? 12 \ + : TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 \ + : 4) +/* This mostly affects the metadata. */ +#define TARGET_ARCHITECTED_FLAT_SCRATCH TARGET_RDNA3 +/* Assembler uses s_add_co not just s_add. */ +#define TARGET_EXPLICIT_CARRY TARGET_GCN5_PLUS +/* mulsi3 permits immediate. */ +#define TARGET_MULTIPLY_IMMEDIATE TARGET_GCN5_PLUS +/* Device has Sub-DWord Addressing instrucions. */ +#define TARGET_SDWA (!TARGET_RDNA3) +/* Different devices uses different cache control instructions. */ +#define TARGET_WBINVL1_CACHE (!TARGET_RDNA2_PLUS) +#define TARGET_GLn_CACHE TARGET_RDNA2_PLUS + #endif diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index e8381d28c1b..b24cf9be32e 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -983,7 +983,7 @@ (match_operand 2 "immediate_operand")] "MODE_VF (mode) < MODE_VF (mode) && mode == mode - && (!TARGET_RDNA2_PLUS || MODE_VF (mode) <= 32)" + && (!TARGET_WAVE64_COMPAT || MODE_VF (mode) <= 32)" { int numlanes = GET_MODE_NUNITS (mode); int firstlane = INTVAL (operands[2]) * numlanes; @@ -1167,7 +1167,7 @@ static char buf[200]; if (AS_FLAT_P (as)) { - if (TARGET_GCN5_PLUS) + if (TARGET_FLAT_OFFSETS) sprintf (buf, "flat_load%%o0\t%%0, %%1 offset:%%2%s\;s_waitcnt\t0", glc); else @@ -1290,7 +1290,7 @@ UNSPEC_SCATTER))] "(AS_
[gcc r14-9593] amdgcn: Clean up device memory in gcn-run
https://gcc.gnu.org/g:c3fb8a4d150586459a9fa177cb2aeeac5e4c0464 commit r14-9593-gc3fb8a4d150586459a9fa177cb2aeeac5e4c0464 Author: Andrew Stubbs Date: Wed Mar 20 12:49:24 2024 + amdgcn: Clean up device memory in gcn-run gcc/ChangeLog: * config/gcn/gcn-run.cc (main): Add an hsa_memory_free calls for each device_malloc call. Diff: --- gcc/config/gcn/gcn-run.cc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn-run.cc b/gcc/config/gcn/gcn-run.cc index d45ff3e6c2b..2f3ed2d41d2 100644 --- a/gcc/config/gcn/gcn-run.cc +++ b/gcc/config/gcn/gcn-run.cc @@ -755,7 +755,13 @@ main (int argc, char *argv[]) /* Clean shut down. */ XHSA (hsa_fns.hsa_memory_free_fn (kernargs), - "Clean up device memory"); + "Clean up device kernargs memory"); + XHSA (hsa_fns.hsa_memory_free_fn (args), + "Clean up device args memory"); + XHSA (hsa_fns.hsa_memory_free_fn (heap), + "Clean up device heap memory"); + XHSA (hsa_fns.hsa_memory_free_fn (stack), + "Clean up device stack memory"); XHSA (hsa_fns.hsa_executable_destroy_fn (executable), "Clean up GCN executable"); XHSA (hsa_fns.hsa_queue_destroy_fn (queue),
[gcc r14-9594] amdgcn: Ensure gfx11 is running in cumode
https://gcc.gnu.org/g:69dc2dc7e0e853856b84b1bcc89d0241d8a570aa commit r14-9594-g69dc2dc7e0e853856b84b1bcc89d0241d8a570aa Author: Andrew Stubbs Date: Mon Mar 4 15:48:47 2024 + amdgcn: Ensure gfx11 is running in cumode CUmode "on" is the setting for compatibility with GCN and CDNA devices. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode. Diff: --- gcc/config/gcn/gcn-hsa.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index 9cf181f52a4..c75256dbac3 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name); "%{" NO_XNACK XNACKOPT "} " \ "%{" NO_SRAM_ECC SRAMOPT "} " \ "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \ + "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \ "-filetype=obj" #define LINK_SPEC "--pie --export-dynamic" #define LIB_SPEC "-lc"
[gcc r14-9595] amdgcn: Comment correction
https://gcc.gnu.org/g:a2fe34e0b993d5fb879d75ddb42b24b45c4b7242 commit r14-9595-ga2fe34e0b993d5fb879d75ddb42b24b45c4b7242 Author: Andrew Stubbs Date: Mon Mar 4 15:52:00 2024 + amdgcn: Comment correction The location of the marker was changed, but the comment wasn't updated. Fixed now. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction. Diff: --- gcc/config/gcn/gcn.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index bc076d1120d..fca001811e5 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -4932,8 +4932,8 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , } case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P: { - /* Stash a marker in the unused upper 16 bits of s[0:1] to indicate - whether it was the first call. */ + /* Stash a marker in the unused upper 16 bits of QUEUE_PTR_ARG to + indicate whether it was the first call. */ rtx result = gen_reg_rtx (BImode); emit_move_insn (result, const0_rtx); if (cfun->machine->args.reg[QUEUE_PTR_ARG] >= 0)
[gcc r14-9621] vect: more oversized bitmask fixups
https://gcc.gnu.org/g:e4e02c07d93559a037608c73e8153549b5104fbb commit r14-9621-ge4e02c07d93559a037608c73e8153549b5104fbb Author: Andrew Stubbs Date: Fri Mar 15 14:21:15 2024 + vect: more oversized bitmask fixups These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when configured to use V32 instead of V64 (I plan to do this for RDNA devices). The problem was that a "not" operation on the mask inadvertently enabled inactive lanes 31-63 and corrupted the output. The fix is to adjust the mask when calling internal functions (in this case COND_MINUS), when doing masked loads and stores, and when doing conditional jumps (some cases were already handled). gcc/ChangeLog: * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector bitmasks. (do_compare_and_jump): Remove now-redundant similar code. * internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector bitmasks. (add_mask_and_len_args): Likewise. Diff: --- gcc/dojump.cc | 34 ++ gcc/internal-fn.cc | 26 ++ 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/gcc/dojump.cc b/gcc/dojump.cc index 88600cb42d3..5f74b696b41 100644 --- a/gcc/dojump.cc +++ b/gcc/dojump.cc @@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp, } } + /* For boolean vectors with less than mode precision +make sure to fill padding with consistent values. */ + if (val + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val)) + && SCALAR_INT_MODE_P (mode)) + { + auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant (); + if (maybe_ne (GET_MODE_PRECISION (mode), nunits)) + { + op0 = expand_binop (mode, and_optab, op0, + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), + NULL_RTX, true, OPTAB_WIDEN); + op1 = expand_binop (mode, and_optab, op1, + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), + NULL_RTX, true, OPTAB_WIDEN); + } + } + emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val, if_true_label, prob); } @@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code, machine_mode mode; int unsignedp; enum rtx_code code; - unsigned HOST_WIDE_INT nunits; /* Don't crash if the comparison was erroneous. */ op0 = expand_normal (treeop0); @@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code, emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1)); op1 = new_op1; } - /* For boolean vectors with less than mode precision - make sure to fill padding with consistent values. */ - else if (VECTOR_BOOLEAN_TYPE_P (type) - && SCALAR_INT_MODE_P (mode) - && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits) - && maybe_ne (GET_MODE_PRECISION (mode), nunits)) -{ - gcc_assert (code == EQ || code == NE); - op0 = expand_binop (mode, and_optab, op0, - GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX, - true, OPTAB_WIDEN); - op1 = expand_binop (mode, and_optab, op1, - GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX, - true, OPTAB_WIDEN); -} do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode, ((mode == BLKmode) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index fcf47c7fa12..5269f0ac528 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, && SSA_NAME_IS_DEFAULT_DEF (rhs) && VAR_P (SSA_NAME_VAR (rhs))) create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type)); + else if (VECTOR_BOOLEAN_TYPE_P (rhs_type) + && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type)) + && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)), + TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ())) + { + /* Ensure that the vector bitmasks do not have excess bits. */ + int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant (); + rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx, + GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), + NULL_RTX, true, OPTAB_WIDEN); + create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type)); + } else create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type)
[gcc r14-9623] amdgcn: Add gfx1103 target
https://gcc.gnu.org/g:1bf18629c54adf4893c8db5227a36e1952ee69a3 commit r14-9623-g1bf18629c54adf4893c8db5227a36e1952ee69a3 Author: Andrew Stubbs Date: Fri Mar 15 14:26:15 2024 + amdgcn: Add gfx1103 target Add support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. gcc/ChangeLog: * config.gcc (amdgcn): Add gfx1103 entries. * config/gcn/gcn-hsa.h (NO_XNACK): Likewise. (gcn_local_sym_hash): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX1103): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx1103. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. (gcn_hsa_declare_function_name): Use TARGET_RDNA3, not just gfx1100. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1103__. * config/gcn/gcn.opt: Add gfx1103. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1103): New. (main): Handle gfx1103. * config/gcn/t-omp-device: Add gfx1103 isa. * doc/install.texi (amdgcn): Add gfx1103. * doc/invoke.texi (-march): Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1103. (gcn_gfx1103_s): New. (isa_hsa_name): Handle gfx1103. (isa_code): Likewise. (max_isa_vgprs): Likewise. Diff: --- gcc/config.gcc | 4 ++-- gcc/config/gcn/gcn-hsa.h| 6 +++--- gcc/config/gcn/gcn-opts.h | 4 +++- gcc/config/gcn/gcn.cc | 14 -- gcc/config/gcn/gcn.h| 2 ++ gcc/config/gcn/gcn.opt | 3 +++ gcc/config/gcn/mkoffload.cc | 5 + gcc/config/gcn/t-omp-device | 2 +- gcc/doc/install.texi| 13 +++-- gcc/doc/invoke.texi | 3 +++ libgomp/plugin/plugin-gcn.c | 10 +- 11 files changed, 50 insertions(+), 16 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 040afabd9ec..87a5c92b6e3 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4560,7 +4560,7 @@ case "${target}" in for which in arch tune; do eval "val=\$with_$which" case ${val} in - "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1100) + "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1100 | gfx1103) # OK ;; *) @@ -4576,7 +4576,7 @@ case "${target}" in TM_MULTILIB_CONFIG= ;; xdefault | xyes) - TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed "s/${with_arch},\?//;s/,$//"` + TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"` ;; *) TM_MULTILIB_CONFIG="${with_multilib_list}" diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index c75256dbac3..ac32b8a328f 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -90,7 +90,7 @@ extern unsigned int gcn_local_sym_hash (const char *name); the ELF flags (e_flags) of that generated file must be identical to those generated by the compiler. */ -#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;" \ +#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;march=gfx1103:;" \ /* These match the defaults set in gcn.cc. */ \ "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};" #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;" @@ -106,8 +106,8 @@ extern unsigned int gcn_local_sym_hash (const char *name); "%{" ABI_VERSION_SPEC "} " \ "%{" NO_XNACK XNACKOPT "} " \ "%{" NO_SRAM_ECC SRAMOPT "} " \ - "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \ - "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \ + "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+wavefrontsize64} " \ + "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+cumode} " \ "-filetype=obj" #define LINK_SPEC "--pie --export-dynamic" #define LIB_SPEC "-lc" diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 6be2c9204fa..285746f7f4d 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -26,7 +26,8 @@ enum processor_type PROCESSOR_GFX908, PROCESSOR_GFX90a, PROCESSOR_GFX1030, - PROCESSOR_GFX1100 + PROCESSOR_GFX1100, + PROCESSOR_GFX1103 }; #define TARGET_FIJI (gcn_arch == PROCESSOR_FIJI) @@ -36,6 +
[gcc r14-9626] amdgcn: Prefer V32 on RDNA devices
https://gcc.gnu.org/g:6dedafe166cc02ae87b6a0699ad61ce3ffc46803 commit r14-9626-g6dedafe166cc02ae87b6a0699ad61ce3ffc46803 Author: Andrew Stubbs Date: Thu Feb 22 11:41:19 2024 + amdgcn: Prefer V32 on RDNA devices We run these devices in wavefrontsize64 for compatibility, but they actually only have 32-lane vectors, natively. If the upper part of a V64 is masked off (as it is in V32) then RDNA devices will skip execution of the upper part for most operations, so this adjustment shouldn't leave too much performance on the table. One exception is memory instructions, so full wavefrontsize32 support would be better. The advantage is that we avoid the missing V64 operations (such as permute and vec_extract). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on RDNA devices. Diff: --- gcc/config/gcn/gcn.cc | 26 ++ 1 file changed, 26 insertions(+) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 498146dcde9..efb73af50c4 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode) static machine_mode gcn_vectorize_preferred_simd_mode (scalar_mode mode) { + /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors + (in particular, permute operations are only available for cases that don't + span the 32-lane boundary). + + From the RDNA3 manual: "Hardware may choose to skip either half if the + EXEC mask for that half is all zeros...". This means that preferring + 32-lanes is a good stop-gap until we have proper wave32 support. */ + if (TARGET_RDNA2_PLUS) +switch (mode) + { + case E_QImode: + return V32QImode; + case E_HImode: + return V32HImode; + case E_SImode: + return V32SImode; + case E_DImode: + return V32DImode; + case E_SFmode: + return V32SFmode; + case E_DFmode: + return V32DFmode; + default: + return word_mode; + } + switch (mode) { case E_QImode:
[gcc r14-9627] amdgcn: Adjust GFX10/GFX11 cache coherency
https://gcc.gnu.org/g:e194503b6f2cf5f1b819f4a8af9d16311a07e4f5 commit r14-9627-ge194503b6f2cf5f1b819f4a8af9d16311a07e4f5 Author: Andrew Stubbs Date: Wed Mar 6 15:54:46 2024 + amdgcn: Adjust GFX10/GFX11 cache coherency The RDNA devices have different cache architectures to the CDNA devices, and the differences go deeper than just the assembler mnemonics. I believe this patch is correct according to the documentation in the LLVM AMDGPU user guide (the ISA manual is less instructive), but I hadn't observed any real problems before (or after). gcc/ChangeLog: * config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA. (atomic_load): Adjust RDNA cache settings. (atomic_store): Likewise. (atomic_exchange): Likewise. Diff: --- gcc/config/gcn/gcn.md | 86 --- 1 file changed, 55 insertions(+), 31 deletions(-) diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 3b51453aaca..574c2f87e8c 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -1960,11 +1960,19 @@ (define_insn "*memory_barrier" [(set (match_operand:BLK 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] - "" - "{buffer_wbinvl1_vol|buffer_gl0_inv}" + "!TARGET_RDNA2_PLUS" + "buffer_wbinvl1_vol" [(set_attr "type" "mubuf") (set_attr "length" "4")]) +(define_insn "*memory_barrier" + [(set (match_operand:BLK 0) + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] + "TARGET_RDNA2_PLUS" + "buffer_gl1_inv\;buffer_gl0_inv" + [(set_attr "type" "mult") + (set_attr "length" "8")]) + ; FIXME: These patterns have been disabled as they do not seem to work ; reliably - they can cause hangs or incorrect results. ; TODO: flush caches according to memory model @@ -2094,9 +2102,13 @@ case 0: return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)"; case 1: - return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0"; + return (TARGET_RDNA2 /* Not GFX11. */ + ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0" + : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0"); case 2: - return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)"; + return (TARGET_RDNA2 /* Not GFX11. */ + ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)" + : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)"); } break; case MEMMODEL_CONSUME: @@ -2108,15 +2120,21 @@ return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;" "s_dcache_wb_vol"; case 1: - return (TARGET_RDNA2_PLUS + return (TARGET_RDNA2 + ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;" + "buffer_gl1_inv\;buffer_gl0_inv" + : TARGET_RDNA3 ? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;" - "buffer_gl0_inv" + "buffer_gl1_inv\;buffer_gl0_inv" : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;" "buffer_wbinvl1_vol"); case 2: - return (TARGET_RDNA2_PLUS + return (TARGET_RDNA2 + ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)\;" + "buffer_gl1_inv\;buffer_gl0_inv" + : TARGET_RDNA3 ? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;" - "buffer_gl0_inv" + "buffer_gl1_inv\;buffer_gl0_inv" : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;" "buffer_wbinvl1_vol"); } @@ -2130,15 +2148,21 @@ return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;" "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol"; case 1: - return (TARGET_RDNA2_PLUS - ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;" - "s_waitcnt\t0\;buffer_gl0_inv" + return (TARGET_RDNA2 + ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc dlc\;" + "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv" + : TARGET_RDNA3 + ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;" + "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv" : "buffer_wbinvl1_vol\;flat_load%o0\t%0, %A1%O1 glc\;" "s_waitcnt\t0\;buffer_wbinvl1_vol"); case 2: - return (TARGET_RDNA2_PLUS - ? "buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;" - "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv" + return (TARGET_RDNA2 + ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc dlc\;" +
[gcc r15-2822] amdgcn: Re-enable trampolines
https://gcc.gnu.org/g:6f71e050a51378e1811b90fe9c16cd37bf4c48ec commit r15-2822-g6f71e050a51378e1811b90fe9c16cd37bf4c48ec Author: Andrew Stubbs Date: Thu Aug 8 13:12:43 2024 + amdgcn: Re-enable trampolines The stacks are executable since the reverse-offload features were added, so trampolines actually do work. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines. Diff: --- gcc/config/gcn/gcn.cc | 5 - 1 file changed, 5 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 00f2978559bd..b22132de6ab7 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -3799,11 +3799,6 @@ gcn_asm_trampoline_template (FILE *f) static void gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value) { - // FIXME - if (TARGET_GCN5_PLUS) -sorry ("nested function trampolines not supported on GCN5 due to" - " non-executable stacks"); - emit_block_move (m_tramp, assemble_trampoline_template (), GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
[gcc r15-2835] amdgcn: Fix VGPR max count
https://gcc.gnu.org/g:715317331994d3d69395056f77bfe7ac613af009 commit r15-2835-g715317331994d3d69395056f77bfe7ac613af009 Author: Andrew Stubbs Date: Wed Aug 7 15:35:18 2024 + amdgcn: Fix VGPR max count The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artifical limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers remaining after maximum allocation using TARGET_VGPR_GRANULARITY. Diff: --- gcc/config/gcn/gcn.cc | 7 +++ 1 file changed, 7 insertions(+) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index b22132de6ab7..0725d15c8ed0 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -2493,6 +2493,13 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass, static void gcn_conditional_register_usage (void) { + /* Some architectures have a register allocation granularity that does not + permit use of the full register count. */ + for (int i = 256 - (256 % TARGET_VGPR_GRANULARITY); + i < 256; + i++) +fixed_regs[VGPR_REGNO (i)] = call_used_regs[VGPR_REGNO (i)] = 1; + if (!cfun || !cfun->machine) return;
[gcc/devel/omp/gcc-14] amdgcn: Re-enable trampolines
https://gcc.gnu.org/g:ffc69480ea3eca06f6e445025b839d23848ee148 commit ffc69480ea3eca06f6e445025b839d23848ee148 Author: Andrew Stubbs Date: Thu Aug 8 13:12:43 2024 + amdgcn: Re-enable trampolines The stacks are executable since the reverse-offload features were added, so trampolines actually do work. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines. (cherry picked from commit 6f71e050a51378e1811b90fe9c16cd37bf4c48ec) Diff: --- gcc/config/gcn/gcn.cc | 4 1 file changed, 4 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index d6531f55190c..4c212e248764 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -3798,10 +3798,6 @@ gcn_asm_trampoline_template (FILE *f) static void gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value) { - if (TARGET_GCN5_PLUS) -sorry ("nested function trampolines not supported on GCN5 due to" - " non-executable stacks"); - emit_block_move (m_tramp, assemble_trampoline_template (), GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
[gcc/devel/omp/gcc-14] amdgcn: Fix VGPR max count
https://gcc.gnu.org/g:6d3c68ff05cf2b681c68db8dd0e2936cc34f2c40 commit 6d3c68ff05cf2b681c68db8dd0e2936cc34f2c40 Author: Andrew Stubbs Date: Wed Aug 7 15:35:18 2024 + amdgcn: Fix VGPR max count The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artifical limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers remaining after maximum allocation using TARGET_VGPR_GRANULARITY. (cherry picked from commit 715317331994d3d69395056f77bfe7ac613af009) Diff: --- gcc/config/gcn/gcn.cc | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 4c212e248764..e5fc05afbf86 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -2492,6 +2492,16 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass, static void gcn_conditional_register_usage (void) { + /* Some architectures have a register allocation granularity that does not + permit use of the full register count. */ + int vgpr_block_size = (TARGET_RDNA3 ? 12 +: TARGET_RDNA2_PLUS || TARGET_CDNA2_PLUS ? 8 +: 4); + for (int i = 256 - (256 % vgpr_block_size); + i < 256; + i++) +fixed_regs[VGPR_REGNO (i)] = call_used_regs[VGPR_REGNO (i)] = 1; + if (!cfun || !cfun->machine) return;
[gcc r15-2846] amdgcn: Add padding to trampoline
https://gcc.gnu.org/g:b5a09a68bf0feaf0b0678d8f3433f776238d3896 commit r15-2846-gb5a09a68bf0feaf0b0678d8f3433f776238d3896 Author: Andrew Stubbs Date: Fri Aug 9 11:45:42 2024 + amdgcn: Add padding to trampoline This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align. * config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40. Diff: --- gcc/config/gcn/gcn.cc | 1 + gcc/config/gcn/gcn.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 0725d15c8ed0..17316a7ddb84 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -3794,6 +3794,7 @@ gcn_asm_trampoline_template (FILE *f) asm_fprintf (f, "\ts_mov_b32\ts%i, 0x\n", CC_SAVE_REG); asm_fprintf (f, "\ts_mov_b32\ts%i, 0x\n", CC_SAVE_REG + 1); asm_fprintf (f, "\ts_setpc_b64\ts[%i:%i]\n", CC_SAVE_REG, CC_SAVE_REG + 1); + asm_fprintf (f, "\t.align 8\n"); } /* Implement TARGET_TRAMPOLINE_INIT. diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h index e3bfd29c17d2..bd2afa61c10b 100644 --- a/gcc/config/gcn/gcn.h +++ b/gcc/config/gcn/gcn.h @@ -831,7 +831,7 @@ enum gcn_builtin_codes #define PROFILE_BEFORE_PROLOGUE 0 /* Trampolines */ -#define TRAMPOLINE_SIZE 36 +#define TRAMPOLINE_SIZE 40 /* 36 + 4 padding for alignment. */ #define TRAMPOLINE_ALIGNMENT 64 /* MD Optimization.
[gcc r15-4519] amdgcn: silence warning
https://gcc.gnu.org/g:0b6d94ce72b2f35dbee7c42774d6972671c86f97 commit r15-4519-g0b6d94ce72b2f35dbee7c42774d6972671c86f97 Author: Andrew Stubbs Date: Mon Sep 16 12:31:59 2024 + amdgcn: silence warning FIRST_SGPR_REG is register zero so the compiler always claims this comparison is redundant. It's right, of course, but I'd have preferred to keep the comparison for completeness. Probably the "correct" solution is to use an enum for these values. gcc/ChangeLog: * config/gcn/gcn.h (SGPR_REGNO_P): Silence warning. Diff: --- gcc/config/gcn/gcn.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h index 1a4631dd39f6..faefe68cdfa9 100644 --- a/gcc/config/gcn/gcn.h +++ b/gcc/config/gcn/gcn.h @@ -191,7 +191,7 @@ STATIC_ASSERT (LAST_AVGPR_REG + 1 - FIRST_AVGPR_REG == 256); #define HARD_FRAME_POINTER_IS_ARG_POINTER 0 #define HARD_FRAME_POINTER_IS_FRAME_POINTER 0 -#define SGPR_REGNO_P(N)((N) >= FIRST_SGPR_REG && (N) <= LAST_SGPR_REG) +#define SGPR_REGNO_P(N)(/*(N) >= FIRST_SGPR_REG &&*/ (N) <= LAST_SGPR_REG) #define VGPR_REGNO_P(N)((N) >= FIRST_VGPR_REG && (N) <= LAST_VGPR_REG) #define AVGPR_REGNO_P(N)((N) >= FIRST_AVGPR_REG && (N) <= LAST_AVGPR_REG) #define SSRC_REGNO_P(N)((N) <= SCC_REG && (N) != VCCZ_REG)
[gcc r15-4540] amdgcn: Refactor device settings into a def file
https://gcc.gnu.org/g:a6b26e5ea09779bf276dff52a6692f3bb655d230 commit r15-4540-ga6b26e5ea09779bf276dff52a6692f3bb655d230 Author: Andrew Stubbs Date: Tue Sep 17 15:26:04 2024 + amdgcn: Refactor device settings into a def file Almost all device-specific settings are now centralised into gcn-devices.def for the compiler, mkoffload, and libgomp. No longer will we have to touch 10 files in multiple places just to add another device without any exotic features. (New ISAs and devices with incompatible metadata will continue to need a bit more.) In order to remove the device-specific conditionals in the code a new value HSACO_ATTR_UNSUPPORTED has been added, indicating that the assembler will reject any setting of that option. This incorporates some of Tobias's patch from March 2024. Co-Authored-By: Tobias Burnus gcc/ChangeLog: * config.gcc (amdgcn): Add gcn-device-macros.h to tm_file. Add gcn-tables.opt to extra_options. * config/gcn/gcn-hsa.h (NO_XNACK): Delete. (NO_SRAM_ECC): Delete. (SRAMOPT): Move definition to generated file gcn-device-macros.h. (XNACKOPT): Likewise. (ASM_SPEC): Redefine using generated values from gcn-device-macros.h. * config/gcn/gcn-opts.h (enum processor_type): Generate from gcn-devices.def. (TARGET_VEGA10): Delete. (TARGET_VEGA20): Delete. (TARGET_GFX908): Delete. (TARGET_GFX90a): Delete. (TARGET_GFX90c): Delete. (TARGET_GFX1030): Delete. (TARGET_GFX1036): Delete. (TARGET_GFX1100): Delete. (TARGET_GFX1103): Delete. (TARGET_XNACK): Redefine to allow for HSACO_ATTR_UNSUPPORTED. (enum hsaco_attr_type): Add HSACO_ATTR_UNSUPPORTED. (TARGET_TGSPLIT): New define. * config/gcn/gcn.cc (gcn_devices): New constant table. (gcn_option_override): Rework to use gcn_devices table. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. (gcn_hsa_declare_function_name): Rework using TARGET_* macros. * config/gcn/gcn.h (gcn_devices): Declare struct and table. (TARGET_CPU_CPP_BUILTINS): Rework using gcn_devices. * config/gcn/gcn.opt: Move enum data to generated file gcn-tables.opt. Use new names for the default values. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX900): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX906): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX908): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX90a): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX90c): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1030): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1036): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1100): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1103): Delete. (enum elf_arch_code): Define using gcn-devices.def. (get_arch): Rework using gcn-devices.def. (main): Rework using gcn-devices.def * config/gcn/t-gcn-hsa (gcn-tables.opt): Generate file. (gcn-device-macros.h): Generate file. * config/gcn/t-omp-device: Generate isa list from gcn-devices.def. * config/gcn/gcn-devices.def: New file. * config/gcn/gcn-tables.opt: New file. * config/gcn/gcn-tables.opt.urls: New file. * config/gcn/gen-gcn-device-macros.awk: New file. * config/gcn/gen-opt-tables.awk: New file. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Generate from gcn-devices.def. (gcn_gfx803_s): Delete. (gcn_gfx900_s): Delete. (gcn_gfx906_s): Delete. (gcn_gfx908_s): Delete. (gcn_gfx90a_s): Delete. (gcn_gfx90c_s): Delete. (gcn_gfx1030_s): Delete. (gcn_gfx1036_s): Delete. (gcn_gfx1100_s): Delete. (gcn_gfx1103_s): Delete. (gcn_isa_name_len): Delete. (isa_hsa_name): Rename ... (isa_name): ... to this, and rework using gcn-devices.def. (isa_gcc_name): Delete. (isa_code): Rework using gcn-devices.def. (max_isa_vgprs): Rework using gcn-devices.def. (isa_matches_agent): Update isa_name usage. (GOMP_OFFLOAD_init_device): Improve diagnostic using the name. Diff: --- gcc/config.gcc | 3 +- gcc/config/gcn/gcn-devices.def | 143 +++ gcc/config/gcn/gcn-hsa.h | 23 ++--- gcc/config/gcn/gcn-opts.h| 31 +++ gcc/config/gcn/gcn-tables.opt| 52 +++ gcc/config/gcn/gcn-tables.opt.urls | 2 + gcc/config/gcn/gcn.cc| 132 ++-
[gcc r15-4989] openmp: Fix signed/unsigned warning
https://gcc.gnu.org/g:345eb9b795d9728733bd0e472529e259ce796ff6 commit r15-4989-g345eb9b795d9728733bd0e472529e259ce796ff6 Author: Andrew Stubbs Date: Wed Nov 6 17:50:00 2024 + openmp: Fix signed/unsigned warning My previous patch broke things when building with Werror. gcc/ChangeLog: * omp-general.cc (omp_max_vf): Cast the constant to poly_uint64. Diff: --- gcc/omp-general.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index 1ae575ee181f..72fb7f92ff70 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -1005,7 +1005,7 @@ omp_max_vf (bool offload) for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;) { if (startswith (c, "amdgcn")) - return ordered_max (64, omp_max_vf (false)); + return ordered_max (poly_uint64 (64), omp_max_vf (false)); else if ((c = strchr (c, ':'))) c++; }
[gcc r15-4987] openmp: Add IFN_GOMP_MAX_VF
https://gcc.gnu.org/g:2a2e6e9894f42fef9315aaad80c36843718ca0cb commit r15-4987-g2a2e6e9894f42fef9315aaad80c36843718ca0cb Author: Andrew Stubbs Date: Fri Nov 1 15:00:25 2024 + openmp: Add IFN_GOMP_MAX_VF Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/ChangeLog: * internal-fn.cc (expand_GOMP_MAX_VF): New function. * internal-fn.def (GOMP_MAX_VF): New internal function. * omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when called in offload context, otherwise assume host context. * omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF. Diff: --- gcc/internal-fn.cc | 8 gcc/internal-fn.def | 1 + gcc/omp-expand.cc | 30 ++ gcc/omp-offload.cc | 3 +++ 4 files changed, 34 insertions(+), 8 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 1b3fe7be0479..0ee5f5bc7c55 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -510,6 +510,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *) /* This should get expanded in omp_device_lower pass. */ +static void +expand_GOMP_MAX_VF (internal_fn, gcall *) +{ + gcc_unreachable (); +} + +/* This should get expanded in omp_device_lower pass. */ + static void expand_GOMP_TARGET_REV (internal_fn, gcall *) { diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 2d4559382711..c3d0efc0f2c3 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -465,6 +465,7 @@ DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_EXIT, ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) +DEF_INTERNAL_FN (GOMP_MAX_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_ORDERED_PRED, ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_VOTE_ANY, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index b0f9d375b6c7..80fb1843445d 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -229,15 +229,29 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule, bool offload) if (!simd_schedule || integer_zerop (chunk_size)) return chunk_size; - poly_uint64 vf = omp_max_vf (offload); - if (known_eq (vf, 1U)) -return chunk_size; - + tree vf; tree type = TREE_TYPE (chunk_size); - chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size, - build_int_cst (type, vf - 1)); - return fold_build2 (BIT_AND_EXPR, type, chunk_size, - build_int_cst (type, -vf)); + + if (offload) +{ + cfun->curr_properties &= ~PROP_gimple_lomp_dev; + vf = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_MAX_VF, +unsigned_type_node, 0); + vf = fold_convert (type, vf); +} + else +{ + poly_uint64 vf_num = omp_max_vf (false); + if (known_eq (vf_num, 1U)) + return chunk_size; + vf = build_int_cst (type, vf_num); +} + + tree vf_minus_one = fold_build2 (MINUS_EXPR, type, vf, + build_int_cst (type, 1)); + tree negative_vf = fold_build1 (NEGATE_EXPR, type, vf); + chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size, vf_minus_one); + return fold_build2 (BIT_AND_EXPR, type, chunk_size, negative_vf); } /* Collect additional arguments needed to emit a combined diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc index 25ce8133fe5e..372b019f9d60 100644 --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc @@ -2754,6 +2754,9 @@ execute_omp_device_lower () case IFN_GOMP_SIMT_VF: rhs = build_int_cst (type, vf); break; + case IFN_GOMP_MAX_VF: + rhs = build_int_cst (type, omp_max_vf (false)); + break; case IFN_GOMP_SIMT_ORDERED_PRED: rhs = vf == 1 ? integer_zero_node : NULL_TREE; if (rhs || !lhs)
[gcc r15-4988] openmp: Add testcases for omp_max_vf
https://gcc.gnu.org/g:d334f729e53867b838e867375b3f475ba793d96e commit r15-4988-gd334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed Nov 6 12:26:08 2024 + openmp: Add testcases for omp_max_vf Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: New test. * testsuite/libgomp.c/max_vf-2.c: New test. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: New test. Diff: --- gcc/testsuite/gcc.dg/gomp/max_vf-1.c | 37 ++ libgomp/testsuite/libgomp.c/max_vf-1.c | 47 ++ libgomp/testsuite/libgomp.c/max_vf-2.c | 21 +++ 3 files changed, 105 insertions(+) diff --git a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c new file mode 100644 index ..0513aae226ce --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c @@ -0,0 +1,37 @@ +/* Test that omp parallel simd schedule uses the correct max_vf for the + host system, when no target directives are present. */ + +/* { dg-do compile } */ +/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp" } */ + +/* Fix a max_vf size so we can scan for it. +{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */ + +#define N 1024 +int a[N], b[N], c[N]; + +void +f2 (void) +{ + int i; + #pragma omp parallel for simd schedule (simd: static, 7) + for (i = 0; i < N; i++) +a[i] = b[i] + c[i]; +} + +/* Make sure the max_vf is inlined as a number. + Hopefully there are no unrelated uses of these numbers ... +{ dg-final { scan-tree-dump-times {\* 16} 2 "ompexp" { target { x86_64-*-* } } } } +{ dg-final { scan-tree-dump-times {\+ 16} 1 "ompexp" { target { x86_64-*-* } } } } */ + +void +f3 (int *a, int *b, int *c) +{ + int i; + #pragma omp parallel for simd schedule (simd : dynamic, 7) + for (i = 0; i < N; i++) +a[i] = b[i] + c[i]; +} + +/* Make sure the max_vf is inlined as a number. +{ dg-final { scan-tree-dump-times {__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 16, 0\);} 1 "ompexp" { target { x86_64-*-* } } } } */ diff --git a/libgomp/testsuite/libgomp.c/max_vf-1.c b/libgomp/testsuite/libgomp.c/max_vf-1.c new file mode 100644 index ..be900c565a37 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/max_vf-1.c @@ -0,0 +1,47 @@ +/* Test that omp parallel simd schedule uses the correct max_vf for the + host system, when target directives are present. */ + +/* { dg-require-effective-target offloading_enabled } */ + +/* { dg-do link } */ +/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp -foffload=-fdump-tree-optimized" } */ + +/* Fix a max_vf size so we can scan for it. +{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */ + +#define N 1024 +int a[N], b[N], c[N]; + +/* Test both static schedules and inline target directives. */ +void +f2 (void) +{ + int i; + #pragma omp target parallel for simd schedule (simd: static, 7) + for (i = 0; i < N; i++) +a[i] = b[i] + c[i]; +} + +/* Test both dynamic schedules and declare target functions. */ +#pragma omp declare target +void +f3 (int *a, int *b, int *c) +{ + int i; + #pragma omp parallel for simd schedule (simd : dynamic, 7) + for (i = 0; i < N; i++) +a[i] = b[i] + c[i]; +} +#pragma omp end declare target + +/* Make sure that the max_vf is used as an IFN. +{ dg-final { scan-tree-dump-times {GOMP_MAX_VF} 2 "ompexp" { target { x86_64-*-* i?86-*-* } } } } */ + +/* Make sure the max_vf is passed as a temporary variable. +{ dg-final { scan-tree-dump-times {__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, D\.[0-9]*, 0\);} 1 "ompexp" { target { x86_64-*-* i?86-*-* } } } } */ + +/* Test SIMD offload devices +{ dg-final { scan-offload-tree-dump-times {__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 64, 0\);} 1 "optimized" { target { offload_gcn } } } } +{ dg-final { scan-offload-tree-dump-times {__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 7, 0\);} 1 "optimized" { target { offload_nvptx } } } } */ + +int main() {} diff --git a/libgomp/testsuite/libgomp.c/max_vf-2.c b/libgomp/testsuite/libgomp.c/max_vf-2.c new file mode 100644 index ..91744c309df8 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/max_vf-2.c @@ -0,0 +1,21 @@ +/* Ensure that the default safelen is set correctly for the larger of the host + and offload device, to prevent defeating the vectorizer. */ + +/* { dg-require-effective-target offloading_enabled } */ + +/* { dg-do link } */ +/* { dg-options "-fopenmp -O2 -fdump-tree-omplower" } */ + +int f(float *a, float *b, int n) +{ + float sum = 0; + #pragma omp target teams distribute parallel for simd map(tofrom: sum) reduction(+:sum) + for (int i = 0; i < n; i++) +sum += a[i] * b[i]; + return sum; +} + +/* Make sure that
[gcc r15-4985] openmp: Tune omp_max_vf for offload targets
https://gcc.gnu.org/g:5c9de3df8547682bfb3d484d7d28a27776bf979c commit r15-4985-g5c9de3df8547682bfb3d484d7d28a27776bf979c Author: Andrew Stubbs Date: Mon Oct 21 12:29:54 2024 + openmp: Tune omp_max_vf for offload targets If requested, return the vectorization factor appropriate for the offload device, if any. This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn. The omp_adjust_chunk_size usecase is set "false", for now, but I intend to change that in a follow-up patch. Note that NVPTX SIMT offload does not use this code-path. gcc/ChangeLog: * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set omp_max_vf to offload == false. * omp-expand.cc (omp_adjust_chunk_size): Likewise. * omp-general.cc (omp_max_vf): Add "offload" parameter, and detect amdgcn offload devices. * omp-general.h (omp_max_vf): Likewise. * omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to omp_max_vf. Diff: --- gcc/gimple-loop-versioning.cc | 2 +- gcc/omp-expand.cc | 2 +- gcc/omp-general.cc| 17 +++-- gcc/omp-general.h | 2 +- gcc/omp-low.cc| 3 ++- 5 files changed, 20 insertions(+), 6 deletions(-) diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc index 107b00200247..2968c929d04a 100644 --- a/gcc/gimple-loop-versioning.cc +++ b/gcc/gimple-loop-versioning.cc @@ -554,7 +554,7 @@ loop_versioning::loop_versioning (function *fn) handled efficiently by scalar code. omp_max_vf calculates the maximum number of bytes in a vector, when such a value is relevant to loop optimization. */ - m_maximum_scale = estimated_poly_value (omp_max_vf ()); + m_maximum_scale = estimated_poly_value (omp_max_vf (false)); m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE); } diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index b0b4ddf5dbc8..907fd46a5b26 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -212,7 +212,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule) if (!simd_schedule || integer_zerop (chunk_size)) return chunk_size; - poly_uint64 vf = omp_max_vf (); + poly_uint64 vf = omp_max_vf (false); if (known_eq (vf, 1U)) return chunk_size; diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index f74b9bf5e96c..1ae575ee181f 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -987,10 +987,11 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void *data) return NULL_TREE; } -/* Return maximum possible vectorization factor for the target. */ +/* Return maximum possible vectorization factor for the target, or for + the OpenMP offload target if one exists. */ poly_uint64 -omp_max_vf (void) +omp_max_vf (bool offload) { if (!optimize || optimize_debug @@ -999,6 +1000,18 @@ omp_max_vf (void) && OPTION_SET_P (flag_tree_loop_vectorize))) return 1; + if (ENABLE_OFFLOADING && offload) +{ + for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;) + { + if (startswith (c, "amdgcn")) + return ordered_max (64, omp_max_vf (false)); + else if ((c = strchr (c, ':'))) + c++; + } + /* Otherwise, fall through to host VF. */ +} + auto_vector_modes modes; targetm.vectorize.autovectorize_vector_modes (&modes, true); if (!modes.is_empty ()) diff --git a/gcc/omp-general.h b/gcc/omp-general.h index f37781316269..70f78d2055b7 100644 --- a/gcc/omp-general.h +++ b/gcc/omp-general.h @@ -162,7 +162,7 @@ extern void omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd, struct omp_for_data_loop *loops); extern gimple *omp_build_barrier (tree lhs); extern tree find_combined_omp_for (tree *, int *, void *); -extern poly_uint64 omp_max_vf (void); +extern poly_uint64 omp_max_vf (bool); extern int omp_max_simt_vf (void); extern const char *omp_context_name_list_prop (tree); extern void omp_construct_traits_to_codes (tree, int, enum tree_code *); diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 44c4310075bf..70a2c108fbca 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -4589,7 +4589,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, { if (known_eq (sctx->max_vf, 0U)) { - sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf (); + sctx->max_vf = (sctx->is_simt ? omp_max_simt_vf () + : omp_max_vf (omp_maybe_offloaded_ctx (ctx))); if (maybe_gt (sctx->max_vf, 1U)) { tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
[gcc r15-4986] openmp: use offload max_vf for chunk_size
https://gcc.gnu.org/g:896c6c28939f0b1eb6582231d24ea07ce01d071e commit r15-4986-g896c6c28939f0b1eb6582231d24ea07ce01d071e Author: Andrew Stubbs Date: Fri Nov 1 13:53:34 2024 + openmp: use offload max_vf for chunk_size The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will get the same chunk size, but device performance is the most important in this case. gcc/ChangeLog: * omp-expand.cc (is_in_offload_region): New function. (omp_adjust_chunk_size): Add pass-through "offload" parameter. (get_ws_args_for): Likewise. (determine_parallel_type): Use is_in_offload_region to adjust call to get_ws_args_for. (expand_omp_for_generic): Likewise. (expand_omp_for_static_chunk): Likewise. Diff: --- gcc/omp-expand.cc | 36 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index 907fd46a5b26..b0f9d375b6c7 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -127,6 +127,23 @@ is_combined_parallel (struct omp_region *region) return region->is_combined_parallel; } +/* Return true is REGION is or is contained within an offload region. */ + +static bool +is_in_offload_region (struct omp_region *region) +{ + gimple *entry_stmt = last_nondebug_stmt (region->entry); + if (is_gimple_omp (entry_stmt) + && is_gimple_omp_offloaded (entry_stmt)) +return true; + else if (region->outer) +return is_in_offload_region (region->outer); + else +return (lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (current_function_decl)) + != NULL); +} + /* Given two blocks PAR_ENTRY_BB and WS_ENTRY_BB such that WS_ENTRY_BB is the immediate dominator of PAR_ENTRY_BB, return true if there are no data dependencies that would prevent expanding the parallel @@ -207,12 +224,12 @@ workshare_safe_to_combine_p (basic_block ws_entry_bb) presence (SIMD_SCHEDULE). */ static tree -omp_adjust_chunk_size (tree chunk_size, bool simd_schedule) +omp_adjust_chunk_size (tree chunk_size, bool simd_schedule, bool offload) { if (!simd_schedule || integer_zerop (chunk_size)) return chunk_size; - poly_uint64 vf = omp_max_vf (false); + poly_uint64 vf = omp_max_vf (offload); if (known_eq (vf, 1U)) return chunk_size; @@ -228,7 +245,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule) expanded. */ static vec * -get_ws_args_for (gimple *par_stmt, gimple *ws_stmt) +get_ws_args_for (gimple *par_stmt, gimple *ws_stmt, bool offload) { tree t; location_t loc = gimple_location (ws_stmt); @@ -270,7 +287,7 @@ get_ws_args_for (gimple *par_stmt, gimple *ws_stmt) if (fd.chunk_size) { t = fold_convert_loc (loc, long_integer_type_node, fd.chunk_size); - t = omp_adjust_chunk_size (t, fd.simd_schedule); + t = omp_adjust_chunk_size (t, fd.simd_schedule, offload); ws_args->quick_push (t); } @@ -366,7 +383,8 @@ determine_parallel_type (struct omp_region *region) region->is_combined_parallel = true; region->inner->is_combined_parallel = true; - region->ws_args = get_ws_args_for (par_stmt, ws_stmt); + region->ws_args = get_ws_args_for (par_stmt, ws_stmt, +is_in_offload_region (region)); } } @@ -3929,6 +3947,7 @@ expand_omp_for_generic (struct omp_region *region, tree *counts = NULL; int i; bool ordered_lastprivate = false; + bool offload = is_in_offload_region (region); gcc_assert (!broken_loop || !in_combined_parallel); gcc_assert (fd->iter_type == long_integer_type_node @@ -4196,7 +4215,7 @@ expand_omp_for_generic (struct omp_region *region, if (fd->chunk_size) { t = fold_convert (fd->iter_type, fd->chunk_size); - t = omp_adjust_chunk_size (t, fd->simd_schedule); + t = omp_adjust_chunk_size (t, fd->simd_schedule, offload); if (sched_arg) { if (fd->ordered) @@ -4240,7 +4259,7 @@ expand_omp_for_generic (struct omp_region *region, { tree bfn_decl = builtin_decl_explicit (start_fn); t = fold_convert (fd->iter_type, fd->chunk_size); - t = omp_adjust_chunk_size (t, fd->simd_schedule); + t = omp_adjust_chunk_size (t, fd->simd_schedule, offload); if (sched_arg) t = build_call_expr (bfn_decl, 10, t5, t0, t1, t2, sched_arg, t, t3, t4, reductions, mem); @@ -5937,7 +5956,8 @@ expand_omp_for_static_chunk (struct om
[gcc r15-5010] openmp: Fix max_vf testcases with -march=cascadelake
https://gcc.gnu.org/g:4e91d0587200cf801b42abd74a837e0b3ce635d5 commit r15-5010-g4e91d0587200cf801b42abd74a837e0b3ce635d5 Author: Andrew Stubbs Date: Thu Nov 7 11:23:41 2024 + openmp: Fix max_vf testcases with -march=cascadelake Apparently we need to explicitly disable AVX, not just enabled SSE, to guarentee the 16-lane vectors we need for the pattern match. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: Add -mno-avx. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: Add -mno-avx. Diff: --- gcc/testsuite/gcc.dg/gomp/max_vf-1.c | 2 +- libgomp/testsuite/libgomp.c/max_vf-1.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c index 0513aae226ce..d4617940eb29 100644 --- a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c +++ b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c @@ -5,7 +5,7 @@ /* { dg-options "-fopenmp -O2 -fdump-tree-ompexp" } */ /* Fix a max_vf size so we can scan for it. -{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */ +{ dg-additional-options "-msse2 -mno-avx" { target { x86_64-*-* i?86-*-* } } } */ #define N 1024 int a[N], b[N], c[N]; diff --git a/libgomp/testsuite/libgomp.c/max_vf-1.c b/libgomp/testsuite/libgomp.c/max_vf-1.c index be900c565a37..9c8d5dc0af97 100644 --- a/libgomp/testsuite/libgomp.c/max_vf-1.c +++ b/libgomp/testsuite/libgomp.c/max_vf-1.c @@ -7,7 +7,7 @@ /* { dg-options "-fopenmp -O2 -fdump-tree-ompexp -foffload=-fdump-tree-optimized" } */ /* Fix a max_vf size so we can scan for it. -{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */ +{ dg-additional-options "-msse2 -mno-avx" { target { x86_64-*-* i?86-*-* } } } */ #define N 1024 int a[N], b[N], c[N];
[gcc r15-5459] amdgcn: Fix build failure (PR117657)
https://gcc.gnu.org/g:234da38a0e68a204a59562fcca2aa6d297bc21ed commit r15-5459-g234da38a0e68a204a59562fcca2aa6d297bc21ed Author: Andrew Stubbs Date: Tue Nov 19 12:01:22 2024 + amdgcn: Fix build failure (PR117657) The last patch did the right thing to the wrong parameter, which caused a build failure in Newlib. This patch fixes it. gcc/ChangeLog: PR target/117657 * config/gcn/gcn-valu.md (mask_gather_load): Fix bug in maskload else patch. Diff: --- gcc/config/gcn/gcn-valu.md | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index ce7a68f0e2d3..f7ed0b825a16 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -4038,16 +4038,17 @@ if (GET_MODE (addr) == mode) emit_insn (gen_gather_insn_1offset_exec (operands[0], addr, const0_rtx, const0_rtx, +const0_rtx, gcn_gen_undef (mode), -operands[0], exec)); +exec)); else emit_insn (gen_gather_insn_2offsets_exec (operands[0], operands[1], addr, const0_rtx, - const0_rtx, + const0_rtx, const0_rtx, gcn_gen_undef (mode), - operands[0], exec)); + exec)); DONE; })
[gcc/devel/omp/gcc-14] openmp: Fix error reporting in parsing of C++ OpenMP to/from clause
https://gcc.gnu.org/g:9e3aeec74092e91b7f66d2cc5dc5885ef728d5b6 commit 9e3aeec74092e91b7f66d2cc5dc5885ef728d5b6 Author: Kwok Cheung Yeung Date: Mon Oct 7 16:19:39 2024 +0100 openmp: Fix error reporting in parsing of C++ OpenMP to/from clause The final 'else' when checking the motion modifiers is nested one level too deep. This patch should be folded into "OpenMP: Enable 'declare mapper' mappers for 'target update' directives" when pushing upstream. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_from_to): Move an "else" clause to a higher nesting level. Diff: --- gcc/cp/ChangeLog.omp | 6 ++ gcc/cp/parser.cc | 20 ++-- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp index f0b50bea7f9f..1136e686b18c 100644 --- a/gcc/cp/ChangeLog.omp +++ b/gcc/cp/ChangeLog.omp @@ -1,3 +1,9 @@ +2024-12-06 Andrew Stubbs + Kwok Cheung Yeung + + * parser.cc (cp_parser_omp_clause_from_to): Move an "else" clause to + a higher nesting level. + 2024-05-15 Jakub Jelinek * semantics.cc (finish_omp_clauses): Diagnose grainsize diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 4157d912039c..f52446c5e46a 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -42058,16 +42058,16 @@ cp_parser_omp_clause_from_to (cp_parser *parser, enum omp_clause_code kind, mapper_modifier = true; pos += 3; } - else - { - cp_parser_error (parser, "% or % clause with " - "modifier other than % or %"); - cp_parser_skip_to_closing_parenthesis (parser, -/*recovering=*/true, -/*or_comma=*/false, -/*consume_paren=*/true); - return list; - } + } + else + { + cp_parser_error (parser, "% or % clause with " + "modifier other than % or %"); + cp_parser_skip_to_closing_parenthesis (parser, +/*recovering=*/true, +/*or_comma=*/false, +/*consume_paren=*/true); + return list; } }
[gcc r16-134] OpenMP, GCN: Add interop-hsa testcase
https://gcc.gnu.org/g:8d84ea28510054fbbb8a2b7441916bd75e29163f commit r16-134-g8d84ea28510054fbbb8a2b7441916bd75e29163f Author: Andrew Stubbs Date: Thu Apr 24 16:50:08 2025 + OpenMP, GCN: Add interop-hsa testcase This testcase ensures that the interop HSA support is sufficient to run a kernel manually on the same device. libgomp/ChangeLog: * testsuite/libgomp.c/interop-hsa.c: New test. Diff: --- libgomp/testsuite/libgomp.c/interop-hsa.c | 203 ++ 1 file changed, 203 insertions(+) diff --git a/libgomp/testsuite/libgomp.c/interop-hsa.c b/libgomp/testsuite/libgomp.c/interop-hsa.c new file mode 100644 index ..cf8bc90bb9c0 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/interop-hsa.c @@ -0,0 +1,203 @@ +/* { dg-additional-options "-ldl" } */ +/* { dg-require-effective-target offload_device_gcn } */ + +#include +#include +#include +#include +#include +#include +#include "../../../include/hsa.h" +#include "../../config/gcn/libgomp-gcn.h" + +#define STACKSIZE (100 * 1024) +#define HEAPSIZE (10 * 1024 * 1024) +#define ARENASIZE HEAPSIZE + +/* This code fragment must be optimized or else the host-fallback kernel has + * invalid ASM inserts. The rest of the file can be compiled safely at -O0. */ +#pragma omp declare target +uintptr_t __attribute__((optimize("O1"))) +get_kernel_ptr () +{ + uintptr_t val; + if (!omp_is_initial_device ()) +/* "main._omp_fn.0" is the name GCC gives the first OpenMP target + * region in the "main" function. + * The ".kd" suffix is added by the LLVM assembler when it creates the + * kernel meta-data, and this is what we need to launch a kernel. */ +asm ("s_getpc_b64 %0\n\t" +"s_add_u32 %L0, %L0, main._omp_fn.0.kd@rel32@lo+4\n\t" +"s_addc_u32 %H0, %H0, main._omp_fn.0.kd@rel32@hi+4" +: "=Sg"(val)); + return val; +} +#pragma omp end declare target + +int +main(int argc, char** argv) +{ + + /* Load the HSA runtime DLL. */ + void *hsalib = dlopen ("libhsa-runtime64.so.1", RTLD_LAZY); + assert (hsalib); + + hsa_status_t (*hsa_signal_create) (hsa_signal_value_t initial_value, +uint32_t num_consumers, +const hsa_agent_t *consumers, +hsa_signal_t *signal) += dlsym (hsalib, "hsa_signal_create"); + assert (hsa_signal_create); + + uint64_t (*hsa_queue_load_write_index_relaxed) (const hsa_queue_t *queue) += dlsym (hsalib, "hsa_queue_load_write_index_relaxed"); + assert (hsa_queue_load_write_index_relaxed); + + void (*hsa_signal_store_relaxed) (hsa_signal_t signal, + hsa_signal_value_t value) += dlsym (hsalib, "hsa_signal_store_relaxed"); + assert (hsa_signal_store_relaxed); + + hsa_signal_value_t (*hsa_signal_wait_relaxed) (hsa_signal_t signal, +hsa_signal_condition_t condition, +hsa_signal_value_t compare_value, +uint64_t timeout_hint, +hsa_wait_state_t wait_state_hint) += dlsym (hsalib, "hsa_signal_wait_relaxed"); + assert (hsa_signal_wait_relaxed); + + void (*hsa_queue_store_write_index_relaxed) (const hsa_queue_t *queue, + uint64_t value) += dlsym (hsalib, "hsa_queue_store_write_index_relaxed"); + assert (hsa_queue_store_write_index_relaxed); + + hsa_status_t (*hsa_signal_destroy) (hsa_signal_t signal) += dlsym (hsalib, "hsa_signal_destroy"); + assert (hsa_signal_destroy); + + /* Set up the device data environment. */ + int test_data_value = 0; +#pragma omp target enter data map(test_data_value) + + /* Get the interop details. */ + int device_num = omp_get_default_device(); + hsa_agent_t *gpu_agent; + hsa_queue_t *hsa_queue = NULL; + + omp_interop_t interop = omp_interop_none; +#pragma omp interop init(target, targetsync, prefer_type("hsa"): interop) device(device_num) + assert (interop != omp_interop_none); + + omp_interop_rc_t retcode; + omp_interop_fr_t fr = omp_get_interop_int (interop, omp_ipr_fr_id, &retcode); + assert (retcode == omp_irc_success); + assert (fr == omp_ifr_hsa); + + gpu_agent = omp_get_interop_ptr(interop, omp_ipr_device, &retcode); + assert (retcode == omp_irc_success); + + hsa_queue = omp_get_interop_ptr(interop, omp_ipr_targetsync, &retcode); + assert (retcode == omp_irc_success); + assert (hsa_queue); + + /* Call an offload kernel via OpenMP/libgomp. + * + * This kernel serves two purposes: + * 1) Lookup the device-side load-address of itself (thus avoiding the + * need to access the libgomp internals). + * 2) Count how many times it is called. + * We then call it once using OpenMP, and once manually, and check + * the counter reads "2". */ + uint64_t kernel_object =
[gcc/devel/omp/gcc-14] OpenMP, GCN: Add interop-hsa testcase
https://gcc.gnu.org/g:33e01148ab0ed5fba2a5ac380bbba2e90629d7fd commit 33e01148ab0ed5fba2a5ac380bbba2e90629d7fd Author: Andrew Stubbs Date: Thu Apr 24 16:50:08 2025 + OpenMP, GCN: Add interop-hsa testcase This testcase ensures that the interop HSA support is sufficient to run a kernel manually on the same device. libgomp/ChangeLog: * testsuite/libgomp.c/interop-hsa.c: New test. (cherry picked from commit 8d84ea28510054fbbb8a2b7441916bd75e29163f) Diff: --- libgomp/testsuite/libgomp.c/interop-hsa.c | 203 ++ 1 file changed, 203 insertions(+) diff --git a/libgomp/testsuite/libgomp.c/interop-hsa.c b/libgomp/testsuite/libgomp.c/interop-hsa.c new file mode 100644 index ..cf8bc90bb9c0 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/interop-hsa.c @@ -0,0 +1,203 @@ +/* { dg-additional-options "-ldl" } */ +/* { dg-require-effective-target offload_device_gcn } */ + +#include +#include +#include +#include +#include +#include +#include "../../../include/hsa.h" +#include "../../config/gcn/libgomp-gcn.h" + +#define STACKSIZE (100 * 1024) +#define HEAPSIZE (10 * 1024 * 1024) +#define ARENASIZE HEAPSIZE + +/* This code fragment must be optimized or else the host-fallback kernel has + * invalid ASM inserts. The rest of the file can be compiled safely at -O0. */ +#pragma omp declare target +uintptr_t __attribute__((optimize("O1"))) +get_kernel_ptr () +{ + uintptr_t val; + if (!omp_is_initial_device ()) +/* "main._omp_fn.0" is the name GCC gives the first OpenMP target + * region in the "main" function. + * The ".kd" suffix is added by the LLVM assembler when it creates the + * kernel meta-data, and this is what we need to launch a kernel. */ +asm ("s_getpc_b64 %0\n\t" +"s_add_u32 %L0, %L0, main._omp_fn.0.kd@rel32@lo+4\n\t" +"s_addc_u32 %H0, %H0, main._omp_fn.0.kd@rel32@hi+4" +: "=Sg"(val)); + return val; +} +#pragma omp end declare target + +int +main(int argc, char** argv) +{ + + /* Load the HSA runtime DLL. */ + void *hsalib = dlopen ("libhsa-runtime64.so.1", RTLD_LAZY); + assert (hsalib); + + hsa_status_t (*hsa_signal_create) (hsa_signal_value_t initial_value, +uint32_t num_consumers, +const hsa_agent_t *consumers, +hsa_signal_t *signal) += dlsym (hsalib, "hsa_signal_create"); + assert (hsa_signal_create); + + uint64_t (*hsa_queue_load_write_index_relaxed) (const hsa_queue_t *queue) += dlsym (hsalib, "hsa_queue_load_write_index_relaxed"); + assert (hsa_queue_load_write_index_relaxed); + + void (*hsa_signal_store_relaxed) (hsa_signal_t signal, + hsa_signal_value_t value) += dlsym (hsalib, "hsa_signal_store_relaxed"); + assert (hsa_signal_store_relaxed); + + hsa_signal_value_t (*hsa_signal_wait_relaxed) (hsa_signal_t signal, +hsa_signal_condition_t condition, +hsa_signal_value_t compare_value, +uint64_t timeout_hint, +hsa_wait_state_t wait_state_hint) += dlsym (hsalib, "hsa_signal_wait_relaxed"); + assert (hsa_signal_wait_relaxed); + + void (*hsa_queue_store_write_index_relaxed) (const hsa_queue_t *queue, + uint64_t value) += dlsym (hsalib, "hsa_queue_store_write_index_relaxed"); + assert (hsa_queue_store_write_index_relaxed); + + hsa_status_t (*hsa_signal_destroy) (hsa_signal_t signal) += dlsym (hsalib, "hsa_signal_destroy"); + assert (hsa_signal_destroy); + + /* Set up the device data environment. */ + int test_data_value = 0; +#pragma omp target enter data map(test_data_value) + + /* Get the interop details. */ + int device_num = omp_get_default_device(); + hsa_agent_t *gpu_agent; + hsa_queue_t *hsa_queue = NULL; + + omp_interop_t interop = omp_interop_none; +#pragma omp interop init(target, targetsync, prefer_type("hsa"): interop) device(device_num) + assert (interop != omp_interop_none); + + omp_interop_rc_t retcode; + omp_interop_fr_t fr = omp_get_interop_int (interop, omp_ipr_fr_id, &retcode); + assert (retcode == omp_irc_success); + assert (fr == omp_ifr_hsa); + + gpu_agent = omp_get_interop_ptr(interop, omp_ipr_device, &retcode); + assert (retcode == omp_irc_success); + + hsa_queue = omp_get_interop_ptr(interop, omp_ipr_targetsync, &retcode); + assert (retcode == omp_irc_success); + assert (hsa_queue); + + /* Call an offload kernel via OpenMP/libgomp. + * + * This kernel serves two purposes: + * 1) Lookup the device-side load-address of itself (thus avoiding the + * need to access the libgomp internals). + * 2) Count how many times it is called. + * We then call it once using OpenMP, and once manually, an
[gcc/devel/omp/gcc-14] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn
https://gcc.gnu.org/g:0572eb1918b4de3a27a24cf0a21c9b71aea7c5f7 commit 0572eb1918b4de3a27a24cf0a21c9b71aea7c5f7 Author: Tobias Burnus Date: Wed Mar 26 11:27:56 2025 +0100 libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn Note that this commit also updates the API interface to OpenMP 6.0; while 5.1 and 5.2 use 'int *' for the the ret_code argument, OpenMP 6.0 changed this to omp_interop_rc_t *; this enum also exists in OpenMP 5.1. However, C++ does not like this change such that unless NULL is passed (i.e. the argument is ignored), OpenMP 5.x and 6.x are not compatible. Note that GCC's omp.h already follows OpenMP 6.0 and is now in sync with the documentation. libgomp/ChangeLog: * libgomp.texi (OpenMP 5.1): Add @ref to offload-target specifics for 'interop'. (OpenMP 6.0): Mark dispatch's interop clause as implemented. (omp_get_interop_int, omp_get_interop_str, omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to Offload-Target Specifics; change ret_code argument type to 'omp_interop_rc_t *'. (Offload-Target Specifics): Document the supported OpenMP interop foreign runtimes on AMD and Nvidia GPUs. (cherry picked from commit 2e7c1b589bc58be0e155098cf87d8535d41adeab) Diff: --- libgomp/libgomp.texi | 170 --- 1 file changed, 161 insertions(+), 9 deletions(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index e075cf1cfb98..42dabfc80562 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -313,7 +313,7 @@ The OpenMP 4.5 specification is fully supported. clauses @tab N @tab @item Indirect calls to the device version of a procedure or function in @code{target} regions @tab Y @tab -@item @code{interop} directive @tab N @tab +@item @code{interop} directive @tab Y @tab Cf. @ref{Offload-Target Specifics} @item @code{omp_interop_t} object support in runtime routines @tab Y @tab @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab @item Extensions to the @code{atomic} directive @tab Y @tab @@ -516,7 +516,7 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0. @item Extension of @code{interop} operation of @code{append_args}, allowing all modifiers of the @code{init} clause @tab N @tab -@item @code{interop} clause to @code{dispatch} @tab N @tab +@item @code{interop} clause to @code{dispatch} @tab Y @tab @item @code{message} and @code{severity} clauses to @code{parallel} directive @tab N @tab @item @code{self} clause to @code{requires} directive @tab N @tab @@ -2945,7 +2945,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -2959,7 +2959,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2, @@ -2990,7 +2991,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{void *omp_get_interop_ptr(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -3004,7 +3005,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3, @@ -3034,7 +3036,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_str(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -3048,7 +3050,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp
[gcc/devel/omp/gcc-14] libgomp.texi: For HIP interop, mention cpp defines to set
https://gcc.gnu.org/g:fd91f571f6e986f84f09f35139ff0650caa669d6 commit fd91f571f6e986f84f09f35139ff0650caa669d6 Author: Tobias Burnus Date: Thu Apr 17 10:21:05 2025 +0200 libgomp.texi: For HIP interop, mention cpp defines to set The HIP header files recognize the used compiler, defaulting to either AMD or Nvidia/CUDA; thus, the alternative way of explicitly defining a macro is less prominently documented. With GCC, the user has to define the preprocessor macro manually. Hence, as a service to the user, mention __HIP_PLATFORM_AMD__ and __HIP_PLATFORM_NVIDIA__ in the interop documentation, even though it has only indirectly to do with GCC and its interop support. Note to commit-log readers, only: For Fortran, the hipfort modules can be used; when compiling the hipfort package (defaults to use gfortran), it generates the module (*.mod) files in include/hipfort/{amdgcn,nvidia}/ such that the choice is made by setting the respective include path. libgomp/ChangeLog: * libgomp.texi (gcn interop, nvptx interop): For HIP with C/C++, add a note about setting a preprocessor define. (cherry picked from commit 4bff3f0b89af9a9aad69b8f85859c0a3667533ae) Diff: --- libgomp/libgomp.texi | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 42dabfc80562..32d651498afa 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -6828,6 +6828,9 @@ or string (str) data type, call @code{omp_get_interop_int}, Note that @code{device_num} is the OpenMP device number while @code{device} is the HIP device number or HSA device handle. +When using HIP with C and C++, the @code{__HIP_PLATFORM_AMD__} preprocessor +macro must be defined before including the HIP header files. + For the API routine call, add the prefix @code{omp_ipr_} to the property name; for instance: @smallexample @@ -6990,6 +6993,9 @@ or string (str) data type, call @code{omp_get_interop_int}, Note that @code{device_num} is the OpenMP device number while @code{device} is the CUDA, CUDA Driver, or HIP device number. +When using HIP with C and C++, the @code{__HIP_PLATFORM_NVIDIA__} preprocessor +macro must be defined before including the HIP header files. + For the API routine call, add the prefix @code{omp_ipr_} to the property name; for instance: @smallexample
[gcc/devel/omp/gcc-14] OpenMP: Silence uninitialized variable warning in C++ front end.
https://gcc.gnu.org/g:a9f1e49fa11de85cdc55ee22ea3c021157e07719 commit a9f1e49fa11de85cdc55ee22ea3c021157e07719 Author: Sandra Loosemore Date: Sat Feb 22 16:54:50 2025 + OpenMP: Silence uninitialized variable warning in C++ front end. There's no actual problem with the code here, just a false-positive warning emitted by some older GCC versions. gcc/cp/ChangeLog * parser.cc (cp_finish_omp_declare_variant): Initialize append_args_last. (cherry picked from commit c978965b445079abbb88c22ba74de1e26e9f5b81) Diff: --- gcc/cp/parser.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 2faf82ef12d4..747209fc77f1 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -51270,7 +51270,7 @@ cp_finish_omp_declare_variant (cp_parser *parser, cp_token *pragma_tok, location_t varid_loc = make_location (caret_loc, start_loc, finish_loc); tree append_args_tree = NULL_TREE; - tree append_args_last; + tree append_args_last = NULL_TREE; bool has_match = false, has_adjust_args = false; location_t adjust_args_loc = UNKNOWN_LOCATION; location_t append_args_loc = UNKNOWN_LOCATION;