[PATCH] OpenMP front-end: allow requires dynamic_allocators
Hi all, This patch removes the "sorry" message for the OpenMP "requires dynamic_allocators" feature in C, C++ and Fortran. The clause is supposed to state that the user code will not work without the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and these things *are* present, so there should be no problem allowing it. I can't see any reason for our implementation to *do* anything with this statement -- it's fine for the allocators to work the same with or without it. I think this patch ought to be fine for GCC 12, but if not it can wait until stage 1 opens. I shall backport this to OG11 shortly. OK? AndrewOpenMP: allow requires dynamic_allocators There's no need to reject the dynamic_allocators requires directive because we actually do support the feature, and it doesn't have to actually "do" anything. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_requires): Don't "sorry" dynamic_allocators. gcc/cp/ChangeLog: * parser.c (cp_parser_omp_requires): Don't "sorry" dynamic_allocators. gcc/fortran/ChangeLog: * openmp.c (gfc_match_omp_requires): Don't "sorry" dynamic_allocators. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/requires-8.f90: Reinstate dynamic allocators requirement. diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index d7e5f051ac0..808a73f1038 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -22581,7 +22581,7 @@ c_parser_omp_requires (c_parser *parser) c_parser_skip_to_pragma_eol (parser, false); return; } - if (p) + if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS) sorry_at (cloc, "%qs clause on % directive not " "supported yet", p); if (p) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c2564e51e41..d55c9675c4e 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -46421,7 +46421,7 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) cp_parser_skip_to_pragma_eol (parser, pragma_tok); return false; } - if (p) + if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS) sorry_at (cloc, "%qs clause on % directive not " "supported yet", p); if (p) diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 2036bc1349f..f47a44f7539 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -5373,7 +5373,8 @@ gfc_match_omp_requires (void) else goto error; - if (requires_clause & ~OMP_REQ_ATOMIC_MEM_ORDER_MASK) + if (requires_clause & ~(OMP_REQ_ATOMIC_MEM_ORDER_MASK + | OMP_REQ_DYNAMIC_ALLOCATORS)) gfc_error_now ("Sorry, %qs clause at %L on REQUIRES directive is not " "yet supported", clause, &old_loc); if (!gfc_omp_requires_add_clause (requires_clause, clause, &old_loc, NULL)) diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-8.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-8.f90 index 3c32ae9860e..eadfcaf8609 100644 --- a/gcc/testsuite/gfortran.dg/gomp/requires-8.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/requires-8.f90 @@ -4,7 +4,7 @@ contains subroutine foo interface subroutine bar2 - !$!omp requires dynamic_allocators + !$omp requires dynamic_allocators end subroutine end interface !$omp target
Re: [PATCH 5/5] openmp: -foffload-memory=pinned
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: gcc/ChangeLog: * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. This worked for x86_64, but I needed to make the attached adjustment to work on powerpc without a linker error. I've committed it to OG11. Andrewopenmp: BUILT_IN_GOMP_ENABLE_PINNED_MODE Rework the GOMP_enable_pinned_mode call so that it works on powerpc where the old way gave a local call. gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.c (omp_enable_pinned_mode): Use BUILT_IN_GOMP_ENABLE_PINNED_MODE. diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index c591d79fa07..e442b0b5c94 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -468,3 +468,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error", DEF_GOMP_BUILTIN (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE, "GOMP_evaluate_target_device", BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR, ATTR_NOTHROW_LEAF_LIST) +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE, + "GOMP_enable_pinned_mode", + BT_FN_VOID, ATTR_NOTHROW_LIST) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index f7ecfb52c73..4e8ab9e4ca0 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -15432,9 +15432,7 @@ omp_enable_pinned_mode () push_struct_function (decl); init_tree_ssa (cfun); - tree callname = get_identifier ("GOMP_enable_pinned_mode"); - tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname, - voidfntype); + tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE); gcall *call = gimple_build_call (calldecl, 0); gimple_seq seq = NULL;
Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: This patches changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 document says that addresses allocated this way needs to work without is_device_ptr. The easiest way to make that work is to make them USM addresses. I will commit this to OG11 shortly. Andrewopenmp: Do USM transform for omp_target_alloc OpenMP 5.0 says that omp_target_alloc should return USM addresses. gcc/ChangeLog: * omp-low.c (usm_transform): Transform omp_target_alloc and omp_target_free. libgomp/ChangeLog: * testsuite/libgomp.c/usm-6.c: Add omp_target_alloc. gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-2.c: Add omp_target_alloc. * c-c++-common/gomp/usm-3.c: Add omp_target_alloc. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 4e8ab9e4ca0..9235eafd1d7 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -15880,7 +15880,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *, if ((strcmp (name, "malloc") == 0) || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC) -|| DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)) +|| DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl) +|| strcmp (name, "omp_target_alloc") == 0) { tree omp_alloc_type = build_function_type_list (ptr_type_node, size_type_node, @@ -15952,7 +15953,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *, || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE) || (DECL_IS_OPERATOR_DELETE_P (fndecl) - && DECL_IS_REPLACEABLE_OPERATOR (fndecl))) + && DECL_IS_REPLACEABLE_OPERATOR (fndecl)) + || strcmp (name, "omp_target_free") == 0) { tree omp_free_type = build_function_type_list (void_type_node, ptr_type_node, diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c index 64dbb6be131..8c20ef94e69 100644 --- a/gcc/testsuite/c-c++-common/gomp/usm-2.c +++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c @@ -12,6 +12,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__); void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); void *realloc(void *, __SIZE_TYPE__); void free (void *); +void *omp_target_alloc (__SIZE_TYPE__, int); +void omp_target_free (void *, int); #ifdef __cplusplus } @@ -24,16 +26,21 @@ foo () void *p2 = realloc(p1, 30); void *p3 = calloc(4, 15); void *p4 = aligned_alloc(16, 40); + void *p5 = omp_target_alloc(50, 1); free (p2); free (p3); free (p4); + omp_target_free (p5, 1); } /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ /* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform" } } */ /* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform" } } */ -/* { dg-final { scan-tree-dump-times "omp_free" 3 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ /* { dg-final { scan-tree-dump-not " free" "usm_transform" } } */ /* { dg-final { scan-tree-dump-not " aligned_alloc" "usm_transform" } } */ /* { dg-final { scan-tree-dump-not " malloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " omp_target_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " omp_target_free" "usm_transform" } } */ diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c index 934582ea5fd..2b0cbb45e27 100644 --- a/gcc/testsuite/c-c++-common/gomp/usm-3.c +++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c @@ -10,6 +10,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__); void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); void *realloc(void *, __SIZE_TYPE__); void free (void *); +void *omp_target_alloc (__SIZE_TYPE__, int); +void omp_target_free (void *, int); #ifdef __cplusplus } @@ -22,16 +24,21 @@ foo () void *p2 = realloc(p1, 30); void *p3 = calloc(4, 15); void *p4 = aligned_alloc(16, 40); + void *p5 = omp_target_alloc(50, 1); free (p2); free (p3); free (p4); + omp_target_free (p5, 1); } /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ /* { dg-final { scan-tree-dump-times "omp_calloc
Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.
On 02/04/2022 13:04, Andrew Stubbs wrote: This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 document says that addresses allocated this way needs to work without is_device_ptr. The easiest way to make that work is to make them USM addresses. Actually, reading on, it says "Every device address allocated through OpenMP device memory routines is a valid host pointer", so USM is the correct answer. Andrew
Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.
This patch adjusts the testcases, previously proposed, to allow for testing on machines with varying page sizes and default amounts of lockable memory. There turns out to be more variation than I had thought. This should go on mainline at the same time as the previous patches in this thread. I'll commit it to OG11 shortly. Andrewlibgomp: autodetect page sizes in pinned memory tests There's not one number that works everywhere. This also fixes the failure mode on non-Linux hosts. libgomp/ChangeLog: * testsuite/libgomp.c/alloc-pinned-1.c: Autodetect page size. * testsuite/libgomp.c/alloc-pinned-2.c: Likewise. * testsuite/libgomp.c/alloc-pinned-3.c: Likewise. * testsuite/libgomp.c/alloc-pinned-4.c: Likewise. * testsuite/libgomp.c/alloc-pinned-5.c: Likewise. * testsuite/libgomp.c/alloc-pinned-6.c: Likewise. * testsuite/libgomp.c/alloc-pinned-7.c: Clean up. diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c index 0a6360cda29..79792b16d83 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c @@ -4,13 +4,23 @@ /* Test that pinned memory works. */ +#include +#include + #ifdef __linux__ #include #include -#include -#include #include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) +#define CHECK_SIZE(SIZE) { \ + struct rlimit limit; \ + if (getrlimit (RLIMIT_MEMLOCK, &limit) \ + || limit.rlim_cur <= SIZE) \ +fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \ + } int get_pinned_mem () @@ -34,6 +44,9 @@ get_pinned_mem () abort (); } #else +#define PAGE_SIZE 1 /* unknown */ +#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n"); + int get_pinned_mem () { @@ -43,12 +56,13 @@ get_pinned_mem () #include -/* Allocate more than a page each time, but stay within the ulimit. */ -#define SIZE 10*1024 - int main () { + /* Allocate at least a page each time, but stay within the ulimit. */ + const int SIZE = PAGE_SIZE; + CHECK_SIZE (SIZE*3); + const omp_alloctrait_t traits[] = { { omp_atk_pinned, 1 } }; diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c index 8fdb4ff5cfd..228c656b715 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c @@ -4,13 +4,23 @@ /* Test that pinned memory works (pool_size code path). */ +#include +#include + #ifdef __linux__ #include #include -#include -#include #include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) +#define CHECK_SIZE(SIZE) { \ + struct rlimit limit; \ + if (getrlimit (RLIMIT_MEMLOCK, &limit) \ + || limit.rlim_cur <= SIZE) \ +fprintf (stderr, "unsufficient lockable memory; please increase ulimit\n"); \ + } int get_pinned_mem () @@ -34,6 +44,9 @@ get_pinned_mem () abort (); } #else +#define PAGE_SIZE 1 /* unknown */ +#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n"); + int get_pinned_mem () { @@ -43,12 +56,13 @@ get_pinned_mem () #include -/* Allocate more than a page each time, but stay within the ulimit. */ -#define SIZE 10*1024 - int main () { + /* Allocate at least a page each time, but stay within the ulimit. */ + const int SIZE = PAGE_SIZE; + CHECK_SIZE (SIZE*3); + const omp_alloctrait_t traits[] = { { omp_atk_pinned, 1 }, { omp_atk_pool_size, SIZE*8 } diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c index 943dfea5c9b..90539ffe3e0 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c @@ -2,15 +2,18 @@ /* Test that pinned memory fails correctly. */ +#include +#include + #ifdef __linux__ #include #include -#include -#include #include #include +#define PAGE_SIZE sysconf(_SC_PAGESIZE) + int get_pinned_mem () { @@ -45,6 +48,8 @@ set_pin_limit (int size) } #else int +#define PAGE_SIZE 1*1024 /* unknown */ + get_pinned_mem () { return 0; @@ -58,12 +63,12 @@ set_pin_limit () #include -/* This should be large enough to cover multiple pages. */ -#define SIZE 1*1024 - int main () { + /* This needs to be large enough to cover multiple pages. */ + const int SIZE = PAGE_SIZE*4; + /* Pinned memory, no fallback. */ const omp_alloctrait_t traits1[] = { { omp_atk_pinned, 1 }, diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c index d9cb8dfe1fd..534e49eefc4 100644 --- a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c @@ -2,15 +2,18 @@ /* Test that pinned memory fails correctly, pool_size code path. */ +#include +#include + #ifdef __linux__ #include #include -#include -#include #include #include +#define PAGE_SIZE
[PATCH] openmp: Handle unified address memory.
This patch adds enough support for "requires unified_address" to make the sollve_vv testcases pass. It implements unified_address as a synonym of unified_shared_memory, which is both valid and the only way I know of to unify addresses with Cuda (could be wrong). This patch should be applied on to of the previous patch set for USM. OK for stage 1? I'll apply it to OG11 shortly. Andrewopenmp: unified_address support This makes "requires unified_address" work by making it eqivalent to "requires unified_shared_memory". This is more than is strictly necessary, but should be standard compliant. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/cp/ChangeLog: * parser.c (cp_parser_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/fortran/ChangeLog: * openmp.c (gfc_match_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/ChangeLog: * omp-low.c: Do USM transformations for "unified_address". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-4.c: New test. * gfortran.dg/gomp/usm-4.f90: New test. diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 12408770193..9a3d0cb8cea 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -22531,18 +22531,27 @@ c_parser_omp_requires (c_parser *parser) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_address is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - { - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; - - if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED - && flag_offload_memory != OFFLOAD_MEMORY_NONE) - error_at (cloc, - "unified_shared_memory is incompatible with the " - "selected -foffload-memory option"); - flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; - } + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fd9f62f4543..3a9ea272f10 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -46406,18 +46406,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_address is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - { - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; - - if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED - && flag_offload_memory != OFFLOAD_MEMORY_NONE) - error_at (cloc, - "unified_shared_memory is incompatible with the " - "selected -foffload-memory option"); - flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; - } + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_r
Re: Test with an lto-build of libgfortran.
On 28/09/2023 20:59, Toon Moene wrote: On 9/28/23 21:26, Jakub Jelinek wrote: It is worse than that, usually the LTO format changes e.g. any time any option or parameter is added on a release branch (several times a year) and at other times as well. Though, admittedly GCC is the single package that actually could get away with LTO in lib*.a libraries, at least in some packagings (if the static libraries are in gcc specific subdirectories rather than say /usr/lib{,64} or similar and if the packaging of gcc updates both the compiler and corresponding static libraries in a lock-step. Because in that case LTO in there will be always used only by the same snapshot from the release branch and so should be compatible with the LTO in it. This might be an argument to make it a configure option, e.g. --enable-lto-runtime. This sort of thing should definitely Just Work for cross compilers and embedded platforms where the libraries are bundled with the compiler. Andrew
Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders
On 18/06/2021 15:19, Julian Brown wrote: This patch improves 64-bit multiplication for AMD GCN: patterns for unsigned and signed 32x32->64 bit multiplication have been added, and also 64x64->64 bit multiplication is now open-coded rather than calling a library function (which may be a win for code size as well as speed: the function calling sequence isn't particularly concise for GCN). The mulsi3_highpart pattern has also been extended for GCN5+, since that ISA version supports high-part result multiply instructions with SGPR operands. The DImode multiply implementation is lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.md (mulsi3_highpart): Add SGPR alternatives for GCN5+. (mulsidi3, muldi3): Add expanders. --- gcc/config/gcn/gcn.md | 55 ++- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index b5f895a93e2..70655ca4b8b 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -1392,19 +1392,62 @@ (define_code_attr e [(sign_extend "e") (zero_extend "")]) (define_insn "mulsi3_highpart" - [(set (match_operand:SI 0 "register_operand" "= v") + [(set (match_operand:SI 0 "register_operand" "=Sg, Sg, v") (truncate:SI (lshiftrt:DI (mult:DI (any_extend:DI - (match_operand:SI 1 "register_operand" "% v")) + (match_operand:SI 1 "register_operand" "%SgA,SgA, v")) (any_extend:DI - (match_operand:SI 2 "register_operand" "vSv"))) + (match_operand:SI 2 "register_operand" "SgA, B,vSv"))) (const_int 32] "" - "v_mul_hi0\t%0, %2, %1" - [(set_attr "type" "vop3a") - (set_attr "length" "8")]) + "@ + s_mul_hi0\t%0, %1, %2 + s_mul_hi0\t%0, %1, %2 + v_mul_hi0\t%0, %2, %1" + [(set_attr "type" "sop2,sop2,vop3a") + (set_attr "length" "4,8,8") + (set_attr "gcn_version" "gcn5,gcn5,*")]) + +(define_expand "mulsidi3" + [(set (match_operand:DI 0 "register_operand" "") + (mult:DI + (any_extend:DI (match_operand:SI 1 "register_operand" "")) + (any_extend:DI (match_operand:SI 2 "register_operand" ""] + "" + { +rtx dst = gen_reg_rtx (DImode); +rtx dstlo = gen_lowpart (SImode, dst); +rtx dsthi = gen_highpart_mode (SImode, DImode, dst); +emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2])); +emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2])); +emit_move_insn (operands[0], dst); +DONE; + }) + +(define_expand "muldi3" + [(set (match_operand:DI 0 "register_operand" "") + (mult:DI (match_operand:DI 1 "register_operand" "") +(match_operand:DI 2 "register_operand" "")))] + "" + { +rtx tmp0 = gen_reg_rtx (SImode); +rtx tmp1 = gen_reg_rtx (SImode); +rtx dst = gen_reg_rtx (DImode); +rtx dsthi = gen_highpart_mode (SImode, DImode, dst); +rtx op1lo = gen_lowpart (SImode, operands[1]); +rtx op1hi = gen_highpart_mode (SImode, DImode, operands[1]); +rtx op2lo = gen_lowpart (SImode, operands[2]); +rtx op2hi = gen_highpart_mode (SImode, DImode, operands[2]); +emit_insn (gen_umulsidi3 (dst, op1lo, op2lo)); +emit_insn (gen_mulsi3 (tmp0, op1lo, op2hi)); +emit_insn (gen_addsi3 (dsthi, dsthi, tmp0)); +emit_insn (gen_mulsi3 (tmp1, op1hi, op2lo)); +emit_insn (gen_addsi3 (dsthi, dsthi, tmp1)); +emit_move_insn (operands[0], dst); +DONE; + }) (define_insn "mulhisi3" [(set (match_operand:SI 0 "register_operand" "=v") Most of the rest of the backend expands 64-bit operations to 32-bit pairs much later, using define_insn_and_split, because there were lots of issues with splitting it early. I don't recall exactly what right now, unfortunately. (It might have been related to spilling only half the value to the stack?) It also makes it hard to debug, I think. Andrew
Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation
On 18/06/2021 15:19, Julian Brown wrote: This patch adds an open-coded implementation of the clrsb2 (count leading redundant sign bit) standard names using the GCN flbit_i* instructions for SImode and DImode. Those don't count exactly as we need, so we need a couple of other instructions to fix up the result afterwards. These patterns are lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.md (UNSPEC_FLBIT_INT): New unspec constant. (s_mnemonic): Add clrsb. (gcn_flbit_int): Add insn pattern for SImode/DImode. (clrsb2): Add expander for SImode/DImode. --- gcc/config/gcn/gcn.md | 40 ++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 70655ca4b8b..0fa7f86702e 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -81,7 +81,8 @@ UNSPEC_MOV_FROM_LANE63 UNSPEC_GATHER UNSPEC_SCATTER - UNSPEC_RCP]) + UNSPEC_RCP + UNSPEC_FLBIT_INT]) ;; }}} ;; {{{ Attributes @@ -338,7 +339,8 @@ [(not "not%b") (popcount "bcnt1_i32%b") (clz "flbit_i32%b") - (ctz "ff1_i32%b")]) + (ctz "ff1_i32%b") + (clrsb "flbit_i32%i")]) (define_code_attr revmnemonic [(minus "subrev%i") @@ -1509,6 +1511,40 @@ [(set_attr "type" "sop1") (set_attr "length" "4,8")]) +(define_insn "gcn_flbit_int" + [(set (match_operand:SI 0 "register_operand" "=Sg,Sg") + (unspec:SI [(match_operand:SIDI 1 "gcn_alu_operand" "SgA, B")] + UNSPEC_FLBIT_INT))] + "" + { +if (mode == SImode) + return "s_flbit_i32\t%0, %1"; +else + return "s_flbit_i32_i64\t%0, %1"; + } + [(set_attr "type" "sop1") + (set_attr "length" "4,8")]) + +(define_expand "clrsb2" + [(set (match_operand:SI 0 "register_operand" "") + (clrsb:SI (match_operand:SIDI 1 "gcn_alu_operand" "")))] + "" + { +rtx tmp = gen_reg_rtx (SImode); +/* FLBIT_I* counts sign or zero bits at the most-significant end of the + input register (and returns -1 for 0/-1 inputs). We want the number of + *redundant* bits (i.e. that value minus one), and an answer of 31/63 for + 0/-1 inputs. We can do that in three instructions... */ +emit_insn (gen_gcn_flbit_int (tmp, operands[1])); +emit_insn (gen_uminsi3 (tmp, tmp, + gen_int_mode (GET_MODE_BITSIZE (mode), + SImode))); +/* If we put this last, it can potentially be folded into a subsequent + arithmetic operation. */ +emit_insn (gen_subsi3 (operands[0], tmp, const1_rtx)); +DONE; + }) + ;; }}} ;; {{{ ALU: generic 32-bit binop OK. Andrew
Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN
On 18/06/2021 15:19, Julian Brown wrote: This patch enables support for TImode for AMD GCN, the lack of which is currently causing a number of test failures for the target and which is also needed to support "omp_depend_kind" for OpenMP 5.0, since that is implemented as a 128-bit integer. Several libgcc support routines are built by default for the "word size" of a machine, and also for "2 * word size" of the machine. The libgcc build for AMD GCN is changed so that it builds for a "word size" of 64 bits, in order to better match the (64-bit) host compiler. However it isn't really true that we have 64-bit words -- GCN has 32-bit registers, so changing UNITS_PER_WORD unconditionally would be the wrong thing to do. Changing this setting for libgcc (only) means that support routines are built for "single word" operations that are DImode (64 bits), and those for "double word" operations are built for TImode (128 bits). That leaves some gaps regarding previous operations that were built for a "single word" size of 32 bits and a "double word" size of 64 bits (generic code doesn't cover both alternatives for all operations that might be needed). Those gaps are filled in by this patch, or by the preceding patches in the series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.c (gcn_init_libfuncs): New function. (TARGET_INIT_LIBFUNCS): Define target hook using above function. * config/gcn/gcn.h (UNITS_PER_WORD): Define to 8 for IN_LIBGCC2, 4 otherwise. (LIBGCC2_UNITS_PER_WORD, BITS_PER_WORD): Remove definitions. (MAX_FIXED_MODE_SIZE): Change to 128. libgcc/ * config/gcn/lib2-bswapti2.c: New file. * config/gcn/lib2-divmod-di.c: New file. * config/gcn/lib2-gcn.h (DItype, UDItype, TItype, UTItype): Add typedefs. (__divdi3, __moddi3, __udivdi3, __umoddi3): Add prototypes. * config/gcn/t-amdgcn (LIB2ADD): Add lib2-divmod-di.c and lib2-bswapti2.c. --- gcc/config/gcn/gcn.c | 30 +++ gcc/config/gcn/gcn.h | 11 --- libgcc/config/gcn/lib2-bswapti2.c | 47 ++ libgcc/config/gcn/lib2-divmod-di.c | 35 ++ libgcc/config/gcn/lib2-gcn.h | 8 + libgcc/config/gcn/t-amdgcn | 2 ++ 6 files changed, 129 insertions(+), 4 deletions(-) create mode 100644 libgcc/config/gcn/lib2-bswapti2.c create mode 100644 libgcc/config/gcn/lib2-divmod-di.c diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 283a91fe50a..45f37d5310d 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -3610,6 +3610,34 @@ gcn_init_builtins (void) #endif } +/* Implement TARGET_INIT_LIBFUNCS. */ + +static void +gcn_init_libfuncs (void) +{ + /* BITS_PER_UNIT * 2 is 64 bits, which causes + optabs-libfuncs.c:gen_int_libfunc to omit TImode (i.e 128 bits) + libcalls that we need to support operations for that type. Initialise + them here instead. */ + set_optab_libfunc (udiv_optab, TImode, "__udivti3"); + set_optab_libfunc (umod_optab, TImode, "__umodti3"); + set_optab_libfunc (sdiv_optab, TImode, "__divti3"); + set_optab_libfunc (smod_optab, TImode, "__modti3"); + set_optab_libfunc (smul_optab, TImode, "__multi3"); + set_optab_libfunc (addv_optab, TImode, "__addvti3"); + set_optab_libfunc (subv_optab, TImode, "__subvti3"); + set_optab_libfunc (negv_optab, TImode, "__negvti2"); + set_optab_libfunc (absv_optab, TImode, "__absvti2"); + set_optab_libfunc (smulv_optab, TImode, "__mulvti3"); + set_optab_libfunc (ffs_optab, TImode, "__ffsti2"); + set_optab_libfunc (clz_optab, TImode, "__clzti2"); + set_optab_libfunc (ctz_optab, TImode, "__ctzti2"); + set_optab_libfunc (clrsb_optab, TImode, "__clrsbti2"); + set_optab_libfunc (popcount_optab, TImode, "__popcountti2"); + set_optab_libfunc (parity_optab, TImode, "__parityti2"); + set_optab_libfunc (bswap_optab, TImode, "__bswapti2"); +} + /* Expand the CMP_SWAP GCN builtins. We have our own versions that do not require taking the address of any object, other than the memory cell being operated on. @@ -6336,6 +6364,8 @@ gcn_dwarf_register_span (rtx rtl) #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS gcn_init_builtins +#undef TARGET_INIT_LIBFUNCS +#define TARGET_INIT_LIBFUNCS gcn_init_libfuncs #undef TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS #define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \ gcn_ira_change_pseudo_allocno_class diff --git a/gcc/config/gcn/gcn.h b/gcc/con
Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return
On 18/06/2021 15:19, Julian Brown wrote: This patch changes the argument and return types for the libgcc __udivsi3 and __umodsi3 helper functions for GCN to USItype instead of SItype. This is probably just cosmetic in practice. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown libgcc/ * config/gcn/lib2-divmod.c (__udivsi3, __umodsi3): Change argument and return types to USItype. * config/gcn/lib2-gcn.h (__udivsi3, __umodsi3): Update prototypes. --- libgcc/config/gcn/lib2-divmod.c | 8 libgcc/config/gcn/lib2-gcn.h| 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/libgcc/config/gcn/lib2-divmod.c b/libgcc/config/gcn/lib2-divmod.c index 0d6ca44f521..7c72e24e0c3 100644 --- a/libgcc/config/gcn/lib2-divmod.c +++ b/libgcc/config/gcn/lib2-divmod.c @@ -102,15 +102,15 @@ __modsi3 (SItype a, SItype b) } -SItype -__udivsi3 (SItype a, SItype b) +USItype +__udivsi3 (USItype a, USItype b) { return udivmodsi4 (a, b, 0); } -SItype -__umodsi3 (SItype a, SItype b) +USItype +__umodsi3 (USItype a, USItype b) { return udivmodsi4 (a, b, 1); } diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h index 11476c4cda8..9223d73b8e7 100644 --- a/libgcc/config/gcn/lib2-gcn.h +++ b/libgcc/config/gcn/lib2-gcn.h @@ -38,8 +38,8 @@ typedef int word_type __attribute__ ((mode (__word__))); /* Exported functions. */ extern SItype __divsi3 (SItype, SItype); extern SItype __modsi3 (SItype, SItype); -extern SItype __udivsi3 (SItype, SItype); -extern SItype __umodsi3 (SItype, SItype); +extern USItype __udivsi3 (USItype, USItype); +extern USItype __umodsi3 (USItype, USItype); extern HItype __divhi3 (HItype, HItype); extern HItype __modhi3 (HItype, HItype); extern UHItype __udivhi3 (UHItype, UHItype); OK, this seems to match what some other targets have. Except NIOS2 though, which is probably where this file was copied from. Andrew
Re: [wwwdocs][Patch] gcc-15/changes: Fortran + offload (C++) update | project/gomp: GCC 15 update
On 17/04/2025 15:10, Tobias Burnus wrote: Hi all, @Fortraners: Comments to the added 'do concurrent' item? @Thomas: Are you fine with this C++ wording? @Andrew: Likewise for C++ and ROCm bump? This part is fine with me. Andrew Anyone: comments are welcome. Affected pages: * https://gcc.gnu.org/gcc-15/changes.html * https://gcc.gnu.org/projects/gomp/ Cheers, Tobias