[PATCH] OpenMP front-end: allow requires dynamic_allocators

2021-12-20 Thread Andrew Stubbs

Hi all,

This patch removes the "sorry" message for the OpenMP "requires 
dynamic_allocators" feature in C, C++ and Fortran.


The clause is supposed to state that the user code will not work without 
the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and 
these things *are* present, so there should be no problem allowing it.


I can't see any reason for our implementation to *do* anything with this 
statement -- it's fine for the allocators to work the same with or 
without it.


I think this patch ought to be fine for GCC 12, but if not it can wait 
until stage 1 opens. I shall backport this to OG11 shortly.


OK?

AndrewOpenMP: allow requires dynamic_allocators

There's no need to reject the dynamic_allocators requires directive because
we actually do support the feature, and it doesn't have to actually "do"
anything.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_requires): Don't "sorry" dynamic_allocators.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_requires): Don't "sorry" dynamic_allocators.

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_requires): Don't "sorry" dynamic_allocators.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/requires-8.f90: Reinstate dynamic allocators
requirement.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d7e5f051ac0..808a73f1038 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -22581,7 +22581,7 @@ c_parser_omp_requires (c_parser *parser)
  c_parser_skip_to_pragma_eol (parser, false);
  return;
}
- if (p)
+ if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
sorry_at (cloc, "%qs clause on % directive not "
"supported yet", p);
  if (p)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c2564e51e41..d55c9675c4e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -46421,7 +46421,7 @@ cp_parser_omp_requires (cp_parser *parser, cp_token 
*pragma_tok)
  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
  return false;
}
- if (p)
+ if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
sorry_at (cloc, "%qs clause on % directive not "
"supported yet", p);
  if (p)
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2036bc1349f..f47a44f7539 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5373,7 +5373,8 @@ gfc_match_omp_requires (void)
   else
goto error;
 
-  if (requires_clause & ~OMP_REQ_ATOMIC_MEM_ORDER_MASK)
+  if (requires_clause & ~(OMP_REQ_ATOMIC_MEM_ORDER_MASK
+ | OMP_REQ_DYNAMIC_ALLOCATORS))
gfc_error_now ("Sorry, %qs clause at %L on REQUIRES directive is not "
   "yet supported", clause, &old_loc);
   if (!gfc_omp_requires_add_clause (requires_clause, clause, &old_loc, 
NULL))
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-8.f90 
b/gcc/testsuite/gfortran.dg/gomp/requires-8.f90
index 3c32ae9860e..eadfcaf8609 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-8.f90
@@ -4,7 +4,7 @@ contains
  subroutine foo
   interface
subroutine bar2
- !$!omp requires dynamic_allocators
+ !$omp requires dynamic_allocators
end subroutine
   end interface
   !$omp target


Re: [PATCH 5/5] openmp: -foffload-memory=pinned

2022-03-30 Thread Andrew Stubbs

On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:

gcc/ChangeLog:

* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.


This worked for x86_64, but I needed to make the attached adjustment to 
work on powerpc without a linker error.


I've committed it to OG11.

Andrewopenmp: BUILT_IN_GOMP_ENABLE_PINNED_MODE

Rework the GOMP_enable_pinned_mode call so that it works on powerpc where
the old way gave a local call.

gcc/ChangeLog:

* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
* omp-low.c (omp_enable_pinned_mode): Use
BUILT_IN_GOMP_ENABLE_PINNED_MODE.

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index c591d79fa07..e442b0b5c94 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -468,3 +468,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE, 
"GOMP_evaluate_target_device",
  BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR,
  ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+ "GOMP_enable_pinned_mode",
+ BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f7ecfb52c73..4e8ab9e4ca0 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15432,9 +15432,7 @@ omp_enable_pinned_mode ()
   push_struct_function (decl);
   init_tree_ssa (cfun);
 
-  tree callname = get_identifier ("GOMP_enable_pinned_mode");
-  tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname,
- voidfntype);
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
   gcall *call = gimple_build_call (calldecl, 0);
 
   gimple_seq seq = NULL;


Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs

On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:

This patches changes calls to malloc/free/calloc/realloc and operator new to
memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc.


This additional patch adds transformation for omp_target_alloc. The 
OpenMP 5.0 document says that addresses allocated this way needs to work 
without is_device_ptr. The easiest way to make that work is to make them 
USM addresses.


I will commit this to OG11 shortly.

Andrewopenmp: Do USM transform for omp_target_alloc

OpenMP 5.0 says that omp_target_alloc should return USM addresses.

gcc/ChangeLog:

* omp-low.c (usm_transform): Transform omp_target_alloc and
omp_target_free.

libgomp/ChangeLog:

* testsuite/libgomp.c/usm-6.c: Add omp_target_alloc.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/usm-2.c: Add omp_target_alloc.
* c-c++-common/gomp/usm-3.c: Add omp_target_alloc.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4e8ab9e4ca0..9235eafd1d7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15880,7 +15880,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *,
if ((strcmp (name, "malloc") == 0)
 || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
 && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
-|| DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl))
+|| DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
+|| strcmp (name, "omp_target_alloc") == 0)
  {
  tree omp_alloc_type
= build_function_type_list (ptr_type_node, size_type_node,
@@ -15952,7 +15953,8 @@ usm_transform (gimple_stmt_iterator *gsi_p, bool *,
   || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
   && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE)
   || (DECL_IS_OPERATOR_DELETE_P (fndecl)
-  && DECL_IS_REPLACEABLE_OPERATOR (fndecl)))
+  && DECL_IS_REPLACEABLE_OPERATOR (fndecl))
+  || strcmp (name, "omp_target_free") == 0)
  {
tree omp_free_type
  = build_function_type_list (void_type_node, ptr_type_node,
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c 
b/gcc/testsuite/c-c++-common/gomp/usm-2.c
index 64dbb6be131..8c20ef94e69 100644
--- a/gcc/testsuite/c-c++-common/gomp/usm-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c
@@ -12,6 +12,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
 void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
 void *realloc(void *, __SIZE_TYPE__);
 void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
 
 #ifdef __cplusplus
 }
@@ -24,16 +26,21 @@ foo ()
   void *p2 = realloc(p1, 30);
   void *p3 = calloc(4, 15);
   void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
   free (p2);
   free (p3);
   free (p4);
+  omp_target_free (p5, 1);
 }
 
 /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 
"usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 
"usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 
"usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 
"usm_transform"  } } */
-/* { dg-final { scan-tree-dump-times "omp_free" 3 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 
"usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
 /* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c 
b/gcc/testsuite/c-c++-common/gomp/usm-3.c
index 934582ea5fd..2b0cbb45e27 100644
--- a/gcc/testsuite/c-c++-common/gomp/usm-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c
@@ -10,6 +10,8 @@ void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
 void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
 void *realloc(void *, __SIZE_TYPE__);
 void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
 
 #ifdef __cplusplus
 }
@@ -22,16 +24,21 @@ foo ()
   void *p2 = realloc(p1, 30);
   void *p3 = calloc(4, 15);
   void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
   free (p2);
   free (p3);
   free (p4);
+  omp_target_free (p5, 1);
 }
 
 /* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 
"usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 
"usm_transform"  } } */
 /* { dg-final { scan-tree-dump-times "omp_calloc

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs

On 02/04/2022 13:04, Andrew Stubbs wrote:
This additional patch adds transformation for omp_target_alloc. The 
OpenMP 5.0 document says that addresses allocated this way needs to work 
without is_device_ptr. The easiest way to make that work is to make them 
USM addresses.


Actually, reading on, it says "Every device address allocated through 
OpenMP device memory routines is a valid host pointer", so USM is the 
correct answer.


Andrew



Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.

2022-04-13 Thread Andrew Stubbs
This patch adjusts the testcases, previously proposed, to allow for 
testing on machines with varying page sizes and default amounts of 
lockable memory.  There turns out to be more variation than I had thought.


This should go on mainline at the same time as the previous patches in 
this thread.


I'll commit it to OG11 shortly.

Andrewlibgomp: autodetect page sizes in pinned memory tests

There's not one number that works everywhere.
This also fixes the failure mode on non-Linux hosts.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Autodetect page size.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-7.c: Clean up.

diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
index 0a6360cda29..79792b16d83 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -4,13 +4,23 @@
 
 /* Test that pinned memory works.  */
 
+#include 
+#include 
+
 #ifdef __linux__
 #include 
 #include 
-#include 
-#include 
 
 #include 
+#include 
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+  || limit.rlim_cur <= SIZE) \
+fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+  }
 
 int
 get_pinned_mem ()
@@ -34,6 +44,9 @@ get_pinned_mem ()
   abort ();
 }
 #else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
 int
 get_pinned_mem ()
 {
@@ -43,12 +56,13 @@ get_pinned_mem ()
 
 #include 
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
   const omp_alloctrait_t traits[] = {
   { omp_atk_pinned, 1 }
   };
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
index 8fdb4ff5cfd..228c656b715 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -4,13 +4,23 @@
 
 /* Test that pinned memory works (pool_size code path).  */
 
+#include 
+#include 
+
 #ifdef __linux__
 #include 
 #include 
-#include 
-#include 
 
 #include 
+#include 
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+  || limit.rlim_cur <= SIZE) \
+fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+  }
 
 int
 get_pinned_mem ()
@@ -34,6 +44,9 @@ get_pinned_mem ()
   abort ();
 }
 #else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
 int
 get_pinned_mem ()
 {
@@ -43,12 +56,13 @@ get_pinned_mem ()
 
 #include 
 
-/* Allocate more than a page each time, but stay within the ulimit.  */
-#define SIZE 10*1024
-
 int
 main ()
 {
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
   const omp_alloctrait_t traits[] = {
   { omp_atk_pinned, 1 },
   { omp_atk_pool_size, SIZE*8 }
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
index 943dfea5c9b..90539ffe3e0 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -2,15 +2,18 @@
 
 /* Test that pinned memory fails correctly.  */
 
+#include 
+#include 
+
 #ifdef __linux__
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
 int
 get_pinned_mem ()
 {
@@ -45,6 +48,8 @@ set_pin_limit (int size)
 }
 #else
 int
+#define PAGE_SIZE 1*1024 /* unknown */
+
 get_pinned_mem ()
 {
   return 0;
@@ -58,12 +63,12 @@ set_pin_limit ()
 
 #include 
 
-/* This should be large enough to cover multiple pages.  */
-#define SIZE 1*1024
-
 int
 main ()
 {
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
   /* Pinned memory, no fallback.  */
   const omp_alloctrait_t traits1[] = {
   { omp_atk_pinned, 1 },
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c 
b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
index d9cb8dfe1fd..534e49eefc4 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -2,15 +2,18 @@
 
 /* Test that pinned memory fails correctly, pool_size code path.  */
 
+#include 
+#include 
+
 #ifdef __linux__
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 
+#define PAGE_SIZE

[PATCH] openmp: Handle unified address memory.

2022-04-20 Thread Andrew Stubbs
This patch adds enough support for "requires unified_address" to make 
the sollve_vv testcases pass. It implements unified_address as a synonym 
of unified_shared_memory, which is both valid and the only way I know of 
to unify addresses with Cuda (could be wrong).


This patch should be applied on to of the previous patch set for USM.

OK for stage 1?

I'll apply it to OG11 shortly.

Andrewopenmp: unified_address support

This makes "requires unified_address" work by making it eqivalent to
"requires unified_shared_memory".  This is more than is strictly necessary,
but should be standard compliant.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/ChangeLog:

* omp-low.c: Do USM transformations for "unified_address".

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/usm-4.c: New test.
* gfortran.dg/gomp/usm-4.f90: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 12408770193..9a3d0cb8cea 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -22531,18 +22531,27 @@ c_parser_omp_requires (c_parser *parser)
  enum omp_requires this_req = (enum omp_requires) 0;
 
  if (!strcmp (p, "unified_address"))
-   this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+   {
+ this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_address is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "unified_shared_memory"))
- {
-   this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
- error_at (cloc,
-   "unified_shared_memory is incompatible with the "
-   "selected -foffload-memory option");
-   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
- }
+   {
+ this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_shared_memory is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "dynamic_allocators"))
this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fd9f62f4543..3a9ea272f10 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -46406,18 +46406,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token 
*pragma_tok)
  enum omp_requires this_req = (enum omp_requires) 0;
 
  if (!strcmp (p, "unified_address"))
-   this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+   {
+ this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_address is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "unified_shared_memory"))
- {
-   this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
- error_at (cloc,
-   "unified_shared_memory is incompatible with the "
-   "selected -foffload-memory option");
-   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
- }
+   {
+ this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_shared_memory is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "dynamic_allocators"))
this_r

Re: Test with an lto-build of libgfortran.

2023-09-29 Thread Andrew Stubbs

On 28/09/2023 20:59, Toon Moene wrote:

On 9/28/23 21:26, Jakub Jelinek wrote:


It is worse than that, usually the LTO format changes e.g. any time any
option or parameter is added on a release branch (several times a 
year) and

at other times as well.
Though, admittedly GCC is the single package that actually could get away
with LTO in lib*.a libraries, at least in some packagings (if the static
libraries are in gcc specific subdirectories rather than say 
/usr/lib{,64}

or similar and if the packaging of gcc updates both the compiler and
corresponding static libraries in a lock-step.  Because in that case LTO
in there will be always used only by the same snapshot from the release
branch and so should be compatible with the LTO in it.
This might be an argument to make it a configure option, e.g. 
--enable-lto-runtime.


This sort of thing should definitely Just Work for cross compilers and 
embedded platforms where the libraries are bundled with the compiler.


Andrew


Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders

2021-06-18 Thread Andrew Stubbs

On 18/06/2021 15:19, Julian Brown wrote:

This patch improves 64-bit multiplication for AMD GCN: patterns for
unsigned and signed 32x32->64 bit multiplication have been added, and
also 64x64->64 bit multiplication is now open-coded rather than calling
a library function (which may be a win for code size as well as speed:
the function calling sequence isn't particularly concise for GCN).

The mulsi3_highpart pattern has also been extended for GCN5+, since
that ISA version supports high-part result multiply instructions with
SGPR operands.

The DImode multiply implementation is lost from libgcc if we build it
for DImode/TImode rather than SImode/DImode, a change we make in a later
patch in this series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsi3_highpart): Add SGPR alternatives for
GCN5+.
(mulsidi3, muldi3): Add expanders.
---
  gcc/config/gcn/gcn.md | 55 ++-
  1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b5f895a93e2..70655ca4b8b 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1392,19 +1392,62 @@
  (define_code_attr e [(sign_extend "e") (zero_extend "")])
  
  (define_insn "mulsi3_highpart"

-  [(set (match_operand:SI 0 "register_operand"  "= v")
+  [(set (match_operand:SI 0 "register_operand"   "=Sg, Sg,  v")
(truncate:SI
  (lshiftrt:DI
(mult:DI
  (any_extend:DI
-   (match_operand:SI 1 "register_operand" "% v"))
+   (match_operand:SI 1 "register_operand" "%SgA,SgA,  v"))
  (any_extend:DI
-   (match_operand:SI 2 "register_operand" "vSv")))
+   (match_operand:SI 2 "register_operand"  "SgA,  B,vSv")))
(const_int 32]
""
-  "v_mul_hi0\t%0, %2, %1"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
+  "@
+  s_mul_hi0\t%0, %1, %2
+  s_mul_hi0\t%0, %1, %2
+  v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,sop2,vop3a")
+   (set_attr "length" "4,8,8")
+   (set_attr "gcn_version" "gcn5,gcn5,*")])
+
+(define_expand "mulsidi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI
+ (any_extend:DI (match_operand:SI 1 "register_operand" ""))
+ (any_extend:DI (match_operand:SI 2 "register_operand" ""]
+  ""
+  {
+rtx dst = gen_reg_rtx (DImode);
+rtx dstlo = gen_lowpart (SImode, dst);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
+
+(define_expand "muldi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI (match_operand:DI 1 "register_operand" "")
+(match_operand:DI 2 "register_operand" "")))]
+  ""
+  {
+rtx tmp0 = gen_reg_rtx (SImode);
+rtx tmp1 = gen_reg_rtx (SImode);
+rtx dst = gen_reg_rtx (DImode);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+rtx op1lo = gen_lowpart (SImode, operands[1]);
+rtx op1hi = gen_highpart_mode (SImode, DImode, operands[1]);
+rtx op2lo = gen_lowpart (SImode, operands[2]);
+rtx op2hi = gen_highpart_mode (SImode, DImode, operands[2]);
+emit_insn (gen_umulsidi3 (dst, op1lo, op2lo));
+emit_insn (gen_mulsi3 (tmp0, op1lo, op2hi));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp0));
+emit_insn (gen_mulsi3 (tmp1, op1hi, op2lo));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp1));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
  
  (define_insn "mulhisi3"

[(set (match_operand:SI 0 "register_operand"  "=v")



Most of the rest of the backend expands 64-bit operations to 32-bit 
pairs much later, using define_insn_and_split, because there were lots 
of issues with splitting it early. I don't recall exactly what right 
now, unfortunately. (It might have been related to spilling only half 
the value to the stack?) It also makes it hard to debug, I think.


Andrew


Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation

2021-06-18 Thread Andrew Stubbs

On 18/06/2021 15:19, Julian Brown wrote:

This patch adds an open-coded implementation of the clrsb2
(count leading redundant sign bit) standard names using the GCN flbit_i*
instructions for SImode and DImode.  Those don't count exactly as we need,
so we need a couple of other instructions to fix up the result afterwards.

These patterns are lost from libgcc if we build it for DImode/TImode
rather than SImode/DImode, a change we make in a later patch in this
series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.md (UNSPEC_FLBIT_INT): New unspec constant.
(s_mnemonic): Add clrsb.
(gcn_flbit_int): Add insn pattern for SImode/DImode.
(clrsb2): Add expander for SImode/DImode.
---
  gcc/config/gcn/gcn.md | 40 ++--
  1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 70655ca4b8b..0fa7f86702e 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -81,7 +81,8 @@
UNSPEC_MOV_FROM_LANE63
UNSPEC_GATHER
UNSPEC_SCATTER
-  UNSPEC_RCP])
+  UNSPEC_RCP
+  UNSPEC_FLBIT_INT])
  
  ;; }}}

  ;; {{{ Attributes
@@ -338,7 +339,8 @@
[(not "not%b")
 (popcount "bcnt1_i32%b")
 (clz "flbit_i32%b")
-   (ctz "ff1_i32%b")])
+   (ctz "ff1_i32%b")
+   (clrsb "flbit_i32%i")])
  
  (define_code_attr revmnemonic

[(minus "subrev%i")
@@ -1509,6 +1511,40 @@
[(set_attr "type" "sop1")
 (set_attr "length" "4,8")])
  
+(define_insn "gcn_flbit_int"

+  [(set (match_operand:SI 0 "register_operand"  "=Sg,Sg")
+   (unspec:SI [(match_operand:SIDI 1 "gcn_alu_operand" "SgA, B")]
+  UNSPEC_FLBIT_INT))]
+  ""
+  {
+if (mode == SImode)
+  return "s_flbit_i32\t%0, %1";
+else
+  return "s_flbit_i32_i64\t%0, %1";
+  }
+  [(set_attr "type" "sop1")
+   (set_attr "length" "4,8")])
+
+(define_expand "clrsb2"
+  [(set (match_operand:SI 0 "register_operand" "")
+   (clrsb:SI (match_operand:SIDI 1 "gcn_alu_operand" "")))]
+  ""
+  {
+rtx tmp = gen_reg_rtx (SImode);
+/* FLBIT_I* counts sign or zero bits at the most-significant end of the
+   input register (and returns -1 for 0/-1 inputs).  We want the number of
+   *redundant* bits (i.e. that value minus one), and an answer of 31/63 for
+   0/-1 inputs.  We can do that in three instructions...  */
+emit_insn (gen_gcn_flbit_int (tmp, operands[1]));
+emit_insn (gen_uminsi3 (tmp, tmp,
+   gen_int_mode (GET_MODE_BITSIZE (mode),
+ SImode)));
+/* If we put this last, it can potentially be folded into a subsequent
+   arithmetic operation.  */
+emit_insn (gen_subsi3 (operands[0], tmp, const1_rtx));
+DONE;
+  })
+
  ;; }}}
  ;; {{{ ALU: generic 32-bit binop
  



OK.

Andrew


Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN

2021-06-18 Thread Andrew Stubbs

On 18/06/2021 15:19, Julian Brown wrote:

This patch enables support for TImode for AMD GCN, the lack of which
is currently causing a number of test failures for the target and which
is also needed to support "omp_depend_kind" for OpenMP 5.0, since that
is implemented as a 128-bit integer.

Several libgcc support routines are built by default for the "word size"
of a machine, and also for "2 * word size" of the machine.  The libgcc
build for AMD GCN is changed so that it builds for a "word size" of 64
bits, in order to better match the (64-bit) host compiler.  However it
isn't really true that we have 64-bit words -- GCN has 32-bit registers,
so changing UNITS_PER_WORD unconditionally would be the wrong thing to do.

Changing this setting for libgcc (only) means that support routines
are built for "single word" operations that are DImode (64 bits), and
those for "double word" operations are built for TImode (128 bits).
That leaves some gaps regarding previous operations that were built
for a "single word" size of 32 bits and a "double word" size of 64 bits
(generic code doesn't cover both alternatives for all operations that
might be needed).  Those gaps are filled in by this patch, or by the
preceding patches in the series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.c (gcn_init_libfuncs): New function.
(TARGET_INIT_LIBFUNCS): Define target hook using above function.
* config/gcn/gcn.h (UNITS_PER_WORD): Define to 8 for IN_LIBGCC2, 4
otherwise.
(LIBGCC2_UNITS_PER_WORD, BITS_PER_WORD): Remove definitions.
(MAX_FIXED_MODE_SIZE): Change to 128.

libgcc/
* config/gcn/lib2-bswapti2.c: New file.
* config/gcn/lib2-divmod-di.c: New file.
* config/gcn/lib2-gcn.h (DItype, UDItype, TItype, UTItype): Add
typedefs.
(__divdi3, __moddi3, __udivdi3, __umoddi3): Add prototypes.
* config/gcn/t-amdgcn (LIB2ADD): Add lib2-divmod-di.c and
lib2-bswapti2.c.
---
  gcc/config/gcn/gcn.c   | 30 +++
  gcc/config/gcn/gcn.h   | 11 ---
  libgcc/config/gcn/lib2-bswapti2.c  | 47 ++
  libgcc/config/gcn/lib2-divmod-di.c | 35 ++
  libgcc/config/gcn/lib2-gcn.h   |  8 +
  libgcc/config/gcn/t-amdgcn |  2 ++
  6 files changed, 129 insertions(+), 4 deletions(-)
  create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
  create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..45f37d5310d 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3610,6 +3610,34 @@ gcn_init_builtins (void)
  #endif
  }
  
+/* Implement TARGET_INIT_LIBFUNCS.  */

+
+static void
+gcn_init_libfuncs (void)
+{
+  /* BITS_PER_UNIT * 2 is 64 bits, which causes
+ optabs-libfuncs.c:gen_int_libfunc to omit TImode (i.e 128 bits)
+ libcalls that we need to support operations for that type.  Initialise
+ them here instead.  */
+  set_optab_libfunc (udiv_optab, TImode, "__udivti3");
+  set_optab_libfunc (umod_optab, TImode, "__umodti3");
+  set_optab_libfunc (sdiv_optab, TImode, "__divti3");
+  set_optab_libfunc (smod_optab, TImode, "__modti3");
+  set_optab_libfunc (smul_optab, TImode, "__multi3");
+  set_optab_libfunc (addv_optab, TImode, "__addvti3");
+  set_optab_libfunc (subv_optab, TImode, "__subvti3");
+  set_optab_libfunc (negv_optab, TImode, "__negvti2");
+  set_optab_libfunc (absv_optab, TImode, "__absvti2");
+  set_optab_libfunc (smulv_optab, TImode, "__mulvti3");
+  set_optab_libfunc (ffs_optab, TImode, "__ffsti2");
+  set_optab_libfunc (clz_optab, TImode, "__clzti2");
+  set_optab_libfunc (ctz_optab, TImode, "__ctzti2");
+  set_optab_libfunc (clrsb_optab, TImode, "__clrsbti2");
+  set_optab_libfunc (popcount_optab, TImode, "__popcountti2");
+  set_optab_libfunc (parity_optab, TImode, "__parityti2");
+  set_optab_libfunc (bswap_optab, TImode, "__bswapti2");
+}
+
  /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
 not require taking the address of any object, other than the memory
 cell being operated on.
@@ -6336,6 +6364,8 @@ gcn_dwarf_register_span (rtx rtl)
  #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
  #undef  TARGET_INIT_BUILTINS
  #define TARGET_INIT_BUILTINS gcn_init_builtins
+#undef  TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS gcn_init_libfuncs
  #undef  TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
  #define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
gcn_ira_change_pseudo_allocno_class
diff --git a/gcc/config/gcn/gcn.h b/gcc/con

Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return

2021-06-18 Thread Andrew Stubbs

On 18/06/2021 15:19, Julian Brown wrote:

This patch changes the argument and return types for the libgcc __udivsi3
and __umodsi3 helper functions for GCN to USItype instead of SItype.
This is probably just cosmetic in practice.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

libgcc/
* config/gcn/lib2-divmod.c (__udivsi3, __umodsi3): Change argument and
return types to USItype.
* config/gcn/lib2-gcn.h (__udivsi3, __umodsi3): Update prototypes.
---
  libgcc/config/gcn/lib2-divmod.c | 8 
  libgcc/config/gcn/lib2-gcn.h| 4 ++--
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libgcc/config/gcn/lib2-divmod.c b/libgcc/config/gcn/lib2-divmod.c
index 0d6ca44f521..7c72e24e0c3 100644
--- a/libgcc/config/gcn/lib2-divmod.c
+++ b/libgcc/config/gcn/lib2-divmod.c
@@ -102,15 +102,15 @@ __modsi3 (SItype a, SItype b)
  }
  
  
-SItype

-__udivsi3 (SItype a, SItype b)
+USItype
+__udivsi3 (USItype a, USItype b)
  {
return udivmodsi4 (a, b, 0);
  }
  
  
-SItype

-__umodsi3 (SItype a, SItype b)
+USItype
+__umodsi3 (USItype a, USItype b)
  {
return udivmodsi4 (a, b, 1);
  }
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 11476c4cda8..9223d73b8e7 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -38,8 +38,8 @@ typedef int word_type __attribute__ ((mode (__word__)));
  /* Exported functions.  */
  extern SItype __divsi3 (SItype, SItype);
  extern SItype __modsi3 (SItype, SItype);
-extern SItype __udivsi3 (SItype, SItype);
-extern SItype __umodsi3 (SItype, SItype);
+extern USItype __udivsi3 (USItype, USItype);
+extern USItype __umodsi3 (USItype, USItype);
  extern HItype __divhi3 (HItype, HItype);
  extern HItype __modhi3 (HItype, HItype);
  extern UHItype __udivhi3 (UHItype, UHItype);


OK, this seems to match what some other targets have. Except NIOS2 
though, which is probably where this file was copied from.


Andrew


Re: [wwwdocs][Patch] gcc-15/changes: Fortran + offload (C++) update | project/gomp: GCC 15 update

2025-04-17 Thread Andrew Stubbs

On 17/04/2025 15:10, Tobias Burnus wrote:

Hi all,

@Fortraners: Comments to the added 'do concurrent' item?

@Thomas: Are you fine with this C++ wording?

@Andrew: Likewise for C++ and ROCm bump?


This part is fine with me.

Andrew


Anyone: comments are welcome.

Affected pages:

* https://gcc.gnu.org/gcc-15/changes.html

* https://gcc.gnu.org/projects/gomp/

Cheers,

Tobias