On 25/01/2024 23:03, Tobias Burnus wrote:
When targeting AMD GPUs, the LLVM assembler (and linker) are used.
Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not
care which COV is used as long as it works; unfortunately,
"mkoffload.
On 26/01/2024 07:29, Richard Biener wrote:
On Fri, Jan 26, 2024 at 12:04 AM Tobias Burnus wrote:
When targeting AMD GPUs, the LLVM assembler (and linker) are used.
Two days ago LLVM changed the default for the AMDHSA code object
version (COV) from 4 to 5.
In principle, we do not care which C
On 26/01/2024 09:45, Richard Biener wrote:
On Fri, 26 Jan 2024, Richard Biener wrote:
=== libgomp Summary ===
# of expected passes 29126
# of unexpected failures 697
# of unexpected successes 1
# of expected failures 703
# of unresolved testcase
On 26/01/2024 10:22, Richard Biener wrote:
On Fri, 26 Jan 2024, Andrew Stubbs wrote:
On 26/01/2024 09:45, Richard Biener wrote:
On Fri, 26 Jan 2024, Richard Biener wrote:
=== libgomp Summary ===
# of expected passes 29126
# of unexpected failures 697
# of
On 26/01/2024 10:30, Richard Biener wrote:
When the agent reports a device ISA we don't support avoid hitting
an assert, instead report the raw integers as error. I'm not sure
whether -1 is special as I didn't figure where that field is
initialized. But I guess since agents are not rejected upf
On 26/01/2024 10:40, Richard Biener wrote:
The following avoids selecting an unsupported agent early, avoiding
later asserts when we rely on it being supported.
tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060
that's the alternative to the other patch. I do indeed seem to get
the ot
On 26/01/2024 10:39, Tobias Burnus wrote:
Hi all,
Andrew Stubbs wrote:
On 26/01/2024 07:29, Richard Biener wrote:
If you link against prebuilt objects with COV 5 it seems there's no
way to
override the COV version GCC uses? That is, do we want to add
a -mcode-object-version=... opti
On 26/01/2024 11:42, Richard Biener wrote:
The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1). This avoids
later asserts when we assume such devices are never actually
used.
Tested as previously, now the error is
libgomp: GCN fatal er
On 26/01/2024 12:06, Jakub Jelinek wrote:
On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:
The following avoids registering unsupported GCN offload devices
when iterating over available ones. With a Zen4 desktop CPU
you will have an IGPU (unsupported) which will otherwise be made
On 26/01/2024 14:04, Richard Biener wrote:
On Fri, 26 Jan 2024, Andrew Stubbs wrote:
On 26/01/2024 12:06, Jakub Jelinek wrote:
On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:
The following avoids registering unsupported GCN offload devices
when iterating over available ones
On 26/01/2024 14:21, Richard Biener wrote:
On Fri, 26 Jan 2024, Jakub Jelinek wrote:
On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote:
Otherwise it looks reasoanble to me, but let's see what Andrew thinks.
'n' before 'a', please. ;-)
?!
I've misspelled a word.
@@ -1443,6
On 25/01/2024 15:11, Tobias Burnus wrote:
Updated patch enclosed.
Tobias Burnus wrote:
I have now run the attached script and the result running yesterday's
build with both my patch and your patch applied.
(And the now committed gcn-hsa.h patch)
Now the result with the testscript is:
* fiji
On 26/01/2024 16:45, Tobias Burnus wrote:
Hi,
Thomas Schwinge wrote:
amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to
the docs
...
Further down in that file, we state:
@anchor{amdgcn-x-amdhsa}
@heading amdgcn-*-amdhsa
AMD GCN GPU target.
Instead o
On 26/01/2024 17:06, Tobias Burnus wrote:
Mention that gfx1030/gfx1100 are now supported.
As noted in another thread, LLVM 15's assembler is now required, before
LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would
do.) Hence, the added link to the install documentation.
Com
On 29/01/2024 10:34, Tobias Burnus wrote:
Andrew wrote off list:
"Vector reductions don't work on RDNA, as is, but they're
supposed to be disabled by the insn condition"
This patch disables "fold_left_plus_", which is about
vectorization and in the code path shown in the backtrace.
I can
On 29/01/2024 12:50, Tobias Burnus wrote:
Andrew Stubbs wrote:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
.amdhsa_next_free_vgpr 516
^~~ [Obviously, likewise
for libgomp.c++/..
Hmm, supposedly there are 768
On 31/01/2024 10:36, Thomas Schwinge wrote:
Hi!
OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'",
see attached?
In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like:
Caution, the order of src and cmp are the *opposite* of the
BUFFER_ATOMIC_CMPSWAP opcode
On 31/01/2024 17:12, Thomas Schwinge wrote:
Hi!
On 2018-12-12T11:52:52+, Andrew Stubbs wrote:
This patch contains the major part of the GCN back-end. [...]
--- /dev/null
+++ b/gcc/config/gcn/gcn.h
+#define FIRST_SGPR_REG 0
+#define SGPR_REGNO(N) ((N)+FIRST_SGPR_REG
On 31/01/2024 17:21, Thomas Schwinge wrote:
Hi!
On 2018-12-12T11:52:23+, Andrew Stubbs wrote:
This patch contains the machine description portion of the GCN back-end. [...]
--- /dev/null
+++ b/gcc/config/gcn/gcn.md
+;; {{{ Constants and enums
+
+; Named registers
+(define_constants
On 12/12/2023 09:02, Tobias Burnus wrote:
On 11.12.23 18:04, Andrew Stubbs wrote:
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to
ensure
that they can be unpinned safely when freed.
This
Some AMD GCN devices support an "XNACK" mode in which the device can
handle page-misses (and maybe other traps in memory instructions), but
it's not completely invisible to software.
We need this now to support OpenMP Unified Shared Memory (I plan to post
updated patches for that in January),
On 06/01/2024 21:20, Tobias Burnus wrote:
Hi Andrew,
I just spotted that this define was missing.
OK for mainline?
OK.
Andrew
This patch fixes a runtime error with offload kernels that use a lot of
registers, such as libgomp.fortran/target1.f90.
Committed to mainline.
Andrew
amdgcn: Don't double-count AVGPRs
CDNA2 devices have VGPRs and AVGPRs combined into a single hardware register
file (they're separate in CDNA1).
This patch fixes build failures with the offload toolchain since my
recent XNACK patch. The problem was simply that mkoffload made
out-of-date assumptions about the -mxnack defaults. This patch fixes the
mismatch.
Committed to mainline.
Andrew
amdgcn: Don't double-count AVGPRs
CDNA2 devices h
On 07/01/2024 19:20, Tobias Burnus wrote:
ROCm meanwhile supports also some consumer cards; besides the semi-new
gfx1030, support for gfx1100 was added more recently (in ROCm 5.7.1 for
"Ubuntu 22.04 only" and without parentheses since ROCm 6.0.0).
GCC has already very limited support for gfx10
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
libgomp/
time the omp_low_lat_mem_alloc does not work because the default
traits are incompatible (GPU low-latency memory is not accessible to
other teams). I've also included documentation and addressed the
comments from Tobias's review.
Andrew
Andrew Stubbs (3):
libgomp, nvptx: low-latency memory allocat
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
t
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing
On 04/12/2023 16:04, Tobias Burnus wrote:
On 03.12.23 01:32, Andrew Stubbs wrote:
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The
memory
can be allocated, reallocated, and freed using a basi
r tweaks, but otherwise
the patches are the same.
The series implements device-specific allocators and adds a low-latency
allocator for both GPUs architectures.
Andrew Stubbs (3):
libgomp, nvptx: low-latency memory allocator
openmp, nvptx: low-lat memory access traits
amdgcn, libgomp: low-late
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
libgomp/
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
t
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing
@Thomas, there are questions for you below
On 22/11/2023 17:07, Tobias Burnus wrote:
Note before: Starting with TR11 alias OpenMP 6.0, OpenMP supports handling
multiple devices for allocation. It seems as if after using:
my_memspace = omp_get_device_and_host_memspace( 5 ,
omp_default_me
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation will work OK for page-scale allocations, and finer-grained
allocations will be impl
that the
low-latency allocator patch have been committed.
An older, less compact, version of these patches is already applied to
the devel/omp/gcc-13 (OG13) branch.
OK for mainline?
Andrew
Andrew Stubbs (5):
libgomp: basic pinned memory on Linux
libgomp, openmp: Add ompx_pinned_mem_alloc
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16 ++
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
Use Cuda to pin memory, instead of Linux mlock, when available.
There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.
The design adds a device independent plugin API for allocating pinned memo
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlo
This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available). In future,
this allocator will also be used for Unified Shared Memory. Both memories
are incompatible with the system malloc because allocated memory cannot
shar
On 12/12/2023 10:05, Tobias Burnus wrote:
Hi Andrew,
On 11.12.23 18:04, Andrew Stubbs wrote:
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some
missed cases in the Unified Shared Memory implementation that were
especially noticeable in Fortran because the size of arrays are known.
This patch will have to be folded into the mainline USM patches that
were submitted
Here's a new version of the patch.
On 01/12/2022 14:16, Jakub Jelinek wrote:
+void __attribute__((noinline))
You should use noipa attribute instead of noinline on callers
which aren't declare simd (on declare simd it would prevent cloning
which is essential for the declare simd behavior), so t
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory configured into USM mode, but
there were random intermittent crashes. I can't be completely sure, but
my best guess is that the HSA driver is using malloc
I changed it to use 128-byte alignment to match the GPU cache-lines.
Committed to OG12.
Andrew
On 11/01/2023 18:05, Andrew Stubbs wrote:
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory confi
NVPTX offloading, and a custom allocator for better
handling of small allocations. The whole series has been bug fixed and
generally improved (mostly by Thomas :) ).
An older, less compact, version of these patches is already applied to
the devel/omp/gcc-13 (OG13) branch.
OK for mainline?
Andrew
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation will work OK for page-scale allocations, and finer-grained
allocations will be impl
Use Cuda to pin memory, instead of Linux mlock, when available.
There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.
The design adds a device independent plugin API for allocating pinned memo
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16 ++
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlo
This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available). In future,
this allocator will also be used for Unified Shared Memory. Both memories
are incompatible with the system malloc because allocated memory cannot
shar
On 07/11/2023 07:44, Juzhe-Zhong wrote:
This test shows vectorizing stmts using SLP 4 times instead of 2 for RVV.
The reason is RVV has 512 bit vector.
Here is a comparison between RVV and ARM SVE:
https://godbolt.org/z/xc5KE5rPs
But I notice AMDGCN also has 512 bit vector, seems this patch will c
tree-dump-not vect "gap of 6 elements"
It's different, but not "fixed".
Andrew
--------
juzhe.zh...@rivai.ai
*From:* Andrew Stubbs <mailto:a...@codesourcery.com>
*Date:* 2023-11-07 18:09
*To:* Juzhe-Zhong
ions, obviously).
Andrew
--------
juzhe.zh...@rivai.ai
*From:* Andrew Stubbs <mailto:a...@codesourcery.com>
*Date:* 2023-11-07 18:59
*To:* juzhe.zh...@rivai.ai <mailto:juzhe.zh...@rivai.ai>;
gcc-p
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4
"vect" { xfail { ! vect512 } } } } */
Could you try again ? If it works for you, I am gonna send V2 patch
to Richi.
Thank you so much for help.
----
ump-times "vectorizing stmts using SLP" 4
"vect" { target { vect512 } } } } */
Tested on RVV is OK.
5 PASS on amdgcn also.
Andrew
--------
juzhe.zh...@rivai.ai
*From:* Andrew Stubbs <mailto:a...@cod
I've just committed this patch to fix pr112313 (oops, I forgot to write
the number in the commit message).
The problem was a missed case in the vector reduction expand code.
Andrew
amdgcn: Fix vector min/max ICE
The DImode min/max instructions need a clobber that SImode does not, so
add the spe
On 23/10/2023 11:43, Richard Biener wrote:
On Fri, 20 Oct 2023, Andrew Stubbs wrote:
This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
mask enable extra lanes that were supposed to be unused and are therefore
undefined.
Richi suggested an alternativ
This patch makes no functional changes, but cleans up the code a little
to make way for my next patch.
The confusing "reload_in" and "reload_out" define_expand were used
solely for secondary reload and were nothing more than aliases for the
"sgprbase" instructions. I've now learned that the c
AMD GPUs since CDNA1 have had a new register file with an additional 256
32-bit-by-64-lane vector registers. This doubles the number of vector
registers on the device, compared to previous models. The way the
hardware works is that the register file is divided between all the
running threads,
This patch fixes a reload bug that's hard to reproduce reliably (so far
I've only observed it on the OG13 branch, with testcase
gcc.c-torture/compile/pr70355.c), but causes an infinite loop in reload
when it fails.
For some reason it wants to save a value from AVGPRs to memory, this
can't hap
On 24/11/2023 14:55, Thomas Schwinge wrote:
Hi!
On 2017-06-21T11:06:24+0100, Andrew Stubbs wrote:
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
+march=
+Target RejectNegative Joined ToLower Enum(gpu_type) Var(gcn_arch)
Init(PROCESSOR_CARRIZO)
+Specify the name of the target
On 24/11/2023 15:06, Thomas Schwinge wrote:
Hi!
On 2023-11-24T15:55:52+0100, I wrote:
OK to push the attached
"GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves
[PR112669]"?
With that in place, we may then "GCN: Remove 'last_arg' spec function",
see attached; OK to push?
On 24/11/2023 16:07, Tobias Burnus wrote:
Stumbled over this.
I wondered whether we should recommend newlib >= 12e3bac3c (31st Oct
2023), but
given that gfx1030 is not yet supported, I decided against it. We can
revisit
this once newlib 4.4 is available.
This makes sense to me. We can maybe
I tried this patch for AMD GCN. We have a similar problem with excess
extends, but also for vector modes. Each lane has a minimum 32 bits and
GCC's normal assumption is that vector registers have precisely the
number of bits they need, so the amdgcn backend patterns have explicit
sign/zero exte
This fixes an ICE that affects some testsuite compiles that use vector
extensions, but probably not much real code (certainly not for offloading).
Andrew
amdgcn: Disallow TImode vector permute
We don't support it and it doesn't happen without vector extensions, so
just remove the unhandled case.
On 28/11/2023 06:06, Jeff Law wrote:
- Verify we have a SUBREG before looking at SUBREG_BYTE.
The amdgcn ICE I reported still exists:
conftest.c:16:1: internal compiler error: RTL check: expected code 'subreg',
have 'reg' in ext_dce_process_uses, at ext-dce.cc:417
16 | }
| ^
0x8c7b2
On 28/11/2023 23:26, Jeff Law wrote:
On 11/28/23 15:18, Jivan Hakobyan wrote:
The amdgcn ICE I reported still exists:
Can you send a build command to reproduce ICE.
I built on x86-64, RV32/64, and did not get any faults.
The code is clearly wrong though. We need to test that we have a s
On 29/11/2023 13:44, Thomas Schwinge wrote:
Hi!
On 2023-11-15T14:10:47+, Andrew Stubbs wrote:
* gcc.target/gcn/avgpr-mem-double.c: New test.
* gcc.target/gcn/avgpr-mem-int.c: New test.
* gcc.target/gcn/avgpr-mem-long.c: New test.
* gcc.target/gcn/avgpr-mem
On 08/09/2023 10:04, Tobias Burnus wrote:
Regarding patch 2/3 and MEMSPACE_VALIDATE.
In general, I wonder how to handle memory spaces (and traits) that
aren't supported. Namely, when to return 0L and when to silently use
ignore the trait / use another memory space.
The current omp_init_allocat
On 22/11/2023 14:26, Tobias Burnus wrote:
Hi Andrew,
Side remark:
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \ - calloc (1,
(((void)(MEMSPACE), (SIZE
This fits a bit more to previous patch, but I wonder whether that should
use (MEMSPACE, NMEMB, SIZE) instead - to fit to the actual calloc
Hi all,
This patch implements parallel execution of OpenMP reverse offload kernels.
The first problem was that GPU device kernels may request reverse
offload (via the "ancestor" clause) once for each running offload thread
-- of which there may be thousands -- and the existing implementation
Committed to devel/omp/gcc-13.
Andrew
On 12/09/2023 15:27, Andrew Stubbs wrote:
Hi all,
This patch implements parallel execution of OpenMP reverse offload kernels.
The first problem was that GPU device kernels may request reverse
offload (via the "ancestor" clause) once for each running of
I don't have authority to approve anything, but here's a review anyway.
Thanks for working on this.
On 26/09/2023 17:24, Andre Vieira (lists) wrote:
The const attribute is ignored when simd clones are used inbranch. This
is due to the fact that when analyzing a MASK_CALL we were not looking
at
On 27/09/2023 08:56, Andre Vieira (lists) wrote:
On 26/09/2023 17:37, Andrew Stubbs wrote:
I don't have authority to approve anything, but here's a review anyway.
Thanks for working on this.
Thank you for reviewing and apologies for the mess of a patch, may have
rushed it ;)
Some GPU compute systems allow the GPU to access host memory without much
prior setup, but that's not necessarily the fast way to do it. For shared
memory APUs this is almost certainly the correct choice, but for AMD there
is the difference between "fine-grained" and "coarse-grained" memory, and
f
approve the amdgcn patches myself, but comments are welcome.
OK for mainline? (Once the pinned memory dependencies are committed.)
Thanks
Andrew
P.S. This series includes contributions from (at least) Thomas Schwinge,
Marcel Vollweiler, Kwok Cheung Yeung, and Abid Qadeer.
From: Andrew Stubbs
Ensure that "requires unified_shared_memory" plays nicely with the
-foffload-memory options, and that enabling the option has the same effect as
enabling USM in the code.
Also adds some testcases.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_ta
From: Andrew Stubbs
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and
ompx_gnu_host_mem_a
++.dg/gomp/usm-5.C: New test.
* gfortran.dg/gomp/usm-2.f90: New test.
* gfortran.dg/gomp/usm-3.f90: New test.
co-authored-by: Andrew Stubbs
---
gcc/omp-low.cc | 184 ++
gcc/passes.def | 1 +
gcc/tes
From: Andrew Stubbs
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
This patch sets a new attribute on OpenMP offload functions to ensure
From: Marcel Vollweiler
This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine
omp_target_is_accessible.
libgomp/ChangeLog:
* target.c (omp_target_is_accessible): Handle unified shared memory.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updat
From: Andrew Stubbs
Implement the Unified Shared Memory API calls in the GCN plugin.
The AMD equivalent of "Managed Memory" means registering previously allocated
host memory as "coarse-grained" (whereas allocating coarse-grained memory via
hsa_allocate_memory allocate
When unified shared memory is required, the default memory space should also be
unified.
libgomp/ChangeLog:
* config/linux/allocator.c (linux_memspace_alloc): Check
omp_requires_mask.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linu
This is a first step towards having a device table so we can add new devices
more easily. It'll also make it easier to remove the deprecated GCN3 bits.
The patch should not change the behaviour of anything.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New.
(
On 15/07/2024 10:29, Thomas Schwinge wrote:
Hi!
On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches
wrote:
And finally here is a third version, [...]
... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound".
Attached h
On 15/07/2024 16:36, Thomas Schwinge wrote:
Hi!
On 2024-07-15T12:16:30+0100, Andrew Stubbs wrote:
On 15/07/2024 10:29, Thomas Schwinge wrote:
On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches
wrote:
And finally here is a third version, [...]
... which became commit
ns to-do.
Besides rebase and retest, I've addressed the review comments regarding
the enum assignments.
OK for mainline?
Andrew
Andrew Stubbs (4):
libgomp, openmp: Add ompx_pinned_mem_alloc
openmp: Add -foffload-memory
openmp: -foffload-memory=pinned
libgomp: fine-grained pinned mem
Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators has been moved to start at "100".
-
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consi
From: Thomas Schwinge
This patch was already approved, by Tobias Burnus (with one caveat about
initialization location), but wasn't committed at that time as I didn't
want to disentangle it from the textual dependencies on the other
patches in the series.
Use Cuda to pin me
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mloc
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16 +++
This patch was already approved, by Tobias Burnus, in the v3 posting,
but I've not yet committed it because there are some textual dependencies
on the yet-to-be-approved patches.
-
This patch introduces a new custom memory allocator for use with pinned
memory (in the case where
On 29/05/2024 13:15, Tobias Burnus wrote:
This patch depends (on the libgomp/target.c parts) of the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html
AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK
On 03/06/2024 04:01, Kewen Lin wrote:
This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in gcn port.
gcc/ChangeLog:
* config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
Assuming that this does n
On 28/05/2024 23:33, Tobias Burnus wrote:
While most of the nvptx systems I have access to don't have the support
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES,
one has:
Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support
this feature. And with that
On 03/06/2024 17:46, Tobias Burnus wrote:
Andrew Stubbs wrote:
+ /* If USM has been requested and is supported by all devices
+ of this type, set the capability accordingly. */
+ if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_ME