Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 25/01/2024 23:03, Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which COV is used as long as it works; unfortunately, "mkoffload.

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 07:29, Richard Biener wrote: On Fri, Jan 26, 2024 at 12:04 AM Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which C

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697 # of unexpected successes 1 # of expected failures 703 # of unresolved testcase

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:22, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697 # of

Re: [PATCH] Avoid assert for unknown device ISAs in GCN libgomp plugin

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:30, Richard Biener wrote: When the agent reports a device ISA we don't support avoid hitting an assert, instead report the raw integers as error. I'm not sure whether -1 is special as I didn't figure where that field is initialized. But I guess since agents are not rejected upf

Re: [PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:40, Richard Biener wrote: The following avoids selecting an unsupported agent early, avoiding later asserts when we rely on it being supported. tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060 that's the alternative to the other patch. I do indeed seem to get the ot

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:39, Tobias Burnus wrote: Hi all, Andrew Stubbs wrote: On 26/01/2024 07:29, Richard Biener wrote: If you link against prebuilt objects with COV 5 it seems there's no way to override the COV version GCC uses?  That is, do we want to add a -mcode-object-version=... opti

Re: [PATCH] Fix architecture support in OMP_OFFLOAD_init_device for gcn

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 11:42, Richard Biener wrote: The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. Tested as previously, now the error is libgomp: GCN fatal er

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unsupported) which will otherwise be made

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:04, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:21, Richard Biener wrote: On Fri, 26 Jan 2024, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote: Otherwise it looks reasoanble to me, but let's see what Andrew thinks. 'n' before 'a', please. ;-) ?! I've misspelled a word. @@ -1443,6

Re: [patch][v2] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-29 Thread Andrew Stubbs
On 25/01/2024 15:11, Tobias Burnus wrote: Updated patch enclosed. Tobias Burnus wrote: I have now run the attached script and the result running yesterday's build with both my patch and your patch applied. (And the now committed gcn-hsa.h patch) Now the result with the testscript is: * fiji

Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 16:45, Tobias Burnus wrote: Hi, Thomas Schwinge wrote: amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs ... Further down in that file, we state: @anchor{amdgcn-x-amdhsa} @heading amdgcn-*-amdhsa AMD GCN GPU target. Instead o

Re: [wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 17:06, Tobias Burnus wrote: Mention that gfx1030/gfx1100 are now supported. As noted in another thread, LLVM 15's assembler is now required, before LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would do.) Hence, the added link to the install documentation. Com

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 10:34, Tobias Burnus wrote: Andrew wrote off list:   "Vector reductions don't work on RDNA, as is, but they're    supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about vectorization and in the code path shown in the backtrace. I can

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 12:50, Tobias Burnus wrote: Andrew Stubbs wrote: /tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range    .amdhsa_next_free_vgpr    516 ^~~ [Obviously, likewise for libgomp.c++/.. Hmm, supposedly there are 768

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 10:36, Thomas Schwinge wrote: Hi! OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like: Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode

Re: GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:12, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.h +#define FIRST_SGPR_REG 0 +#define SGPR_REGNO(N) ((N)+FIRST_SGPR_REG

Re: GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:21, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:23+, Andrew Stubbs wrote: This patch contains the machine description portion of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.md +;; {{{ Constants and enums + +; Named registers +(define_constants

Re: [PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-13 Thread Andrew Stubbs
On 12/12/2023 09:02, Tobias Burnus wrote: On 11.12.23 18:04, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall.  Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This

[committed] amdgcn: XNACK support

2023-12-13 Thread Andrew Stubbs
Some AMD GCN devices support an "XNACK" mode in which the device can handle page-misses (and maybe other traps in memory instructions), but it's not completely invisible to software. We need this now to support OpenMP Unified Shared Memory (I plan to post updated patches for that in January),

Re: [Patch] gcn.h: Add builtin_define ("__gfx1030")

2024-01-08 Thread Andrew Stubbs
On 06/01/2024 21:20, Tobias Burnus wrote: Hi Andrew, I just spotted that this define was missing. OK for mainline? OK. Andrew

[committed] amdgcn: Don't double-count AVGPRs

2024-01-08 Thread Andrew Stubbs
This patch fixes a runtime error with offload kernels that use a lot of registers, such as libgomp.fortran/target1.f90. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices have VGPRs and AVGPRs combined into a single hardware register file (they're separate in CDNA1).

[committed] amdgcn: Match new XNACK defaults in mkoffload

2024-01-08 Thread Andrew Stubbs
This patch fixes build failures with the offload toolchain since my recent XNACK patch. The problem was simply that mkoffload made out-of-date assumptions about the -mxnack defaults. This patch fixes the mismatch. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices h

Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Andrew Stubbs
On 07/01/2024 19:20, Tobias Burnus wrote: ROCm meanwhile supports also some consumer cards; besides the semi-new gfx1030, support for gfx1100 was added more recently (in ROCm 5.7.1 for "Ubuntu 22.04 only" and without parenthesis since ROCm 6.0.0). GCC has already very limited support for gfx10

[PATCH v3 2/3] openmp, nvptx: low-lat memory access traits

2023-12-02 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator no longer works (but omp_cgroup_mem_alloc still does). libgomp/

[PATCH v3 0/3] libgomp: OpenMP low-latency omp_alloc

2023-12-02 Thread Andrew Stubbs
time the omp_low_lat_mem_alloc does not work because the default traits are incompatible (GPU low-latency memory is not accessible to other teams). I've also included documentation and addressed the comments from Tobias's review. Andrew Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocat

[PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-02 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using t

[PATCH v3 3/3] amdgcn, libgomp: low-latency allocator

2023-12-02 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

Re: [PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-05 Thread Andrew Stubbs
On 04/12/2023 16:04, Tobias Burnus wrote: On 03.12.23 01:32, Andrew Stubbs wrote: This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory can be allocated, reallocated, and freed using a basi

[committed v4 0/3] libgomp: OpenMP low-latency omp_alloc

2023-12-06 Thread Andrew Stubbs
r tweaks, but otherwise the patches are the same. The series implements device-specific allocators and adds a low-latency allocator for both GPUs architectures. Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocator openmp, nvptx: low-lat memory access traits amdgcn, libgomp: low-late

[committed v4 2/3] openmp, nvptx: low-lat memory access traits

2023-12-06 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator no longer works (but omp_cgroup_mem_alloc still does). libgomp/

[committed v4 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-06 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using t

[committed v4 3/3] amdgcn, libgomp: low-latency allocator

2023-12-06 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-07 Thread Andrew Stubbs
@Thomas, there are questions for you below On 22/11/2023 17:07, Tobias Burnus wrote: Note before: Starting with TR11 alias OpenMP 6.0, OpenMP supports handling multiple devices for allocation. It seems as if after using:   my_memspace = omp_get_device_and_host_memspace( 5 , omp_default_me

[PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-11 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This implementation will work OK for page-scale allocations, and finer-grained allocations will be impl

[PATCH v3 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-12-11 Thread Andrew Stubbs
that the low-latency allocator patch have been committed. An older, less compact, version of these patches is already applied to the devel/omp/gcc-13 (OG13) branch. OK for mainline? Andrew Andrew Stubbs (5): libgomp: basic pinned memory on Linux libgomp, openmp: Add ompx_pinned_mem_alloc

[PATCH v3 3/6] openmp: Add -foffload-memory

2023-12-11 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++
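For readers unfamiliar with GCC's option machinery, entries of the kind the ChangeLog describes typically look like the following sketch. The exact names and enum values here are assumptions inferred from the series (later patches enable `=pinned` and `=unified`), not the patch text:

```
; Hedged sketch of gcc/common.opt additions for -foffload-memory.

foffload-memory=
Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.

Enum
Name(offload_memory) Type(enum offload_memory) UnknownError(unknown offload memory type %qs)

EnumValue
Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)

EnumValue
Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)

EnumValue
Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
```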

[PATCH v3 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-12-11 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned

[PATCH v3 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Andrew Stubbs
Use Cuda to pin memory, instead of Linux mlock, when available. There are two advantages: firstly, this gives a significant speed boost for NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit setting. The design adds a device independent plugin API for allocating pinned memo

[PATCH v3 4/6] openmp: -foffload-memory=pinned

2023-12-11 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlo

[PATCH v3 6/6] libgomp: fine-grained pinned memory allocator

2023-12-11 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot shar

Re: [PATCH v3 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-12-12 Thread Andrew Stubbs
On 12/12/2023 10:05, Tobias Burnus wrote: Hi Andrew, On 11.12.23 18:04, Andrew Stubbs wrote: This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP.  The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations

[OG12][committed] libgomp: Fix USM bugs

2022-12-16 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some missed cases in the Unified Shared Memory implementation that were especially noticeable in Fortran because the sizes of arrays are known. This patch will have to be folded into the mainline USM patches that were submitted

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-01-06 Thread Andrew Stubbs
Here's a new version of the patch. On 01/12/2022 14:16, Jakub Jelinek wrote: +void __attribute__((noinline)) You should use noipa attribute instead of noinline on callers which aren't declare simd (on declare simd it would prevent cloning which is essential for the declare simd behavior), so t

[OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-11 Thread Andrew Stubbs
This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory configured into USM mode, but there were random intermittent crashes. I can't be completely sure, but my best guess is that the HSA driver is using malloc

Re: [OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-13 Thread Andrew Stubbs
I changed it to use 128-byte alignment to match the GPU cache-lines. Committed to OG12. Andrew On 11/01/2023 18:05, Andrew Stubbs wrote: This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory confi

[PATCH v2 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-08-23 Thread Andrew Stubbs
NVPTX offloading, and a custom allocator for better handling of small allocations. The whole series has been bug fixed and generally improved (mostly by Thomas :) ). An older, less compact, version of these patches is already applied to the devel/omp/gcc-13 (OG13) branch. OK for mainline? Andrew

[PATCH v2 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-08-23 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned

[PATCH v2 1/6] libgomp: basic pinned memory on Linux

2023-08-23 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This implementation will work OK for page-scale allocations, and finer-grained allocations will be impl

[PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-08-23 Thread Andrew Stubbs
Use Cuda to pin memory, instead of Linux mlock, when available. There are two advantages: firstly, this gives a significant speed boost for NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit setting. The design adds a device independent plugin API for allocating pinned memo

[PATCH v2 3/6] openmp: Add -foffload-memory

2023-08-23 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++

[PATCH v2 4/6] openmp: -foffload-memory=pinned

2023-08-23 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlo

[PATCH v2 6/6] libgomp: fine-grained pinned memory allocator

2023-08-23 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot shar

Re: [PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Andrew Stubbs
On 07/11/2023 07:44, Juzhe-Zhong wrote: This test shows vectorizing stmts using SLP 4 times instead of 2 for RVV. The reason is RVV has 512 bit vector. Here is comparison between RVV and ARM SVE: https://godbolt.org/z/xc5KE5rPs But I notice AMDGCN also has 512 bit vector, seems this patch will c

Re: [PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Andrew Stubbs
tree-dump-not vect "gap of 6 elements" It's different, but not "fixed". Andrew

Re: [PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Andrew Stubbs
ions, obviously). Andrew

Re: [PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Andrew Stubbs
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { ! vect512 } } } } */ Could you try again? If it works for you, I am gonna send V2 patch to Richi. Thank you so much for help.

Re: [PATCH] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Andrew Stubbs
ump-times "vectorizing stmts using SLP" 4 "vect" { target { vect512 } } } } */ Tested on RVV is OK. 5 PASS on amdgcn also. Andrew

[committed] amdgcn: Fix vector min/max ICE (pr112313)

2023-11-10 Thread Andrew Stubbs
I've just committed this patch to fix pr112313 (oops, I forgot to write the number in the commit message). The problem was a missed case in the vector reduction expand code. Andrew amdgcn: Fix vector min/max ICE The DImode min/max instructions need a clobber that SImode does not, so add the spe

Re: [PATCH] vect: Don't set excess bits in unform masks

2023-11-10 Thread Andrew Stubbs
On 23/10/2023 11:43, Richard Biener wrote: On Fri, 20 Oct 2023, Andrew Stubbs wrote: This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the mask enable extra lanes that were supposed to be unused and are therefore undefined. Richi suggested an alternativ

[committed] amdgcn: simplify secondary reload patterns

2023-11-15 Thread Andrew Stubbs
This patch makes no functional changes, but cleans up the code a little to make way for my next patch. The confusing "reload_in" and "reload_out" define_expand were used solely for secondary reload and were nothing more than aliases for the "sgprbase" instructions. I've now learned that the c

[committed] amdgcn: Add Accelerator VGPR registers

2023-11-15 Thread Andrew Stubbs
AMD GPUs since CDNA1 have had a new register file with an additional 256 32-bit-by-64-lane vector registers. This doubles the number of vector registers on the device, compared to previous models. The way the hardware works is that the register file is divided between all the running threads,

[committed] amdgcn: Fix vector TImode reload loop

2023-11-22 Thread Andrew Stubbs
This patch fixes a reload bug that's hard to reproduce reliably (so far I've only observed it on the OG13 branch, with testcase gcc.c-torture/compile/pr70355.c), but causes an infinite loop in reload when it fails. For some reason it wants to save a value from AVGPRs to memory, this can't hap

Re: GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]

2023-11-24 Thread Andrew Stubbs
On 24/11/2023 14:55, Thomas Schwinge wrote: Hi! On 2017-06-21T11:06:24+0100, Andrew Stubbs wrote: --- a/gcc/config/gcn/gcn.opt +++ b/gcc/config/gcn/gcn.opt +march= +Target RejectNegative Joined ToLower Enum(gpu_type) Var(gcn_arch) Init(PROCESSOR_CARRIZO) +Specify the name of the target

Re: GCN: Remove 'last_arg' spec function

2023-11-24 Thread Andrew Stubbs
On 24/11/2023 15:06, Thomas Schwinge wrote: Hi! On 2023-11-24T15:55:52+0100, I wrote: OK to push the attached "GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]"? With that in place, we may then "GCN: Remove 'last_arg' spec function", see attached; OK to push?

Re: [patch][GCN] install.texi: Update GCN entry - @uref and LLVM version remark

2023-11-24 Thread Andrew Stubbs
On 24/11/2023 16:07, Tobias Burnus wrote: Stumbled over this. I wondered whether we should recommend newlib >= 12e3bac3c (31st Oct 2023), but given that gfx1030 is not yet supported, I decided against it. We can revisit this once newlib 4.4 is available. This makes sense to me. We can maybe

Re: [RFA] New pass for sign/zero extension elimination

2023-11-27 Thread Andrew Stubbs
I tried this patch for AMD GCN. We have a similar problem with excess extends, but also for vector modes. Each lane has a minimum 32 bits and GCC's normal assumption is that vector registers have precisely the number of bits they need, so the amdgcn backend patterns have explicit sign/zero exte

[committed] amdgcn: Disallow TImode vector permute

2023-11-27 Thread Andrew Stubbs
This fixes an ICE that affects some testsuite compiles that use vector extensions, but probably not much real code (certainly not for offloading). Andrew amdgcn: Disallow TImode vector permute We don't support it and it doesn't happen without vector extensions, so just remove the unhandled case.

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-28 Thread Andrew Stubbs
On 28/11/2023 06:06, Jeff Law wrote: - Verify we have a SUBREG before looking at SUBREG_BYTE. The amdgcn ICE I reported still exists: conftest.c:16:1: internal compiler error: RTL check: expected code 'subreg', have 'reg' in ext_dce_process_uses, at ext-dce.cc:417 16 | } | ^ 0x8c7b2

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-29 Thread Andrew Stubbs
On 28/11/2023 23:26, Jeff Law wrote: On 11/28/23 15:18, Jivan Hakobyan wrote: The amdgcn ICE I reported still exists: Can you send a build command to reproduce ICE. I built on x86-64, RV32/64, and did not get any faults. The code is clearly wrong though. We need to test that we have a s

Re: GCN: Generally enable the 'gcc.target/gcn/avgpr-[...]' test cases

2023-11-29 Thread Andrew Stubbs
On 29/11/2023 13:44, Thomas Schwinge wrote: Hi! On 2023-11-15T14:10:47+, Andrew Stubbs wrote: * gcc.target/gcn/avgpr-mem-double.c: New test. * gcc.target/gcn/avgpr-mem-int.c: New test. * gcc.target/gcn/avgpr-mem-long.c: New test. * gcc.target/gcn/avgpr-mem

Re: [PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-11-29 Thread Andrew Stubbs
On 08/09/2023 10:04, Tobias Burnus wrote: Regarding patch 2/3 and MEMSPACE_VALIDATE. In general, I wonder how to handle memory spaces (and traits) that aren't supported. Namely, when to return 0L and when to silently ignore the trait / use another memory space. The current omp_init_allocat

Re: [PATCH v2 1/6] libgomp: basic pinned memory on Linux

2023-11-29 Thread Andrew Stubbs
On 22/11/2023 14:26, Tobias Burnus wrote: Hi Andrew, Side remark: -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \ - calloc (1, (((void)(MEMSPACE), (SIZE This fits a bit more to previous patch, but I wonder whether that should use (MEMSPACE, NMEMB, SIZE) instead - to fit to the actual calloc

[PATCH] libgomp, nvptx, amdgcn: parallel reverse offload

2023-09-12 Thread Andrew Stubbs
Hi all, This patch implements parallel execution of OpenMP reverse offload kernels. The first problem was that GPU device kernels may request reverse offload (via the "ancestor" clause) once for each running offload thread -- of which there may be thousands -- and the existing implementation

[OG13][committed] libgomp, nvptx, amdgcn: parallel reverse offload

2023-09-12 Thread Andrew Stubbs
itted to devel/omp/gcc-13. Andrew On 12/09/2023 15:27, Andrew Stubbs wrote: Hi all, This patch implements parallel execution of OpenMP reverse offload kernels. The first problem was that GPU device kernels may request reverse offload (via the "ancestor" clause) once for each running of

Re: [PATCH] vect, omp: inbranch simdclone dropping const

2023-09-26 Thread Andrew Stubbs
I don't have authority to approve anything, but here's a review anyway. Thanks for working on this. On 26/09/2023 17:24, Andre Vieira (lists) wrote: The const attribute is ignored when simdclone's are used inbranch. This is due to the fact that when analyzing a MASK_CALL we were not looking at

Re: [PATCH] vect, omp: inbranch simdclone dropping const

2023-09-27 Thread Andrew Stubbs
On 27/09/2023 08:56, Andre Vieira (lists) wrote: On 26/09/2023 17:37, Andrew Stubbs wrote: I don't have authority to approve anything, but here's a review anyway. Thanks for working on this. Thank you for reviewing and apologies for the mess of a patch, may have rushed it ;)

[PATCH v2 1/8] libgomp: Disentangle shared memory from managed

2024-06-28 Thread Andrew Stubbs
Some GPU compute systems allow the GPU to access host memory without much prior setup, but that's not necessarily the fast way to do it. For shared memory APUs this is almost certainly the correct choice, but for AMD there is the difference between "fine-grained" and "coarse-grained" memory, and f

[PATCH v2 0/8] OpenMP: Unified Shared Memory via Managed Memory

2024-06-28 Thread Andrew Stubbs
approve the amdgcn patches myself, but comments are welcome. OK for mainline? (Once the pinned memory dependencies are committed.) Thanks Andrew P.S. This series includes contributions from (at least) Thomas Schwinge, Marcel Vollweiler, Kwok Cheung Yeung, and Abid Qadeer.

[PATCH v2 3/8] openmp: Enable -foffload-memory=unified

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Ensure that "requires unified_shared_memory" plays nicely with the -foffload-memory options, and that enabling the option has the same effect as enabling USM in the code. Also adds some testcases. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_ta

[PATCH v2 2/8] openmp, nvptx: ompx_gnu_unified_shared_mem_alloc

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and ompx_gnu_host_mem_a

[PATCH v2 4/8] openmp: Use libgomp memory allocation functions with unified shared memory.

2024-06-28 Thread Andrew Stubbs
++.dg/gomp/usm-5.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc| 184 ++ gcc/passes.def| 1 + gcc/tes

[PATCH v2 5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure

[PATCH v2 7/8] openmp, libgomp: Handle unified shared memory in omp_target_is_accessible

2024-06-28 Thread Andrew Stubbs
From: Marcel Vollweiler This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine omp_target_is_accessible. libgomp/ChangeLog: * target.c (omp_target_is_accessible): Handle unified shared memory. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updat

[PATCH v2 6/8] amdgcn: libgomp plugin USM implementation

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Implement the Unified Shared Memory API calls in the GCN plugin. The AMD equivalent of "Managed Memory" means registering previously allocated host memory as "coarse-grained" (whereas allocating coarse-grained memory via hsa_allocate_memory allocate

[PATCH v2 8/8] libgomp: Map omp_default_mem_space to USM

2024-06-28 Thread Andrew Stubbs
When unified shared memory is required, the default memory space should also be unified. libgomp/ChangeLog: * config/linux/allocator.c (linux_memspace_alloc): Check omp_requires_mask. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linu

[committed] amdgcn: invent target feature flags

2024-07-02 Thread Andrew Stubbs
This is a first step towards having a device table so we can add new devices more easily. It'll also make it easier to remove the deprecated GCN3 bits. The patch should not change the behaviour of anything. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New. (

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 10:29, Thomas Schwinge wrote: Hi! On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788 "libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound". Attached h

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 16:36, Thomas Schwinge wrote: Hi! On 2024-07-15T12:16:30+0100, Andrew Stubbs wrote: On 15/07/2024 10:29, Thomas Schwinge wrote: On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit

[PATCH v4 0/5] libgomp: OpenMP pinned memory for omp_alloc

2024-05-31 Thread Andrew Stubbs
ns to-do. Besides rebase and retest, I've addressed the review comments regarding the enum assignments. OK for mainline? Andrew Andrew Stubbs (4): libgomp, openmp: Add ompx_pinned_mem_alloc openmp: Add -foffload-memory openmp: -foffload-memory=pinned libgomp: fine-grained pinned mem

[PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-05-31 Thread Andrew Stubbs
Compared to the previous v3 posting of this patch, the enumeration of the "ompx" allocators has been moved to start at "100". - This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consi

[PATCH v4 4/5] libgomp, nvptx: Cuda pinned memory

2024-05-31 Thread Andrew Stubbs
From: Thomas Schwinge This patch was already approved, by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. Use Cuda to pin me

[PATCH v4 3/5] openmp: -foffload-memory=pinned

2024-05-31 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mloc

[PATCH v4 2/5] openmp: Add -foffload-memory

2024-05-31 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v4 5/5] libgomp: fine-grained pinned memory allocator

2024-05-31 Thread Andrew Stubbs
This patch was already approved by Tobias Burnus in the v3 posting, but I've not yet committed it because there are some textual dependencies on the yet-to-be-approved patches. - This patch introduces a new custom memory allocator for use with pinned memory (in the case where

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-31 Thread Andrew Stubbs
On 29/05/2024 13:15, Tobias Burnus wrote: This patch depends (on the libgomp/target.c parts) of the patch "[patch] libgomp: Enable USM for some nvptx devices", https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html AMD GPUs that are either APU devices or MI200 [or MI300X] (with HSA_XNACK

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 04:01, Kewen Lin wrote: This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE defines in gcn port. gcc/ChangeLog: * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove. (DOUBLE_TYPE_SIZE): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. Assuming that this does n

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 28/05/2024 23:33, Tobias Burnus wrote: While most of the nvptx systems I have access to don't have the support for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one has: Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support this feature. And with that

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: + /* If USM has been requested and is supported by all devices + of this type, set the capability accordingly. */ + if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_ME
