Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Thread Andrew Stubbs
On 02/06/2025 15:40, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: The hsa_memory_copy API is known to be slow, so for smaller data sizes it's probably better to have one hsa_memory_copy replace the whole memset than use three API calls, even with setting up some host-side memo

Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Thread Andrew Stubbs
On 30/05/2025 23:36, Tobias Burnus wrote: Attached patch adds omp_target_memset and omp_target_memset_async permitting to set (potentially large) data on the device to a certain value - in particular to '\0'. It uses 'memset' on the host (and for shared memory, e.g. via requires unified_shared_m

[PATCH] OpenMP, GCN: Add interop-hsa testcase

2025-04-25 Thread Andrew Stubbs
This testcase ensures that the interop HSA support is sufficient to run a kernel manually on the same device. It reuses an OpenMP kernel in order to avoid all the complication of compiling a custom kernel in Dejagnu (although, this does mean matching the OpenMP runtime environment, which might be

Re: [PATCH] GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]

2025-04-24 Thread Andrew Stubbs
On 23/04/2025 20:49, Thomas Schwinge wrote: '__dso_handle' for '__cxa_atexit', '__cxa_finalize'. See . PR target/119853 PR target/119854 libgcc/ * config/gcn/crt0.c (_fini_array): Call '__GCC_of

Re: [PATCH] GCN: Properly switch sections in 'gcn_hsa_declare_function_name' [PR119737]

2025-04-23 Thread Andrew Stubbs
On 22/04/2025 21:41, Thomas Schwinge wrote: From: Andrew Pinski There are GCN/C++ target as well as offloading codes, where the hard-coded section names in 'gcn_hsa_declare_function_name' do not fit, and assembly thus fails: LLVM ERROR: Size expression must be absolute. This commit progr

Re: [wwwdocs][Patch] gcc-15/changes: Fortran + offload (C++) update | project/gomp: GCC 15 update

2025-04-17 Thread Andrew Stubbs
On 17/04/2025 15:10, Tobias Burnus wrote: Hi all, @Fortraners: Comments to the added 'do concurrent' item? @Thomas: Are you fine with this C++ wording? @Andrew: Likewise for C++ and ROCm bump? This part is fine with me. Andrew Anyone: comments are welcome. Affected pages: * https://gcc.

Re: [PATCH] testsuite: force AMDGCN test for vect-early-break_18.c to consistent architecture [PR119286]

2025-04-16 Thread Andrew Stubbs
On 16/04/2025 08:57, Tamar Christina wrote: Hi All, The given test is intended to test vectorization of a strided access done by having a step of > 1. GCN target doesn't support load lanes, so the testcase is expected to fail, other targets create a permuted load here which we then reject.

Re: GCN, nvptx libstdc++: Force use of '__atomic' builtins [PR119645]

2025-04-07 Thread Andrew Stubbs
On 07/04/2025 09:07, Thomas Schwinge wrote: Hi! On 2025-03-14T11:39:20+0100, I wrote: As the first of a few patches to enable libstdc++ for GCN, nvptx targets, [...] some more fine-tuning is to follow later on.) Any comments before I push the attached "GCN, nvptx libstdc++: Force use of '_

Re: [PATCH 2/2] GCN: Don't emit weak undefined symbols [PR119369]

2025-04-04 Thread Andrew Stubbs
On 31/03/2025 10:48, Thomas Schwinge wrote: This resolves all instances of PR119369 "GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'"; for all affected test cases, the execution test status progresses FAIL -> PASS. This however also causes a small numbe

Re: [Patch] install.texi: gcn - suggest to use Newlib with simd math fix [PR119325]

2025-03-25 Thread Andrew Stubbs
On 25/03/2025 12:05, Tobias Burnus wrote: A GCC 15 regression turned out to be a bug in Newlib related to undefined behavior that just started to trigger in some cases. As it is now fixed, it makes IMHO sense to mention that Newlib commit in GCC's install documentation for AMD GPUs. Comments, s

Re: [PATCH] Fix GCN SIMD libm bug

2025-03-20 Thread Andrew Stubbs
I meant to send this to the newlib list. Apparently yesterday was a bad day for sending emails correctly. :( Andrew On 19/03/2025 15:04, Andrew Stubbs wrote: Since January, GCC has been miscompiling Newlib libm on AMD GCN due to undefined behaviour in the RESIZE_VECTOR macro. It was "wo

[PATCH] Fix GCN SIMD libm bug

2025-03-19 Thread Andrew Stubbs
Since January, GCC has been miscompiling Newlib libm on AMD GCN due to undefined behaviour in the RESIZE_VECTOR macro. It was "working" but expanding the size of a vector would no longer zero the additional lanes, as it expected. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119325 --- newlib

Re: GCN, nvptx: Allow for "hosted" libstdc++ build

2025-03-14 Thread Andrew Stubbs
On 14/03/2025 10:39, Thomas Schwinge wrote: Hi! As the first of a few patches to enable libstdc++ for GCN, nvptx targets, and eventually for OpenACC, OpenMP offloading use, I intend to push the attached 'GCN, nvptx: Allow for "hosted" libstdc++ build'. Any objections? It's not exactly pretty,

Re: [Patch] libgomp/plugin: Add initial interop support to nvptx + gcn

2025-03-11 Thread Andrew Stubbs
On 10/03/2025 21:48, Tobias Burnus wrote: This patch requires the to be submitted GOMP_interop patch, which handles the generic libgomp parts. But once it is available, this patch adds support for the foreign runtimes cuda/cuda_driver/hip for nvptx and hip/hsa for gcn. The patch is based on my o

Re: [wwwdocs] gcc-15/changes.html: Update AMD GPU (GCN) section for new gfx*

2025-02-14 Thread Andrew Stubbs
On 14/02/2025 09:02, Tobias Burnus wrote: Update https://gcc.gnu.org/gcc-15/changes.html#amdgcn for the newly added generic support and the GPUs compatible with the generic devices. OK? To have clickable links: In the patch, both https://gcc.gnu.org/ onlinedocs/gcc/AMD-GCN-Options.html and ht

Re: [Patch] [gcn] install.texi: Update for new ISA targets and their requirements

2025-02-10 Thread Andrew Stubbs
On 10/02/2025 15:50, Tobias Burnus wrote: Update the GCN install documentation for added ISAs, especially as no longer all supported ISA are enabled by default.  And for the (ROCm wise: upcoming) generic support. OK for mainline? +By default, multilib support is build for @code{gfx900}, @code

Re: [Patch] [gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has

2025-02-10 Thread Andrew Stubbs
On 10/02/2025 15:24, Tobias Burnus wrote: Hi all, Andrew and I discussed the following: Andrew Stubbs wrote: This business of changing the -march flag from what the user specified is also questionable. Result: the new patch (attached) no longer automatically chooses the associated

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 12:53, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: I think the correct place for this whole concept might be in the MULTILIB_MATCHES configuration option, not in mkoffload. In any case, mkoffload needs to know about this; if only the driver ('gcc') knows ab

Re: [Patch][v2] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 11:44, Tobias Burnus wrote: Andrew Stubbs wrote: On 07/02/2025 09:40, Tobias Burnus wrote: This patch permits loading generic ISA code objects - by just trying whether the runtime accepts it.  If not, it fails with an error. - The error messages should be a bit more helpful in

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 10:17, Tobias Burnus wrote: This patch is part of the following series (not yet in mainline); this patch depends on the first one, but only makes sense if both are in: * "[gcn] Add gfx9-generic and generic-associated gfx*"   (email subject: "Re: [Patch] [GCN] Handle generic ISA na

Re: [Patch] [gcn] Fix the output amdhsa.version

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 11:16, Tobias Burnus wrote: Andrew Stubbs wrote: Otherwise, this patch seems fine (I have not reviewed the new magic numbers and settings.) As Andrew mentioned via chat, we also have to update the 'amdhsa.version'. Well, that's what the attached patch does.

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 09:40, Tobias Burnus wrote: This patch is part of the following series (all unreviewed so far) but can be independently applied: * [Patch] [gcn] Fix gfx906's sramecc setting,   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675251.html * "[gcn] Add gfx9-generic and gener

Re: [Patch, v2] [gcn] Add gfx9-generic and generic-associated gfx*

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 10:37, Tobias Burnus wrote: Andrew Stubbs wrote: The attached patch now adds gfx9-generic - alongside the existing gfx{10-3,1}-generic and all gfx* that are enabled by those. What happened to the documentation patch with the "Experimental" markers? I'm still unc

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 00:25, Tobias Burnus wrote: After spending some time with the debugger, I am now convinced that ROCm 6.3.2 does not yet support generic. The amd-staging branch at https://github.com/ROCm/ROCR-Runtime/ support does, albeit only after the tag rocm-6.3.2. However, the released ROCm 6.3

Re: [Patch] [gcn] Fix gfx906's sramecc setting

2025-02-07 Thread Andrew Stubbs
On 06/02/2025 22:09, Tobias Burnus wrote: ROCm 6.3.2 does not like my patch for reasons that I do not understand; https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html Until that's sorted, I decided to split off two obvious fixes; I might suggest some further changes, but the full

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-05 Thread Andrew Stubbs
On 05/02/2025 12:51, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: On 05/02/2025 11:14, Tobias Burnus wrote: Therefore, the following GPUs are now supported in addition: gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-05 Thread Andrew Stubbs
On 05/02/2025 11:14, Tobias Burnus wrote: The number of AMD GPUs is huge - and, unfortunately, every GPU device is potentially slightly different, requiring different code generation either in some dusty corner case or for standard code. As for several GPUs identical code can run (either all or

[committed][OG14] openmp: Fix error reporting in parsing of C++ OpenMP to/from clause

2024-12-06 Thread Andrew Stubbs
From: Kwok Cheung Yeung The final 'else' when checking the motion modifiers is nested one level too deep. This patch should be folded into "OpenMP: Enable 'declare mapper' mappers for 'target update' directives" when merging to mainline. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_cla

Re: GCN: Fix 'real_from_integer' usage

2024-12-06 Thread Andrew Stubbs
On 12/6/24 13:56, Thomas Schwinge wrote: Hi Andrew! On 2024-12-05T15:14:45+0100, I wrote: On 2020-01-31T11:20:14+, Andrew Stubbs wrote: This is one of those things I don't know why we didn't notice sooner. ..., and here's another thing I don't know why we didn'

Re: [PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-20 Thread Andrew Stubbs
On 11/7/24 18:02, Andrew Stubbs wrote: On 07/11/2024 17:57, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new

Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-18 Thread Andrew Stubbs
On 11/18/24 13:23, Tobias Burnus wrote: This fixes a C23 error, causing a build fail: 'false' should have been 'NULL'. The NULL value is not really handled as the code calling maybe_init_omp_async assumes that agent->omp_async_queue can be dereferenced. Hence, besides fixing the false/NULL issu

Re: [Patch] libgomp/plugin/plugin-gcn.c: Show device number in ISA error

2024-11-11 Thread Andrew Stubbs
On 11/11/2024 09:42, Tobias Burnus wrote: Currently, for GCN, only one offload ISA is supported; this might lead to errors when multiple different AMD GPUs are installed on the same system, at least when using the "wrong" device/device number. In case of the testsuite, this occurs for instance w

Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Andrew Stubbs
On 08/11/2024 12:25, Richard Sandiford wrote: For the aarch64 simd clones patches, it would be useful to be able to push a function declaration onto the cfun stack, even though it has no function body associated with it. That is, we want cfun to be null, current_function_decl to be the decl itse

Re: [PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 17:57, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new predicate. --- gcc/config/gcn/gc

Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 11:07, Jakub Jelinek wrote: On Thu, Nov 07, 2024 at 10:54:40AM +, Andrew Stubbs wrote: On 07/11/2024 00:37, haochen.jiang wrote: d334f729e53867b838e867375b3f475ba793d96e is the first bad commit commit d334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed

Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 00:37, haochen.jiang wrote: On Linux/x86_64, d334f729e53867b838e867375b3f475ba793d96e is the first bad commit commit d334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed Nov 6 12:26:08 2024 + openmp: Add testcases for omp_max_vf caused FAIL

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 17:59, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote: I'm not sure why I didn't see this. Was it bootstrap tested or just built without bootstrap + tested? Otherwise it is just a warning. Apparently I forgot to rerun the boots

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
uses a bootstrap failure for me (and others) on x86_64-linux-gnu: I'm not sure why I didn't see this. I'm testing the attached patch. Andrew From 345eb9b795d9728733bd0e472529e259ce796ff6 Mon Sep 17 00:00:00 2001 From: Andrew Stubbs Date: Wed, 6 Nov 2024 17:50:00 + Subject: [PAT

Re: [PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 15:41, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 03:27:22PM +, Andrew Stubbs wrote: Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. This requires en

[PATCH 0/4] openmp: Fix omp_max_vf in offload contexts

2024-11-06 Thread Andrew Stubbs
letely sure if the IFN is overkill. I'm aware that Prathamesh is also working on code in this area. His RFC patch doesn't work for my use-case, and seems to have other issues. This patch conflicts, but hopefully it's not unresolvable. OK for mainline? Andrew Andrew Stubbs (4): ope

[PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
If requested, return the vectorization factor appropriate for the offload device, if any. This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn. The omp_adjust_chunk_size usecase is set "false", for now, but I intend to change that in a follow-up patch. Note that N

[PATCH 3/4] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Andrew Stubbs
Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/Chang

[PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. This requires enabling the offload-dump scanning features previously only used in the libgomp testsuite. The automake scheme used there

[PATCH 2/4] openmp: use offload max_vf for chunk_size

2024-11-06 Thread Andrew Stubbs
The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will g

Re: [PATCH v3 6/8] gcn: Add else operand to masked loads.

2024-11-04 Thread Andrew Stubbs
On 02/11/2024 12:58, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new predicate. --- gcc/config/gcn/gc

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-31 Thread Andrew Stubbs
On 30/10/2024 16:06, Andi Kleen wrote: On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote: On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen wrote: From: Andi Kleen Retrieving sys/user time in timevars is quite expensive because it always needs a system call. Only getting the wall time

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 09:39, Andrew Stubbs wrote: On 28/10/2024 20:03, Robin Dapp wrote: I'm not sure how this is different to just deleting the zero-initializer, which is what I already tested and found some random behaviour? The difference is in the else-operand predicate.  So unless there are

Re: [Patch] AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 12:10, Tobias Burnus wrote: Hi Andrew, Am 29.10.24 um 13:07 schrieb Andrew Stubbs: On 29/10/2024 11:44, Tobias Burnus wrote: This somewhat matches what is done in OG13 and in Andrew's patch at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html albeit the co

Re: [Patch] AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 11:44, Tobias Burnus wrote: While users can set HSA_XNACK themselves, it is much more convenient if the compiler sets it for them (at least if it is overridable). Some systems don't have XNACK, but for those that have it, the somewhat newish object code versions support three mo

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-29 Thread Andrew Stubbs
On 28/10/2024 20:03, Robin Dapp wrote: I'm not sure how this is different to just deleting the zero-initializer, which is what I already tested and found some random behaviour? The difference is in the else-operand predicate. So unless there are more bugs we should only have added VCOND_EXPRs

Re: [Patch][v2] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 22/10/2024 17:29, Tobias Burnus wrote: Andrew Stubbs wrote: I'm going to push the base patch shortly. … which happened in commit r15-4540-ga6b26e5ea09779. Updated patch attached. Some more testing showed that there was an issue with the builtin defines, which has been fixed and

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-22 Thread Andrew Stubbs
On 18/10/2024 15:22, Robin Dapp wrote: This patch adds an undefined else operand to the masked loads. @@ -4027,7 +4025,8 @@ (define_expand "mask_gather_load" (match_operand: 2 "register_operand") (match_operand 3 "immediate_operand") (match_operand:SI 4 "gcn_alu_operand") - (

Re: [Patch] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 22/10/2024 11:04, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: On 21/10/2024 20:49, Tobias Burnus wrote: GCN_DEVICE field descriptions: -  0  "name"  (text, external) +  0 Generic flag/version (0 = non-generic, 1 to 255 = generic version, +    external)

Re: [Patch] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 21/10/2024 20:49, Tobias Burnus wrote: I have now attached a proper version of my patch, which is relative to your patch. OK once your patch is in? GCN_DEVICE field descriptions: - 0 "name" (text, external) + 0 Generic flag/version (0 = non-generic, 1 to 255 = generic ve

[PATCH] amdgcn: Refactor device settings into a def file

2024-10-21 Thread Andrew Stubbs
I'm going to commit this soon, but I'd appreciate if anybody could have a quick look and let me know if anything is obviously broken or doing things the hard way, or something. Thanks! Andrew -- Almost all device-specific settings are now centralised into gcn-devices.def for the

[committed] amdgcn: silence warning

2024-10-21 Thread Andrew Stubbs
FIRST_SGPR_REG is register zero so the compiler always claims this comparison is redundant. It's right, of course, but I'd have preferred to keep the comparison for completeness. Probably the "correct" solution is to use an enum for these values. gcc/ChangeLog: * config/gcn/gcn.h (SGPR_

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Andrew Stubbs
On 10/09/2024 10:43, Andrew Stubbs wrote: On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools.  In case you have more cases, it would be greatly appreciated to verify the series with them.  If you don't mind, would

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-10 Thread Andrew Stubbs
On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools. In case you have more cases, it would be greatly appreciated to verify the series with them. If you don't mind, would it be possible to comment out the zeroing, re-run the test

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Andrew Stubbs
On 06/09/2024 08:06, Robin Dapp wrote: There were absolutely problems without this. It's a while ago now, so I'm struggling with the details, but as GCC only applies the mask to selected operations there were all sorts of issues that crept in. Zeroing the undefined lanes seemed to match the middl

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
On Thu, 5 Sept 2024, 21:10 Robin Dapp, wrote: > > > +(define_predicate "maskload_else_operand" > > > + (and (match_code "const_int,const_vector") > > > + (match_test "op == CONST0_RTX (GET_MODE (op))"))) > > > > This forces maskload and mask_gather_load to only accept zero here, but > > in

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
(Sorry, I missed this because I was on vacation.) On 11/08/2024 22:00, Robin Dapp wrote: This patch adds a zero else operand to the masked loads. The patch is OK, but I have a question below. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate.

[committed, wwwdocs] gcc-15: Fiji gfx803 device support removed

2024-09-02 Thread Andrew Stubbs
--- htdocs/gcc-15/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index edce138e..7c372688 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -123,6 +123,13 @@ a work-in-progress. +AMD Ra

[committed] amdgcn: remove gfx803 "Fiji" support

2024-09-02 Thread Andrew Stubbs
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is

[committed] amdgcn: Remove TARGET_GCN5_PLUS

2024-09-02 Thread Andrew Stubbs
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also redundant and c

[committed] amdgcn: Remove TARGET_GCN3

2024-09-02 Thread Andrew Stubbs
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Andrew Stubbs
On 22/08/2024 19:26, Tobias Burnus wrote: This patch adds OpenMP's interop support to the libgomp plugins (nvptx: cuda, cuda_driver, hip; gcn: hip, hsa).* [The idea is that the user can ask OpenMP to return a foreign-runtime handle (CUdevice, hipCtx_t, …) for a specified OpenMP device numbe

Re: [commit] amdgcn: Re-enable trampolines

2024-08-09 Thread Andrew Stubbs
On 09/08/2024 07:53, Thomas Schwinge wrote: Hi Andrew! On 2024-08-08T13:50:17+, Andrew Stubbs wrote: Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse

[commit] amdgcn: Add padding to trampoline

2024-08-09 Thread Andrew Stubbs
This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align. * config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40. --- gcc/config/gcn/gcn.cc | 1 + gcc/config/gcn/gcn.h | 2 +- 2 files changed, 2 ins

[committed] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs
The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artificial limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers rema

[commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs
Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse-offload features because we switched from using the "private" memory space to using a regular memory allocation. The

Re: [Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Andrew Stubbs
On 23/07/2024 11:05, Tobias Burnus wrote: Hi Andrew, hi all, to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I suggest the attached patch that also suggests Thomas' Newlib commit (April 4, 2024) ed50a50b9   amdgcn: Implement proper locks: Fix 'newlib/libc/sys/amdgcn/inclu

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 16:36, Thomas Schwinge wrote: Hi! On 2024-07-15T12:16:30+0100, Andrew Stubbs wrote: On 15/07/2024 10:29, Thomas Schwinge wrote: On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 10:29, Thomas Schwinge wrote: Hi! On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788 "libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound". Attached h

[committed] amdgcn: invent target feature flags

2024-07-02 Thread Andrew Stubbs
This is a first step towards having a device table so we can add new devices more easily. It'll also make it easier to remove the deprecated GCN3 bits. The patch should not change the behaviour of anything. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New. (

[PATCH v2 8/8] libgomp: Map omp_default_mem_space to USM

2024-06-28 Thread Andrew Stubbs
When unified shared memory is required, the default memory space should also be unified. libgomp/ChangeLog: * config/linux/allocator.c (linux_memspace_alloc): Check omp_requires_mask. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linu

[PATCH v2 6/8] amdgcn: libgomp plugin USM implementation

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Implement the Unified Shared Memory API calls in the GCN plugin. The AMD equivalent of "Managed Memory" means registering previously allocated host memory as "coarse-grained" (whereas allocating coarse-grained memory via hsa_allocate_memory allocate

[PATCH v2 7/8] openmp, libgomp: Handle unified shared memory in omp_target_is_accessible

2024-06-28 Thread Andrew Stubbs
From: Marcel Vollweiler This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine omp_target_is_accessible. libgomp/ChangeLog: * target.c (omp_target_is_accessible): Handle unified shared memory. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updat

[PATCH v2 5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure

[PATCH v2 4/8] openmp: Use libgomp memory allocation functions with unified shared memory.

2024-06-28 Thread Andrew Stubbs
++.dg/gomp/usm-5.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc| 184 ++ gcc/passes.def| 1 + gcc/tes

[PATCH v2 2/8] openmp, nvptx: ompx_gnu_unified_shared_mem_alloc

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and ompx_gnu_host_mem_a

[PATCH v2 3/8] openmp: Enable -foffload-memory=unified

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Ensure that "requires unified_shared_memory" plays nicely with the -foffload-memory options, and that enabling the option has the same effect as enabling USM in the code. Also adds some testcases. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_ta

[PATCH v2 0/8] OpenMP: Unified Shared Memory via Managed Memory

2024-06-28 Thread Andrew Stubbs
approve the amdgcn patches myself, but comments are welcome. OK for mainline? (Once the pinned memory dependencies are committed.) Thanks Andrew P.S. This series includes contributions from (at least) Thomas Schwinge, Marcel Vollweiler, Kwok Cheung Yeung, and Abid Qadeer.

[PATCH v2 1/8] libgomp: Disentangle shared memory from managed

2024-06-28 Thread Andrew Stubbs
Some GPU compute systems allow the GPU to access host memory without much prior setup, but that's not necessarily the fast way to do it. For shared memory APUs this is almost certainly the correct choice, but for AMD there is the difference between "fine-grained" and "coarse-grained" memory, and f

Re: [Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Andrew Stubbs
On 21/06/2024 16:30, Tobias Burnus wrote: [I messed up copying from the build system, picking up an old version. Changes to v1 (bottom of the diff): fopen is no longer required.] Tobias Burnus wrote: mkoffload's generated .c file looks much nicer with '#embed'. This patch depends on Jakub's #e

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Andrew Stubbs
On 14/06/2024 11:31, Richard Biener wrote: The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these patt

[PATCH v5 6/6] libgomp: fine-grained pinned memory allocator

2024-06-12 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot share

[PATCH v5 4/6] openmp: -foffload-memory=pinned

2024-06-12 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mloc

[PATCH v5 5/6] libgomp, nvptx: Cuda pinned memory

2024-06-12 Thread Andrew Stubbs
This patch was already approved, in the v3 posting by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. -- Use Cuda to pin memory, instead of Lin

[PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Andrew Stubbs
Compared to the previous v4 (1/5) posting of this patch: - The enumeration of the ompx allocators has been moved (again) to 200 (as 100 is already in use by another toolchain vendor and this seems like a possible source of confusion). - The "ompx" has also been changed to "ompx_gnu" to highlig

[PATCH v5 3/6] openmp: Add -foffload-memory

2024-06-12 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v5 0/6] libgomp: OpenMP pinned memory for omp_alloc

2024-06-12 Thread Andrew Stubbs
the new testcases included in the rest of the series. Otherwise, I've addressed comments regarding the enum values, naming, and implemented previously missed cases in the environment variables and parsers. OK for mainline? Andrew Andrew Stubbs (6): libgomp: change alloc-pinned tests failure m

[PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Andrew Stubbs
The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLo

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Andrew Stubbs
On 03/06/2024 21:40, Tobias Burnus wrote: Andrew Stubbs wrote: On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly. */ +    if (omp_requires_mask

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly.  */ +    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_ME

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 28/05/2024 23:33, Tobias Burnus wrote: While most of the nvptx systems I have access to don't have the support for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one has: Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support this feature. And with that

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 04:01, Kewen Lin wrote: This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE defines in gcn port. gcc/ChangeLog: * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove. (DOUBLE_TYPE_SIZE): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. Assuming that this does n

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-31 Thread Andrew Stubbs
On 29/05/2024 13:15, Tobias Burnus wrote: This patch depends (for the libgomp/target.c parts) on the patch "[patch] libgomp: Enable USM for some nvptx devices", https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html AMD GPUs that are either APU devices or MI200 [or MI300X] (with HSA_XNACK

[PATCH v4 5/5] libgomp: fine-grained pinned memory allocator

2024-05-31 Thread Andrew Stubbs
This patch was already approved, by Tobias Burnus, in the v3 posting, but I've not yet committed it because there are some textual dependencies on the yet-to-be-approved patches. - This patch introduces a new custom memory allocator for use with pinned memory (in the case where

[PATCH v4 2/5] openmp: Add -foffload-memory

2024-05-31 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v4 4/5] libgomp, nvptx: Cuda pinned memory

2024-05-31 Thread Andrew Stubbs
From: Thomas Schwinge This patch was already approved, by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. Use Cuda to pin me
