Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Andrew Stubbs
On 03/06/2024 21:40, Tobias Burnus wrote: Andrew Stubbs wrote: On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly. */ +    if (omp_requires_mask

[commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs
Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse-offload features because we switched from using the "private" memory space to using a regular memory allocation. The

[committed] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs
The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artificial limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers rema
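The arithmetic behind the 252 figure can be sketched as follows. This is only an illustration, assuming a nominal budget of 256 VGPRs; the constants and function names are hypothetical and are not taken from gcn.cc.

```c
/* Hypothetical constants for illustration only; not taken from gcn.cc.  */
#define NOMINAL_VGPRS 256   /* assumed nominal register budget */
#define VGPR_BLOCK     12   /* allocation granule quoted in the message */

/* Round COUNT down to a whole number of allocation blocks.  */
unsigned
round_down_to_block (unsigned count, unsigned block)
{
  return (count / block) * block;
}

/* Largest register count that fits in whole blocks of VGPR_BLOCK.  */
unsigned
max_usable_vgprs (void)
{
  return round_down_to_block (NOMINAL_VGPRS, VGPR_BLOCK);
}
```

Under these assumptions max_usable_vgprs () yields 21 whole blocks of 12, i.e. 252, so the last four registers of the nominal budget cannot be allocated.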

[commit] amdgcn: Add padding to trampoline

2024-08-09 Thread Andrew Stubbs
This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align. * config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40. --- gcc/config/gcn/gcn.cc | 1 + gcc/config/gcn/gcn.h | 2 +- 2 files changed, 2 ins

Re: [commit] amdgcn: Re-enable trampolines

2024-08-09 Thread Andrew Stubbs
On 09/08/2024 07:53, Thomas Schwinge wrote: Hi Andrew! On 2024-08-08T13:50:17+, Andrew Stubbs wrote: Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Andrew Stubbs
On 22/08/2024 19:26, Tobias Burnus wrote: This patch adds OpenMP's interop support to the libgomp plugins (nvptx: cuda, cuda_driver, hip; gcn: hip, hsa).* [The idea is that the user can ask OpenMP to return a foreign-runtime handle (CUdevice, hipCtx_t, …) for a specified OpenMP device numbe

[PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Andrew Stubbs
The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLo

[PATCH v5 0/6] libgomp: OpenMP pinned memory for omp_alloc

2024-06-12 Thread Andrew Stubbs
the new testcases included in the rest of the series. Otherwise, I've addressed comments regarding the enum values, naming, and implemented previously missed cases in the environment variables and parsers. OK for mainline? Andrew Andrew Stubbs (6): libgomp: change alloc-pinned tests failure m

[PATCH v5 3/6] openmp: Add -foffload-memory

2024-06-12 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Andrew Stubbs
Compared to the previous v4 (1/5) posting of this patch: - The enumeration of the ompx allocators have been moved (again) to 200 (as 100 is already in use by another toolchain vendor and this seems like a possible source of confusion). - The "ompx" has also been changed to "ompx_gnu" to highlig

[PATCH v5 5/6] libgomp, nvptx: Cuda pinned memory

2024-06-12 Thread Andrew Stubbs
This patch was already approved, in the v3 posting by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. -- Use Cuda to pin memory, instead of Lin

[PATCH v5 4/6] openmp: -foffload-memory=pinned

2024-06-12 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mloc

[PATCH v5 6/6] libgomp: fine-grained pinned memory allocator

2024-06-12 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot share

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Andrew Stubbs
On 14/06/2024 11:31, Richard Biener wrote: The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these patt

Re: [Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Andrew Stubbs
On 21/06/2024 16:30, Tobias Burnus wrote: [I messed up copying from the build system, picking up an old version. Changes to v1 (bottom of the diff): fopen is no longer required.] Tobias Burnus wrote: mkoffload's generated .c file looks much nicer with '#embed'. This patch depends on Jakub's #e

Re: [Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Andrew Stubbs
On 23/07/2024 11:05, Tobias Burnus wrote: Hi Andrew, hi all, to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I suggest the attached patch that also suggests Thomas' Newlib commit (April 4, 2024) ed50a50b9 amdgcn: Implement proper locks: Fix 'newlib/libc/sys/amdgcn/inclu

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 11:03, Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To this change or in general?) Current version: https://gcc

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 13:00, Richard Biener wrote: On Mon, Apr 15, 2024 at 12:04 PM Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To t

Re: GCN: Enable effective-target 'vect_long_long'

2024-04-17 Thread Andrew Stubbs
On 16/04/2024 20:01, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_long'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) I think if there are still missing int64 vector operations then they're ex

Re: [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Andrew Stubbs
On 25/04/2024 11:51, Tobias Burnus wrote: Motivated by a surprise of a colleague that with -m32, no offload dumps were created; that's because mkoffload does not process host binaries when they are 32bit (i.e. ilp32). Internally, that is done as follows: The host compiler passes to 'mkoffload' the u

Re: [PATCH] amdgcn: Add gfx90c target

2024-04-26 Thread Andrew Stubbs
On 25/04/2024 19:37, Frederik Harwath wrote: Hi Andrew, this patch adds support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html) lists those devices as unsupported by rocm-amdhsa. As we have discussed elsewhere, I have tested

[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx90c support

2024-04-26 Thread Andrew Stubbs
I will push this shortly. I think the gfx90c patch just made the cut for the GCC-14 branch! Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index fce0fb44..47fef32d 100644 --

[committed] amdgcn: Remove TARGET_GCN3

2024-09-02 Thread Andrew Stubbs
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS

[committed] amdgcn: Remove TARGET_GCN5_PLUS

2024-09-02 Thread Andrew Stubbs
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also redundant and c

[committed] amdgcn: remove gfx803 "Fiji" support

2024-09-02 Thread Andrew Stubbs
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is

[committed, wwwdocs] gcc-15: Fiji gfx803 device support removed

2024-09-02 Thread Andrew Stubbs
--- htdocs/gcc-15/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index edce138e..7c372688 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -123,6 +123,13 @@ a work-in-progress. +AMD Ra

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
(Sorry, I missed this because I was on vacation.) On 11/08/2024 22:00, Robin Dapp wrote: This patch adds a zero else operand to the masked loads. The patch is OK, but I have a question below. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate.

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
On Thu, 5 Sept 2024, 21:10 Robin Dapp, wrote: > > > +(define_predicate "maskload_else_operand" > > > + (and (match_code "const_int,const_vector") > > > + (match_test "op == CONST0_RTX (GET_MODE (op))"))) > > > > This forces maskload and mask_gather_load to only accept zero here, but > > in
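The semantics under discussion can be pictured with a scalar model. This is a minimal sketch, not the GCN implementation: the function name and the 64-bit mask layout are assumptions made for illustration. The predicate quoted above would correspond to calling it with an else value of zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked vector load (hypothetical helper, for
   illustration only).  Lane I of DST is read from SRC when bit I of
   MASK is set; otherwise it takes ELSE_VALUE.  Inactive lanes of SRC
   are never dereferenced.  At most 64 lanes are modelled.  */
void
model_maskload (int32_t *dst, const int32_t *src, uint64_t mask,
                size_t lanes, int32_t else_value)
{
  for (size_t i = 0; i < lanes && i < 64; i++)
    dst[i] = ((mask >> i) & 1) ? src[i] : else_value;
}
```

With a mask of 0x5 over four lanes and an else value of 0, lanes 0 and 2 are loaded and lanes 1 and 3 become zero rather than being left undefined.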

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Andrew Stubbs
On 06/09/2024 08:06, Robin Dapp wrote: There were absolutely problems without this. It's a while ago now, so I'm struggling with the details, but as GCC only applies the mask to selected operations there were all sorts of issues that crept in. Zeroing the undefined lanes seemed to match the middl

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-10 Thread Andrew Stubbs
On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools. In case you have more cases, it would be greatly appreciated to verify the series with them. If you don't mind, would it be possible to comment out the zeroing, re-run the test

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Andrew Stubbs
On 10/09/2024 10:43, Andrew Stubbs wrote: On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools.  In case you have more cases, it would be greatly appreciated to verify the series with them.  If you don't mind, would

Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use

2013-10-22 Thread Andrew Stubbs
On 21/10/13 23:01, Yufeng Zhang wrote: Hi, This patch changes the widening_mul pass to fuse the widening multiply with accumulate only when the multiply has single use. The widening_mul pass currently does the conversion regardless of the number of the uses, which can cause poor code-gen in cas

[PATCH] dwarf: Multi-register CFI address support

2020-08-28 Thread Andrew Stubbs
Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame pointer, and return address. The motivating case is the AMD GCN architecture which has 64-bit address pointers, but 32-bit registers. The curre
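To illustrate why a single register number cannot describe such a pointer, here is a minimal sketch of reassembling a 64-bit address from two 32-bit register values. The function name is hypothetical, and placing the low half in the first register is an assumption for illustration, not something stated in the message.

```c
#include <stdint.h>

/* Rebuild a 64-bit address from a pair of 32-bit register values.
   Hypothetical helper: LO is assumed to come from the first register
   of the pair and HI from the second.  */
uint64_t
address_from_register_pair (uint32_t lo, uint32_t hi)
{
  return ((uint64_t) hi << 32) | lo;
}
```

An unwinder that only tracks one register per CFI rule would recover just one of the two halves, which is the gap the patch describes.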

Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Andrew Stubbs
On 11/08/2020 12:36, Martin Liška wrote: Hello. All right, I did it in 3 steps: 1) - new exact argument is added (no default value) - I tested it on x86_64-linux-gnu and I built all cross targets. 2) set default value of exact = false 3) change places which calculate its own growth to use the

Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Andrew Stubbs
On 28/08/2020 15:26, Martin Liška wrote: On 8/28/20 4:24 PM, Andrew Stubbs wrote: Should I just add "true" to fix it? That's enough to make it build. Yes, please and push it as obvious. I've committed the attached. Andrew amdgcn: Update vec_safe_grow_cleared usage An

Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-02 Thread Andrew Stubbs
On 02/09/2020 18:49, Tom Tromey wrote: "Andrew" == Andrew Stubbs writes: Andrew> To be fair, the DWARF standard makes a similar assumption; the Andrew> engineers working on LLVM and GDB, at AMD, have therefore invented Andrew> some new DWARF operators that they plan to p

Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-03 Thread Andrew Stubbs
On 28/08/2020 13:04, Andrew Stubbs wrote: Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame pointer, and return address. The motivating case is the AMD GCN architecture which has 64-bit address

[PATCH] testsuite: gimplefe-44 requires exceptions

2020-09-10 Thread Andrew Stubbs
This patch prevents an ICE (segmentation fault) that occurs for amdgcn because the test is trying to use -fexceptions which is unsupported on the target. Arguably it should fail more gracefully, but either way the test is inappropriate. OK to commit? Andrew testsuite: gimplefe-44 requires ex

[committed] amdgcn: align TImode registers

2020-09-11 Thread Andrew Stubbs
This patch fixes an execution failure in which the compiler would corrupt TImode values due to missed early clobber problems with partially overlapping register allocations. In fact, adding early clobber constraints does not fix the issue because IRA doesn't check that for move instructions pr

Re: [committed] amdgcn: align TImode registers

2020-09-11 Thread Andrew Stubbs
This is now backported to GCC 10. Andrew On 11/09/2020 11:17, Andrew Stubbs wrote: This patch fixes an execution failure in which the compiler would corrupt TImode values due to missed early clobber problems with partially overlapping register allocations.  In fact, adding early clobber

Re: [PATCH] testsuite: gimplefe-44 requires exceptions

2020-09-11 Thread Andrew Stubbs
This is now committed and backported to GCC 10. Andrew On 10/09/2020 15:03, Andrew Stubbs wrote: This patch prevents an ICE (segmentation fault) that occurs for amdgcn because the test is trying to use -fexceptions which is unsupported on the target. Arguably it should fail more gracefully

Re: [RFC][nvptx, libgomp] Add 128-bit atomic support

2020-09-11 Thread Andrew Stubbs
On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where lib

Re: [OG10] Merge GCC 10 into branch; merge some mainline nvptx patches

2020-09-11 Thread Andrew Stubbs
On 11/09/2020 13:02, Tobias Burnus wrote: OG10 = devel/omp/gcc-10 I have merged releases/gcc-10 into that branch. And added a bunch of mainline alias GCC 11 nvptx patches to that branch. 2df8e0f1bc4 [libatomic] Add nvptx support 5544bca37bc [nvptx] Fix UB in nvptx_assemble_value 7e10b6b0b34 [nv

[committed] amdgcn: Align VGPR pairs

2020-02-21 Thread Andrew Stubbs
the architecture, but doing so allows us to remove the requirement for bug-prone early-clobber constraints from many split patterns (and avoid adding more in future). 2020-02-20 Andrew Stubbs gcc/ * config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align VGPR pairs. * config/gcn/gcn-v

[committed] amdgcn: fix mode in vec_series

2020-02-21 Thread Andrew Stubbs
This patch fixes an obvious typo in the definition of vec_seriesv64di. It's never worked, so the fact it's taken this long for me to notice shows how little the middle-end takes advantage of this pattern. :-( Andrew amdgcn: fix mode in vec_series 2020-02-20 Andrew Stub

[committed] amdgcn: Use correct offset mode for gather/scatter

2020-02-21 Thread Andrew Stubbs
mode for gather/scatter The scatter/gather pattern names changed for GCC 10, but I hadn't noticed. This switches the patterns to the new offset mode scheme. 2020-02-21 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (gather_load): Rename to ... (gather_loadv64si): ... this and set operand

Re: patch to fix PR93564

2020-02-26 Thread Andrew Stubbs
On 23/02/2020 21:25, Vladimir Makarov wrote: The following patch is for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93564 The patch was successfully bootstrapped on x86-64 and benchmarked on SPEC2000. Since this patch I get an ICE with checking enabled, for amdgcn-amdhsa: during RTL pa

[committed] amdgcn: fix ICE on subreg of BI reg

2020-02-27 Thread Andrew Stubbs
both LRA and DF calculate nregs the same, and ICE goes away. As soon as LRA is done the subregs all evaporate anyway. 2020-02-27 Andrew Stubbs gcc/ * config/gcn/gcn.md (mov): Add transformations for BI subregs. diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index b527d9a7a8b.

Re: patch to fix PR93564

2020-02-27 Thread Andrew Stubbs
On 26/02/2020 15:16, Andrew Stubbs wrote: The problem appears to be that the high-part of a register pair is not marked as "ever live".  I'm trying to figure out whether this is some kind of target-specific issue that has merely been exposed, but it's difficult to see

[committed] amdgcn: sub-dword vector min/max/shift/bit operators

2020-02-27 Thread Andrew Stubbs
e representation of the stores to allow combining truncates. Andrew amdgcn: sub-dword vector min/max/shift/bit operators 2020-02-27 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (VEC_SUBDWORD_MODE): New mode iterator. (2): Change modes to VEC_ALL1REG_INT_MODE. (3): Likewise. (3): New.

[committed] amdgcn: Extend reductions to all types

2020-03-02 Thread Andrew Stubbs
irect hardware support for these, so we use regular vector instructions and separate lane shift instructions. Also add support for V64QI and V64HI reductions. Some of these require additional extends and truncates, because AMD GCN has 32-bit vector lanes. 2020-03-02 Andrew Stubbs gcc/ * confi

[committed] amdgcn: Add cond_add/sub/and/ior/xor for all vector modes

2020-03-18 Thread Andrew Stubbs
require extends and truncates for some modes. Andrew amdgcn: Add cond_add/sub/and/ior/xor for all vector modes 2020-03-18 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (COND_MODE): Delete. (COND_INT_MODE): Delete. (cond_op): Add "mult". (cond_): Use VEC_ALLREG_MODE. (c

[committed] amdgcn: Fix vector compare modes

2020-03-18 Thread Andrew Stubbs
setting STORE_REGISTER_VALUE to -1, meaning that all the bits are significant. (It would be better if we could set STORE_REGISTER_VALUE according to the known mask or vector size, but we can't.) 2020-03-18 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (vec_cmpdi): Set operand 1 to D

Re: [Patch, comitted] libgomp/testsuite: ignore blank-line output for function-not-offloaded.c

2020-03-19 Thread Andrew Stubbs
On 19/03/2020 11:47, Tobias Burnus wrote: This error only appears for C++ as the reason seems to be that there are two unresolved symbols: "foo" and "__gxx_personality_v0". Those error messages are separated by an empty line. The blank lines are a feature of using the LLVM linker. GNU binutils

Re: [Patch]+[RFC] AMDGCN offloading – use amdgcn-amdhsa vs. amdgcn-unknown-amdhsa

2020-03-23 Thread Andrew Stubbs
On 20/03/2020 21:08, Tobias Burnus wrote: Dear all, normally, the target triplet does not play much of a role as it is not really exposed to the user. However, for offloading, it appears often: * In distribution use, offloading support is compiled in, but   not enabled by default; one needs to

[committed] amdgcn: adjust testsuite

2020-03-25 Thread Andrew Stubbs
stsuite: adjustments for amdgcn 2020-03-25 Andrew Stubbs gcc/testsuite/ * gcc.dg/vect/bb-slp-pr69907.c: Disable the dump scan for amdgcn. * lib/target-supports.exp (check_effective_target_vect_unpack): Add amdgcn. diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c b/gcc/testsuite/gcc.dg

Re: [committed] amdgcn: Add fold_left_plus vector reductions

2020-07-07 Thread Andrew Stubbs
On 07/07/2020 12:03, Richard Sandiford wrote: Andrew Stubbs writes: This patch implements a floating-point fold_left_plus vector pattern, which gives a significant speed-up in the BabelStream "dot" benchmark. The GCN architecture can't actually do an in-order vector red

[committed, OG10] amdgcn: Tune default OpenMP/OpenACC GPU utilization

2020-07-15 Thread Andrew Stubbs
-gcn.c (parse_target_attributes): Automatically set the number of teams and threads if necessary. (gcn_exec): Automatically set the number of gangs and workers if necessary. Co-Authored-By: Andrew Stubbs diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp index 9b9e1981f9a..e93424072e6 100644

[committed] amdgcn: Handle early debug info in mkoffload

2020-07-16 Thread Andrew Stubbs
This patch adds debug support to mkoffload, similar to what happens in lto-wrapper. Unlike lto-wrapper, we must deal with mismatched architectures and mismatched program scope. These are fixed up using manual ELF patching because there's no useful support in simple_object (yet). Should this be

[committed,OG10] amdgcn: Support basic DWARF

2020-07-16 Thread Andrew Stubbs
On 29/06/2020 12:05, Andrew Stubbs wrote: This patch configures the DWARF debug output to match the proposed DWARF specification from AMD.  This is already implemented in LLVM and rocgdb (out of tree). This makes no attempt to support CFI, yet, and has some issues with vector registers. GCC

[committed,OG10] amdgcn: Handle early debug info in mkoffload

2020-07-16 Thread Andrew Stubbs
On 16/07/2020 16:06, Andrew Stubbs wrote: This patch adds debug support to mkoffload, similar to what happens in lto-wrapper. Unlike lto-wrapper, we must deal with mismatched architectures and mismatched program scope. These are fixed up using manual ELF patching because there's no u

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-17 Thread Andrew Stubbs
On 17/07/2020 07:20, Thomas Schwinge wrote: --- a/gcc/config/gcn/mkoffload.c +++ b/gcc/config/gcn/mkoffload.c @@ -33,31 +33,53 @@ #include #include "collect-utils.h" #include "gomp-constants.h" +#include "simple-object.h" +#include "elf.h" + +/* These probably won't be in elf.h for a while

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-17 Thread Andrew Stubbs
On 17/07/2020 12:29, Andrew Stubbs wrote: For easier later maintenance, shouldn't this be a '#define' (or similar) done next to where the GCC back end defines its default? I thought of this, but I don't think there's actually a problem. The default is defined via

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Andrew Stubbs
On 20/07/2020 08:35, Richard Biener wrote: The way simple_object is supposed to work is to clone (or merge) the ELF headers from an existing binary. Unfortunately, the way mkoffload is currently coded we don't have any to clone from until too late. We could separate the assemble and link steps, b

Re: [committed] amdgcn: Handle early debug info in mkoffload

2020-07-20 Thread Andrew Stubbs
On 20/07/2020 11:01, Richard Biener wrote: On Mon, Jul 20, 2020 at 10:40 AM Andrew Stubbs wrote: On 20/07/2020 08:35, Richard Biener wrote: The way simple_object is supposed to work is to clone (or merge) the ELF headers from an existing binary. Unfortunately, the way mkoffload is currently

[committed] amdgcn: Enable TImode

2020-07-25 Thread Andrew Stubbs
This enables types __int128 et al for move, add, subtract, and logical operations. At least shift, rotate, multiply, divide, and modulus are broken so we can expect some test failures. This is required now because libgomp no longer builds without __int128. An additional patch will be required

[PATCH] OpenACC: Separate enter/exit data APIs

2020-07-29 Thread Andrew Stubbs
This patch does not implement anything new, but simply separates OpenACC 'enter data' and 'exit data' into two libgomp API functions. The original API name is kept for backward compatibility, but no longer referenced by the compiler. The previous implementation assumed that it would always be

[PATCH] OpenACC: Support GOMP_MAP_ZERO_LEN_ARRAY_SECTION

2020-07-29 Thread Andrew Stubbs
This patch adds support for zero-length arrays in OpenACC data transfers. Previously, trying to use an array section with zero length would cause a fatal error at runtime. This patch requires that my other patch "OpenACC: Separate enter/exit data APIs" is already applied. Unfortunately, beca

Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-07-30 Thread Andrew Stubbs
On 29/07/2020 15:05, Andrew Stubbs wrote: This patch does not implement anything new, but simply separates OpenACC 'enter data' and 'exit data' into two libgomp API functions.  The original API name is kept for backward compatibility, but no longer referenced by the comp

[committed] amdgcn: TImode shifts

2020-08-04 Thread Andrew Stubbs
This patch implements scalar TImode shifts using hardware DImode shifts. The middle-end cannot synthesize these because BITS_PER_WORD is 32, on this architecture, meaning it would try to use SImode shifts, and only double-word shifts are implemented. This fixes a large number of test failures
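The idea of building a 128-bit shift from 64-bit shifts can be modelled in plain C. This is a minimal sketch of the general technique, not the RTL from the patch; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves (hypothetical type).  */
struct u128
{
  uint64_t lo;
  uint64_t hi;
};

/* Logical left shift of X by N bits, for N in 0..127, using only
   64-bit shifts.  Each 64-bit shift count stays in 0..63, so no
   individual shift is undefined.  */
struct u128
shl128 (struct u128 x, unsigned n)
{
  struct u128 r;

  if (n == 0)
    return x;

  if (n >= 64)
    {
      /* The low half moves entirely into the high half.  */
      r.hi = x.lo << (n - 64);
      r.lo = 0;
    }
  else
    {
      /* Bits shifted out of the low half carry into the high half.  */
      r.hi = (x.hi << n) | (x.lo >> (64 - n));
      r.lo = x.lo << n;
    }
  return r;
}
```

The same decomposition attempted with 32-bit words needs four parts and more carries, which gives a sense of why using the double-word shifts directly is attractive on this target.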

[committed] amdgcn: Remove dead defines from gcn-run

2020-08-04 Thread Andrew Stubbs
This is just an obvious code cleanup; the relocation defines have been unused since the move to HSACOv3. They were just left in by mistake. Andrew amdgcn: Remove dead defines from gcn-run Nothing uses these since the switch to HSACOv3. gcc/ChangeLog: * config/gcn/gcn-run.c (R_AMDGPU_NONE): D

[PATCH] emit-rtl.c: Allow vector subreg of float vectors

2020-08-05 Thread Andrew Stubbs
This patch is a prerequisite for some patches I have to add multiple vector sizes on amdgcn. The problem is that validate_subreg rejects things like this: (subreg:V32SF (reg:V64SF) 0) These are commonly generated by the compiler. The integer equivalents work fine. To be honest, I don't kn

[PATCH] vect: Try smaller vector size when SLP split fails

2020-08-05 Thread Andrew Stubbs
This patch improves SLP performance in combination with some patches I have in development to add multiple vector sizes to amdgcn. The problem is that amdgcn's preferred vector size has 64 lanes, and SLP does not support lane masking. My patches will add smaller vector sizes (32, 16, 8, 4, 2)

Re: [PATCH] emit-rtl.c: Allow vector subreg of float vectors

2020-08-07 Thread Andrew Stubbs
On 06/08/2020 04:54, Richard Sandiford wrote: diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index f9b0e9714d9..d7067989ad7 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -947,6 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode, else if (VECTOR_MODE_P (omode) &&

[committed] amdgcn: refactor mode iterators

2020-03-27 Thread Andrew Stubbs
shorter and tidier. It does not change the output machine description at all. 2020-03-27 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md: (VEC_SUBDWORD_MODE): Rename to V_QIHI throughout. (VEC_1REG_MODE): Delete. (VEC_1REG_ALT): Delete. (VEC_ALL1REG_MODE): Rename to V_1REG throughout

[committed] amdgcn: generalize vector insn modes

2020-03-31 Thread Andrew Stubbs
after shows only white-space differences and the use of GET_MODE_NUNITS. 2020-03-31 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI, V_DF): New mode iterators. (vnsi, VnSI, vndi, VnDI): New mode attributes. (mov): Use in place of V64DI. (mov_exec): Lik

Re: [PATCH] [amdgcn] Add support for unordered floating-point comparisons

2020-04-02 Thread Andrew Stubbs
On 02/04/2020 15:55, Kwok Cheung Yeung wrote: Hello This patch adds support for the unordered floating-point comparison operators (UNEQ, UNGE, UNGT, UNLE, UNLT), which return true if one of the operands is a NaN. These comparisons can be generated by builtins such as __builtin_isgreater. GC
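The behaviour of the unordered operators can be modelled with the standard C99 comparison macros. This is a minimal sketch: the function names are hypothetical, and whether a compiler expands them to UNLT or UNEQ on a particular target is an assumption, not something the message states.

```c
#include <math.h>
#include <stdbool.h>

/* Model of UNLT: true if A < B, or if either operand is a NaN.
   isgreaterequal never raises an exception on quiet NaNs and
   returns false when the operands are unordered.  */
bool
unordered_lt (double a, double b)
{
  return !isgreaterequal (a, b);
}

/* Model of UNEQ: true if A == B, or if either operand is a NaN.  */
bool
unordered_eq (double a, double b)
{
  return !islessgreater (a, b);
}
```

For ordinary operands these agree with `<` and `==`; the difference only shows when a NaN is involved, in which case both return true.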

Re: [rtl] Harden 'set_noop_p' for non-constant selectors [PR94279]

2020-04-22 Thread Andrew Stubbs
On 22/04/2020 17:43, Thomas Schwinge wrote: In "[amdgcn] internal compiler error: RTL check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at rtl.h:2379", we recently found that it's wrong to expect constant selectors, at least in the current code

Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Andrew Stubbs
On 22/04/2020 22:10, Kwok Cheung Yeung wrote: Hello This patch adds a stub implementation of __gxx_personality_v0, which is used in C++ exception handling. AMD GCN currently does not actually support exception handling (the unwind functions are all stubs too), so adding an extra stub function

Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Andrew Stubbs
On 23/04/2020 12:21, Kwok Cheung Yeung wrote: I agree that not generating the problematic code in the first place is the better approach. Does that mean we can now remove libgcc/config/gcn/unwind-gcn.c completely? That was added for the benefit of libgfortran, not C++. It's used by the backtr

[committed] amdgcn: Check HSA return codes [PR94629]

2020-04-23 Thread Andrew Stubbs
status was ignored. Anyway, this is probably good practice. Andrew amdgcn: Check HSA return codes [PR94629] Ensure that the returned status values are not ignored. The old code was not broken, but this is both safer and satisfies static analysis. 2020-04-23 Andrew Stubbs PR other/94629

Re: [PR93488] [OpenACC] ICE in type-cast 'async', 'wait' clauses

2020-04-23 Thread Andrew Stubbs
record the review effort, please include "Reviewed-by: Thomas Schwinge " in the commit log, see <https://gcc.gnu.org/wiki/Reviewed-by>. I've committed the attached. Andrew OpenACC: Avoid ICE in type-cast 'async', 'wait' clauses 2020-04-23 An

Re: [AMD GCN] Use 'radeon' for the environment variable 'ACC_DEVICE_TYPE'

2020-04-23 Thread Andrew Stubbs
On 21/04/2020 13:24, Thomas Schwinge wrote: I wondered whether for symmetry, the GCC-internal 'GOMP_DEVICE_GCN', 'OFFLOAD_TARGET_TYPE_GCN' should also be renamed to '*_RADEON'? Or, going by example of '*_NVIDIA_PTX', name them '*_AMD_GCN'. Or, in fact then leave them as '*_GCN', given Julian's

[committed] amdgcn: Swap mov_exec operands

2020-04-23 Thread Andrew Stubbs
has the exec-mask operand last except this one. Andrew amdgcn: Swap mov_exec operands Every other *_exec insn has the exec operand last. This being the other way around is a cause of bugs, and prevents use in macro templates. 2020-04-23 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (mov_exec)

[committed] amdgcn: Testsuite tweaks

2020-04-24 Thread Andrew Stubbs
well adapted for GCN, in which the vector size varies with the number of lanes, not the other way around, but this is ok for now. 2020-04-24 Andrew Stubbs gcc/testsuite/ * lib/target-supports.exp (available_vector_sizes): Add amdgcn. (check_effective_target_vect_cmdline_needed): Disable for

[committed] amdgcn: Split 64-bit constant loads post-reload

2020-04-24 Thread Andrew Stubbs
stack by simplifying the code that LRA sees. 2020-04-17 Andrew Stubbs gcc/ * config/gcn/gcn.md (*mov_insn): Only split post-reload. diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 702ba55c11a..8f5937781b2 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md

[committed] amdgcn: Fix wrong-code bug in 64-bit masked add

2020-04-24 Thread Andrew Stubbs
asked add 2020-04-24 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (add_zext_dup2_exec): Fix merge of high-part. (add_sext_dup2_exec): Likewise. diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 0422e153cf3..d3badb4059c 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b

[committed] amdgcn: fix vcc clobber in vector load/store

2020-05-14 Thread Andrew Stubbs
fine in code expanded in earlier passes, but addresses expanded late, such as for stack spills or reloads, could clobber live VCC values, causing execution failures. This is the first target-specific testcase for GCN, so the new .exp file is included. 2020-05-14 Andrew Stubbs gcc/ * config/gcn

[committed] amdgcn: use unsigned extend for lshiftrt

2020-05-15 Thread Andrew Stubbs
be removed, but I didn't notice it until after the push. :-( WIP amdgcn: use unsigned extend for lshiftrt This fixes a wrong-code logic error in a previous patch. Detected by gcc.c-torture/execute/pr53645-2.c. 2020-05-15 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (v3): Fix unsignedp

[committed] amdgcn: Fix VCC early clobber

2020-05-29 Thread Andrew Stubbs
operations, and each writes to the whole of VCC, creating an early-clobber situation for this specific input register. Andrew amdgcn: Fix VCC early clobber gcc/ChangeLog: 2020-05-28 Andrew Stubbs * config/gcn/gcn-valu.md (add3_vcc_zext_dup): Add early clobber. (add3_vcc_zext_dup_exec

[committed][GCC10] amdgcn: fix vcc clobber in vector load/store

2020-05-29 Thread Andrew Stubbs
loads, could clobber live VCC values, causing execution failures. This is the first target-specific testcase for GCN, so the new .exp file is included. 2020-05-28 Andrew Stubbs Backport from master: 2020-05-14 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (add3_zext_dup): Change

[COMMITTED] amdgcn: Remove -mlocal-symbol-id option

2020-06-02 Thread Andrew Stubbs
This patch removes the vestiges of the GCN-specific -mlocal-symbol-id option. Previously, this was part of a horrible workaround for a bug in the Radeon Open Compute ELF loader. The bug has been fixed for a while now, and the name mangling has not been present in the compiler for a while, so the op

[PATCH] DWARF: fix debug info for offload kernels

2020-12-18 Thread Andrew Stubbs
This patch adjusts the DWARF emitted for OpenACC/OpenMP offload kernels such that the AMD rocgdb debugger can display the contents of variables within kernel code. The problem was that GDB was discarding the kernel functions' debug info because it is represented as a nested function inside the

[committed][OG10] Fix offload dwarf info

2021-01-15 Thread Andrew Stubbs
This patch corrects a problem in which GDB ignores the debug info for offload kernel entry functions because they're represented as nested functions inside a function that does not exist on the accelerator device (only on the host). The fix is to add a notional code range to the non-existent p

[committed][OG10] amdgcn: Fix DWARF variables with alloca

2021-01-15 Thread Andrew Stubbs
This patch fixes DWARF frame calculations for functions that use alloca on AMD GCN. Like many other platforms, it achieves this by switching to frame-pointer mode for this function. The frame pointer is necessary for debuggability only, so if the user specifies -fomit-frame-pointer then this

[committed][OG10] amdgcn: DWARF address spaces

2021-01-15 Thread Andrew Stubbs
This patch implements DWARF address spaces for pointers to LDS, etc., on AMD GCN. The address space mappings are defined by AMD in their DWARF proposals, and the LLVM implementation. ROCGDB does not actually support this feature yet, I don't believe, but will do so soonish. Committed to de

[committed][OG10] DWARF address space for variables

2021-01-15 Thread Andrew Stubbs
This patch adds DWARF support for "local" variables that are actually located in a different address space. This situation occurs for variables shared between all the worker threads of an OpenACC gang. On AMD GCN the variables are allocated to the low-latency LDS memory associated with each ph

Re: [committed][OG10] Fix offload dwarf info

2021-01-16 Thread Andrew Stubbs
On 15/01/2021 11:43, Andrew Stubbs wrote: This patch corrects a problem in which GDB ignores the debug info for offload kernel entry functions because they're represented as nested functions inside a function that does not exist on the accelerator device (only on the host). Apparently

Re: [committed] amdgcn: use unsigned extend for lshiftrt

2020-06-15 Thread Andrew Stubbs
On 15/05/2020 11:37, Andrew Stubbs wrote: This patch fixes a bug in which 8 and 16-bit vector shifts used the wrong kind of extend, thus causing wrong results.  It was simply a thinko in the insn code, so easily fixed. This is now back-ported to releases/gcc-10. Andrew
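To see why the kind of extend matters for a logical right shift on a narrow lane, here is a small scalar C sketch. It models a single 8-bit lane widened to 32 bits before shifting; the function names are hypothetical, and this is only an illustration of the arithmetic, not the insn pattern from gcn-valu.md.

```c
#include <stdint.h>

/* Logical shift right of one 8-bit lane, widened by zero-extension.
   Assumes 0 <= shift <= 7.  This gives the expected result.  */
uint8_t
lshr8_zero_extend (uint8_t lane, unsigned shift)
{
  uint32_t wide = (uint32_t) lane;                     /* zero-extend */
  return (uint8_t) (wide >> shift);
}

/* The same shift, but widened by sign-extension first.  Copies of the
   sign bit are shifted down into the low 8 bits, so lanes with the top
   bit set produce the wrong answer.  */
uint8_t
lshr8_sign_extend (uint8_t lane, unsigned shift)
{
  uint32_t wide = (uint32_t) (int32_t) (int8_t) lane;  /* sign-extend */
  return (uint8_t) (wide >> shift);
}
```

For a lane holding 0x80 shifted by 1, the zero-extended version yields 0x40, while the sign-extended version yields 0xC0. Lanes with the top bit clear, such as 0x7F, give the same result either way, which is how a bug of this kind can go unnoticed.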
