[PATCH] vect: while_ult for integer mask

2022-09-28 Thread Andrew Stubbs
This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywhere yet). The problem is that, unlike AArch64, I'm not using different mask modes for different sized ve

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
On 29/09/2022 08:52, Richard Biener wrote: On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote: This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywher

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
On 29/09/2022 10:24, Richard Sandiford wrote: Otherwise: operand0[0] = operand1 < operand2; for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); looks like a "length and mask" operation, which IIUC is also what RVV wanted? (Wasn't at the Cauldro

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-09-29 Thread Andrew Stubbs
On 27/09/2022 14:16, Tobias Burnus wrote: @@ -422,6 +428,12 @@ struct agent_info if it has been. */ bool initialized; + /* Flag whether the HSA program that consists of all the modules has been + finalized. */ + bool prog_finalized; + /* Flag whether the HSA OpenMP's requires

[committed] amdgcn: remove unused variable

2022-09-29 Thread Andrew Stubbs
I've committed this small clean up. It silences a warning. Andrewamdgcn: remove unused variable This was left over from a previous version of the SIMD clone patch. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Remove unused elt_bits variable.

Re: [PATCH] vect: while_ult for integer mask

2022-10-03 Thread Andrew Stubbs
On 29/09/2022 14:46, Richard Biener wrote: It's not the nicest way of carrying the information but short of inventing new modes I can't see something better (well, another optab). I see the GCN backend expects a constant in operand 3 but the docs don't specify the operand has to be a CONST_INT,

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-10 Thread Andrew Stubbs
On 10/10/2022 12:03, Richard Biener wrote: The following picks up the prototype by Ju-Zhe Zhong for vectorizing first order recurrences. That solves two TSVC missed optimization PRs. There's a new scalar cycle def kind, vect_first_order_recurrence and it's handling of the backedge value vectori

[committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
ssues, but rather existing problems that did not show up because the code did not previously vectorize. Expanding the testcase to allow 64-lane vectors shows the same problems there. I shall backport these patches to the OG12 branch shortly. Andrew Andrew Stubbs (6): amdgcn: add multiple ve

[committed 2/6] amdgcn: Resolve insn conditions at compile time

2022-10-11 Thread Andrew Stubbs
GET_MODE_NUNITS isn't a compile time constant, so we end up with many impossible insns in the machine description. Adding MODE_VF allows the insns to be eliminated completely. gcc/ChangeLog: * config/gcn/gcn-valu.md (2): Use MODE_VF. (2): Likewise. * config/gcn/g

[committed 3/6] amdgcn: Add vec_extract for partial vectors

2022-10-11 Thread Andrew Stubbs
Add vec_extract expanders for all valid pairs of vector types. gcc/ChangeLog: * config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants. * config/gcn/gcn-valu.md (vec_extract): New define_expand. * config/gcn/gcn.cc (get_exec): Export the existing func

[committed 4/6] amdgcn: vec_init for multiple vector sizes

2022-10-11 Thread Andrew Stubbs
Implements vec_init when the input is a vector of smaller vectors, or of vector MEM types, or a smaller vector duplicated several times. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_init): New. * config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3. (GEN_VNM): Add gathervNm

[committed 1/6] amdgcn: add multiple vector sizes

2022-10-11 Thread Andrew Stubbs
The vectors sizes are simulated using implicit masking, but they make life easier for the autovectorizer and SLP passes. gcc/ChangeLog: * config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF, V16QI, V16HI, V16SI, V16

[committed 5/6] amdgcn: Add vector integer negate insn

2022-10-11 Thread Andrew Stubbs
Another example of the vectorizer needing explicit insns where the scalar expander just works. gcc/ChangeLog: * config/gcn/gcn-valu.md (neg2): New define_expand. --- gcc/config/gcn/gcn-valu.md | 13 + 1 file changed, 13 insertions(+) diff --git a/gcc/config/gcn/gcn-valu.md

[committed 6/6] amdgcn: vector testsuite tweaks

2022-10-11 Thread Andrew Stubbs
The testsuite needs a few tweaks following my patches to add multiple vector sizes for amdgcn. gcc/testsuite/ChangeLog: * gcc.dg/pr104464.c: Xfail on amdgcn. * gcc.dg/signbit-2.c: Likewise. * gcc.dg/signbit-5.c: Likewise. * gcc.dg/vect/bb-slp-68.c: Likewise.

Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
On 11/10/2022 12:29, Richard Biener wrote: On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs wrote: This patch series adds additional vector sizes for the amdgcn backend. The hardware supports any arbitrary vector length up to 64-lanes via masking, but GCC cannot (yet) make full use of them due

Re: [Patch] GCN: Implement __atomic_compare_exchange_{1, 2} in libgcc [PR102215]

2022-03-09 Thread Andrew Stubbs
On 09/03/2022 16:29, Tobias Burnus wrote: This shows up with with OpenMP offloading as libgomp since a couple of months uses __atomic_compare_exchange (see PR for details), causing link errors when the gcn libgomp.a is linked. It also shows up with sollve_vv. The implementation does a bit copy'n

Re: [PATCH, OpenMP 5.0] More implementation of the requires directive

2022-03-29 Thread Andrew Stubbs
On 13/01/2021 15:07, Chung-Lin Tang wrote: We currently emit errors, but do not fatally cause exit of the program if those are not met. We're still unsure if complete block-out of program execution is the right thing for the user. This can be discussed later. After the Unified Shared Memory p

Re: [PATCH 5/5] openmp: -foffload-memory=pinned

2022-03-30 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: gcc/ChangeLog: * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. This worked for x86_64, but I needed to make the attached adjustment to work on powerpc without a linker error.

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: This patches changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 documen

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 02/04/2022 13:04, Andrew Stubbs wrote: This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 document says that addresses allocated this way needs to work without is_device_ptr. The easiest way to make that work is to make them USM addresses. Actually, reading on

Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.

2022-04-13 Thread Andrew Stubbs
This patch adjusts the testcases, previously proposed, to allow for testing on machines with varying page sizes and default amounts of lockable memory. There turns out to be more variation than I had thought. This should go on mainline at the same time as the previous patches in this thread.

[PATCH] openmp: Handle unified address memory.

2022-04-20 Thread Andrew Stubbs
This patch adds enough support for "requires unified_address" to make the sollve_vv testcases pass. It implements unified_address as a synonym of unified_shared_memory, which is both valid and the only way I know of to unify addresses with Cuda (could be wrong). This patch should be applied on

Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.

2022-01-18 Thread Andrew Stubbs
Sorry, I had not seen that this was entirely within my amdgcn remit On 12/01/2022 09:43, Marcel Vollweiler wrote: Hi, Currently omp_get_device_num does not work on gcn targets with more than one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in icv-device.c and thus "__gom

Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.

2022-01-18 Thread Andrew Stubbs
On 18/01/2022 12:25, Thomas Schwinge wrote: Hi! Maybe I'm just totally confused -- as so often ;-) -- but things seem strange here: On 2022-01-12T10:43:05+0100, Marcel Vollweiler wrote: Currently omp_get_device_num does not work on gcn targets with more than one offload device. The reason is

[PATCH] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-01-20 Thread Andrew Stubbs
This patch adds a new predefined allocator named ompx_pinned_mem_alloc as an extension to the OpenMP standard. It is intended as a convenient way to allocate pinned memory using the Linux support patch I posted recently. I anticipate it being used by compiler internals in future as part of a pr

[PATCH] openmp, nvptx: low-lat memory access traits

2022-01-27 Thread Andrew Stubbs
This patch adjusts the NVPTX low-latency allocator that I have previously posted (awaiting re-review). The patch assumes that all my previously posted patches are applied already. Given that any memory allocated from the low-latency memory space cannot support the "access=all" allocator trait

Re: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2022-02-02 Thread Andrew Stubbs
On 02/02/2022 15:39, Tobias Burnus wrote: On 09.08.21 15:55, Tobias Burnus wrote: Now that the GCN/OpenACC patches for this have been committed today, I think it makes sense to add it to the documentation. (I was told that some follow-up items are still pending, but as the feature does work ...)

[committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so we ended up with an unrecognised instruction ICE.

[OG11][committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
On 14/02/2022 14:13, Andrew Stubbs wrote: I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so

[commit][master+OG11] amdgcn: Fix ICE generating CFI [PR103396]

2021-11-25 Thread Andrew Stubbs
If committed this patch to fix the amdgcn ICE reported in PR103396. The problem was that it was mis-counting the number of registers to save when the link register was only clobbered implicitly by calls. The issue is easily fixed by adjusting the condition to match elsewhere in the same functi

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 30/11/2021 16:54, Jakub Jelinek wrote: Why does the GCN plugin or runtime need to know those vars? It needs to know the single array that contains their addresses of course... With older LLVM there were issues with relocations that made it impossible to link the the offload_var_table. This

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples. Why doesn't catch it? Is the variable created only post-IPA? I

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples. Why doe

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-09 Thread Andrew Stubbs
On 02/12/2021 16:43, Jakub Jelinek wrote: On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote: On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-10-12 Thread Andrew Stubbs
On 12/10/2022 15:29, Tobias Burnus wrote: On 29.09.22 18:24, Andrew Stubbs wrote: On 27/09/2022 14:16, Tobias Burnus wrote: Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I like to have some guidance

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-14 Thread Andrew Stubbs
On 14/10/2022 08:07, Richard Biener wrote: On Tue, 11 Oct 2022, Richard Sandiford wrote: Richard Biener writes: On Mon, 10 Oct 2022, Andrew Stubbs wrote: On 10/10/2022 12:03, Richard Biener wrote: The following picks up the prototype by Ju-Zhe Zhong for vectorizing first order recurrences

[PATCH] libgomp: fix hang on fatal error

2022-10-14 Thread Andrew Stubbs
This patch fixes a problem in which fatal errors inside mutex-locked regions (i.e. basically anything in the plugin) will cause it to hang up trying to take the lock to clean everything up. Using abort() instead of exit(1) bypasses the atexit handlers and solves the problem. OK for mainline?

[OG12 commit] amdgcn, libgomp: USM allocation update

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The patch changes the internal memory allocation method such that memory is allocated in the regular heap and then marked as "coarse-grained

[OG12 commit] amdgcn: disallow USM on gfx908

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The GFX908 (MI100) devices only partially support the Unified Shared Memory model that we have, and only then with additional kernel boot p

[OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-24 Thread Andrew Stubbs
I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this. the libgomp.oacc-c-c++-common/private-variables.c testcase fails to compile due to an ICE. The OpenACC worker broadcasting

Re: [OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-27 Thread Andrew Stubbs
On 24/10/2022 19:06, Richard Biener wrote: Am 24.10.2022 um 18:51 schrieb Andrew Stubbs : I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this. the libgomp.oac

[committed] amdgcn: Silence unused parameter warning

2022-10-31 Thread Andrew Stubbs
A function parameter was left over from a previous draft of my multiple-vector-length patch. This patch silences the harmless warning. Andrewamdgcn: Silence unused parameter warning gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Set base_type a

[committed] amdgcn: add fmin/fmax patterns

2022-10-31 Thread Andrew Stubbs
This patch adds patterns for the fmin and fmax operators, for scalars, vectors, and vector reductions. The compiler uses smin and smax for most floating-point optimizations, etc., but not where the user calls fmin/fmax explicitly. On amdgcn the hardware min/max instructions are already IEEE c

[committed] amdgcn: multi-size vector reductions

2022-10-31 Thread Andrew Stubbs
My recent patch to add additional vector lengths didn't address the vector reductions yet. This patch adds the missing support. Shorter vectors use fewer reduction steps, and the means to extract the final value has been adjusted. Lacking from this is any useful costs, so for loops the vect p

Re: [PATCH] amdgcn: Fix instruction generation for exp2 and log2 operations

2022-11-03 Thread Andrew Stubbs
On 03/11/2022 17:47, Kwok Cheung Yeung wrote: Hello This patch fixes a bug introduced in a previous patch adding support for generating native instructions for the exp2 and log2 patterns. The problem is that the name of the instruction implementing the exp2 operation is v_exp (and not v_exp2)

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-30 Thread Andrew Stubbs
On 26/08/2022 12:04, Jakub Jelinek wrote: gcc/ChangeLog: * doc/tm.texi: Regenerate. * omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero vecsize. (simd_clone_adjust_argument_types): Likewise. * target.def (compute_vecsize_and_simdlen): Document

Re: [PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-30 Thread Andrew Stubbs
On 09/08/2022 14:23, Andrew Stubbs wrote: Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support c

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-31 Thread Andrew Stubbs
On 31/08/2022 09:29, Jakub Jelinek wrote: On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote: --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node) veclen = node->simdclone->vecsize_int; else

Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations

2022-09-09 Thread Andrew Stubbs
On 08/09/2022 21:38, Kwok Cheung Yeung wrote: Hello This patch adds support for some additional floating-point operations, in scalar and vector modes, which are natively supported by the AMD GCN instruction set, but haven't been implemented in GCC yet. With the exception of frexp, these imple

Re: GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)

2022-09-09 Thread Andrew Stubbs
On 09/09/2022 13:20, Tobias Burnus wrote: However, the pre-existing 'sqrt' problem still is real. It also applies to reverse sqrt ("v_rsq"), but that's for whatever reason not used for GCN. This patch now adds a commandline flag - off by default - to choose whether this behavior is wanted. I d

Re: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)

2022-04-28 Thread Andrew Stubbs
On 06/04/2022 11:02, Thomas Schwinge wrote: Hi! On 2021-01-14T15:50:23+0100, I wrote: I'm raising here an issue with HSA libgomp plugin code changes from a while ago. While HSA is now no longer relevant for GCC master branch, the same code has also been copied into the GCN libgomp plugin. He

Re: [Patch] gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

2022-05-16 Thread Andrew Stubbs
On 16/05/2022 11:28, Tobias Burnus wrote: While 'vendor' and 'kind' is well defined, 'arch' and 'isa' isn't. When looking at an 'metadirective' testcase (which oddly uses 'arch(amd)'), I noticed that LLVM uses 'arch(amdgcn)' while we use 'gcn', cf. e.g. 'clang/lib/Headers/openmp_wrappers/math.h'

Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-19 Thread Andrew Stubbs
On 19/05/2022 17:00, Jakub Jelinek wrote: Without requires dynamic_allocators, there are various extra restrictions imposed: 1) omp_init_allocator/omp_destroy_allocator may not be called (except for implicit calls to it from uses_allocators) in a target region I interpreted that more like "

[committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0.1 is now the oldest version not known to have compatibility issues. The patch removes all the obsolete feature

[committed] amdgcn: Add gfx90a support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to add support for gfx90a AMD GPU devices. The patch updates all the places that have architecture/ISA specific code, tidies up the ISA naming and handling in the backend, and adds a new multilib. This is just lightly tested at this point, but there are no known issu

Re: [Patch][AMD GCN][OpenMP] Add gcc/config/gcn/t-omp-device for OpenMP declare variant kind/arch/isa

2019-11-04 Thread Andrew Stubbs
On 04/11/2019 15:37, Jakub Jelinek wrote: My preference would be that arch on amdgcn is something like amdgcn or gcn. I hope the general distinction between arch and isa will be something that will be discussed next Tuesday on the language committee, so hopefully we'll know more afterwards and ca

[PATCH 1/7 libgomp,nvptx] Move generic libgomp files from nvptx to accel

2019-11-12 Thread Andrew Stubbs
shared with the GCN port, thus preventing much of the duplication. OK to commit? Thanks Andrew 2019-11-12 Andrew Stubbs libgomp/ * configure.tgt (nvptx*-*-*): Add "accel" directory. * config/nvptx/libgomp-plugin.c: Move ... * config/accel/libgomp-plugin.c: ..

[PATCH 0/7 libgomp,amdgcn] AMD GCN Offloading Support

2019-11-12 Thread Andrew Stubbs
to forward-port. Otherwise that will have to wait for GCC 11. Andrew Andrew Stubbs (7): Move generic libgomp files from nvptx to accel GCN mkoffload Add device number to GOMP_OFFLOAD_openacc_async_construct GCN libgomp port Optimize GCN OpenMP malloc performance Use a single worker for Ope

[PATCH 2/7 amdgcn] GCN mkoffload

2019-11-12 Thread Andrew Stubbs
This patch adds the mkoffload tool to the amdgcn backend. It's similar, but not quite the same as that on the openacc-gcc-9-branch. I will commit this patch when the others in this series are approved. Andrew 2019-11-12 Andrew Stubbs gcc/ * config/gcn/mkoffload.c: New

[PATCH 3/7 libgomp,nvptx] Add device number to GOMP_OFFLOAD_openacc_async_construct

2019-11-12 Thread Andrew Stubbs
e the queue is intended, so this simply provides that information to the queue constructor. OK to commit? Thanks Andrew 2019-11-12 Andrew Stubbs libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_construct): Add int parameter. * oacc-as

[PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-11-12 Thread Andrew Stubbs
new target-specific symbols to be added to libgomp. I couldn't find an existing way to do this without adding a new top-level file also, to there's an empty placeholder also. (The OG9 branch has this symbol in libgcc, but that seems wrong.) OK to commit? Thanks Andrew 2019-11-12

[PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin

2019-11-12 Thread Andrew Stubbs
This patch contributes the GCN libgomp plugin, with the various configure and make bits to go with it. This implementation is a much-cleaned-up version of the one present on the openacc-gcc-9-branch. OK to commit? Thanks Andrew 2019-11-12 Andrew Stubbs libgomp/ * plugin

[PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Andrew Stubbs
l search and replace. Dummy pass-through definitions are provided for other targets. OK to commit? Thanks Andrew 2019-11-12 Andrew Stubbs libgomp/ * config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena and use team_malloc variants. (gomp_gcn_exit_ke

[PATCH 6/7 amdgcn] Use a single worker for OpenACC on AMD GCN

2019-11-12 Thread Andrew Stubbs
This patch prevents the compiler using multiple workers in a gang. This should be reverted when worker support is committed. I will commit this with the reset of the series. Andrew 2019-11-12 Andrew Stubbs Julian Brown gcc/ * config/gcn/gcn.c

Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-11-12 Thread Andrew Stubbs
On 12/11/2019 13:46, Jakub Jelinek wrote: On Tue, Nov 12, 2019 at 01:29:13PM +, Andrew Stubbs wrote: 2019-11-12 Andrew Stubbs include/ * gomp-constants.h (GOMP_DEVICE_GCN): Define. (GOMP_VERSION_GCN): Define. Perhaps this could be 0, but not a big deal. OG9

Re: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin

2019-11-12 Thread Andrew Stubbs
On 12/11/2019 14:01, Jakub Jelinek wrote: On Tue, Nov 12, 2019 at 01:29:16PM +, Andrew Stubbs wrote: 2019-11-12 Andrew Stubbs libgomp/ * plugin/Makefrag.am: Add amdgcn plugin support. * plugin/configfrag.ac: Likewise. * plugin/plugin-gcn.c: New file

Re: [PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Andrew Stubbs
e that the loop would vectorize inline, but I don't think it was doing so anyway. I need to look at that, but how is this, for now? Andrew Optimize GCN OpenMP malloc performance 2019-11-12 Andrew Stubbs libgomp/ * config/gcn/team.c (gomp_gcn_enter_kernel): Set up t

[committed, amdgcn] Move gcn-run heap into GPU memory

2019-11-13 Thread Andrew Stubbs
fload kernels will experience, and therefore make standalone testing more meaningful. Andrew Move gcn-run heap into GPU memory. 2019-11-13 Andrew Stubbs gcc/ * config/gcn/gcn-run.c (heap_region): New global variable. (struct hsa_runtime_fn_info): Add hsa_memory_assign_age

Re: [PATCH 0/7 libgomp,amdgcn] AMD GCN Offloading Support

2019-11-13 Thread Andrew Stubbs
These patches are now all committed. I've adjusted the changelogs to list all the proper authors (apologies if I missed anyone). Thank you for the quick reviews, Jakub. :-) Andrew On 12/11/2019 13:29, Andrew Stubbs wrote: Hi all, This patch series contributes initial OpenMP and Op

Re: vect: Allow vconds between different vector sizes

2020-11-11 Thread Andrew Stubbs
On 11/11/2020 11:16, Richard Sandiford wrote: [Andrew: cc:ing you in case this affects/helps GCN.] The vcond code requires the compared vectors and the selected vectors to have both the same size and the same number of elements as each other. But the operation makes logical sense even for diffe

Re: [PATCH] [amdgcn] Remove dependency on stdint.h in libgcc

2020-01-10 Thread Andrew Stubbs
On 10/01/2020 14:21, Kwok Cheung Yeung wrote: The patch for sub-word atomics support added an include of stdint.h for the definition of uintptr_h, but this can result in GCC compilation failing if the stdint.h header has not been installed (from newlib in the case of AMD GCN). I have fixed th

Re: [patch, openacc] Fix ICE verifying gimple

2020-01-16 Thread Andrew Stubbs
Ping. On 22/11/2019 11:06, Andrew Stubbs wrote: This test case causes an ICE (reformatted for email):   void test(int k)   {     unsigned int x = 1;   #pragma acc parallel loop async(x)     for (int i = 0; i < k; i++) { }   }   t.c: In function 'test':   t.c:4:9: e

Re: [committed, amdgcn] Allow constants in vector extends and truncates

2020-01-16 Thread Andrew Stubbs
On 19/12/2019 17:39, Richard Sandiford wrote: Andrew Stubbs writes: This patch changes the operand predicates such that vector constants are permitted during compilation. This prevents ICEs caused by the compiler trying to emit such instructions without checking. That sounds like a target

[committed, amdgcn/openacc] Rename acc_device_gcn to acc_device_radeon

2020-01-17 Thread Andrew Stubbs
y existing code will use, if anything, so we ought to be compatible. There's no official release using the "wrong" name, so I don't believe we need to retain that name for any reason. I've tested that there are no regressions. Andrew Rename acc_device_gcn to acc_devic

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Andrew Stubbs
Hi Frederik, On 20/01/2020 06:57, Harwath, Frederik wrote: Hi, this patch implements a runtime ISA check for amdgcn offloading. The check verifies that the ISA of the GPU to which we try to offload matches the ISA for which the code to be offloaded has been compiled. If it detects a mismatch, it

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Andrew Stubbs
On 20/01/2020 10:08, Jakub Jelinek wrote: On Mon, Jan 20, 2020 at 10:00:09AM +, Andrew Stubbs wrote: @@ -396,6 +396,88 @@ struct gcn_image_desc struct global_var_info *global_variables; }; +/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we + su

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Andrew Stubbs
On 20/01/2020 10:42, Jakub Jelinek wrote: :( Another option would be to build offloading code by GCN multiple times, once for each incompatible ISA the user is asking for, so that one can have then binaries that will work on different hw. Because e.g. with the distro vendor hat, it is hard to gue

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Andrew Stubbs
On 20/01/2020 11:07, Jakub Jelinek wrote: On Mon, Jan 20, 2020 at 11:00:58AM +, Andrew Stubbs wrote: Indeed, fat binaries would be a good solution. Presumably it's possible, but I'm not sure how we'd go about getting the offload mechanism to launch the backend multiple ti

[committed, amdgcn] Update OpenACC testcases for amdgcn

2020-01-20 Thread Andrew Stubbs
tests for amdgcn 2020-01-20 Andrew Stubbs libgomp/ * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Skip test on gcn. * testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c (main): Adjust test dimensions for amdgcn. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c (main): A

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Andrew Stubbs
On 20/01/2020 16:42, Harwath, Frederik wrote: Hi Andrew, Thanks for the review! I have attached a revised patch containing the changes that you suggested. On 20.01.20 11:00, Andrew Stubbs wrote: On 20/01/2020 06:57, Harwath, Frederik wrote: Is it ok to commit this patch to the master branch

[committed, libgomp,amdgcn] Fix plugin-gcn.c bug

2020-01-23 Thread Andrew Stubbs
theory it could read attempt to read any unhandled argument as the thread limit. Andrew Fix libgomp plugin-gcn bug 2020-01-23 Andrew Stubbs libgomp/ * plugin/plugin-gcn.c (parse_target_attributes): Use correct mask for the device id. diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plu

[committed, amdgcn] Fix ICE on unsupported FP comparison

2020-01-24 Thread Andrew Stubbs
ld have been rejected, but the predicates were too loose. Andrew Fix ICE on unsupported FP comparison 2020-01-24 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (vec_cmpdi): Use gcn_fp_compare_operator. (vec_cmpudi): Use gcn_compare_operator. (vec_cmpv64qidi): Use gcn_compare_op

Re: [Patch] [libgomp, build] Skip plugin-{gcn,hsa} for (-m)x32 (PR bootstrap/93409)

2020-01-27 Thread Andrew Stubbs
On 24/01/2020 14:59, Tobias Burnus wrote: As reported in PR93409, the build of libgomp/plugin/plugin-gcn.c fails with a bunch of error messages when building with --with-multilib-list=m32,m64,mx32 The reason is that the GCN plugin assumes 64bit pointers. As with HSA, the build is only enabled

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-28 Thread Andrew Stubbs
On 28/01/2020 14:55, Harwath, Frederik wrote: Hi, this patch adds full support for the OpenACC 2.6 acc_get_property and acc_get_property_string functions to the libgomp GCN plugin. This replaces the existing stub in libgomp/plugin-gcn.c. Andrew: The value returned for acc_property_memory ("size

Re: [committed, amdgcn] Fix ICE on unsupported FP comparison

2020-01-28 Thread Andrew Stubbs
On 24/01/2020 14:58, Andrew Stubbs wrote: I've committed this patch to fix an ICE building the gcc.dg/vect/fast-math-pr55281.c testcase. Oops, I got that crossed. This was the fix for gcc.dg/pr50310-2.c. The fast-math-pr55281.c fix will be posted shortly. The problem was that the co

[PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-28 Thread Andrew Stubbs
tand why I only see this issue on amdgcn, but it might be because the pointer in question is in a MASK_LOAD which is perhaps not that commonly used? I've tested this on amdgcn, and done a full bootstrap and test on x86_64 also. OK to commit? Thanks Andrew Fix fast-math-pr55281.c ICE.

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Andrew Stubbs
On 29/01/2020 09:52, Harwath, Frederik wrote: @@ -1513,6 +1518,23 @@ init_hsa_context (void) GOMP_PLUGIN_error ("Failed to list all HSA runtime agents"); } + uint16_t minor, major; + status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, &minor); + if (status

Re: [PR93488] [OpenACC] ICE in type-cast 'async', 'wait' clauses

2020-01-29 Thread Andrew Stubbs
On 29/01/2020 12:30, Thomas Schwinge wrote: Hi Andrew! On 2019-11-22T11:06:14+, Andrew Stubbs wrote: This test case causes an ICE (reformatted for email): void test(int k) { unsigned int x = 1; #pragma acc parallel loop async(x) for (int i = 0; i < k

Re: [Patch] GCN – call assembler with -mattr=-code-object-v3 (PR93409)

2020-01-29 Thread Andrew Stubbs
On 29/01/2020 12:53, Tobias Burnus wrote: Cf. PR93409 comments 4 and later. The comments 1–3 of the PR are covered by patch https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01663.html (skip building libgomp's HSA/GCN plugin with -mx32). For AMDGCN, the LLVM assembler is used. While for LLVM 7+8,

Re: [Patch] GCN – call assembler with -mattr=-code-object-v3 (PR93409)

2020-01-29 Thread Andrew Stubbs
On 29/01/2020 15:40, Tobias Burnus wrote: Hi Andrew, On 1/29/20 2:01 PM, Andrew Stubbs wrote: On 29/01/2020 12:53, Tobias Burnus wrote: With LLVM 9, the old variant is only accepted when also passing "-mattr=-code-object-v3" to the compiler; that's a"-" after the &qu

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Andrew Stubbs
On 29/01/2020 17:44, Thomas Schwinge wrote: @@ -1513,6 +1518,23 @@ init_hsa_context (void) + size_t len = sizeof hsa_context.driver_version_s; + int printed = snprintf (hsa_context.driver_version_s, len, + "HSA Runtime %hu.%hu", (unsigned short int)major, +

Re: [Patch] [libgomp, build] Skip plugin-{gcn,hsa} for (-m)x32 (PR bootstrap/93409)

2020-01-30 Thread Andrew Stubbs
On 30/01/2020 09:20, Jakub Jelinek wrote: On Fri, Jan 24, 2020 at 03:59:28PM +0100, Tobias Burnus wrote: As reported in PR93409, the build of libgomp/plugin/plugin-gcn.c fails with a bunch of error messages when building with --with-multilib-list=m32,m64,mx32 The reason is that the GCN plugin a

Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-30 Thread Andrew Stubbs
On 29/01/2020 08:24, Richard Biener wrote: On Tue, Jan 28, 2020 at 5:53 PM Andrew Stubbs wrote: This patch fixes an ICE compiling fast-math-pr55281.c for amdgcn. The problem is that an "iv" is created in which both base and step are pointer types, How did you get a POINTER

Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-30 Thread Andrew Stubbs
On 30/01/2020 13:49, Richard Biener wrote: On Thu, Jan 30, 2020 at 2:04 PM Bin.Cheng wrote: On Thu, Jan 30, 2020 at 8:53 PM Andrew Stubbs wrote: On 29/01/2020 08:24, Richard Biener wrote: On Tue, Jan 28, 2020 at 5:53 PM Andrew Stubbs wrote: This patch fixes an ICE compiling fast-math

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-30 Thread Andrew Stubbs
On 30/01/2020 16:08, Thomas Schwinge wrote: Hi! Andrew and Frederik, thanks for your emails reminding/educating me about 'snprintf' as well as this HSA fixed-size buffer API. There doesn't happen to be something available in the HSA API available so that we could use 'sizeof [something]' instea

[committed, amdgcn] Add LTGT support

2020-01-30 Thread Andrew Stubbs
fixes an ICE in testcase gcc.dg/pr81228.c. Andrew Add LTGT operator support for amdgcn Fixes ICE in testcase gcc.dg/pr81228.c 2020-01-30 Andrew Stubbs gcc/ * config/gcn/gcn.c (print_operand): Handle LTGT. * config/gcn/predicates.md (gcn_fp_compare_operator): Allow ltgt. diff --git a/gcc/con

Re: [PR93488] [OpenACC] ICE in type-cast 'async', 'wait' clauses

2020-01-30 Thread Andrew Stubbs
e PR, but I've not verified whether "that covers *all* relevant code paths". This should then be backported to all GCC release branches; I can easily test the backports for you, if you're not already set up to do such testing. How's this? Andrew Normalize GOACC_parallel_k

[committed, amdgcn] Zero-initialise masked load destinations

2020-01-31 Thread Andrew Stubbs
ailure in testcase gfortran.dg/assumed_rank_1.f90. 2020-01-30 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md (gather_exec): Move contents ... (mask_gather_load): ... here, and zero-initialize the destination. (maskloaddi): Zero-initialize the destination. * config/gcn/gcn.c: diff --git a/gc

Re: [PATCH] [amdgcn] Scale number of threads/workers with VGPR usage

2020-01-31 Thread Andrew Stubbs
On 31/01/2020 13:56, Kwok Cheung Yeung wrote: The GCN architecture has 4 SIMD units per compute unit, with 256 VGPRs per SIMD unit. OpenMP threads or OpenACC workers must be distributed across the SIMD units, with each thread/worker fitting entirely within a single SIMD unit. VGPRs are shared b

Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE

2020-01-31 Thread Andrew Stubbs
On 31/01/2020 08:09, Richard Biener wrote: On Thu, Jan 30, 2020 at 3:09 PM Andrew Stubbs wrote: How about this? I've only tested it on the one testcase, so far, but it works for that. OK to commit (following a full test)? OK. X86_64 bootstrap and test showed no issues. Nor amdgcn

<    1   2   3   4   5   6   7   8   9   10   >