[committed] amdgcn: silence warning

2023-10-06 Thread Andrew Stubbs
I've just committed this simple patch to silence an enum warning. Andrew. amdgcn: silence warning gcc/ChangeLog: * config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning. diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index f6cff659703..ef3b6472a52 100644 --- a/g

[committed] amdgcn: switch mov insns to compact syntax

2023-10-06 Thread Andrew Stubbs
I've just committed this patch. It should have no functional changes except to make it easier to add new alternatives into the alternative-heavy move instructions. Andrew. amdgcn: switch mov insns to compact syntax The move instructions typically have many alternatives (and I'm about to add more

Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread Andrew Stubbs
On 15/09/2023 10:16, Juzhe-Zhong wrote: This test failed in RISC-V: FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4 FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4 Because this loop: /* SLP with

Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-09 Thread Andrew Stubbs
On 07/10/2023 02:04, juzhe.zh...@rivai.ai wrote: Thanks for reporting it. I think we may need to change it into: + /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SL

Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-10 Thread Andrew Stubbs
On 10/10/2023 02:39, Juzhe-Zhong wrote: Here is the reference comparing dump IR between ARM SVE and RVV. https://godbolt.org/z/zqess8Gss We can see RVV has one more dump IR: optimized: basic block part vectorized using 128 byte vectors since RVV has 1024 bit vectors. The codegen is reasonable

[committed] amdgcn: deprecate Fiji device and multilib

2023-10-19 Thread Andrew Stubbs
The build has been failing for the last few days because LLVM removed support for the HSACOv3 binary metadata format, which we were still using for the Fiji multilib. The LLVM commit has now been reverted (thank you Pierre van Houtryve), but it's only a temporary reprieve. This patch removes

[PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-19 Thread Andrew Stubbs
OK to commit? Andrew. gcc-14: mark amdgcn fiji deprecated diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index c817dde4..91ab8132 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -178,6 +178,16 @@ a work-in-progress. +AMD Radeon (GCN) + + +

[PATCH] Add COMPLEX_VECTOR_INT modes

2023-05-26 Thread Andrew Stubbs
Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and there are no vector equivalents of that type. Therefore, this patch adds minimal support for "complex vector int" modes.
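
For context, DIVMOD produces a quotient/remainder pair, which the middle end packs into a complex-int value. A minimal C sketch of the scalar idea (the type and function names below are purely illustrative, not the libfunc ABI the patch defines):

    /* Illustrative quotient/remainder pair, the scalar analogue of what a
       per-lane vector DIVMOD libfunc has to return.  */
    typedef struct { long long quot; long long rem; } divmod64_t;

    static divmod64_t
    divmod64 (long long a, long long b)
    {
      divmod64_t r;
      r.quot = a / b;
      r.rem = a % b;
      return r;
    }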

Re: [patch] amdgcn: Change -m(no-)xnack to -mxnack=(on,off,any)

2023-05-26 Thread Andrew Stubbs
OK. Andrew On 26/05/2023 15:58, Tobias Burnus wrote: (Update the syntax of the amdgcn commandline option in anticipation of later patches; while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for PR100208), -mxnack (contrary to -msram-ecc) is currently mostly a stub for later pat

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Andrew Stubbs
On 30/05/2023 07:26, Richard Biener wrote: On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote: Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and

Re: [Patch] libgomp: plugin-gcn - support 'unified_address'

2023-06-06 Thread Andrew Stubbs
On 06/06/2023 16:33, Tobias Burnus wrote: Andrew: Does the GCN change look okay to you? This patch permits using GCN devices with 'omp requires unified_address' which in principle works already, except that the requirement handling did disable it. (It also updates libgomp.texi for this chan

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
On 07/06/2023 20:42, Richard Sandiford wrote: I don't know if this helps (probably not), but we have a similar situation on AArch64: a 64-bit mode like V8QI can be doubled to a 128-bit vector or to a pair of 64-bit vectors. We used V16QI for the former and "V2x8QI" for the latter. V2x8QI is for

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
On 09/06/2023 10:02, Richard Sandiford wrote: Andrew Stubbs writes: On 07/06/2023 20:42, Richard Sandiford wrote: I don't know if this helps (probably not), but we have a similar situation on AArch64: a 64-bit mode like V8QI can be doubled to a 128-bit vector or to a pair of 64-bit ve

[PATCH] vect: Vectorize via libfuncs

2023-06-13 Thread Andrew Stubbs
This patch allows vectorization when operators are available as libfuncs, rather than only as insns. This will be useful for amdgcn where we plan to vectorize loops that contain integer division or modulus, but don't want to generate inline instructions for the division algorithm every time.
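
As a rough illustration of the motivation, a loop like the following (an assumed example, not taken from the patch) contains integer division and modulus that amdgcn would rather handle with a call to a vectorized library routine than with an inline division sequence per lane:

    void
    bucket (int *restrict out, const int *restrict in, int n, int d)
    {
      for (int i = 0; i < n; i++)
        out[i] = in[i] / d + in[i] % d;   /* candidates for a vector libcall */
    }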

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Andrew Stubbs
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked epilog for AVX512 style masks which single themselves out by representing each lane with a single bit and by using integer modes for the mask (both is much like GCN). AVX512 is also s

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked epilog for AVX512 style masks which single themselves out by representing each lane

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 14:34, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 15:00, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 14:34, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 12:06, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 15/06/2023 10:58, Richard

[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators

2022-07-07 Thread Andrew Stubbs
ement memory that's both high-bandwidth and pinned anyway). Patches 15 to 17 are new work. I can probably approve these myself, but they can't be committed until the rest of the series is approved. Andrew Andrew Stubbs (11): libgomp, nvptx: low-latency memory allocator libgomp: pinned m

[PATCH 02/17] libgomp: pinned memory

2022-07-07 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN
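
A minimal sketch of the allocation scheme described above (reserve pages with mmap, pin them with mlock, unpin and unmap on free); this is an illustration only, not libgomp's actual MEMSPACE_ALLOC/MEMSPACE_FREE code:

    #include <stddef.h>
    #include <sys/mman.h>

    static void *
    pinned_alloc (size_t size)
    {
      void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return NULL;
      if (mlock (p, size) != 0)   /* pin the pages */
        {
          munmap (p, size);
          return NULL;
        }
      return p;
    }

    static void
    pinned_free (void *p, size_t size)
    {
      munlock (p, size);          /* safe to unpin: the region is page-backed */
      munmap (p, size);
    }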

[PATCH 01/17] libgomp, nvptx: low-latency memory allocator

2022-07-07 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using t
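
A hedged usage sketch of the facility being added, using the standard OpenMP 5.0 allocator names; what happens when the low-latency heap is exhausted depends on the allocator's fallback trait, so the NULL check is only illustrative:

    #include <omp.h>

    void
    use_low_lat (void)
    {
    #pragma omp target
    #pragma omp parallel
      {
        /* Request storage from the low-latency (.shared) memory space.  */
        int *scratch = (int *) omp_alloc (64 * sizeof (int),
                                          omp_low_lat_mem_alloc);
        if (scratch)
          {
            scratch[omp_get_thread_num () % 64] = 1;
            omp_free (scratch, omp_low_lat_mem_alloc);
          }
      }
    }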

[PATCH 04/17] openmp, nvptx: low-lat memory access traits

2022-07-07 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implicitly implies the "pteam" trait. libgomp/ChangeLog:
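
Expressed with the standard allocator-traits API, the implied behaviour is roughly equivalent to the following sketch (not libgomp's internal representation):

    #include <omp.h>

    omp_allocator_handle_t
    make_team_local_allocator (void)
    {
      /* Storage reachable only from the allocating parallel team.  */
      omp_alloctrait_t traits[] = { { omp_atk_access, omp_atv_pteam } };
      return omp_init_allocator (omp_low_lat_mem_space, 1, traits);
    }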

[PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-07-07 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned
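
A sketch of the equivalence described above: ompx_pinned_mem_alloc (the extension name proposed by the patch) behaves like a handle built with the pinned trait, roughly:

    #include <omp.h>

    omp_allocator_handle_t
    make_pinned_allocator (void)
    {
      omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
      return omp_init_allocator (omp_default_mem_space, 1, traits);
    }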

[PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-07-07 Thread Andrew Stubbs
. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/g++.dg/gomp/usm-1

[PATCH 06/17] openmp: Add -foffload-memory

2022-07-07 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++

[PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc

2022-07-07 Thread Andrew Stubbs
This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can

[PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
Currently we only handle the omp allocate directive when it is associated with an allocate statement. This statement results in malloc and free calls. The malloc calls are easy to get to as they are in the same block as the allocate directive. But the free calls come in a separate cleanup block. To

[PATCH 07/17] openmp: allow requires unified_shared_memory

2022-07-07 Thread Andrew Stubbs
This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeL
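
A minimal sketch of the user-visible feature the front end now accepts (an assumed example, not one of the patch's testcases):

    #pragma omp requires unified_shared_memory

    void
    scale (double *x, int n)   /* x may be ordinary host memory */
    {
    #pragma omp target teams distribute parallel for
      for (int i = 0; i < n; i++)
        x[i] *= 2.0;
    }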

[PATCH 11/17] Translate allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR. (gfc_trans_omp_allocate): New function. (gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE. gcc/ChangeLog: * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_AL

[PATCH 14/17] Lower allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
This patch looks for malloc/free calls that were generated by an allocate statement that is associated with an allocate directive and replaces them with GOMP_alloc and GOMP_free. gcc/ChangeLog: * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR. (scan_omp_allocate): New.

[PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-07 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlo

[PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
gcc/ChangeLog: * doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE. * gimple-pretty-print.cc (dump_gimple_omp_allocate): New function. (pp_gimple_stmt_1): Call it. * gimple.cc (gimple_build_omp_allocate): New function. * gimple.def (GIMPLE_OMP_ALLOCATE): New no

[PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)

2022-07-07 Thread Andrew Stubbs
Currently we only make use of this directive when it is associated with an allocate statement. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE. (show_code_node): Likewise. * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.

[PATCH 17/17] amdgcn: libgomp plugin USM implementation

2022-07-07 Thread Andrew Stubbs
Implement the Unified Shared Memory API calls in the GCN plugin. The allocate and free are pretty straight-forward because all "target" memory allocations are compatible with USM, on the right hardware. However, there's no known way to check what memory region was used, after the fact, so we use

[PATCH 15/17] amdgcn: Support XNACK mode

2022-07-07 Thread Andrew Stubbs
The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instru

[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2022-07-07 Thread Andrew Stubbs
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure that the information is p

Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-07 Thread Andrew Stubbs
On 07/07/2022 12:54, Tobias Burnus wrote: Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up.  The option is intended to provide a performance boost to certain offload

Re: [PATCH 08/17] openmp: -foffload-memory=pinned

2022-07-08 Thread Andrew Stubbs
On 08/07/2022 10:00, Tobias Burnus wrote: On 08.07.22 00:18, Andrew Stubbs wrote: Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. No, I don't think so; that environment variable ne

[PATCH] openmp: fix max_vf setting for amdgcn offloading

2022-07-12 Thread Andrew Stubbs
This patch ensures that the maximum vectorization factor used to set the "safelen" attribute on "omp simd" constructs is suitable for all the configured offload devices. Right now it makes the proper adjustment for NVPTX, but otherwise just uses a value suitable for the host system (always x86
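
For reference, safelen bounds how many iterations of an "omp simd" loop may be in flight at once, so the implicit value must be safe for every configured offload device; a hand-written equivalent looks like this (the value 64 is illustrative):

    void
    saxpy (float *restrict y, const float *restrict x, int n)
    {
    #pragma omp simd safelen(64)
      for (int i = 0; i < n; i++)
        y[i] += 2.0f * x[i];
    }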

[committed] amdgcn: 64-bit not

2022-07-29 Thread Andrew Stubbs
I've committed this patch to enable DImode one's-complement on amdgcn. The hardware doesn't have 64-bit not, and this isn't needed by expand which is happy to use two SImode operations, but the vectorizer isn't so clever. Vector condition masks are DImode on amdgcn, so this has been causing lo
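
A worked C illustration of the expand strategy mentioned above: a DImode one's-complement performed as two SImode operations:

    unsigned long long
    not64 (unsigned long long x)
    {
      unsigned int lo = (unsigned int) x;          /* low  SImode half */
      unsigned int hi = (unsigned int) (x >> 32);  /* high SImode half */
      lo = ~lo;
      hi = ~hi;
      return ((unsigned long long) hi << 32) | lo;
    }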

[committed] amdgcn: 64-bit vector shifts

2022-07-29 Thread Andrew Stubbs
I've committed this patch to implement V64DImode vector-vector and vector-scalar shifts. In particular, these are used by the SIMD "inbranch" clones that I'm working on right now, but it's an omission that ought to have been fixed anyway. Andrew. amdgcn: 64-bit vector shifts Enable 64-bit vec

[PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
This patch adjusts the generation of SIMD "inbranch" clones that use integer masks to ensure that it vectorizes on amdgcn. The problem was only that an amdgcn mask is DImode and the shift amount was SImode, and the difference causes vectorization to fail. OK for mainline? Andrew. openmp-simd-c
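
The per-lane mask test at issue, sketched in C with illustrative types; the fix gives the shift amount the same width as the mask so the vectorizer is not tripped up by the mismatch:

    static inline int
    lane_active (unsigned long long mask, unsigned long long lane)
    {
      return (int) ((mask >> lane) & 1);   /* shift amount matches mask width */
    }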

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
On 29/07/2022 16:59, Jakub Jelinek wrote: Doing the fold_convert seems to be a wasted effort to me. Can't this be done conditional on whether some change is needed at all and just using gimple_build_assign with NOP_EXPR, so something like: I'm just not familiar enough with this stuff to run fol

[committed] amdgcn: Vector procedure call ABI

2022-08-09 Thread Andrew Stubbs
I've committed this patch for amdgcn. This changes the procedure calling ABI such that vector arguments are passed in vector registers, rather than on the stack as before. The ABI for scalar functions is the same for arguments, but the return value has now moved to a vector register; keeping

[PATCH 0/3] OpenMP SIMD routines

2022-08-09 Thread Andrew Stubbs
ure has backend support for the clones at this time. OK for mainline (patches 1 & 3)? Thanks Andrew Andrew Stubbs (3): omp-simd-clone: Allow fixed-lane vectors amdgcn: OpenMP SIMD routine support vect: inbranch SIMD clones gcc/config/gcn/gcn.cc | 63 gcc

[PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-09 Thread Andrew Stubbs
The vecsize_int/vecsize_float has an assumption that all arguments will use the same bitsize, and vary the number of lanes according to the element size, but this is inappropriate on targets where the number of lanes is fixed and the bitsize varies (i.e. amdgcn). With this change the vecsize can

[PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-09 Thread Andrew Stubbs
Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support calling them yet. gcc/ChangeLog: * config/gcn/gcn.cc (g
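
A sketch of the user-visible feature (an assumed example): a scalar function annotated for SIMD cloning and a loop that may call one of its vector variants:

    #pragma omp declare simd
    float scale (float x) { return 2.0f * x; }

    void
    apply (float *a, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        a[i] = scale (a[i]);   /* may be replaced by a call to a clone */
    }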

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs
There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see). This patch adds support for a subset of possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be n
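
The situation the "inbranch" (masked) clones exist for, sketched below with an assumed example; the call sits under a condition, so each lane must be enabled or disabled by a mask:

    #pragma omp declare simd inbranch
    float fixup (float x) { return -x; }

    void
    apply_fixup (float *a, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        if (a[i] < 0.0f)
          a[i] = fixup (a[i]);   /* executed under a per-lane mask */
    }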

[PATCH v2 2/3] openmp, nvptx: low-lat memory access traits

2023-08-02 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implicitly implies the "pteam" trait. libgomp/ChangeLog:

[PATCH v2 0/3] libgomp: OpenMP low-latency omp_alloc

2023-08-02 Thread Andrew Stubbs
ation so both architectures can share the code. Andrew Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocator openmp, nvptx: low-lat memory access traits amdgcn, libgomp: low-latency allocator gcc/config/gcn/gcn-builtins.def | 2 + gcc/config/gc

[PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-08-02 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using t

[PATCH v2 3/3] amdgcn, libgomp: low-latency allocator

2023-08-02 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

[committed] amdgcn: Delete inactive libfuncs

2023-06-19 Thread Andrew Stubbs
There were implementations for HImode division in libgcc, but there were no matching libfuncs defined in the compiler, so the code was inactive (GCC only defines SImode and DImode, by default, and amdgcn only adds TImode explicitly). On trying to activate it I find that the definition of TARG

[committed] amdgcn: minimal V64TImode vector support

2023-06-19 Thread Andrew Stubbs
This patch adds just enough TImode vector support to use them for moving data about. This is primarily for the use of divmodv64di4, which will use TImode to return a pair of DImode values. The TImode vectors have no other operators defined, and there are no hardware instructions to support thi

[committed] amdgcn: add -march=gfx1030 EXPERIMENTAL

2023-10-20 Thread Andrew Stubbs
I've committed this patch that allows building binaries for AMD gfx1030 GPUs. I can't actually test it, however, so somebody else will have to debug it (or wait for me to get my hands on a device). Richi reports that it does not execute correctly, as is. This is an experimental broken feature,

Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-20 Thread Andrew Stubbs
On 19/10/2023 11:07, Tobias Burnus wrote: On 19.10.23 11:49, Andrew Stubbs wrote: OK to commit? (I think as maintainer you don't need approval - but of course comments by others can be helpful; I hope mine are. Additionally, Gerald (CCed) helps with keeping the webpages in good shape (t

[PATCH] vect: Don't set excess bits in uniform masks

2023-10-20 Thread Andrew Stubbs
This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the mask enable extra lanes that were supposed to be unused and are therefore undefined. Richi suggested an alternative approach involving narrower types and then a zero-extend to the actual mask type. This solved the p
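
A worked illustration of the constraint being enforced: an all-ones ("uniform") mask must not have bits set beyond the number of participating lanes, otherwise the excess lanes run on undefined data. The 64-lane width matches amdgcn, but the helper is only a sketch:

    static inline unsigned long long
    uniform_mask (unsigned int nlanes)   /* nlanes <= 64 */
    {
      return nlanes >= 64 ? ~0ULL : (1ULL << nlanes) - 1;
    }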

Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-27 Thread Andrew Stubbs
On 22/10/2023 13:24, Gerald Pfeifer wrote: Hi Andrew, On Fri, 20 Oct 2023, Andrew Stubbs wrote: Additionally, I wonder whether "Fiji" should be changed to "Fiji (gfx803)" in the first line and whether the  "," should be removed in "The ... configuration .

[committed] amdgcn: silence warnings

2023-10-27 Thread Andrew Stubbs
This trivial patch adds the "operands" keyword to the condition in a couple of patterns that cause warnings about "missing" mode specifiers. With the iterators, there were a large number of warnings about these cases that have now been silenced. Andrew. amdgcn: silence warnings The operands re

Re: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL

2023-10-27 Thread Andrew Stubbs
On 20/10/2023 12:51, Andrew Stubbs wrote: I've committed this patch that allows building binaries for AMD gfx1030 GPUs. I can't actually test it, however, so somebody else will have to debug it (or wait for me to get my hands on a device). Richi reports that it does not execute cor

Re: [RFC] vect: disable multiple calls of poly simdclones

2023-11-06 Thread Andrew Stubbs
On 06/11/2023 07:52, Richard Biener wrote: On Fri, 3 Nov 2023, Andre Vieira (lists) wrote: Hi, The current codegen code to support VF's that are multiples of a simdclone simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does not work for non-constant simdclones, so we sh

Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

2023-02-13 Thread Andrew Stubbs
I presume I've been CC'd on this conversation because weird vector architecture problems have happened to me before. :) However, I'm not sure I can help much because AMD GCN does not use BImode vectors at all. This is partly because loading boolean values into a GCN vector would have 31 paddin

Re: -foffload-memory=pinned (was: [PATCH 1/5] openmp: Add -foffload-memory)

2023-02-13 Thread Andrew Stubbs
On 13/02/2023 14:38, Thomas Schwinge wrote: Hi! On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote: From: Andrew Stubbs Add a new option. It will be used in follow-up patches. --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi +@option{-foffload-memory=pinned} forces all host

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-02-14 Thread Andrew Stubbs
On 09/02/2023 20:13, Andrew Jenner wrote: This patch introduces instruction patterns for complex number operations in the GCN machine description. These patterns are cmul, cmul_conj, vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls (cmla_conj and cmls_conj were not found t

Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Andrew Stubbs
On 14/02/2023 12:54, Thomas Schwinge wrote: Hi Andrew! On 2022-01-13T11:13:51+, Andrew Stubbs wrote: Updated patch: this version fixes some missed cases of malloc in the realloc implementation. Right, and as it seems I've run into another issue: a stray 'free'.

[OG12][committed] amdgcn: OpenMP low-latency allocator

2023-02-16 Thread Andrew Stubbs
These patches implement an LDS memory allocator for OpenMP on AMD. 1. 230216-basic-allocator.patch Separate the allocator from NVPTX so the code can be shared. 2. 230216-amd-low-lat.patch Allocate the memory, adjust the default address space, and hook up the allocator. They will need to be

Re: [og12] Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

2023-02-20 Thread Andrew Stubbs
On 17/02/2023 08:12, Thomas Schwinge wrote: Hi Andrew! On 2023-02-16T23:06:44+0100, I wrote: On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches" wrote: The mmap implementation was not optimized for a lot of small allocations, and I can't see that issue changing here That's corre

Re: [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)

2023-02-20 Thread Andrew Stubbs
On 16/02/2023 21:11, Thomas Schwinge wrote: --- /dev/null +++ b/libgomp/basic-allocator.c +#ifndef BASIC_ALLOC_YIELD +#deine BASIC_ALLOC_YIELD +#endif In file included from [...]/libgomp/config/nvptx/allocator.c:49: [...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-23 Thread Andrew Stubbs
On 10/02/2023 09:11, Jakub Jelinek wrote: I've tried to fix the -flto thing and I can't figure out how. The problem seems to be that there are two dump files from the two compiler invocations and it scans the wrong one. Aarch64 has the same problem. Two dumps are because it is in a dg-do run te

[committed][OG12] libgomp: no need to attach USM pointers

2023-02-23 Thread Andrew Stubbs
This patch fixes a bug in which libgomp doesn't know what to do with attached pointers in fortran derived types when using Unified Shared Memory instead of explicit mappings. I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it into the next rebase/repost of the USM patches

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 28/02/2023 23:01, Kwok Cheung Yeung wrote: Hello This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION target hook for the AMD GCN architecture, such that when vectorized, calls to builtin standard math functions such as asinf, exp, pow etc. are converted to calls to the r

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 01/03/2023 10:52, Andre Vieira (lists) wrote: On 01/03/2023 10:01, Andrew Stubbs wrote: > On 28/02/2023 23:01, Kwok Cheung Yeung wrote: >> Hello >> >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> target hook for the AMD GCN a

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-02 Thread Andrew Stubbs
On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows the exec register to be saved in SGPRs to avoid spilling to memory. Tested on GCN3 Fiji gfx803

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-02 Thread Andrew Stubbs
On 02/03/2023 15:07, Kwok Cheung Yeung wrote: Hello I've made the suggested changes. Should I hold off on committing this until GCC 13 has been branched off? No need, amdgcn is not a primary target and this stuff won't affect anyone else. Please go ahead and commit. Andrew

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-06 Thread Andrew Stubbs
On 03/03/2023 17:05, Paul-Antoine Arras wrote: Le 02/03/2023 à 18:18, Andrew Stubbs a écrit : On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Andrew Stubbs
On 08/03/2023 11:06, Tobias Burnus wrote: Next try – this time with both patches. On 08.03.23 12:05, Tobias Burnus wrote: Hi Andrew, attached are two patches related to GCN, one for libgomp.texi documenting an env var and a release-notes update in www docs. OK? Comments? LGTM Andrew

Re: [Patch] gcn/mkoffload.cc: Pass -save-temps on for the hsaco step

2023-03-13 Thread Andrew Stubbs
On 13/03/2023 12:25, Tobias Burnus wrote: Found when comparing '-v -Wl,-v' output as despite -save-temps multiple runs yielded differed results. Fixed as attached. OK for mainline? OK. Andrew

[RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
Hi all, Jakub, I need to implement DWARF for local variables that exist in an alternative address space. This happens for OpenACC gang-private variables (or will when the patches are committed) on AMD GCN, at least. This is distinct from pointer variables that reference other address spaces,

Re: [RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
On 22/01/2021 11:42, Andrew Stubbs wrote: @@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die, tree decl, bool cache_p) if (list) { add_AT_location_description (die, DW_AT_location, list); - - addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl

[committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew. amdgcn: Allow V64DFmode min/max reductions I don't know why these were disabled. There're no direct

[OG10][committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
Now backported to devel/omp/gcc-10. On 26/01/2021 10:29, Andrew Stubbs wrote: This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew

[committed] amdgcn: Add gfx908 support

2021-02-03 Thread Andrew Stubbs
This patch adds a new -march option and multilib configuration to the amdgcn GPU target. The patch does not attempt to support any of the new features of the gfx908 devices, but does set the correct ELF flags etc. that are expected by the ROCm runtime. The GFX908 devices are not generally avai

Re: [RFC] DWARF address spaces for local variables

2021-02-04 Thread Andrew Stubbs
Ping. On 22/01/2021 11:42, Andrew Stubbs wrote: Hi all, Jakub, I need to implement DWARF for local variables that exist in an alternative address space. This happens for OpenACC gang-private variables (or will when the patches are committed) on AMD GCN, at least. This is distinct from

[commit][OG10] nvptx: remove erroneous stack deletion

2021-03-02 Thread Andrew Stubbs
This patch fixes an OpenMP performance issue on NVPTX. The problem is that it deallocates the stack memory when it shouldn't, forcing the GOMP_OFFLOAD_run function to allocate the stack space again, before every kernel launch. The memory is only meant to be deallocated when a data allocation

[committed][OG10] DWARF: late code range fixup

2021-03-06 Thread Andrew Stubbs
This patch fixes up the DWARF code ranges for offload debugging, again. This time it defers the changes until most other DWARF generation has occurred, because the previous method was causing ICEs on some testcases. This patch will be proposed for mainline in stage 1. Andrew DWARF: late code

[committed][OG10] amdgcn: Fix early-debug relocations

2021-03-06 Thread Andrew Stubbs
This patch is now backported to devel/omp/gcc-10. Andrew On 26/11/2020 14:41, Andrew Stubbs wrote: This patch fixes an error in GCN mkoffload that corrupted relocations in the early-debug info. The code now updates the relocation code without zeroing the symbol index. Andrew

[OG11, committed] libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

2021-08-04 Thread Andrew Stubbs
This patch fixes a bug in which testcases using thread_limit larger than the number of physical threads would crash with a memory fault. This was exacerbated in testcases with a lot of register pressure because the autoscaling reduces the number of physical threads to compensate for the increas

[committed] amdgcn: Remove omp_gcn pass

2020-09-18 Thread Andrew Stubbs
This patch removes the amdgcn-specific "omp_gcn" pass that was responsible for tweaking the OpenMP middle-end IR for GCN. In the past there were a few things there to make it work for simple cases while real support was built out in the backend and libgomp, but those haven't been needed ever s

[PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-18 Thread Andrew Stubbs
This patch fixes a problem in which nested OpenMP parallel regions cause errors if the number of inner teams is not balanced (i.e. the number of loop iterations is not divisible by the number of physical threads). A testcase is included. On NVPTX the symptom was a fatal error: libgomp: cuCtxS

Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-19 Thread Andrew Stubbs
On 18/09/2020 12:25, Andrew Stubbs wrote: This patch fixes a problem in which nested OpenMP parallel regions cause errors if the number of inner teams is not balanced (i.e. the number of loop iterations is not divisible by the number of physical threads). A testcase is included. This updated

Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-21 Thread Andrew Stubbs
Ping. On 03/09/2020 16:29, Andrew Stubbs wrote: On 28/08/2020 13:04, Andrew Stubbs wrote: Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame pointer, and return address. The motivating case is

[committed, OG10] dwarf: Multi-register CFI address support

2020-09-22 Thread Andrew Stubbs
On 03/09/2020 16:29, Andrew Stubbs wrote: OK to commit? (Although, I'll hold off until AMD release the compatible GDB.) The ROCm 3.8 ROCGDB is now released. I'm committing the attached patches to devel/omp/gcc-10 while I wait for review. The first patch is the multi-register C

Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-09-25 Thread Andrew Stubbs
On 30/07/2020 12:10, Andrew Stubbs wrote: On 29/07/2020 15:05, Andrew Stubbs wrote: This patch does not implement anything new, but simply separates OpenACC 'enter data' and 'exit data' into two libgomp API functions. The original API name is kept for backward compatibi

Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-28 Thread Andrew Stubbs
On 28/09/2020 15:02, Tom de Vries wrote: This patch simply skips barriers when they would "wait" for only one thread (the current thread). This means that teams nested inside other teams now run independently, instead of strictly in lock-step, and is only valid as long as inner teams are limited

[PATCH] libgomp: Enforce 1-thread limit in subteams

2020-09-29 Thread Andrew Stubbs
ested teams are expected to have multiple threads each. libgomp/ChangeLog: 2020-09-29 Andrew Stubbs * parallel.c (gomp_resolve_num_threads): Ignore nest_var on nvptx and amdgcn targets. diff --git a/libgomp/parallel.c b/libgomp/parallel.c index 2423f11f44a..0618056a7fe 100644 --- a/li

Re: [PATCH] dwarf: Multi-register CFI address support

2020-10-05 Thread Andrew Stubbs
Ping. On 21/09/2020 14:51, Andrew Stubbs wrote: Ping. On 03/09/2020 16:29, Andrew Stubbs wrote: On 28/08/2020 13:04, Andrew Stubbs wrote: Hi all, This patch introduces DWARF CFI support for architectures that require multiple registers to hold pointers, such as the stack pointer, frame

[committed] amdgcn: Use scalar instructions for addptrdi3

2020-10-07 Thread Andrew Stubbs
This patch adds an extra alternative to the existing addptrdi3 pattern. It permits safe 64-bit addition in scalar registers, as well as vector registers. This is especially useful because the result of addptr typically gets fed to instructions that expect the base address to be in a scalar regi

Re: [OG12][PATCH] openmp: Fix handling of target constructs in static member

2022-09-13 Thread Andrew Stubbs
On 13/09/2022 12:03, Paul-Antoine Arras wrote: Hello, This patch intends to backport e90af965e5c by Jakub Jelinek to devel/omp/gcc-12. The original patch was described here: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html I've merged and committed it for you. Andrew
