I've just committed this simple patch to silence an enum warning.
Andrew
amdgcn: silence warning
gcc/ChangeLog:
* config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index f6cff659703..ef3b6472a52 100644
--- a/g
I've just committed this patch. It should have no functional changes
except to make it easier to add new alternatives into the
alternative-heavy move instructions.
Andrew
amdgcn: switch mov insns to compact syntax
The move instructions typically have many alternatives (and I'm about to add
more
On 15/09/2023 10:16, Juzhe-Zhong wrote:
This test failed in RISC-V:
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect
"vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using
SLP" 4
Because this loop:
/* SLP with
On 07/10/2023 02:04, juzhe.zh...@rivai.ai wrote:
Thanks for reporting it.
I think we may need to change it into:
+ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4
"vect" { target {! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SL
On 10/10/2023 02:39, Juzhe-Zhong wrote:
Here is the reference comparing dump IR between ARM SVE and RVV.
https://godbolt.org/z/zqess8Gss
We can see RVV has one more dump IR:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024 bit vectors.
The codegen is reasonable
The build has been failing for the last few days because LLVM removed
support for the HSACOv3 binary metadata format, which we were still
using for the Fiji multilib.
The LLVM commit has now been reverted (thank you Pierre van Houtryve),
but it's only a temporary reprieve.
This patch removes
OK to commit?
Andrew
gcc-14: mark amdgcn fiji deprecated
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c817dde4..91ab8132 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -178,6 +178,16 @@ a work-in-progress.
+AMD Radeon (GCN)
+
+
+
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.
Therefore, this patch adds minimal support for "complex vector int"
modes.
OK.
Andrew
On 26/05/2023 15:58, Tobias Burnus wrote:
(Update the syntax of the amdgcn commandline option in anticipation of
later patches;
while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for
PR100208),
-mxnack (contrary to -msram-ecc) is currently mostly a stub for later
pat
On 30/05/2023 07:26, Richard Biener wrote:
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and
On 06/06/2023 16:33, Tobias Burnus wrote:
Andrew: Does the GCN change look okay to you?
This patch permits to use GCN devices with 'omp requires
unified_address' which
in principle works already, except that the requirement handling did
disable it.
(It also updates libgomp.texi for this chan
On 07/06/2023 20:42, Richard Sandiford wrote:
I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors. We used V16QI for
the former and "V2x8QI" for the latter. V2x8QI is for
On 09/06/2023 10:02, Richard Sandiford wrote:
Andrew Stubbs writes:
On 07/06/2023 20:42, Richard Sandiford wrote:
I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit ve
This patch allows vectorization when operators are available as
libfuncs, rather than only as insns.
This will be useful for amdgcn where we plan to vectorize loops that
contain integer division or modulus, but don't want to generate inline
instructions for the division algorithm every time.
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also s
On 14/06/2023 15:29, Richard Biener wrote:
On 14.06.2023 at 16:27, Andrew Stubbs wrote:
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard Biener wrote:
On 14.06.2023 at 16:27, Andrew Stubbs wrote:
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard Biener wrote:
On 14.06.2023 at 16:27, Andrew Stubbs wrote:
On 14/06/2023 12:54
On 15/06/2023 14:34, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard
On 15/06/2023 15:00, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 14:34, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 12:06, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 15/06/2023 10:58, Richard
ement memory that's both high-bandwidth and pinned anyway).
Patches 15 to 17 are new work. I can probably approve these myself, but
they can't be committed until the rest of the series is approved.
Andrew
Andrew Stubbs (11):
libgomp, nvptx: low-latency memory allocator
libgomp: pinned m
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
t
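A "basic but fast" allocator of the kind described above might look like the following first-fit free-list over a fixed arena. This is a simplified sketch under assumed names (`heap_init`, `heap_alloc`, `heap_free`), not the libgomp basic-allocator itself, and it omits the locking that makes the real one thread safe.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096
static uint8_t arena[ARENA_SIZE];

/* Each block carries a small header; free blocks form a singly
   linked list in address order.  */
typedef struct block
{
  size_t size;
  int free;
  struct block *next;
} block_t;

static block_t *head;

static void
heap_init (void)
{
  head = (block_t *) arena;
  head->size = ARENA_SIZE - sizeof (block_t);
  head->free = 1;
  head->next = NULL;
}

static void *
heap_alloc (size_t size)
{
  size = (size + 7) & ~(size_t) 7;	/* 8-byte alignment.  */
  for (block_t *b = head; b; b = b->next)
    if (b->free && b->size >= size)
      {
	if (b->size >= size + sizeof (block_t) + 8)
	  {
	    /* Split off the unused tail as a new free block.  */
	    block_t *rest = (block_t *) ((uint8_t *) (b + 1) + size);
	    rest->size = b->size - size - sizeof (block_t);
	    rest->free = 1;
	    rest->next = b->next;
	    b->next = rest;
	    b->size = size;
	  }
	b->free = 0;
	return b + 1;
      }
  return NULL;			/* Arena exhausted.  */
}

static void
heap_free (void *p)
{
  if (!p)
    return;
  block_t *b = (block_t *) p - 1;
  b->free = 1;
  /* Coalesce with the following block if it is also free.  */
  if (b->next && b->next->free)
    {
      b->size += sizeof (block_t) + b->next->size;
      b->next = b->next->next;
    }
}
```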
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
libgomp/ChangeLog:
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
.
co-authored-by: Andrew Stubbs
---
gcc/omp-low.cc | 174 +++
gcc/passes.def | 1 +
gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++
gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++
gcc/testsuite/g++.dg/gomp/usm-1
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16 ++
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can
Currently we are only handling omp allocate directive that is associated
with an allocate statement. This statement results in malloc and free calls.
The malloc calls are easy to get to as they are in the same block as allocate
directive. But the free calls come in a separate cleanup block. To
This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented message" in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.
It also checks that -foffload-memory isn't set to an incompatible mode.
gcc/c/ChangeL
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
(gfc_trans_omp_allocate): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.
gcc/ChangeLog:
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_AL
This patch looks for malloc/free calls that were generated by allocate statement
that is associated with allocate directive and replaces them with GOMP_alloc
and GOMP_free.
gcc/ChangeLog:
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlo
gcc/ChangeLog:
* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
* gimple-pretty-print.cc (dump_gimple_omp_allocate): New function.
(pp_gimple_stmt_1): Call it.
* gimple.cc (gimple_build_omp_allocate): New function.
* gimple.def (GIMPLE_OMP_ALLOCATE): New no
Currently we only make use of this directive when it is associated
with an allocate statement.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
Implement the Unified Shared Memory API calls in the GCN plugin.
The allocate and free are pretty straight-forward because all "target" memory
allocations are compatible with USM, on the right hardware. However, there's
no known way to check what memory region was used, after the fact, so we use
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate meta-data and
set the load instru
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
This patch sets a new attribute on OpenMP offload functions to ensure that the
information is p
On 07/07/2022 12:54, Tobias Burnus wrote:
Hi Andrew,
On 07.07.22 12:34, Andrew Stubbs wrote:
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload
On 08/07/2022 10:00, Tobias Burnus wrote:
On 08.07.22 00:18, Andrew Stubbs wrote:
Likewise, the 'requires' mechanism could then also be used in '[PATCH
16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'.
No, I don't think so; that environment variable ne
This patch ensures that the maximum vectorization factor used to set the
"safelen" attribute on "omp simd" constructs is suitable for all the
configured offload devices.
Right now it makes the proper adjustment for NVPTX, but otherwise just
uses a value suitable for the host system (always x86
I've committed this patch to enable DImode one's-complement on amdgcn.
The hardware doesn't have 64-bit not, and this isn't needed by expand
which is happy to use two SImode operations, but the vectorizer isn't so
clever. Vector condition masks are DImode on amdgcn, so this has been
causing lo
I've committed this patch to implement V64DImode vector-vector and
vector-scalar shifts.
In particular, these are used by the SIMD "inbranch" clones that I'm
working on right now, but it's an omission that ought to have been fixed
anyway.
Andrew
amdgcn: 64-bit vector shifts
Enable 64-bit vec
This patch adjusts the generation of SIMD "inbranch" clones that use
integer masks to ensure that it vectorizes on amdgcn.
The problem was only that an amdgcn mask is DImode and the shift amount
was SImode, and the difference causes vectorization to fail.
OK for mainline?
Andrew
openmp-simd-c
On 29/07/2022 16:59, Jakub Jelinek wrote:
Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:
I'm just not familiar enough with this stuff to run fol
I've committed this patch for amdgcn.
This changes the procedure calling ABI such that vector arguments are
passed in vector registers, rather than on the stack as before.
The ABI for scalar functions is the same for arguments, but the return
value has now moved to a vector register; keeping
ure has
backend support for the clones at this time.
OK for mainline (patches 1 & 3)?
Thanks
Andrew
Andrew Stubbs (3):
omp-simd-clone: Allow fixed-lane vectors
amdgcn: OpenMP SIMD routine support
vect: inbranch SIMD clones
gcc/config/gcn/gcn.cc | 63
gcc
The vecsize_int/vecsize_float has an assumption that all arguments will use
the same bitsize, and vary the number of lanes according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).
With this change the vecsize can
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.
gcc/ChangeLog:
* config/gcn/gcn.cc (g
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a sub-set of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be n
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
libgomp/ChangeLog:
ation so both
architectures can share the code.
Andrew
Andrew Stubbs (3):
libgomp, nvptx: low-latency memory allocator
openmp, nvptx: low-lat memory access traits
amdgcn, libgomp: low-latency allocator
gcc/config/gcn/gcn-builtins.def | 2 +
gcc/config/gc
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
t
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing
There were implementations for HImode division in libgcc, but there were
no matching libfuncs defined in the compiler, so the code was inactive
(GCC only defines SImode and DImode, by default, and amdgcn only adds
TImode explicitly).
On trying to activate it I find that the definition of
TARG
This patch adds just enough TImode vector support to use them for moving
data about. This is primarily for the use of divmodv64di4, which will
use TImode to return a pair of DImode values.
The TImode vectors have no other operators defined, and there are no
hardware instructions to support thi
I've committed this patch that allows building binaries for AMD gfx1030
GPUs. I can't actually test it, however, so somebody else will have to
debug it (or wait for me to get my hands on a device). Richi reports
that it does not execute correctly, as is.
This is an experimental broken feature,
On 19/10/2023 11:07, Tobias Burnus wrote:
On 19.10.23 11:49, Andrew Stubbs wrote:
OK to commit?
(I think as maintainer you don't need approval - but of course comments
by others can be helpful; I hope mine are. Additionally, Gerald (CCed)
helps with keeping the webpages in good shape (t
This patch fixes a wrong-code bug on amdgcn in which the excess "ones"
in the mask enable extra lanes that were supposed to be unused and are
therefore undefined.
Richi suggested an alternative approach involving narrower types and
then a zero-extend to the actual mask type. This solved the p
On 22/10/2023 13:24, Gerald Pfeifer wrote:
Hi Andrew,
On Fri, 20 Oct 2023, Andrew Stubbs wrote:
Additionally, I wonder whether "Fiji" should be changed to "Fiji
(gfx803)" in the first line and whether the "," should be removed in
"The ... configuration .
This trivial patch adds the "operands" keyword to the condition in a
couple of patterns that cause warnings about "missing" mode specifiers.
With the iterators, there were a large number of warnings about these
cases that have now been silenced.
Andrew
amdgcn: silence warnings
The operands re
On 20/10/2023 12:51, Andrew Stubbs wrote:
I've committed this patch that allows building binaries for AMD gfx1030
GPUs. I can't actually test it, however, so somebody else will have to
debug it (or wait for me to get my hands on a device). Richi reports
that it does not execute cor
On 06/11/2023 07:52, Richard Biener wrote:
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:
Hi,
The current codegen code to support VF's that are multiples of a simdclone
simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does not
work for non-constant simdclones, so we sh
I presume I've been CC'd on this conversation because weird vector
architecture problems have happened to me before. :)
However, I'm not sure I can help much because AMD GCN does not use
BImode vectors at all. This is partly because loading boolean values
into a GCN vector would have 31 paddin
On 13/02/2023 14:38, Thomas Schwinge wrote:
Hi!
On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote:
From: Andrew Stubbs
Add a new option. It will be used in follow-up patches.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
+@option{-foffload-memory=pinned} forces all host
On 09/02/2023 20:13, Andrew Jenner wrote:
This patch introduces instruction patterns for complex number operations
in the GCN machine description. These patterns are cmul, cmul_conj,
vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls
(cmla_conj and cmls_conj were not found t
On 14/02/2023 12:54, Thomas Schwinge wrote:
Hi Andrew!
On 2022-01-13T11:13:51+, Andrew Stubbs wrote:
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation.
Right, and as it seems I've run into another issue: a stray 'free'.
These patches implement an LDS memory allocator for OpenMP on AMD.
1. 230216-basic-allocator.patch
Separate the allocator from NVPTX so the code can be shared.
2. 230216-amd-low-lat.patch
Allocate the memory, adjust the default address space, and hook up the
allocator.
They will need to be
On 17/02/2023 08:12, Thomas Schwinge wrote:
Hi Andrew!
On 2023-02-16T23:06:44+0100, I wrote:
On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches"
wrote:
The mmap implementation was not optimized for a lot of small allocations, and I
can't see that issue changing here
That's corre
On 16/02/2023 21:11, Thomas Schwinge wrote:
--- /dev/null
+++ b/libgomp/basic-allocator.c
+#ifndef BASIC_ALLOC_YIELD
+#deine BASIC_ALLOC_YIELD
+#endif
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in
On 10/02/2023 09:11, Jakub Jelinek wrote:
I've tried to fix the -flto thing and I can't figure out how. The problem
seems to be that there are two dump files from the two compiler invocations
and it scans the wrong one. Aarch64 has the same problem.
Two dumps are because it is in a dg-do run te
This patch fixes a bug in which libgomp doesn't know what to do with
attached pointers in fortran derived types when using Unified Shared
Memory instead of explicit mappings.
I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it
into the next rebase/repost of the USM patches
On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
Hello
This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
target hook for the AMD GCN architecture, such that when vectorized,
calls to builtin standard math functions such as asinf, exp, pow etc.
are converted to calls to the r
On 01/03/2023 10:52, Andre Vieira (lists) wrote:
On 01/03/2023 10:01, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> target hook for the AMD GCN a
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows the exec register to be saved in SGPRs to avoid spilling to
memory.
Tested on GCN3 Fiji gfx803
On 02/03/2023 15:07, Kwok Cheung Yeung wrote:
Hello
I've made the suggested changes. Should I hold off on committing this
until GCC 13 has been branched off?
No need, amdgcn is not a primary target and this stuff won't affect
anyone else. Please go ahead and commit.
Andrew
On 03/03/2023 17:05, Paul-Antoine Arras wrote:
On 02/03/2023 at 18:18, Andrew Stubbs wrote:
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows
On 08/03/2023 11:06, Tobias Burnus wrote:
Next try – this time with both patches.
On 08.03.23 12:05, Tobias Burnus wrote:
Hi Andrew,
attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.
OK? Comments?
LGTM
Andrew
On 13/03/2023 12:25, Tobias Burnus wrote:
Found when comparing '-v -Wl,-v' output as despite -save-temps multiple
runs
yielded differed results.
Fixed as attached.
OK for mainline?
OK.
Andrew
Hi all, Jakub,
I need to implement DWARF for local variables that exist in an
alternative address space. This happens for OpenACC gang-private
variables (or will when the patches are committed) on AMD GCN, at least.
This is distinct from pointer variables that reference other address
spaces,
On 22/01/2021 11:42, Andrew Stubbs wrote:
@@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die,
tree decl, bool cache_p)
if (list)
{
add_AT_location_description (die, DW_AT_location, list);
-
- addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
amdgcn: Allow V64DFmode min/max reductions
I don't know why these were disabled. There are no direct
Now backported to devel/omp/gcc-10.
On 26/01/2021 10:29, Andrew Stubbs wrote:
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
This patch adds a new -march option and multilib configuration to the
amdgcn GPU target. The patch does not attempt to support any of the new
features of the gfx908 devices, but does set the correct ELF flags etc.
that are expected by the ROCm runtime.
The GFX908 devices are not generally avai
Ping.
On 22/01/2021 11:42, Andrew Stubbs wrote:
Hi all, Jakub,
I need to implement DWARF for local variables that exist in an
alternative address space. This happens for OpenACC gang-private
variables (or will when the patches are committed) on AMD GCN, at least.
This is distinct from
This patch fixes an OpenMP performance issue on NVPTX.
The problem is that it deallocates the stack memory when it shouldn't,
forcing the GOMP_OFFLOAD_run function to allocate the stack space again,
before every kernel launch.
The memory is only meant to be deallocated when a data allocation
This patch fixes up the DWARF code ranges for offload debugging, again.
This time it defers the changes until most other DWARF generation has
occurred, because the previous method was causing ICEs on some testcases.
This patch will be proposed for mainline in stage 1.
Andrew
DWARF: late code
This patch is now backported to devel/omp/gcc-10.
Andrew
On 26/11/2020 14:41, Andrew Stubbs wrote:
This patch fixes an error in GCN mkoffload that corrupted relocations in
the early-debug info.
The code now updates the relocation code without zeroing the symbol index.
Andrew
This patch fixes a bug in which testcases using thread_limit larger than
the number of physical threads would crash with a memory fault. This was
exacerbated in testcases with a lot of register pressure because the
autoscaling reduces the number of physical threads to compensate for the
increas
This patch removes the amdgcn-specific "omp_gcn" pass that was
responsible for tweaking the OpenMP middle-end IR for GCN.
In the past there were a few things there to make it work for simple
cases while real support was built out in the backend and libgomp, but
those haven't been needed ever s
This patch fixes a problem in which nested OpenMP parallel regions cause
errors if the number of inner teams is not balanced (i.e. the number of
loop iterations is not divisible by the number of physical threads). A
testcase is included.
On NVPTX the symptom was a fatal error:
libgomp: cuCtxS
On 18/09/2020 12:25, Andrew Stubbs wrote:
This patch fixes a problem in which nested OpenMP parallel regions cause
errors if the number of inner teams is not balanced (i.e. the number of
loop iterations is not divisible by the number of physical threads). A
testcase is included.
This updated
Ping.
On 03/09/2020 16:29, Andrew Stubbs wrote:
On 28/08/2020 13:04, Andrew Stubbs wrote:
Hi all,
This patch introduces DWARF CFI support for architectures that require
multiple registers to hold pointers, such as the stack pointer, frame
pointer, and return address. The motivating case is
On 03/09/2020 16:29, Andrew Stubbs wrote:
OK to commit? (Although, I'll hold off until AMD release the
compatible GDB.)
The ROCm 3.8 ROCGDB is now released. I'm committing the attached patches
to devel/omp/gcc-10 while I wait for review.
The first patch is the multi-register C
On 30/07/2020 12:10, Andrew Stubbs wrote:
On 29/07/2020 15:05, Andrew Stubbs wrote:
This patch does not implement anything new, but simply separates
OpenACC 'enter data' and 'exit data' into two libgomp API functions.
The original API name is kept for backward compatibi
On 28/09/2020 15:02, Tom de Vries wrote:
This patch simply skips barriers when they would "wait" for only one
thread (the current thread). This means that teams nested inside other
teams now run independently, instead of strictly in lock-step, and is
only valid as long as inner teams are limited
ested teams are expected
to have multiple threads each.
libgomp/ChangeLog:
2020-09-29 Andrew Stubbs
* parallel.c (gomp_resolve_num_threads): Ignore nest_var on nvptx
and amdgcn targets.
diff --git a/libgomp/parallel.c b/libgomp/parallel.c
index 2423f11f44a..0618056a7fe 100644
--- a/li
Ping.
On 21/09/2020 14:51, Andrew Stubbs wrote:
Ping.
On 03/09/2020 16:29, Andrew Stubbs wrote:
On 28/08/2020 13:04, Andrew Stubbs wrote:
Hi all,
This patch introduces DWARF CFI support for architectures that
require multiple registers to hold pointers, such as the stack
pointer, frame
This patch adds an extra alternative to the existing addptrdi3 pattern.
It permits safe 64-bit addition in scalar registers, as well as vector
registers. This is especially useful because the result of addptr
typically gets fed to instructions that expect the base address to be in
a scalar regi
On 13/09/2022 12:03, Paul-Antoine Arras wrote:
Hello,
This patch intends to backport e90af965e5c by Jakub Jelinek to
devel/omp/gcc-12.
The original patch was described here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html
I've merged and committed it for you.
Andrew