[ dropped gdb-patches, since already applied there. ]
On 6/27/22 15:38, Tom de Vries wrote:
On 6/27/22 15:03, Tom de Vries wrote:
Hi,
When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S
against \
`.rodata' can not be use
On 7/12/22 15:59, Iain Sandoe wrote:
Hi Tom
On 12 Jul 2022, at 14:42, Tom de Vries via Gcc-patches
wrote:
[ dropped gdb-patches, since already applied there. ]
On 6/27/22 15:38, Tom de Vries wrote:
On 6/27/22 15:03, Tom de Vries wrote:
Hi,
When building gdb with --enable-shared, I run
On 8/20/21 12:54 AM, Roger Sayle wrote:
>
> This patch adds a __PTX_ISA__ predefined macro to the nvptx backend that
> allows code to check the compute model being targeted by the compiler.
Hi Roger,
The naming __PTX_ISA__ is consistent with the naming of -misa=sm_30/sm_35.
The -misa=sm_30/sm_3
On 8/30/21 12:54 PM, Tobias Burnus wrote:
> Document Roger's patch
> https://gcc.gnu.org/g:3c496e92d795a8fe5c527e3c5b5a6606669ae50d
>
> OK? Suggestions?
>
LGTM.
Thanks,
- Tom
On 9/14/22 11:41, Thomas Schwinge wrote:
Hi Tom!
On 2022-02-01T19:31:13+0100, Tom de Vries via Gcc-patches
wrote:
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1
On 9/14/22 11:41, Thomas Schwinge wrote:
Hi Tom!
On 2022-02-01T19:31:27+0100, Tom de Vries via Gcc-patches
wrote:
Hi,
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1
On 8/6/22 21:20, Thomas Schwinge wrote:
Hi Tom!
Hi Thomas,
thanks for doing this.
Series approved.
As I mentioned, I'm not completely happy with the multilib name, but I
don't think it makes sense to postpone approval for this.
Thanks,
- Tom
Ping.
Regards
Thomas
On 2022-07-27T17:4
Hi,
Currently, we cannot build gdb without makeinfo installed.
It would be convenient to work around this by using the configure flag
MAKEINFO=/usr/bin/true or some such, but that doesn't work because top-level
configure requires a makeinfo of at least version 4.7, and that version check
fails fo
On 10/10/22 16:19, Thomas Schwinge wrote:
With that, OK to push?
FWIW, nvptx change looks in the obvious category to me.
Thanks,
- Tom
Hi,
Add a few test-cases that test passing each -misa=sm_xx version and verify that
the proper __PTX_SM__ is defined.
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[nvptx, testsuite] Add gcc.target/nvptx/sm*.c
gcc/testsuite/ChangeLog:
2022-02-25 Tom de Vries
* gcc.target/nvp
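As a rough illustration of what such a test-case relies on, here is a minimal sketch of code that inspects __PTX_SM__ at compile time. The function names are hypothetical and not taken from the patch, and the sketch deliberately makes no assumption about the numeric value the macro gets for a given -misa=sm_xx; it only distinguishes "defined" from "not defined".

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of __PTX_SM__ when the compiler
   defines it, or -1 when it does not (for instance in a host compilation).  */
int
ptx_sm_or_fallback (void)
{
#ifdef __PTX_SM__
  return __PTX_SM__;
#else
  return -1;
#endif
}

/* Hypothetical helper: prints which of the two cases applies.  */
void
report_ptx_sm (void)
{
  int sm = ptx_sm_or_fallback ();
  if (sm < 0)
    printf ("__PTX_SM__ is not defined\n");
  else
    printf ("__PTX_SM__ = %d\n", sm);
}
```

Calling report_ptx_sm () from main in a host build takes the "not defined" branch; in an nvptx build the reported value depends on the selected -misa.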
Hi,
Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in
the port, like so:
...
NVPTX_SM(30, NVPTX_SM_SEP)
NVPTX_SM(35, NVPTX_SM_SEP)
NVPTX_SM(53, NVPTX_SM_SEP)
NVPTX_SM(70, NVPTX_SM_SEP)
NVPTX_SM(75, NVPTX_SM_SEP)
NVPTX_SM(80,)
...
and use it in various places using a pa
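The general technique of expanding one list in several ways can be sketched as follows. This is not the code from the patch: the entries are kept in a macro (NVPTX_SM_ENTRIES) instead of being pulled in with #include "nvptx-sm.def" so that the example is a single file, and the names example_sm, EXAMPLE_SMxx, example_sm_name and example_sm_count are made up for illustration.

```c
#include <stddef.h>

/* Stand-in for the contents of nvptx-sm.def, copied from the list quoted
   above.  In a real port this would be a separate file that is included
   once per expansion.  */
#define NVPTX_SM_ENTRIES \
  NVPTX_SM (30, NVPTX_SM_SEP) \
  NVPTX_SM (35, NVPTX_SM_SEP) \
  NVPTX_SM (53, NVPTX_SM_SEP) \
  NVPTX_SM (70, NVPTX_SM_SEP) \
  NVPTX_SM (75, NVPTX_SM_SEP) \
  NVPTX_SM (80,)

/* First expansion: one enumerator per entry (hypothetical names).  */
#define NVPTX_SM(XX, SEP) EXAMPLE_SM##XX SEP
#define NVPTX_SM_SEP ,
enum example_sm { NVPTX_SM_ENTRIES };
#undef NVPTX_SM_SEP
#undef NVPTX_SM

/* Second expansion: the matching "sm_xx" strings, in the same order.  */
#define NVPTX_SM(XX, SEP) "sm_" #XX SEP
#define NVPTX_SM_SEP ,
static const char *const example_sm_names[] = { NVPTX_SM_ENTRIES };
#undef NVPTX_SM_SEP
#undef NVPTX_SM

/* Maps an enumerator to its string.  */
const char *
example_sm_name (enum example_sm sm)
{
  return example_sm_names[sm];
}

/* Number of entries in the list.  */
size_t
example_sm_count (void)
{
  return sizeof example_sm_names / sizeof example_sm_names[0];
}
```

Because both expansions come from the same list, adding a new sm_xx entry in one place keeps the enum and the string table in sync.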
Hi,
Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate
omp-device-properties-nvptx.
Tested on x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx] Use nvptx-sm.def for t-omp-device
gcc/ChangeLog:
2022-02-25 Tom de Vries
* config/nvptx
Hi,
Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and:
- include nvptx-gen.h in nvptx.h, and
- add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters).
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[nvptx] Add nvptx-gen.h and nvptx-gen.opt
gcc/
Hi,
For a test-case doing an openmp target simd reduction on a complex double:
...
DOUBLE COMPLEX :: counter_N0
...
!$OMP TARGET SIMD reduction(+: counter_N0)
...
we run into:
...
during RTL pass: expand
b.f90: In function ‘MAIN__._omp_fn.0’:
b.f90:23:32: internal compiler error: in expand_i
Hi,
With target board nvptx-none-run/-mptx=3.1 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support \
selected -misa (sm_53)
compiler exited with status 1
FAIL: gcc.target/nvptx/sm53.c (test for excess errors)
...
Fix this by adding -mptx=_ in sm53.c and simila
Hi,
In PR97348, we ran into the problem that recent CUDA dropped support for
sm_30, which inhibited the build when building with CUDA bin in the path,
because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification.
To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35
Hi,
In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35
code (due to libraries being built with the default -misa=sm_35), so it won't
run on an sm_30 board.
Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10.
Committed to trunk.
Thanks,
- To
Hi,
In gcc-5 to gcc-11, the ptx isa version was 3.1.
On trunk, the default is now 6.0, which is also what will be the value in
the libraries.
Consequently, there may be setups with an older driver that worked with
gcc-11, but will become unsupported with gcc-12.
Fix this by building the librari
On 2/22/22 14:55, Tom de Vries wrote:
Hi,
For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
//
Hi,
With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the
intention was that the sm_xx for all libraries was switched back to sm_30
using MULTILIB_EXTRA_OPTS, without changing the default sm_35.
Testing on an sm_30 board revealed that some libs were still built with sm_35,
so fi
Hi,
With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the
intention was that the ptx isa version for all libraries was switched back to
3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.
Further testing revealed that this is not the case, and some libs were still
bui
Hi,
The ptx manual prescribes the instruction format atom{.space}.op.type but the
compiler currently emits:
...
atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.
Fix this by emitting instead:
...
atom.and.b64 %r31, [%r30], %r32;
...
Tested on nvptx
Hi,
For an atomic fetch operation that doesn't use the result:
...
__atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
atom.add.u64 %r26, [%r25], %r27;
...
Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
atom.add.u64 _, [%r25], %r27;
...
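For reference, the two source-level situations look roughly like this. It is only a sketch: the function names are hypothetical, and __ATOMIC_RELAXED is the user-level spelling of the relaxed memory order, used here in place of the compiler-internal MEMMODEL_RELAXED from the example above.

```c
#include <stdint.h>

/* The fetched value is discarded, so only the side effect on *p matters.
   This is the case in which the destination register ends up unused.  */
void
counter_add (uint64_t *p, uint64_t v)
{
  __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

/* The fetched (old) value is returned, so the destination register is
   live and has to be kept.  */
uint64_t
counter_add_fetch_old (uint64_t *p, uint64_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}
```

Both functions update the counter in the same way; they differ only in whether the caller observes the previous value.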
Likewise for
Hi,
For an example:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
I noticed that the result of the atomic update (%r30) is propagated:
...
@%r33 atom.add.u32
Hi,
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
oper
Hi,
The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its
list-of-values), with the possible values being no and yes.
...
The nvptx port has instead:
...
(define_attr "predicable" "false,true"
(const_stri
On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:
On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:
On Mon, 28 Feb 2022, Tobias Burnus wrote:
Ping**3
On 23.02.22 09:42, Tobias Burnus wrote:
PING**2 for the ME review or at least comments to that patch,
which fixes a build issue/ICE
On 3/9/22 13:50, Tom de Vries wrote:
On 2/22/22 14:55, Tom de Vries wrote:
Hi,
For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
Hi,
The test-case included in this patch contains:
...
#pragma omp taskloop simd shared(a) lastprivate(myId)
...
This is translated to 3 taskloop statements in gimple, visible with
-fdump-tree-gimple:
...
#pragma omp taskloop private(D.2124)
#pragma omp taskloop shared(a) shared(myId) pri
On 3/18/22 14:01, Jakub Jelinek wrote:
On Fri, Mar 18, 2022 at 01:44:00PM +0100, Tom de Vries wrote:
The test-case included in this patch contains:
...
#pragma omp taskloop simd shared(a) lastprivate(myId)
...
This is translated to 3 taskloop statements in gimple, visible with
-fdump-tree-gi
Hi,
Consider test-case pr104952-1.c, included in this commit, containing:
...
#pragma omp target map(tofrom:result) map(to:arr)
#pragma omp simd reduction(||: result)
...
When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.
The reduction clause is translated by th
On 3/18/22 15:56, Jakub Jelinek wrote:
On Fri, Mar 18, 2022 at 03:42:48PM +0100, Tom de Vries wrote:
And for NVPTX we somehow lower the taskloop into GIMPLE_ASM
or how we end up ICEing?
In the nvptx backend, gen_comment (triggering not very frequently atm) uses
gen_rtx_ASM_INPUT_loc with as l
On 3/21/22 08:58, Richard Biener wrote:
On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
wrote:
On 3/9/22 13:50, Tom de Vries wrote:
On 2/22/22 14:55, Tom de Vries wrote:
Hi,
For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/test
On 3/21/22 14:49, Richard Biener wrote:
On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries wrote:
On 3/21/22 08:58, Richard Biener wrote:
On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
wrote:
On 3/9/22 13:50, Tom de Vries wrote:
On 2/22/22 14:55, Tom de Vries wrote:
Hi,
For
Hi,
Consider this code (with N defined to 1024):
...
float v = 0.0;
#pragma omp target map(tofrom: v)
#pragma omp parallel for simd
for (int i = 0 ; i < N; i++)
{
#pragma omp atomic update
v = v + 1.0;
}
...
It hangs when executing on target board unix/-foffload=-misa=
Hi,
Starting with ptx isa version 6.3, a ptx directive .alias is available.
Use this directive to support symbol aliases, as far as possible.
The alias support is off by default. It can be turned on using a switch
-malias.
Furthermore, for pre-sm_75, it's not effective unless the ptx version is
Hi,
Add new option -mexperimental.
This allows a new feature to be developed on trunk without disturbing trunk,
rather than developing it to completion in a development branch.
The equivalent of the feature branch merge then becomes making the
functionality available for -mno-experimenta
Hi,
With PR104489 still open and end-of-stage-4 approaching, classify HFmode
support as experimental, which is not enabled by default but can be enabled
using -mexperimental.
This fixes the nvptx build when the default sm_xx is set to sm_53 or higher.
Note that we're not using -mfp16 or some suc
Hi,
The percentage sign as first character of a ptx identifier can be used to
avoid name conflicts, e.g., between user-defined variable names and
compiler-generated names.
The insn nvptx_uniform_warp_check contains register names without '%' prefix,
which potentially could lead to name conflicts
Hi,
On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
#include <stdatomic.h>
atomic_flag a = ATOMIC_FLAG_INIT;
int main () {
if ((atomic_flag_test_and_set) (&a))
__builtin_abort ();
On 3/24/22 10:02, Jakub Jelinek wrote:
On Thu, Mar 24, 2022 at 09:28:15AM +0100, Tom de Vries via Gcc-patches wrote:
Hi,
On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
#include <stdatomic.h>
On 3/24/22 11:59, Jakub Jelinek wrote:
On Thu, Mar 24, 2022 at 11:01:30AM +0100, Tom de Vries wrote:
Shouldn't that be instead
return (woldval & ((UWORD) -1 << shift)) != 0;
or
return (woldval & ((UWORD) ~(UWORD) 0 << shift)) != 0;
?
Well, I used '(woldval & wval) == wval' based on the
Hi,
When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5-second watchdog timer.
Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch ti
On 3/25/22 11:04, Tobias Burnus wrote:
On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:
On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
[...]
Fix this by scaling down the failing test-cases.
Tested on x86_64-linux with nvptx accelerator.
[...]
Will defer to Thomas, as it i
On 3/25/22 13:35, Thomas Schwinge wrote:
Hi!
On 2022-03-25T13:08:52+0100, Tom de Vries wrote:
On 3/25/22 11:04, Tobias Burnus wrote:
On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:
On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
[...]
Fix this by scaling down the faili
Hi,
When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec:
...
| # If user did not specify libexecdir, set the correct target:
| # Nor
On 3/28/22 10:49, Richard Biener wrote:
On Mon, 28 Mar 2022, Tom de Vries wrote:
Hi,
When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}
On 3/28/22 14:04, Richard Biener wrote:
On Mon, 28 Mar 2022, Andreas Schwab wrote:
On Mar 28 2022, Richard Biener via Gcc-patches wrote:
OK in principle, but I have no idea on how portable
$(libexecdir:\$(exec_prefix)/%=%)
is going to be?
We already require GNU make, don't we?
We should
Hi,
Currently we have:
...
$ gcc --target-help 2>&1 | egrep "misa|mptx"
-misa= Specify the version of the ptx ISA to use.
-mptx= Specify the version of the ptx version to use.
Known PTX ISA versions (for use with the -misa= option):
Known PTX versi
Hi,
The target option misa has the following description:
...
$ gcc --target-help 2>&1 | grep misa
-misa= Specify the PTX ISA target architecture to use.
...
The name misa is somewhat poorly chosen. It suggests that for a use
-misa=sm_30, sm_30 is the name of a specific In
Hi,
Say we have an sm_50 board, and we want to run a benchmark using the highest
possible march setting.
Currently there's march=sm_30, march=sm_35, march=sm_53, but no march=sm_50.
So, we'd need to pick march=sm_35.
Likewise, for a test script that handles multiple boards, we'd need a mapping
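The kind of mapping described here can be sketched as below. This is an illustration of the idea only, not the implementation of any compiler option: the function name is hypothetical, and the list of supported values is an assumption (the three values named above plus sm_70, sm_75 and sm_80).

```c
#include <stddef.h>

/* Assumed list of sm_xx values that have a matching march setting.  */
static const int supported_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Hypothetical helper: returns the highest supported sm value that does
   not exceed BOARD_SM, or -1 if the board is older than every entry.  */
int
highest_supported_sm (int board_sm)
{
  int best = -1;
  for (size_t i = 0; i < sizeof supported_sm / sizeof supported_sm[0]; i++)
    if (supported_sm[i] <= board_sm && supported_sm[i] > best)
      best = supported_sm[i];
  return best;
}
```

With this list, an sm_50 board maps to 35, matching the choice made by hand above, and an sm_61 board maps to 53.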
Hi,
In the docs we have for m64:
...
Ignored, but preserved for backward compatibility. Only 64-bit ABI is
supported.
...
But with --target-help, we have instead:
...
$ gcc --target-help
...
-m64 Generate code for a 64-bit ABI.
...
which could be interpreted as meaning that generating cod
Hi,
Update nvptx documentation:
- Use meaningful terms: "PTX ISA target architecture" and "PTX ISA version".
- Remove invalid claim that "ISA strings must be lower-case".
- Add missing sm_xx entries.
- Fix default ISA.
- Add march, copying misa doc.
- Declare misa an alias of march.
- Add march-map.
Hi,
Add preprocessor macros __PTX_ISA_VERSION_MAJOR__ and
__PTX_ISA_VERSION_MINOR__.
For the default 6.0, we have:
...
$ echo | cc1 -E -dD - 2>&1 | grep PTX_ISA_VERSION
#define __PTX_ISA_VERSION_MAJOR__ 6
#define __PTX_ISA_VERSION_MINOR__ 0
...
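A possible use of these two macros is a version check in user code, sketched below. The helper names are hypothetical, and the 6.3 threshold is just an example value; when the macros are not defined (e.g. in a host compilation) the check simply reports 0.

```c
/* Hypothetical helper: nonzero if MAJOR.MINOR is at least
   WANT_MAJOR.WANT_MINOR.  */
int
ptx_isa_version_at_least (int major, int minor, int want_major,
                          int want_minor)
{
  return major > want_major || (major == want_major && minor >= want_minor);
}

/* Hypothetical helper: nonzero if the PTX ISA version being compiled for
   is at least 6.3.  Returns 0 when the macros are not defined.  */
int
targets_ptx_isa_6_3_or_later (void)
{
#if defined (__PTX_ISA_VERSION_MAJOR__) && defined (__PTX_ISA_VERSION_MINOR__)
  return ptx_isa_version_at_least (__PTX_ISA_VERSION_MAJOR__,
                                   __PTX_ISA_VERSION_MINOR__, 6, 3);
#else
  return 0;
#endif
}
```

For the default 6.0 shown above, the comparison against 6.3 would yield 0.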
and for 3.1, we have:
...
$ echo | cc1 -mptx=3.1
On 3/29/22 16:28, Tobias Burnus wrote:
Hi Tom,
On 29.03.22 15:39, Tom de Vries wrote:
Any comments?
+(e.g.@: @samp{sm_35}). Valid architecture strings are @samp{sm_30},
+@samp{sm_35}, @samp{sm_53} @samp{sm_70}, @samp{sm_75} and
+@samp{sm_80}. The default target architecture is sm_30.
Missin
On 3/29/22 16:47, Tobias Burnus wrote:
On 29.03.22 16:28, Tobias Burnus wrote:
On 29.03.22 15:39, Tom de Vries wrote:
Any comments?
I think it would be useful to have additionally some wording for the
(new in GCC 12/new since today) macros,
Agreed.
i.e. something like:
--- a/gcc/doc/inv
On 3/30/22 11:02, Tobias Burnus wrote:
On 30.03.22 10:03, Tom de Vries wrote:
On 3/29/22 16:47, Tobias Burnus wrote:
I think it would be useful to have additionally some wording for the
(new in GCC 12/new since today) macros,
[...]
The macro is defined also if the option is not specified, so
[ was: Re: [wwwdocs][patch] gcc-12/changes.html: Document -misa update
for nvptx ]
On 3/3/22 13:27, Tobias Burnus wrote:
The current wording, https://gcc.gnu.org/gcc-12/changes.html#nvptx ,
is outdated and (now wrongly) encourages using -mptx=.
Updated as follows.
I've taken these changes a
Hi,
Newer versions of CUDA no longer support sm_30, and nvptx-tools as
currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).
There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on
Hi,
The dg-options line in gcc.target/nvptx/march.c:
...
/* { dg-options "-march=sm_30"} */
...
currently doesn't have any effect because it's missing a space between '"' and
'}'.
Fix this by adding the missing space.
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[nvptx, testsuite] Fix t
Hi,
When running test-cases gcc.target/nvptx/alias-*.c on target board
nvptx-none-run/-misa=sm_80 we run into fails because the test-cases add
-mptx=6.3, which doesn't support sm_80.
Fix this by only adding -mptx=6.3 if necessary, and simplify the test-cases by
using ptx_alias feature abstraction
Hi,
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
RTX A2000 (sm_86) with driver 510.60.02 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
output p
Hi,
When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-
On 4/1/22 14:28, Thomas Schwinge wrote:
Hi Tom!
On 2022-04-01T13:24:40+0200, Tom de Vries wrote:
When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4
On 4/1/22 17:38, Jakub Jelinek wrote:
On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:
Do you perhaps have an idea why it's failing?
Because you call on_device_arch_nvptx () outside of
!$omp target region, so unless the host device is NVPTX,
it will not be true.
That bit does w
On 4/1/22 17:57, Tom de Vries wrote:
On 4/1/22 17:38, Jakub Jelinek wrote:
On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:
Do you perhaps have an idea why it's failing?
Because you call on_device_arch_nvptx () outside of
!$omp target region, so unless the host device is NVPTX,
i
On 4/4/22 13:07, Jakub Jelinek wrote:
On Mon, Apr 04, 2022 at 01:05:12PM +0200, Tom de Vries wrote:
2022-04-04 Tom de Vries
* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
parent dir.
Wouldn't just ! { dg-additional-sources ../on_device_arch.c }
work?
I
On 4/5/22 17:14, Thomas Schwinge wrote:
Hi!
Still catching up with GCC/nvptx back end changes... %-)
In the following I'm not discussing the patch to document
"gcc-12: Nvptx updates", but rather one aspect of the
"gcc-12: Nvptx updates" themselves. ;-)
On 2022-03-30T14:27:41+0200, Tom de Vr
On 4/8/22 00:27, Thomas Schwinge wrote:
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.
--- libgomp/plugin/configfrag.ac.jj 2017-01-13 12:07:56.
On 4/7/22 16:17, Thomas Schwinge wrote:
Hi!
On 2022-03-31T09:40:47+0200, Tom de Vries via Gcc-patches
wrote:
Newer versions of CUDA no longer support sm_30, and nvptx-tools as
currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issu
On 9/17/21 10:08, Richard Biener via Gcc-patches wrote:
On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus
wrote:
On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote:
On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus
wrote:
On Fri, Aug 20, 2021 at 12:35:58PM +
[ was: Re: [RFC] ldist: Recognize rawmemchr loop patterns ]
On 1/31/22 16:00, Richard Biener wrote:
I'm running into PR56888 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 ) on nvptx due to
this, f.i. in gcc/testsuite/gcc.c-torture/execute/builtins/strlen.c,
where gcc/testsuite/gcc.c-tortu
Hi,
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device finaliz
Hi,
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_wor
Hi,
When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
execution test
FA
Hi,
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing given
Hi,
When running libgomp test-case reduction-7.c on an nvptx accelerator
(T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into:
...
reduction-7.exe:reduction-7.c:312: v_p_2: \
Assertion `out[j * 32 + i] == (i + j) * 2' failed.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reductio
Hi,
In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.
The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.
This seems fine for the form "bar.sync 0".
But a "bar.sync %rx,64" (as used for vector le
Hi,
With the following example, minimized from parallel-dims.c:
...
int
main (void)
{
int vectors_max = -1;
#pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
{
for (int i = 0; i < 2; i++)
for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction (max
Hi,
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx
Hi,
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "
On 2/3/22 10:40, Thomas Schwinge wrote:
Hi Tom!
On 2021-05-19T14:56:17+0200, I wrote:
On 2020-08-12T15:57:23+0200, Tom de Vries wrote:
When enabling sync_int_long for nvptx, we run into a failure in
gcc.dg/pr86314.c:
...
nvptx-run: error getting kernel result: operation not supported on \
On 2/2/22 09:30, Tobias Burnus wrote:
This patch updates the documentation for Tom's change of the default
-mptx= version - mentioning also -mptx=7.0.
I forgot whether ptx = 7.0 was working fine or whether there was
a reason not to mention it.
A ptx version is experimental if all sm versions i
Hi,
In PR target/104364, two problems were reported:
- in muniform-simt mode, an atom.cas insn is no longer executed in the
"master lane" only.
- in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is
translated assuming it accesses local memory, while that's not the case.
Fix
Hi,
On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.
The port uses newlib, which by default does not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but t
Hi,
On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.
The port uses newlib, which by default does not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but t
Hi,
While testing with driver version 390.147 I ran into the problem that it
doesn't support ptx isa version 6.3 (the new default), only 6.1.
Furthermore, using the -mptx option is a bit user-unfriendly.
Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but
then run into
On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote:
+static const char *
+sm_version_to_string (enum ptx_isa sm)
+{
+ switch (sm)
+{
+case PTX_ISA_SM30:
+ return "30";
+case PTX_ISA_SM35:
+ return "35";
+case PTX_ISA_SM53:
+ return "53
On 2/8/22 14:24, Tobias Burnus wrote:
Hi Tom,
if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ...
On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote:
Furthermore, using the -mptx option is a bit user-unfriendly.
Say we want to compile for sm_80. We can use -misa=sm_80 to
Hi,
With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a
use of PTX_ISA_SM70, without adding it first.
Add it, as well as the corresponding TARGET_SM70.
Built for x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx] Unbreak build, add PTX_ISA_SM70
On 1/8/22 13:21, Roger Sayle wrote:
This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently negation, absolute value and floating point comparisons are
implemented by promoting to float (SFmode). This patch adds suitable
define_insns to nvptx.md, most conditional on T
On 1/10/22 11:58, Roger Sayle wrote:
One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers.
[ FWIW: I recently happened to check this, and it actually supports
.u8/.s8/.b8 regs, but indeed just for very few operations
On 1/14/22 10:54, Roger Sayle wrote:
Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this
patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions,
as previously reviewed (provisionally pre-approved) back in August 2020:
https://gcc.gnu.org/pipermail/gcc-patches/2020
On 1/16/22 12:49, Roger Sayle wrote:
This patch adds support for nvptx's BImode and.pred, or.pred and
xor.pred instructions. Technically, nvptx.md previously defined
andbi3, iorbi3 and xorbi3 instructions, but the assembly language
mnemonic output for these was incorrect (e.g. and.b1) and would
On 2/3/22 22:00, Roger Sayle wrote:
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target. This improved code generation for the
more common case of producing 0/1 Boolean values, but
On 2/8/22 14:09, Roger Sayle wrote:
Many thanks to Thomas Schwinge for confirming my hypothesis that the register
usage regression, PR target/104345, is solely due to libgcc's _muldc3
function.
In addition to the isinf functionality in the previously proposed nvptx
patch at
https://gcc.gnu.org/p
Hi,
There's a nvidia driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
signed char r;
unsigned char y = (unsigned char) 0x80;
if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
__builtin_abort ();
return 0;
}
Hi,
The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store instructions
to the same address.
This can be fixed by:
- inserting barriers between normal stores and atomic operations to a common
address
- usin
Hi,
For sm_7x atomic stores we fall back on expand_atomic_store, but this
results in using membar.sys for shared stores.
Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a
shared store.
Tested on x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx]
Hi,
The OpenACC execution model states that implementing a critical
section across workers using atomic operations and a busy-wait loop may never
succeed, since the scheduler may suspend the worker that owns the lock, in
which case the worker waiting on the lock can never complete.
Add a test-cas