[PING][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-07-12 Thread Tom de Vries via Gcc-patches
[ dropped gdb-patches, since already applied there. ] On 6/27/22 15:38, Tom de Vries wrote: On 6/27/22 15:03, Tom de Vries wrote: Hi, When building gdb with --enabled-shared, I run into: ... ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \    `.rodata' can not be use

Re: [PING][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-07-12 Thread Tom de Vries via Gcc-patches
On 7/12/22 15:59, Iain Sandoe wrote: Hi Tom On 12 Jul 2022, at 14:42, Tom de Vries via Gcc-patches wrote: [ dropped gdb-patches, since already applied there. ] On 6/27/22 15:38, Tom de Vries wrote: On 6/27/22 15:03, Tom de Vries wrote: Hi, When building gdb with --enabled-shared, I run

Re: [PATCH] nvptx: Add a __PTX_ISA__ predefined macro based on target ISA.

2021-08-24 Thread Tom de Vries via Gcc-patches
On 8/20/21 12:54 AM, Roger Sayle wrote: > > This patch adds a __PTX_ISA__ predefined macro to the nvptx backend that > allows code to check the compute model being targeted by the compiler. Hi Roger, The naming __PTX_ISA__ is consistent with the naming of -misa=sm_30/sm_35. The -misa=sm_30/sm_3

Re: [wwwdocs] gcc-12/changes.html: nvptx - new __PTX_SM__ macro

2021-08-31 Thread Tom de Vries via Gcc-patches
On 8/30/21 12:54 PM, Tobias Burnus wrote: > Document Roger's patch > https://gcc.gnu.org/g:3c496e92d795a8fe5c527e3c5b5a6606669ae50d > > OK? Suggestions? > LGTM. Thanks, - Tom

Re: [committed][nvptx] Add bar.warp.sync

2022-09-14 Thread Tom de Vries via Gcc-patches
On 9/14/22 11:41, Thomas Schwinge wrote: Hi Tom! On 2022-02-01T19:31:13+0100, Tom de Vries via Gcc-patches wrote: On a GT 1030 (sm_61), with driver version 470.94 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1

Re: [committed][nvptx] Add uniform_warp_check insn

2022-09-14 Thread Tom de Vries via Gcc-patches
On 9/14/22 11:41, Thomas Schwinge wrote: Hi Tom! On 2022-02-01T19:31:27+0100, Tom de Vries via Gcc-patches wrote: Hi, On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1

Re: [PING^5] nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup)

2022-09-18 Thread Tom de Vries via Gcc-patches
On 8/6/22 21:20, Thomas Schwinge wrote: Hi Tom! Hi Thomas, thanks for doing this. Series approved. As I mentioned, I'm not completely happy with the multilib name, but I don't think it makes sense to post-pone approval for this. Thanks, - Tom Ping. Grüße Thomas On 2022-07-27T17:4

[PATCH] Add --without-makeinfo

2022-10-04 Thread Tom de Vries via Gcc-patches
Hi, Currently, we cannot build gdb without makeinfo installed. It would be convenient to work around this by using the configure flag MAKEINFO=/usr/bin/true or some such, but that doesn't work because top-level configure requires a makeinfo of at least version 4.7, and that version check fails fo

Re: Restore default 'sorry' 'TARGET_ASM_CONSTRUCTOR', 'TARGET_ASM_DESTRUCTOR' (was: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality)

2022-10-10 Thread Tom de Vries via Gcc-patches
On 10/10/22 16:19, Thomas Schwinge wrote: With that, OK to push? FWIW, nvptx change looks in the obvious category to me. Thanks, - Tom

[committed][nvptx, testsuite] Add gcc.target/nvptx/sm*.c

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi, Add a few test-cases that test passing each -misa=sm_xx version and verify that the proper __PTX_SM__ is defined. Tested on nvptx. Committed to trunk. Thanks, - Tom [nvptx, testsuite] Add gcc.target/nvptx/sm*.c gcc/testsuite/ChangeLog: 2022-02-25 Tom de Vries * gcc.target/nvp

[committed][nvptx] Add nvptx-sm.def

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi, Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in the port, like so: ... NVPTX_SM(30, NVPTX_SM_SEP) NVPTX_SM(35, NVPTX_SM_SEP) NVPTX_SM(53, NVPTX_SM_SEP) NVPTX_SM(70, NVPTX_SM_SEP) NVPTX_SM(75, NVPTX_SM_SEP) NVPTX_SM(80,) ... and use it in various places using a pa
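The NVPTX_SM(..., NVPTX_SM_SEP) list quoted above has the shape of an X-macro definition file. Below is a minimal sketch of how such a list can be consumed. It assumes the list is inlined as a macro (the hypothetical SM_LIST) instead of being pulled in with #include, and the array and function names are illustrative only, not taken from the nvptx port.

```c
#include <stddef.h>

/* Stand-in for the contents of nvptx-sm.def, inlined so the sketch is
   self-contained.  SM_LIST is a hypothetical name.  */
#define SM_LIST                 \
  NVPTX_SM (30, NVPTX_SM_SEP)   \
  NVPTX_SM (35, NVPTX_SM_SEP)   \
  NVPTX_SM (53, NVPTX_SM_SEP)   \
  NVPTX_SM (70, NVPTX_SM_SEP)   \
  NVPTX_SM (75, NVPTX_SM_SEP)   \
  NVPTX_SM (80,)

/* First use: expand each entry to its number, separated by commas.  */
#define NVPTX_SM(XX, SEP) XX SEP
#define NVPTX_SM_SEP ,
static const int sm_versions[] = { SM_LIST };
#undef NVPTX_SM
#undef NVPTX_SM_SEP

/* Second use: expand each entry to the string "sm_XX".  */
#define NVPTX_SM(XX, SEP) "sm_" #XX SEP
#define NVPTX_SM_SEP ,
static const char *const sm_names[] = { SM_LIST };
#undef NVPTX_SM
#undef NVPTX_SM_SEP

/* Number of entries in the list.  */
size_t sm_count (void)
{
  return sizeof sm_versions / sizeof sm_versions[0];
}

/* Return the "sm_XX" name for VERSION, or NULL if it is not listed.  */
const char *sm_name_for (int version)
{
  for (size_t i = 0; i < sm_count (); i++)
    if (sm_versions[i] == version)
      return sm_names[i];
  return NULL;
}
```

Redefining NVPTX_SM and NVPTX_SM_SEP before each expansion is what lets a single list drive several tables, so a new sm_xx value only has to be added in one place.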

[committed][nvptx] Use nvptx-sm.def for t-omp-device

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi, Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate omp-device-properties-nvptx. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Use nvptx-sm.def for t-omp-device gcc/ChangeLog: 2022-02-25 Tom de Vries * config/nvptx

[committed][nvptx] Add nvptx-gen.h and nvptx-gen.opt

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi, Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and: - include nvptx-gen.h in nvptx.h, and - add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters). Tested on nvptx. Committed to trunk. Thanks, - Tom [nvptx] Add nvptx-gen.h and nvptx-gen.opt gcc/

[committed][nvptx] Handle DCmode in define_expand "omp_simt_xchg_{bfly,idx}"

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi, For a test-case doing an openmp target simd reduction on a complex double: ... DOUBLE COMPLEX :: counter_N0 ... !$OMP TARGET SIMD reduction(+: counter_N0) ... we run into: ... during RTL pass: expand b.f90: In function ‘MAIN__._omp_fn.0’: b.f90:23:32: internal compiler error: in expand_i

[committed][nvptx] Add -mptx=_ in gcc.target/nvptx/smxx.c

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi, With target board nvptx-none-run/-mptx=3.1 we run into: ... cc1: error: PTX version (-mptx) needs to be at least 4.2 to support \ selected -misa (sm_53)^M compiler exited with status 1 FAIL: gcc.target/nvptx/sm53.c (test for excess errors) ... Fix this by adding -mptx=_ in sm53.c and simila

[committed][nvptx] Use --no-verify for sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi, In PR97348, we ran into the problem that recent CUDA dropped support for sm_30, which inhibited the build when building with CUDA bin in the path, because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification. To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35

[committed][nvptx] Build libraries with misa=sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi, In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35 code (due to libraries being built with the default -misa=sm_35), so it won't run on an sm_30 board. Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10. Committed to trunk. Thanks, - To

[committed][nvptx] Build libraries with mptx=3.1

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi, In gcc-5 to gcc-11, the ptx isa version was 3.1. On trunk, the default is now 6.0, which is also what will be the value in the libraries. Consequently, there may be setups with an older driver that worked with gcc-11, but will become unsupported with gcc-12. Fix this by building the librari

[PING][PATCH][final] Handle compiler-generated asm insn

2022-03-09 Thread Tom de Vries via Gcc-patches
On 2/22/22 14:55, Tom de Vries wrote: Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP //

[committed][nvptx] Restore default to sm_30

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the intention was that the sm_xx for all libraries was switched back to sm_30 using MULTILIB_EXTRA_OPTS, without changing the default sm_35. Testing on an sm_30 board revealed that some libs were still built with sm_35, so fi

[committed][nvptx] Add multilib mptx=3.1

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the intention was that the ptx isa version for all libraries was switched back to 3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0. Further testing revealed that this is not the case, and some libs were still bui

[committed][nvptx] Use atom.and.b64 instead of atom.b64.and

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, The ptx manual prescribes the instruction format atom{.space}.op.type but the compiler currently emits: ... atom.b64.and %r31, [%r30], %r32; ... which uses the instruction format atom{.space}.type.op. Fix this by emitting instead: ... atom.and.b64 %r31, [%r30], %r32; ... Tested on nvptx

[committed][nvptx] Use bit-bucket operand for atom insns

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, For an atomic fetch operation that doesn't use the result: ... __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED); ... we currently emit: ... atom.add.u64 %r26, [%r25], %r27; ... Detect the REG_UNUSED reg-note for %r26, and emit instead: ... atom.add.u64 _, [%r25], %r27; ... Likewise for

[committed][nvptx] Handle unused result in nvptx_unisimt_handle_set

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, For an example: ... #pragma omp target map(tofrom: counter_N0) #pragma omp simd for (int i = 0 ; i < 1 ; i++ ) { #pragma omp atomic update counter_N0 = counter_N0 + 1 ; } ... I noticed that the result of the atomic update (%r30) is propagated: ... @%r33 atom.add.u32

[committed][nvptx] Disable warp sync in simt region

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, I ran into a hang for this code: ... #pragma omp target map(tofrom: counter_N0) #pragma omp simd for (int i = 0 ; i < 1 ; i++ ) { #pragma omp atomic update counter_N0 = counter_N0 + 1 ; } ... This has to do with the nature of -muniform-simt. It has two modes of oper

[committed][nvptx] Use no,yes for attribute predicable

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi, The documentation states about the predicable instruction attribute: ... This attribute must be a boolean (i.e. have exactly two elements in its list-of-values), with the possible values being no and yes. ... The nvptx port has instead: ... (define_attr "predicable" "false,true" (const_stri

PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Tom de Vries via Gcc-patches
On 3/2/22 20:18, Jeff Law via Gcc-patches wrote: On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote: On Mon, 28 Feb 2022, Tobias Burnus wrote: Ping**3 On 23.02.22 09:42, Tobias Burnus wrote: PING**2 for the ME review or at least comments to that patch, which fixes a build issue/ICE

[PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-17 Thread Tom de Vries via Gcc-patches
On 3/9/22 13:50, Tom de Vries wrote: On 2/22/22 14:55, Tom de Vries wrote: Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP

[PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches
Hi, The test-case included in this patch contains: ... #pragma omp taskloop simd shared(a) lastprivate(myId) ... This is translated to 3 taskloop statements in gimple, visible with -fdump-tree-gimple: ... #pragma omp taskloop private(D.2124) #pragma omp taskloop shared(a) shared(myId) pri

Re: [PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches
On 3/18/22 14:01, Jakub Jelinek wrote: On Fri, Mar 18, 2022 at 01:44:00PM +0100, Tom de Vries wrote: The test-case included in this patch contains: ... #pragma omp taskloop simd shared(a) lastprivate(myId) ... This is translated to 3 taskloop statements in gimple, visible with -fdump-tree-gi

[committed][openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR

2022-03-18 Thread Tom de Vries via Gcc-patches
Hi, Consider test-case pr104952-1.c, included in this commit, containing: ... #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) ... When run on x86_64 with nvptx accelerator, the test-case either aborts or hangs. The reduction clause is translated by th

Re: [PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches
On 3/18/22 15:56, Jakub Jelinek wrote: On Fri, Mar 18, 2022 at 03:42:48PM +0100, Tom de Vries wrote: And for NVPTX we somehow lower the taskloop into GIMPLE_ASM or how we end up ICEing? In the nvptx backend, gen_comment (triggering not very frequently atm) uses gen_rtx_ASM_INPUT_loc with as l

Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches
On 3/21/22 08:58, Richard Biener wrote: On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches wrote: On 3/9/22 13:50, Tom de Vries wrote: On 2/22/22 14:55, Tom de Vries wrote: Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/test

Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches
On 3/21/22 14:49, Richard Biener wrote: On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries wrote: On 3/21/22 08:58, Richard Biener wrote: On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches wrote: On 3/9/22 13:50, Tom de Vries wrote: On 2/22/22 14:55, Tom de Vries wrote: Hi, For

[committed][nvptx] Add warp sync at simt exit

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi, Consider this code (with N defined to 1024): ... float v = 0.0; #pragma omp target map(tofrom: v) #pragma omp parallel for simd for (int i = 0 ; i < N; i++) { #pragma omp atomic update v = v + 1.0; } ... It hangs when executing on target board unix/-foffload=-misa=

[committed][nvptx] Use .alias directive for mptx >= 6.3

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi, Starting with ptx isa version 6.3, a ptx directive .alias is available. Use this directive to support symbol aliases, as far as possible. The alias support is off by default. It can be turned on using a switch -malias. Furthermore, for pre-sm_75, it's not effective unless the ptx version is

[committed][nvptx] Add mexperimental

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi, Add new option -mexperimental. This allows, rather than developing a new feature to completion in a development branch, to develop a new feature on trunk, without disturbing trunk. The equivalent of the feature branch merge then becomes making the functionality available for -mno-experimenta

[committed][nvptx] Limit HFmode support to mexperimental

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi, With PR104489 still open and end-of-stage-4 approaching, classify HFmode support as experimental, which is not enabled by default but can be enabled using -mexperimental. This fixes the nvptx build when the default sm_xx is set to sm_53 or higher. Note that we're not using -mfp16 or some suc

[committed][nvptx] Use '%' as register prefix

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi, The percentage sign as first character of a ptx identifier can be used to avoid name conflicts, e.g., between user-defined variable names and compiler-generated names. The insn nvptx_uniform_warp_check contains register names without '%' prefix, which potentially could lead to name conflicts

[PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches
Hi, On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this: ... FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test ... which minimized to: ... #include <stdatomic.h> atomic_flag a = ATOMIC_FLAG_INIT; int main () { if ((atomic_flag_test_and_set) (&a)) __builtin_abort ();

Re: [PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches
On 3/24/22 10:02, Jakub Jelinek wrote: On Thu, Mar 24, 2022 at 09:28:15AM +0100, Tom de Vries via Gcc-patches wrote: Hi, On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this: ... FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test ... which minimized to: ... #include

Re: [PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches
On 3/24/22 11:59, Jakub Jelinek wrote: On Thu, Mar 24, 2022 at 11:01:30AM +0100, Tom de Vries wrote: Shouldn't that be instead return (woldval & ((UWORD) -1 << shift)) != 0; or return (woldval & ((UWORD) ~(UWORD) 0 << shift)) != 0; ? Well, I used '(woldval & wval) == wval' based on the
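Both expressions quoted above mask the old word value before testing it. The following is a minimal sketch of why that masking matters when a one-byte flag is emulated with a word-sized atomic operation. It is not the libatomic code: the function name, the 32-bit word, the byte numbering (counted from the least significant byte of the word value) and the "true" value of 1 are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Set the flag stored in byte BYTE_INDEX (0..3) of *WORD and return whether
   that flag was already set.  Hypothetical helper, for illustration only.  */
bool tas_byte_in_word (_Atomic uint32_t *word, unsigned byte_index)
{
  unsigned shift = byte_index * 8;
  uint32_t wval = (uint32_t) 1 << shift;   /* assumed "true" value of 1 */

  /* Atomically OR the flag bit into the containing word.  */
  uint32_t woldval = atomic_fetch_or (word, wval);

  /* Only the bits belonging to our byte may decide the result.  An unmasked
     test such as 'woldval != 0' would also react to neighbouring bytes.  */
  return (woldval & wval) == wval;
}
```

With a neighbouring byte already nonzero, the first call on a clear flag still has to report "was not set", which is exactly the case the masked comparison handles.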

[PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches
Hi, When a display manager is running on an nvidia card, all CUDA kernel launches get a 5 seconds watchdog timer. Consequently, when running the libgomp testsuite with nvptx accelerator and GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this: ... libgomp: cuStreamSynchronize error: the launch ti

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches
On 3/25/22 11:04, Tobias Burnus wrote: On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote: On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote: [...] Fix this by scaling down the failing test-cases. Tested on x86_64-linux with nvptx accelerator. [...] Will defer to Thomas, as it i

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches
On 3/25/22 13:35, Thomas Schwinge wrote: Hi! On 2022-03-25T13:08:52+0100, Tom de Vries wrote: On 3/25/22 11:04, Tobias Burnus wrote: On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote: On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote: [...] Fix this by scaling down the faili

[PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches
Hi, When building an nvptx offloading configuration on openSUSE Leap 15.3, the site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec: ... | # If user did not specify libexecdir, set the correct target: | # Nor

Re: [PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches
On 3/28/22 10:49, Richard Biener wrote: On Mon, 28 Mar 2022, Tom de Vries wrote: Hi, When building an nvptx offloading configuration on openSUSE Leap 15.3, the site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}

Re: [PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches
On 3/28/22 14:04, Richard Biener wrote: On Mon, 28 Mar 2022, Andreas Schwab wrote: On Mär 28 2022, Richard Biener via Gcc-patches wrote: OK in principle, but I have no idea on how portable $(libexecdir:\$(exec_prefix)/%=%) is going to be? We already require GNU make, don't we? We should

[committed][nvptx] Improve help description of misa and mptx

2022-03-28 Thread Tom de Vries via Gcc-patches
Hi, Currently we have: ... $ gcc --target-help 2>&1 | egrep "misa|mptx" -misa= Specify the version of the ptx ISA to use. -mptx= Specify the version of the ptx version to use. Known PTX ISA versions (for use with the -misa= option): Known PTX versi

[committed][nvptx] Add march alias for misa

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi, The target option misa has the following description: ... $ gcc --target-help 2>&1 | grep misa -misa= Specify the PTX ISA target architecture to use. ... The name misa is somewhat poorly chosen. It suggests that for a use -misa=sm_30, sm_30 is the name of a specific In

[committed][nvptx] Add march-map

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi, Say we have an sm_50 board, and we want to run a benchmark using the highest possible march setting. Currently there's march=sm_30, march=sm_35, march=sm_53, but no march=sm_50. So, we'd need to pick march=sm_35. Likewise, for a test script that handles multiple boards, we'd need a mapping
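The mapping described above (an sm_50 board ends up with march=sm_35) amounts to picking the highest supported value that does not exceed the board's capability. A minimal sketch of that selection follows; the table uses the sm_xx values named elsewhere in this listing, and the function name map_march is hypothetical, not the option's actual implementation.

```c
#include <stddef.h>

/* march values assumed to be supported, taken from the sm_xx list quoted
   in this listing (30, 35, 53, 70, 75, 80).  */
static const int supported_march[] = { 30, 35, 53, 70, 75, 80 };

/* Return the highest supported march value that does not exceed BOARD_SM,
   or -1 if the board is older than every supported value.  */
int map_march (int board_sm)
{
  int best = -1;
  for (size_t i = 0; i < sizeof supported_march / sizeof supported_march[0]; i++)
    if (supported_march[i] <= board_sm && supported_march[i] > best)
      best = supported_march[i];
  return best;
}
```

For example, map_march (50) yields 35, matching the sm_50 case in the message.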

[committed][nvptx] Update help text for m64

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi, In the docs we have for m64: ... Ignored, but preserved for backward compatibility. Only 64-bit ABI is supported. ... But with --target-help, we have instead: ... $ gcc --target-help ... -m64 Generate code for a 64-bit ABI. ... which could be interpreted as meaning that generating cod

[PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi, Update nvptx documentation: - Use meaningful terms: "PTX ISA target architecture" and "PTX ISA version". - Remove invalid claim that "ISA strings must be lower-case". - Add missing sm_xx entries. - Fix default ISA. - Add march, copying misa doc. - Declare misa an march alias. - Add march-map.

[committed][nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi, Add preprocessor macros __PTX_ISA_VERSION_MAJOR__ and __PTX_ISA_VERSION_MINOR__. For the default 6.0, we have: ... $ echo | cc1 -E -dD - 2>&1 | grep PTX_ISA_VERSION #define __PTX_ISA_VERSION_MAJOR__ 6 #define __PTX_ISA_VERSION_MINOR__ 0 ... and for 3.1, we have: ... $ echo | cc1 -mptx=3.1
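A minimal sketch of how code might test these two macros follows. The helper names and the 0.0 fallback for compilers that do not define the macros are assumptions for illustration; only __PTX_ISA_VERSION_MAJOR__ and __PTX_ISA_VERSION_MINOR__ come from the message.

```c
/* Nonzero if version MAJOR.MINOR is at least WANT_MAJOR.WANT_MINOR.  */
int ptx_version_at_least (int major, int minor, int want_major, int want_minor)
{
  return major > want_major
         || (major == want_major && minor >= want_minor);
}

/* Use the predefined macros when the compiler provides them; otherwise
   (e.g. when compiling for the host) fall back to 0.0, an arbitrary
   choice made for this sketch.  */
#if defined (__PTX_ISA_VERSION_MAJOR__) && defined (__PTX_ISA_VERSION_MINOR__)
# define PTX_ISA_MAJOR __PTX_ISA_VERSION_MAJOR__
# define PTX_ISA_MINOR __PTX_ISA_VERSION_MINOR__
#else
# define PTX_ISA_MAJOR 0
# define PTX_ISA_MINOR 0
#endif

/* Nonzero if the targeted PTX ISA version is 6.3 or later.  */
int have_ptx_6_3 (void)
{
  return ptx_version_at_least (PTX_ISA_MAJOR, PTX_ISA_MINOR, 6, 3);
}
```

Comparing major and minor separately avoids encoding assumptions such as "6.3 is 63", which would break if a minor version ever reached two digits.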

Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches
On 3/29/22 16:28, Tobias Burnus wrote: Hi Tom, On 29.03.22 15:39, Tom de Vries wrote: Any comments? +(e.g.@: @samp{sm_35}).  Valid architecture strings are @samp{sm_30}, +@samp{sm_35}, @samp{sm_53} @samp{sm_70}, @samp{sm_75} and +@samp{sm_80}.  The default target architecture is sm_30. Missin

Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches
On 3/29/22 16:47, Tobias Burnus wrote: On 29.03.22 16:28, Tobias Burnus wrote: On 29.03.22 15:39, Tom de Vries wrote: Any comments? I think it would be useful to have additionally some wording for the (new in GCC 12/new since today) macros, Agreed. i.e. something like: --- a/gcc/doc/inv

Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches
On 3/30/22 11:02, Tobias Burnus wrote: On 30.03.22 10:03, Tom de Vries wrote: On 3/29/22 16:47, Tobias Burnus wrote: I think it would be useful to have additionally some wording for the (new in GCC 12/new since today) macros, [...] The macro is defined also if the option is not specified, so

[wwwdocs][patch] gcc-12: Nvptx updates.

2022-03-30 Thread Tom de Vries via Gcc-patches
[ was: Re: [wwwdocs][patch] gcc-12/changes.html: Document -misa update for nvptx ] On 3/3/22 13:27, Tobias Burnus wrote: The current wording, https://gcc.gnu.org/gcc-12/changes.html#nvptx , is outdated and (now wrongly) encourages to use -mptx=. Updated as follows. I've taken these changes a

[committed][nvptx] Fix ASM_SPEC workaround for sm_30

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi, Newer versions of CUDA no longer support sm_30, and nvptx-tools currently doesn't handle that gracefully when verifying ( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ). There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't work when using -Wa,--verify on

[committed][nvptx, testsuite] Fix typo in gcc.target/nvptx/march.c

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi, The dg-options line in gcc.target/nvptx/march.c: ... /* { dg-options "-march=sm_30"} */ ... currently doesn't have any effect because it's missing a space between '"' and '}'. Fix this by adding the missing space. Tested on nvptx. Committed to trunk. Thanks, - Tom [nvptx, testsuite] Fix t

[committed][nvptx, testsuite] Fix gcc.target/nvptx/alias-*.c on sm_80

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi, When running test-cases gcc.target/nvptx/alias-*.c on target board nvptx-none-run/-misa=sm_80 we run into fails because the test-cases add -mptx=6.3, which doesn't support sm_80. Fix this by only adding -mptx=6.3 if necessary, and simplify the test-cases by using ptx_alias feature abstraction

[committed][libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi, When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an RTX A2000 (sm_86) with driver 510.60.02 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ output p

[PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi, When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run into: ... FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \ -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.fortran/examples-

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches
On 4/1/22 14:28, Thomas Schwinge wrote: Hi Tom! On 2022-04-01T13:24:40+0200, Tom de Vries wrote: When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run into: ... FAIL: libgomp.fortran/examples-4

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches
On 4/1/22 17:38, Jakub Jelinek wrote: On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote: Do you perhaps have an idea why it's failing? Because you call on_device_arch_nvptx () outside of !$omp target region, so unless the host device is NVPTX, it will not be true. That bit does w

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-04 Thread Tom de Vries via Gcc-patches
On 4/1/22 17:57, Tom de Vries wrote: On 4/1/22 17:38, Jakub Jelinek wrote: On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote: Do you perhaps have an idea why it's failing? Because you call on_device_arch_nvptx () outside of !$omp target region, so unless the host device is NVPTX, i

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-04 Thread Tom de Vries via Gcc-patches
On 4/4/22 13:07, Jakub Jelinek wrote: On Mon, Apr 04, 2022 at 01:05:12PM +0200, Tom de Vries wrote: 2022-04-04 Tom de Vries * testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from parent dir. Wouldn't just ! { dg-additional-sources ../on_device_arch.c } work? I

Re: Proposal to remove '--with-cuda-driver' (was: [wwwdocs][patch] gcc-12: Nvptx updates)

2022-04-06 Thread Tom de Vries via Gcc-patches
On 4/5/22 17:14, Thomas Schwinge wrote: Hi! Still catching up with GCC/nvptx back end changes... %-) In the following I'm not discussing the patch to document "gcc-12: Nvptx updates", but rather one aspect of the "gcc-12: Nvptx updates" themselves. ;-) On 2022-03-30T14:27:41+0200, Tom de Vr

Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA

2022-04-08 Thread Tom de Vries via Gcc-patches
On 4/8/22 00:27, Thomas Schwinge wrote: Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: Especially for distributions it is undesirable to need to have proprietary CUDA libraries and headers installed when building GCC. --- libgomp/plugin/configfrag.ac.jj 2017-01-13 12:07:56.

Re: [committed][nvptx] Fix ASM_SPEC workaround for sm_30

2022-04-11 Thread Tom de Vries via Gcc-patches
On 4/7/22 16:17, Thomas Schwinge wrote: Hi! On 2022-03-31T09:40:47+0200, Tom de Vries via Gcc-patches wrote: Newer versions of CUDA no longer support sm_30, and nvptx-tools currently doesn't handle that gracefully when verifying ( https://github.com/MentorEmbedded/nvptx-tools/issu

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2022-01-31 Thread Tom de Vries via Gcc-patches
On 9/17/21 10:08, Richard Biener via Gcc-patches wrote: On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus wrote: On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote: On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus wrote: On Fri, Aug 20, 2021 at 12:35:58PM +

[PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns

2022-01-31 Thread Tom de Vries via Gcc-patches
[ was: Re: [RFC] ldist: Recognize rawmemchr loop patterns ] On 1/31/22 16:00, Richard Biener wrote: I'm running into PR56888 ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 ) on nvptx due to this, f.i. in gcc/testsuite/gcc.c-torture/execute/builtins/strlen.c, where gcc/testsuite/gcc.c-tortu

[committed][libgomp, testsuite] Reduce recursion depth in declare_target-*.f90

2022-01-31 Thread Tom de Vries via Gcc-patches
Hi, When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx accelerator (Nvidia T400, 2GB), I run into: ... libgomp: cuCtxSynchronize error: unspecified launch failure \ (perhaps abort was called) libgomp: cuMemFree_v2 error: unspecified launch failure libgomp: device finaliz

[committed][libgomp, testsuite] Fix insufficient resources in test-cases

2022-01-31 Thread Tom de Vries via Gcc-patches
Hi, When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... libgomp: The Nvidia accelerator has insufficient resources to launch \ 'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \ recompile the program with 'num_wor

[committed][nvptx] Fix reduction lock

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test FA

[committed][nvptx] Add some support for .local atomics

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing given

[committed][nvptx] Handle nop in prevent_branch_around_nothing

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, When running libgomp test-case reduction-7.c on an nvptx accelerator (T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into: ... reduction-7.exe:reduction-7.c:312: v_p_2: \ Assertion `out[j * 32 + i] == (i + j) * 2' failed. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reductio

[committed][nvptx] Update bar.sync for ptx isa 6.0

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, In ptx isa 6.0, a new barrier instruction was added, and bar.sync was redefined as barrier.sync.aligned. The aligned modifier indicates that all threads in a CTA will execute the same barrier instruction. This seems fine for the form "bar.sync 0". But a "bar.sync %rx,64" (as used for vector le

[committed][nvptx] Update default ptx isa to 6.3

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, With the following example, minimized from parallel-dims.c: ... int main (void) { int vectors_max = -1; #pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max) { for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) #pragma acc loop vector reduction (max

[committed][nvptx] Add bar.warp.sync

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, On a GT 1030 (sm_61), with driver version 470.94 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit "[nvptx

[committed][nvptx] Add uniform_warp_check insn

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit "

Re: Add 'libgomp.oacc-c-c++-common/private-atomic-1.c' [PR83812] (was: [PATCH][testsuite, nvptx] Add effective target sync_int_long_stack)

2022-02-03 Thread Tom de Vries via Gcc-patches
On 2/3/22 10:40, Thomas Schwinge wrote: Hi Tom! On 2021-05-19T14:56:17+0200, I wrote: On 2020-08-12T15:57:23+0200, Tom de Vries wrote: When enabling sync_int_long for nvptx, we run into a failure in gcc.dg/pr86314.c: ... nvptx-run: error getting kernel result: operation not supported on \

Re: [Patch][wwwdocs + gcc] nvptx – update for -mptx change – gcc-12/changes.html + gcc/docs/invoke.texi

2022-02-04 Thread Tom de Vries via Gcc-patches
On 2/2/22 09:30, Tobias Burnus wrote: This patch updates the documentation for Tom's change of the default -mptx= version - mentioning also -mptx=7.0. I forgot whether ptx = 7.0 was working fine or whether there was a reason not to mention it. A ptx version is experimental if all sm versions i

[committed][nvptx] Fix .local atomic regressions

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, In PR target/104364, two problems were reported: - in muniform-simt mode, an atom.cas insn is no longer executed in the "master lane" only. - in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is translated assuming it accesses local memory, while that's not the case. Fix
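The second problem above concerns a compare-and-swap whose target lives on the stack. As a rough host-side illustration only (the helper name `cas_on_stack` is hypothetical and is not taken from the patch or the PR), this is the kind of source construct involved, written with the GCC `__atomic_compare_exchange_n` builtin:

```c
#include <assert.h>

/* Hypothetical helper: performs a strong compare-and-swap on a variable
   that lives in the caller-invisible stack frame of this function, and
   returns the value the variable holds afterwards.  */
static int
cas_on_stack (int initial, int expected, int desired)
{
  int v = initial;  /* Stack object that is the target of the atomic op.  */

  /* If v == expected, v becomes desired; otherwise expected is updated
     with the current value of v and v is left unchanged.  */
  (void) __atomic_compare_exchange_n (&v, &expected, desired,
                                      0 /* strong */,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return v;
}

/* Small usage check for the sketch above.  */
static void
check_cas_on_stack (void)
{
  assert (cas_on_stack (1, 1, 5) == 5);  /* Match: value is replaced.  */
  assert (cas_on_stack (1, 2, 5) == 1);  /* Mismatch: value is kept.  */
}
```

This says nothing about how the nvptx backend classifies the address space of `v`; it only shows the source-level pattern.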

[committed][testsuite] Require c99_runtime to run builtin-sprintf.c

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, On nvptx, I run into an execution failure in test-case gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh' modifier. The port uses newlib, which by default does not support that modifier. There's a configure option --enable-newlib-io-c99-formats to enable this support, but t

[committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, While testing with driver version 390.147 I ran into the problem that it doesn't support ptx isa version 6.3 (the new default), only 6.1. Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but then run into
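The idea of deriving a PTX ISA default from the selected SM target could be sketched as follows. This is a minimal sketch and not the committed code: the enum, the function names and the integer encoding are hypothetical, and the SM-to-PTX table reflects my reading of the PTX ISA release notes (for example, sm_80 needing PTX 7.0), so treat those values as assumptions. Rejecting an explicitly requested version that is too old is likewise an assumed policy.

```c
#include <assert.h>

/* Hypothetical enum; the values simply mirror the SM number.  */
enum sm_target { SM_30 = 30, SM_35 = 35, SM_53 = 53,
                 SM_70 = 70, SM_75 = 75, SM_80 = 80 };

/* Returns the first PTX ISA version assumed to support SM, encoded as
   major * 10 + minor (63 means 6.3).  Returns -1 for unknown targets.  */
static int
first_ptx_supporting_sm (enum sm_target sm)
{
  switch (sm)
    {
    case SM_30: return 30;
    case SM_35: return 31;
    case SM_53: return 42;
    case SM_70: return 60;
    case SM_75: return 63;
    case SM_80: return 70;
    }
  return -1;
}

/* REQUESTED_PTX is 0 when no PTX version was given explicitly.
   Returns the version to use, or -1 if the explicit request is older
   than what SM needs (assumed policy for this sketch).  */
static int
choose_ptx (enum sm_target sm, int requested_ptx)
{
  int first = first_ptx_supporting_sm (sm);
  if (requested_ptx == 0)
    return first;
  return requested_ptx >= first ? requested_ptx : -1;
}

/* Small usage check for the sketch above.  */
static void
check_choose_ptx (void)
{
  assert (choose_ptx (SM_80, 0) == 70);   /* Default follows the target.  */
  assert (choose_ptx (SM_30, 63) == 63);  /* Newer explicit version is kept.  */
  assert (choose_ptx (SM_53, 31) == -1);  /* Too old for the target.  */
}
```

With a scheme like this, a user would only need to name the SM target, and the PTX version would follow from it.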

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote: +static const char * +sm_version_to_string (enum ptx_isa sm) +{ + switch (sm) +{ +case PTX_ISA_SM30: + return "30"; +case PTX_ISA_SM35: + return "35"; +case PTX_ISA_SM53: + return "53

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
On 2/8/22 14:24, Tobias Burnus wrote: Hi Tom, if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ... On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote: Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80.  We can use -misa=sm_80 to

[committed][nvptx] Unbreak build, add PTX_ISA_SM70

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a use of PTX_ISA_SM70, without adding it first. Add it, as well as the corresponding TARGET_SM70. Build for x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Unbreak build, add PTX_ISA_SM70

Re: [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/8/22 13:21, Roger Sayle wrote: This patch adds more support for _Float16 (HFmode) to the nvptx backend. Currently negation, absolute value and floating point comparisons are implemented by promoting to float (SFmode). This patch adds suitable define_insns to nvptx.md, most conditional on T

Re: [PATCH] nvptx: Expand QI mode operations using SI mode instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/10/22 11:58, Roger Sayle wrote: One of the unusual target features of the Nvidia PTX ISA is that it doesn't provide QI mode (byte sized) operations or registers. [ FWIW: I recently happened to check this, and it actually supports .u8/.s8/.b8 regs, but indeed just for very few operations

Re: [PATCH] nvptx: Add support for 64-bit mul.hi (and other) instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/14/22 10:54, Roger Sayle wrote: Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions, as previously reviewed (provisionally pre-approved) back in August 2020: https://gcc.gnu.org/pipermail/gcc-patches/2020
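For readers unfamiliar with a high-part multiply: it yields the upper 64 bits of the full 128-bit product. A minimal host-side sketch follows, assuming a 64-bit GCC target where `__int128` is available; the function names are hypothetical and the code is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the unsigned 128-bit product A * B.  */
static uint64_t
mulhi_u64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* Upper 64 bits of the signed 128-bit product A * B.  Relies on GCC
   implementing right shift of a negative value as an arithmetic shift.  */
static int64_t
mulhi_s64 (int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a * b) >> 64);
}

/* Small usage check for the sketch above.  */
static void
check_mulhi (void)
{
  assert (mulhi_u64 (UINT64_C (0x8000000000000000), 2) == 1);
  assert (mulhi_u64 (UINT64_MAX, UINT64_MAX) == UINT64_C (0xFFFFFFFFFFFFFFFE));
  assert (mulhi_s64 (-1, 1) == -1);  /* Product -1: high part is all ones.  */
  assert (mulhi_s64 (3, 4) == 0);    /* Small product: high part is zero.  */
}
```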

Re: [PATCH] nvptx: Fix and use BI mode logic instructions (e.g. and.pred).

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/16/22 12:49, Roger Sayle wrote: This patch adds support for nvptx's BImode and.pred, or.pred and xor.pred instructions. Technically, nvptx.md previously defined andbi3, iorbi3 and xorbi3 instructions, but the assembly language mnemonic output for these was incorrect (e.g. and.b1) and would

Re: [PATCH] PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 2/3/22 22:00, Roger Sayle wrote: This patch addresses the "increased register pressure" regression on nvptx-none caused by my change to transition the backend to a STORE_FLAG_VALUE = 1 target. This improved code generation for the more common case of producing 0/1 Boolean values, but

Re: [PATCH] nvptx: Tweak constraints on copysign instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 2/8/22 14:09, Roger Sayle wrote: Many thanks to Thomas Schwinge for confirming my hypothesis that the register usage regression, PR target/104345, is solely due to libgcc's _muldc3 function. In addition to the isinf functionality in the previously proposed nvptx patch at https://gcc.gnu.org/p

[committed][nvptx] Workaround sub.u16 driver JIT bug

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, There's an nvidia driver JIT bug that mishandles this code (minimized from builtin-arith-overflow-15.c): ... int main (void) { signed char r; unsigned char y = (unsigned char) 0x80; if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r)) __builtin_abort (); return 0; }
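To make the expected behaviour of the test-case explicit: `__builtin_sub_overflow` computes the difference as if in infinite precision and reports overflow only when the result does not fit the destination type. Here 0 - 0x80 is -128, which fits in an 8-bit signed char, so no overflow should be reported. A small sketch with a hypothetical helper name, assuming an 8-bit `signed char`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if 0 - Y does not fit in a signed
   char.  On success, *RESULT holds the difference.  */
static bool
neg_overflows_schar (unsigned char y, signed char *result)
{
  return __builtin_sub_overflow ((unsigned char) 0, y, result);
}

/* Small usage check for the sketch above.  */
static void
check_neg_overflow (void)
{
  signed char r;

  /* 0 - 128 = -128, the smallest signed char value: no overflow.  */
  assert (!neg_overflows_schar (0x80, &r));
  assert (r == -128);

  /* 0 - 129 = -129, which is out of range: overflow.  */
  assert (neg_overflows_schar (0x81, &r));
}
```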

[committed][nvptx] Handle pre-sm_7x shared atomic store using atomic exchange

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory locations do not guarantee atomicity with respect to normal store instructions to the same address. This can be fixed by: - inserting barriers between normal stores and atomic operations to a common address - usin
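The subject line's approach, expressing a store as an atomic exchange whose result is discarded, can be illustrated on the host with the GCC `__atomic` builtins. This is only a sketch of the idea under that assumption; the helper name is hypothetical, and it does not show the actual nvptx insn patterns.

```c
#include <assert.h>

/* Hypothetical helper: stores VAL to *ADDR by means of an atomic
   exchange and ignores the previous value.  Because the write is itself
   an atomic read-modify-write, it is ordered against other atomic
   operations on the same address.  */
static void
store_via_exchange (unsigned int *addr, unsigned int val)
{
  (void) __atomic_exchange_n (addr, val, __ATOMIC_RELAXED);
}

/* Small usage check for the sketch above.  */
static void
check_store_via_exchange (void)
{
  unsigned int x = 1;
  store_via_exchange (&x, 42);
  assert (x == 42);
}
```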

[committed][nvptx] Handle sm_7x shared atomic store more optimal

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, For sm_7x atomic stores we fall back on expand_atomic_store, but this results in using membar.sys for shared stores. Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a shared store. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx]

[PATCH][libgomp, openacc] Add terminating spinlock test-cases

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, The OpenACC execution model states that implementing a critical section across workers using atomic operations and a busy-wait loop may never succeed, since the scheduler may suspend the worker that owns the lock, in which case the worker waiting on the lock can never complete. Add a test-cas
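For reference, the construct under discussion (a critical section built from an atomic operation and a busy-wait loop) looks roughly like this in plain C with the GCC `__atomic` builtins. This is a host-side sketch with hypothetical names, not one of the proposed test-cases, and it contains no OpenACC directives:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type: held is 0 when free, 1 when taken.  */
struct demo_spinlock { int held; };

/* Tries once to take the lock; returns true on success.  */
static bool
demo_try_lock (struct demo_spinlock *l)
{
  return __atomic_exchange_n (&l->held, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Busy-waits until the lock is taken.  This loop is what can fail to
   terminate if the owner is never scheduled again.  */
static void
demo_lock (struct demo_spinlock *l)
{
  while (!demo_try_lock (l))
    ;
}

/* Releases the lock.  */
static void
demo_unlock (struct demo_spinlock *l)
{
  __atomic_store_n (&l->held, 0, __ATOMIC_RELEASE);
}

/* Small single-threaded usage check for the sketch above.  */
static void
check_demo_spinlock (void)
{
  struct demo_spinlock l = { 0 };
  assert (demo_try_lock (&l));   /* Free lock: acquired.  */
  assert (!demo_try_lock (&l));  /* Already held: attempt fails.  */
  demo_unlock (&l);
  demo_lock (&l);                /* Free again: returns immediately.  */
  assert (l.held == 1);
}
```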
