nvptx: Support '-mfake-ptx-alloca': defer failure to run-time 'alloca' usage (was: [PUSHED] nvptx: Support '-mfake-ptx-alloca')

2025-04-07 Thread Thomas Schwinge
Hi! On 2025-02-27T21:51:11+0100, I wrote: > With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only > for configurations where PTX 'alloca' is not available. Rather than a > compile-time 'sorry, unimplemented: dynamic stack allocation not s

nvptx: Don't use PTX '.const', constant state space [PR119573] (was: 'TREE_READONLY' for 'const' array in C vs. C++)

2025-04-03 Thread Thomas Schwinge
Hi! I have, by the way, filed <https://gcc.gnu.org/PR119573> "nvptx: PTX '.const', constant state space" for this topic. On 2025-04-01T09:32:46+0200, Jakub Jelinek wrote: > On Tue, Apr 01, 2025 at 09:19:08AM +0200, Richard Biener via Gcc wrote: >> On Tue, Apr

Re: [PUSHED] nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]

2025-02-28 Thread Andre Vehreschild
Hi Harald, my "rant" was more about "Why would one spend time with a library meant for testing only." I totally agree that the one code base approach is one fine way to go. I did not want to insult anyone, and apologize if I did. Finally this discussion made me think, what it would need to ha

Re: [PUSHED] nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]

2025-02-28 Thread Harald Anlauf
On 28.02.25 at 08:24, Andre Vehreschild wrote: Hi Thomas, are you really telling me that gfortran's coarray test library is compiled for offloading to GPU (or other SIMD processors)? Because that's what NVPTX is used for most, right? In my opinion that makes no sense, because coarrays in Fortr

Re: [PUSHED] nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]

2025-02-27 Thread Andre Vehreschild
e got 'alloca' usage > in 'libgfortran/caf/single.c:_gfortran_caf_transfer_between_remotes', and > the libgfortran target library fails to build for legacy configurations where > PTX 'alloca' is not available: > > ../../../../source-gcc/libgfortran/caf/single.c: In function >

[PUSHED] nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]

2025-02-27 Thread Thomas Schwinge
As of recent commit 8bf0ee8d62b8a08e808344d31354ab713157e15d "Fortran: Add transfer_between_remotes [PR107635]", we've got 'alloca' usage in 'libgfortran/caf/single.c:_gfortran_caf_transfer_between_remotes', and the libgfortran target library fails to build

[PUSHED] nvptx: Support '-mfake-ptx-alloca'

2025-02-27 Thread Thomas Schwinge
With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only for configurations where PTX 'alloca' is not available. Rather than a compile-time 'sorry, unimplemented: dynamic stack allocation not supported' in presence of dynamic stack allocation,
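To make the scenario concrete, here is a minimal C function with run-time-sized stack allocation, the kind of code the description refers to. The function name is hypothetical, and the sketch uses GCC's '__builtin_alloca'. As described above, on nvptx configurations without PTX 'alloca' such code would normally be rejected at compile time with 'sorry, unimplemented: dynamic stack allocation not supported', and '-mfake-ptx-alloca' changes that behavior only for those configurations. Compiled for an ordinary host, the sketch simply runs:

```c
#include <stddef.h>

/* Hypothetical example: sum the first N squares using a scratch buffer
   whose size is only known at run time.  '__builtin_alloca' performs
   dynamic stack allocation; the storage is released on return.  */
long
sum_of_squares_scratch (size_t n)
{
  long *buf = __builtin_alloca (n * sizeof *buf);
  long sum = 0;

  for (size_t i = 0; i < n; i++)
    buf[i] = (long) (i * i);
  for (size_t i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}
```

For example, `sum_of_squares_scratch (4)` returns 0 + 1 + 4 + 9 = 14.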

Refactor duplicated code into 'gcc/testsuite/lib/gcc-dg.exp:find-dg-do-what' (was: nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181])

2025-02-22 Thread Thomas Schwinge
Hi! On 2025-02-22T22:49:47+0100, I wrote: > On 2025-01-09T14:21:18+0100, I wrote: >> Pushed to trunk branch commit 3861d362ec7e3c50742fc43833fe9d8674f4070e >> "nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]", >> [...]

Refactor duplicated code into 'gcc/testsuite/lib/gcc-dg.exp:find-dg-do-what' (was: nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181])

2025-02-22 Thread Thomas Schwinge
Hi! On 2025-01-09T14:21:18+0100, I wrote: > Pushed to trunk branch commit 3861d362ec7e3c50742fc43833fe9d8674f4070e > "nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]", > [...] > --- a/gcc/testsuite/lib/target-supports.exp > +

nvptx: Gracefully handle '-mptx=3.1' if neither sm_30 nor sm_35 multilib variant is built (was: nvptx: Support '--with-multilib-list' (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, t

2025-01-20 Thread Thomas Schwinge
Hi! On 2024-12-06T12:03:22+0100, I wrote: > Pushed to trunk branch commit 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6 > "nvptx: Support '--with-multilib-list'", [...] Pushed to trunk branch commit 6c5937991bd744a4916e9cf65eb5d9c9b5706120 "nvptx: Gracefully handle '-mptx=3.1' if neither sm_30 nor sm_

nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181] (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2025-01-09 Thread Thomas Schwinge
Hi! On 2024-09-20T18:49:46+0200, I wrote: > We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler" > to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04). > This is, primarily, so that we're able to use 'alloca'

nvptx: For '-march=sm_52' and higher, default at least to '-mptx=7.3' (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2025-01-08 Thread Thomas Schwinge
Hi! On 2024-09-20T18:49:46+0200, I wrote: > We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler" > to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04). > This is, primarily, so that we're able to use 'alloca'
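The two rules named in these thread titles can be expressed as small predicates: PTX 'alloca' is used for '-mptx=7.3' and later together with '-march=sm_52' and later, and for sm_52 and higher the default PTX ISA version is raised to at least 7.3. The C sketch below encodes only those two rules. The function names, the integer encoding (73 for PTX ISA 7.3, 52 for sm_52), and the 'other_default' parameter (whatever default would otherwise apply) are all hypothetical and are not GCC's internal representation:

```c
#include <stdbool.h>

/* Hypothetical encoding: PTX ISA version X.Y is 10*X + Y (7.3 -> 73),
   and architecture sm_NN is NN.  */

/* PTX 'alloca' is available for '-mptx=7.3'+ combined with
   '-march=sm_52'+.  */
bool
ptx_alloca_available (int ptx_version, int sm)
{
  return ptx_version >= 73 && sm >= 52;
}

/* For '-march=sm_52' and higher, default at least to '-mptx=7.3';
   otherwise keep OTHER_DEFAULT unchanged.  */
int
default_ptx_version (int sm, int other_default)
{
  if (sm >= 52 && other_default < 73)
    return 73;
  return other_default;
}
```

Combining the two, selecting sm_52 without an explicit '-mptx' ends up with a default that makes PTX 'alloca' usable, while an explicit lower '-mptx' does not.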

[PUSHED 2/2] nvptx: Handle '__builtin_stack_save()' in a well-behaved way for PTX "native" stacks [PR65181]

2025-01-08 Thread Thomas Schwinge
)] + "!TARGET_SOFT_STACK" +{ + /* The concept of a '%stack' pointer doesn't apply like this for + PTX "native" stacks. GCC however occasionally synthesizes + '__builtin_stack_save ()', '__builtin_stack_restore ()', and is
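One common way GCC ends up synthesizing a '__builtin_stack_save ()' / '__builtin_stack_restore ()' pair is a variable-length array whose block is entered repeatedly, for example inside a loop: the stack pointer is typically saved on entry to the block and restored on exit, so each iteration's array storage is reclaimed. A minimal example of such source code (the function name is hypothetical):

```c
/* Sum a ROWS x ROW_LEN pattern, using a fresh variable-length array per
   loop iteration.  GCC typically brackets the inner block with stack
   save/restore so the VLA's storage does not accumulate across
   iterations.  ROW_LEN must be positive.  */
int
sum_rows (int rows, int row_len)
{
  int total = 0;

  for (int r = 0; r < rows; r++)
    {
      int row[row_len];            /* variable-length array */

      for (int c = 0; c < row_len; c++)
        row[c] = r + c;
      for (int c = 0; c < row_len; c++)
        total += row[c];
    }
  return total;
}
```

With PTX "native" stacks there is no user-visible '%stack' pointer to save in this way, which is why such synthesized calls need special handling.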

[PUSHED] nvptx: Clarify that the PTX "native" stack pointer is handled implicitly at function level [PR65181]

2025-01-08 Thread Thomas Schwinge
-86,6 +86,13 @@ #define Pmode (TARGET_ABI64 ? DImode : SImode) #define STACK_SIZE_MODE Pmode +/* We always have to maintain the '-msoft-stack' pointer, but the PTX "native" + stack pointer is handled implicitly at function level. */ +#define STACK_SAVEAREA_MODE(LEV

nvptx: Switch default from '-march=sm_30' to '-march=sm_52' (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2024-12-09 Thread Thomas Schwinge
Hi! On 2024-09-20T18:49:46+0200, Thomas Schwinge wrote: > We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler" > to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04). > This is, primarily, so that we're able to use '

nvptx: Clarify that our baseline is PTX ISA Version 3.1 (was: [committed][nvptx] Choose -mptx default based on -misa)

2024-12-06 Thread Thomas Schwinge
N_4_2, >PTX_VERSION_6_0, >PTX_VERSION_6_3, >PTX_VERSION_7_0 Pushed to trunk branch commit 380ceb23b130a2b9ec541607a3eb1ffd0387c576 "nvptx: Clarify that our baseline is PTX ISA Version 3.1", see attached. Regards, Thomas >From 380ceb23b130a2b9ec541607a3eb1ffd0387c5

nvptx: Support '--with-multilib-list' (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2024-12-06 Thread Thomas Schwinge
duplicates are filtered out. +If @option{--with-multilib-list} is not specified, then +@option{--with-multilib-list=default} is assumed. +For @samp{sm_30}, @samp{sm_35} target libraries, @option{-mptx-3.1} +sub-variants are additionally built. + @item riscv*-*-* @var{list} is a single ABI name.

Re: GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools p

2024-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2024 at 12:53:31PM +0100, Thomas Schwinge wrote: > >From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001 > From: Thomas Schwinge > Date: Thu, 7 Mar 2024 12:31:52 +0100 > Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in > 'libhsa-runtime64.so.1', 'l

GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patch

2024-03-07 Thread Thomas Schwinge
Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: > [...] If the nvptx libgomp plugin is installed, but libcuda.so.1 > can't be found, then the plugin behaves as if there are no PTX devices > available. [...] ACK. > --- libgomp/plugin/plugin-nvptx.c.jj 2017-01-13 12:07:5
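The run-time behavior discussed here can be sketched with POSIX 'dlopen'/'dlsym': if 'libcuda.so.1' cannot be loaded, report zero devices (the 2017 behavior quoted above); if the library loads but a required symbol is missing, fail hard (the behavior named in the 2024 subject line). This is a sketch under stated assumptions, not libgomp's actual plugin code: the function names are hypothetical, 'CUresult' is approximated as 'int' with 0 meaning 'CUDA_SUCCESS', and only 'cuInit' and 'cuDeviceGetCount' are looked up. On glibc older than 2.34, link with '-ldl'.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the CUDA Driver API's 'CUresult'; 0 is CUDA_SUCCESS.  */
typedef int cu_result;
typedef cu_result (*cu_init_fn) (unsigned int flags);
typedef cu_result (*cu_device_get_count_fn) (int *count);

/* Resolve SYM in HANDLE; a missing symbol is treated as fatal.  */
static void *
lookup_or_die (void *handle, const char *sym)
{
  void *p = dlsym (handle, sym);
  if (p == NULL)
    {
      fprintf (stderr, "fatal: symbol '%s' missing in libcuda.so.1\n", sym);
      exit (EXIT_FAILURE);
    }
  return p;
}

/* Hypothetical: return the number of CUDA devices, or 0 if
   'libcuda.so.1' cannot be loaded (behave as if no PTX devices are
   available).  The handle is deliberately kept open, as a plugin would
   keep using the resolved functions.  */
int
sketch_cuda_device_count (void)
{
  void *handle = dlopen ("libcuda.so.1", RTLD_LAZY);
  if (handle == NULL)
    return 0;

  cu_init_fn init = (cu_init_fn) lookup_or_die (handle, "cuInit");
  cu_device_get_count_fn get_count
    = (cu_device_get_count_fn) lookup_or_die (handle, "cuDeviceGetCount");

  int count = 0;
  if (init (0) != 0 || get_count (&count) != 0)
    count = 0;
  return count;
}
```

Distinguishing "library absent" (benign, no devices) from "library present but incomplete" (likely a broken installation) is the design point: the latter silently degrading to zero devices would hide a real problem.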

Move 'g++.dg/abi/nvptx-[...].C' -> 'g++.target/nvptx/abi-[...].C' (was: [PTX] parameters and return values)

2023-09-18 Thread Thomas Schwinge
,4 +1,4 @@ -// { dg-do compile { target nvptx-*-* } } +// { dg-do compile } // { dg-additional-options "-m64" } // Check NRV optimization doesn't change the PTX prototypes. diff --git a/gcc/testsuite/g++.dg/abi/nvptx-ptrmem1.C b/gcc/testsuite/g++.target/nvptx/abi-ptrmem1.C sim

Fix up 'g++.dg/abi/nvptx-ptrmem1.C' (was: [PTX] more register cleanups)

2023-09-18 Thread Thomas Schwinge
Hi! On 2015-12-15T15:49:16-0500, Nathan Sidwell wrote: > this patch uses reg_names array to emit register names, rather than have > knowledge scattered throughout the PTX backend. Also, converted > write_fn_proto_from_insn to use (renamed) write_arg_mode and (new) > write_return_m

'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-05-18 Thread Thomas Schwinge
Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: > cuda.h header included > in this patch In order to be able to use that file without changes for nvptx-tools 'nvptx-run', I've pushed to GCC master branch commit 86f64400a5692499856d41462461327b93f82b8d "'include/cuda/cuda.h': Add parts nece

'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-05-18 Thread Thomas Schwinge
Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: > cuda.h header included > in this patch To make this '#include'able in C++ code, I've pushed to master branch commit bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab "'include/cuda/cuda.h': For C++, wrap in 'extern "C"'", see attached. Regards, Thom
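Wrapping a header in 'extern "C"' is the standard idiom for making a C header '#include'able from C++: the guards are active only when '__cplusplus' is defined, so a C compiler sees plain declarations while a C++ compiler gives them C linkage (no name mangling). A minimal sketch of a header with that shape; 'example.h' and 'example_add' are hypothetical, and a definition is appended only so the fragment is self-contained:

```c
/* example.h (hypothetical): usable from both C and C++.  */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#ifdef __cplusplus
extern "C" {
#endif

/* Declarations in this region get C linkage when included from C++,
   so the symbol name matches the one emitted by the C compiler.  */
int example_add (int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* Definition, normally in a separate .c file compiled as C.  */
int
example_add (int a, int b)
{
  return a + b;
}
```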

libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)

2022-05-12 Thread Thomas Schwinge
Hi! On 2014-09-23T19:19:31+0100, Julian Brown wrote: > This patch contains the bulk of the OpenACC 2.0 runtime support, > building around, or on top of, the OpenMP 4.0 support (as previously > posted or already extant upstream) where we could. [...] > --- a/libgomp/Makefile.am > +++ b/libgomp/Ma

Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA

2022-04-08 Thread Tom de Vries via Gcc-patches
On 4/8/22 00:27, Thomas Schwinge wrote: Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: Especially for distributions it is undesirable to need to have proprietary CUDA libraries and headers installed when building GCC. --- libgomp/plugin/configfrag.ac.jj 2017-01-13 12:07:56.

libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA bein

2022-04-07 Thread Thomas Schwinge
toolexeclib_HEADERS = libgomp.spec # -Wc is only a libtool option. @@ -559,16 +569,18 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \ oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \ oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \ affinity-fmt.c teams.c al

Re: Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-04-06 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 06, 2022 at 02:39:18PM +0200, Thomas Schwinge wrote: > ... so that it may be used by other projects that inherit GCC's 'include' > directory. > > include/ > * cuda/cuda.h: New file. > libgomp/ > * plugin/cuda/cuda.h: Remove file. > * plugin/plugin-nvptx.c

Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-04-06 Thread Thomas Schwinge
Hi! On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote: > Especially for distributions it is undesirable to need to have proprietary > CUDA libraries and headers installed when building GCC. > I've talked to our lawyers and they said that the cuda.h header included > in this patch doesn't infringe

Re: [RFC][nvptx] Initialize ptx regs

2022-02-21 Thread Richard Biener via Gcc-patches
rsion 510.47.03 and board GT 1030 I, we run > >> into: > >> ... > >> FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test > >> FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test > >> FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test > >>

Re: [RFC][nvptx] Initialize ptx regs

2022-02-21 Thread Tom de Vries via Gcc-patches
-O2 execution test FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test ... while the test-cases pass with nvptx-none-run -O0. The problem is that the generated ptx contains a read from an uninitialized ptx register, and the driver JIT doesn't handle this well. For -O2 and -O3, we can ge

Re: [RFC][nvptx] Initialize ptx regs

2022-02-20 Thread Richard Biener via Gcc-patches
on test > FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test > ... > while the test-cases pass with nvptx-none-run -O0. > > The problem is that the generated ptx contains a read from an uninitialized > ptx register, and the driver JIT doesn't handle this well. >

[RFC][nvptx] Initialize ptx regs

2022-02-20 Thread Tom de Vries via Gcc-patches
nvptx-none-run -O0. The problem is that the generated ptx contains a read from an uninitialized ptx register, and the driver JIT doesn't handle this well. For -O2 and -O3, we can get rid of the FAIL using --param logical-op-non-short-circuit=0. But not for -O1. At -O1, the test-case minimiz

PTX code generation (was: [PATCH] PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0)

2022-02-04 Thread Thomas Schwinge
x back end is generating some rather high-level IR (PTX) targeting a "black hole": not knowing what exactly the Nvidia/CUDA Driver, PTX -> SASS compiler are going to do with it. (Well, similar problem also exists for more traditional ISAs if CPU microcode etc. is involved, but it'

[committed][nvptx] Update default ptx isa to 6.3

2022-02-01 Thread Tom de Vries via Gcc-patches
y setting the ptx isa to 6.3 by default, which allows the use of shfl.sync. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Update default ptx isa to 6.3 gcc/ChangeLog: 2022-01-27 Tom de Vries * config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 b

[committed][nvptx] Update bar.sync for ptx isa 6.0

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, In ptx isa 6.0, a new barrier instruction was added, and bar.sync was redefined as barrier.sync.aligned. The aligned modifier indicates that all threads in a CTA will execute the same barrier instruction. That seems fine for a form "bar.sync 0". But a "bar.sync %rx,64"

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Tom de Vries via Gcc-patches
On 1/5/22 11:33, Andrew Stubbs wrote: On 05/01/2022 10:24, Tom de Vries wrote: On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Andrew Stubbs
On 05/01/2022 10:24, Tom de Vries wrote: On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Tom de Vries via Gcc-patches
On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out, privately, that the default version is

[PATCH] nvptx: bump default to PTX 4.1

2021-12-21 Thread Andrew Stubbs
On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out, privately, that the default version is both documented and encoded in the -mptx

Re: [PATCH] nvptx: Add support for PTX highpart multiplications (e.g. mul.hi.s32)

2020-08-04 Thread Tom de Vries
me time. > > Ok for mainline (once the previous patch has been approved/pushed)? I've committed the HImode/SImode part of the patches (as attached below). DImode part is OK once the respective tests starts passing. Thanks, - Tom [PATCH] nvptx: Add support for PTX highpart multiplications (

[PATCH] nvptx: Add support for PTX highpart multiplications (e.g. mul.hi.s32)

2020-08-04 Thread Roger Sayle
This patch adds support for signed and unsigned, HImode, SImode and DImode highpart multiplications to the nvptx backend. Without the middle-end patch that I've just posted, the middle-end is able to (easily) make use of the narrow four of the six instructions, but with that patch, all six of the
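For reference, a highpart multiplication returns the upper half of the double-width product. In portable C it is usually written by widening, multiplying, and shifting, which is the kind of source pattern the middle end can recognize as a highpart multiply (and which, with this patch, the nvptx back end can emit as e.g. PTX 'mul.hi.s32'). A sketch of the 16- and 32-bit signed and unsigned cases; the function names are hypothetical. Right-shifting a negative value is implementation-defined in C; GCC performs an arithmetic shift.

```c
#include <stdint.h>

/* Upper 16 bits of the 32-bit product (cf. PTX 'mul.hi.s16').  */
int16_t
mulhi_s16 (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * b) >> 16);
}

/* Upper 16 bits of the 32-bit product (cf. PTX 'mul.hi.u16').  */
uint16_t
mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * b) >> 16);
}

/* Upper 32 bits of the 64-bit product (cf. PTX 'mul.hi.s32').  */
int32_t
mulhi_s32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * b) >> 32);
}

/* Upper 32 bits of the 64-bit product (cf. PTX 'mul.hi.u32').  */
uint32_t
mulhi_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}
```

For example, `mulhi_u32 (0xFFFFFFFFu, 0xFFFFFFFFu)` is 0xFFFFFFFE, since (2^32 - 1)^2 = 2^64 - 2^33 + 1. The 64-bit case follows the same pattern with a 128-bit intermediate, such as GCC's 'unsigned __int128' where the host supports it.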

Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)

2019-01-15 Thread Tom de Vries
[ add gcc-patches@ ] On 15-01-19 11:38, Tom de Vries wrote: > Hi > > Copied from here ( > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00532.html ): >> This too. Retested for libgomp/NVPTX. >> >> OK for trunk now? >> > > The plugin-nvptx.c part looks ok to me, for stage 1. > > Thanks, > - Tom

Re: [PATCH,PTX] Add support for CUDA 9

2018-03-02 Thread Thomas Schwinge
Hi! On Tue, 27 Feb 2018 15:12:47 +0100, Richard Biener wrote: > On Tue, 27 Feb 2018, Thomas Schwinge wrote: > > Given that several users have run into this, is this (trunk r256891) OK > > to commit to open release branches, too. > > Sure. Committed to gcc-7-branch in r258126: commit f0888f1155

Re: [PATCH,PTX] Add support for CUDA 9

2018-02-27 Thread Richard Biener
On Tue, 27 Feb 2018, Thomas Schwinge wrote: > Hi! > > Given that several users have run into this, is this (trunk r256891) OK > to commit to open release branches, too. Sure. > On Fri, 19 Jan 2018 09:42:08 +0100, Tom de Vries > wrote: > > On 01/19/2018 01:59 AM, Cesar Philippidis wrote: > > >

Re: [PATCH,PTX] Add support for CUDA 9

2018-02-27 Thread Thomas Schwinge
Hi! Given that several users have run into this, is this (trunk r256891) OK to commit to open release branches, too? On Fri, 19 Jan 2018 09:42:08 +0100, Tom de Vries wrote: > On 01/19/2018 01:59 AM, Cesar Philippidis wrote: > > Here's the updated patch with the changes that you requested. There

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-19 Thread Tom de Vries
On 01/19/2018 01:59 AM, Cesar Philippidis wrote: Here's the updated patch with the changes that you requested. There are no new regressions in trunk. I tested it on my desktop running driver 387.34 on a Pascal GPU. Is this OK for trunk? OK with 'PR target/83790' added to the changelog entry.

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-18 Thread Cesar Philippidis
On 12/19/2017 04:39 PM, Tom de Vries wrote: > On 12/20/2017 12:25 AM, Cesar Philippidis wrote: >> og7-ptx-cuda9.diff >> >> >> 2017-12-19  Cesar Philippidis  >> >> gcc/ >> * config/nvptx/nvptx.c (output_init_frag): Don't use generic addres

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-17 Thread Tom de Vries
On 01/17/2018 06:29 PM, Cesar Philippidis wrote: Is this patch OK for trunk? You haven't made the changes I've asked for, this is the same patch as before. Thanks, - Tom

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-17 Thread Cesar Philippidis
On 12/27/2017 01:16 AM, Tom de Vries wrote: > On 12/21/2017 06:19 PM, Cesar Philippidis wrote: >> My test results are somewhat inconsistent. On MG's build servers, there >> are no regressions in CUDA 8. > > Ack. > >> On my laptop, there are fewer regressions >> in CUDA 9, than CUDA 8. > > If th

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-27 Thread Tom de Vries
On 12/21/2017 06:19 PM, Cesar Philippidis wrote: My test results are somewhat inconsistent. On MG's build servers, there are no regressions in CUDA 8. Ack. On my laptop, there are fewer regressions in CUDA 9, than CUDA 8. If the patch causes regressions for either cuda 8 or cuda 9, then th

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-21 Thread Cesar Philippidis
ctions >>>> as generic address spaces as part of their PTX 6.0 changes. More >>>> specifically, >>>> <http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-6-0>:

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-20 Thread Tom de Vries
On 12/20/2017 11:59 PM, Cesar Philippidis wrote: On 12/19/2017 04:39 PM, Tom de Vries wrote: On 12/20/2017 12:25 AM, Cesar Philippidis wrote: In CUDA 9, Nvidia removed support for treating the labels of functions as generic address spaces as part of their PTX 6.0 changes. More specifically

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-20 Thread Cesar Philippidis
On 12/19/2017 04:39 PM, Tom de Vries wrote: > On 12/20/2017 12:25 AM, Cesar Philippidis wrote: >> In CUDA 9, Nvidia removed support for treating the labels of functions >> as generic address spaces as part of their PTX 6.0 changes. More >> specifically, >> <http:/

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-19 Thread Tom de Vries
On 12/20/2017 12:25 AM, Cesar Philippidis wrote: In CUDA 9, Nvidia removed support for treating the labels of functions as generic address spaces as part of their PTX 6.0 changes. More specifically, <http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-

[PATCH,PTX] Add support for CUDA 9

2017-12-19 Thread Cesar Philippidis
In CUDA 9, Nvidia removed support for treating the labels of functions as generic address spaces as part of their PTX 6.0 changes. More specifically, <http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-6-0>: Support for taking address of labels,

Re: [PTX] simplify movs

2017-05-22 Thread Nathan Sidwell
On 05/21/2017 03:35 AM, Tom de Vries wrote: On 12/02/2015 04:09 PM, Nathan Sidwell wrote: +/* Output a pattern for a move instruction. */ + +const char * +nvptx_output_mov_insn (rtx dst, rtx src) +{ src_inner uses dst_mode rather than GET_MODE (src). I'm trying to understand if that is inten

Re: [PTX] simplify movs

2017-05-21 Thread Tom de Vries
On 12/02/2015 04:09 PM, Nathan Sidwell wrote: +/* Output a pattern for a move instruction. */ + +const char * +nvptx_output_mov_insn (rtx dst, rtx src) +{ + machine_mode dst_mode = GET_MODE (dst); + machine_mode dst_inner = (GET_CODE (dst) == SUBREG + ? GET_MODE (XEXP

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-05-04 Thread Thomas Schwinge
Hi! On Wed, 3 May 2017 11:00:14 +0200, Jakub Jelinek wrote: > On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote: > > > In order to configure gcc to load libcuda.so.1 dynamically, > > > one has to either configure it --without-cuda-driver, or without > > > --with-cuda-driver=/--with-

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-05-03 Thread Jakub Jelinek
-o FILE Write output to FILE\n\ -v Be verbose\n\ + --verify Do verify output is acceptable to ptxas\n\ --no-verify Do not verify output is acceptable to ptxas\n\ --help Print this help and exit\n\ --version

Re: [PATCHv2 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-03-31 Thread Thomas Schwinge
Hi! On Wed, 22 Mar 2017 18:46:30 +0300, Alexander Monakov wrote: > This patchset implements privatization of addressable variables in OpenMP SIMD > regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in > the review of the previous submission. [...] Given that the subject

[PATCHv2 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-03-22 Thread Alexander Monakov
Hello, This patchset implements privatization of addressable variables in OpenMP SIMD regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in the review of the previous submission. Now instead of explicitly privatizing those variables as fields of an allocated struct up front
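For context, the case the patchset targets is an addressable variable that is private to an OpenMP 'simd' loop: on SIMT targets such as NVPTX, each lane must get its own copy even though the variable's address escapes into a call. A minimal example of such a loop (the function names are hypothetical). Compiled without '-fopenmp' the pragma is ignored and the loop runs sequentially, and the reduction yields the same result either way:

```c
/* Store twice V into *OUT.  Passing '&tmp' below makes 'tmp'
   addressable.  */
static void
store_doubled (int *out, int v)
{
  *out = 2 * v;
}

/* Sum 2*a[i] over N elements.  'tmp' is private to each SIMD lane and
   its address is taken inside the loop body.  */
int
sum_doubled (const int *a, int n)
{
  int sum = 0;
  int tmp;

#pragma omp simd private(tmp) reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      store_doubled (&tmp, a[i]);  /* each lane needs its own 'tmp' */
      sum += tmp;
    }
  return sum;
}
```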

libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)

2017-02-02 Thread Thomas Schwinge
Hi! On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown wrote: > This patch contains the bulk of the OpenACC 2.0 runtime support, [...] > --- /dev/null > +++ b/libgomp/plugin-nvptx.c > +void > +PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs, > + size_t *sizes, unsign

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 08:09:27PM +0300, Alexander Monakov wrote: > > That said, I think pointers to gimple stmts in struct loop or something > > similar is problematic, you'd need to adjust those whenever something would > > remove those stmts, or e.g. duplicate the loop and stmts, handle those >

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote: > > Yes; I imagine the approach taken in patch 2/5 can be extended to achieve > > this. > > That is, instead of just storing a flag 'bool in_simtreg' in struct loop, > > store > > pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar >

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 06:44:39PM +0300, Alexander Monakov wrote: > > That said, I understand how would you add these &varN arguments during > > lowering, but don't understand what would you want to do during inlining, > > if you have addressable vars in inlined function, you need to avoid > > esc

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote: > IFN_ASAN_POISON is treated that way too. That also means that if a > variable is previously addressable and the only spot that takes its address > is that IFN, it can be rewritten into SSA form, but the IFN has to be > adjusted to something different whic

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 04:28:14PM +0300, Alexander Monakov wrote: > Hi, > > Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to > allow passing privatized variables to it by reference without making them > addressable. I now see that such special-casing is already done f

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
Hi, Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to allow passing privatized variables to it by reference without making them addressable. I now see that such special-casing is already done for IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-25 Thread Alexander Monakov
Hi, Here's a different approach that doesn't introduce indirection for privatized variables at all, and keeps dependencies obvious in the IR, but, on the flip side, requires mentioning all subfields of privatized structures in a few places. For each privatized variable, add it to the list of outp

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-21 Thread Jakub Jelinek
On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote: > > In order to configure gcc to load libcuda.so.1 dynamically, > > one has to either configure it --without-cuda-driver, or without > > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include= > > options if cuda.h and

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-21 Thread Thomas Schwinge
t; These two patches allow building GCC without CUDA around in a way that later > on can offload to PTX if libcuda.so.1 is installed Thanks! I'd like to have some additional changes done; see the attached patch, and also some further comments below. > In order to configure gcc to load l

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Jakub Jelinek wrote: > On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote: > > > One of the problems with that is that it means that you can't easily turn > > > addressable private variables into non-addressable ones once you force > > > them > > > into such str

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 06:09:35PM +0300, Alexander Monakov wrote: > > -#ifdef __LP64__ > > +#if defined(__LP64__) || defined(_WIN64) > > > > (is that the right define for 64-bit MingW, right?). > > Yes, _WIN64; libsanitizer has a similar test. Alternatively, I guess, > > #if __SIZEOF_POINTER
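The '__LP64__' / '_WIN64' test quoted above picks a device-pointer type as wide as a host pointer; '_WIN64' is needed because 64-bit Windows (including MinGW-w64) is LLP64 and does not define '__LP64__'. A sketch of that selection with a compile-time check of the intended invariant; the typedef name 'sketch_CUdeviceptr' is hypothetical, chosen to avoid clashing with the real 'cuda.h':

```c
#if defined(__LP64__) || defined(_WIN64)
typedef unsigned long long sketch_CUdeviceptr;  /* 64-bit hosts */
#else
typedef unsigned sketch_CUdeviceptr;            /* 32-bit hosts */
#endif

/* The device pointer type is meant to be exactly as wide as a host
   pointer; compilation fails on a host where that does not hold.  */
_Static_assert (sizeof (sketch_CUdeviceptr) == sizeof (void *),
                "device pointer width must match host pointer width");
```

The '__SIZEOF_POINTER__'-based alternative mentioned in the reply would test the pointer size directly instead of inferring it from the data model.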

Re: Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Cesar Philippidis
On 01/18/2017 06:22 AM, Richard Biener wrote: > On Wed, Jan 18, 2017 at 3:11 PM, Alexander Monakov wrote: >> On Wed, 18 Jan 2017, Richard Biener wrote: After OpenMP lowering, inlining might break this by inlining functions with address-taken locals into SIMD regions. For now, such inlin

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote: > > Sorry for not noticing this earlier, but ... > > > > > +#ifdef __LP64__ > > > +typedef unsigned long long CUdeviceptr; > > > +#else > > > +typedef unsigned CUdeviceptr; > > > +#endif

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote: > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > > Inlining needs to do just like omp-low; if we take the current framework, > > > it > > > would need to collect addressable locals into one struct, replace > > > references to > > >

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > Inlining needs to do just like omp-low; if we take the current framework, it > > would need to collect addressable locals into one struct, replace > > references to > > those locals by field references in the inlined body. Then it needs to > > appropr

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 11:00 AM, Jakub Jelinek wrote: > On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote: >> > But in the escape analysis we could consider all the specially marked >> > "omp simt private" addressable vars to escape and thus confine them into >> > the >> > SIMT regi

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
re cannot exist other stores to this variable in the SIMT region (because otherwise the original OpenMP SIMD code contained a race). So I cannot see how this can break program semantics. Do you mean the formal race of writing the same value from active lanes? On PTX that is well-defined, and the b

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote: > > But in the escape analysis we could consider all the specially marked > > "omp simt private" addressable vars to escape and thus confine them into the > > SIMT region that way, right? > > We could. But that doesn't prevent vars f

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 10:44 AM, Alexander Monakov wrote: > On Thu, 19 Jan 2017, Richard Biener wrote: >> >> What about motion in the other direction, upwards across SIMT_ENTER()? >> > >> > I think this is a question for Richard, whether it can be done in the alias >> > oracle. If yes, it suppos

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
Jan 2017, Jakub Jelinek wrote: >> >> > We are talking here about addressable vars, right (so if we turn it into >> >> > non-addressable, in the SIMT region we just use the normal PTX pseudos), >> >> > right? We could emit inner ={v} {CLOBBER};

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Richard Biener wrote: > >> What about motion in the other direction, upwards across SIMT_ENTER()? > > > > I think this is a question for Richard, whether it can be done in the alias > > oracle. If yes, it supposedly can be done for both SIMT_ENTER and > > SIMT_EXIT. > > Code

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
about addressable vars, right (so if we turn it into > >> > non-addressable, in the SIMT region we just use the normal PTX pseudos), > >> > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it > >> > clear it shouldn't be moved afterwards.

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
in the SIMT region we just use the normal PTX pseudos), >> > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it >> > clear it shouldn't be moved afterwards. For the private vars used directly >> > in SIMD region, for the vars from inlined funct

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> Sorry for not noticing this earlier, but ...
>
> > +#ifdef __LP64__
> > +typedef unsigned long long CUdeviceptr;
> > +#else
> > +typedef unsigned CUdeviceptr;
> > +#endif
>
> I think this #ifdef doesn't do the right thing on Min

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-18 Thread Alexander Monakov
Hello Jakub,

Sorry for not noticing this earlier, but ...

> +#ifdef __LP64__
> +typedef unsigned long long CUdeviceptr;
> +#else
> +typedef unsigned CUdeviceptr;
> +#endif

I think this #ifdef doesn't do the right thing on MinGW. Would it be fine to simplify it? In my code I have typedef uint
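The MinGW concern follows from the data model: 64-bit Windows is LLP64, so MinGW-w64 does not define `__LP64__` even though pointers are 64 bits wide, and the `#ifdef` above would pick the 32-bit `unsigned` there. The preview is cut off before Alexander's own typedef, so the following is not his proposal; it is a hypothetical sketch that keys the choice on the host pointer width (via `UINTPTR_MAX`) instead, assuming, as the original `#ifdef` does, that `CUdeviceptr` should follow the host pointer size.

```c
#include <stdint.h>

/* Hypothetical alternative to testing __LP64__: select the width from the
   host pointer size, which also covers LLP64 targets such as 64-bit MinGW,
   where __LP64__ is not defined although pointers are 64 bits wide. */
#if UINTPTR_MAX > 0xffffffffu
typedef unsigned long long CUdeviceptr;
#else
typedef unsigned CUdeviceptr;
#endif

/* Compile-time check that the typedef is pointer-sized on this host. */
_Static_assert(sizeof(CUdeviceptr) == sizeof(void *),
               "CUdeviceptr must match the host pointer width");
```

Using the `<stdint.h>` limit avoids enumerating per-platform macros, at the cost of assuming `UINTPTR_MAX` is provided (it is optional in ISO C, though available on mainstream hosts).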

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 08:02:14PM +0300, Alexander Monakov wrote: > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > > It is, but I think my approach is compatible with inlining too (and has a > > > more > > > localized impact on the compiler). > > > > But your 2/5 patch disables inlining into the

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > It is, but I think my approach is compatible with inlining too (and has a > > more > > localized impact on the compiler). > > But your 2/5 patch disables inlining into the SIMT regions. Or do you mean > the approach with some new IFN for the pointers

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 07:15:34PM +0300, Alexander Monakov wrote: > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > We are talking here about addressable vars, right (so if we turn it into > > non-addressable, in the SIMT region we just use the normal PTX pseudos), > > right?

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > We are talking here about addressable vars, right (so if we turn it into > non-addressable, in the SIMT region we just use the normal PTX pseudos), > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it > clear it should

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
> Also, we'd need to ensure IPA-SRA propagates the magic flag when decomposing > structs. We are talking here about addressable vars, right (so if we turn it into non-addressable, in the SIMT region we just use the normal PTX pseudos), right? We could emit inner ={v} {CLOBBER}; before SI

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote: > > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic > > > attributes > > > on them during omplower time and then only fi

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote: > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes > > on them during omplower time and then only finalized into the magic .local > > alloca in the pass_omp_d

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes > on them during omplower time and then only finalized into the magic .local > alloca in the pass_omp_device_lower pass? No (see my adjacent response): it can't be a variable fl

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 03:32:56PM +0100, Jakub Jelinek wrote:
> > It probably is.
> >
> > But I guess I was asking whether you could initially emit
> >
> >   void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
> >
> >   for (int i = n1; i < n2; i++)
> >     foo (&tmp);
> >
> >   IFN_GOMP_SIMT_EXIT (omp_

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Richard Biener wrote:
> But I guess I was asking whether you could initially emit
>
>   void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
>
>   for (int i = n1; i < n2; i++)
>     foo (&tmp);
>
>   IFN_GOMP_SIMT_EXIT (omp_simt);
>
> and only after inlining do liveness / use analysi
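For orientation, Richard's IR sketch corresponds roughly to a source loop in which a private variable of an `omp simd` loop has its address taken. The following is a minimal C sketch under that assumption; `foo` and `simd_sum` are hypothetical names, not code from the patch series. Per OpenMP semantics each SIMD lane gets its own `tmp`; when such a loop runs on PTX in SIMT fashion, that per-lane copy of an addressable variable is what the `SIMT_ENTER`/`SIMT_EXIT` region has to provide.

```c
#include <assert.h>

/* Hypothetical callee: it receives the address of the loop's private
   variable, so tmp is "addressable" rather than a plain register value. */
static void foo(int *p) { *p *= 2; }

/* Sums 2*i over [n1, n2).  With -fopenmp each SIMD lane has its own tmp;
   without -fopenmp the pragma is ignored and the loop runs serially. */
long simd_sum(int n1, int n2)
{
  long sum = 0;
  int tmp;

  assert(n1 <= n2);
#pragma omp simd private(tmp) reduction(+:sum)
  for (int i = n1; i < n2; i++)
    {
      tmp = i;
      foo(&tmp);   /* taking &tmp makes tmp addressable */
      sum += tmp;
    }
  return sum;
}
```

If `foo` is inlined, `tmp` may stop being addressable; if a callee with its own address-taken locals is inlined instead, those locals would also need per-lane storage, which is the inlining question discussed in this thread.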

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 03:22:15PM +0100, Richard Biener wrote: > > So I guess a way to keep allocation layout implicit until after inlining is > > this: instead of exposing the helper struct in the IR immediately, somehow > > keep > > it on the side, associated only with the SIMT region, and not

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Richard Biener
On Wed, Jan 18, 2017 at 3:11 PM, Alexander Monakov wrote: > On Wed, 18 Jan 2017, Richard Biener wrote: >> > After OpenMP lowering, inlining might break this by inlining functions with >> > address-taken locals into SIMD regions. For now, such inlining is >> > disallowed >> > (this penalizes only

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Richard Biener wrote: > > After OpenMP lowering, inlining might break this by inlining functions with > > address-taken locals into SIMD regions. For now, such inlining is > > disallowed > > (this penalizes only SIMT code), but eventually that can be handled by > > collecting
