Hi!
On 2025-02-27T21:51:11+0100, I wrote:
> With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only
> for configurations where PTX 'alloca' is not available. Rather than a
> compile-time 'sorry, unimplemented: dynamic stack allocation not s
Hi!
I have, by the way, filed <https://gcc.gnu.org/PR119573>
"nvptx: PTX '.const', constant state space" for this topic.
On 2025-04-01T09:32:46+0200, Jakub Jelinek wrote:
> On Tue, Apr 01, 2025 at 09:19:08AM +0200, Richard Biener via Gcc wrote:
>> On Tue, Apr
Hi Harald,
my "rant" was more about "Why would one spend time with a library meant for
testing only." I totally agree that the one code base approach is one fine
way to go. I did not want to insult anyone, and I apologize if I did.
Finally this discussion made me think, what it would need to ha
On 28.02.25 at 08:24, Andre Vehreschild wrote:
Hi Thomas,
are you really telling me, that gfortran's coarray test library is compiled for
offloading to GPU (or other SIMD processors)? Because that's what NVPTX is used
for most, right? In my opinion that makes no sense, because coarrays in Fortr
e got 'alloca' usage
> in 'libgfortran/caf/single.c:_gfortran_caf_transfer_between_remotes', and
> the libgfortran target library fails to build for legacy configurations where
> PTX 'alloca' is not available:
>
> ../../../../source-gcc/libgfortran/caf/single.c: In function
>
As of recent commit 8bf0ee8d62b8a08e808344d31354ab713157e15d
"Fortran: Add transfer_between_remotes [PR107635]", we've got 'alloca' usage
in 'libgfortran/caf/single.c:_gfortran_caf_transfer_between_remotes', and
the libgfortran target library fails to build
With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only
for configurations where PTX 'alloca' is not available. Rather than a
compile-time 'sorry, unimplemented: dynamic stack allocation not supported'
in presence of dynamic stack allocation,
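For illustration, the following is a minimal sketch of code that needs dynamic stack allocation. It is not taken from libgfortran; the function name 'example_sum_reversed' is hypothetical. The size of the scratch buffer is only known at run time, so the compiler cannot lay out the frame statically.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: copy N ints into a scratch buffer whose size is
   only known at run time, then sum them in reverse order.  */
int
example_sum_reversed (const int *values, size_t n)
{
  if (n == 0)
    return 0;

  /* Dynamic stack allocation: the buffer is released automatically when
     the function returns.  */
  int *scratch = __builtin_alloca (n * sizeof (int));
  memcpy (scratch, values, n * sizeof (int));

  int sum = 0;
  for (size_t i = n; i-- > 0;)
    sum += scratch[i];
  return sum;
}
```

A variable-length array ('int scratch[n];') would put the same requirement on the target.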
Hi!
On 2025-02-22T22:49:47+0100, I wrote:
> On 2025-01-09T14:21:18+0100, I wrote:
>> Pushed to trunk branch commit 3861d362ec7e3c50742fc43833fe9d8674f4070e
>> "nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]",
>> [...]
Hi!
On 2025-01-09T14:21:18+0100, I wrote:
> Pushed to trunk branch commit 3861d362ec7e3c50742fc43833fe9d8674f4070e
> "nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]",
> [...]
> --- a/gcc/testsuite/lib/target-supports.exp
> +
Hi!
On 2024-12-06T12:03:22+0100, I wrote:
> Pushed to trunk branch commit 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6
> "nvptx: Support '--with-multilib-list'", [...]
Pushed to trunk branch commit 6c5937991bd744a4916e9cf65eb5d9c9b5706120
"nvptx: Gracefully handle '-mptx=3.1' if neither sm_30 nor sm_
Hi!
On 2024-09-20T18:49:46+0200, I wrote:
> We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler"
> to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04).
> This is, primarily, so that we're able to use 'alloca
)]
+ "!TARGET_SOFT_STACK"
+{
+ /* The concept of a '%stack' pointer doesn't apply like this for
+ PTX "native" stacks. GCC however occasionally synthesizes
+ '__builtin_stack_save ()', '__builtin_stack_restore ()', and is
-86,6 +86,13 @@
#define Pmode (TARGET_ABI64 ? DImode : SImode)
#define STACK_SIZE_MODE Pmode
+/* We always have to maintain the '-msoft-stack' pointer, but the PTX "native"
+ stack pointer is handled implicitly at function level. */
+#define STACK_SAVEAREA_MODE(LEV
Hi!
On 2024-09-20T18:49:46+0200, Thomas Schwinge wrote:
> We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler"
> to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04).
> This is, primarily, so that we're able to use
N_4_2,
>PTX_VERSION_6_0,
>PTX_VERSION_6_3,
>PTX_VERSION_7_0
Pushed to trunk branch commit 380ceb23b130a2b9ec541607a3eb1ffd0387c576
"nvptx: Clarify that our baseline is PTX ISA Version 3.1", see attached.
Regards
Thomas
>From 380ceb23b130a2b9ec541607a3eb1ffd0387c5
duplicates are filtered out.
+If @option{--with-multilib-list} is not specified, then
+@option{--with-multilib-list=default} is assumed.
+For @samp{sm_30}, @samp{sm_35} target libraries, @option{-mptx=3.1}
+sub-variants are additionally built.
+
@item riscv*-*-*
@var{list} is a single ABI name.
On Thu, Mar 07, 2024 at 12:53:31PM +0100, Thomas Schwinge wrote:
> >From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge
> Date: Thu, 7 Mar 2024 12:31:52 +0100
> Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
> 'libhsa-runtime64.so.1', 'l
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
> [...] If the nvptx libgomp plugin is installed, but libcuda.so.1
> can't be found, then the plugin behaves as if there are no PTX devices
> available. [...]
ACK.
> --- libgomp/plugin/plugin-nvptx.c.jj 2017-01-13 12:07:5
,4 +1,4 @@
-// { dg-do compile { target nvptx-*-* } }
+// { dg-do compile }
// { dg-additional-options "-m64" }
// Check NRV optimization doesn't change the PTX prototypes.
diff --git a/gcc/testsuite/g++.dg/abi/nvptx-ptrmem1.C b/gcc/testsuite/g++.target/nvptx/abi-ptrmem1.C
sim
Hi!
On 2015-12-15T15:49:16-0500, Nathan Sidwell wrote:
> this patch uses reg_names array to emit register names, rather than have
> knowledge scattered throughout the PTX backend. Also, converted
> write_fn_proto_from_insn to use (renamed) write_arg_mode and (new)
> write_return_m
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
> cuda.h header included
> in this patch
In order to be able to use that file without changes for
nvptx-tools 'nvptx-run', I've pushed to GCC master branch
commit 86f64400a5692499856d41462461327b93f82b8d
"'include/cuda/cuda.h': Add parts nece
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
> cuda.h header included
> in this patch
To make this '#include'able in C++ code, I've pushed to master branch
commit bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab
"'include/cuda/cuda.h': For C++, wrap in 'extern "C"'", see attached.
Regards
Thom
Hi!
On 2014-09-23T19:19:31+0100, Julian Brown wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support,
> building around, or on top of, the OpenMP 4.0 support (as previously
> posted or already extant upstream) where we could. [...]
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Ma
On 4/8/22 00:27, Thomas Schwinge wrote:
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.
--- libgomp/plugin/configfrag.ac.jj 2017-01-13 12:07:56.
toolexeclib_HEADERS = libgomp.spec
# -Wc is only a libtool option.
@@ -559,16 +569,18 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c al
On Wed, Apr 06, 2022 at 02:39:18PM +0200, Thomas Schwinge wrote:
> ... so that it may be used by other projects that inherit GCC's 'include'
> directory.
>
> include/
> * cuda/cuda.h: New file.
> libgomp/
> * plugin/cuda/cuda.h: Remove file.
> * plugin/plugin-nvptx.c
Hi!
On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.
> I've talked to our lawyers and they said that the cuda.h header included
> in this patch doesn't infringe
rsion 510.47.03 and board GT 1030 I, we run
> >> into:
> >> ...
> >> FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
> >> FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
> >> FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
> >>
-O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...
while the test-cases pass with nvptx-none-run -O0.
The problem is that the generated ptx contains a read from an uninitialized
ptx register, and the driver JIT doesn't handle this well.
For -O2 and -O3, we can ge
on test
> FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
> ...
> while the test-cases pass with nvptx-none-run -O0.
>
> The problem is that the generated ptx contains a read from an uninitialized
> ptx register, and the driver JIT doesn't handle this well.
>
nvptx-none-run -O0.
The problem is that the generated ptx contains a read from an uninitialized
ptx register, and the driver JIT doesn't handle this well.
For -O2 and -O3, we can get rid of the FAIL using --param
logical-op-non-short-circuit=0. But not for -O1.
At -O1, the test-case minimiz
x back end is generating some rather high-level IR (PTX)
targeting a "black hole": not knowing what exactly the Nvidia/CUDA
Driver, PTX -> SASS compiler are going to do with it. (Well, similar
problem also exists for more traditional ISAs if CPU microcode etc. is
involved, but it'
y setting the ptx isa to 6.3 by default, which allows the use of
shfl.sync.
Tested on x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx] Update default ptx isa to 6.3
gcc/ChangeLog:
2022-01-27 Tom de Vries
* config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 b
Hi,
In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.
The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.
This seems fine for the form "bar.sync 0".
But a "bar.sync %rx,64"
On 1/5/22 11:33, Andrew Stubbs wrote:
On 05/01/2022 10:24, Tom de Vries wrote:
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is
necessary to bump the minimum supported PTX version from 3.1 (~2013
On 05/01/2022 10:24, Tom de Vries wrote:
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is
necessary to bump the minimum supported PTX version from 3.1 (~2013)
to 4.1 (~2014).
Tobias has pointed out
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1
(~2014).
Tobias has pointed out, privately, that the default version is
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).
Tobias has pointed out, privately, that the default version is both
documented and encoded in the -mptx
me time.
>
> Ok for mainline (once the previous patch has been approved/pushed)?
I've committed the HImode/SImode part of the patches (as attached below).
The DImode part is OK once the respective tests start passing.
Thanks,
- Tom
[PATCH] nvptx: Add support for PTX highpart multiplications (
This patch adds support for signed and unsigned, HImode, SImode and
DImode highpart multiplications to the nvptx backend. Without the
middle-end patch that I've just posted, the middle-end is able to
(easily) make use of the narrow four of the six instructions, but
with that patch, all six of the
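As a rough C-level illustration of what a highpart multiplication computes (the upper half of the double-width product), here is a sketch covering the signed and unsigned variants for 16-, 32- and 64-bit operands. The 'example_*' names are hypothetical and are not the back end's pattern names. The 64-bit variants assume GCC's '__int128' extension, and the signed variants rely on GCC's arithmetic right shift of negative values.

```c
#include <stdint.h>

/* Unsigned 16-bit: widen first, so that the multiplication does not
   overflow after promotion to 'int'.  */
uint16_t
example_umulhi16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}

/* Signed 16-bit.  */
int16_t
example_smulhi16 (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * (int32_t) b) >> 16);
}

/* Unsigned 32-bit.  */
uint32_t
example_umulhi32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

/* Signed 32-bit.  */
int32_t
example_smulhi32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

/* Unsigned 64-bit, via the 128-bit extension type.  */
uint64_t
example_umulhi64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * (unsigned __int128) b) >> 64);
}

/* Signed 64-bit, via the 128-bit extension type.  */
int64_t
example_smulhi64 (int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a * (__int128) b) >> 64);
}
```

For example, 'example_umulhi32 (0xFFFFFFFFu, 0xFFFFFFFFu)' yields 0xFFFFFFFE, because the full product is 0xFFFFFFFE00000001.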
[ add gcc-patches@ ]
On 15-01-19 11:38, Tom de Vries wrote:
> Hi
>
> Copied from here (
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00532.html ):
>> This too. Retested for libgomp/NVPTX.
>>
>> OK for trunk now?
>>
>
> The plugin-nvptx.c part looks ok to me, for stage 1.
>
> Thanks,
> - Tom
Hi!
On Tue, 27 Feb 2018 15:12:47 +0100, Richard Biener wrote:
> On Tue, 27 Feb 2018, Thomas Schwinge wrote:
> > Given that several users have run into this, is this (trunk r256891) OK
> > to commit to open release branches, too?
>
> Sure.
Committed to gcc-7-branch in r258126:
commit f0888f1155
On Tue, 27 Feb 2018, Thomas Schwinge wrote:
> Hi!
>
> Given that several users have run into this, is this (trunk r256891) OK
> to commit to open release branches, too?
Sure.
> On Fri, 19 Jan 2018 09:42:08 +0100, Tom de Vries
> wrote:
> > On 01/19/2018 01:59 AM, Cesar Philippidis wrote:
> > >
Hi!
Given that several users have run into this, is this (trunk r256891) OK
to commit to open release branches, too?
On Fri, 19 Jan 2018 09:42:08 +0100, Tom de Vries wrote:
> On 01/19/2018 01:59 AM, Cesar Philippidis wrote:
> > Here's the updated patch with the changes that you requested. There
On 01/19/2018 01:59 AM, Cesar Philippidis wrote:
Here's the updated patch with the changes that you requested. There are
no new regressions in trunk. I tested it on my desktop running driver
387.34 on a Pascal GPU.
Is this OK for trunk?
OK with 'PR target/83790' added to the changelog entry.
On 12/19/2017 04:39 PM, Tom de Vries wrote:
> On 12/20/2017 12:25 AM, Cesar Philippidis wrote:
>> og7-ptx-cuda9.diff
>>
>>
>> 2017-12-19 Cesar Philippidis
>>
>> gcc/
>> * config/nvptx/nvptx.c (output_init_frag): Don't use generic addres
On 01/17/2018 06:29 PM, Cesar Philippidis wrote:
Is this patch OK for trunk?
You haven't made the changes I've asked for, this is the same patch as
before.
Thanks,
- Tom
On 12/27/2017 01:16 AM, Tom de Vries wrote:
> On 12/21/2017 06:19 PM, Cesar Philippidis wrote:
>> My test results are somewhat inconsistent. On MG's build servers, there
>> are no regressions in CUDA 8.
>
> Ack.
>
>> On my laptop, there are fewer regressions
>> in CUDA 9, than CUDA 8.
>
> If th
On 12/21/2017 06:19 PM, Cesar Philippidis wrote:
My test results are somewhat inconsistent. On MG's build servers, there
are no regressions in CUDA 8.
Ack.
On my laptop, there are fewer regressions
in CUDA 9, than CUDA 8.
If the patch causes regressions for either cuda 8 or cuda 9, then th
ctions
>>>> as generic address spaces as part of their PTX 6.0 changes. More
>>>> specifically,
>>>> <http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-6-0>:
>>>>
>>>>
>>>>
>>&
On 12/20/2017 11:59 PM, Cesar Philippidis wrote:
On 12/19/2017 04:39 PM, Tom de Vries wrote:
On 12/20/2017 12:25 AM, Cesar Philippidis wrote:
In CUDA 9, Nvidia removed support for treating the labels of functions
as generic address spaces as part of their PTX 6.0 changes. More
specifically
On 12/19/2017 04:39 PM, Tom de Vries wrote:
> On 12/20/2017 12:25 AM, Cesar Philippidis wrote:
>> In CUDA 9, Nvidia removed support for treating the labels of functions
>> as generic address spaces as part of their PTX 6.0 changes. More
>> specifically,
>> <http:/
On 12/20/2017 12:25 AM, Cesar Philippidis wrote:
In CUDA 9, Nvidia removed support for treating the labels of functions
as generic address spaces as part of their PTX 6.0 changes. More
specifically,
<http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-
In CUDA 9, Nvidia removed support for treating the labels of functions
as generic address spaces as part of their PTX 6.0 changes. More
specifically,
<http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-6-0>:
Support for taking address of labels,
On 05/21/2017 03:35 AM, Tom de Vries wrote:
On 12/02/2015 04:09 PM, Nathan Sidwell wrote:
+/* Output a pattern for a move instruction. */
+
+const char *
+nvptx_output_mov_insn (rtx dst, rtx src)
+{
src_inner uses dst_mode rather than GET_MODE (src). I'm trying to
understand if that is inten
On 12/02/2015 04:09 PM, Nathan Sidwell wrote:
+/* Output a pattern for a move instruction. */
+
+const char *
+nvptx_output_mov_insn (rtx dst, rtx src)
+{
+ machine_mode dst_mode = GET_MODE (dst);
+ machine_mode dst_inner = (GET_CODE (dst) == SUBREG
+ ? GET_MODE (XEXP
Hi!
On Wed, 3 May 2017 11:00:14 +0200, Jakub Jelinek wrote:
> On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > > In order to configure gcc to load libcuda.so.1 dynamically,
> > > one has to either configure it --without-cuda-driver, or without
> > > --with-cuda-driver=/--with-
-o FILE Write output to FILE\n\
-v Be verbose\n\
+ --verify Do verify output is acceptable to ptxas\n\
--no-verify Do not verify output is acceptable to ptxas\n\
--help Print this help and exit\n\
--version
Hi!
On Wed, 22 Mar 2017 18:46:30 +0300, Alexander Monakov
wrote:
> This patchset implements privatization of addressable variables in OpenMP SIMD
> regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in
> the review of the previous submission. [...]
Given that the subject
Hello,
This patchset implements privatization of addressable variables in OpenMP SIMD
regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in
the review of the previous submission.
Now instead of explicitly privatizing those variables as fields of an
allocated struct up front
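To make the subject concrete, here is a small sketch of the kind of loop under discussion: an OpenMP SIMD region with a private variable whose address is taken. The names 'example_scale' and 'example_simd_sum' are hypothetical and are not taken from the patchset. Compiled without '-fopenmp', the pragma is ignored and the loop simply runs serially.

```c
/* Hypothetical helper that receives the address of its argument, which
   makes 'tmp' below an addressable variable.  */
static void
example_scale (int *value)
{
  *value *= 2;
}

/* Sum of the doubled input elements.  Each SIMD lane needs its own copy
   of 'tmp', and that copy has to live in memory because its address is
   passed to 'example_scale'.  */
int
example_simd_sum (const int *in, int n)
{
  int sum = 0;
  int tmp;
#pragma omp simd private(tmp) reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      tmp = in[i];
      example_scale (&tmp);
      sum += tmp;
    }
  return sum;
}
```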
Hi!
On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown
wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support, [...]
> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c
> +void
> +PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
> + size_t *sizes, unsign
On Wed, Feb 01, 2017 at 08:09:27PM +0300, Alexander Monakov wrote:
> > That said, I think pointers to gimple stmts in struct loop or something
> > similar is problematic, you'd need to adjust those whenever something would
> > remove those stmts, or e.g. duplicate the loop and stmts, handle those
>
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> > Yes; I imagine the approach taken in patch 2/5 can be extended to achieve
> > this.
> > That is, instead of just storing a flag 'bool in_simtreg' in struct loop,
> > store
> > pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar
>
On Wed, Feb 01, 2017 at 06:44:39PM +0300, Alexander Monakov wrote:
> > That said, I understand how would you add these &varN arguments during
> > lowering, but don't understand what would you want to do during inlining,
> > if you have addressable vars in inlined function, you need to avoid
> > esc
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> IFN_ASAN_POISON is treated that way too. That also means that if a
> variable is previously addressable and the only spot that takes its address
> is that IFN, it can be rewritten into SSA form, but the IFN has to be
> adjusted to something different whic
On Wed, Feb 01, 2017 at 04:28:14PM +0300, Alexander Monakov wrote:
> Hi,
>
> Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to
> allow passing privatized variables to it by reference without making them
> addressable. I now see that such special-casing is already done f
Hi,
Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to
allow passing privatized variables to it by reference without making them
addressable. I now see that such special-casing is already done for
IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken
Hi,
Here's a different approach that doesn't introduce indirection for privatized
variables at all, and keeps dependencies obvious in the IR, but, on the flip
side, requires mentioning all subfields of privatized structures in a few
places.
For each privatized variable, add it to the list of outp
On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and
> These two patches allow building GCC without CUDA around in a way that later
> on can offload to PTX if libcuda.so.1 is installed
Thanks!
I'd like to have some additional changes done; see the attached patch,
and also some further comments below.
> In order to configure gcc to load l
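The general technique of loading 'libcuda.so.1' at run time can be sketched as follows. This is a simplified illustration, not the code from 'plugin-nvptx.c': the function name 'example_count_ptx_devices' and the two function pointer typedefs are hypothetical, and 'int' stands in for the 'CUresult' enum (where 0 means success). Only 'dlopen', 'dlsym', and the CUDA Driver API entry points 'cuInit' and 'cuDeviceGetCount' are real interfaces.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for the prototypes that 'cuda.h' would provide.  */
typedef int (*example_cuInit_fn) (unsigned int flags);
typedef int (*example_cuDeviceGetCount_fn) (int *count);

/* Return the number of CUDA devices, or 0 if the driver library or one
   of the needed symbols is not available, or if a driver call fails.  */
int
example_count_ptx_devices (void)
{
  /* The handle is intentionally kept open for the process lifetime.  */
  void *handle = dlopen ("libcuda.so.1", RTLD_LAZY);
  if (handle == NULL)
    return 0;

  example_cuInit_fn init
    = (example_cuInit_fn) dlsym (handle, "cuInit");
  example_cuDeviceGetCount_fn get_count
    = (example_cuDeviceGetCount_fn) dlsym (handle, "cuDeviceGetCount");
  if (init == NULL || get_count == NULL)
    return 0;

  int count = 0;
  if (init (0) != 0 || get_count (&count) != 0)
    return 0;
  return count;
}
```

With this shape, a missing 'libcuda.so.1' is indistinguishable from a system with no devices. On older glibc versions the program needs to be linked with '-ldl'.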
On Thu, 19 Jan 2017, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote:
> > > One of the problems with that is that it means that you can't easily turn
> > > addressable private variables into non-addressable ones once you force
> > > them
> > > into such str
On Thu, Jan 19, 2017 at 06:09:35PM +0300, Alexander Monakov wrote:
> > -#ifdef __LP64__
> > +#if defined(__LP64__) || defined(_WIN64)
> >
> > (that is the right define for 64-bit MinGW, right?).
>
> Yes, _WIN64; libsanitizer has a similar test. Alternatively, I guess,
>
> #if __SIZEOF_POINTER
On 01/18/2017 06:22 AM, Richard Biener wrote:
> On Wed, Jan 18, 2017 at 3:11 PM, Alexander Monakov wrote:
>> On Wed, 18 Jan 2017, Richard Biener wrote:
After OpenMP lowering, inlining might break this by inlining functions with
address-taken locals into SIMD regions. For now, such inlin
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> > Sorry for not noticing this earlier, but ...
> >
> > > +#ifdef __LP64__
> > > +typedef unsigned long long CUdeviceptr;
> > > +#else
> > > +typedef unsigned CUdeviceptr;
> > > +#endif
On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > > Inlining needs to do just like omp-low; if we take the current framework,
> > > it
> > > would need to collect addressable locals into one struct, replace
> > > references to
> > >
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > Inlining needs to do just like omp-low; if we take the current framework, it
> > would need to collect addressable locals into one struct, replace
> > references to
> > those locals by field references in the inlined body. Then it needs to
> > appropr
On Thu, Jan 19, 2017 at 11:00 AM, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
>> > But in the escape analysis we could consider all the specially marked
>> > "omp simt private" addressable vars to escape and thus confine them into
>> > the
>> > SIMT regi
re cannot exist other stores to this variable in the SIMT region
(because otherwise the original OpenMP SIMD code contained a race). So I cannot
see how this can break program semantics. Do you mean the formal race of
writing the same value from active lanes? On PTX that is well-defined, and the
b
On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
> > But in the escape analysis we could consider all the specially marked
> > "omp simt private" addressable vars to escape and thus confine them into the
> > SIMT region that way, right?
>
> We could. But that doesn't prevent vars f
On Thu, Jan 19, 2017 at 10:44 AM, Alexander Monakov wrote:
> On Thu, 19 Jan 2017, Richard Biener wrote:
>> >> What about motion in the other direction, upwards across SIMT_ENTER()?
>> >
>> > I think this is a question for Richard, whether it can be done in the alias
>> > oracle. If yes, it suppos
Jan 2017, Jakub Jelinek wrote:
>> >> > We are talking here about addressable vars, right (so if we turn it into
>> >> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
>> >> > right? We could emit inner ={v} {CLOBBER};
On Thu, 19 Jan 2017, Richard Biener wrote:
> >> What about motion in the other direction, upwards across SIMT_ENTER()?
> >
> > I think this is a question for Richard, whether it can be done in the alias
> > oracle. If yes, it supposedly can be done for both SIMT_ENTER and
> > SIMT_EXIT.
>
> Code
about addressable vars, right (so if we turn it into
> >> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
> >> > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
> >> > clear it shouldn't be moved afterwards.
in the SIMT region we just use the normal PTX pseudos),
>> > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
>> > clear it shouldn't be moved afterwards. For the private vars used directly
>> > in SIMD region, for the vars from inlined funct
On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> Sorry for not noticing this earlier, but ...
>
> > +#ifdef __LP64__
> > +typedef unsigned long long CUdeviceptr;
> > +#else
> > +typedef unsigned CUdeviceptr;
> > +#endif
>
> I think this #ifdef doesn't do the right thing on Min
Hello Jakub,
Sorry for not noticing this earlier, but ...
> +#ifdef __LP64__
> +typedef unsigned long long CUdeviceptr;
> +#else
> +typedef unsigned CUdeviceptr;
> +#endif
I think this #ifdef doesn't do the right thing on MinGW.
Would it be fine to simplify it? In my code I have
typedef uint
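A sketch of the variant that also appears in this thread, where '_WIN64' is tested in addition to '__LP64__', follows. 64-bit Windows uses an LLP64 data model and does not define '__LP64__', so the plain '#ifdef __LP64__' would pick the 32-bit type there. The names 'example_deviceptr' and 'example_deviceptr_bits' are hypothetical, chosen so as not to clash with the real 'CUdeviceptr'.

```c
#include <limits.h>

/* Pick an unsigned integer type as wide as a pointer on common 32-bit
   and 64-bit targets, including 64-bit Windows.  */
#if defined (__LP64__) || defined (_WIN64)
typedef unsigned long long example_deviceptr;
#else
typedef unsigned int example_deviceptr;
#endif

/* Width of the chosen type in bits, for a quick sanity check.  */
unsigned int
example_deviceptr_bits (void)
{
  return (unsigned int) (sizeof (example_deviceptr) * CHAR_BIT);
}
```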
On Wed, Jan 18, 2017 at 08:02:14PM +0300, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > > It is, but I think my approach is compatible with inlining too (and has a
> > > more
> > > localized impact on the compiler).
> >
> > But your 2/5 patch disables inlining into the
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > It is, but I think my approach is compatible with inlining too (and has a
> > more
> > localized impact on the compiler).
>
> But your 2/5 patch disables inlining into the SIMT regions. Or do you mean
> the approach with some new IFN for the pointers
On Wed, Jan 18, 2017 at 07:15:34PM +0300, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > We are talking here about addressable vars, right (so if we turn it into
> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
> > right?
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> We are talking here about addressable vars, right (so if we turn it into
> non-addressable, in the SIMT region we just use the normal PTX pseudos),
> right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
> clear it should
> Also, we'd need to ensure IPA-SRA propagates the magic flag when decomposing
> structs.
We are talking here about addressable vars, right (so if we turn it into
non-addressable, in the SIMT region we just use the normal PTX pseudos),
right? We could emit inner ={v} {CLOBBER}; before SI
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote:
> > On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic
> > > attributes
> > > on them during omplower time and then only fi
On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes
> > on them during omplower time and then only finalized into the magic .local
> > alloca in the pass_omp_d
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes
> on them during omplower time and then only finalized into the magic .local
> alloca in the pass_omp_device_lower pass?
No (see my adjacent response): it can't be a variable fl
On Wed, Jan 18, 2017 at 03:32:56PM +0100, Jakub Jelinek wrote:
> > It probably is.
> >
> > But I guess I was asking whether you could initially emit
> >
> > void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
> >
> > for (int i = n1; i < n2; i++)
> > foo (&tmp);
> >
> > IFN_GOMP_SIMT_EXIT (omp_
On Wed, 18 Jan 2017, Richard Biener wrote:
> But I guess I was asking whether you could initially emit
>
> void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
>
> for (int i = n1; i < n2; i++)
> foo (&tmp);
>
> IFN_GOMP_SIMT_EXIT (omp_simt);
>
> and only after inlining do liveness / use analysi
On Wed, Jan 18, 2017 at 03:22:15PM +0100, Richard Biener wrote:
> > So I guess a way to keep allocation layout implicit until after inlining is
> > this: instead of exposing the helper struct in the IR immediately, somehow
> > keep
> > it on the side, associated only with the SIMT region, and not
On Wed, Jan 18, 2017 at 3:11 PM, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Richard Biener wrote:
>> > After OpenMP lowering, inlining might break this by inlining functions with
>> > address-taken locals into SIMD regions. For now, such inlining is
>> > disallowed
>> > (this penalizes only
On Wed, 18 Jan 2017, Richard Biener wrote:
> > After OpenMP lowering, inlining might break this by inlining functions with
> > address-taken locals into SIMD regions. For now, such inlining is
> > disallowed
> > (this penalizes only SIMT code), but eventually that can be handled by
> > collecting