Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2021-01-05 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 05, 2021 at 12:13:59PM +, Julian Brown wrote: > Just to check, does my reply below address your concerns -- > particularly with regards to the current usage of CUDA streams > serializing kernel executions from different host threads? Given that > situation, and the observed speed im

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2021-01-05 Thread Julian Brown
Hi Jakub, Just to check, does my reply below address your concerns -- particularly with regards to the current usage of CUDA streams serializing kernel executions from different host threads? Given that situation, and the observed speed improvement with OpenMP offloading to NVPTX with the patch, I

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-15 Thread Julian Brown
On Tue, 15 Dec 2020 18:00:36 +0100 Jakub Jelinek wrote: > On Tue, Dec 15, 2020 at 04:49:38PM +, Julian Brown wrote: > > > Do you need to hold the omp_stacks.lock across the entire > > > offloading? Doesn't that serialize all offloading kernels to the > > > same device? I mean, can't the lock

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 15, 2020 at 04:49:38PM +, Julian Brown wrote: > > Do you need to hold the omp_stacks.lock across the entire offloading? > > Doesn't that serialize all offloading kernels to the same device? > > I mean, can't the lock be taken just shortly at the start to either > > acquire the cache
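Jakub's question concerns lock granularity. One way to read his suggestion is shown in the sketch below. The mutex is held only while claiming or returning the cached block, so two launches on the same device do not wait on each other for the whole duration of a kernel. If the cache is empty or too small, the launching thread allocates its own block. This is an illustrative sketch, not code from the patch or from libgomp. `struct stacks_pool`, `stacks_acquire` and `stacks_release` are hypothetical names, and `malloc`/`free` stand in for device memory allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-device pool holding at most one cached stacks block.  */
struct stacks_pool
{
  pthread_mutex_t lock;
  void *ptr;     /* Cached block, or NULL if none is cached right now.  */
  size_t size;   /* Size of the cached block in bytes.  */
};

/* Claim a block of at least SIZE bytes.  The lock is held only while the
   cached block is taken out of the pool; allocation happens outside it.
   The size actually obtained is stored in *GOT.  */
static void *
stacks_acquire (struct stacks_pool *pool, size_t size, size_t *got)
{
  void *ptr = NULL;
  assert (got != NULL);

  pthread_mutex_lock (&pool->lock);
  if (pool->ptr && pool->size >= size)
    {
      ptr = pool->ptr;
      *got = pool->size;
      pool->ptr = NULL;   /* Other launches cannot use it while it is ours.  */
      pool->size = 0;
    }
  pthread_mutex_unlock (&pool->lock);

  if (!ptr)
    {
      ptr = malloc (size);   /* Stand-in for a device allocation.  */
      *got = ptr ? size : 0;
    }
  return ptr;
}

/* Return a block once the kernel has finished.  The pool keeps whichever
   block is larger; the other one is freed after the lock is dropped.  */
static void
stacks_release (struct stacks_pool *pool, void *ptr, size_t size)
{
  void *victim;

  pthread_mutex_lock (&pool->lock);
  if (pool->ptr == NULL || size > pool->size)
    {
      victim = pool->ptr;
      pool->ptr = ptr;
      pool->size = size;
    }
  else
    victim = ptr;
  pthread_mutex_unlock (&pool->lock);

  free (victim);   /* free (NULL) is a no-op.  */
}
```

The trade-off is that concurrent launches may briefly each own a block, so peak memory can exceed a single cached block; in exchange, the critical section no longer spans the kernel execution.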

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-15 Thread Julian Brown
On Tue, 15 Dec 2020 14:49:40 +0100 Jakub Jelinek wrote: > On Tue, Dec 15, 2020 at 01:39:13PM +, Julian Brown wrote: > > @@ -1922,7 +1997,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void > > *tgt_vars, void **args) nvptx_adjust_launch_bounds (tgt_fn, > > ptx_dev, &teams, &threads); > >s

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 15, 2020 at 01:39:13PM +, Julian Brown wrote:
> @@ -1922,7 +1997,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
>    nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
>
>    size_t stack_size = nvptx_stacks_size ();
> -  void *stacks =

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-15 Thread Julian Brown
On Tue, 8 Dec 2020 20:11:38 +0300 Alexander Monakov wrote: > On Tue, 8 Dec 2020, Julian Brown wrote: > > > Ping? > > This has addressed my concerns, thanks. Jakub, Tom -- just to confirm, is this OK for trunk now? I noticed a slight bugfix myself in the no-stacks/out-of-memory case -- i.e.

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-08 Thread Alexander Monakov via Gcc-patches
On Tue, 8 Dec 2020, Julian Brown wrote: > Ping? This has addressed my concerns, thanks. Alexander > On Fri, 13 Nov 2020 20:54:54 + > Julian Brown wrote: > > > Hi Alexander, > > > > Thanks for the review! Comments below. > > > > On Tue, 10 Nov 2020 00:32:36 +0300 > > Alexander Monakov

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-12-07 Thread Julian Brown
Ping? Thanks, Julian On Fri, 13 Nov 2020 20:54:54 + Julian Brown wrote: > Hi Alexander, > > Thanks for the review! Comments below. > > On Tue, 10 Nov 2020 00:32:36 +0300 > Alexander Monakov wrote: > > > On Mon, 26 Oct 2020, Jakub Jelinek wrote: > > > > > On Mon, Oct 26, 2020 at 07:1

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-11-13 Thread Julian Brown
Hi Alexander, Thanks for the review! Comments below. On Tue, 10 Nov 2020 00:32:36 +0300 Alexander Monakov wrote: > On Mon, 26 Oct 2020, Jakub Jelinek wrote: > > > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote: > > > This patch adds caching for the stack block allocated for > >

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-11-09 Thread Alexander Monakov via Gcc-patches
On Mon, 26 Oct 2020, Jakub Jelinek wrote: > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote: > > This patch adds caching for the stack block allocated for offloaded > > OpenMP kernel launches on NVPTX. This is a performance optimisation -- > > we observed an average 11% or so performa

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-28 Thread Julian Brown
On Wed, 28 Oct 2020 15:25:56 +0800 Chung-Lin Tang wrote: > On 2020/10/27 9:17 PM, Julian Brown wrote: > >> And, in which context are cuStreamAddCallback registered callbacks > >> run? E.g. if it is inside of asynchronous interrupt, using locking > >> in there might not be the best thing to do.

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-28 Thread Chung-Lin Tang
On 2020/10/27 9:17 PM, Julian Brown wrote: And, in which context are cuStreamAddCallback registered callbacks run? E.g. if it is inside of asynchronous interrupt, using locking in there might not be the best thing to do. The cuStreamAddCallback API is documented here: https://docs.nvidia.com

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-27 Thread Julian Brown
(Apologies if threading is broken; for some reason I didn't receive this reply directly!) On Mon Oct 26 14:26:34 GMT 2020, Jakub Jelinek wrote: > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote: > > This patch adds caching for the stack block allocated for offloaded > > OpenMP kernel

Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-26 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote: > This patch adds caching for the stack block allocated for offloaded > OpenMP kernel launches on NVPTX. This is a performance optimisation -- > we observed an average 11% or so performance improvement with this patch > across a set of a

[PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-26 Thread Julian Brown
Hi, This patch adds caching for the stack block allocated for offloaded OpenMP kernel launches on NVPTX. This is a performance optimisation -- we observed an average 11% or so performance improvement with this patch across a set of accelerated GPU benchmarks on one machine (results vary according
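The patch description above is about reusing the stack block for offloaded OpenMP kernel launches instead of allocating it afresh on every launch. As a rough illustration only, that idea can be sketched in C as a grow-only, per-device cache. Everything here is hypothetical: `struct stacks_cache`, `stacks_get`, `stacks_cache_release`, `device_alloc` and `device_free` are not names from the patch, and `malloc`/`free` stand in for the CUDA driver's device allocation calls so the sketch runs on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts allocator calls so the effect of caching is observable.  */
static size_t alloc_calls;

/* Hypothetical stand-ins for device memory allocation and release.  */
static void *
device_alloc (size_t size)
{
  alloc_calls++;
  return malloc (size);
}

static void
device_free (void *ptr)
{
  free (ptr);
}

/* Hypothetical per-device cache of the most recently used stacks block.  */
struct stacks_cache
{
  void *ptr;
  size_t size;
};

/* Return a block of at least SIZE bytes.  A cached block that is large
   enough is reused as-is; a too-small one is freed and replaced.  */
static void *
stacks_get (struct stacks_cache *cache, size_t size)
{
  assert (size > 0);
  if (cache->ptr && cache->size >= size)
    return cache->ptr;
  if (cache->ptr)
    device_free (cache->ptr);
  cache->ptr = device_alloc (size);
  cache->size = cache->ptr ? size : 0;
  return cache->ptr;
}

/* Drop the cached block, e.g. when the device is shut down.  */
static void
stacks_cache_release (struct stacks_cache *cache)
{
  if (cache->ptr)
    device_free (cache->ptr);
  cache->ptr = NULL;
  cache->size = 0;
}
```

With this scheme, a launch that needs no more stack space than the cached block skips the allocator entirely; only a larger request triggers a free and a new allocation. How the cache interacts with concurrent launches is a separate question, which is what the later discussion of `omp_stacks.lock` is about.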