On Tue, Jan 05, 2021 at 12:13:59PM +, Julian Brown wrote:
> Just to check, does my reply below address your concerns --
> particularly with regards to the current usage of CUDA streams
> serializing kernel executions from different host threads? Given that
> situation, and the observed speed im
Hi Jakub,
Just to check, does my reply below address your concerns --
particularly with regards to the current usage of CUDA streams
serializing kernel executions from different host threads? Given that
situation, and the observed speed improvement with OpenMP offloading to
NVPTX with the patch, I
On Tue, 15 Dec 2020 18:00:36 +0100
Jakub Jelinek wrote:
> On Tue, Dec 15, 2020 at 04:49:38PM +, Julian Brown wrote:
> > > Do you need to hold the omp_stacks.lock across the entire
> > > offloading? Doesn't that serialize all offloading kernels to the
> > > same device? I mean, can't the lock
On Tue, Dec 15, 2020 at 04:49:38PM +, Julian Brown wrote:
> > Do you need to hold the omp_stacks.lock across the entire offloading?
> > Doesn't that serialize all offloading kernels to the same device?
> > I mean, can't the lock be taken just shortly at the start to either
> > acquire the cache
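The alternative raised here (take the lock only briefly to grab a cached block or decide to allocate a new one) can be sketched as a small free list of stack blocks guarded by a mutex that is held only while a block is popped or pushed, so two host threads launching at once each get their own block. This is a minimal, hypothetical sketch: `struct stack_cache`, `stack_cache_acquire` and `stack_cache_release` are illustrative names not taken from the patch, and `malloc`/`free` stand in for device memory allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device stack block; real plugin code would
   allocate device memory instead of using malloc/free.  */
struct stack_block {
  void *mem;
  size_t size;
  struct stack_block *next;
};

/* Hypothetical per-device cache of currently unused stack blocks.  */
struct stack_cache {
  pthread_mutex_t lock;
  struct stack_block *free_list;
};

/* Hold the lock only long enough to pop a large-enough cached block;
   if none is available, allocate a fresh one outside the lock.  */
static struct stack_block *
stack_cache_acquire (struct stack_cache *c, size_t size)
{
  struct stack_block **pp, *blk = NULL;

  pthread_mutex_lock (&c->lock);
  for (pp = &c->free_list; *pp; pp = &(*pp)->next)
    if ((*pp)->size >= size)
      {
        blk = *pp;
        *pp = blk->next;   /* unlink it from the free list */
        break;
      }
  pthread_mutex_unlock (&c->lock);

  if (blk == NULL)
    {
      blk = malloc (sizeof *blk);
      if (blk == NULL)
        return NULL;
      blk->mem = malloc (size);
      if (blk->mem == NULL)
        {
          free (blk);
          return NULL;
        }
      blk->size = size;
    }
  blk->next = NULL;
  return blk;
}

/* Return a block to the cache once the kernel using it has finished;
   again the lock is held only for the list update.  */
static void
stack_cache_release (struct stack_cache *c, struct stack_block *blk)
{
  assert (blk != NULL);
  pthread_mutex_lock (&c->lock);
  blk->next = c->free_list;
  c->free_list = blk;
  pthread_mutex_unlock (&c->lock);
}
```

The cost of this design is that concurrent launches may hold several blocks at once, so peak device memory grows with the number of in-flight kernels; whether that matters depends on how often launches actually overlap, which is the stream-serialization question raised elsewhere in the thread.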
On Tue, 15 Dec 2020 14:49:40 +0100
Jakub Jelinek wrote:
> On Tue, Dec 15, 2020 at 01:39:13PM +, Julian Brown wrote:
> > @@ -1922,7 +1997,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
> >    nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
> >s
On Tue, Dec 15, 2020 at 01:39:13PM +, Julian Brown wrote:
> @@ -1922,7 +1997,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
>    nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
>
>    size_t stack_size = nvptx_stacks_size ();
> -  void *stacks =
On Tue, 8 Dec 2020 20:11:38 +0300
Alexander Monakov wrote:
> On Tue, 8 Dec 2020, Julian Brown wrote:
>
> > Ping?
>
> This has addressed my concerns, thanks.
Jakub, Tom -- just to confirm, is this OK for trunk now?
I noticed a slight bugfix myself in the no-stacks/out-of-memory case --
i.e.
On Tue, 8 Dec 2020, Julian Brown wrote:
> Ping?
This has addressed my concerns, thanks.
Alexander
> On Fri, 13 Nov 2020 20:54:54 +
> Julian Brown wrote:
>
> > Hi Alexander,
> >
> > Thanks for the review! Comments below.
> >
> > On Tue, 10 Nov 2020 00:32:36 +0300
> > Alexander Monakov
Ping?
Thanks,
Julian
On Fri, 13 Nov 2020 20:54:54 +
Julian Brown wrote:
> Hi Alexander,
>
> Thanks for the review! Comments below.
>
> On Tue, 10 Nov 2020 00:32:36 +0300
> Alexander Monakov wrote:
>
> > On Mon, 26 Oct 2020, Jakub Jelinek wrote:
> >
> > > On Mon, Oct 26, 2020 at 07:1
Hi Alexander,
Thanks for the review! Comments below.
On Tue, 10 Nov 2020 00:32:36 +0300
Alexander Monakov wrote:
> On Mon, 26 Oct 2020, Jakub Jelinek wrote:
>
> > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> > > This patch adds caching for the stack block allocated for
> >
On Mon, 26 Oct 2020, Jakub Jelinek wrote:
> On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> > This patch adds caching for the stack block allocated for offloaded
> > OpenMP kernel launches on NVPTX. This is a performance optimisation --
> > we observed an average 11% or so performa
On Wed, 28 Oct 2020 15:25:56 +0800
Chung-Lin Tang wrote:
> On 2020/10/27 9:17 PM, Julian Brown wrote:
> >> And, in which context are cuStreamAddCallback registered callbacks
> >> run? E.g. if it is inside of asynchronous interrupt, using locking
> >> in there might not be the best thing to do.
On 2020/10/27 9:17 PM, Julian Brown wrote:
And, in which context are cuStreamAddCallback registered callbacks
run? E.g. if it is inside of asynchronous interrupt, using locking in
there might not be the best thing to do.
The cuStreamAddCallback API is documented here:
https://docs.nvidia.com
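To make the locking question concrete, the sketch below models a completion callback that returns a stack block to a cache under a short-lived mutex. It assumes, without confirming it from the thread, that stream callbacks run on an ordinary host thread owned by the driver rather than in interrupt context; a plain `pthread_create` thread stands in for that driver thread. All names (`release_block`, `fake_driver_thread`, `simulate_completion`, `MAX_FREE`) are hypothetical, and the callback deliberately makes no allocation or device API calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_FREE 8   /* hypothetical cache capacity */

/* Hypothetical cache of stack blocks handed back by completion callbacks.  */
struct stack_cache {
  pthread_mutex_t lock;
  void *free_blocks[MAX_FREE];
  int nfree;
};

/* Argument bundle passed through the callback's void * user-data slot.  */
struct release_req {
  struct stack_cache *cache;
  void *block;
};

/* Callback body: only a brief critical section.  Returns nonzero if the
   block was cached; if the cache is full the block is left for the
   launching thread to free later, outside the callback (not shown).  */
static int
release_block (struct release_req *req)
{
  int cached = 0;

  assert (req != NULL && req->block != NULL);
  pthread_mutex_lock (&req->cache->lock);
  if (req->cache->nfree < MAX_FREE)
    {
      req->cache->free_blocks[req->cache->nfree++] = req->block;
      cached = 1;
    }
  pthread_mutex_unlock (&req->cache->lock);
  return cached;
}

/* Stand-in for the driver thread that would invoke the stream callback.  */
static void *
fake_driver_thread (void *arg)
{
  (void) release_block (arg);
  return NULL;
}

/* Simulate one kernel completion and return the number of cached blocks.
   The launching thread must NOT hold cache->lock while waiting here, or
   the callback would block on that lock and the wait would never end.  */
static int
simulate_completion (struct stack_cache *cache, void *block)
{
  struct release_req req = { cache, block };
  pthread_t t;
  int n;

  if (pthread_create (&t, NULL, fake_driver_thread, &req) != 0)
    return -1;
  pthread_join (t, NULL);

  pthread_mutex_lock (&cache->lock);
  n = cache->nfree;
  pthread_mutex_unlock (&cache->lock);
  return n;
}
```

The point the sketch illustrates is the interaction between the two locking schemes: if a host thread keeps a device-wide lock held across the whole offload, including the wait for completion, a callback that needs the same lock cannot make progress.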
(Apologies if threading is broken, for some reason I didn't receive
this reply directly!)
On Mon Oct 26 14:26:34 GMT 2020, Jakub Jelinek wrote:
> On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> > This patch adds caching for the stack block allocated for offloaded
> > OpenMP kernel
On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> This patch adds caching for the stack block allocated for offloaded
> OpenMP kernel launches on NVPTX. This is a performance optimisation --
> we observed an average 11% or so performance improvement with this patch
> across a set of a
Hi,
This patch adds caching for the stack block allocated for offloaded
OpenMP kernel launches on NVPTX. This is a performance optimisation --
we observed an average 11% or so performance improvement with this patch
across a set of accelerated GPU benchmarks on one machine (results vary
according
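The basic idea of caching the stack block (reuse one per-device block across launches and reallocate only when a launch needs more than is cached) can be sketched as follows. This is a minimal, hypothetical illustration rather than the patch itself: `struct dev_stacks` and `dev_stacks_get` are invented names, `malloc`/`free` stand in for device allocation, and the caller is assumed to hold `lock` for as long as the kernel uses the block, matching the whole-offload locking discussed in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-device state: one cached stack block reused by every
   launch and reallocated only when a launch needs a larger one.  */
struct dev_stacks {
  pthread_mutex_t lock;
  void *ptr;
  size_t size;
};

/* Return a block of at least SIZE bytes, reusing the cached one when it
   is large enough.  Caller must hold s->lock while the kernel runs.
   Returns NULL on allocation failure, leaving the cache empty.  */
static void *
dev_stacks_get (struct dev_stacks *s, size_t size)
{
  assert (s != NULL);
  if (s->ptr != NULL && s->size >= size)
    return s->ptr;              /* cache hit: no allocation this launch */

  free (s->ptr);                /* too small or empty: drop it and grow */
  s->ptr = malloc (size);       /* stand-in for a device allocation */
  s->size = s->ptr != NULL ? size : 0;
  return s->ptr;
}
```

With this scheme a series of launches with the same or smaller stack requirement performs no allocation at all, which is where the saving comes from; the price is that holding `lock` across the launch serializes offloads to the same device.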