RE: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2021-08-16 Thread Stubbs, Andrew
> In other words: For gangs > #CUs or >1 gang per CU, the following patch > is needed: >[OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308 >[email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html > "not suitable for mainline until the multiple-worke

RE: [committed 6/6] amdgcn: vector testsuite tweaks

2022-10-28 Thread Stubbs, Andrew
> -----Original Message----- > Looking into commit r13-3225-gbd9a05594d227cde79a67dc715bd9d82e9c464e9 > "amdgcn: vector testsuite tweaks" for a moment, I also did wonder about > the following changes, because for 'vect_multiple_sizes' (for example, > x86_64-pc-linux-gnu) that seems to lose more spe

RE: [PATCH] amdgcn: Add support for additional natively supported floating-point operations

2022-09-09 Thread Stubbs, Andrew
> -----Original Message----- > I agree - for example powerpc has -mrecip= to control which instructions > to use (float/double rsqrt or inverse) and -mrecip-precision to > specify whether further iteration is done or not. > > x86 has similar but does always perform newton raphson iteration, > docu

Re: [PATCH] emit-rtl.c: Allow vector subreg of float vectors

2020-08-10 Thread Stubbs, Andrew
On 10 Aug 2020 17:23, Richard Sandiford wrote: Andrew Stubbs writes: > On 06/08/2020 04:54, Richard Sandiford wrote: >>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c >>> index f9b0e9714d9..d7067989ad7 100644 >>> --- a/gcc/emit-rtl.c >>> +++ b/gcc/emit-rtl.c >>> @@ -947,6 +947,11 @@ validate_sub

Re: [PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Stubbs, Andrew
On 29 Jun 2020 22:03, "Brown, Julian" wrote: On Mon, 29 Jun 2020 21:32:41 +0100 Andrew Stubbs wrote: > In particular, it seems logical that any barrier should be a memory > barrier, so inserting it in the barrier pattern is not a big deal. > IIRC, only OpenACC is using that anyway (OpenMP has exp

RE: [PATCH] libgomp, openmp: pinned memory

2022-06-09 Thread Stubbs, Andrew
> The question is only what to do with 'requires unified_shared_memory' – > and a non-multi-device allocator. The compiler emits an error at compile time if you attempt to use both -foffload-memory=pinned and USM, because they’re not compatible. You're fine to use both explicit allocators in the

RE: [PATCH] libgomp, openmp: pinned memory

2022-06-09 Thread Stubbs, Andrew
> For example, it's documented that 'cuMemHostAlloc', > api/group__CUDA__MEM.html#group__CUDA__MEM_1g572ca4011bfcb25034888a14d4e035b > 9>, > "Allocates page-locked host memory". The crucial thing, though, what > makes this different from 'malloc' plus '

RE: [arm][patch] fix arm_neon_ok check on !arm_arch7

2014-09-23 Thread Stubbs, Andrew
Maybe the original patch is better? Or maybe it should reconfigure the FPU instead of erroring out? But reconfigure it to what? Andrew From: James Greenhalgh [james.greenha...@arm.com] Sent: 23 September 2014 09:27 To: Stubbs, Andrew Cc: Richard Earnshaw

Re: [PATCH][ARM] -m{cpu,tune,arch}=native

2011-08-30 Thread Stubbs, Andrew
On 29/08/11 04:29, Michael Hope wrote: > On Sat, Aug 27, 2011 at 3:19 AM, Andrew Stubbs wrote: >> Hi all, >> >> This patch adds support for -mcpu=native, -mtune=native, and -march=native >> for ARM Linux hosts. >> >> So far, it only recognises Cortex-A8 and Cortex-A9, so I really need to find >> o

Re: [PATCH][ARM] Add support for ADDW and SUBW instructions

2011-06-16 Thread Stubbs, Andrew
On 02/06/11 11:36, Ramana Radhakrishnan wrote: > On 2 June 2011 10:03, Andrew Stubbs wrote: >> On 02/06/11 09:23, Ramana Radhakrishnan wrote: >>> >>> Please remove the alternatives in the subsi3 pattern since that is just >>> unnecessary. Please make the constraints internal only. >> >> Is this be

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Stubbs, Andrew
On 24/06/11 09:28, Richard Guenther wrote: >> > To be clear, it only skips past NOP_EXPR. Is it not the case that what >> > you're describing would need a CONVERT_EXPR? > NOP_EXPR is the same as CONVERT_EXPR. Are you sure? I thought this was safe because the internals manual says: NOP_EXPR

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-24 Thread Stubbs, Andrew
On 24/06/11 16:47, Richard Guenther wrote: >> > I can certainly add checks to make sure that the skipped operations >> > actually don't make any important changes to the value, but do I need to? > Yes. Ok, I'll go away and do that then. BTW, I see useless_type_conversion_p, but that's not quite

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 28/06/11 17:37, Michael Matz wrote: >> What I want (and I'm not totally clear on what this actually means) is >> > to be able to optimize all the cases where the end result will be the >> > same as the compiler produces now (using multiple multiply, shift, and >> > add operations). > Okay, th

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 13:33, Paolo Bonzini wrote: > Got it now! Casts from signed to unsigned are not value-preserving, but > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the > same result bit-by-bit as the s64 result. The fact that s64 has an > implicit ... in front, while an u64 h
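
The "bit-preserving" point quoted above can be checked with a few lines of C. This is only a minimal sketch for illustration, not code from the patch under discussion; the function names (widen_s32_to_s64, widen_s32_to_u64, same_bits, check_examples) are hypothetical and chosen here. It relies on two things the C standard guarantees: int64_t is a two's-complement type, and conversion to an unsigned type is done modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* s32 -> s64: value-preserving sign extension. */
int64_t widen_s32_to_s64(int32_t x)
{
    return (int64_t)x;
}

/* s32 -> u64: NOT value-preserving for negative inputs, because the
   conversion is performed modulo 2^64. */
uint64_t widen_s32_to_u64(int32_t x)
{
    return (uint64_t)x;
}

/* Returns 1 when both widenings yield the same 64-bit pattern.
   Reinterpreting the s64 result as u64 exposes its bits. */
int same_bits(int32_t x)
{
    return (uint64_t)widen_s32_to_s64(x) == widen_s32_to_u64(x);
}

/* A few spot checks (hypothetical helper, for illustration only). */
void check_examples(void)
{
    /* Bit patterns agree for negative, zero, and positive inputs. */
    assert(same_bits(-1));
    assert(same_bits(0));
    assert(same_bits(INT32_MIN));
    assert(same_bits(INT32_MAX));

    /* The values differ: -1 stays -1 as s64 but becomes 2^64 - 1 as u64. */
    assert(widen_s32_to_s64(-1) == -1);
    assert(widen_s32_to_u64(-1) == UINT64_MAX);
}
```

Calling check_examples() from a main() should pass silently. Note that this covers only a single widening step; it says nothing about chains of widening multiplies.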

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 15:40, Paolo Bonzini wrote: > On 07/01/2011 03:30 PM, Stubbs, Andrew wrote: >>> > However, perhaps there is a catch. We can do the following thought >>> > experiment. What would happen if you had multiple widening multiplies? >>> > Like 8-bit si

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 14:30, Stubbs, Andrew wrote: >> Got it now! Casts from signed to unsigned are not value-preserving, but >> > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the >> > same result bit-by-bit as the s64 result. The fact that s6

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-01 Thread Stubbs, Andrew
On 01/07/11 16:54, Paolo Bonzini wrote: > On 07/01/2011 04:55 PM, Stubbs, Andrew wrote: >>> > >>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to >>> > (u128)c + (u128)((s8)a * (s8)b). >> Oh I see, sorry. Yes, that's exa

RE: Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

2023-02-16 Thread Stubbs, Andrew via Gcc-patches
> -----Original Message----- > From: Thomas Schwinge > Sent: 16 February 2023 15:33 > To: Andrew Stubbs ; Jakub Jelinek ; > Tobias Burnus ; gcc-patches@gcc.gnu.org > Subject: Attempt to register OpenMP pinned memory using a device instead of > 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-21 Thread Stubbs, Andrew via Gcc-patches
On 21/11/2022 13:41, Tobias Burnus wrote: On 19.11.22 11:46, Tobias Burnus wrote: +   stacklimit = stackbase + seg_size*64; (this should be '*seg_size' not 'seg_size' and the name should be s/seg_size/seg_size_ptr/.) I have updated the comment and ... (Reading it, I think it should be '..

RE: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory

2023-03-27 Thread Stubbs, Andrew via Gcc-patches
> -----Original Message----- > From: Thomas Schwinge > Sent: 24 March 2023 15:50 > To: gcc-patches@gcc.gnu.org; Andrew Stubbs ; > Tobias Burnus > Subject: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] > libgomp, openmp: pinned memory > > Hi! > > On 2022-01-04T15:32:17+, Andr