Hi Julian!
On 2023-09-06T02:34:30-0700, Julian Brown wrote:
> This patch works around behaviour of the 2D and 3D memcpy operations in
> the CUDA driver runtime. Particularly in Fortran, the "base pointer"
> of an array (used for either source or destination of a host/device copy)
> may lie outsi
Hi, Jeff.
I removed the mem-to-mem patterns as you suggested, which means we don't have the
scalar move optimization for small-size vector modes.
Is it ok for trunk?
Since it is a bug fix patch, I hope we can land it soon. We may find
another way to optimize small-size vector mode mem-to-mem.
ju
On 9/25/23 16:43, Patrick Palka wrote:
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?
-- >8 --
This tree code dates all the way back to r69130[1] which implemented
typing of non-dependent expressions. Its motivation was never clear (to
me at least) since the do
Hi,
I can't approve but hope you don't mind the review,
> +/* Return true if this CODE describes a conditional (masked)
> +internal_fn. */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> + if (!code.is_fn_code ())
> +return false;
> +
> + if (!internal_fn_p ((combined_fn) code))
> +
Hi All,
When we have a vector conditional on a masked target which is doing a selection
on the result of a conditional operation where one of the operands of the
conditional operation is the other operand of the select, then we can fold the
vector conditional into the operation.
Concretely this t
Hi All,
For targets that allow conversion between int and float modes this adds a new
optimization transforming fneg (fabs (x)) into x | (1 << signbit(x)). Such
sequences are common in scientific code working with gradients.
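As a rough illustration of the identity described above, the following sketch compares the arithmetic form with the bitwise form on a single float. It assumes IEEE 754 binary32 floats and reads "x | (1 << signbit(x))" as "set the sign bit of x"; the helper names (neg_abs_reference, neg_abs_bitwise, forms_agree, check_neg_abs_forms) are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reference form: negate the absolute value, i.e. fneg (fabs (x)).
inline float neg_abs_reference(float x) { return -std::fabs(x); }

// Bitwise form: OR in the sign bit (bit 31, assuming IEEE 754 binary32).
inline float neg_abs_bitwise(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float as an integer
    bits |= std::uint32_t{1} << 31;        // force the sign bit on
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Compare bit patterns rather than values so that +0.0 and -0.0 differ.
inline bool forms_agree(float x) {
    float a = neg_abs_reference(x);
    float b = neg_abs_bitwise(x);
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

// Small self-check over a few representative non-NaN inputs.
inline void check_neg_abs_forms() {
    const float samples[] = {1.5f, -2.0f, 0.0f, -0.0f};
    for (float x : samples)
        assert(forms_agree(x));
}
```

NaN inputs are deliberately left out of the self-check, since this sketch makes no claim about how the sign of a NaN is handled.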
The transformed instruction if the target has an inclusive-OR that take
Hi All,
Following the Neoverse N/V and Cortex-A optimization guides SIMD 0 immediates
should be created with a movi of 0.
At the moment we generate an `fmov .., xzr` which is slower and requires a
GP -> FP transfer.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Hi All,
SVE has a much bigger immediate encoding range for bitmasks than Advanced SIMD has,
and so on a system that is SVE capable, if we need an Advanced SIMD Inclusive-OR
by immediate that would require a reload, then use an unpredicated SVE ORR instead.
This has both speed and size improvements.
Bootst
Hi All,
This adds a way to generate special sequences for the creation of constants for
which we don't have single-instruction sequences and which would normally have
led to a GP -> FP transfer or a literal load.
The patch starts out by adding support for creating 1 << 63 using fneg (mov 0).
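The idea behind "fneg (mov 0)" can be sketched in scalar C++: negating a positive zero flips only the sign bit, which leaves the 1 << 63 bit pattern. This assumes an IEEE 754 binary64 double and default floating-point options (signed zeros honoured); the function names are hypothetical and say nothing about how the patch emits the sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Build the 64-bit pattern with only bit 63 set by negating +0.0.
inline std::uint64_t sign_bit_via_fneg() {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    double zero = 0.0;       // all-zero bit pattern (the "mov 0" step)
    double negated = -zero;  // flips only the sign bit (the "fneg" step)
    std::uint64_t bits;
    std::memcpy(&bits, &negated, sizeof bits);
    return bits;
}

// Self-check: the result should be exactly 1 << 63.
inline void check_sign_bit_via_fneg() {
    assert(sign_bit_via_fneg() == (std::uint64_t{1} << 63));
}
```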
Bootstrap
Hi All,
This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected. This will be needed in follow on patches.
This also merges the splits into the define_insn which will also be needed soon.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issue
On Tue, Sep 26, 2023 at 5:51 PM Tamar Christina wrote:
>
> Hi All,
>
> For targets that allow conversion between int and float modes this adds a new
> optimization transforming fneg (fabs (x)) into x | (1 << signbit(x)). Such
> sequences are common in scientific code working with gradients.
>
> T
On Wed, Sep 27, 2023 at 1:53 AM Tamar Christina wrote:
>
> Hi All,
>
> This rewrites the simd MOV patterns to use the new compact syntax.
> No change in semantics is expected. This will be needed in follow on patches.
>
> This also merges the splits into the define_insn which will also be needed
On Wed, Sep 27, 2023 at 1:51 AM Tamar Christina wrote:
>
> Hi All,
>
> Following the Neoverse N/V and Cortex-A optimization guides SIMD 0 immediates
> should be created with a movi of 0.
>
> At the moment we generate an `fmov .., xzr` which is slower and requires a
> GP -> FP transfer.
>
> Bootstr
> -----Original Message-----
> From: Andrew Pinski
> Sent: Wednesday, September 27, 2023 2:17 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 <<
> signbit(x)) [PR1
> -----Original Message-----
> From: Ramana Radhakrishnan
> Sent: Wednesday, September 27, 2023 2:28 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov ;
> Richard Sandiford
> Subject: Re: [PATCH]AArch64 Rewrite simd move immedia
From: xuli
Consider the following situation:
BB5: local_dem(RVV Insn 1, AVL(reg zero))
RVV Insn 1: vmv.s.x, AVL (const_int 1)
RVV Insn 2: vredsum.vs, AVL(reg zero)
vmv.s.x has a vl operand, so the following code will get
avl (const_int) from RVV Insn 1.
rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn
+ vid sequence. The elt (i) can be either const_int or
+ const_poly_int. */
+ HOST_WIDE_INT diff = rtx_to_poly_int64 (builder.elt (i)).to_constant () - i;
How about:
poly_int64 diff = rtx_to_poly_int64 (builder.elt (i)) - i;
rtx avl
- = has_vl_op (insn->rtl ()) ? get_vl (insn
From: Pan Li
This patch would like to support auto-vectorization for the
trunc API in math.h. It depends on the -ffast-math option.
When we would like to call trunc/truncf like v2 = trunc (v1),
we will convert it into below insns (reference the implementation of
llvm).
* vfcvt.rtz.x.f v3, v1
*
LGTM.
juzhe.zh...@rivai.ai
From: pan2.li
Date: 2023-09-27 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP trunc auto-vectorization
From: Pan Li
This patch would like to support auto-vectorization for the
trunc API in math.h. I
I'm planning to push the attached 1-liner bug fix to mainline tomorrow, when
testing has completed. I think this qualifies as "obvious". There's no test
case because I discovered it while testing the updated loop transformation
patches, which are still a work in progress; it fixed an ICE in th
Hi!
While looking into vec.h, I've noticed we still have a workaround for
GCC 4.1-4.3 bugs.
As we now use C++11 and thus need to be built by GCC 4.8 or later,
I think this is now never used.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2023-09-27 Jakub Jelinek
Still no chance to get feedback from TC? Maybe I can commit the below
then?
AFAICS on gcc mailing list several gcc releases were done recently, too
late.
On 14/09/2023 06:46, François Dumont wrote:
Author: TC
Date: Wed Sep 6 19:31:55 2023 +0200
libstdc++: Force _Hash_node_value_ba
On 27 September 2023 06:43:24 CEST, Jakub Jelinek wrote:
>Hi!
>
>While looking into vec.h, I've noticed we still have a workaround for
>GCC 4.1-4.3 bugs.
This is https://gcc.gnu.org/PR105656
thanks,
>As we now use C++11 and thus need to be built by GCC 4.8 or later,
>I think this is now never u
Hi!
We have some very limited support for non-POD types in vec.h
(in particular grow_cleared will invoke default ctors on the
cleared elements and vector copying invokes copy ctors).
My pending work on wide_int/widest_int which makes those two
non-trivially default/copy constructible, assignable a
Hi,
on 2023/9/25 09:57, HAO CHEN GUI wrote:
> Hi Kewen,
>
> On 2023/9/18 15:34, Kewen.Lin wrote:
>> Thanks for checking! So for P7, this patch looks neutral, but for P8 and
>> later, it may cause a few differences in code gen. I'm curious how
>> many total object files and different object f
Hi,
on 2023/9/25 10:05, HAO CHEN GUI wrote:
> Hi,
> This patch implements 32bit inline lrint by "fctiw". It depends on
> the patch1 to do SImode move from FP registers on P7.
>
> Compared to last version, the main change is to add some test cases.
> https://gcc.gnu.org/pipermail/gcc-patches/2
> On 27.09.2023 at 06:43, Jakub Jelinek wrote:
>
> Hi!
>
> While looking into vec.h, I've noticed we still have a workaround for
> GCC 4.1-4.3 bugs.
> As we now use C++11 and thus need to be built by GCC 4.8 or later,
> I think this is now never used.
>
> Bootstrapped/regtested on x86_64-l
Hi,
As PR111367 shows, with prefixed insns supported, some of
the checks consider it possible to use a prefixed insn
for stack-protect related load/store, but since we don't
actually change the emitted assembly for 32-bit, this can
cause the assembler error as exposed.
Mike's commit r10-4547-gce6a6c
Hi,
The uninitialized variable a in pr60510.f can cause some
random failures as exposed in PR111427, see the details
there. This patch is to make it initialized accordingly.
As verified, it can fix the reported -m32 failures on
P7 and P8 BE. It's also tested well on powerpc64-linux-gnu
P9 and p
Hi Jeff,
On 9/19/23 07:59, Jeff Law wrote:
On 9/18/23 21:37, Vineet Gupta wrote:
On 9/18/23 19:41, Jeff Law wrote:
On 9/18/23 13:45, Vineet Gupta wrote:
For the cases which do require sign extends, but not being
eliminated due to "missing definition(s)" I'm working on adapting
Ajit's RE
Tamar Christina writes:
> Hi All,
>
> Following the Neoverse N/V and Cortex-A optimization guides SIMD 0 immediates
> should be created with a movi of 0.
>
> At the moment we generate an `fmov .., xzr` which is slower and requires a
> GP -> FP transfer.
>
> Bootstrapped Regtested on aarch64-none-l
Fix the comments, since the original comment is confusing.
gcc/ChangeLog:
* tree-if-conv.cc (is_cond_scalar_reduction): Fix comments.
---
gcc/tree-if-conv.cc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 799f071965e..a8c