Hi,
This patch declares unsigned and polynomial type-qualified builtins for
vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes
the need for many casts in arm_neon.h.
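For illustration, a minimal sketch of the kind of cast churn this removes from
arm_neon.h (the builtin names below are stand-ins, not the exact builtins added
by the patch):

#include <arm_neon.h>

/* Stand-ins for the GCC builtins; the names are hypothetical and only
   illustrate a signed-only signature versus an unsigned-qualified one.  */
int32x2_t  stand_in_get_high_signed   (int32x4_t);
uint32x2_t stand_in_get_high_unsigned (uint32x4_t);

/* Before: with only a signed builtin available, the unsigned intrinsic
   has to cast both its argument and its result.  */
static inline uint32x2_t
get_high_u32_with_casts (uint32x4_t a)
{
  return (uint32x2_t) stand_in_get_high_signed ((int32x4_t) a);
}

/* After: an unsigned-qualified builtin takes and returns unsigned vector
   types directly, so no casts are needed.  */
static inline uint32x2_t
get_high_u32_no_casts (uint32x4_t a)
{
  return stand_in_get_high_unsigned (a);
}
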
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
This patch declares unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned and polynomial type-qualified builtins and
uses them to implement the LD1/ST1 Neon intrinsics. This removes the
need for many casts in arm_neon.h.
The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.
Regression tes
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement the vector reduction Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement the pairwise addition Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-subtract Neon intrinsics. This
removes the need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonatha
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-add Neon intrinsics. This
removes the need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement halving-subtract Neon intrinsics. This removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement widening-subtract Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement widening-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2
Hi,
This patch declares unsigned type-qualified builtins and uses them for
[R]SHRN[2] Neon intrinsics. This removes the need for casts in
arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonat
Hi,
This patch declares unsigned type-qualified builtins and uses them for
XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonathan
Hi,
This patch declares poly type-qualified builtins and uses them for
PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonathan Wri
Hi,
This patch declares type-qualified builtins and uses them for MLA/MLS
Neon intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
Each of the comments on the previous version of the patch have been
addressed.
Ok for master?
Thanks,
Jonathan
From: Richard Sandiford
Sent: 22 October 2021 16:13
To: Jonathan Wright
Cc: gcc-patches@gcc.gnu.org ; Kyrylo Tkachov
Subject: Re: [PATCH 4/6] aarch64: Add machine modes for Ne
Hi,
Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable are:
1) Large integer modes are opaque and modifying one vector in the
tuple requires a lot of inefficient set/get gymnastics. The
Hi,
Neon vector-tuple types can be passed in registers on function call
and return - there is no need to generate a parallel rtx. This patch
adds cases to detect vector-tuple modes and generates an appropriate
register rtx.
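For illustration, a sketch of the sort of function this helps (the code
generation described depends on the rest of the series, not this patch alone):

#include <arm_neon.h>

/* A uint32x4x2_t argument and return value can be passed and returned in
   consecutive Q registers, so no parallel rtx (and ideally no stack
   traffic) is needed to move the tuple across the call boundary.  */
uint32x4x2_t
swap_halves (uint32x4x2_t t)
{
  uint32x4x2_t r = { { t.val[1], t.val[0] } };
  return r;
}
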
This change greatly improves code generated when passing Neon vector-
tup
Hi,
Preventing decomposition if modes are not tieable is necessary to
stop AArch64 partial Neon structure modes being treated as packed in
registers.
This is a necessary prerequisite for a future AArch64 PCS change to
maintain good code generation.
Bootstrapped and regression tested on:
* x86_64
Hi,
Extracting a bitfield from a vector can be achieved by casting the
vector to a new type whose elements are the same size as the desired
bitfield, before generating a subreg. However, this is only an
optimization if the original vector can be accessed in the new
machine mode without first being
Hi,
A long time ago, using a parallel to take a subreg of a SIMD register
was broken. This temporary fix[1] (from 2003) spilled these registers
to memory and reloaded the appropriate part to obtain the subreg.
The fix initially existed for the benefit of the PowerPC E500 - a
platform for which GC
Hi,
As subject, this patch declares the Neon vector-tuple types inside the
compiler instead of in the arm_neon.h header. This is a necessary first
step before adding corresponding machine modes to the AArch64
backend.
The vector-tuple types are implemented using a #pragma. This means
initializati
Hi,
As subject, this patch deletes some redundant type definitions in
arm_neon.h. These vector type definitions are an artifact from the initial
commit that added the AArch64 port.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
The pointer parameter used to load a vector of signed values should
itself point to a signed type. This patch fixes two instances of this
unsigned-signed implicit conversion in arm_neon.h.
Tested relevant intrinsics with -Wpointer-sign and warnings no longer
present.
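For illustration, the shape of the affected calls (which two intrinsics were
fixed is not repeated here, so this example is only illustrative):

#include <arm_neon.h>
#include <stdint.h>

/* With -Wpointer-sign, a signed-element load intrinsic whose pointer
   parameter was declared (or cast) as unsigned produced an
   implicit-conversion warning from inside arm_neon.h for a call like
   this; after the fix, such calls compile cleanly.  */
int8x8_t
load_signed (const int8_t *p)
{
  return vld1_s8 (p);
}
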
Ok for master?
Thanks,
Jonathan
---
Hi,
This patch fixes type qualifiers for the qtbl1 and qtbx1 Neon builtins
and removes the casts from the Neon intrinsic function bodies that
use these builtins.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
gcc/ChangeLog:
2021-08-18 Jonathan Wright
* config/aarch64/arm_neon.h (vld4_lane_f32): Use float RTL
pattern.
(vld4q_lane_f64): Use float type cast.
From: Andreas Schwab
Sent: 18 August 2021 13:09
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright ; Richard San
Hi,
This patch removes macros for vld4[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-16
Hi,
This patch removes macros for vld3[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-16
Hi,
This patch removes macros for vld2[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-12
Hi,
Some scan-assembler tests for SVE code generation were erroneously
split over multiple lines - meaning they became invalid. This patch
gets the tests working again by putting each test on a single line.
The extract_[1234].c tests are corrected to expect that extracted
32-bit values are moved
Hi,
I've corrected the quoting and moved everything onto one line.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-08-04 Jonathan Wright
* gcc.target/aarch64/vector_structure_intrinsics.c: Restrict
tests to little-endian targets.
From: Richard Sandifo
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat
Neon intrinsics in arm_neon.h.
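For illustration, a generic before/after sketch of the copy idiom (the tuple
type here is a stand-in for GCC's opaque builtin types such as
__builtin_aarch64_simd_oi):

#include <arm_neon.h>

/* Stand-in for the opaque tuple type consumed by the store builtins.  */
typedef struct { uint8x16_t regs[2]; } opaque_pair_t;

/* Union-based copy, the style being replaced.  */
static inline opaque_pair_t
pack_with_union (uint8x16x2_t val)
{
  union { uint8x16x2_t in; opaque_pair_t out; } u = { .in = val };
  return u.out;
}

/* __builtin_memcpy-based copy, the style used after the patch: same
   layout, but simpler and easier for the compiler to optimize away.  */
static inline opaque_pair_t
pack_with_memcpy (uint8x16x2_t val)
{
  opaque_pair_t out;
  __builtin_memcpy (&out, &val, sizeof (val));
  return out;
}
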
It also adds new code generation tests to verify
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst2[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
ins
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst3[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
ins
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst4[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
ins
Hi,
V2 of this change implements the same approach as for the multiply
and add-widen patches.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-07-28 Jonathan Wright
* config/aarch64/aarch64.c: Traver
Hi,
V2 of this patch uses the same approach as that just implemented
for the multiply high-half cost patch.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-07-28 Jonathan Wright
* config/aarch64/aa
): Declare.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vmul_high_cost.c: New test.
From: Richard Sandiford
Sent: 04 August 2021 10:05
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright
Subject: Re: [PATCH] aarch64: Don't include vec_select high-half in SIMD
multiply cost
Jon
From: Christophe Lyon
Sent: 03 August 2021 10:42
To: Jonathan Wright
Cc: gcc-patches@gcc.gnu.org ; Richard Sandiford
Subject: Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in
vqtbl[234] intrinsics
On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches
wrote:
Hi,
This patch
Hi,
The Neon subtract-long/subract-widen instructions can select the top
or bottom half of the operand registers. This selection does not
change the cost of the underlying instruction and this should be
reflected by the RTL cost function.
This patch adds RTL tree traversal in the Neon subtract co
Hi,
The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.
This patch adds RTL tree traversal in the Neon add cost function to
Hi,
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can select the top or bottom half of the operand registers. This
selection does not change the cost of the underlying instruction and
this should be reflected by the RTL cost function.
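For illustration, the kind of source this cost change is aimed at (the code
generation stated here is the expected outcome, not a guarantee from this
patch in isolation):

#include <arm_neon.h>

/* Multiplying the high halves explicitly: combine turns this into a
   single widening multiply of the high halves, and with the cost fix
   that combined form is costed the same as a plain widening multiply,
   so it is expected to stay as one umull2 instruction.  */
uint32x4_t
mul_high_halves (uint16x8_t a, uint16x8_t b)
{
  return vmull_u16 (vget_high_u16 (a), vget_high_u16 (b));
}
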
This patch adds RTL tree traversal in t
Hi,
V2 of the patch addresses the initial review comments, factors out
common code (as we discussed off-list) and adds a set of unit tests
to verify the code generation benefit.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This updated patch fixes the two-operators-per-row style issue in the
aarch64-simd.md RTL patterns as well as integrating the simplify-rtx.c
change as suggested.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x3 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for
Same explanation as for patch 3/8:
I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.
Thanks,
Jonathan
From: Richa
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.
Regression tested and bootstr
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.
Jonathan
From: Richard Sandiford
Sent: 23 July 2021 10:29
To: K
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst3[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst4[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ever
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for e
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbx[234] Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for
Hi,
As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.
This patch modifies unary operation simplification to push
sign/zero-extensi
Hi,
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected b
Hi,
As subject, this patch renames the two-source-register TBL/TBX RTL
patterns so that their names better reflect what they do, rather than
confusing them with tbl3 or tbx4 patterns. Also use the correct
"neon_tbl2" type attribute for both patterns.
Rename single-source-register TBL/TBX patterns
Ah, yes - those test results should have only been changed for little endian.
I've submitted a patch to the list restoring the original expected results
for big endian.
Thanks,
Jonathan
From: Christophe Lyon
Sent: 15 July 2021 10:09
To: Richard Sandiford ; Jonat
Hi,
A recent change "gcc: Add vec_select -> subreg RTL simplification"
updated the expected test results for SVE extraction tests. The new
result should only have been changed for little endian. This patch
restores the old expected result for big endian.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
Hi,
As subject, this patch uses a union instead of constructing a new opaque
vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h.
This simplifies the header file and also improves code generation -
superfluous move instructions were emitted for every register
extraction/set i
Hi,
Version 2 of this patch adds more code generation tests to show the
benefit of this RTL simplification as well as adding a new helper function
'rtx_vec_series_p' to reduce code duplication.
Patch tested as version 1 - ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-06-08 Jonathan
Hi,
As subject, this patch adds a new RTL simplification for the case of a
VEC_SELECT selecting the low part of a vector. The simplification
returns a SUBREG.
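For illustration, a trivial instance of the pattern being simplified (the
behavior described is the intent of the simplification, not a guarantee for
every case):

#include <arm_neon.h>

/* The low 64 bits of a 128-bit Neon register are already a D register,
   so selecting the low half becomes just a subreg and is expected to
   need no instruction.  */
uint32x2_t
low_half (uint32x4_t a)
{
  return vget_low_u32 (a);
}
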
The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of 'write-to-
high
depending on endianness.
(aarch64_hn_insn_le): Define.
(aarch64_hn_insn_be): Define.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.
From: Gcc-patches on
behalf of Jonathan Wright via Gcc-patches
Sent: 15 June 2021 11:02
To: gcc-patches
endianness.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.
From: Gcc-patches on
behalf of Jonathan Wright via Gcc-patches
Sent: 15 June 2021 10:59
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] aarch64: Model zero-high-half semantics of [SU]QXTN
): Define.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.
From: Gcc-patches on
behalf of Jonathan Wright via Gcc-patches
Sent: 15 June 2021 10:52
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] aarch64: Model zero-high-half semantics of SQXTUN
/narrow_zero_high_half.c: Add new tests.
From: Gcc-patches on
behalf of Jonathan Wright via Gcc-patches
Sent: 15 June 2021 10:45
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] aarch64: Model zero-high-half semantics of XTN instruction in
RTL
Hi,
Modeling the zero-high-half semantics of the XTN narrowing
Hi,
This patch adds tests to verify that Neon narrowing-shift instructions
clear the top half of the result vector. It is sufficient to show that a
subsequent combine with a zero-vector is optimized away - leaving
just the narrowing-shift instruction.
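For illustration, a sketch of the kind of test case meant here (not the exact
file added by the patch):

#include <arm_neon.h>

/* shrn already zeroes the high half of its destination, so the vcombine
   with a zero vector should be optimized away, leaving only the
   narrowing shift; a dg test can check this by scanning the assembly
   for a single shrn and no instruction materializing the zero half.  */
uint8x16_t
narrow_shift_zero_high (uint16x8_t a)
{
  return vcombine_u8 (vshrn_n_u16 (a, 4), vdup_n_u8 (0));
}
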
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
Hi,
As subject, this patch models the zero-high-half semantics of the
narrowing arithmetic Neon instructions in the
aarch64_hn RTL pattern. Modeling these
semantics allows for better RTL combinations while also removing
some register allocation issues as the compiler now knows that the
operation i
Hi,
As subject, this patch first splits the aarch64_qmovn
pattern into separate scalar and vector variants. It then further splits
the vector RTL pattern into big/little endian variants that model the
zero-high-half semantics of the underlying instruction. Modeling
these semantics allows for bett
Hi,
As subject, this patch first splits the aarch64_sqmovun pattern
into separate scalar and vector variants. It then further splits the vector
pattern into big/little endian variants that model the zero-high-half
semantics of the underlying instruction. Modeling these semantics
allows for better R
Hi,
Modeling the zero-high-half semantics of the XTN narrowing
instruction in RTL indicates to the compiler that this is a totally
destructive operation. This enables more RTL simplifications and also
prevents some register allocation issues.
Regression tested and bootstrapped on aarch64-none-lin
Hi,
As subject, this patch corrects the type attribute in RTL patterns that
generate XTN/XTN2 instructions to be "neon_move_narrow_q".
This makes a material difference because these instructions can be
executed on both SIMD pipes in the Cortex-A57 core model, whereas the
"neon_shift_imm_narrow_q"
Hi,
The existing vec_pack_trunc RTL pattern emits an opaque two-
instruction assembly code sequence that prevents proper instruction
scheduling. This commit changes the pattern to an expander that emits
individual xtn and xtn2 instructions.
This commit also consolidates the duplicate truncation p
Hi,
As subject, this patch adds tests to confirm that a *2 (write to high-half)
Neon instruction is generated from vcombine* of a narrowing intrinsic
sequence.
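For illustration, a sketch of the pattern these tests look for (the intrinsics
chosen here are illustrative):

#include <arm_neon.h>

/* Narrowing into the high half of an existing vector is expected to be
   generated as a single write-to-high-half instruction (xtn2) rather
   than an xtn followed by a register move.  */
uint16x8_t
narrow_into_high (uint16x4_t low, uint32x4_t wide)
{
  return vcombine_u16 (low, vmovn_u32 (wide));
}
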
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-05-14 Jonathan Wright
* gcc.target/aarch64/narrow_high_
Hi,
As subject, this patch splits the aarch64_qshrn_n
pattern into separate scalar and vector variants. It further splits the vector
pattern into big/little endian variants that model the zero-high-half
semantics of the underlying instruction - allowing for more combinations
with the write-to-high
Hi,
As subject, this patch uses UNSPEC_SQXTUN instead of UNSPEC_SQXTUN2
in the aarch64_sqxtun2 patterns. This allows for more
aggressive combinations and ultimately better code generation - which will
be confirmed by a new set of tests in
gcc.target/aarch64/narrow_high_combine.c (patch 5/5 in
Hi,
As subject, this patch implements saturating right-shift and narrow high
Neon intrinsic RTL patterns using a vec_concat of a register_operand
and a VQSHRN_N unspec - instead of just a VQSHRN_N unspec. This
more relaxed pattern allows for more aggressive combinations and
ultimately better code
Hi,
As subject, this patch implements v[r]addhn2 and v[r]subhn2 Neon intrinsic
RTL patterns using a vec_concat of a register_operand and an ADDSUBHN
unspec - instead of just an ADDSUBHN2 unspec. This more relaxed pattern
allows for more aggressive combinations and ultimately better code
generation
Hi Richard,
I think you may be referencing an older checkout as we refactored this
pattern in a previous change to:
(define_insn "mul_lane3"
[(set (match_operand:VMUL 0 "register_operand" "=w")
(mult:VMUL
(vec_duplicate:VMUL
(vec_select:
(match_operand:VMUL 2 "register_oper
via Gcc-patches
Cc: Jonathan Wright
Subject: Re: [PATCH 14/20] testsuite: aarch64: Add fusion tests for FP vml[as]
intrinsics
Jonathan Wright via Gcc-patches writes:
> Hi,
>
> As subject, this patch adds compilation tests to make sure that the output
> of vmla/vmls floating-point Neo
Updated the patch to be more consistent with the others in the series.
Tested and bootstrapped on aarch64-none-linux-gnu - no issues.
Ok for master?
Thanks,
Jonathan
From: Gcc-patches on behalf of Jonathan
Wright via Gcc-patches
Sent: 28 April 2021 15:42
To
Patch updated as per suggestion (similar to patch 10/20.)
Tested and bootstrapped on aarch64-none-linux-gnu - no issues.
Ok for master?
Thanks,
Jonathan
From: Richard Sandiford
Sent: 28 April 2021 16:37
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright
Patch updated as per your suggestion.
Tested and bootstrapped on aarch64-none-linux-gnu - no issues.
Ok for master?
Thanks,
Jonathan
From: Richard Sandiford
Sent: 28 April 2021 16:11
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright
Subject: Re: [PATCH
Thanks for the review, I've updated the patch as per option 1.
Tested and bootstrapped on aarch64-none-linux-gnu with no issues.
Ok for master?
Thanks,
Jonathan
From: Richard Sandiford
Sent: 28 April 2021 15:11
To: Jonathan Wright via Gcc-patches
Cc: Jon
Hi,
Saturating truncation can be expressed using the RTL expressions
ss_truncate and us_truncate. This patch changes the implementation
of the vqmovn_* Neon intrinsics to use these RTL expressions rather
than a pair of unspecs. The redundant unspecs are removed along with
their code iterator.
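For reference, the user-visible intrinsics are unchanged; only their RTL
implementation moves from a pair of unspecs to ss_truncate/us_truncate. A
minimal usage sketch:

#include <arm_neon.h>

/* Signed and unsigned saturating narrows; these are expected to map to
   sqxtn and uqxtn respectively, exactly as before the change.  */
int16x4_t  narrow_s32 (int32x4_t a)  { return vqmovn_s32 (a); }
uint16x4_t narrow_u32 (uint32x4_t a) { return vqmovn_u32 (a); }
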
Reg
Hi,
As subject, this patch updates the attributes of all intrinsics defined in
arm_acle.h to be consistent with the attributes of the intrinsics defined
in arm_neon.h. Specifically, this means updating the attributes from:
__extension__ static __inline
__attribute__ ((__always_inline__))
to:
Hi,
As subject, this patch updates the attributes of all intrinsics defined in
arm_fp16.h to be consistent with the attributes of the intrinsics defined
in arm_neon.h. Specifically, this means updating the attributes from:
__extension__ static __inline
__attribute__ ((__always_inline__))
to:
Hi,
As subject, this patch implements the saturating right-shift and narrow
high Neon intrinsic RTL patterns using a vec_concat of a register_operand
and a VQSHRN_N unspec - instead of just a VQSHRN2_N unspec. This
more relaxed pattern allows for more aggressive combinations and
ultimately better
Hi,
As subject, this patch implements the v[r]addhn2 and v[r]subhn2 Neon
intrinsic RTL patterns using a vec_concat of a register_operand and an
ADDSUBHN unspec - instead of just an ADDSUBHN2 unspec. This more
relaxed pattern allows for more aggressive combinations and ultimately
better code genera
Hi,
As subject, this patch rewrites the vcvtx Neon intrinsics to use RTL builtins
rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Jonathan
Hi,
As subject, this patch adds compilation tests to make sure that the output
of vmla/vmls floating-point Neon intrinsics (fmul, fadd/fsub) is not fused
into fmla/fmls instructions.
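For illustration, a sketch of the shape of such a test (the file name and dg
directives here are illustrative, not the ones added by the patch):

#include <arm_neon.h>

/* The intent is that vmla_f32 expands to a separate fmul and fadd, so
   the test asserts those appear in the output and that no fmla is
   emitted, e.g. with { dg-final { scan-assembler-not "fmla" } }.  */
float32x2_t
mla_no_fuse (float32x2_t acc, float32x2_t a, float32x2_t b)
{
  return vmla_f32 (acc, a, b);
}
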
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-02-16 Jonathan Wright
* gcc.targ
Hi,
As subject, this patch rewrites the floating-point vml[as][q]_laneq Neon
intrinsics to use RTL builtins rather than relying on the GCC vector
extensions. Using RTL builtins allows control over the emission of
fmla/fmls instructions (which we don't want here.)
With this commit, the code genera
Hi,
As subject, this patch rewrites the floating-point vml[as][q]_lane Neon
intrinsics to use RTL builtins rather than relying on the GCC vector
extensions. Using RTL builtins allows control over the emission of
fmla/fmls instructions (which we don't want here.)
With this commit, the code generat
Hi,
As subject, this patch rewrites the floating-point vml[as][q] Neon intrinsics
to use RTL builtins rather than relying on the GCC vector extensions.
Using RTL builtins allows control over the emission of fmla/fmls
instructions (which we don't want here.)
With this commit, the code generated by
Hi,
As subject, this patch rewrites the floating-point vml[as][q]_n Neon
intrinsics to use RTL builtins rather than inline assembly code, allowing
for better scheduling and optimization.
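For illustration, the general shape of the change (a loose sketch, not the
exact text of arm_neon.h before or after the patch; the builtin name is
hypothetical):

#include <arm_neon.h>

/* Before (illustrative style): an inline-asm body hides the operation
   from the optimizers, so it cannot be scheduled or simplified.  */
static inline float32x2_t
mla_with_asm (float32x2_t a, float32x2_t b, float32x2_t c)
{
  float32x2_t result;
  __asm__ ("fmul %0.2s, %1.2s, %2.2s\n\t"
           "fadd %0.2s, %0.2s, %3.2s"
           : "=&w" (result)
           : "w" (b), "w" (c), "w" (a)
           : /* no clobbers */);
  return result;
}

/* After: the body is a call to a builtin backed by an RTL pattern
   (builtin name hypothetical), so scheduling and optimization work as
   for any other expression.  */
float32x2_t hypothetical_mla_builtin (float32x2_t, float32x2_t, float32x2_t);

static inline float32x2_t
mla_with_builtin (float32x2_t a, float32x2_t b, float32x2_t c)
{
  return hypothetical_mla_builtin (a, b, c);
}
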
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
As subject, this patch rewrites the v[q]tbx Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the v[q]tbl Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the vsri[q]_n_p* Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog: