Hi,
As subject, this patch rewrites integer mla Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better
scheduling and optimization.
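For illustration, the change for each intrinsic is of this general shape - a
sketch in the arm_neon.h style, where the builtin name follows the usual
__builtin_aarch64_<name><mode> convention and is indicative rather than quoted
from the patch:

/* Before: opaque inline assembly that the compiler cannot schedule around.  */
__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
{
  int8x8_t __result;
  __asm__ ("mla %0.8b, %2.8b, %3.8b"
           : "=w"(__result)
           : "0"(__a), "w"(__b), "w"(__c)
           : /* No clobbers */);
  return __result;
}

/* After: a call to the RTL builtin, which the RTL optimizers understand.  */
__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
{
  return __builtin_aarch64_mlav8qi (__a, __b, __c);
}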
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
If ok, please commit to master (I don't have commit rights).
GNU style (followed in the header file) is to insert a space between
the function name and the arguments. Same for the other functions.
Ah, yes - will change.
Since other patches like this are on their way, would you mind
going through the process on https://gcc.gnu.org/gitwrite.html
to get commit access?
ChangeLog:
2021-01-22 Jonathan Wright
* MAINTAINERS (Write After Approval): Add myself.
From 32a93eac7adbb34bb50ed07a9841c870b7ebcb7a Mon Sep 17 00:00:00 2001
From: Jonathan Wright
Date: Fri, 22 Jan 2021 19:09:11 +
Subject: [PATCH] MAINTAINERS: Add myself for write after approval
Hi,
As subject, this patch rewrites integer mla_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
I have re-written this to use RTL builtins - regression tested and bootstrapped
on aarch64-none-linux-gnu with no issues:
aarch64: Use RTL builtins for integer mls intrinsics
Rewrite integer mls Neon intrinsics to use RTL builtins rather than
inline assembly code, allowing for better scheduling and optimization.
Hi,
As subject, this patch rewrites integer mls_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the floating-point mla_n/mls_n intrinsics to
use plain C arithmetic (a + b * c and a - b * c, respectively) rather than
inline assembly code, allowing for better scheduling and optimization.
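Concretely, the new definitions are plain GCC vector-extension arithmetic - a
sketch in the arm_neon.h style, using the two-element float variants as an
example:

__extension__ extern __inline float32x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
{
  /* GCC's vector extensions broadcast the scalar __c across the vector,
     so no builtin or inline assembly is needed at all.  */
  return __a + __b * __c;
}

__extension__ extern __inline float32x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmls_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
{
  return __a - __b * __c;
}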
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
As subject, this patch rewrites [su]mlal Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better
scheduling and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
202
Hi,
As subject, this patch rewrites [su]mlal_n Neon intrinsics to use RTL builtins
rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2
Hi,
As subject, this patch rewrites [su]mlsl_n Neon intrinsics to use RTL builtins
rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2
Hi,
As subject, this patch rewrites [su]mlsl_lane[q] Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Jonathan
Hi,
As subject, this patch rewrites [su]mull_n Neon intrinsics to use RTL builtins
rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
gcc/ChangeLog:
2021-0
Hi,
As subject, this patch adds tests for vmull_high_* Neon intrinsics. Since
these intrinsics are only supported for AArch64, these tests are
restricted to only run on AArch64 targets.
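Each test follows the usual pattern of calling the intrinsic and scanning for
the expected instruction; an illustrative (not verbatim) example:

/* { dg-do compile } */
/* { dg-options "-O2" } */

#include <arm_neon.h>

int32x4_t
test_vmull_high_s16 (int16x8_t a, int16x8_t b)
{
  /* Widening multiply of the high halves - should emit a single SMULL2.  */
  return vmull_high_s16 (a, b);
}

/* { dg-final { scan-assembler-times {smull2\t} 1 } } */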
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-01-29 Jonathan Wright
* gcc.t
Woops, didn't attach the diff. Here we go.
Thanks,
Jonathan
From: Jonathan Wright
Sent: 01 February 2021 11:42
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov
Subject: [PATCH] testsuite: aarch64: Add tests for vmull_high intrinsics
Hi,
As subject, this patch adds tests for vmull_high_* Neon intrinsics.
Hi,
As subject, this patch adds tests for vmlal_high_* and vmlsl_high_*
Neon intrinsics. Since these intrinsics are only supported for AArch64,
these tests are restricted to only run on AArch64 targets.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-01-31 Jonathan Wright
Hi,
As subject, this patch rewrites [su]mlal_high Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites [su]mlal_high_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Jonathan
Hi,
As subject, this patch rewrites [su]mlsl_high_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Jonathan
Hi,
As subject, this patch rewrites [su]mlal_high_lane[q] Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Hi,
As subject, this patch rewrites [su]mlsl_high_lane[q] Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Hi,
As subject, this patch rewrites [su]mull_high_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Jonathan
Hi,
As subject, this patch rewrites [su]mull_high_lane[q] Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks
Hi,
As subject, this patch adds tests for vpaddq_* Neon intrinsics. Since these
intrinsics are only supported for AArch64, these tests are restricted to
only run on AArch64 targets.
(There are currently no tests covering these intrinsics.)
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
Hi,
As subject, this patch adds tests for v[r]addhn_high and v[r]subhn_high Neon
intrinsics. Since these intrinsics are only supported for AArch64, these tests
are restricted to only run on AArch64 targets.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-03-02 Jonathan Wright
Hi,
As subject, this patch adds tests for v[r]shrn_high Neon intrinsics. Since
these intrinsics are only supported for AArch64, these tests are restricted
to only run on AArch64 targets.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-03-02 Jonathan Wright
* gcc.
Hi,
As subject, this patch adds tests for v[q]mov[u]n_high Neon intrinsics. Since
these intrinsics are only supported for AArch64, these tests are restricted
to only run on AArch64 targets.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-03-02 Jonathan Wright
* g
Hi,
As subject, this patch adds tests for vcvtx* and vcvt_fXX_fXX floating-point
Neon intrinsics. Since these intrinsics are only supported for AArch64, these
tests are restricted to only run on AArch64 targets.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-02-18 Jonathan Wright
From: Christophe Lyon
Sent: 03 August 2021 10:42
To: Jonathan Wright
Cc: gcc-patches@gcc.gnu.org ; Richard Sandiford
Subject: Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in
vqtbl[234] intrinsics
On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches
wrote:
Hi,
This patch
): Declare.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vmul_high_cost.c: New test.
From: Richard Sandiford
Sent: 04 August 2021 10:05
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright
Subject: Re: [PATCH] aarch64: Don't include vec_select high-half in SIMD
multiply cost
Jonathan
Hi,
V2 of this patch uses the same approach as that just implemented
for the multiply high-half cost patch.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-07-28 Jonathan Wright
* config/aarch64/aa
Hi,
V2 of this change implements the same approach as for the multiply
and add-widen patches.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-07-28 Jonathan Wright
* config/aarch64/aarch64.c: Traver
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst4[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
instructions are not generated.
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst3[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
instructions are not generated.
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst2[q]_lane Neon intrinsics in
arm_neon.h.
It also adds new code generation tests to verify that superfluous move
instructions are not generated.
Hi,
As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat
Neon intrinsics in arm_neon.h.
It also adds new code generation tests to verify that superfluous move
instructions are not generated.
Hi,
I've corrected the quoting and moved everything on to one line.
Ok for master?
Thanks,
Jonathan
---
gcc/testsuite/ChangeLog:
2021-08-04 Jonathan Wright
* gcc.target/aarch64/vector_structure_intrinsics.c: Restrict
tests to little-endian targets.
From: Richard Sandiford
Hi,
Some scan-assembler tests for SVE code generation were erroneously
split over multiple lines - meaning they became invalid. This patch
gets the tests working again by putting each test on a single line.
The extract_[1234].c tests are corrected to expect that extracted
32-bit values are moved
Hi,
This patch removes macros for vld2[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-12 Jonathan Wright
Hi,
This patch removes macros for vld3[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-16 Jonathan Wright
Hi,
This patch removes macros for vld4[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-08-16 Jonathan Wright
---
gcc/ChangeLog:
2021-08-18 Jonathan Wright
* config/aarch64/arm_neon.h (vld4_lane_f32): Use float RTL
pattern.
(vld4q_lane_f64): Use float type cast.
From: Andreas Schwab
Sent: 18 August 2021 13:09
To: Jonathan Wright via Gcc-patches
Cc: Jonathan Wright; Richard Sandiford
Hi,
This patch fixes type qualifiers for the qtbl1 and qtbx1 Neon builtins
and removes the casts from the Neon intrinsic function bodies that
use these builtins.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
23-09
Hi,
Each of the comments on the previous version of the patch have been
addressed.
Ok for master?
Thanks,
Jonathan
From: Richard Sandiford
Sent: 22 October 2021 16:13
To: Jonathan Wright
Cc: gcc-patches@gcc.gnu.org ; Kyrylo Tkachov
Subject: Re: [PATCH 4/6] aarch64: Add machine modes for Ne
Hi,
This patch declares type-qualified builtins and uses them for MLA/MLS
Neon intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.
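The effect in arm_neon.h is to replace cast-heavy wrappers with direct calls,
along these lines (a sketch; the qualified builtin suffix shown is purely
illustrative):

/* Before: route the unsigned intrinsic through the signed builtin.  */
__extension__ extern __inline uint8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
{
  return (uint8x8_t) __builtin_aarch64_mlav8qi ((int8x8_t) __a,
                                                (int8x8_t) __b,
                                                (int8x8_t) __c);
}

/* After: a type-qualified builtin takes and returns unsigned vectors
   directly, so the casts disappear (builtin name illustrative).  */
__extension__ extern __inline uint8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
{
  return __builtin_aarch64_mlav8qi_uuuu (__a, __b, __c);
}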
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares poly type-qualified builtins and uses them for
PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonathan Wright
Hi,
This patch declares unsigned type-qualified builtins and uses them for
XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonathan Wright
Hi,
This patch declares unsigned type-qualified builtins and uses them for
[R]SHRN[2] Neon intrinsics. This removes the need for casts in
arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-11-08 Jonathan Wright
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement widening-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement widening-subtract Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement halving-subtract Neon intrinsics. This removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-add Neon intrinsics. This
removes the need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement (rounding) halving-narrowing-subtract Neon intrinsics. This
removes the need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement the pairwise addition Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned type-qualified builtins and uses them to
implement the vector reduction Neon intrinsics. This removes the need
for many casts in arm_neon.h.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned and polynomial type-qualified builtins and
uses them to implement the LD1/ST1 Neon intrinsics. This removes the
need for many casts in arm_neon.h.
The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.
Regression tes
Hi,
This patch declares unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
This patch declares unsigned and polynomial type-qualified builtins for
vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes
the need for many casts in arm_neon.h.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
Hi,
The existing vec_pack_trunc RTL pattern emits an opaque two-
instruction assembly code sequence that prevents proper instruction
scheduling. This commit changes the pattern to an expander that emits
individual xtn and xtn2 instructions.
This commit also consolidates the duplicate truncation patterns.
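At the source level the affected pattern corresponds to narrowing two vectors
into one - an illustrative function, not part of the patch:

#include <arm_neon.h>

/* Narrow two vectors of eight 16-bit elements into one vector of sixteen
   8-bit elements; this is the xtn + xtn2 sequence, and with the two
   instructions emitted individually the scheduler can place them freely.  */
int8x16_t
pack_trunc (int16x8_t a, int16x8_t b)
{
  return vcombine_s8 (vmovn_s16 (a), vmovn_s16 (b));
}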
Hi,
As subject, this patch corrects the type attribute in RTL patterns that
generate XTN/XTN2 instructions to be "neon_move_narrow_q".
This makes a material difference because these instructions can be
executed on both SIMD pipes in the Cortex-A57 core model, whereas the
"neon_shift_imm_narrow_q"
The pointer parameter to load a vector of signed values should itself
be a signed type. This patch fixes two instances of this unsigned-to-
signed implicit conversion in arm_neon.h.
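The class of problem being fixed is the usual -Wpointer-sign mismatch - a
minimal standalone illustration, not the actual arm_neon.h code:

#include <stdint.h>

/* Loading signed values through a parameter declared with the wrong
   signedness provokes a warning at the implicit conversion; the fix is
   simply to declare the parameter as const int8_t * in the first place.  */
int8_t
first_element (const uint8_t *p)
{
  const int8_t *q = p;   /* warning: pointer targets differ in signedness */
  return *q;
}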
Tested the relevant intrinsics with -Wpointer-sign; the warnings are no
longer present.
Ok for master?
Thanks,
Jonathan
---
Hi,
As subject, this patch deletes some redundant type definitions in
arm_neon.h. These vector type definitions are an artifact from the initial
commit that added the AArch64 port.
Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch declares the Neon vector-tuple types inside the
compiler instead of in the arm_neon.h header. This is a necessary first
step before adding corresponding machine modes to the AArch64
backend.
The vector-tuple types are implemented using a #pragma. This means
initializati
Hi,
A long time ago, using a parallel to take a subreg of a SIMD register
was broken. This temporary fix[1] (from 2003) spilled these registers
to memory and reloaded the appropriate part to obtain the subreg.
The fix initially existed for the benefit of the PowerPC E500 - a
platform for which GC
Hi,
Extracting a bitfield from a vector can be achieved by casting the
vector to a new type whose elements are the same size as the desired
bitfield, before generating a subreg. However, this is only an
optimization if the original vector can be accessed in the new
machine mode without first being
Hi,
Preventing decomposition if modes are not tieable is necessary to
stop AArch64 partial Neon structure modes being treated as packed in
registers.
This is a necessary prerequisite for a future AArch64 PCS change to
maintain good code generation.
Bootstrapped and regression tested on:
* x86_64
Hi,
Neon vector-tuple types can be passed in registers on function call
and return - there is no need to generate a parallel rtx. This patch
adds cases to detect vector-tuple modes and generates an appropriate
register rtx.
This change greatly improves code generated when passing Neon vector-
tuple types.
Hi,
Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable are:
1) Large integer modes are opaque and modifying one vector in the
tuple requires a lot of inefficient set/get gymnastics. The
Hi,
Version 2 of this patch adds more code generation tests to show the
benefit of this RTL simplification as well as adding a new helper function
'rtx_vec_series_p' to reduce code duplication.
Patch tested as version 1 - ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2021-06-08 Jonathan Wright
Hi,
As subject, this patch uses a union instead of constructing a new opaque
vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h.
This simplifies the header file and also improves code generation -
superfluous move instructions were emitted for every register
extraction/set i
Hi,
A recent change "gcc: Add vec_select -> subreg RTL simplification"
updated the expected test results for SVE extraction tests. The new
result should only have been changed for little endian. This patch
restores the old expected result for big endian.
Ok for master?
Thanks,
Jonathan
---
gcc
Ah, yes - those test results should have only been changed for little endian.
I've submitted a patch to the list restoring the original expected results
for big endian.
Thanks,
Jonathan
From: Christophe Lyon
Sent: 15 July 2021 10:09
To: Richard Sandiford; Jonathan Wright
Hi,
As subject, this patch renames the two-source-register TBL/TBX RTL
patterns so that their names better reflect what they do, rather than
confusing them with tbl3 or tbx4 patterns. Also use the correct
"neon_tbl2" type attribute for both patterns.
Rename single-source-register TBL/TBX patterns
Hi,
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected by the RTL cost function.
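For example (illustrative functions, not part of the patch), both of these
compile to a single MUL and should be costed identically:

#include <arm_neon.h>

int32x4_t
mul_vector (int32x4_t a, int32x4_t b)
{
  return vmulq_s32 (a, b);            /* mul  v0.4s, v0.4s, v1.4s   */
}

int32x4_t
mul_element (int32x4_t a, int32x4_t b)
{
  return vmulq_laneq_s32 (a, b, 0);   /* mul  v0.4s, v0.4s, v1.s[0] */
}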
Hi,
As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.
This patch modifies unary operation simplification to push
sign/zero-extension operations inside the vec_duplicate.
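A source-level case that benefits (illustrative, not taken from the patch) is
widening a duplicated scalar:

#include <arm_neon.h>

/* With the extension pushed inside the vec_duplicate,
   (sign_extend (vec_duplicate x)) becomes (vec_duplicate (sign_extend x)),
   i.e. one dup of the already-extended scalar.  */
int32x4_t
widen_dup (int16_t x)
{
  return vmovl_s16 (vdup_n_s16 (x));
}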
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for
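The shape of the change for one intrinsic - a sketch only, with the opaque-mode
and builtin names shown as they typically appear in arm_neon.h rather than
quoted from the patch:

/* Before: build the opaque OImode structure one vector at a time.  */
__extension__ extern __inline uint8x16_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vqtbl2q_u8 (uint8x16x2_t __tab, uint8x16_t __idx)
{
  __builtin_aarch64_simd_oi __o;
  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __tab.val[0], 0);
  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __tab.val[1], 1);
  return (uint8x16_t) __builtin_aarch64_tbl3v16qi (__o, (int8x16_t) __idx);
}

/* After: one __builtin_memcpy, which the compiler lowers to plain register
   moves and can then eliminate entirely.  */
__extension__ extern __inline uint8x16_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vqtbl2q_u8 (uint8x16x2_t __tab, uint8x16_t __idx)
{
  __builtin_aarch64_simd_oi __o;
  __builtin_memcpy (&__o, &__tab, sizeof (__tab));
  return (uint8x16_t) __builtin_aarch64_tbl3v16qi (__o, (int8x16_t) __idx);
}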
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbx[234] Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for e
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ever
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst4[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst3[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.
Jonathan
From: Richard Sandiford
Sent: 23 July 2021 10:29
To: Kyrylo Tkachov
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for ev
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.
Regression tested and bootstr
Same explanation as for patch 3/8:
I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.
Thanks,
Jonathan
From: Richard Sandiford
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x3 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for
Hi,
This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for
Hi,
This updated patch fixes the two-operators-per-row style issue in the
aarch64-simd.md RTL patterns as well as integrating the simplify-rtx.c
change as suggested.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
V2 of the patch addresses the initial review comments, factors out
common code (as we discussed off-list) and adds a set of unit tests
to verify the code generation benefit.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can select the top or bottom half of the operand registers. This
selection does not change the cost of the underlying instruction and
this should be reflected by the RTL cost function.
This patch adds RTL tree traversal in the Neon multiply cost function.
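For instance (illustrative functions, not part of the patch), the low-half and
high-half widening multiplies below are each a single instruction and should
receive the same cost:

#include <arm_neon.h>

int32x4_t
mull_low (int16x8_t a, int16x8_t b)
{
  return vmull_s16 (vget_low_s16 (a), vget_low_s16 (b));   /* smull  */
}

int32x4_t
mull_high (int16x8_t a, int16x8_t b)
{
  return vmull_high_s16 (a, b);                            /* smull2 */
}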
Hi,
The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.
This patch adds RTL tree traversal in the Neon add cost function to
Hi,
The Neon subtract-long/subtract-widen instructions can select the top
or bottom half of the operand registers. This selection does not
change the cost of the underlying instruction and this should be
reflected by the RTL cost function.
This patch adds RTL tree traversal in the Neon subtract cost function.
Hi,
As subject, this patch rewrites the vmull[_high]_p8 Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.
Ok for master?
Thanks,
Hi,
As subject, this patch rewrites the vq[r]dmulh[q]_n Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the vpaddq Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
2
Hi,
As subject, this patch rewrites the [su]paddl[q] Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better
scheduling and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the vpadal_[su]32 Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better
scheduling and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the vsli[q]_n_p* Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the vsri[q]_n_p* Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog:
Hi,
As subject, this patch rewrites the v[q]tbl Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.
Ok for master?
Thanks,
Jonathan
---
gcc/ChangeLog: