[PATCH] aarch64: Use RTL builtins for integer mla intrinsics

2021-01-22 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites integer mla Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. If ok, please commit to master (I don't have commit rig

Re: [PATCH] aarch64: Use RTL builtins for integer mla intrinsics

2021-01-22 Thread Jonathan Wright via Gcc-patches
GNU style (followed in the header file) is to insert a space between the function name and the arguments. Same for the other functions. Ah, yes - will change. Since other patches like this are on their way, would you mind going through the process on https://gcc.gnu.org/gitwrite.html to get commi

[COMMITTED] MAINTAINERS: Add myself for write after approval

2021-01-22 Thread Jonathan Wright via Gcc-patches
ChangeLog: 2021-01-22 Jonathan Wright * MAINTAINERS (Write After Approval): Add myself. From 32a93eac7adbb34bb50ed07a9841c870b7ebcb7a Mon Sep 17 00:00:00 2001 From: Jonathan Wright Date: Fri, 22 Jan 2021 19:09:11 + Subject: [PATCH] MAINTAINERS: Add myself for write after a

[PATCH] aarch64: Use RTL builtins for integer mla_n intrinsics

2021-01-26 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites integer mla_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLo

Re: [PATCH] aarch64: Use GCC vector extensions for integer mls intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
I have re-written this to use RTL builtins - regression tested and bootstrapped on aarch64-none-linux-gnu with no issues: aarch64: Use RTL builtins for integer mls intrinsics Rewrite integer mls Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling

aarch64: Use RTL builtins for integer mls_n intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites integer mls_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog:

[PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites floating-point mla_n/mls_n intrinsics to use a + b * c / a - b * c rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --
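
The a + b * c and a - b * c forms mentioned in this message can be sketched with GCC's generic vector extensions. This is only an illustration of the arithmetic, not the code from the patch: the f32x4 typedef and the *_sketch function names are hypothetical, and the scalar is broadcast by hand so the sketch does not depend on scalar-to-vector promotion.

```c
/* Illustration only: f32x4 and the *_sketch names are hypothetical and
   are not taken from arm_neon.h.  */
typedef float f32x4 __attribute__ ((vector_size (16)));

/* Multiply-accumulate by scalar: a + b * c, with c copied to every lane.  */
static inline f32x4
fmla_n_sketch (f32x4 a, f32x4 b, float c)
{
  f32x4 cv = { c, c, c, c };
  return a + b * cv;
}

/* Multiply-subtract by scalar: a - b * c, with c copied to every lane.  */
static inline f32x4
fmls_n_sketch (f32x4 a, f32x4 b, float c)
{
  f32x4 cv = { c, c, c, c };
  return a - b * cv;
}
```

Because the body is ordinary arithmetic rather than an opaque asm statement, the compiler is free to schedule and combine it with neighbouring code; whether the multiply and add end up fused depends on the floating-point contraction settings in use.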

[PATCH] aarch64: Use RTL builtins for [su]mlal intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlal Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 202
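
For readers unfamiliar with the operation, the unsigned widening multiply-accumulate behind the [su]mlal intrinsics can be modelled lane by lane in plain C. This is a reference model under assumptions, not GCC code: umlal_model is a hypothetical helper and it covers only the 16-bit to 32-bit unsigned case.

```c
#include <stdint.h>

/* Hypothetical reference model, not part of GCC: per-lane unsigned
   widening multiply-accumulate, acc[i] + b[i] * c[i], with the 16-bit
   inputs widened to 32 bits before the multiply.  */
static inline void
umlal_model (uint32_t acc[4], const uint16_t b[4], const uint16_t c[4])
{
  for (int i = 0; i < 4; i++)
    /* Cast first: uint16_t operands would otherwise promote to int.  */
    acc[i] += (uint32_t) b[i] * (uint32_t) c[i];
}
```

The explicit casts matter: without them the two uint16_t operands are promoted to int, and the product of two large values could overflow a signed 32-bit int.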

aarch64: Use RTL builtins for [su]mlal_n intrinsics

2021-01-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlal_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2

[PATCH] aarch64: Use RTL builtins for [su]mlsl_n intrinsics

2021-01-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlsl_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2

[PATCH] aarch64: Use RTL builtins for [su]mlsl_lane[q] intrinsics

2021-01-29 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlsl_lane[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks, J

[PATCH] aarch64: Use RTL builtins for [su]mull_n intrinsics

2021-01-29 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mull_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan gcc/ChangeLog: 2021-0

[PATCH] testsuite: aarch64: Add tests for vmull_high intrinsics

2021-02-01 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for vmull_high_* Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-01-29  Jonathan Wright   * gcc.t
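
A test for a "high" widening multiply needs a reference result to compare against. The sketch below is one way such a reference could be computed in portable C for the signed 16-bit case; mull_high_s16_model is a hypothetical name and this is not the code from the submitted test files.

```c
#include <stdint.h>

/* Hypothetical reference model for a signed 16-bit "high" widening
   multiply: multiply lanes 4..7 of each 8-lane input and produce four
   32-bit results.  */
static inline void
mull_high_s16_model (int32_t out[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) a[i + 4] * (int32_t) b[i + 4];
}
```

The lower four lanes of each input are ignored, which is what distinguishes the _high variants from the plain widening multiply.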

Re: [PATCH] testsuite: aarch64: Add tests for vmull_high intrinsics

2021-02-01 Thread Jonathan Wright via Gcc-patches
Woops, didn't attach the diff. Here we go. Thanks, Jonathan From: Jonathan Wright Sent: 01 February 2021 11:42 To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov Subject: [PATCH] testsuite: aarch64: Add tests for vmull_high intrinsics Hi, As subject, this patch add

[PATCH] testsuite: aarch64: Add tests for vmlXl_high intrinsics

2021-02-01 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for vmlal_high_* and vmlsl_high_* Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-01-31  Jonathan Wright  

[PATCH] aarch64: Use RTL builtins for [su]mlal_high intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlal_high Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog:

[PATCH] aarch64: Use RTL builtins for [su]mlal_high_n intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlal_high_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks, Jon

[PATCH] aarch64: Use RTL builtins for [su]mlsl_high_n intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlsl_high_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks, Jon

[PATCH] aarch64: Use RTL builtins for [su]mlal_high_lane[q] intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlal_high_lane[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thank

[PATCH] aarch64: Use RTL builtins for [su]mlsl_high_lane[q] intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mlsl_high_lane[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thank

[PATCH] aarch64: Use RTL builtins for [su]mull_high_n intrinsics

2021-02-04 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mull_high_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks, Jon

[PATCH] aarch64: Use RTL builtins for [su]mull_high_lane[q] intrinsics

2021-02-04 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites [su]mull_high_lane[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks

[PATCH] testsuite: aarch64: Add tests for vpaddq intrinsics

2021-02-09 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for vpaddq_* Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. (There are currently no tests covering these intrinsics.) Ok for master? Thanks, Jonathan --- gcc/testsuite/Cha

[PATCH] testsuite: aarch64: Add tests for narrowing-arithmetic intrinsics

2021-03-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for v[r]addhn_high and v[r]subhn_high Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-03-02  Jonathan Wrig

[PATCH] testsuite: aarch64: Add tests for v[r]shrn_high intrinsics

2021-03-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for v[r]shrn_high Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-03-02  Jonathan Wright   * gcc.

[PATCH] testsuite: aarch64: Add tests for v[q]mov[u]n_high intrinsics

2021-03-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for v[q]mov[u]n_high Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-03-02  Jonathan Wright   * g

[PATCH] testsuite: aarch64: Add tests for vcvt FP intrinsics

2021-03-03 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch adds tests for vcvtx* and vcvt_fXX_fXX floating-point Neon intrinsics. Since these intrinsics are only supported for AArch64, these tests are restricted to only run on AArch64 targets. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-02-18  Jonathan

[PATCH] testsuite: aarch64: Fix failing vector structure tests on big-endian

2021-08-04 Thread Jonathan Wright via Gcc-patches
: Christophe Lyon Sent: 03 August 2021 10:42 To: Jonathan Wright Cc: gcc-patches@gcc.gnu.org ; Richard Sandiford Subject: Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics   On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches wrote: Hi, This patch

[PATCH V2] aarch64: Don't include vec_select high-half in SIMD multiply cost

2021-08-04 Thread Jonathan Wright via Gcc-patches
): Declare. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vmul_high_cost.c: New test. From: Richard Sandiford Sent: 04 August 2021 10:05 To: Jonathan Wright via Gcc-patches Cc: Jonathan Wright Subject: Re: [PATCH] aarch64: Don't include vec_select high-half in SIMD multiply cost   Jon

[PATCH V2] aarch64: Don't include vec_select high-half in SIMD add cost

2021-08-04 Thread Jonathan Wright via Gcc-patches
Hi, V2 of this patch uses the same approach as that just implemented for the multiply high-half cost patch. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-07-28  Jonathan Wright   * config/aarch64/aa

[PATCH V2] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi, V2 of this change implements the same approach as for the multiply and add-widen patches. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-07-28  Jonathan Wright   * config/aarch64/aarch64.c: Traver

[PATCH 1/4] aarch64: Use memcpy to copy structures in vst4[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch uses __builtin_memcpy to copy vector structures instead of using a union - or constructing a new opaque structure one vector at a time - in each of the vst4[q]_lane Neon intrinsics in arm_neon.h. It also adds new code generation tests to verify that superfluous move ins

[PATCH 2/4] aarch64: Use memcpy to copy structures in vst3[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch uses __builtin_memcpy to copy vector structures instead of using a union - or constructing a new opaque structure one vector at a time - in each of the vst3[q]_lane Neon intrinsics in arm_neon.h. It also adds new code generation tests to verify that superfluous move ins

[PATCH 3/4] aarch64: Use memcpy to copy structures in vst2[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch uses __builtin_memcpy to copy vector structures instead of using a union - or constructing a new opaque structure one vector at a time - in each of the vst2[q]_lane Neon intrinsics in arm_neon.h. It also adds new code generation tests to verify that superfluous move ins

[PATCH 4/4] aarch64: Use memcpy to copy structures in bfloat vst* intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch uses __builtin_memcpy to copy vector structures instead of using a union - or constructing a new opaque structure one vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat Neon intrinsics in arm_neon.h. It also adds new code generation tests to verify
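
The idea shared by this four-patch series, copying a whole vector structure with __builtin_memcpy instead of going through a union or filling the destination one vector at a time, can be sketched as follows. All type and function names here are hypothetical stand-ins (tuple4_sketch for a user-visible vector-tuple type, opaque4_sketch for the type a builtin expects); the real types in arm_neon.h differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins: tuple4_sketch plays the role of a user-visible
   vector-tuple type, opaque4_sketch the role of a builtin's operand type.  */
typedef int32_t i32x4 __attribute__ ((vector_size (16)));
typedef struct { i32x4 val[4]; } tuple4_sketch;
typedef struct { i32x4 part[4]; } opaque4_sketch;

_Static_assert (sizeof (tuple4_sketch) == sizeof (opaque4_sketch),
                "layouts must match for a whole-object copy");

/* Copy the whole structure in one step rather than one vector at a time.  */
static inline opaque4_sketch
to_opaque_sketch (tuple4_sketch t)
{
  opaque4_sketch o;
  __builtin_memcpy (&o, &t, sizeof (o));
  return o;
}
```

A fixed-size __builtin_memcpy like this is something the compiler can normally see through, so in principle it need not become an actual library call.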

Re: [PATCH] testsuite: aarch64: Fix failing vector structure tests on big-endian

2021-08-09 Thread Jonathan Wright via Gcc-patches
Hi, I've corrected the quoting and moved everything on to one line. Ok for master? Thanks, Jonathan --- gcc/testsuite/ChangeLog: 2021-08-04  Jonathan Wright   * gcc.target/aarch64/vector_structure_intrinsics.c: Restrict tests to little-endian targets. From: Richard Sandifo

[PATCH] testsuite: aarch64: Fix invalid SVE tests

2021-08-09 Thread Jonathan Wright via Gcc-patches
Hi, Some scan-assembler tests for SVE code generation were erroneously split over multiple lines - meaning they became invalid. This patch gets the tests working again by putting each test on a single line. The extract_[1234].c tests are corrected to expect that extracted 32-bit values are moved

[PATCH 1/3] aarch64: Remove macros for vld2[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi, This patch removes macros for vld2[q]_lane Neon intrinsics. This is a preparatory step before adding new modes for structures of Advanced SIMD vectors. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-08-12

[PATCH 2/3] aarch64: Remove macros for vld3[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi, This patch removes macros for vld3[q]_lane Neon intrinsics. This is a preparatory step before adding new modes for structures of Advanced SIMD vectors. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-08-16

[PATCH 3/3] aarch64: Remove macros for vld4[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi, This patch removes macros for vld4[q]_lane Neon intrinsics. This is a preparatory step before adding new modes for structures of Advanced SIMD vectors. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-08-16

[PATCH] aarch64: Fix float <-> int errors in vld4[q]_lane intrinsics

2021-08-18 Thread Jonathan Wright via Gcc-patches
- gcc/ChangeLog: 2021-08-18  Jonathan Wright   * config/aarch64/arm_neon.h (vld4_lane_f32): Use float RTL pattern. (vld4q_lane_f64): Use float type cast. From: Andreas Schwab Sent: 18 August 2021 13:09 To: Jonathan Wright via Gcc-patches Cc: Jonathan Wright ; Richard San

[PATCH] aarch64: Fix type qualifiers for qtbl1 and qtbx1 Neon builtins

2021-09-24 Thread Jonathan Wright via Gcc-patches
Hi, This patch fixes type qualifiers for the qtbl1 and qtbx1 Neon builtins and removes the casts from the Neon intrinsic function bodies that use these builtins. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 23-09

Re: [PATCH 4/6 V2] aarch64: Add machine modes for Neon vector-tuple types

2021-11-02 Thread Jonathan Wright via Gcc-patches
Hi, Each of the comments on the previous version of the patch has been addressed. Ok for master? Thanks, Jonathan From: Richard Sandiford Sent: 22 October 2021 16:13 To: Jonathan Wright Cc: gcc-patches@gcc.gnu.org ; Kyrylo Tkachov Subject: Re: [PATCH 4/6] aarch64: Add machine modes for Ne

[PATCH] aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares type-qualified builtins and uses them for MLA/MLS Neon intrinsics that operate on unsigned types. This eliminates lots of casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog:

[PATCH] aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares poly type-qualified builtins and uses them for PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08  Jonathan Wri

[PATCH] aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them for XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08  Jonathan

[PATCH] aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them for [R]SHRN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08  Jonat

[PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement widening-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2

[PATCH] aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement widening-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLo

[PATCH] aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/Cha

[PATCH] aarch64: Use type-qualified builtins for UHSUB Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement halving-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog

[PATCH] aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-narrowing-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --

[PATCH] aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-narrowing-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonatha

[PATCH] aarch64: Use type-qualified builtins for ADDP Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement the pairwise addition Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/Chan

[PATCH] aarch64: Use type-qualified builtins for ADDV Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned type-qualified builtins and uses them to implement the vector reduction Neon intrinsics. This removes the need for many casts in arm_neon.h. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/Chang

[PATCH] aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned and polynomial type-qualified builtins and uses them to implement the LD1/ST1 Neon intrinsics. This removes the need for many casts in arm_neon.h. The new type-qualified builtins are also lowered to gimple - as the unqualified builtins are already. Regression tes

[PATCH] aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned and polynomial type-qualified builtins for vcombine_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeL

[PATCH] aarch64: Use type-qualified builtins for vget_low/high intrinsics

2021-11-11 Thread Jonathan Wright via Gcc-patches
Hi, This patch declares unsigned and polynomial type-qualified builtins for vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan ---

[PATCH] aarch64: Use an expander for quad-word vec_pack_trunc pattern

2021-05-19 Thread Jonathan Wright via Gcc-patches
Hi, The existing vec_pack_trunc RTL pattern emits an opaque two-instruction assembly code sequence that prevents proper instruction scheduling. This commit changes the pattern to an expander that emits individual xtn and xtn2 instructions. This commit also consolidates the duplicate truncation p
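
The split into separate xtn and xtn2 steps described in this message can be modelled at lane level in plain C. This is a sketch under assumptions rather than the RTL from the patch: the *_model names are hypothetical, only the 32-bit to 16-bit case is shown, and little-endian lane order is assumed.

```c
#include <stdint.h>

/* Hypothetical lane-level model; little-endian lane order assumed.  */

/* XTN: truncate four 32-bit lanes into the low half of the result and
   zero the high half.  */
static inline void
xtn_model (uint16_t out[8], const uint32_t in[4])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = (uint16_t) in[i];
      out[i + 4] = 0;
    }
}

/* XTN2: truncate four 32-bit lanes into the high half of the result,
   leaving the low half untouched.  */
static inline void
xtn2_model (uint16_t out[8], const uint32_t in[4])
{
  for (int i = 0; i < 4; i++)
    out[i + 4] = (uint16_t) in[i];
}

/* Pack-and-truncate expressed as two independent steps.  */
static inline void
pack_trunc_model (uint16_t out[8], const uint32_t lo[4], const uint32_t hi[4])
{
  xtn_model (out, lo);
  xtn2_model (out, hi);
}
```

Expressing the operation as two visible steps is what lets a scheduler treat them as individual instructions instead of one indivisible block.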

[PATCH] aarch64: Use correct type attributes for RTL generating XTN(2)

2021-05-19 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch corrects the type attribute in RTL patterns that generate XTN/XTN2 instructions to be "neon_move_narrow_q". This makes a material difference because these instructions can be executed on both SIMD pipes in the Cortex-A57 core model, whereas the "neon_shift_imm_narrow_q"

[PATCH] aarch64: Fix pointer parameter type in LD1 Neon intrinsics

2021-10-14 Thread Jonathan Wright via Gcc-patches
The pointer parameter to load a vector of signed values should itself be a signed type. This patch fixes two instances of this unsigned-signed implicit conversion in arm_neon.h. Tested relevant intrinsics with -Wpointer-sign and warnings no longer present. Ok for master? Thanks, Jonathan ---

[PATCH] aarch64: Remove redundant struct type definitions in arm_neon.h

2021-10-21 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch deletes some redundant type definitions in arm_neon.h. These vector type definitions are an artifact from the initial commit that added the AArch64 port. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gc

[PATCH 1/6] aarch64: Move Neon vector-tuple type declaration into the compiler

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch declares the Neon vector-tuple types inside the compiler instead of in the arm_neon.h header. This is a necessary first step before adding corresponding machine modes to the AArch64 backend. The vector-tuple types are implemented using a #pragma. This means initializati

[PATCH 2/6] gcc/expr.c: Remove historic workaround for broken SIMD subreg

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, A long time ago, using a parallel to take a subreg of a SIMD register was broken. This temporary fix[1] (from 2003) spilled these registers to memory and reloaded the appropriate part to obtain the subreg. The fix initially existed for the benefit of the PowerPC E500 - a platform for which GC

[PATCH 3/6] gcc/expmed.c: Ensure vector modes are tieable before extraction

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, Extracting a bitfield from a vector can be achieved by casting the vector to a new type whose elements are the same size as the desired bitfield, before generating a subreg. However, this is only an optimization if the original vector can be accessed in the new machine mode without first being

[PATCH 5/6] gcc/lower_subreg.c: Prevent decomposition if modes are not tieable

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, Preventing decomposition if modes are not tieable is necessary to stop AArch64 partial Neon structure modes being treated as packed in registers. This is a necessary prerequisite for a future AArch64 PCS change to maintain good code generation. Bootstrapped and regression tested on: * x86_64

[PATCH 6/6] aarch64: Pass and return Neon vector-tuple types without a parallel

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, Neon vector-tuple types can be passed in registers on function call and return - there is no need to generate a parallel rtx. This patch adds cases to detect vector-tuple modes and generates an appropriate register rtx. This change greatly improves code generated when passing Neon vector-tup

[PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi, Until now, GCC has used large integer machine modes (OI, CI and XI) to model Neon vector-tuple types. This is suboptimal for many reasons, the most notable are: 1) Large integer modes are opaque and modifying one vector in the tuple requires a lot of inefficient set/get gymnastics. The

[PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-07 Thread Jonathan Wright via Gcc-patches
Hi, Version 2 of this patch adds more code generation tests to show the benefit of this RTL simplification as well as adding a new helper function 'rtx_vec_series_p' to reduce code duplication. Patch tested as version 1 - ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-06-08  Jonathan

[PATCH] aarch64: Use unions for vector tables in vqtbl[234] intrinsics

2021-07-09 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch uses a union instead of constructing a new opaque vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for every register extraction/set i

testsuite: aarch64: Fix failing SVE tests on big endian

2021-07-15 Thread Jonathan Wright via Gcc-patches
Hi, A recent change "gcc: Add vec_select -> subreg RTL simplification" updated the expected test results for SVE extraction tests. The new result should only have been changed for little endian. This patch restores the old expected result for big endian. Ok for master? Thanks, Jonathan --- gcc

Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-15 Thread Jonathan Wright via Gcc-patches
Ah, yes - those test results should have only been changed for little endian. I've submitted a patch to the list restoring the original expected results for big endian. Thanks, Jonathan From: Christophe Lyon Sent: 15 July 2021 10:09 To: Richard Sandiford ; Jonat

[PATCH] aarch64: Refactor TBL/TBX RTL patterns

2021-07-19 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch renames the two-source-register TBL/TBX RTL patterns so that their names better reflect what they do, rather than confusing them with tbl3 or tbx4 patterns. Also use the correct "neon_tbl2" type attribute for both patterns. Rename single-source-register TBL/TBX patterns

[PATCH] aarch64: Don't include vec_select in SIMD multiply cost

2021-07-20 Thread Jonathan Wright via Gcc-patches
Hi, The Neon multiply/multiply-accumulate/multiply-subtract instructions can take various forms - multiplying full vector registers of values or multiplying one vector by a single element of another. Regardless of the form used, these instructions have the same cost, and this should be reflected b

[PATCH] simplify-rtx: Push sign/zero-extension inside vec_duplicate

2021-07-20 Thread Jonathan Wright via Gcc-patches
Hi, As a general principle, vec_duplicate should be as close to the root of an expression as possible. Where unary operations have vec_duplicate as an argument, these operations should be pushed inside the vec_duplicate. This patch modifies unary operation simplification to push sign/zero-extensi
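
The equivalence this simplification relies on, that extending a duplicated value gives the same lanes as duplicating an extended value, can be checked with a small scalar model. The function names are hypothetical and the model covers only sign-extension from 16 to 32 bits across four lanes; it is not the simplify-rtx code itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of the two equivalent forms.  */

/* Duplicate the 16-bit value first, then sign-extend every lane.  */
static inline void
extend_of_dup (int32_t out[4], int16_t x)
{
  int16_t dup[4] = { x, x, x, x };
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) dup[i];
}

/* Sign-extend once, then duplicate the 32-bit result.  */
static inline void
dup_of_extend (int32_t out[4], int16_t x)
{
  int32_t wide = (int32_t) x;
  for (int i = 0; i < 4; i++)
    out[i] = wide;
}
```

The second form performs the extension once on a scalar, which keeps the duplicate at the outermost position of the expression.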

[PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for

[PATCH 2/8] aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vqtbx[234] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for

[PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for e

[PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for ever

[PATCH 5/8] aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vst4[q] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for ev

[PATCH 6/8] aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vst3[q] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for ev

Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
I haven't added test cases here because these intrinsics don't map to a single instruction (they're legacy from Armv7) and would trip the "scan-assembler not mov" that we're using for the other tests. Jonathan From: Richard Sandiford Sent: 23 July 2021 10:29 To: K

[PATCH 7/8] aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for ev

[PATCH 8/8] aarch64: Use memcpy to copy vector tables in vst1[q]_x4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h. Add new code generation tests to verify that superfluous move instructions are not generated for the vst1q_x4 intrinsics. Regression tested and bootstr

Re: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Same explanation as for patch 3/8: I haven't added test cases here because these intrinsics don't map to a single instruction (they're legacy from Armv7) and would trip the "scan-assembler not mov" that we're using for the other tests. Thanks, Jonathan From: Richa

[PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vst1[q]_x3 Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for

[PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi, This patch uses __builtin_memcpy to copy vector structures instead of building a new opaque structure one vector at a time in each of the vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for

Re: [PATCH V2] simplify-rtx: Push sign/zero-extension inside vec_duplicate

2021-07-26 Thread Jonathan Wright via Gcc-patches
Hi, This updated patch fixes the two-operators-per-row style issue in the aarch64-simd.md RTL patterns as well as integrating the simplify-rtx.c change as suggested. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog:

Re: [PATCH V2] aarch64: Don't include vec_select in SIMD multiply cost

2021-07-28 Thread Jonathan Wright via Gcc-patches
Hi, V2 of the patch addresses the initial review comments, factors out common code (as we discussed off-list) and adds a set of unit tests to verify the code generation benefit. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/C

[PATCH] aarch64: Don't include vec_select high-half in SIMD multiply cost

2021-07-28 Thread Jonathan Wright via Gcc-patches
Hi, The Neon multiply/multiply-accumulate/multiply-subtract instructions can select the top or bottom half of the operand registers. This selection does not change the cost of the underlying instruction and this should be reflected by the RTL cost function. This patch adds RTL tree traversal in t

[PATCH] aarch64: Don't include vec_select high-half in SIMD add cost

2021-07-29 Thread Jonathan Wright via Gcc-patches
Hi, The Neon add-long/add-widen instructions can select the top or bottom half of the operand registers. This selection does not change the cost of the underlying instruction and this should be reflected by the RTL cost function. This patch adds RTL tree traversal in the Neon add cost function to

[PATCH] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-07-29 Thread Jonathan Wright via Gcc-patches
Hi, The Neon subtract-long/subtract-widen instructions can select the top or bottom half of the operand registers. This selection does not change the cost of the underlying instruction and this should be reflected by the RTL cost function. This patch adds RTL tree traversal in the Neon subtract co

[PATCH 1/20] aarch64: Use RTL builtin for vmull[_high]_p8 intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vmull[_high]_p8 Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu and aarch64_be-none-elf - no issues. Ok for master? Thanks,

[PATCH 2/20] aarch64: Use RTL builtin for vq[r]dmulh[q]_n intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vq[r]dmulh[q]_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/Chan

[PATCH 3/20] aarch64: Use RTL builtins for vpaddq intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vpaddq Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2

[PATCH 4/20] aarch64: Use RTL builtins for [su]paddl[q] intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the [su]paddl[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeL

[PATCH 5/20] aarch64: Use RTL builtins for vpadal_[su]32 intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vpadal_[su]32 Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/Change

[PATCH 6/20] aarch64: Use RTL builtins for polynomial vsli[q]_n intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vsli[q]_n_p* Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeL

[PATCH 7/20] aarch64: Use RTL builtins for polynomial vsri[q]_n intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the vsri[q]_n_p* Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeL

[PATCH 8/20] aarch64: Use RTL builtins for v[q]tbl intrinsics

2021-04-28 Thread Jonathan Wright via Gcc-patches
Hi, As subject, this patch rewrites the v[q]tbl Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog:
