[PATCH 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-03 Thread Victor Do Nascimento
/acle/main/acle.html Victor Do Nascimento (6): aarch64: Sync system register information with Binutils aarch64: Add support for aarch64-sys-regs.def aarch64: Implement system register validation tools aarch64: Add basic target_print_operand support for CONST_STRING aarch64: Implement sys

[PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-03 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking. Exampl

[PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-03 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string ("amcgc

[PATCH 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-10-03 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-system-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH 5/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-03 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float __arm

[PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-03 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file to GCC, teaching the compiler about system registers known to the assembler and how these can be used. The macros used to hold system register information reflect those in use by binutils, a design choice made to facilitate the sharing of data betwee

[PATCH 3/6] aarch64: Implement system register validation tools

2023-10-03 Thread Victor Do Nascimento
Given the implementation of a mechanism of encoding system registers into GCC, this patch provides the mechanism of validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be access

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Victor Do Nascimento
On 10/5/23 13:26, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accord

Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-05 Thread Victor Do Nascimento
On 10/5/23 12:42, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: This patch adds the `aarch64-sys-regs.def' file to GCC, teaching the compiler about system registers known to the assembler and how these can be used. The macros used to hold system reg

Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-09 Thread Victor Do Nascimento
On 10/9/23 01:02, Ramana Radhakrishnan wrote: On 5 Oct 2023, at 14:04, Victor Do Nascimento wrote: External email: Use caution opening links or attachments On 10/5/23 12:42, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: This patch adds the `aarch64-sys

Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-09 Thread Victor Do Nascimento
On 10/7/23 12:53, Richard Sandiford wrote: Richard Earnshaw writes: On 03/10/2023 16:18, Victor Do Nascimento wrote: In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function

[PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-18 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float __arm

[PATCH V2 0/7] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-18 Thread Victor Do Nascimento
old = __arm_rsr("trcseqstr"); __arm_wsr("trcseqstr", new); Testing: - Bootstrap/regtest on aarch64-linux-gnu done. [1] https://arm-software.github.io/acle/main/acle.html Victor Do Nascimento (7): aarch64: Sync system register information with Binutils aarch64: A

[PATCH V2 4/7] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-18 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string ("s3_3_

[PATCH V2 1/7] aarch64: Sync system register information with Binutils

2023-10-18 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written for Binutils, to GCC. In so doing, it provides GCC with the necessary information for teaching the compiler about system registers known to the assembler and how these can be used. By aligning the representation of data common to

[PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-18 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-system-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-18 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as imported from `aarch64-sys-reg.def' has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share the same encoding values (as specified by its `CPENC' field) and where the relationshi

[PATCH V2 6/7] aarch64: Add front-end argument type checking for target builtins

2023-10-18 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking. Exampl

[PATCH V2 3/7] aarch64: Implement system register validation tools

2023-10-18 Thread Victor Do Nascimento
Given the implementation of a mechanism of encoding system registers into GCC, this patch provides the mechanism of validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be access

[PATCH] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-26 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string ("s3_3_

Re: [PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 22:07, Richard Sandiford wrote: Victor Do Nascimento writes: This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char

Re: [PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 22:30, Richard Sandiford wrote: Victor Do Nascimento writes: Add a build-time test to check whether system register data, as imported from `aarch64-sys-reg.def' has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which shar

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento
On 10/26/23 16:23, Richard Sandiford wrote: Victor Do Nascimento writes: On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const

[PATCH] aarch64: Implement ACLE instruction/data prefetch functions.

2023-10-27 Thread Victor Do Nascimento
Implement the ACLE data and instruction prefetch functions[1] with the following signatures: 1. Data prefetch intrinsics: void __pldx (/*constant*/ unsigned int /*access_kind*/, /*constant*/ unsigned int /*cache_level*/, /*constant*/

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-27 Thread Victor Do Nascimento
On 10/27/23 14:18, Alex Coplan wrote: On 26/10/2023 16:23, Richard Sandiford wrote: Victor Do Nascimento writes: On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following

[PATCH V2] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-10-30 Thread Victor Do Nascimento
Correct CV-qualification from being erroeously applied to the `addr' pointer, applying it instead to its pointer target, as specified by the ACLE standards. --- Implement the ACLE data and instruction prefetch functions[1] with the following signatures: 1. Data prefetch intrinsics: -

[PATCH V3 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-11-02 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-system-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH V3 1/6] aarch64: Sync system register information with Binutils

2023-11-02 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written for Binutils, to GCC. In so doing, it provides GCC with the necessary information for teaching the compiler about system registers known to the assembler and how these can be used. By aligning the representation of data common to

[PATCH V3 3/6] aarch64: Implement system register validation tools

2023-11-02 Thread Victor Do Nascimento
Given the implementation of a mechanism of encoding system registers into GCC, this patch provides the mechanism of validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be access

[PATCH V3 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-11-02 Thread Victor Do Nascimento
__arm_wsr("trcseqstr", new); Testing: - Bootstrap/regtest on aarch64-linux-gnu done. [1] https://arm-software.github.io/acle/main/acle.html Victor Do Nascimento (6): aarch64: Sync system register information with Binutils aarch64: Add support for aarch64-sys-regs.def aarch64: Imple

[PATCH V3 6/6] aarch64: Add system register duplication check selftest

2023-11-02 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as imported from `aarch64-sys-reg.def' has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share the same encoding values (as specified by its `CPENC' field) and where the relationshi

[PATCH V3 4/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-11-02 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float __arm

[PATCH V3 5/6] aarch64: Add front-end argument type checking for target builtins

2023-11-02 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking. Exampl

[PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-15 Thread Victor Do Nascimento
Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' target architecture compile-time flag is absent from GCC. This, however, contrasts with

[PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Victor Do Nascimento
Due to the Linux kernel exposing the lrcpc3 architectural feature as "lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the "rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be correctly detected when doing native compilation on rcpc3-enabled targets. Regtested on

Re: [PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento
On 3/26/24 12:26, Richard Sandiford wrote: Victor Do Nascimento writes: Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' t

[PATCH] AArch64: Update system register database.

2024-02-06 Thread Victor Do Nascimento
With the release of Binutils 2.42, this brings the level of system-register support in GCC in line with the current state-of-the-art in Binutils, ensuring everything available in Binutils is plainly accessible from GCC. Where Binutils uses a more detailed description of which features are responsi

Re: [libatomic PATCH] PR other/113336: Fix libatomic testsuite regressions on ARM.

2024-02-14 Thread Victor Do Nascimento
arm-linux-gnueabihf with --with-arch=armv6 with make bootstrap and make -k check where it fixes all of the FAILs in libatomic. Ok for mainline? 2024-01-28 Roger Sayle Victor Do Nascimento libatomic/ChangeLog PR other/113336 * Makefile.am: Build tas

[PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64

2024-01-24 Thread Victor Do Nascimento
upport is present. Regression tested on aarch64-linux-gnu target with LSE128-support. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (4): libatomic: atomic_16.S: Improve ENTRY, END an

[PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-24 Thread Victor Do Nascimento
With support for new atomic features in Armv9.4-a being indicated by HWCAP2 bits, Libatomic's ifunc resolver must now query its second argument, of type __ifunc_arg_t*. We therefore make this argument known to libatomic, allowing us to query hwcap2 bits in the following manner: bool resolver

[PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-24 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-24 Thread Victor Do Nascimento
At present, Evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space results in a trap by the kernel which then returns the value read in by the system register. Given the undesi

[PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-24 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v2 0/2] libatomic: AArch64 rcpc3 128-bit atomic operation enablement

2024-01-24 Thread Victor Do Nascimento
/gcc-patches/2024-January/643841.html Victor Do Nascimento (2): libatomic: Increase max IFUNC_NCOND(N) from 3 to 4. libatomic: Add rcpc3 128-bit atomic operations for AArch64 libatomic/Makefile.am| 6 +- libatomic/Makefile.in| 22

[PATCH v2 1/2] libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.

2024-01-24 Thread Victor Do Nascimento
libatomic/ChangeLog: * libatomic_i.h: Add GEN_SELECTOR implementation for IFUNC_NCOND(N) == 4. --- libatomic/libatomic_i.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h index 861a22da152..0a854fd908c 100644

[PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are single

Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-25 Thread Victor Do Nascimento
On 1/11/24 15:55, Roger Sayle wrote: Hi Richard, As you've recommended, this issue has now been filed in bugzilla as PR other/113336. As explained in the new PR, libatomic's testsuite used to pass on armv6 (raspberry pi) in previous GCC releases, but the code was incorrect/non-synchronous; t

Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Victor Do Nascimento
On 1/26/24 10:53, Richard Sandiford wrote: > Victor Do Nascimento writes: >> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16) >> END (libat_test_and_set_16) >> >> >> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations, >> + that

[PATCH v3 0/3] Libatomic: Add LSE128 atomics support for AArch64

2024-01-02 Thread Victor Do Nascimento
org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (3): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a aarch64: Add ex

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-02 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v3 3/3] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-02 Thread Victor Do Nascimento
At present, Evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space results in a trap by the kernel which then returns the value read in by the system register. Given the undesi

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:47, Richard Sandiford wrote: Victor Do Nascimento writes: The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with

Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:10, Richard Sandiford wrote: Victor Do Nascimento writes: The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which

[PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-05 Thread Victor Do Nascimento
Key changes in v3: * Implement the `require_const_argument' function to ensure the nth argument in EXP represents a const-type argument in the valid range given by [minval, maxval), forgoing expansion altogether when an invalid argument is detected early on. * Whereas in the previous iter

[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-09 Thread Victor Do Nascimento
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed long. This means that when the `arm_neon.h' header is used by the kernel, any use of the `uint64_t' / `in64_t' types needs to be correctly cast to the correct `__builtin_aarch64_simd_di' / `__builtin_aarch64_simd_df' types when

[PATCH 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-07 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-07 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-07 Thread Victor Do Nascimento
e whenever architectural support is present. Regression tested on aarch64-linux-gnu target with LSE128-support. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Imp

[PATCH 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-07 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded Control Stack extension to Armv9.4-a in Binutils and their reliance on the `+gcs' modifier, we implement the necessary changes in GCC to allow for them to be recognized by the compiler. gcc/ChangeLog: * config/aarch64/

[PATCH 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-07 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with the Binutils source. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def (par_el1): New. (rcwmask_el1): Likewise. (rcwsmask_el1): Likewise. (ttbr0_el1): Likewise. (ttbr0_el12): Likewise.

[PATCH 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-07 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and translation hardening extension support with the Arm9.4-a architecture, this introduces the relevant flags to enable the reading and writing of 128-bit system registers. The `+d128' -march modifier enables the use of the followin

[PATCH 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-07 Thread Victor Do Nascimento
also present in the `aarch64-sys-regs.def' system register database. Victor Do Nascimento (5): aarch64: Add march flags for +the and +d128 arch extensions aarch64: Add support for GCS system registers with the +gcs modifier aarch64: Sync `aarch64-sys-regs.def' with Binutils. aar

[PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-07 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation functions to include 128-bit tests. gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. (set_wsr128): Likewise. --- gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 ++

[PATCH 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-07 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation: * __uint128_t __arm_rsr128(const char *special_register); * void __arm_wsr128(const char *special_register, __uint128_t value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New `enu

[PATCH 0/5] aarch64: Add ACLE intrinsics codegen support for lrcpc3 instructions

2023-11-09 Thread Victor Do Nascimento
}64 * ldp1q_lane_{u|s|p}64 Bootstrapped and regression tested on aarch64-none-linux-gnu. Victor Do Nascimento (5): aarch64: rcpc3: Add +rcpc3 extension aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics aarch64: rcpc3: Add Neon ACLE intrinsics aarch64: rcpc3: add Neon ACLE

[PATCH 2/5] aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics

2023-11-09 Thread Victor Do Nascimento
The LDAP1 and STL1 Neon ACLE intrinsics, operating on 64-bit data values, operate on single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD register configurations, either in the DI or DF modes. This leads to the need for a mode iterator accounting for the V1DI, V1DF, V2DI and V2DF modes. This patch there

[PATCH 3/5] aarch64: rcpc3: Add Neon ACLE intrinsics

2023-11-09 Thread Victor Do Nascimento
Register the target specific builtins in `aarch64-simd-builtins.def' and implement their associated backend patterns in `aarch64-simd.md'. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (vec_ldap1_lane): New. (vec_stl1_lane): Likewise. * config/aarch64/a

[PATCH 4/5] aarch64: rcpc3: add Neon ACLE wrapper functions to `arm_neon.h'

2023-11-09 Thread Victor Do Nascimento
Create the necessary mappings from the ACLE-defined Neon intrinsics names[1] to the internal builtin function names. [1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html gcc/ChangeLog: * gcc/config/aarch64/arm_neon.h (vldap1_lane_u64): New. (vldap1q_lane_u64): Lik

[PATCH 1/5] aarch64: rcpc3: Add +rcpc3 extension

2023-11-09 Thread Victor Do Nascimento
Given the optional LRCPC3 target support for Armv8.2-a cores onwards, the +rcpc3 arch feature modifier is added to GCC's command-line options. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (rcpc3): New. * config/aarch64/aarch64.h (AARCH64_ISA_RCPC3): Likewise.

[PATCH 5/5] aarch64: rcpc3: Add intrinsics tests

2023-11-09 Thread Victor Do Nascimento
Add unit test to ensure that added intrinsics compile to the correct `LDAP1 {Vt.D}[lane],[Xn]' and `STL1 {Vt.d}[lane],[Xn]' instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rcpc3.c: New. --- gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c | 47 +++ 1 file ch

[PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-13 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-13 Thread Victor Do Nascimento
-patches/2023-August/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a libatomic/Makefile.am| 3 + libatomic/Makefile.in| 1 + li

[PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-13 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2023-11-13 Thread Victor Do Nascimento
Continuing on from previously-proposed Libatomic enablement work [1], the introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing both the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Order

[PATCH v2 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-28 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded Control Stack extension to Armv9.4-a in Binutils and their reliance on the `+gcs' modifier, we implement the necessary changes in GCC to allow for them to be recognized by the compiler. gcc/ChangeLog: * config/aarch64/

[PATCH v2 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-28 Thread Victor Do Nascimento
ces the Guarded Control Stack (GCS) `+gcs' architecture modifier flag, allowing the inclusion of the novel GCS system registers which are now supported and also present in the `aarch64-sys-regs.def' system register database. Victor Do Nascimento (5): aarch64: Add march flags for +the

[PATCH v2 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-28 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation functions to include 128-bit tests. gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. (set_wsr128): Likewise. --- gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 32 ++

[PATCH v2 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-28 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation: * __uint128_t __arm_rsr128(const char *special_register); * void __arm_wsr128(const char *special_register, __uint128_t value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New `enu

[PATCH v2 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-28 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and translation hardening extension support with the Arm9.4-a architecture, this introduces the relevant flags to enable the reading and writing of 128-bit system registers. The `+d128' -march modifier enables the use of the followin

[PATCH v2 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-28 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with the Binutils source. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def (par_el1): New. (rcwmask_el1): Likewise. (rcwsmask_el1): Likewise. (ttbr0_el1): Likewise. (ttbr0_el12): Likewise.

[PATCH 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
gcc/ChangeLog: * config/arm/arm-builtins.cc (enum arm_builtins): Add new ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. (arm_init_dotprod_builtins): New. (arm_init_builtins): Add call to `arm_init_dotprod_builtins

[PATCH 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-07-10 Thread Victor Do Nascimento
Given the specification in the GCC internals manual defines the {u|s}dot_prod standard name as taking "two signed elements of the same mode, adding them to a third operand of wider mode", there is currently ambiguity in the relationship between the mode of the first two arguments and that of the th

[PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-10 Thread Victor Do Nascimento
From: Victor Do Nascimento Given the novel treatment of the dot product optab as a conversion we are now able to target, for a given architecture, different relationships between output modes and input modes. This is made clearer by way of example. Previously, on AArch64, the following loop was

[PATCH 07/10] mips: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/mips/loongson-mmi.md (sdot_prodv4hi): Deleted. (sdot_prodv2siv4hi): New. --- gcc/co

[PATCH 09/10] c6x: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/c6x/c6x.md (sdot_prodv2hi): Deleted. (sdot_prodsiv2hi): New. --- gcc/config/c6x/c6x

[PATCH 08/10] altivec: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/rs6000/altivec.md (udot_prod): Deleted. (udot_prodv4si): New. (sdot_prodv8hi

[PATCH 02/10] autovectorizer: Add basic support for convert optabs

2024-07-10 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to treating them as conversion optabs, we make necessary changes to the autovectorizer code to ensure that given the relevant tree code, together with the input and output data modes, we can retrieve the relevant optab and subsequently the

[PATCH 06/10] arc: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/arc/simdext.md (sdot_prodv2hi): Deleted. (sdot_prodsiv2hi): New. (udot_prodv

[PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/i386/mmx.md (usdot_prodv8qi): Deleted. (usdot_prodv2siv8qi): New. (sdot_prod

[PATCH 00/10] Make `dot_prod' a convert-type optab

2024-07-10 Thread Victor Do Nascimento
x to ensure I've not inadvertently broken anything for those backends. Victor Do Nascimento (10): optabs: Make all `*dot_prod_optab's modeled as conversions autovectorizer: Add basic support for convert optabs aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns. arm: Fix arm b

[PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch fixes the aarch64 back-end by implementing the following changes: 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. 2. Rewrite initialization and function expansion mechanism for simd builtins. 3. Fix all direct ca

Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-12 Thread Victor Do Nascimento
On 7/12/24 03:23, Jiang, Haochen wrote: -Original Message- From: Hongtao Liu Sent: Thursday, July 11, 2024 9:45 AM To: Victor Do Nascimento Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; richard.earns...@arm.com Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns

[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento
Following improvements to the way ifuncs are selected based on detected architectural features, we are able to do away with many of the aliases that were previously needed for subsets of atomic functions that were not implemented in a given extension. This may be clarified by virtue of an example.

[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento
able-gnu-indirect-function' configurations on armv9.4-a target with LRCPC3 and LSE128 support and without. Victor Do Nascimento (4): Libatomic: Define per-file identifier macros Libatomic: Make ifunc selector behavior contingent on importing file Libatomic: Clean up AArch64 ifunc a

[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h' is able to get information about its environment and, based on this information, select more appropriate function-specific ifunc selectors. This reduces the number of unnecessary feature tests that need to be carried out in ord

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the same functions together in the file. Therefore, as an example, the LSE128 implementation of `exchange_16' follows on immediately from its core implementation, as does the `fetch_or_16' LSE128 implementation. Such architectural exte

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and `host-config.h' headers are used by different atomic functions, we define distinct identifier macros for each file which, in implementing atomic operations, imports these headers. The idea is that different parts of these headers co

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are single

  1   2   >