/acle/main/acle.html
Victor Do Nascimento (6):
aarch64: Sync system register information with Binutils
aarch64: Add support for aarch64-sys-regs.def
aarch64: Implement system register validation tools
aarch64: Add basic target_print_operand support for CONST_STRING
aarch64: Implement sys
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.
Exampl
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.
Consequently, an rtx such as:
(set (reg/i:DI 0 x0)
(unspec:DI [(const_string ("amcgc
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64-sys-regs.def file should be as follows:
SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),
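As a rough illustration of what a CPENC-style field might pack, the sketch below combines the five operand fields into one integer. The bit layout is an assumption made for illustration (it follows the op0:op1:CRn:CRm:op2 ordering of the architectural encoding) and is not taken from this patch; `sysreg_encode' is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical packing, most significant field first:
   op0 (2 bits) | op1 (3 bits) | CRn (4 bits) | CRm (4 bits) | op2 (3 bits).
   This layout is an assumption, not necessarily the macro used by GCC.  */
uint32_t
sysreg_encode (unsigned op0, unsigned op1, unsigned crn,
               unsigned crm, unsigned op2)
{
  return ((op0 & 0x3u) << 14)
         | ((op1 & 0x7u) << 11)
         | ((crn & 0xfu) << 7)
         | ((crm & 0xfu) << 3)
         | (op2 & 0x7u);
}
```

Under this assumed layout, the fields (3, 3, 13, 0, 2) pack to 0xde82.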
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:
uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm
This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.
The macros used to hold system register information reflect those in
use by binutils, a design choice made to facilitate the sharing of data
betwee
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler. In particular, this involves:
1. Ensuring a supplied string corresponds to a known system
register name. System registers can be access
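One way to picture the name check is a lookup against a table of known names, with a fallback parser for encoding-style names of the form s<op0>_<op1>_c<cn>_c<cm>_<op2>. The sketch below is only an illustration: the table contents, the function names and the accepted field ranges are assumptions, not the validation code from this patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy table standing in for the data imported from the .def file.  */
static const char *const known_sysregs[] = { "tpidr_el0", "trcseqstr" };

/* Return 1 if NAME appears in the toy table, 0 otherwise.  */
int
is_known_sysreg (const char *name)
{
  size_t n = sizeof known_sysregs / sizeof known_sysregs[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, known_sysregs[i]) == 0)
      return 1;
  return 0;
}

/* Return 1 if NAME has the shape s<op0>_<op1>_c<cn>_c<cm>_<op2> with
   every field inside its (assumed) range and no trailing characters.  */
int
is_generic_sysreg (const char *name)
{
  unsigned op0, op1, cn, cm, op2;
  int consumed = 0;

  if (sscanf (name, "s%u_%u_c%u_c%u_%u%n",
              &op0, &op1, &cn, &cm, &op2, &consumed) != 5)
    return 0;
  if (name[consumed] != '\0')
    return 0;
  return op0 <= 3 && op1 <= 7 && cn <= 15 && cm <= 15 && op2 <= 7;
}
```

A caller would try `is_known_sysreg' first and fall back to `is_generic_sysreg' before rejecting the string.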
On 10/5/23 13:26, Richard Earnshaw wrote:
On 03/10/2023 16:18, Victor Do Nascimento wrote:
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accord
On 10/5/23 12:42, Richard Earnshaw wrote:
On 03/10/2023 16:18, Victor Do Nascimento wrote:
This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.
The macros used to hold system reg
On 10/9/23 01:02, Ramana Radhakrishnan wrote:
On 5 Oct 2023, at 14:04, Victor Do Nascimento
wrote:
On 10/5/23 12:42, Richard Earnshaw wrote:
On 03/10/2023 16:18, Victor Do Nascimento wrote:
This patch adds the `aarch64-sys
On 10/7/23 12:53, Richard Sandiford wrote:
Richard Earnshaw writes:
On 03/10/2023 16:18, Victor Do Nascimento wrote:
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:
uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm
old = __arm_rsr("trcseqstr");
__arm_wsr("trcseqstr", new);
Testing:
- Bootstrap/regtest on aarch64-linux-gnu done.
[1] https://arm-software.github.io/acle/main/acle.html
Victor Do Nascimento (7):
aarch64: Sync system register information with Binutils
aarch64: A
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.
Consequently, an rtx such as:
(set (reg/i:DI 0 x0)
(unspec:DI [(const_string ("s3_3_
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.
By aligning the representation of data common to
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64-sys-regs.def file should be as follows:
SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.
Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationshi
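The encoding half of that definition can be sketched as a simple pairwise scan. This is a minimal illustration, assuming each entry reduces to a name plus a packed encoding value; the struct, the function name and the quadratic search are hypothetical, and the second condition of the definition (cut off above) is deliberately not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of one SYSREG entry.  */
struct sysreg_entry
{
  const char *name;
  uint32_t encoding;
};

/* Return the index of the first entry whose encoding repeats that of an
   earlier entry, or -1 if all encodings are distinct.  */
int
find_duplicate_encoding (const struct sysreg_entry *regs, size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (regs[i].encoding == regs[j].encoding)
        return (int) i;
  return -1;
}
```

A build-time check along these lines would fail the build whenever the function returns a non-negative index.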
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.
Exampl
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler. In particular, this involves:
1. Ensuring a supplied string corresponds to a known system
register name. System registers can be access
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.
Consequently, an rtx such as:
(set (reg/i:DI 0 x0)
(unspec:DI [(const_string ("s3_3_
On 10/18/23 22:07, Richard Sandiford wrote:
Victor Do Nascimento writes:
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64
On 10/18/23 21:39, Richard Sandiford wrote:
Victor Do Nascimento writes:
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:
uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char
On 10/18/23 22:30, Richard Sandiford wrote:
Victor Do Nascimento writes:
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.
Duplicate entries are defined as any two SYSREG entries in the .def
file which shar
On 10/26/23 16:23, Richard Sandiford wrote:
Victor Do Nascimento writes:
On 10/18/23 21:39, Richard Sandiford wrote:
Victor Do Nascimento writes:
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:
uint32_t __arm_rsr(const
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/
On 10/27/23 14:18, Alex Coplan wrote:
On 26/10/2023 16:23, Richard Sandiford wrote:
Victor Do Nascimento writes:
On 10/18/23 21:39, Richard Sandiford wrote:
Victor Do Nascimento writes:
Implement the aarch64 intrinsics for reading and writing system
registers with the following
Correct CV-qualification from being erroneously applied to the `addr'
pointer, applying it instead to its pointer target, as specified by
the ACLE standards.
---
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
-
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64-sys-regs.def file should be as follows:
SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.
By aligning the representation of data common to
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler. In particular, this involves:
1. Ensuring a supplied string corresponds to a known system
register name. System registers can be access
__arm_wsr("trcseqstr", new);
Testing:
- Bootstrap/regtest on aarch64-linux-gnu done.
[1] https://arm-software.github.io/acle/main/acle.html
Victor Do Nascimento (6):
aarch64: Sync system register information with Binutils
aarch64: Add support for aarch64-sys-regs.def
aarch64: Imple
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.
Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationshi
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:
uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.
Exampl
Given how, at present, the choice of using LSE128 atomic instructions
by the toolchain is delegated to run-time selection in the form of
Libatomic ifuncs, responsible for querying target support, the
`+lse128' target architecture compile-time flag is absent from GCC.
This, however, contrasts with
Due to the Linux kernel exposing the lrcpc3 architectural feature as
"lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the
"rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be
correctly detected when doing native compilation on rcpc3-enabled
targets.
Regtested on
On 3/26/24 12:26, Richard Sandiford wrote:
Victor Do Nascimento writes:
Given how, at present, the choice of using LSE128 atomic instructions
by the toolchain is delegated to run-time selection in the form of
Libatomic ifuncs, responsible for querying target support, the
`+lse128' t
With the release of Binutils 2.42, this brings the level of
system-register support in GCC in line with the current
state-of-the-art in Binutils, ensuring everything available in
Binutils is plainly accessible from GCC.
Where Binutils uses a more detailed description of which features are
responsi
arm-linux-gnueabihf with --with-arch=armv6
with make bootstrap and make -k check where it fixes all of the FAILs in
libatomic. Ok for mainline?
2024-01-28 Roger Sayle
Victor Do Nascimento
libatomic/ChangeLog
PR other/113336
* Makefile.am: Build tas
upport is present.
Regression tested on aarch64-linux-gnu target with LSE128-support.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html
Victor Do Nascimento (4):
libatomic: atomic_16.S: Improve ENTRY, END an
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.
We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:
bool
resolver
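The general shape of such a query can be sketched portably as below. Everything here is a stand-in: `struct demo_ifunc_arg' only imitates the resolver's second argument, and both bit positions are placeholders chosen for illustration rather than values taken from this patch or from any system header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the resolver's second argument (hypothetical field names).  */
struct demo_ifunc_arg
{
  uint64_t size;
  uint64_t hwcap;
  uint64_t hwcap2;
};

/* Placeholder bits: one in hwcap saying the second argument is valid,
   one in hwcap2 standing for the feature of interest.  */
#define DEMO_ARG_PRESENT (UINT64_C (1) << 62)
#define DEMO_HWCAP2_FEAT (UINT64_C (1) << 47)

/* Return true only when the second argument is usable and its hwcap2
   word advertises the feature.  */
bool
demo_has_feature (uint64_t hwcap, const struct demo_ifunc_arg *arg)
{
  if (!(hwcap & DEMO_ARG_PRESENT) || arg == NULL)
    return false;
  return (arg->hwcap2 & DEMO_HWCAP2_FEAT) != 0;
}
```

Checking the "argument present" bit before dereferencing keeps the resolver safe when it is called without a valid second argument.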
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of
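The data flow of the two operations named so far can be modelled in plain C. This is a reference model only, under the assumption that a 128-bit value is held as two 64-bit halves: it is not atomic, it is not how libatomic implements these operations, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* A 128-bit value as a pair of 64-bit halves, mirroring a register pair.  */
struct u128
{
  uint64_t lo;
  uint64_t hi;
};

/* Bitclear model: clear in *MEM every bit set in VAL and return the
   value *MEM held beforehand.  NOT atomic.  */
struct u128
model_ldclrp (struct u128 *mem, struct u128 val)
{
  struct u128 old = *mem;
  mem->lo &= ~val.lo;
  mem->hi &= ~val.hi;
  return old;
}

/* Bitset model: set in *MEM every bit set in VAL and return the value
   *MEM held beforehand.  NOT atomic.  */
struct u128
model_ldsetp (struct u128 *mem, struct u128 val)
{
  struct u128 old = *mem;
  mem->lo |= val.lo;
  mem->hi |= val.hi;
  return old;
}
```

Returning the old value by value mirrors the way the original data ends up back in the same register pair.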
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register. This instruction, when issued from user-space,
results in a trap by the kernel which then returns the value read in
by the system register. Given the undesi
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their su
/gcc-patches/2024-January/643841.html
Victor Do Nascimento (2):
libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.
libatomic: Add rcpc3 128-bit atomic operations for AArch64
libatomic/Makefile.am| 6 +-
libatomic/Makefile.in| 22
libatomic/ChangeLog:
* libatomic_i.h: Add GEN_SELECTOR implementation for
IFUNC_NCOND(N) == 4.
---
libatomic/libatomic_i.h | 18 ++
1 file changed, 18 insertions(+)
diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h
index 861a22da152..0a854fd908c 100644
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.
These operations are single
On 1/11/24 15:55, Roger Sayle wrote:
Hi Richard,
As you've recommended, this issue has now been filed in bugzilla
as PR other/113336. As explained in the new PR, libatomic's testsuite
used to pass on armv6 (raspberry pi) in previous GCC releases, but
the code was incorrect/non-synchronous; t
On 1/26/24 10:53, Richard Sandiford wrote:
> Victor Do Nascimento writes:
>> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16)
>> END (libat_test_and_set_16)
>>
>>
>> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations,
>> + that
org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html
Victor Do Nascimento (3):
libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
libatomic: Enable LSE128 128-bit atomics for armv9.4-a
aarch64: Add ex
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their su
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register. This instruction, when issued from user-space,
results in a trap by the kernel which then returns the value read in
by the system register. Given the undesi
On 1/5/24 11:47, Richard Sandiford wrote:
Victor Do Nascimento writes:
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with
On 1/5/24 11:10, Richard Sandiford wrote:
Victor Do Nascimento writes:
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which
Key changes in v3:
* Implement the `require_const_argument' function to ensure the nth
argument in EXP represents a const-type argument in the valid range
given by [minval, maxval), forgoing expansion altogether when an
invalid argument is detected early on.
* Whereas in the previous iter
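The half-open range [minval, maxval) mentioned for `require_const_argument' includes minval and excludes maxval. A minimal sketch of just that range test follows; `const_arg_in_range' is a hypothetical helper, not the function from the patch, which works on a call expression rather than on a plain integer.

```c
#include <stdbool.h>

/* Return true when VALUE lies in the half-open range [MINVAL, MAXVAL).  */
bool
const_arg_in_range (long value, long minval, long maxval)
{
  return value >= minval && value < maxval;
}
```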
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed
long. This means that when the `arm_neon.h' header is used by the
kernel, any use of the `uint64_t' / `int64_t' types needs to be
correctly cast to the correct `__builtin_aarch64_simd_di' /
`__builtin_aarch64_simd_df' types when
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their su
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of
e whenever architectural support is present.
Regression tested on aarch64-linux-gnu target with LSE128-support.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html
Victor Do Nascimento (2):
libatomic: atomic_16.S: Imp
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow for them to be recognized by the compiler.
gcc/ChangeLog:
* config/aarch64/
This patch updates `aarch64-sys-regs.def', bringing it into sync with
the Binutils source.
gcc/ChangeLog:
* config/aarch64/aarch64-sys-regs.def (par_el1): New.
(rcwmask_el1): Likewise.
(rcwsmask_el1): Likewise.
(ttbr0_el1): Likewise.
(ttbr0_el12): Likewise.
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.
The `+d128' -march modifier enables the use of the followin
also present in the `aarch64-sys-regs.def' system register
database.
Victor Do Nascimento (5):
aarch64: Add march flags for +the and +d128 arch extensions
aarch64: Add support for GCS system registers with the +gcs modifier
aarch64: Sync `aarch64-sys-regs.def' with Binutils.
aar
Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.
gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
(set_wsr128): Likewise.
---
gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 ++
Implement the ACLE builtins for 128-bit system register manipulation:
* __uint128_t __arm_rsr128(const char *special_register);
* void __arm_wsr128(const char *special_register, __uint128_t value);
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
`enu
}64
* ldp1q_lane_{u|s|p}64
Bootstrapped and regression tested on aarch64-none-linux-gnu.
Victor Do Nascimento (5):
aarch64: rcpc3: Add +rcpc3 extension
aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics
aarch64: rcpc3: Add Neon ACLE intrinsics
aarch64: rcpc3: add Neon ACLE
The LDAP1 and STL1 Neon ACLE intrinsics, operating on 64-bit data
values, operate on single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD
register configurations, either in the DI or DF modes. This leads to
the need for a mode iterator accounting for the V1DI, V1DF, V2DI and
V2DF modes.
This patch there
Register the target specific builtins in `aarch64-simd-builtins.def'
and implement their associated backend patterns in `aarch64-simd.md'.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def
(vec_ldap1_lane): New.
(vec_stl1_lane): Likewise.
* config/aarch64/a
Create the necessary mappings from the ACLE-defined Neon intrinsics
names[1] to the internal builtin function names.
[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html
gcc/ChangeLog:
* gcc/config/aarch64/arm_neon.h (vldap1_lane_u64): New.
(vldap1q_lane_u64): Lik
Given the optional LRCPC3 target support for Armv8.2-a cores onwards,
the +rcpc3 arch feature modifier is added to GCC's command-line options.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (rcpc3): New.
* config/aarch64/aarch64.h (AARCH64_ISA_RCPC3): Likewise.
Add unit test to ensure that added intrinsics compile to the correct
`LDAP1 {Vt.D}[lane],[Xn]' and `STL1 {Vt.D}[lane],[Xn]' instructions.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/rcpc3.c: New.
---
gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c | 47 +++
1 file ch
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their su
-patches/2023-August/626358.html
Victor Do Nascimento (2):
libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
libatomic: Enable LSE128 128-bit atomics for armv9.4-a
libatomic/Makefile.am| 3 +
libatomic/Makefile.in| 1 +
li
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of
Continuing on from previously-proposed Libatomic enablement work [1],
the introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing both the Load-Acquire RCpc Pair
Ordered, and Store-Release Pair Order
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow for them to be recognized by the compiler.
gcc/ChangeLog:
* config/aarch64/
ces the Guarded
Control Stack (GCS) `+gcs' architecture modifier flag, allowing the
inclusion of the novel GCS system registers which are now supported
and also present in the `aarch64-sys-regs.def' system register
database.
Victor Do Nascimento (5):
aarch64: Add march flags for +the
Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.
gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
(set_wsr128): Likewise.
---
gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 32 ++
Implement the ACLE builtins for 128-bit system register manipulation:
* __uint128_t __arm_rsr128(const char *special_register);
* void __arm_wsr128(const char *special_register, __uint128_t value);
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
`enu
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.
The `+d128' -march modifier enables the use of the followin
This patch updates `aarch64-sys-regs.def', bringing it into sync with
the Binutils source.
gcc/ChangeLog:
* config/aarch64/aarch64-sys-regs.def (par_el1): New.
(rcwmask_el1): Likewise.
(rcwsmask_el1): Likewise.
(ttbr0_el1): Likewise.
(ttbr0_el12): Likewise.
gcc/ChangeLog:
* config/arm/arm-builtins.cc (enum arm_builtins): Add new
ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI,
UDOTV16QI, USDOTV8QI, USDOTV16QI.
(arm_init_dotprod_builtins): New.
(arm_init_builtins): Add call to `arm_init_dotprod_builtins
Given the specification in the GCC internals manual defines the
{u|s}dot_prod standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the th
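The quoted definition corresponds to a scalar loop of the following shape, where two narrow signed inputs are multiplied and summed into a wider accumulator. The element types chosen here (8-bit inputs, 32-bit accumulator) are just one possible pairing picked for illustration, and `sdot_accumulate' is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply corresponding signed 8-bit elements of A and B and add the
   products to the 32-bit accumulator ACC.  */
int32_t
sdot_accumulate (const int8_t *a, const int8_t *b, size_t n, int32_t acc)
{
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}
```

The same input type could equally feed a 64-bit accumulator, so the input mode alone does not determine the output mode.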
From: Victor Do Nascimento
Given the novel treatment of the dot product optab as a conversion we
are now able to target, for a given architecture, different
relationships between output modes and input modes.
This is made clearer by way of example. Previously, on AArch64, the
following loop was
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
gcc/ChangeLog:
* config/mips/loongson-mmi.md (sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
---
gcc/co
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
gcc/ChangeLog:
* config/c6x/c6x.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
---
gcc/config/c6x/c6x
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
gcc/ChangeLog:
* config/rs6000/altivec.md (udot_prod): Deleted.
(udot_prodv4si): New.
(sdot_prodv8hi
Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make necessary changes to the
autovectorizer code to ensure that given the relevant tree code,
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
gcc/ChangeLog:
* config/arc/simdext.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
(udot_prodv
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
gcc/ChangeLog:
* config/i386/mmx.md (usdot_prodv8qi): Deleted.
(usdot_prodv2siv8qi): New.
(sdot_prod
x to ensure I've not inadvertently broken anything for
those backends.
Victor Do Nascimento (10):
optabs: Make all `*dot_prod_optab's modeled as conversions
autovectorizer: Add basic support for convert optabs
aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.
arm: Fix arm b
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:
1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct ca
On 7/12/24 03:23, Jiang, Haochen wrote:
-Original Message-
From: Hongtao Liu
Sent: Thursday, July 11, 2024 9:45 AM
To: Victor Do Nascimento
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com;
richard.earns...@arm.com
Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns
Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.
This may be clarified by virtue of an example.
able-gnu-indirect-function' configurations on armv9.4-a target
with LRCPC3 and LSE128 support and without.
Victor Do Nascimento (4):
Libatomic: Define per-file identifier macros
Libatomic: Make ifunc selector behavior contingent on importing file
Libatomic: Clean up AArch64 ifunc a
By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors. This reduces the number of unnecessary feature tests that
need to be carried out in ord
At present, `atomic_16.S' groups different implementations of the
same functions together in the file. Therefore, as an example,
the LSE128 implementation of `exchange_16' follows on immediately
from its core implementation, as does the `fetch_or_16' LSE128
implementation.
Such architectural exte
In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.
The idea is that different parts of these headers co
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.
These operations are single