Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}.
Ok for trunk?
This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for
efficient BF16 comparisons.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode
when TARGET_AVX10_2_256 is e
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
gcc/ChangeLog:
* config/i386/i386.md: Rewrite insn truncsfbf2.
gcc/testsuite/ChangeLog:
* gcc.target/i386/truncsfbf-1.c: New test.
* gcc.target/i386/truncsfbf-2.c: New test.
---
gcc/config/i386/i386.md
This patch enables vectorization of the popcount operation for V2QI, V4QI,
V8QI, V2HI, V4HI, and V2SI modes.
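As a rough illustration of the kind of loop this targets, here is a minimal sketch for the 8-byte QImode case. The function name `popcount_v8qi` is hypothetical and is not taken from the patch or its tests. Whether GCC actually vectorizes the loop depends on the compiler version, the optimization level, and the target flags in use.

```c
#include <stdint.h>

/* Hypothetical example: per-byte popcount over an 8-byte group, which
   corresponds to a V8QI-sized operation if the loop is vectorized.  */
void popcount_v8qi(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)__builtin_popcount(src[i]);
}
```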
gcc/ChangeLog:
* config/i386/mmx.md:
(VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode.
(popcount2): New pattern for popcount of V2QI/V4QI/V8Q
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.cc (ix86_get_mask_mode):
Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
* config/i386/mmx.md (vec_cmpqi):
Implement vec_cmpv2bfqi and vec_cmpv
Simple testcase fix, ok for trunk?
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit
scan and removed register checks in spill situations.
---
.../i386/avx10_2-partial-bf-vector-fma-1.c | 12
1 file changed, 8 i
gcc/ChangeLog:
* config/i386/i386.cc (ix86_get_mask_mode):
Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
* config/i386/mmx.md (vec_cmpqi):
Implement vec_cmpv2bfqi and vec_cmpv4bfqi.
gcc/testsuite/ChangeLog:
* gcc.target/i386/part-vect-vec
Simple testcase fix, ok for trunk?
This patch removes specific register checks to account for possible
register spills and disables tests in 32-bit mode. This adjustment
is necessary because V4BF operations in 32-bit mode require duplicating
instructions, which leads to unintended test failures. It
Hi
Bootstrapped and tested on x86-64-pc-linux-gnu.
Ok for trunk?
This patch introduces support for vectorized FMA operations for bf16 types in
V2BF and V4BF modes on the i386 architecture. New mode iterators and
define_expand entries for fma, fnma, fms, and fnms operations are added in
mmx.md, e
Hi
This patch adds support for bf16 operations in V2BF and V4BF modes on i386,
handling signbit, xorsign, copysign, abs, neg, and various logical operations.
Bootstrapped and tested on x86-64-pc-linux-gnu.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.cc (ix86_build_const_vector): Ad
Hi
This change adds BFmode support to the ix86_preferred_simd_mode function,
enhancing SIMD vectorization for BF16 operations. The update ensures
optimized usage of SIMD capabilities, improving performance and aligning
vector sizes with processor capabilities.
Bootstrapped and tested on x86-64-pc-l
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
This patch supports sminmax for partial vectorized V2BF/V4BF.
gcc/ChangeLog:
* config/i386/mmx.md (3): New define_expand for
V2BF/V4BFsmaxmin
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-v
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
This patch introduces new mode iterators and expands for the i386 architecture
to support partial vectorization of bf16 operations using AVX10.2 instructions.
These operations include addition, subtraction, multiplication, d
This patch extends support for BF16 vector operations in GCC, including bitwise
AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes.
Bootstrapped and tested on x86_64-linux-gnu. ok for trunk?
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_fp_absneg_ope
This patch updates the GCC x86 backend to efficiently handle
odd, incrementally increasing permutations of BF16 vectors
using the cvtne2ps2bf16 instruction.
It modifies ix86_vectorize_vec_perm_const to support these operations
and adds a specific predicate to ensure proper sequence handling.
Boots
gcc/ChangeLog:
* config/i386/i386-expand.cc
(ix86_vectorize_vec_perm_const): Convert BF to HI using subreg.
* config/i386/predicates.md
(vcvtne2ps2bf_parallel): New define_insn_and_split.
* config/i386/sse.md
(vpermt2_sepcial_bf16_shuffle_): New pred
Replaced arithmetic shifts with logical shifts in
expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also
corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3.
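The sign-extension issue can be seen on a single 16-bit lane. This is a scalar sketch, not the code from the patch, and the helper names are hypothetical. It assumes GCC's behaviour of wrapping on out-of-range signed conversions and of shifting negative values arithmetically.

```c
#include <stdint.h>

/* Swap the two bytes of one 16-bit lane with a logical right shift:
   the vacated high bits are filled with zeros.  */
uint16_t swap_bytes_logical(uint16_t lane)
{
    return (uint16_t)((lane >> 8) | (lane << 8));
}

/* Same formula with an arithmetic right shift (assumed GCC behaviour for
   negative values): the sign bit is copied into the vacated high bits
   and corrupts the OR result whenever bit 15 of the lane is set.  */
uint16_t swap_bytes_arithmetic(uint16_t lane)
{
    int16_t s = (int16_t)lane;
    return (uint16_t)((uint16_t)(s >> 8) | (uint16_t)(lane << 8));
}
```

For a lane such as 0x8001 the logical version yields 0x0180, while the arithmetic version yields 0xFF80. The two agree whenever the top bit of the lane is clear.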
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Co-authored-by: H.J. Lu
gcc/Chan
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code
Hi All
This patch updates the GCC x86 backend to efficiently handle
odd, incrementally increasing permutations of BF16 vectors
using the cvtne2ps2bf16 instruction.
It modifies ix86_vectorize_vec_perm_const to support these operations
and adds a specific predicate to ensure proper sequence handling
PR target/107563
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): New Entry.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr107563.C: New test.
---
gcc/config/i386/i386-expand.cc
From: Liwei Xu
This patch optimizes byte swaps in vectors using SSE2 instructions.
It targets 8-byte and 16-byte vectors, efficiently handling patterns like
__builtin_shufflevector(v, v, 1, 0, 3, 2, ...).
PR target/107563
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec
From: LevyHsu
Added implementation for builtin overflow detection; new patterns are listed
below.
---
Addition:
signed addition (SImode in RV32 || DImode in RV64):
add t0, t1, t2
slti t3, t2, 0
slt t
From: LevyHsu
Added implementation for builtin overflow detection; new patterns are listed
below.
---
Addition:
signed addition (SImode with RV32 || DImode with RV64):
add t0, t1, t2
slti t3, t2, 0
slt
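The listing above is cut off, so the following is only a sketch of the usual check for signed addition overflow that such a sequence appears to follow: compare "second operand is negative" against "wrapped sum is less than the first operand", and report overflow when the two disagree. The function name `add_overflows` is hypothetical, and the conversion back to `int32_t` assumes GCC's wrapping behaviour for out-of-range values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a + b overflows int32_t and
   stores the wrapped sum in *sum either way.  */
bool add_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* Add in unsigned arithmetic so the wrap-around is well defined.  */
    int32_t wrapped = (int32_t)((uint32_t)a + (uint32_t)b);
    *sum = wrapped;
    /* Without overflow, adding a negative b must decrease the value and
       adding a non-negative b must not; a mismatch means overflow.  */
    return (b < 0) != (wrapped < a);
}
```

The result should agree with `__builtin_add_overflow` on the same operands.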