Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}.
Ok for trunk?
This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for
efficient BF16 comparisons.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode
when TARGET_AVX10_2_256 is e
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
gcc/ChangeLog:
* config/i386/i386.md: Rewrite insn truncsfbf2.
gcc/testsuite/ChangeLog:
* gcc.target/i386/truncsfbf-1.c: New test.
* gcc.target/i386/truncsfbf-2.c: New test.
---
gcc/config/i386/i386.md
This patch enables vectorization of the popcount operation for V2QI, V4QI,
V8QI, V2HI, V4HI, and V2SI modes.
gcc/ChangeLog:
* config/i386/mmx.md:
(VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode.
(popcount2): New pattern for popcount of V2QI/V4QI/V8Q
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.cc (ix86_get_mask_mode):
Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
* config/i386/mmx.md (vec_cmp<mode>qi):
Implement vec_cmpv2bfqi and vec_cmpv
Simple testcase fix, ok for trunk?
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit scan
and removed register checks in spill situations.
---
.../i386/avx10_2-partial-bf-vector-fma-1.c | 12
1 file changed, 8 i
gcc/ChangeLog:
* config/i386/i386.cc (ix86_get_mask_mode):
Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
* config/i386/mmx.md (vec_cmp<mode>qi):
Implement vec_cmpv2bfqi and vec_cmpv4bfqi.
gcc/testsuite/ChangeLog:
* gcc.target/i386/part-vect-vec
Simple testcase fix, ok for trunk?
This patch removes specific register checks to account for possible
register spills and disables tests in 32-bit mode. This adjustment
is necessary because V4BF operations in 32-bit mode require duplicating
instructions, which leads to unintended test failures. It
Hi
Bootstrapped and tested on x86_64-pc-linux-gnu.
Ok for trunk?
This patch introduces support for vectorized FMA operations for bf16 types in
V2BF and V4BF modes on the i386 architecture. New mode iterators and
define_expand entries for fma, fnma, fms, and fnms operations are added in
mmx.md, e
Hi
This patch adds support for bf16 operations in V2BF and V4BF modes on i386,
handling signbit, xorsign, copysign, abs, neg, and various logical operations.
Bootstrapped and tested on x86_64-pc-linux-gnu.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.cc (ix86_build_const_vector): Ad
Hi
This change adds BFmode support to the ix86_preferred_simd_mode function,
enhancing SIMD vectorization for BF16 operations. The update ensures
optimized usage of SIMD capabilities, improving performance and aligning
vector sizes with processor capabilities.
Bootstrapped and tested on x86-64-pc-l
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
This patch supports sminmax for partial vectorized V2BF/V4BF.
gcc/ChangeLog:
* config/i386/mmx.md (3): New define_expand for
V2BF/V4BFsmaxmin
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-v
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
This patch introduces new mode iterators and expands for the i386 architecture
to support partial vectorization of bf16 operations using AVX10.2 instructions.
These operations include addition, subtraction, multiplication, d
This patch extends support for BF16 vector operations in GCC, including bitwise
AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes.
Bootstrapped and tested on x86_64-linux-gnu. ok for trunk?
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_fp_absneg_ope
This patch updates the GCC x86 backend to efficiently handle
odd, incrementally increasing permutations of BF16 vectors
using the cvtne2ps2bf16 instruction.
It modifies ix86_vectorize_vec_perm_const to support these operations
and adds a specific predicate to ensure proper sequence handling.
Boots
gcc/ChangeLog:
* config/i386/i386-expand.cc
(ix86_vectorize_vec_perm_const): Convert BF to HI using subreg.
* config/i386/predicates.md
(vcvtne2ps2bf_parallel): New define_insn_and_split.
* config/i386/sse.md
(vpermt2_sepcial_bf16_shuffle_): New pred
Replaced arithmetic shifts with logical shifts in
expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also
corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Co-authored-by: H.J. Lu
gcc/Chan
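
To illustrate why the shift type matters here, the following is a minimal scalar sketch, not the code from the patch. It models one 16-bit lane of a shift/shift/or byte swap. The function names are hypothetical, and the arithmetic variant relies on GCC's behaviour of sign-extending right shifts of negative values.

```c
#include <stdint.h>

/* Swap the two bytes of one 16-bit lane using logical shifts.
   The high byte of the result comes only from the low byte of x.  */
static uint16_t swap_bytes_logical(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Same swap, but with an arithmetic right shift.  When bit 15 of x is
   set, copies of the sign bit fill the upper byte and corrupt it.
   Assumption: signed right shift is arithmetic, as it is with GCC.  */
static uint16_t swap_bytes_arithmetic(uint16_t x)
{
    int16_t sx = (int16_t)x;
    uint16_t hi = (uint16_t)(sx >> 8);
    return (uint16_t)(hi | (uint16_t)(x << 8));
}
```

For an input such as 0x8001 the logical version yields 0x0180, while the arithmetic version yields 0xFF80 because the sign bits are ORed into the upper byte. For inputs with bit 15 clear the two agree.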
embly code generation for configurations
supporting SSE2.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Best
Levy
gcc/ChangeLog:
PR target/107563
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_co
handling.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
BRs,
Levy
gcc/ChangeLog:
* config/i386/i386-expand.cc
(ix86_vectorize_vec_perm_const): Convert BF to HI using subreg.
* config/i386/predicates.md
(vcvtne2ps2bf_parallel): New define_insn_and_split
embly code generation for configurations
supporting SSE2.
This update addresses the issue detailed in Bugzilla report 107563.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
BRs,
Levy
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por)
PR target/107563
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): New Entry.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr107563.C: New test.
---
gcc/config/i386/i386-expand.cc
From: Liwei Xu
This patch optimizes byte swaps in vectors using SSE2 instructions.
It targets 8-byte and 16-byte vectors, efficiently handling patterns like
__builtin_shufflevector(v, v, 1, 0, 3, 2, ...).
PR target/107563
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec
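
As an illustration of the kind of source pattern described above, here is a minimal sketch for a 16-byte vector. The type and function names are hypothetical and do not come from the patch or its test case. It assumes the GNU vector extensions; __builtin_shufflevector is used only when the compiler reports it, otherwise an equivalent element loop is used.

```c
#include <stdint.h>

/* 16-byte vector of unsigned bytes (GNU vector extension).  */
typedef uint8_t v16qi __attribute__((vector_size(16)));

#ifdef __has_builtin
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* Swap every pair of adjacent bytes: result[i] = v[i ^ 1].  */
static v16qi swap_adjacent_bytes(v16qi v)
{
#ifdef HAVE_SHUFFLEVECTOR
    return __builtin_shufflevector(v, v, 1, 0, 3, 2, 5, 4, 7, 6,
                                   9, 8, 11, 10, 13, 12, 15, 14);
#else
    /* Fallback for compilers without the builtin: same permutation.  */
    v16qi r;
    for (int i = 0; i < 16; i++)
        r[i] = v[i ^ 1];
    return r;
#endif
}
```

Whether a given compiler version turns this into a shift/shift/or sequence depends on the target flags and on the patch being applied; the sketch only shows the permutation itself.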
Hi RuoYao
It’s probably because loongarch64 doesn’t support
can_vec_perm_const_p(result_mode, op_mode, sel2, false)
I’m not sure whether loongarch will support it. Should I just limit the
test target for pr54346.c?
Best Regards
Levy
> On 12 Oct 2022, at 9:51 pm, Xi Ruoyao wr
From: LevyHsu
Added implementation for builtin overflow detection, new patterns are listed
below.
---
Addition:
signed addition (SImode in RV32 || DImode in RV64):
add t0, t1, t2
slti t3, t2, 0
slt t
From: LevyHsu
Added implementation for builtin overflow detection, new patterns are listed
below.
---
Addition:
signed addition (SImode with RV32 || DImode with RV64):
add t0, t1, t2
slti t3, t2, 0
slt
Added implementation for builtin overflow detection, new patterns are listed
below.
signed addition:
add t0, t1, t2
slti t3, t2, 0
slt t4, t0, t1
bne t3, t4, overflow
unsigned addition:
add t0, t1, t2
bltu t0, t1, overflow
sig
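
The signed and unsigned addition sequences listed above can be read as the following C sketch. It is only an illustration of the checks, with each instruction noted in a comment; the function names are hypothetical, 32-bit operands are assumed (the RV32 case), and the unsigned-to-signed conversion relies on GCC's wrapping behaviour.

```c
#include <stdint.h>

/* Signed add: overflow iff (b < 0) differs from (sum < a).
   Returns 1 on overflow and stores the wrapped sum in *sum.  */
static int sadd_overflow(int32_t a, int32_t b, int32_t *sum)
{
    *sum = (int32_t)((uint32_t)a + (uint32_t)b); /* add  t0, t1, t2 */
    int b_negative = b < 0;                      /* slti t3, t2, 0  */
    int sum_less = *sum < a;                     /* slt  t4, t0, t1 */
    return b_negative != sum_less;               /* bne  t3, t4, overflow */
}

/* Unsigned add: overflow iff the wrapped sum is below an operand.  */
static int uadd_overflow(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;                                /* add  t0, t1, t2 */
    return *sum < a;                             /* bltu t0, t1, overflow */
}
```

The signed check works because adding a non-negative value should never make the sum smaller than the first operand, and adding a negative value should always make it smaller; any disagreement means the result wrapped.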