[PATCH] i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

2024-10-16 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}. Ok for trunk? This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for efficient BF16 comparisons. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode when TARGET_AVX10_2_256 is e

[PATCH] x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction

2024-10-08 Thread Levy Hsu
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? gcc/ChangeLog: * config/i386/i386.md: Rewrite insn truncsfbf2. gcc/testsuite/ChangeLog: * gcc.target/i386/truncsfbf-1.c: New test. * gcc.target/i386/truncsfbf-2.c: New test. --- gcc/config/i386/i386.md
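For readers unfamiliar with the idea in the subject line, here is a minimal scalar sketch of truncating a float to BF16 with a 16-bit right shift, which is what a PSRLD by 16 does per 32-bit lane. The helper name `float_to_bf16_trunc` is hypothetical and is not taken from the patch; the sketch drops the low mantissa bits without rounding, which is presumably why the patch restricts this to fast-math.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): truncate a float to bfloat16
   by keeping only the upper 16 bits of its IEEE-754 binary32 encoding.
   No rounding is applied, so the result can differ from a
   round-to-nearest-even conversion in the last bit. */
static uint16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (uint16_t)(bits >> 16);    /* scalar analogue of a shift by 16 */
}
```

For example, 1.0f is encoded as 0x3F800000, so the truncated BF16 value is 0x3F80.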

[PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-23 Thread Levy Hsu
This patch enables vectorization of the popcount operation for V2QI, V4QI, V8QI, V2HI, V4HI, and V2SI modes. gcc/ChangeLog: * config/i386/mmx.md: (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode. (popcount2): New pattern for popcount of V2QI/V4QI/V8Q
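As an illustration only, the kind of source loop that a V8QI popcount pattern could cover might look like the sketch below. The function name `popcount_bytes8` is hypothetical, and whether GCC actually vectorizes this exact loop depends on the target flags and compiler version, which the snippet above does not specify.

```c
#include <stdint.h>

/* Hypothetical example (not from the patch): per-byte population count
   over eight QImode elements, i.e. one V8QI-sized chunk of data. */
static void popcount_bytes8(const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)__builtin_popcount(in[i]);
}
```

`__builtin_popcount` is the GCC/Clang builtin for counting set bits in an `unsigned int`; each byte is zero-extended before the count, so the result is at most 8.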

[PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv

[PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Levy Hsu
Simple testcase fix, ok for trunk? gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit scan and removed register checks in spill situations. --- .../i386/avx10_2-partial-bf-vector-fma-1.c | 12 1 file changed, 8 i

[PATCH] i386: Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-09 Thread Levy Hsu
gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv4bfqi. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-vec

[PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Levy Hsu
Simple testcase fix, ok for trunk? This patch removes specific register checks to account for possible register spills and disables tests in 32-bit mode. This adjustment is necessary because V4BF operations in 32-bit mode require duplicating instructions, which leads to unintended test failures. It

[PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? This patch introduces support for vectorized FMA operations for bf16 types in V2BF and V4BF modes on the i386 architecture. New mode iterators and define_expand entries for fma, fnma, fms, and fnms operations are added in mmx.md, e

[PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi This patch adds support for bf16 operations in V2BF and V4BF modes on i386, handling signbit, xorsign, copysign, abs, neg, and various logical operations. Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Ad
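The sign operations listed above all reduce to bitwise logic on the top bit of each 16-bit element, because BF16 keeps its sign in bit 15. A minimal scalar sketch on raw BF16 encodings follows; the helper names and mask macros are hypothetical and do not come from the patch, which works on vector modes rather than scalars.

```c
#include <stdint.h>

/* Hypothetical masks for a raw bfloat16 encoding held in a uint16_t. */
#define BF16_SIGN_MASK 0x8000u
#define BF16_ABS_MASK  0x7FFFu

/* signbit: 1 if the sign bit is set, else 0. */
static int bf16_signbit(uint16_t x) { return (x >> 15) & 1; }

/* abs: clear the sign bit (an AND with the inverted sign mask). */
static uint16_t bf16_abs(uint16_t x) { return (uint16_t)(x & BF16_ABS_MASK); }

/* neg: flip the sign bit (an XOR with the sign mask). */
static uint16_t bf16_neg(uint16_t x) { return (uint16_t)(x ^ BF16_SIGN_MASK); }

/* copysign: magnitude of x combined with the sign of y. */
static uint16_t bf16_copysign(uint16_t x, uint16_t y)
{
    return (uint16_t)((x & BF16_ABS_MASK) | (y & BF16_SIGN_MASK));
}

/* xorsign: flip the sign of x when y is negative. */
static uint16_t bf16_xorsign(uint16_t x, uint16_t y)
{
    return (uint16_t)(x ^ (y & BF16_SIGN_MASK));
}
```

Here 0x3F80 encodes 1.0 and 0xBF80 encodes -1.0, so `bf16_neg(0x3F80)` yields 0xBF80.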

[PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-03 Thread Levy Hsu
Hi This change adds BFmode support to the ix86_preferred_simd_mode function, enhancing SIMD vectorization for BF16 operations. The update ensures optimized usage of SIMD capabilities, improving performance and aligning vector sizes with processor capabilities. Bootstrapped and tested on x86-64-pc-l

[PATCH] i386: Support partial vectorized V2BF/V4BF smaxmin

2024-09-02 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch supports sminmax for partial vectorized V2BF/V4BF. gcc/ChangeLog: * config/i386/mmx.md (3): New define_expand for V2BF/V4BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-v

[PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

2024-09-02 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions. These operations include addition, subtraction, multiplication, d

Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-03 Thread Levy Hsu
This patch extends support for BF16 vector operations in GCC, including bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes. Bootstrapped and tested on x86_64-linux-gnu. ok for trunk? gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_absneg_ope

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
This patch updates the GCC x86 backend to efficiently handle odd, incrementally increasing permutations of BF16 vectors using the cvtne2ps2bf16 instruction. It modifies ix86_vectorize_vec_perm_const to support these operations and adds a specific predicate to ensure proper sequence handling. Boots

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Convert BF to HI using subreg. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New define_insn_and_split. * config/i386/sse.md (vpermt2_sepcial_bf16_shuffle_): New pred

[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-20 Thread Levy Hsu
Replaced arithmetic shifts with logical shifts in expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Co-authored-by: H.J. Lu gcc/Chan

[PATCH] x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly code

[PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-09 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly code

[PATCH 1/1] Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-05-07 Thread Levy Hsu
Hi All This patch updates the GCC x86 backend to efficiently handle odd, incrementally increasing permutations of BF16 vectors using the cvtne2ps2bf16 instruction. It modifies ix86_vectorize_vec_perm_const to support these operations and adds a specific predicate to ensure proper sequence handling

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
Hi All We've introduced a new subroutine in ix86_expand_vec_perm_const_1 to optimize vector shifting for the V16QI type on x86. This patch uses a three-instruction sequence psrlw, psllw, and por to handle specific vector shuffle operations more efficiently. The change aims to improve assembly c

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_const_1): New Entry. gcc/testsuite/ChangeLog: * g++.target/i386/pr107563.C: New test. --- gcc/config/i386/i386-expand.cc

[x86_64 PATCH] PR target/107563: Add 3-instruction subroutine vector shift in ix86_expand_vec_perm_const_1

2024-01-04 Thread Levy Hsu
From: Liwei Xu This patch optimizes byte swaps in vectors using SSE2 instructions. It targets 8-byte and 16-byte vectors, efficiently handling patterns like __builtin_shufflevector(v, v, 1, 0, 3, 2, ...). PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec
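The shuffle pattern (1, 0, 3, 2, ...) swaps the two bytes inside every 16-bit lane, which is why a right shift, a left shift, and an OR on word lanes can implement it. The scalar sketch below models that idea for a 16-byte vector. The function name is hypothetical, and the shift count of 8 is an assumption inferred from the byte-swap pattern rather than quoted from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model (not from the patch): swap adjacent bytes of
   a 16-byte vector by treating it as eight little-endian 16-bit lanes. */
static void swap_adjacent_bytes16(const uint8_t in[16], uint8_t out[16])
{
    for (int lane = 0; lane < 8; lane++) {
        /* Assemble the 16-bit lane from its two bytes. */
        uint16_t w = (uint16_t)(in[2 * lane] | (in[2 * lane + 1] << 8));
        uint16_t hi = (uint16_t)(w >> 8);   /* logical right shift, as psrlw would do */
        uint16_t lo = (uint16_t)(w << 8);   /* left shift, as psllw would do */
        uint16_t r  = (uint16_t)(hi | lo);  /* combine, as por would do */
        out[2 * lane]     = (uint8_t)(r & 0xFF);
        out[2 * lane + 1] = (uint8_t)(r >> 8);
    }
}
```

The right shift has to be logical rather than arithmetic: an arithmetic shift would copy the sign bit into the upper byte and corrupt the result after the OR.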

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-28 Thread Levy Hsu
From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode in RV32 || DImode in RV64): add t0, t1, t2 slti t3, t2, 0 slt t

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-26 Thread Levy Hsu
From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode with RV32 || DImode with RV64): add t0, t1, t2 slti t3, t2, 0 slt
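The signed-addition sequence above is cut off after the `slt`, so the following is only a sketch of the usual branch-free check that begins with the same `add` and `slti` steps: the sum overflows exactly when "the second operand is negative" disagrees with "the sum is less than the first operand". The function name and the final comparison are assumptions, not text recovered from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of a signed 32-bit add-with-overflow check.
   The wrapping sum is computed in unsigned arithmetic to avoid undefined
   behaviour; converting it back to int32_t is implementation-defined in
   ISO C and wraps modulo 2^32 on GCC. */
static bool add_overflows_i32(int32_t a, int32_t b, int32_t *sum)
{
    int32_t s = (int32_t)((uint32_t)a + (uint32_t)b); /* add: wrapping sum */
    bool b_negative = b < 0;                          /* slti: b < 0 */
    bool sum_below_a = s < a;                         /* slt: sum < a */
    *sum = s;
    return b_negative != sum_below_a;                 /* mismatch means overflow */
}
```

On GCC and Clang the result can be cross-checked against `__builtin_add_overflow`, which is the builtin these patterns are meant to expand.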