Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-12 Thread Levy
Hi RuoYao, It’s probably because loongarch64 doesn’t support can_vec_perm_const_p(result_mode, op_mode, sel2, false). I’m not sure whether loongarch will support it, or whether I should just limit the test target for pr54346.c. Best Regards Levy > On 12 Oct 2022, at 9:51 pm, Xi Ruoyao wr

[PATCH] RISC-V: Add implementation for builtin overflow

2021-01-21 Thread Levy
Added implementation for builtin overflow detection, new patterns are listed below. signed addition: add t0, t1, t2 slti t3, t2, 0 slt t4, t0, t1 bne t3, t4, overflow unsigned addition: add t0, t1, t2 bltu t0, t1, overflow sig
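
For readers following the instruction sequences quoted above, here is a minimal C sketch of the same two checks. The function names are hypothetical and are not taken from the patch. The signed version wraps through unsigned arithmetic because signed overflow is undefined in C, and it assumes the conversion back to int32_t wraps, as it does with GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: signed add with overflow flag.
   Overflow happened exactly when (b < 0) disagrees with (sum < a). */
bool sadd_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* add t0, t1, t2 (wrapping; done in unsigned to avoid undefined behaviour) */
    *sum = (int32_t)((uint32_t)a + (uint32_t)b);
    bool b_negative = b < 0;       /* slti t3, t2, 0 */
    bool sum_below_a = *sum < a;   /* slt  t4, t0, t1 */
    return b_negative != sum_below_a;  /* bne t3, t4, overflow */
}

/* Hypothetical helper: unsigned add with overflow (carry) flag. */
bool uadd_overflows(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;        /* add  t0, t1, t2 (unsigned wraparound is defined) */
    return *sum < a;     /* bltu t0, t1, overflow */
}
```

The comparison trick relies on the fact that adding a non-negative value should never make the sum smaller, and adding a negative value should always make it smaller; any disagreement means the result wrapped.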

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-26 Thread Levy Hsu
From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode with RV32 || DImode with RV64): add t0, t1, t2 slti t3, t2, 0 slt

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-28 Thread Levy Hsu
From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode in RV32 || DImode in RV64): add t0, t1, t2 slti t3, t2, 0 slt t

[x86_64 PATCH] PR target/107563: Add 3-instruction subroutine vector shift in ix86_expand_vec_perm_const_1

2024-01-04 Thread Levy Hsu
From: Liwei Xu This patch optimizes byte swaps in vectors using SSE2 instructions. It targets 8-byte and 16-byte vectors, efficiently handling patterns like __builtin_shufflevector(v, v, 1, 0, 3, 2, ...). PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec
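
As a plain scalar reference for the selector pattern 1, 0, 3, 2, ... mentioned above, the sketch below swaps each adjacent pair of bytes. The function name is hypothetical, and the code only illustrates what the permutation computes; it does not show how the patch emits SSE2 instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: out[i] takes the element at index i ^ 1,
   which yields the selector 1, 0, 3, 2, 5, 4, ... */
void swap_adjacent_bytes(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(n % 2 == 0);  /* pairs only; an odd length would read past the end */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i ^ 1];
}
```

Calling it with n = 8 or n = 16 corresponds to the 8-byte and 16-byte vector sizes the snippet names.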

Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-03 Thread Levy Hsu
This patch extends support for BF16 vector operations in GCC, including bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes. Bootstrapped and tested on x86_64-linux-gnu. ok for trunk? gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_absneg_ope

[PATCH] x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Levy Hsu
embly code generation for configurations supporting SSE2. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Best Levy gcc/ChangeLog: PR target/107563 * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_co

[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-20 Thread Levy Hsu
Replaced arithmetic shifts with logical shifts in expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Co-authored-by: H.J. Lu gcc/Chan
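
To illustrate why the shift kind matters, here is a scalar sketch of a byte swap within one 16-bit lane, built as shift right, shift left, OR. It assumes that this is the shape of the sequence the subroutine name suggests; the function names are hypothetical. The arithmetic variant relies on GCC's behaviour for converting to int16_t and for right-shifting negative values, both of which are implementation-defined in C.

```c
#include <stdint.h>

/* Logical right shift: zeros enter from the top, so the swap is exact. */
uint16_t swap_bytes_logical(uint16_t lane)
{
    return (uint16_t)((lane >> 8) | (lane << 8));
}

/* Arithmetic right shift: when the lane's top bit is set, copies of the
   sign bit fill the upper byte and corrupt the result after the OR. */
uint16_t swap_bytes_arithmetic(uint16_t lane)
{
    int16_t as_signed = (int16_t)lane;          /* implementation-defined for values >= 0x8000 */
    uint16_t high_to_low = (uint16_t)(as_signed >> 8);  /* sign-extending shift on GCC */
    uint16_t low_to_high = (uint16_t)(lane << 8);
    return (uint16_t)(high_to_low | low_to_high);
}
```

For a lane such as 0x8012 the logical version gives 0x1280, while the arithmetic version gives 0xFF80 on GCC, because the sign bits smeared into the upper byte survive the OR. For lanes with the top bit clear, both versions agree.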

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Convert BF to HI using subreg. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New define_insn_and_split. * config/i386/sse.md (vpermt2_sepcial_bf16_shuffle_): New pred

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
This patch updates the GCC x86 backend to efficiently handle odd, incrementally increasing permutations of BF16 vectors using the cvtne2ps2bf16 instruction. It modifies ix86_vectorize_vec_perm_const to support these operations and adds a specific predicate to ensure proper sequence handling. Boots

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_const_1): New Entry. gcc/testsuite/ChangeLog: * g++.target/i386/pr107563.C: New test. --- gcc/config/i386/i386-expand.cc

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
embly code generation for configurations supporting SSE2. This update addresses the issue detailed in Bugzilla report 107563. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? BRs, Levy gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por)

[PATCH 1/1] Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-05-07 Thread Levy Hsu
handling. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? BRs, Levy gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Convert BF to HI using subreg. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New define_insn_and_split

[PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-09 Thread Levy Hsu
embly code generation for configurations supporting SSE2. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Best Levy gcc/ChangeLog: PR target/107563 * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_co

[PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

2024-09-02 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions. These operations include addition, subtraction, multiplication, d

[PATCH] i386: Support partial vectorized V2BF/V4BF smaxmin

2024-09-02 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch supports sminmax for partial vectorized V2BF/V4BF. gcc/ChangeLog: * config/i386/mmx.md (3): New define_expand for V2BF/V4BF smaxmin gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-v

[PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-03 Thread Levy Hsu
Hi This change adds BFmode support to the ix86_preferred_simd_mode function, enhancing SIMD vectorization for BF16 operations. The update ensures optimized usage of SIMD capabilities, improving performance and aligning vector sizes with processor capabilities. Bootstrapped and tested on x86-64-pc-l

[PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi This patch adds support for bf16 operations in V2BF and V4BF modes on i386, handling signbit, xorsign, copysign, abs, neg, and various logical operations. Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Ad

[PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-03 Thread Levy Hsu
Hi Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? This patch introduces support for vectorized FMA operations for bf16 types in V2BF and V4BF modes on the i386 architecture. New mode iterators and define_expand entries for fma, fnma, fms, and fnms operations are added in mmx.md, e

[PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Levy Hsu
Simple testcase fix, ok for trunk? This patch removes specific register checks to account for possible register spills and disables tests in 32-bit mode. This adjustment is necessary because V4BF operations in 32-bit mode require duplicating instructions, which leads to unintended test failures. It

[PATCH] i386: Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-09 Thread Levy Hsu
gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv4bfqi. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-vec

[PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Levy Hsu
Simple testcase fix, ok for trunk? gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit scan and removed register checks in spill situations. --- .../i386/avx10_2-partial-bf-vector-fma-1.c | 12 1 file changed, 8 i

[PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv

[PATCH] x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction

2024-10-08 Thread Levy Hsu
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? gcc/ChangeLog: * config/i386/i386.md: Rewrite insn truncsfbf2. gcc/testsuite/ChangeLog: * gcc.target/i386/truncsfbf-1.c: New test. * gcc.target/i386/truncsfbf-2.c: New test. --- gcc/config/i386/i386.md
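
The subject line refers to truncating a float to BF16 with a right shift. A minimal scalar sketch of that idea follows, assuming the shift count is 16 so that only the upper half of the IEEE-754 single-precision bit pattern is kept. The function names are hypothetical, and the code does no rounding or NaN handling, which is the kind of shortcut fast-math permits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch expects a 32-bit float");

/* Hypothetical helper: keep the sign, exponent and top 7 mantissa bits. */
uint16_t float_to_bf16_truncate(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* well-defined way to read the bit pattern */
    return (uint16_t)(bits >> 16);       /* logical shift, as PSRLD does per 32-bit lane */
}

/* Hypothetical helper: widen BF16 back to float by restoring the low 16 zero bits. */
float bf16_to_float(uint16_t half)
{
    uint32_t bits = (uint32_t)half << 16;
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Values whose low 16 mantissa bits are already zero, such as 1.0f or -2.0f, round-trip exactly; other values lose precision toward zero.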

[PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-23 Thread Levy Hsu
This patch enables vectorization of the popcount operation for V2QI, V4QI, V8QI, V2HI, V4HI, and V2SI modes. gcc/ChangeLog: * config/i386/mmx.md: (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode. (popcount2): New pattern for popcount of V2QI/V4QI/V8Q

[PATCH] i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

2024-10-16 Thread Levy Hsu
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}. Ok for trunk? This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for efficient BF16 comparisons. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode when TARGET_AVX10_2_256 is e