from:"Levy"

[PATCH] i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

2024-10-16 Thread Levy Hsu

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}. Ok for trunk? This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for efficient BF16 comparisons. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode when TARGET_AVX10_2_256 is e

[PATCH] x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction

2024-10-08 Thread Levy Hsu

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? gcc/ChangeLog: * config/i386/i386.md: Rewrite insn truncsfbf2. gcc/testsuite/ChangeLog: * gcc.target/i386/truncsfbf-1.c: New test. * gcc.target/i386/truncsfbf-2.c: New test. --- gcc/config/i386/i386.md
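As a rough illustration of what a shift-based float-to-BF16 truncation computes, here is a minimal scalar sketch in C. It assumes the fast-math path simply drops the low 16 bits of the IEEE-754 single-precision encoding (a logical right shift by 16, which is what PSRLD performs per 32-bit lane), with no rounding and no special NaN handling. The helper names `float_to_bf16_trunc` and `bf16_to_float` are hypothetical and do not come from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the upper 16 bits of the float encoding.
   This is pure truncation (no round-to-nearest-even, no NaN quieting),
   which is only an assumption about what the fast-math path does. */
static uint16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (uint16_t)(bits >> 16);    /* logical shift, like PSRLD by 16 */
}

/* Hypothetical helper: widen a BF16 bit pattern back to float by
   placing it in the upper half of a 32-bit word. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose low 16 mantissa bits are zero, such as 1.0f, survive the round trip unchanged; other values lose precision toward zero.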

[PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-23 Thread Levy Hsu

This patch enables vectorization of the popcount operation for V2QI, V4QI, V8QI, V2HI, V4HI, and V2SI modes. gcc/ChangeLog: * config/i386/mmx.md: (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode. (popcount2): New pattern for popcount of V2QI/V4QI/V8Q

[PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Levy Hsu

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv

[PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Levy Hsu

Simple testcase fix, ok for trunk? gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit scan and removed register checks in spill situations. --- .../i386/avx10_2-partial-bf-vector-fma-1.c | 12 1 file changed, 8 i

[PATCH] i386: Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-09 Thread Levy Hsu

gcc/ChangeLog: * config/i386/i386.cc (ix86_get_mask_mode): Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. * config/i386/mmx.md (vec_cmpqi): Implement vec_cmpv2bfqi and vec_cmpv4bfqi. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-vec

[PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Levy Hsu

Simple testcase fix, ok for trunk? This patch removes specific register checks to account for possible register spills and disables tests in 32-bit mode. This adjustment is necessary because V4BF operations in 32-bit mode require duplicating instructions, which leads to unintended test failures. It

[PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-03 Thread Levy Hsu

Hi Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? This patch introduces support for vectorized FMA operations for bf16 types in V2BF and V4BF modes on the i386 architecture. New mode iterators and define_expand entries for fma, fnma, fms, and fnms operations are added in mmx.md, e

[PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-03 Thread Levy Hsu

Hi This patch adds support for bf16 operations in V2BF and V4BF modes on i386, handling signbit, xorsign, copysign, abs, neg, and various logical operations. Bootstrapped and tested on x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Ad

[PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-03 Thread Levy Hsu

Hi This change adds BFmode support to the ix86_preferred_simd_mode function enhancing SIMD vectorization for BF16 operations. The update ensures optimized usage of SIMD capabilities improving performance and aligning vector sizes with processor capabilities. Bootstrapped and tested on x86-64-pc-l

[PATCH] i386: Support partial vectorized V2BF/V4BF smaxmin

2024-09-02 Thread Levy Hsu

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch supports sminmax for partial vectorized V2BF/V4BF. gcc/ChangeLog: * config/i386/mmx.md (3): New define_expand for V2BF/V4BF smaxmin gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-v

[PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

2024-09-02 Thread Levy Hsu

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions. These operations include addition, subtraction, multiplication, d

Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-03 Thread Levy Hsu

This patch extends support for BF16 vector operations in GCC, including bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes. Bootstrapped and tested on x86_64-linux-gnu. ok for trunk? gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_absneg_ope

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu

This patch updates the GCC x86 backend to efficiently handle odd, incrementally increasing permutations of BF16 vectors using the cvtne2ps2bf16 instruction. It modifies ix86_vectorize_vec_perm_const to support these operations and adds a specific predicate to ensure proper sequence handling. Boots

[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu

gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Convert BF to HI using subreg. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New define_insn_and_split. * config/i386/sse.md (vpermt2_sepcial_bf16_shuffle_): New pred

[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-20 Thread Levy Hsu

Replaced arithmetic shifts with logical shifts in expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Co-authored-by: H.J. Lu gcc/Chan

[PATCH] x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Levy Hsu

embly code generation for configurations supporting SSE2. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Best Levy gcc/ChangeLog: PR target/107563 * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_co

[PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-09 Thread Levy Hsu

embly code generation for configurations supporting SSE2. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? Best Levy gcc/ChangeLog: PR target/107563 * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_co

[PATCH 1/1] Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-05-07 Thread Levy Hsu

handling. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? BRs, Levy gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Convert BF to HI using subreg. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New define_insn_and_split

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu

embly code generation for configurations supporting SSE2. This update addresses the issue detailed in Bugzilla report 107563. Bootstrapped and tested on x86_64-linux-gnu, OK for trunk? BRs, Levy gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por)

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu

PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New subroutine. (ix86_expand_vec_perm_const_1): New Entry. gcc/testsuite/ChangeLog: * g++.target/i386/pr107563.C: New test. --- gcc/config/i386/i386-expand.cc

[x86_64 PATCH] PR target/107563: Add 3-instruction subroutine vector shift in ix86_expand_vec_perm_const_1

2024-01-04 Thread Levy Hsu

From: Liwei Xu This patch optimizes byte swaps in vectors using SSE2 instructions. It targets 8-byte and 16-byte vectors, efficiently handling patterns like __builtin_shufflevector(v, v, 1, 0, 3, 2, ...). PR target/107563 gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec
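The adjacent-byte swap pattern (1, 0, 3, 2, ...) can be expressed as two shifts and an OR on 16-bit lanes, which is what the subroutine name expand_vec_perm_psrlw_psllw_por suggests. The following scalar C sketch models that idea for a 16-byte vector; the function name `swap_adjacent_bytes` is hypothetical, and the loop only imitates what the three SSE2 instructions would do across all lanes at once.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a psrlw/psllw/por sequence:
   treat 16 bytes as eight 16-bit lanes and rotate each lane by 8 bits,
   which swaps the two bytes inside every lane. */
static void swap_adjacent_bytes(uint8_t v[16])
{
    uint16_t lanes[8];
    memcpy(lanes, v, sizeof lanes);

    for (int i = 0; i < 8; i++) {
        /* Unsigned types make both shifts logical, so no sign bits
           are smeared into the result. */
        uint16_t hi_to_lo = (uint16_t)(lanes[i] >> 8);  /* psrlw by 8 */
        uint16_t lo_to_hi = (uint16_t)(lanes[i] << 8);  /* psllw by 8 */
        lanes[i] = (uint16_t)(hi_to_lo | lo_to_hi);     /* por */
    }

    memcpy(v, lanes, sizeof lanes);
}
```

Because a rotate by 8 swaps the two bytes of a 16-bit lane regardless of byte order, the sketch does not depend on host endianness.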

Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-12 Thread Levy

Hi RuoYao, it’s probably because loongarch64 doesn’t support can_vec_perm_const_p(result_mode, op_mode, sel2, false). I’m not sure whether loongarch will support it, or whether I should just limit the test target for pr54346.c. Best Regards Levy > On 12 Oct 2022, at 9:51 pm, Xi Ruoyao wr

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-28 Thread Levy Hsu

From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode in RV32 || DImode in RV64): add t0, t1, t2 slti t3, t2, 0 slt t

[PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-26 Thread Levy Hsu

From: LevyHsu Added implementation for builtin overflow detection, new patterns are listed below. --- Addition: signed addition (SImode with RV32 || DImode with RV64): add t0, t1, t2 slti t3, t2, 0 slt

[PATCH] RISC-V: Add implementation for builtin overflow

2021-01-21 Thread Levy

Added implementation for builtin overflow detection, new patterns are listed below. signed addition: add t0, t1, t2 slti t3, t2, 0 slt t4, t0, t1 bne t3, t4, overflow unsigned addition: add t0, t1, t2 bltu t0, t1, overflow sig
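The addition sequences quoted above can be read as plain C. The sketch below mirrors them for 32-bit operands: the signed check compares "second operand is negative" against "wrapped sum is less than the first operand" (the slti/slt/bne trio), and the unsigned check tests whether the wrapped sum is below the first operand (bltu). The function names are hypothetical, and the signed version assumes the usual two's-complement wrap when converting an out-of-range unsigned value to int32_t, as GCC provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of: add t0,t1,t2; slti t3,t2,0; slt t4,t0,t1;
   bne t3,t4,overflow. Returns true when the signed add overflows. */
static bool sadd32_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* Wrapping add done in unsigned arithmetic to avoid undefined
       behavior; the conversion back is implementation-defined and
       wraps on GCC (assumption). */
    int32_t s = (int32_t)((uint32_t)a + (uint32_t)b);
    bool b_negative = b < 0;    /* slti t3, t2, 0 */
    bool sum_below_a = s < a;   /* slt  t4, t0, t1 */
    *sum = s;
    return b_negative != sum_below_a;   /* bne t3, t4, overflow */
}

/* Hypothetical model of: add t0,t1,t2; bltu t0,t1,overflow. */
static bool uadd32_overflows(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a + b;   /* unsigned add wraps by definition */
    *sum = s;
    return s < a;         /* bltu t0, t1, overflow */
}
```

The signed test works because adding a non-negative value should never make the sum smaller, and adding a negative value should always make it smaller; a mismatch means the result wrapped.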

26 matches
