https://gcc.gnu.org/g:f72a2d221539cede358f2487b94bc370c6fc44b5
commit r16-91-gf72a2d221539cede358f2487b94bc370c6fc44b5
Author: liuhongt
Date: Sun Mar 30 20:15:41 2025 -0700
Accept allones or 0 operand for vcond_mask op1.
Since ix86_expand_sse_movcc will simplify them into a simple
https://gcc.gnu.org/g:e1098c7b08d9e6018f60dae7a14c5ad621618223
commit r16-46-ge1098c7b08d9e6018f60dae7a14c5ad621618223
Author: hongtao.liu
Date: Thu Apr 17 09:07:55 2025 +0200
Generate 2 FMA instructions in ix86_expand_swdivsf.
When FMA is available, N-R step can be rewritten with
https://gcc.gnu.org/g:fa58ff249a0e63a721ccb6d770c86523d84a212a
commit r15-9473-gfa58ff249a0e63a721ccb6d770c86523d84a212a
Author: liuhongt
Date: Sun Apr 13 19:40:51 2025 -0700
Revert documents from r11-344-g0fec3f62b9bfc0
gcc/ChangeLog:
PR target/108134
https://gcc.gnu.org/g:62a6cafd7f55c6e88a9780b91039257572038535
commit r15-8461-g62a6cafd7f55c6e88a9780b91039257572038535
Author: liuhongt
Date: Mon Mar 17 22:47:11 2025 -0700
Use ix86_fp_comparison_operator in cbranchbf4 to avoid ICE.
*jcc only supports ix86_fp_comparison_operator
https://gcc.gnu.org/g:be671ec1f30ecd55aaff09048afb2a619018cb8a
commit r15-8283-gbe671ec1f30ecd55aaff09048afb2a619018cb8a
Author: liuhongt
Date: Sun Mar 16 22:28:44 2025 -0700
Mark gcc.target/i386/apx-ndd-tls-1b.c as xfail.
It looks like the testcase is fragile, it's supposed to ch
https://gcc.gnu.org/g:3872daa5767622d1f8b086050996c85604db7514
commit r15-6940-g3872daa5767622d1f8b086050996c85604db7514
Author: liuhongt
Date: Wed Jan 15 19:09:24 2025 -0800
Fix typo to avoid ICE.
gcc/ChangeLog:
PR target/118489
* config/i386/sse.md (
https://gcc.gnu.org/g:0e05b793fba2a9bea9f0fbb1f068679f5dadf514
commit r15-6844-g0e05b793fba2a9bea9f0fbb1f068679f5dadf514
Author: liuhongt
Date: Wed Jan 8 23:11:17 2025 -0800
Refactor ix86_expand_vecop_qihi2.
Since there's regression to use vpermq, and it's manually disabled by
https://gcc.gnu.org/g:ee2f19b0937b5efc0b23c4319cbd4a38b27eac6e
commit r15-6097-gee2f19b0937b5efc0b23c4319cbd4a38b27eac6e
Author: liuhongt
Date: Mon Dec 2 01:54:59 2024 -0800
Fix inaccuracy in cunroll/cunrolli when considering what's innermost loop.
r15-919-gef27b91b62c3aa removed
https://gcc.gnu.org/g:89a27cf6b1354cc80d834d71f7a3aa137d605e94
commit r12-10832-g89a27cf6b1354cc80d834d71f7a3aa137d605e94
Author: liuhongt
Date: Thu Nov 21 23:57:38 2024 -0800
Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.
It could cause weird spill in RA when register pre
https://gcc.gnu.org/g:0eb8c19cb45fc004b7039fa22ff9021604d80dbc
commit r13-9216-g0eb8c19cb45fc004b7039fa22ff9021604d80dbc
Author: liuhongt
Date: Thu Nov 21 23:57:38 2024 -0800
Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.
It could cause weird spill in RA when register pres
https://gcc.gnu.org/g:4a63cc6de77481878ec31e1e6ac30e22c50b063a
commit r14-10979-g4a63cc6de77481878ec31e1e6ac30e22c50b063a
Author: liuhongt
Date: Thu Nov 21 23:57:38 2024 -0800
Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.
It could cause weird spill in RA when register pre
https://gcc.gnu.org/g:ba4cf2e296d8d5950c3d356fa6b6efcad00d0189
commit r15-5639-gba4cf2e296d8d5950c3d356fa6b6efcad00d0189
Author: liuhongt
Date: Thu Nov 21 23:57:38 2024 -0800
Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.
It could cause weird spill in RA when register pres
https://gcc.gnu.org/g:6350e956d1a74963a62bedabef3d4a1a3f2d4852
commit r15-5489-g6350e956d1a74963a62bedabef3d4a1a3f2d4852
Author: MayShao-oc
Date: Thu Nov 7 10:57:02 2024 +0800
Add microarchitecture tunable for pass_align_tight_loops [PR117438]
Hi Hongtao:
Add m_CASCADELAK, a
https://gcc.gnu.org/g:de867e8da30bf5e0cb51c3946ec43c3c4778d4a0
commit r15-5071-gde867e8da30bf5e0cb51c3946ec43c3c4778d4a0
Author: liuhongt
Date: Wed Nov 6 18:15:42 2024 -0800
Guard truncate from vector float to vector __bf16 with !flag_rounding_math
&& HONOR_NANS (BFmode).
hw inst
https://gcc.gnu.org/g:648bd1fcc6acfc56e08f4ad8146a80910cfacfd7
commit r15-4955-g648bd1fcc6acfc56e08f4ad8146a80910cfacfd7
Author: liuhongt
Date: Wed Oct 23 23:51:20 2024 -0700
Support vector float_extend from __bf16 to float.
It's supported by vector permutation with zero vector.
https://gcc.gnu.org/g:a17acf4f25f0ce9b8dce24f25867500a3b093b57
commit r15-4954-ga17acf4f25f0ce9b8dce24f25867500a3b093b57
Author: liuhongt
Date: Wed Oct 23 00:51:00 2024 -0700
Support vector float_truncate for SF to BF.
Generate native instruction whenever possible, otherwise use v
https://gcc.gnu.org/g:28ea5a4ec3e9e49439fdb912ef4edeebfdae881d
commit r13-9157-g28ea5a4ec3e9e49439fdb912ef4edeebfdae881d
Author: liuhongt
Date: Tue Oct 29 02:09:39 2024 -0700
Fix ICE due to subreg:us_truncate.
Force_operand issues an ICE when input
is (subreg:DI (us_truncate:V
https://gcc.gnu.org/g:bc0eeccf27a084461a2d5661e23468350acb43da
commit r15-4775-gbc0eeccf27a084461a2d5661e23468350acb43da
Author: liuhongt
Date: Tue Oct 29 02:09:39 2024 -0700
Fix ICE due to subreg:us_truncate.
Force_operand issues an ICE when input
is (subreg:DI (us_truncate:V
https://gcc.gnu.org/g:71a0cf699b6a2dc03abec53aeafab8b70db2bb07
commit r14-10852-g71a0cf699b6a2dc03abec53aeafab8b70db2bb07
Author: liuhongt
Date: Tue Oct 29 02:09:39 2024 -0700
Fix ICE due to subreg:us_truncate.
Force_operand issues an ICE when input
is (subreg:DI (us_truncate:
https://gcc.gnu.org/g:d0a932fb53ccdf5155db90632901c55446b8
commit r12-10793-gd0a932fb53ccdf5155db90632901c55446b8
Author: liuhongt
Date: Tue Oct 29 02:09:39 2024 -0700
Fix ICE due to subreg:us_truncate.
Force_operand issues an ICE when input
is (subreg:DI (us_truncate:
https://gcc.gnu.org/g:ab84a8a4b78990942e006e9f060dc2705f2c6d8f
commit r12-10784-gab84a8a4b78990942e006e9f060dc2705f2c6d8f
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240
https://gcc.gnu.org/g:403e361d5aa620e77c9832578b2409a0fdd79d96
commit r15-4566-g403e361d5aa620e77c9832578b2409a0fdd79d96
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240
https://gcc.gnu.org/g:2452387468423882c0732e0fad3a83e887574ccc
commit r13-9145-g2452387468423882c0732e0fad3a83e887574ccc
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240
https://gcc.gnu.org/g:b718f6ec1674c0db30f26c65b7a9215e9388dd6c
commit r14-10831-gb718f6ec1674c0db30f26c65b7a9215e9388dd6c
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240
https://gcc.gnu.org/g:ee7e77e9c121f5a6f27c92b6b24b2abf9cd66a4d
commit r15-4560-gee7e77e9c121f5a6f27c92b6b24b2abf9cd66a4d
Author: liuhongt
Date: Mon Oct 21 02:22:08 2024 -0700
i386: Optimize EQ/NE comparison between avx512 kmask and -1.
r15-974-gbf7745f887c765e06f2e75508f263debb60a
https://gcc.gnu.org/g:45bde60836d04cce4637b74ecadbb0aff90b832f
commit r12-10781-g45bde60836d04cce4637b74ecadbb0aff90b832f
Author: liuhongt
Date: Tue Oct 22 11:24:23 2024 +0800
[GCC13/GCC12] Fix testcase.
The optimization relies on other patterns which are only available at
GCC
https://gcc.gnu.org/g:8b43518a01cbbbafe042b85a48fa09a32948380a
commit r13-9142-g8b43518a01cbbbafe042b85a48fa09a32948380a
Author: liuhongt
Date: Tue Oct 22 11:24:23 2024 +0800
[GCC13/GCC12] Fix testcase.
The optimization relies on other patterns which are only available at
GCC1
https://gcc.gnu.org/g:91800a70a2af1349eefc5f3380be2b254b1db395
commit r12-10778-g91800a70a2af1349eefc5f3380be2b254b1db395
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines
https://gcc.gnu.org/g:fca35b417c236e3448bc3666820fd1ba423fe6e9
commit r13-9139-gfca35b417c236e3448bc3666820fd1ba423fe6e9
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines v
https://gcc.gnu.org/g:79e7e02b7cc578d03eab2b50c029f44409ef8e26
commit r14-10807-g79e7e02b7cc578d03eab2b50c029f44409ef8e26
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines
https://gcc.gnu.org/g:5259d3927c1c8e3a15b4b844adef59b48c241233
commit r15-4510-g5259d3927c1c8e3a15b4b844adef59b48c241233
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines v
https://gcc.gnu.org/g:21e2cd65add9070292313f8e12e8731d0aa2c869
commit r15-4400-g21e2cd65add9070292313f8e12e8731d0aa2c869
Author: liuhongt
Date: Tue Oct 8 16:18:31 2024 +0800
Don't lower vpcmpu to pcmpgt since the latter is for signed comparison.
r15-1737-gb06a108f0fbffe lower AVX5
https://gcc.gnu.org/g:edf4db8355dead3413bad64f6a89bae82dabd0ad
commit r15-4399-gedf4db8355dead3413bad64f6a89bae82dabd0ad
Author: liuhongt
Date: Mon Oct 14 13:09:59 2024 +0800
Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to
(vec_merge (fma: op1 op2 op3) (match_dup 1)) ma
https://gcc.gnu.org/g:330782a1b6cfe881ad884617ffab441aeb1c2b5c
commit r15-4398-g330782a1b6cfe881ad884617ffab441aeb1c2b5c
Author: liuhongt
Date: Mon Oct 14 17:16:13 2024 +0800
Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1
op2 op3) op1 mask).
For x86 ma
https://gcc.gnu.org/g:eecd5f8ce1729a214bf0a1edfdd3ee1cf79be881
commit r13-9118-geecd5f8ce1729a214bf0a1edfdd3ee1cf79be881
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Inte
https://gcc.gnu.org/g:e9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c
commit r13-9117-ge9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come from
https://gcc.gnu.org/g:a8b4ea1bcc10b5253992f4b932aec6862aef32fa
commit r15-4371-ga8b4ea1bcc10b5253992f4b932aec6862aef32fa
Author: liuhongt
Date: Tue Oct 15 11:17:20 2024 +0800
Adjust testcase to avoid scan FIX in REG_EQUIV.
Also add hard_float target to avoid failure on arm-eabi.
https://gcc.gnu.org/g:9b7d5ecbecfbd193899648e411f1a9b2a77471e2
commit r14-10783-g9b7d5ecbecfbd193899648e411f1a9b2a77471e2
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Int
https://gcc.gnu.org/g:fe0692f689a18c432d6f59f404d4cd020cbebef2
commit r14-10782-gfe0692f689a18c432d6f59f404d4cd020cbebef2
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come fro
https://gcc.gnu.org/g:9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c
commit r15-4234-g9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Inte
https://gcc.gnu.org/g:9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc
commit r15-4233-g9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come from
https://gcc.gnu.org/g:70c3db511ba14ff5fa68cb41d0714a9fb957ea5d
commit r15-4225-g70c3db511ba14ff5fa68cb41d0714a9fb957ea5d
Author: liuhongt
Date: Mon Mar 25 21:28:14 2024 -0700
Enable vectorization for unknown tripcount in very cheap cost model but
disable epilog vectorization.
gcc
https://gcc.gnu.org/g:d5d1189c12199db79f6feb5cfcc7e6475c3a4d91
commit r15-4226-gd5d1189c12199db79f6feb5cfcc7e6475c3a4d91
Author: liuhongt
Date: Thu Sep 19 13:38:34 2024 +0800
Adjust testcase after relax O2 vectorization.
gcc/testsuite/ChangeLog:
* gcc.dg/fstack-pr
https://gcc.gnu.org/g:78eef8919e2f2973ed7750ba66f5726e70614d07
commit r15-3885-g78eef8919e2f2973ed7750ba66f5726e70614d07
Author: liuhongt
Date: Mon Sep 23 11:06:04 2024 +0800
Define VECTOR_STORE_FLAG_VALUE
gcc/ChangeLog:
* config/i386/i386.h (VECTOR_STORE_FLAG_VAL
https://gcc.gnu.org/g:f80e4ba94e41410219bdcdb1a0f204ea3f148666
commit r15-3579-gf80e4ba94e41410219bdcdb1a0f204ea3f148666
Author: liuhongt
Date: Tue Sep 10 15:04:58 2024 +0800
Enable tune fuse_move_and_alu for GNR.
According to Intel Software Optimization Manual[1], the Redwood cov
https://gcc.gnu.org/g:c726a6643125a59e2ba6f992924a2d0098104578
commit r15-3558-gc726a6643125a59e2ba6f992924a2d0098104578
Author: liuhongt
Date: Fri Sep 6 15:03:16 2024 +0800
Don't force_reg operands[3] when it's not const0_rtx.
It fixes the regression by
a51f2fc0d80869ab079
https://gcc.gnu.org/g:a51f2fc0d80869ab079a93cc3858f24a1fd28237
commit r15-3498-ga51f2fc0d80869ab079a93cc3858f24a1fd28237
Author: liuhongt
Date: Wed Sep 4 15:39:17 2024 +0800
Handle const0_operand for *avx2_pcmp3_1.
*_eq3_1 supports
nonimm_or_0_operand for op1 and op2, pass_com
https://gcc.gnu.org/g:6585b06303d8fd9da907f443fc0da9faed303712
commit r12-10694-g6585b06303d8fd9da907f443fc0da9faed303712
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:5e049ada87842947adaca5c607516396889f64d6
commit r13-8999-g5e049ada87842947adaca5c607516396889f64d6
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:ba9a3f105ea552a22d08f2d54dfdbef16af7c99e
commit r14-10625-gba9a3f105ea552a22d08f2d54dfdbef16af7c99e
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:ab214ef734bfc3dcffcf79ff9e1dd651c2b40566
commit r15-3314-gab214ef734bfc3dcffcf79ff9e1dd651c2b40566
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:141d8aa375ea32c05f0d437828e6a76f1a3ea4af
commit r12-10683-g141d8aa375ea32c05f0d437828e6a76f1a3ea4af
Author: liuhongt
Date: Thu Aug 22 14:31:40 2024 +0800
Fix testcase failure.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Use
https://gcc.gnu.org/g:ea9c508927ec032c6d67a24df59ffa429e4d3d95
commit r13-8988-gea9c508927ec032c6d67a24df59ffa429e4d3d95
Author: liuhongt
Date: Thu Aug 22 14:31:40 2024 +0800
Fix testcase failure.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Use
https://gcc.gnu.org/g:b4bc34db3f2948e37ad55a09870635e88c54c7d3
commit r12-10682-gb4bc34db3f2948e37ad55a09870635e88c54c7d3
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128
https://gcc.gnu.org/g:aea374238cec1a1e53fb79575d2f998e16926999
commit r13-8987-gaea374238cec1a1e53fb79575d2f998e16926999
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_
https://gcc.gnu.org/g:27dc1533b6dfc49f3912c524db51d6c372a5ac3d
commit r14-10608-g27dc1533b6dfc49f3912c524db51d6c372a5ac3d
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128
https://gcc.gnu.org/g:6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
commit r15-3078-g6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_
https://gcc.gnu.org/g:bb42c551905024ea23095a0eb7b58fdbcfbcaef6
commit r15-3058-gbb42c551905024ea23095a0eb7b58fdbcfbcaef6
Author: liuhongt
Date: Tue Aug 20 14:41:00 2024 +0800
Align predicates for operands[1] between mov and *mov_internal.
> It's not obvious to me why movv16qi req
https://gcc.gnu.org/g:4e7735a8d87559bbddfe3a985786996e22241f8d
commit r14-10588-g4e7735a8d87559bbddfe3a985786996e22241f8d
Author: liuhongt
Date: Mon Aug 12 14:35:31 2024 +0800
Move ix86_align_loops into a separate pass and insert the pass after
pass_endbr_and_patchable_area.
gcc/
https://gcc.gnu.org/g:f7e672da8fc3d416a6d07eb01f3be4400ef94fac
commit r15-2930-gf7e672da8fc3d416a6d07eb01f3be4400ef94fac
Author: liuhongt
Date: Mon Aug 12 18:24:34 2024 +0800
Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need
secondary reload.
It results in 2 fail
https://gcc.gnu.org/g:c3c83d22d212a35cb1bfb8727477819463f0dcd8
commit r15-2906-gc3c83d22d212a35cb1bfb8727477819463f0dcd8
Author: liuhongt
Date: Mon Aug 12 14:35:31 2024 +0800
Move ix86_align_loops into a separate pass and insert the pass after
pass_endbr_and_patchable_area.
gcc/C
https://gcc.gnu.org/g:617562e4e422c7bd282960b14abfffd994445009
commit r13-8971-g617562e4e422c7bd282960b14abfffd994445009
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:c94738e2462ff46f3013f6270f6a955b749d82b2
commit r12-10668-gc94738e2462ff46f3013f6270f6a955b749d82b2
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:a295076bee293aa3112c615f9af7a27231816a36
commit r14-10551-ga295076bee293aa3112c615f9af7a27231816a36
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:64ca25aec4939aea79bd812b089fbb666ca6f2fd
commit r15-2539-g64ca25aec4939aea79bd812b089fbb666ca6f2fd
Author: liuhongt
Date: Fri Jul 26 09:56:03 2024 +0800
Fix mismatch between constraint and predicate for ashl3_doubleword.
(insn 98 94 387 2 (parallel [
https://gcc.gnu.org/g:bc1fda00d5f20e2f3e77a50b2822562b6e0040b2
commit r15-2395-gbc1fda00d5f20e2f3e77a50b2822562b6e0040b2
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:a3f03891065cb9691f6e9cebce4d4542deb92a35
commit r15-2217-ga3f03891065cb9691f6e9cebce4d4542deb92a35
Author: liuhongt
Date: Mon Jul 22 11:36:59 2024 +0800
Relax ix86_hardreg_mov_ok after split1.
ix86_hardreg_mov_ok is added by r11-5066-gbe39636d9f68c4
https://gcc.gnu.org/g:228972b2b7bf50f4776f8ccae0d7c2950827d0f1
commit r15-2127-g228972b2b7bf50f4776f8ccae0d7c2950827d0f1
Author: liuhongt
Date: Tue Jul 16 15:29:01 2024 +0800
Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV
gcc/ChangeLog:
PR target/115843
https://gcc.gnu.org/g:1fff665a51e221a578a92631fc8ea62dd79fa3b6
commit r14-10425-g1fff665a51e221a578a92631fc8ea62dd79fa3b6
Author: H.J. Lu
Date: Tue Apr 26 11:08:55 2022 -0700
x86: Update branch hint for Redwood Cove.
According to Intel® 64 and IA-32 Architectures Optimization Refe
https://gcc.gnu.org/g:f27bf48e0204524ead795fe618cd8b1224f72fd4
commit r15-2038-gf27bf48e0204524ead795fe618cd8b1224f72fd4
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pendi
https://gcc.gnu.org/g:13bfc385b0baebd22aeabb0d90915f2e9b18febe
commit r14-10422-g13bfc385b0baebd22aeabb0d90915f2e9b18febe
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pend
https://gcc.gnu.org/g:9a1cdaa5e8441394d613f5f3401e7aab21efe8f0
commit r13-8913-g9a1cdaa5e8441394d613f5f3401e7aab21efe8f0
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pendi
https://gcc.gnu.org/g:e1427b39d28f382d21e7a0ea1714b3250e0a6e5d
commit r12-10617-ge1427b39d28f382d21e7a0ea1714b3250e0a6e5d
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pend
https://gcc.gnu.org/g:23ab7f632f4f5bae67fb53cf7b18fea7ba7242c4
commit r15-1905-g23ab7f632f4f5bae67fb53cf7b18fea7ba7242c4
Author: liuhongt
Date: Mon Jul 8 10:35:35 2024 +0800
Rename __{float,double}_u to __x86_{float,double}_u to avoid polluting the
namespace.
I have a build failu
https://gcc.gnu.org/g:a910c30c7c27cd0f6d2d2694544a09fb11d611b9
commit r15-1888-ga910c30c7c27cd0f6d2d2694544a09fb11d611b9
Author: H.J. Lu
Date: Tue Apr 26 11:08:55 2022 -0700
x86: Update branch hint for Redwood Cove.
According to Intel® 64 and IA-32 Architectures Optimization Refer
https://gcc.gnu.org/g:699087a16591adfdf21228876b6c48dbcd353faa
commit r15-1836-g699087a16591adfdf21228876b6c48dbcd353faa
Author: liuhongt
Date: Thu Jul 4 13:57:32 2024 +0800
Use __builtin_cpu_support instead of __get_cpuid_count.
gcc/testsuite/ChangeLog:
PR target
https://gcc.gnu.org/g:239ad907b1fc08874042f8bea5f61eaf3ba2877d
commit r15-1806-g239ad907b1fc08874042f8bea5f61eaf3ba2877d
Author: liuhongt
Date: Wed Jul 3 14:47:33 2024 +0800
Move runtime check into a separate function and guard it with target
("no-avx")
The patch can avoid SIGILL
https://gcc.gnu.org/g:55f80c690c5fa59836646565a9dee2a3f68374a0
commit r15-1742-g55f80c690c5fa59836646565a9dee2a3f68374a0
Author: liuhongt
Date: Mon Jun 24 09:19:01 2024 +0800
Remove vcond{,u,eq} expanders since they will be obsolete.
gcc/ChangeLog:
PR target/11551
https://gcc.gnu.org/g:2ccdd0f22312a14ac64bf944fdc4f8e7532eb0eb
commit r15-1741-g2ccdd0f22312a14ac64bf944fdc4f8e7532eb0eb
Author: liuhongt
Date: Thu Jun 20 12:41:13 2024 +0800
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x
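The equivalence named in the subject line above can be illustrated in scalar C. This is only a sketch, not the code from the commit: the function names are hypothetical, a 32-bit element is assumed, and the shift form relies on right-shifting a negative value being an arithmetic shift, which ISO C leaves implementation-defined but GCC documents as its behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Select form: all-ones (-1) when x is negative, zero otherwise. */
int32_t sign_mask_select(int32_t x) {
    return x < 0 ? -1 : 0;
}

/* Shift form: an arithmetic right shift by 31 copies the sign bit into
   every bit position. Assumes the compiler shifts signed values
   arithmetically (true for GCC; implementation-defined in ISO C). */
int32_t sign_mask_shift(int32_t x) {
    return x >> 31;
}

/* Hypothetical helper: returns 1 when both forms agree on a few samples,
   including the extremes of the 32-bit range. */
int sign_mask_forms_agree(void) {
    const int32_t samples[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (sign_mask_select(samples[i]) != sign_mask_shift(samples[i]))
            return 0;
    }
    return 1;
}
```

The shift form has no comparison or select in it, which is presumably why it is the preferred shape for vector code.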
https://gcc.gnu.org/g:09737d9605521df9232d9990006c44955064f44e
commit r15-1738-g09737d9605521df9232d9990006c44955064f44e
Author: liuhongt
Date: Tue Jun 18 15:52:02 2024 +0800
Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.
These versions of the min/max patterns implement exactly th
https://gcc.gnu.org/g:e94e6ee495d95f29355bbc017214228a5e367638
commit r15-1740-ge94e6ee495d95f29355bbc017214228a5e367638
Author: liuhongt
Date: Wed Jun 19 16:05:58 2024 +0800
Adjust testcase for the regressed testcases after obsolete of vcond{,u,eq}.
> Richard suggests that we imp
https://gcc.gnu.org/g:3cb204046c0db899750aee9480af4f1953a40ac3
commit r15-1739-g3cb204046c0db899750aee9480af4f1953a40ac3
Author: liuhongt
Date: Wed Jun 19 13:12:00 2024 +0800
Add more splitter for mskmov with avx512 comparison.
gcc/ChangeLog:
PR target/115517
https://gcc.gnu.org/g:b06a108f0fbffe12493b527224f6e4131a72beac
commit r15-1737-gb06a108f0fbffe12493b527224f6e4131a72beac
Author: liuhongt
Date: Tue Jun 18 14:03:42 2024 +0800
Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false}
is vector -1/0.
gcc/ChangeLog
https://gcc.gnu.org/g:2e2dfa0095c3326a0a5fc2ff175918b42eeb044f
commit r15-1736-g2e2dfa0095c3326a0a5fc2ff175918b42eeb044f
Author: liuhongt
Date: Mon Jun 17 17:16:46 2024 +0800
Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)]
UNSPEC_BLENDV)
These define_insn_a
https://gcc.gnu.org/g:e62ea4fb8ffcab06ddd02f26db91b29b7270743f
commit r15-1735-ge62ea4fb8ffcab06ddd02f26db91b29b7270743f
Author: liuhongt
Date: Wed Jun 26 13:52:24 2024 +0800
Enable flate-combine.
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
define tar
https://gcc.gnu.org/g:8e1fa107a63b2e160b6bf69de4fe163dd3cebd80
commit r15-1734-g8e1fa107a63b2e160b6bf69de4fe163dd3cebd80
Author: liuhongt
Date: Wed Jun 26 13:07:31 2024 +0800
Extend lshifrtsi3_1_zext to ?k alternative.
late_combine will combine lshift + zero into *lshifrtsi3_1_zex
https://gcc.gnu.org/g:5e1a9f4ccff390ae79a9b9d0d39b325f2b4ea925
commit r15-1733-g5e1a9f4ccff390ae79a9b9d0d39b325f2b4ea925
Author: liuhongt
Date: Wed Jun 26 11:17:46 2024 +0800
Define mask as extern instead of uninitialized local variables.
The testcases are supposed to scan for vpo
https://gcc.gnu.org/g:b8153b5417bed02f47354a14ad36100785dfdc47
commit r15-1673-gb8153b5417bed02f47354a14ad36100785dfdc47
Author: liuhongt
Date: Mon Jun 24 17:53:22 2024 +0800
Fix wrong cost of MEM when addr is a lea.
416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8
https://gcc.gnu.org/g:aac00d09859cc5934bd0f7493d537b8430337773
commit r15-1638-gaac00d09859cc5934bd0f7493d537b8430337773
Author: liuhongt
Date: Thu Jun 20 12:41:13 2024 +0800
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x
https://gcc.gnu.org/g:4c957d7ba84d8bbce6e778048f38e92ef71806c8
commit r15-1563-g4c957d7ba84d8bbce6e778048f38e92ef71806c8
Author: Collin Funk
Date: Mon Jun 10 06:36:47 2024 +
AVX-512: Pacify -Wshift-overflow=2. [PR115409]
A shift of 31 on a signed int is undefined behavior. Si
https://gcc.gnu.org/g:d3fae2bea034edb001cd45d1d86c5ceef146899b
commit r15-1308-gd3fae2bea034edb001cd45d1d86c5ceef146899b
Author: liuhongt
Date: Tue Jun 11 21:22:42 2024 +0800
Adjust ix86_rtx_costs for pternlog_operand_p.
r15-1100-gec985bc97a0157 improves handling of ternlog instru
https://gcc.gnu.org/g:8b69efd9819f86b973d7a550e987ce455fce6d62
commit r15-1307-g8b69efd9819f86b973d7a550e987ce455fce6d62
Author: liuhongt
Date: Mon Jun 3 10:38:19 2024 +0800
Remove one_if_conv for latest Intel processors.
The tune is added by PR79390 for SciMark2 on Broadwell.
https://gcc.gnu.org/g:f8bf80a4e1682b2238baad8c44939682f96b1fe0
commit r15-1234-gf8bf80a4e1682b2238baad8c44939682f96b1fe0
Author: liuhongt
Date: Thu Jun 13 09:53:58 2024 +0800
Fix ICE due to REGNO of a SUBREG.
Use reg_or_subregno instead.
gcc/ChangeLog:
PR
https://gcc.gnu.org/g:1d496d2cd1d5d8751a1637abca89339d6f9ddd3b
commit r15-1191-g1d496d2cd1d5d8751a1637abca89339d6f9ddd3b
Author: liuhongt
Date: Tue Jun 11 10:23:27 2024 +0800
Fix ICE in rtl check due to CONST_WIDE_INT in CONST_VECTOR_DUPLICATE_P
The patch add extra check to make s
https://gcc.gnu.org/g:5d52558a531130675329d72ca5c4713abf5bf885
commit r12-10497-g5d52558a531130675329d72ca5c4713abf5bf885
Author: Jan Hubicka
Date: Fri Dec 29 23:51:03 2023 +0100
Disable FMADD in chains for Zen4 and generic
this patch disables use of FMA in matrix multiplication l
https://gcc.gnu.org/g:e4f85ea6271a10e13c6874709a05e04ab0508fbf
commit r13-8825-ge4f85ea6271a10e13c6874709a05e04ab0508fbf
Author: Jan Hubicka
Date: Fri Dec 29 23:51:03 2023 +0100
Disable FMADD in chains for Zen4 and generic
this patch disables use of FMA in matrix multiplication lo
https://gcc.gnu.org/g:b24f2954dbc13d85e9fb62e05a88e9df21e4d4f4
commit r15-1088-gb24f2954dbc13d85e9fb62e05a88e9df21e4d4f4
Author: liuhongt
Date: Fri Jun 7 09:29:24 2024 +0800
Add additional option --param max-completely-peeled-insns=200 for
power64*-*-*
gcc/testsuite/ChangeLog:
https://gcc.gnu.org/g:fcfce55c85f842ed843cbc4aabe744c6a004dead
commit r15-1050-gfcfce55c85f842ed843cbc4aabe744c6a004dead
Author: liuhongt
Date: Thu Jun 6 11:27:53 2024 +0800
Refine testcase for power10.
For power10, there're extra 3 REG_EQUIV notes with (fix:SI. to avoid
the f
https://gcc.gnu.org/g:961dd0d635217c703a38c48903981e0d60962546
commit r15-1048-g961dd0d635217c703a38c48903981e0d60962546
Author: liuhongt
Date: Fri Apr 19 10:39:53 2024 +0800
Adjust rtx_cost for MEM to enable more simplification
For CONST_VECTOR_DUPLICATE_P in constant_pool, it is j
https://gcc.gnu.org/g:7876cde25cbd2f026a0ae488e5263e72f8e9bfa0
commit r15-1047-g7876cde25cbd2f026a0ae488e5263e72f8e9bfa0
Author: liuhongt
Date: Fri Apr 19 10:29:34 2024 +0800
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.
When mask is (1 << (prec - imm)
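The snippet above is cut off before the mask condition is complete, so the following scalar sketch rests on an assumption: that the mask keeps exactly the low (prec - imm) bits. Under that assumption, masking the result of an arithmetic right shift clears the copied sign bits and leaves the same value a logical right shift would produce. The function names are hypothetical, prec is fixed at 32, and signed right shift is assumed to be arithmetic (as in GCC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (AND (ASHIFTRT a imm) mask), with mask assumed to cover the low
   (32 - imm) bits. imm must be in 0..31. */
uint32_t ashiftrt_then_and(int32_t a, unsigned imm) {
    /* imm == 0 is special-cased because 1 << 32 would be undefined. */
    uint32_t mask = (imm == 0) ? UINT32_MAX
                               : ((UINT32_C(1) << (32 - imm)) - 1);
    return (uint32_t)(a >> imm) & mask;
}

/* (LSHIFTRT a imm): logical shift, zeros enter from the top. */
uint32_t lshiftrt(int32_t a, unsigned imm) {
    return (uint32_t)a >> imm;
}

/* Hypothetical helper: returns 1 when both forms agree for every shift
   count 0..31 on a few sample values. */
int shift_forms_agree(void) {
    const int32_t samples[] = { INT32_MIN, -12345, -1, 0, 1, 12345, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (unsigned imm = 0; imm < 32; imm++) {
            if (ashiftrt_then_and(samples[i], imm) != lshiftrt(samples[i], imm))
                return 0;
        }
    }
    return 1;
}
```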