[PATCH] [RTL/fwprop] Allow propagations from inner loop to outer loop.

2022-01-06 Thread liuhongt via Gcc-patches
> Huh, loop_father should never be NULL. Maybe when fwprop is run after RTL loop opts you instead want to add a check for current_loops or alternatively initialize loops in fwprop. Oh, I didn't know that. I once saw an ICE and thought it was related to a NULL loop. But I can't reproduce the

[PATCH] [i386] Fix ICE of unrecognizable insn. [PR target/104001]

2022-01-13 Thread liuhongt via Gcc-patches
For define_insn_and_split "*xor2andn": 1. Refine predicate of operands[0] from nonimmediate_operand to register_operand. 2. Remove TARGET_AVX512BW from condition to avoid kmov when TARGET_BMI is not available. 3. Force_reg operands[2]. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok

[PATCH] [i386]Adjust testcase for --target_board='unix{-m64\ -march=cascadelake}'

2022-01-17 Thread liuhongt via Gcc-patches
Change scan-assembler from "\tucomisd" to "\t\[v\]?ucomisd". Refer to https://gcc.gnu.org/pipermail/gcc-regression/2022-January/076241.html gcc/testsuite/ChangeLog: * g++.target/i386/pr103973-1.C: Change scan-assembler from "\tucomisd" to "\t\[v\]?ucomisd". * g++.target/i
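
The effect of the pattern change can be checked outside DejaGnu. The sketch below is only an illustration: it assumes the Tcl escapes `\[` and `\]` reduce to a plain `[v]?` character class, and the two assembly lines are invented samples, not output from the testcase.

```python
import re

# Regexes as the matcher would see them after Tcl unescaping (assumption).
OLD = re.compile(r"\tucomisd")
NEW = re.compile(r"\t[v]?ucomisd")

# Hypothetical assembly lines: legacy SSE encoding and VEX/AVX encoding.
sse_line = "\tucomisd\t%xmm1, %xmm0"
avx_line = "\tvucomisd\t%xmm1, %xmm0"


def matches(pattern, line):
    """Return True when the pattern occurs anywhere in the line."""
    return pattern.search(line) is not None


for name, line in (("sse", sse_line), ("avx", avx_line)):
    print(name, "old:", matches(OLD, line), "new:", matches(NEW, line))
```

The old pattern misses the `v`-prefixed mnemonic because a `v` sits between the tab and `ucomisd`, which is the failure mode one would expect with -march=cascadelake.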

[PATCH] Enhance vec_pack_trunc for integral mode mask.

2022-01-17 Thread liuhongt via Gcc-patches
For the testcase in the PR, the patch supports the QI:4 -> HI:16 pack in multiple steps (first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack QI:8 -> HI:16 through vec_pack_trunc_hi). Similarly for QI:2 -> HI:16, which is test4 in mask-pack-prefer-128.c. Bootstrapped both with and w/o '--with-arch=na
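
The two-step pack can be pictured with plain integers standing in for mask registers. This is a minimal sketch, assuming each pack step concatenates two masks of equal lane count with the first operand in the low lanes; `pack_masks` is a hypothetical helper, not a GCC function, and the mask values are made up.

```python
def pack_masks(lo: int, hi: int, lanes: int) -> int:
    """Concatenate two `lanes`-wide bitmasks; `lo` fills the low lanes."""
    limit = 1 << lanes
    if not (0 <= lo < limit and 0 <= hi < limit):
        raise ValueError("mask does not fit in the stated lane count")
    return lo | (hi << lanes)


# Step 1: pairs of 4-lane masks become 8-lane masks (the QI:4 -> QI:8 step).
m8_a = pack_masks(0b1010, 0b0011, 4)
m8_b = pack_masks(0b1111, 0b0000, 4)

# Step 2: two 8-lane masks become one 16-lane mask (the QI:8 -> HI:16 step).
m16 = pack_masks(m8_a, m8_b, 8)

print(f"{m8_a:08b} {m8_b:08b} {m16:016b}")
```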

[PATCH] Remove TARGET_GEN_MEMSET_SCRATCH_RTX since it's not used anymore.

2023-03-21 Thread liuhongt via Gcc-patches
The target hook is only used by i386, and the current definition is the same as the default gen_reg_rtx, so there's no need for this target hook. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk (or GCC14)? gcc/ChangeLog: * builtins.cc (builtin_memset_read_str): Replace

[PATCH] Generate vpblendd instead of vpblendw for V4SI under AVX2.

2023-03-29 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} Ok for GCC14 stage-1(or maybe trunk)? gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_blend): Generate vpblendd instead of vpblendw for V4SI under avx2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr888

[PATCH] Support vector conversion for AVX512 vcvtudq2pd/vcvttps2udq/vcvttpd2udq.

2023-03-29 Thread liuhongt via Gcc-patches
There is a typo in the standard pattern names for unsigned_{float,fix}: they should be floatunsmn2/fixuns_truncmn2, not ufloatmn2/ufix_truncmn2 as in current trunk. The patch fixes the typo. Also, vcvttps2udq is available under AVX512VL, so it can be generated directly instead of being emulated via vcvt

[PATCH V2] Rename ufix_trunc/ufloat* patterns to fixuns_trunc/floatuns* to align with standard pattern name.

2023-03-30 Thread liuhongt via Gcc-patches
> > Just rename the instruction and fix all its call sites. The name of the insn pattern is internal to the compiler and can be renamed at will. > Ideally, we should standardize all the names to a standard name, so e.g. ufix_ -> fixuns_ and ufloat -> floatuns. Updated. There's some t

[PATCH] Adjust memory_move_cost for MASK_REGS when MODE_SIZE > 8.

2023-03-30 Thread liuhongt via Gcc-patches
RA will sometimes use the lowest cost of the mode across all the different regclasses without checking whether it is hard_regno_mode_ok. It's impossible to put modes whose size > 8 into MASK_REGS, so adjust the cost to avoid a potential performance issue. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for t

[PATCH] Document signbitm2.

2023-03-30 Thread liuhongt via Gcc-patches
Looking through all backends which define signbitm2: 1. When m is a scalar mode, the dest is SImode. 2. When m is a vector mode, the dest mode is the vector integer mode that has the same size and number of elements as m. Ok for trunk? gcc/ChangeLog: * doc/md.texi: Document signbitm2. --- gcc/doc

[PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-03 Thread liuhongt via Gcc-patches
There's a potential performance issue when the backend returns an unreasonable value for a mode which can never be allocated with the reg class. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk (or GCC14 stage1)? gcc/ChangeLog: PR rtl-optimization/109351 * ira.
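
The idea in the subject line, filtering register classes by whether they can hold the mode before taking the minimum cost, can be sketched as follows. This is not the IRA code: `lowest_move_cost`, the class names used as dictionary keys, and every cost value are hypothetical, and the 8-byte limit for MASK_REGS is taken from the related memory_move_cost thread.

```python
def lowest_move_cost(costs, can_hold):
    """Minimum cost over register classes that can actually hold the mode.

    costs:    dict mapping a class name to its memory move cost
    can_hold: predicate standing in for a hard_regno_mode_ok-style check
    Returns None when no class qualifies.
    """
    allowed = [cost for rclass, cost in costs.items() if can_hold(rclass)]
    return min(allowed) if allowed else None


# Invented costs for a 16-byte mode; MASK_REGS reports a misleadingly low value.
costs = {"GENERAL_REGS": 6, "SSE_REGS": 8, "MASK_REGS": 2}
mode_size = 16


def can_hold(rclass):
    # Assumption for the sketch: only MASK_REGS has a size restriction.
    return not (rclass == "MASK_REGS" and mode_size > 8)


unfiltered = min(costs.values())
filtered = lowest_move_cost(costs, can_hold)
print(unfiltered, filtered)
```

Without the filter, the unusable class sets the minimum, which is the kind of unreasonable value the snippet describes.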
