[PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xiong Hu Luo via Gcc-patches
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for nested loops. inn_loop is updated to the inner loop, so it needs to be restored when exiting from the innermost loop. With this patch, the store instruction in the outer loop can also be moved out of the outer loop by store motion. Any comments?

[RFC] Don't move cold code out of loop by checking bb count

2021-08-01 Thread Xiong Hu Luo via Gcc-patches
There was a patch trying to avoid moving cold blocks out of loops: https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html Richard suggested to "never hoist anything from a bb with lower execution frequency to a bb with higher one in LIM invariantness_dom_walker before_dom_children". This patch

[PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-04-16 Thread Xiong Hu Luo via Gcc-patches
fmod/fmodf and remainder/remainderf can be expanded inline instead of calling the library when building with fast-math, which is much faster. fmodf: fdivs f0,f1,f2; friz f0,f0; fnmsubs f1,f2,f0,f1 remainderf: fdivs f0,f1,f2; frin f0,f0; fnmsubs f1,f2,f0,f1 gcc/ChangeLog: 2021-04
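The two instruction sequences in this snippet amount to x - y * round(x / y), where the quotient is rounded toward zero for fmodf (friz) and to the nearest integer for remainderf (frin). A minimal C sketch of that arithmetic follows. The helper names are hypothetical, the integer casts only stand in for friz/frin while the quotient fits in a long, and, like the fast-math expansion, the sketch ignores the precision and special-value handling of the library functions.

```c
/* Hypothetical helpers illustrating the arithmetic only; not GCC's code. */

/* fmodf-like: fdivs, friz (round quotient toward zero), fnmsubs. */
static float fmodf_sketch(float x, float y)
{
    float q = x / y;                 /* fdivs */
    float t = (float)(long)q;        /* stands in for friz: truncate */
    return x - y * t;                /* fnmsubs computes -(y*t - x) */
}

/* remainderf-like: fdivs, frin (round quotient to nearest), fnmsubs. */
static float remainderf_sketch(float x, float y)
{
    float q = x / y;                 /* fdivs */
    /* stands in for frin: nearest integer, ties away from zero */
    float n = (float)(long)(q >= 0.0f ? q + 0.5f : q - 0.5f);
    return x - y * n;                /* fnmsubs computes -(y*n - x) */
}
```

For example, with x = 7.5 and y = 2 the quotient is 3.75, so the first helper subtracts 2 * 3 and the second subtracts 2 * 4.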

[RFC] Run pass_sink_code once more after ivopts/fre

2020-12-21 Thread Xiong Hu Luo via Gcc-patches
Here comes another case that requires running a pass once more. As this is not the commonly suggested direction for solving problems, I am not quite sure whether it is still a reasonable fix here. Source code is something like: ref = ip + *hslot; while (ip < in_end - 2) { unsigned int len = 2; len++; fo

[PATCH] Add debug_bb_details and debug_bb_n_details

2020-10-23 Thread Xiong Hu Luo via Gcc-patches
Sometimes debug_bb_slim and debug_bb_n_slim are not enough; how about adding debug_bb_details and debug_bb_n_details? Or does a similar call already exist? gcc/ChangeLog: 2020-10-23 Xionghu Luo * print-rtl.c (debug_bb_details): New function. * (debug_bb_n_details): New function. ---

[PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-17 Thread Xiong Hu Luo via Gcc-patches
This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to VEC_SET internal function in gimple-isel pass if target supports vec_set with variable index by checking can_vec_set_var_idx_p. gcc/ChangeLog: 2020-09-18 Xionghu Luo * gimple-isel.cc (gimple_expand_vec_set_expr): N

[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-17 Thread Xiong Hu Luo via Gcc-patches
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value to be inserted, and arg2 is the position at which to insert arg1 into arg0. The current expander generates stxv+stwx+lxv if arg2 is variable instead of constant, which causes a serious store-hit-load performance issue on Power. This patch tries 1) Bu

[PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-08-31 Thread Xiong Hu Luo via Gcc-patches
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value to be inserted, and arg2 is the position at which to insert arg1 into arg0. This patch adds __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert so that it is not expanded too early in the gimple stage if arg2 is variable, to avoid generating store h

[PATCH] dse: Remove partial load after full store for high part access[PR71309]

2020-07-21 Thread Xiong Hu Luo via Gcc-patches
This patch could optimize (works for char/short/int/void*): 6: r119:TI=[r118:DI+0x10] 7: [r118:DI]=r119:TI 8: r121:DI=[r118:DI+0x8] => 6: r119:TI=[r118:DI+0x10] 16: r122:DI=r119:TI#8 Final ASM will be as below without partial load after full store(stxv+ld): ld 10,16(3) mr 9,3 ld 3,24(3)

[PATCH 2/2] rs6000: Define define_insn_and_split to split unspec sldi+or to rldimi

2020-07-09 Thread Xiong Hu Luo via Gcc-patches
Combine pass could recognize the pattern defined and split it in split1, this patch could optimize: 21: r130:DI=r133:DI<<0x20 11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;} 22: r134:DI=r130:DI|r129:DI to 21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;} 22: r134:

[PATCH 1/2] rs6000: Init V4SF vector without converting SP to DP

2020-07-09 Thread Xiong Hu Luo via Gcc-patches
Move V4SF to V4SI, init the vector like V4SI and move it back to V4SF. A better instruction sequence can be generated on Power9: lfs + xxpermdi + xvcvdpsp + vmrgew => lwz + (sldi + or) + mtvsrdd With the following patch, it can be further optimized to: lwz + rldimi + mtvsrdd The point is to use lwz
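The lwz + (sldi + or) part of that sequence treats each single-precision value as a raw 32-bit pattern and merges two of them into one 64-bit integer, so no SP-to-DP conversion is involved. A rough C sketch of that packing step follows; the function names are hypothetical, and which float ends up in the high half is an assumption made for illustration (the real element order depends on endianness and the target's vector layout).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a float's 32-bit pattern, as lwz would
   load it into a GPR without any floating-point conversion. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: shift one pattern left by 32 and OR in the
   other (sldi + or, which the follow-up patch folds into rldimi).
   Placing `hi` in the upper half is an assumption for illustration. */
static uint64_t pack_two_floats(float hi, float lo)
{
    return ((uint64_t)float_bits(hi) << 32) | float_bits(lo);
}
```

Two such 64-bit values are what mtvsrdd would then move into a single vector register.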

[PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-05 Thread Xiong Hu Luo
Inline should return failure either (newsize > param_large_function_insns) OR (newsize > limit). Sometimes newsize is larger than param_large_function_insns, but smaller than limit, inline doesn't return failure even if the new function is a large function. Therefore, param_large_function_insns an
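The condition described here can be illustrated with a small C sketch. Both functions are hypothetical stand-ins rather than the code in ipa-inline; in particular, the assumption that the earlier check behaved like a logical AND is inferred from the description and is not confirmed by the snippet.

```c
#include <stdbool.h>

/* Hypothetical model of the behaviour the message complains about:
   failure only when BOTH thresholds are exceeded (an assumption). */
static bool growth_fails_and(int newsize, int large_function_insns, int limit)
{
    return newsize > large_function_insns && newsize > limit;
}

/* Hypothetical model of the behaviour the message asks for:
   failure when EITHER threshold is exceeded. */
static bool growth_fails_or(int newsize, int large_function_insns, int limit)
{
    return newsize > large_function_insns || newsize > limit;
}
```

With newsize = 150, a large-function threshold of 100 and a limit of 200, the first form lets the inline through while the second rejects it, which is the case the message describes.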

[PATCH] [RFC] ipa: duplicate ipa_size_summary for cloned nodes

2019-12-17 Thread Xiong Hu Luo
The size_info of ipa_size_summary was introduced by r277424. It should be duplicated for cloned nodes; otherwise self_size and estimated_self_stack_size would be 0, causing the large-function-insns and large-function-growth params to work inaccurately during ipa-inline. gcc/ChangeLog: 2019-12-18 Lu

[RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-09 Thread Xiong Hu Luo
The performance of exchange2 built with PGO decreases by ~28% with r278808 because the profile count is set incorrectly. The cloned nodes are updated to a very small count, which causes the later cunroll pass to fail to unroll the recursive function in exchange2. This patch enables adding orig_sum to the new nodes for s

[PATCH 2/2] Fix comments typo

2019-11-15 Thread Xiong Hu Luo
I'm going to install it as obvious. gcc/ChangeLog: 2019-11-15 Luo Xiong Hu * ipa-comdats.c: Fix comments typo. * ipa-profile.c: Fix comments typo. * tree-profile.c (gimple_gen_ic_profiler): Use the new variable __gcov_indirect_call.counters and __gcov_i

[PATCH 1/2] Update iterator of next

2019-11-15 Thread Xiong Hu Luo
next is initialized only in the preceding loop; it is never updated in its own loop. gcc/ChangeLog 2019-11-15 Xiong Hu Luo * ipa-inline.c (inline_small_functions): Update iterator of next. --- gcc/ipa-inline.c | 15 +-- 1 file changed, 9 insertions(+), 6 dele

[PATCH] PR92398: Fix testcase failure of pr72804.c

2019-11-14 Thread Xiong Hu Luo
P9LE generated instruction is not worse than P8LE. mtvsrdd;xxlnot;stxv vs. not;not;std;std. Update the test case to fix failures. gcc/testsuite/ChangeLog: 2019-11-15 Luo Xiong Hu testsuite/pr92398 * gcc.target/powerpc/pr72804.h: New. * gcc.target/powerpc/pr7280

[PATCH] Add explicit description for -finline

2019-10-31 Thread Xiong Hu Luo
-finline is not an explicitly documented option; searching for the word "-finline" in the page https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options will miss the explicit option "-fno-inline". gcc/ChangeLog: 2019-11-01 Xiong Hu Luo doc/invoke.texi (inline):

[PATCH] PR92090: Fix testcase failures by r276469

2019-10-30 Thread Xiong Hu Luo
-finline-functions is enabled by default for O2 since r276469; update the test cases in which inlining small functions changed the instruction counts. gcc/testsuite/ChangeLog: 2019-10-30 Xiong Hu Luo PR92090 * gcc/testsuite/gcc.target/powerpc/pr79439-1.c: Update

[RFC] Come up with ipa passes introduction in gccint documentation

2019-09-29 Thread Xiong Hu Luo
There is no introduction to ipa passes in gccint now. Is it necessary to add this part, given that brief introductions to both GIMPLE passes and RTL passes already exist in Chapter 9 "Passes and Files of the Compiler" but there is no section for ipa passes? If it's OK, this is just a framework; lots of words need to be filled

[PATCH] Enable math functions linking with static library for LTO

2019-08-25 Thread Xiong Hu Luo
math function, then the function in static library will be linked first if its sequence is ahead of the dynamic library. gcc/ChangeLog 2019-08-14 Xiong Hu Luo PR lto/91287 * builtins.c (builtin_with_linkage_p): New function. * builtins.h (builtin_with_linkage_p): New

[PATCH] Add MD Function type check for builtin_md vectorize

2019-08-20 Thread Xiong Hu Luo
. gcc/ChangeLog 2019-08-21 Xiong Hu Luo * tree-vect-stmts.c (vectorizable_call): Check callee built-in type. * gcc/tree.h (DECL_MD_FUNCTION_P): New function. --- gcc/tree-vect-stmts.c | 2 +- gcc/tree.h| 12 2 files changed, 13 insertions(+), 1

[RFC] Enable math functions linking with static library for LTO

2019-08-09 Thread Xiong Hu Luo
, then the function in static library will be linked first if its sequence is ahead of the dynamic library. gcc/ChangeLog 2019-08-09 Xiong Hu Luo PR lto/91287 * symtab.c (write_symbol): Check built_in function type. * lto-streamer-out.c (symtab_node

[PATCH v2] Generalize get_most_common_single_value to return k_th value & count

2019-07-15 Thread Xiong Hu Luo
Currently get_most_common_single_value can only return the max hist value; add qsort to enable this function to return the kth value. Rename it to get_kth_value_count. gcc/ChangeLog: 2019-07-15 Xiong Hu Luo * ipa-profile.c (get_most_common_single_value): Use get_kth_value_count
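The idea of sorting the histogram so that the kth most frequent value can be returned might be sketched as below. This is only an illustration under stated assumptions: the struct, the comparator and kth_value_count are hypothetical names, not GCC's get_kth_value_count or its histogram representation.

```c
#include <stdlib.h>

/* Hypothetical histogram entry: a profiled value and how often it was seen. */
struct value_count {
    long value;
    long count;
};

/* Order by count, highest first; break ties by value so the result
   does not depend on qsort's (unspecified) ordering of equal keys. */
static int cmp_count_desc(const void *a, const void *b)
{
    const struct value_count *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (x->value > y->value) - (x->value < y->value);
}

/* Sort the histogram in place and report the k-th entry (k = 0 is the
   most common value).  Returns 1 on success, 0 if k is out of range. */
static int kth_value_count(struct value_count *hist, size_t n, size_t k,
                           long *value, long *count)
{
    if (k >= n)
        return 0;
    qsort(hist, n, sizeof *hist, cmp_count_desc);
    *value = hist[k].value;
    *count = hist[k].count;
    return 1;
}
```

Calling it with k = 0 reproduces the "most common single value" behaviour, while larger k walks down the sorted histogram.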

[PATCH] Generalize get_most_common_single_value to return k_th value & count

2019-07-14 Thread Xiong Hu Luo
Currently get_most_common_single_value can only return the max hist value; add two parameters to enable this function to return the kth value if needed. gcc/ChangeLog: 2019-07-15 Xiong Hu Luo * value-prof.c (get_most_common_single_value): Add input params k_th and k, return the

[PATCH v2] Missed function specialization + partial devirtualization

2019-07-12 Thread Xiong Hu Luo
-profile. As get_most_common_single_value can only return a single value, but this multiple indirect call needs to store each hist value, will consider specializing it later. gcc/ChangeLog 2019-06-17 Xiong Hu Luo PR ipa/69678 * cgraph.c (symbol_table::create_edge): Init s

[PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-17 Thread Xiong Hu Luo
ASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto -fprofile-correction 6.3. No performance change on PHP benchmark. 7. Bootstrap and regression test passed on Power8-LE. gcc/ChangeLog 2019-06-17 Xiong Hu Luo PR ipa/69678 * cgraph.c

Re: [PATCH] backport r257541, r259936, r260294, r260623, r261098, r261333, r268585.

2019-04-17 Thread Xiong Hu Luo
c. backport them to update file names and fix regressions >> for GCC7 on power9. > > (See e.g. https://gcc.gnu.org/ml/gcc-testresults/2019-04/msg01868.html for > the failures this patch fixes; the patch is for GCC 7). > >> gcc/ChangeLog: >> >> 2019-04-03 Xiong

[PATCH] backport r268834 from mainline to gcc-8-branch

2019-03-04 Thread Xiong Hu Luo
Backport r268834 of "Add support for the vec_sbox_be, vec_cipher_be etc." from mainline to gcc-8-branch. Regression-tested on Linux POWER8 LE. OK for gcc-8-branch? PS: Is backport to gcc-7-branch also needed? gcc/ChangeLog: 2019-03-05 Xiong Hu Luo Backport of r268834 from m

*Ping* Re: [PATCH] PR c/43673 - Incorrect warning in dfp printf.

2019-03-03 Thread Xiong Hu Luo
Ping: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01949.html Thanks Xionghu On 2019/2/26 AM9:13, luo...@linux.ibm.com wrote: > From: Xiong Hu Luo > > dfp printf/scanf of Ha/HA, Da/DA and DDa/DDA is not set properly, cause > incorrect warning happens: > "use of 'D

Re: [PATCH] luoxhu - backport from trunk r255555, r257253 and r258137

2019-02-19 Thread Xiong Hu Luo
Hi Segher, On 2019/2/20 AM6:24, Segher Boessenkool wrote: Hi! On Tue, Feb 19, 2019 at 01:23:53AM -0600, luo...@linux.ibm.com wrote: This is a backport of r255555, r257253 and r258137 of trunk to gcc-7-branch. The patches were on trunk before GCC 8 forked already. Totally 5 files need manual r

Re: [PATCH] rs6000: Add support for the vec_sbox_be, vec_cipher_be etc. builtins.

2019-02-11 Thread Xiong Hu Luo
01-23 Xiong Hu Luo * gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c (crpyto1_be, crpyto2_be, crpyto3_be, crpyto4_be, crpyto5_be): New testcases. Typoes ("crypto"). And that last line is indented incorrectly. With those things fixed, okay for trunk, with t

[PATCH] Add myself to MAINTAINERS

2019-01-15 Thread Xiong Hu Luo
2019-01-16  Xiong Hu Luo   * MAINTAINERS (Write After Approval): Add myself. Committed in r267962. --- Index: ChangeLog === --- ChangeLog   (revision 267961) +++ ChangeLog   (working copy) @@ -1,3 +1,7 @@ + 2019-01-16  Xiong Hu