[PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Xiong Hu Luo via Gcc-patches
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for nested loops. inn_loop is updated to the inner loop, so it needs to be restored when exiting from the innermost loop. With this patch, the store instruction in the outer loop can also be moved out of the outer loop by store motion. Any comments?
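To make the store-motion remark concrete, here is a minimal C sketch of a nested loop with a store in the outer loop body. The function names and the loop shape are hypothetical and are not taken from the patch; `restrict` is an added assumption so that the store cannot alias the loads. The second function shows by hand the kind of result store motion aims for, not what GCC is guaranteed to emit.

```c
#include <stddef.h>

/* Hypothetical example: the store to *total after the inner loop sits in an
   outer-loop block that runs on every outer iteration. */
void accumulate(int *restrict total, const int *restrict data,
                size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++)
            *total += data[i * cols + j];   /* store inside the inner loop */
        *total += 1;                        /* store in the outer loop */
    }
}

/* Hand-written equivalent of what store motion is after: keep the value in a
   local while looping and store it once on loop exit. */
void accumulate_hoisted(int *restrict total, const int *restrict data,
                        size_t rows, size_t cols)
{
    if (rows == 0)
        return;                             /* loop not entered: no store */
    int tmp = *total;                       /* load once before the loops */
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++)
            tmp += data[i * cols + j];
        tmp += 1;
    }
    *total = tmp;                           /* single store after the loops */
}
```

Both functions leave the same value in `*total`; the second form only makes explicit why knowing that a block is always executed in the outer loop matters for moving its store.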

[RFC] Don't move cold code out of loop by checking bb count

2021-08-01 Xiong Hu Luo via Gcc-patches
There was a patch that tried to avoid moving cold blocks out of loops: https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html Richard suggested to "never hoist anything from a bb with lower execution frequency to a bb with higher one in LIM invariantness_dom_walker before_dom_children". This patch

[PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-04-16 Xiong Hu Luo via Gcc-patches
fmod/fmodf and remainder/remainderf can be expanded inline instead of calling the library when building with fast-math, which is much faster.
fmodf: fdivs f0,f1,f2; friz f0,f0; fnmsubs f1,f2,f0,f1
remainderf: fdivs f0,f1,f2; frin f0,f0; fnmsubs f1,f2,f0,f1
gcc/ChangeLog: 2021-04
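The two three-instruction sequences correspond to "divide, round the quotient, then subtract the product". A rough C sketch of that arithmetic follows. The function names are hypothetical, the rounding helpers are plain-C stand-ins for friz (round toward zero) and frin (round to nearest, ties away from zero), and the casts assume the quotient fits in a `long`. Like the fast-math expansion, this ignores infinities, NaNs and precision loss for large quotients.

```c
/* Stand-in for friz: round toward zero (assumes q fits in a long). */
static float round_toward_zero(float q)
{
    return (float)(long)q;
}

/* Stand-in for frin: round to nearest, ties away from zero.
   The addition is done in double so that q +/- 0.5 is exact. */
static float round_nearest_away(float q)
{
    double d = (double)q;
    return (float)(long)(d < 0.0 ? d - 0.5 : d + 0.5);
}

/* fdivs; friz; fnmsubs  ->  x - y * trunc(x / y) */
float fmodf_sketch(float x, float y)
{
    float q = round_toward_zero(x / y);
    return x - y * q;
}

/* fdivs; frin; fnmsubs  ->  x - y * round(x / y) */
float remainderf_sketch(float x, float y)
{
    float q = round_nearest_away(x / y);
    return x - y * q;
}
```

For example, with x = 5.5 and y = 2.0 the quotient is 2.75, so the fmod-style result is 5.5 - 2.0 * 2 = 1.5 and the remainder-style result is 5.5 - 2.0 * 3 = -0.5.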

[RFC] Run pass_sink_code once more after ivopts/fre

2020-12-21 Xiong Hu Luo via Gcc-patches
Here comes another case that requires running a pass once more. As this is not the commonly suggested direction for solving problems, I am not quite sure whether it is still a reasonable fix here. Source code is something like: ref = ip + *hslot; while (ip < in_end - 2) { unsigned int len = 2; len++; fo

[PATCH] Add debug_bb_details and debug_bb_n_details

2020-10-23 Xiong Hu Luo via Gcc-patches
Sometimes debug_bb_slim and debug_bb_n_slim are not enough. How about adding debug_bb_details and debug_bb_n_details? Or does a similar call already exist? gcc/ChangeLog: 2020-10-23 Xionghu Luo * print-rtl.c (debug_bb_details): New function. * (debug_bb_n_details): New function. ---

[PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-17 Xiong Hu Luo via Gcc-patches
This patch enables the transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to the VEC_SET internal function in the gimple-isel pass if the target supports vec_set with a variable index, which is checked with can_vec_set_var_idx_p. gcc/ChangeLog: 2020-09-18 Xionghu Luo * gimple-isel.cc (gimple_expand_vec_set_expr): N

[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-17 Xiong Hu Luo via Gcc-patches
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value to be inserted, and arg2 is the position in arg0 at which arg1 is inserted. The current expander generates stxv+stwx+lxv if arg2 is variable instead of constant, which causes a serious store-hit-load performance issue on Power. This patch tries 1) Bu
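The store-hit-load problem can be pictured in plain C, without the Power intrinsics. In the sketch below, the type and function names are hypothetical, the 4 x int layout stands in for a V4SI vector, and reducing the index modulo the element count is an assumption about vec_insert semantics. The first function mirrors the memory round trip (conceptually stxv + stwx + lxv). The second is one register-only alternative shown purely for illustration; it is not claimed to be the sequence the patch generates.

```c
#include <string.h>

/* Hypothetical stand-in for a 4 x int vector register. */
typedef struct { int e[4]; } v4si_sketch;

/* Memory path: spill the whole vector, overwrite one element with a
   narrower store, then reload the whole vector. The wide reload right
   after the narrow store is what triggers store-hit-load. */
v4si_sketch insert_via_memory(v4si_sketch v, int val, unsigned idx)
{
    int tmp[4];
    memcpy(tmp, v.e, sizeof tmp);   /* conceptually stxv */
    tmp[idx % 4] = val;             /* conceptually stwx */
    memcpy(v.e, tmp, sizeof tmp);   /* conceptually lxv  */
    return v;
}

/* Illustrative alternative: pick each lane with a compare and select,
   so no indexed store to memory is needed. */
v4si_sketch insert_via_select(v4si_sketch v, int val, unsigned idx)
{
    idx %= 4;
    for (unsigned i = 0; i < 4; i++)
        v.e[i] = (i == idx) ? val : v.e[i];
    return v;
}
```

Both functions return the same vector for any index; they differ only in whether the update goes through memory.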

[PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-08-31 Xiong Hu Luo via Gcc-patches
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value to be inserted, and arg2 is the position in arg0 at which arg1 is inserted. This patch adds __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert so that it is not expanded too early in the gimple stage if arg2 is variable, to avoid generating store h

[PATCH] dse: Remove partial load after full store for high part access[PR71309]

2020-07-21 Xiong Hu Luo via Gcc-patches
This patch could optimize (works for char/short/int/void*):
6: r119:TI=[r118:DI+0x10]
7: [r118:DI]=r119:TI
8: r121:DI=[r118:DI+0x8]
=>
6: r119:TI=[r118:DI+0x10]
16: r122:DI=r119:TI#8
Final ASM will be as below, without the partial load after the full store (stxv+ld):
ld 10,16(3)
mr 9,3
ld 3,24(3)
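Read as C, the RTL above is a 16-byte copy followed by an 8-byte load that overlaps the second half of the bytes just stored. The sketch below restates that at source level. The struct and function names are hypothetical, and the insn numbers in the comments only map the statements back to the dump. The second function shows by hand the forwarding idea: take the upper 8 bytes from the value already held instead of reloading them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value standing in for the TImode register r119. */
typedef struct { uint64_t first8, second8; } ti_sketch;

/* Before: full 16-byte store followed by a partial 8-byte reload. */
uint64_t copy_then_reload(unsigned char *p)
{
    ti_sketch t;
    uint64_t r;
    memcpy(&t, p + 16, 16);   /* insn 6: load 16 bytes from p + 0x10 */
    memcpy(p, &t, 16);        /* insn 7: store 16 bytes to p         */
    memcpy(&r, p + 8, 8);     /* insn 8: reload bytes 8..15 of p     */
    return r;
}

/* After: reuse bytes 8..15 of the stored value (r119:TI#8) directly. */
uint64_t copy_then_forward(unsigned char *p)
{
    ti_sketch t;
    memcpy(&t, p + 16, 16);
    memcpy(p, &t, 16);
    return t.second8;         /* no load that depends on the store */
}
```

Both functions return the 8 bytes that originally sat at `p + 24`, which matches the `ld 3,24(3)` in the final assembly.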

[PATCH 2/2] rs6000: Define define_insn_and_split to split unspec sldi+or to rldimi

2020-07-09 Xiong Hu Luo via Gcc-patches
The combine pass can recognize the defined pattern, which is then split in split1. This patch could optimize:
21: r130:DI=r133:DI<<0x20
11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r130:DI|r129:DI
to
21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:
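The shift-plus-or pattern and the rotate-and-insert form compute the same 64-bit value, which a small C sketch can show. The function names are hypothetical, and the mask models rldimi with a shift of 32 and a mask starting at the most significant bit. This illustrates the arithmetic only, not the machine description pattern from the patch.

```c
#include <stdint.h>

/* sldi + or: shift one value into the upper half, then OR in the
   zero-extended 32-bit value. */
uint64_t pack_shift_or(uint64_t hi, uint32_t lo)
{
    uint64_t shifted = hi << 32;        /* sldi: hi << 0x20  */
    uint64_t extended = (uint64_t)lo;   /* zero_extend of lo */
    return shifted | extended;          /* or                */
}

/* rldimi-style: rotate hi left by 32 and insert the masked bits into the
   register that already holds the zero-extended low value. */
uint64_t pack_rotate_insert(uint64_t hi, uint32_t lo)
{
    uint64_t dest = (uint64_t)lo;
    uint64_t rotated = (hi << 32) | (hi >> 32);    /* rotate left by 32     */
    uint64_t mask = 0xFFFFFFFF00000000ull;         /* upper 32 bits inserted */
    return (rotated & mask) | (dest & ~mask);
}
```

Because the mask keeps only the upper half of the rotated value, any bits in the upper half of `hi` are discarded in both versions, so the two functions agree for all inputs.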

[PATCH 1/2] rs6000: Init V4SF vector without converting SP to DP

2020-07-09 Xiong Hu Luo via Gcc-patches
Move V4SF to V4SI, initialize the vector like V4SI, and move it back to V4SF. A better instruction sequence can be generated on Power9: lfs + xxpermdi + xvcvdpsp + vmrgew => lwz + (sldi + or) + mtvsrdd. With the following patch, it can be optimized further to: lwz + rldimi + mtvsrdd. The point is to use lwz