It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop is updated to inner loop, so it need be restored
when exiting from innermost loop. With this patch, the store instruction
in outer loop could also be moved out of outer loop by store motion.
Any comments?
There was a patch trying to avoid move cold block out of loop:
https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".
This patch
fmod/fmodf and remainder/remainderf could be expanded instead of library
call when fast-math build, which is much faster.
fmodf:
fdivs f0,f1,f2
frizf0,f0
fnmsubs f1,f2,f0,f1
remainderf:
fdivs f0,f1,f2
frinf0,f0
fnmsubs f1,f2,f0,f1
gcc/ChangeLog:
2021-04
Here comes another case that requires run a pass once more, as this is
not the common suggested direction to solve problems, not quite sure
whether it is still a reasonble fix here. Source code is something like:
ref = ip + *hslot;
while (ip < in_end - 2) {
unsigned int len = 2;
len++;
fo
Sometimes debug_bb_slim&debug_bb_n_slim is not enough, how about adding
this debug_bb_details&debug_bb_n_details? Or any other similar call
existed?
gcc/ChangeLog:
2020-10-23 Xionghu Luo
* print-rtl.c (debug_bb_details): New function.
* (debug_bb_n_details): New function.
---
This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
VEC_SET internal function in gimple-isel pass if target supports
vec_set with variable index by checking can_vec_set_var_idx_p.
gcc/ChangeLog:
2020-09-18 Xionghu Luo
* gimple-isel.cc (gimple_expand_vec_set_expr): N
vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
to be insert, arg2 is the place to insert arg1 to arg0. Current expander
generates stxv+stwx+lxv if arg2 is variable instead of constant, which
causes serious store hit load performance issue on Power. This patch tries
1) Bu
vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
to be insert, arg2 is the place to insert arg1 to arg0. This patch adds
__builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
not expand too early in gimple stage if arg2 is variable, to avoid generate
store h
This patch could optimize (works for char/short/int/void*):
6: r119:TI=[r118:DI+0x10]
7: [r118:DI]=r119:TI
8: r121:DI=[r118:DI+0x8]
=>
6: r119:TI=[r118:DI+0x10]
16: r122:DI=r119:TI#8
Final ASM will be as below without partial load after full store(stxv+ld):
ld 10,16(3)
mr 9,3
ld 3,24(3)
Combine pass could recognize the pattern defined and split it in split1,
this patch could optimize:
21: r130:DI=r133:DI<<0x20
11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r130:DI|r129:DI
to
21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:
Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
Better instruction sequence could be generated on Power9:
lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvsrdd
With the patch followed, it could be continue optimized to:
lwz + rldimi + mtvsrdd
The point is to use lwz
11 matches
Mail list logo