It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop is updated to the inner loop, so it needs to be
restored when exiting from the innermost loop. With this patch, the store
instruction in the outer loop can also be moved out of the outer loop by
store motion.
Any comments?
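For illustration only, the loop shape being discussed might look like the sketch below. The names `total` and `accumulate` are hypothetical and are not taken from the patch or its test cases; whether store motion actually applies to such code depends on the compiler's alias and dominance analysis.

```c
/* Hypothetical example: `total` is a global, so the update after the
   inner loop is a store inside the outer loop.  The block containing it
   runs on every outer iteration, after the inner loop has exited.  */
int total;

void accumulate(const int *restrict a, int n, int m)
{
    for (int i = 0; i < n; i++) {          /* outer loop */
        int row = 0;
        for (int j = 0; j < m; j++)        /* inner loop */
            row += a[i * m + j];
        total += row;                      /* store in the outer loop */
    }
}
```

Calling `accumulate` on a 2x3 array holding 1..6 with `total` initially 0 leaves `total` equal to 21.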
There was a patch trying to avoid moving a cold block out of a loop:
https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".
This patch
fmod/fmodf and remainder/remainderf can be expanded inline instead of
calling the library when building with fast-math, which is much faster.
fmodf:
fdivs f0,f1,f2
friz f0,f0
fnmsubs f1,f2,f0,f1
remainderf:
fdivs f0,f1,f2
frin f0,f0
fnmsubs f1,f2,f0,f1
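The two sequences above correspond roughly to the arithmetic sketched below. This is only an approximation under stated assumptions: the function names `fast_fmodf` and `fast_remainderf` are hypothetical, integer casts stand in for the rounding instructions (so the quotient must fit in an `int`), and the multiply-subtract is not guaranteed to be fused as it is with fnmsubs.

```c
/* Sketch of the fast-math expansions; not the code GCC emits.  */

static float fast_fmodf(float x, float y)
{
    float q = x / y;                 /* fdivs */
    float t = (float)(int)q;         /* friz: round toward zero */
    return x - y * t;                /* fnmsubs: x - y * t */
}

static float fast_remainderf(float x, float y)
{
    float q = x / y;                                         /* fdivs */
    /* frin: round to nearest, ties away from zero */
    float r = (float)(int)(q + (q < 0.0f ? -0.5f : 0.5f));
    return x - y * r;                                        /* fnmsubs: x - y * r */
}
```

For example, with x = 7.5 and y = 2.0 the first function yields 1.5 and the second yields -0.5.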
gcc/ChangeLog:
2021-04
Here comes another case that requires running a pass once more. As this is
not the commonly suggested direction for solving problems, I am not quite sure
whether it is still a reasonable fix here. Source code is something like:
ref = ip + *hslot;
while (ip < in_end - 2) {
unsigned int len = 2;
len++;
fo
Sometimes debug_bb_slim and debug_bb_n_slim are not enough. How about adding
debug_bb_details and debug_bb_n_details? Or does a similar call already
exist?
gcc/ChangeLog:
2020-10-23 Xionghu Luo
* print-rtl.c (debug_bb_details): New function.
* (debug_bb_n_details): New function.
---
This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
VEC_SET internal function in gimple-isel pass if target supports
vec_set with variable index by checking can_vec_set_var_idx_p.
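As a rough illustration, source of the following shape is the kind of variable-index element store that this transformation concerns. The sketch uses the GNU C vector extension; `v4si` and `set_lane` are hypothetical names, and the exact GIMPLE produced is an assumption that depends on the compiler version.

```c
/* GNU C vector extension: a vector of four ints.  */
typedef int v4si __attribute__((vector_size(16)));

/* Store `val` into lane `idx` of `v`, where `idx` is only known at
   run time.  The mask keeps the index within the four lanes.  */
v4si set_lane(v4si v, int val, unsigned int idx)
{
    v[idx & 3] = val;   /* variable-index element store */
    return v;
}
```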
gcc/ChangeLog:
2020-09-18 Xionghu Luo
* gimple-isel.cc (gimple_expand_vec_set_expr): N
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value
to be inserted, and arg2 is the position at which to insert arg1 into arg0.
The current expander generates stxv+stwx+lxv if arg2 is variable instead of
constant, which causes a serious store-hit-load performance issue on Power.
This patch tries
1) Bu
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value
to be inserted, and arg2 is the position at which to insert arg1 into arg0.
This patch adds __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for
vec_insert so that it is not expanded too early in the gimple stage if arg2
is variable, to avoid generating
store h
This patch could optimize (works for char/short/int/void*):
6: r119:TI=[r118:DI+0x10]
7: [r118:DI]=r119:TI
8: r121:DI=[r118:DI+0x8]
=>
6: r119:TI=[r118:DI+0x10]
16: r122:DI=r119:TI#8
The final ASM is as below, without a partial load after a full store (stxv+ld):
ld 10,16(3)
mr 9,3
ld 3,24(3)
The combine pass can recognize the defined pattern and split it in split1;
this patch can optimize:
21: r130:DI=r133:DI<<0x20
11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r130:DI|r129:DI
to
21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:
Move V4SF to V4SI, initialize the vector as V4SI, and move it back to V4SF.
A better instruction sequence can be generated on Power9:
lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvsrdd
With the follow-up patch, it can be further optimized to:
lwz + rldimi + mtvsrdd
The point is to use lwz
Inlining should return failure if either (newsize > param_large_function_insns)
OR (newsize > limit). Sometimes newsize is larger than
param_large_function_insns but smaller than limit, and inlining doesn't return
failure even if the new function is a large function.
Therefore, param_large_function_insns an
The size_info of ipa_size_summary was created by r277424. It should be
duplicated for cloned nodes; otherwise self_size and estimated_self_stack_size
would be 0, causing the params large-function-insns and large-function-growth
to work inaccurately during ipa-inline.
gcc/ChangeLog:
2019-12-18 Lu
The performance of exchange2 built with PGO decreases by ~28% with r278808
because the profile count is set incorrectly. The cloned nodes are updated to
a very small count, which causes the later cunroll pass to fail to unroll the
recursive function in exchange2. This patch enables adding orig_sum to the new nodes
for s
I'm going to install it as obvious.
gcc/ChangeLog:
2019-11-15 Luo Xiong Hu
* ipa-comdats.c: Fix comments typo.
* ipa-profile.c: Fix comments typo.
* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
__gcov_indirect_call.counters and __gcov_i
next is initialized only in the preceding loop; it is never updated
in its own loop.
gcc/ChangeLog
2019-11-15 Xiong Hu Luo
* ipa-inline.c (inline_small_functions): Update iterator of next.
---
gcc/ipa-inline.c | 15 +--
1 file changed, 9 insertions(+), 6 dele
The instructions generated for P9LE are no worse than those for P8LE:
mtvsrdd;xxlnot;stxv vs. not;not;std;std.
Update the test case to fix failures.
gcc/testsuite/ChangeLog:
2019-11-15 Luo Xiong Hu
testsuite/pr92398
* gcc.target/powerpc/pr72804.h: New.
* gcc.target/powerpc/pr7280
-finline is not an explicit option, so searching for the word "-finline" on the page
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
will miss the explicit option "-fno-inline".
gcc/ChangeLog:
2019-11-01 Xiong Hu Luo
doc/invoke.texi (inline):
-finline-functions is enabled by default at -O2 since r276469. Update the
test cases in which inlining small functions caused a difference in
instruction counts.
gcc/testsuite/ChangeLog:
2019-10-30 Xiong Hu Luo
PR92090
* gcc/testsuite/gcc.target/powerpc/pr79439-1.c: Update
There is no introduction to IPA passes in gccint now. Is it necessary to
add this part, given that brief introductions to both GIMPLE passes and RTL
passes already exist in Chapter 9 "Passes and Files of the Compiler" but
there is no section for IPA passes?
If it's OK, this is just a framework, lots of words need to be filled
math function, then the function in static library will be linked
first if its sequence is ahead of the dynamic library.
gcc/ChangeLog
2019-08-14 Xiong Hu Luo
PR lto/91287
* builtins.c (builtin_with_linkage_p): New function.
* builtins.h (builtin_with_linkage_p): New
.
gcc/ChangeLog
2019-08-21 Xiong Hu Luo
* tree-vect-stmts.c (vectorizable_call): Check callee built-in type.
* gcc/tree.h (DECL_MD_FUNCTION_P): New function.
---
gcc/tree-vect-stmts.c | 2 +-
gcc/tree.h | 12
2 files changed, 13 insertions(+), 1
, then the function in static library will be linked
first if its sequence is ahead of the dynamic library.
gcc/ChangeLog
2019-08-09 Xiong Hu Luo
PR lto/91287
* symtab.c (write_symbol): Check built_in function type.
* lto-streamer-out.c (symtab_node
Currently get_most_common_single_value can only return the max hist
, add qsort to enable this function to return the kth value.
Rename it to get_kth_value_count.
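The general idea of sorting histogram entries and then picking the kth one can be sketched as follows. This is not the patch's implementation: `struct hist_entry`, `cmp_count_desc`, and `kth_by_count` are hypothetical names, and the real GCC histogram layout is not shown in the message.

```c
#include <stdlib.h>

/* Hypothetical histogram entry: a profiled value and how often it was seen.  */
struct hist_entry {
    long value;
    long count;
};

/* qsort comparator: order entries by descending count.  */
static int cmp_count_desc(const void *a, const void *b)
{
    const struct hist_entry *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return 0;
}

/* Sort the entries in place and return the one with the k-th largest
   count (k is 0-based), or NULL if k is out of range.  */
static const struct hist_entry *kth_by_count(struct hist_entry *h,
                                             size_t n, size_t k)
{
    if (k >= n)
        return NULL;
    qsort(h, n, sizeof *h, cmp_count_desc);
    return &h[k];
}
```

With k = 0 this degenerates to returning the most common value, which matches the behaviour the message describes for the existing function.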
gcc/ChangeLog:
2019-07-15 Xiong Hu Luo
* ipa-profile.c (get_most_common_single_value): Use
get_kth_value_count
Currently get_most_common_single_value can only return the max hist
, add two parameters to enable this function to return the kth
value if needed.
gcc/ChangeLog:
2019-07-15 Xiong Hu Luo
* value-prof.c (get_most_common_single_value): Add input params
k_th and k, return the
-profile. As
get_most_common_single_value can only return a single value, but
multiple indirect calls need to store each hist value; will
consider specializing it later.
gcc/ChangeLog
2019-06-17 Xiong Hu Luo
PR ipa/69678
* cgraph.c (symbol_table::create_edge): Init s
ASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
-fprofile-correction
6.3. No performance change on PHP benchmark.
7. Bootstrap and regression test passed on Power8-LE.
gcc/ChangeLog
2019-06-17 Xiong Hu Luo
PR ipa/69678
* cgraph.c
c. backport them to update file names and fix regressions
>> for GCC7 on power9.
>
> (See e.g. https://gcc.gnu.org/ml/gcc-testresults/2019-04/msg01868.html for
> the failures this patch fixes; the patch is for GCC 7).
>
>> gcc/ChangeLog:
>>
>> 2019-04-03 Xiong
Backport r268834 of "Add support for the vec_sbox_be, vec_cipher_be etc."
from mainline to gcc-8-branch.
Regression-tested on Linux POWER8 LE. OK for gcc-8-branch?
PS: Is backport to gcc-7-branch also needed?
gcc/ChangeLog:
2019-03-05 Xiong Hu Luo
Backport of r268834 from m
Ping:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01949.html
Thanks
Xionghu
On 2019/2/26 AM9:13, luo...@linux.ibm.com wrote:
> From: Xiong Hu Luo
>
> dfp printf/scanf of Ha/HA, Da/DA and DDa/DDA is not set properly, which causes
> an incorrect warning:
> "use of 'D
Hi Segher,
On 2019/2/20 AM6:24, Segher Boessenkool wrote:
Hi!
On Tue, Feb 19, 2019 at 01:23:53AM -0600, luo...@linux.ibm.com wrote:
This is a backport of r25, r257253 and r258137 of trunk to gcc-7-branch.
The patches were already on trunk before GCC 8 forked. In total, 5 files need
manual r
01-23 Xiong Hu Luo
* gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
(crpyto1_be, crpyto2_be, crpyto3_be, crpyto4_be, crpyto5_be):
New testcases.
Typos ("crypto"). And that last line is indented incorrectly.
With those things fixed, okay for trunk, with t
2019-01-16 Xiong Hu Luo
* MAINTAINERS (Write After Approval): Add myself.
Committed in r267962.
---
Index: ChangeLog
===
--- ChangeLog (revision 267961)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+ 2019-01-16 Xiong Hu