Re: [PATCH 1/2] Refactor final_value_replacement_loop [PR90594]

2024-12-12 Thread Feng Xue OS
inal_value (class loop *, tree, bool *, bool); +extern void apply_scev_final_value_replacement (gphi *, tree, bool, bool); extern bool final_value_replacement_loop (class loop *); -extern unsigned int scev_const_prop (void); extern bool expression_expensive_p (tree, bool *); extern bool

Re: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

2024-12-10 Thread Feng Xue OS
Thanks, and please see my comments as below: >> Currently, if could, scev-cprop unconditionally replaces loop closed ssa with >> an expression built from loop initial value and loop niter, which might cause >> redundant code-gen when all interior computations related to IV inside loop >> are also

Re: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

2024-12-06 Thread Feng Xue OS
Forgot to attach the patch file. From: Feng Xue OS Sent: Friday, December 6, 2024 9:57 PM To: gcc-patches@gcc.gnu.org; Richard Biener Subject: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594] Currently, if it can, scev-cprop unconditionally replaces

[PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

2024-12-06 Thread Feng Xue OS
Currently, if it can, scev-cprop unconditionally replaces loop closed ssa with an expression built from loop initial value and loop niter, which might cause redundant code-gen when all interior computations related to IV inside loop are also necessary. As example, for the below case: p = init_
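As a rough illustration of the replacement described above, the sketch below contrasts a loop-carried value with the closed-form expression built from the initial value and the iteration count. The function names and the unit stride are hypothetical and are not taken from the patch; this only models the idea in plain C, not what GCC emits.

```c
#include <assert.h>

/* Hypothetical loop: the induction variable advances once per iteration
   and its last value is used after the loop. */
long walk_with_loop(long init, long niter)
{
    assert(niter >= 0);
    long p = init;
    for (long i = 0; i < niter; i++)
        p += 1;             /* interior IV computation */
    return p;               /* value live after the loop */
}

/* Closed form that final value replacement would substitute for the
   post-loop use: built only from the initial value and niter. */
long final_value(long init, long niter)
{
    assert(niter >= 0);
    return init + niter;
}
```

If the loop still has to compute p on every iteration for other reasons, the closed form adds an extra computation after the loop instead of saving one, which is the redundancy the message refers to.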

[PATCH 1/2] Refactor final_value_replacement_loop [PR90594]

2024-12-06 Thread Feng Xue OS
This patch refactors the procedure in tree-scalar-evolution.cc in order to partially export its functionality to other module, so decomposes it to several relatively independent utility functions. Thanks, Feng --- gcc/ PR tree-optimization/90594 * tree-scalar-evolution.cc (simple

Re: [PATCH] vect: Fix inconsistency in fully-masked lane-reducing op generation [PR116985]

2024-10-12 Thread Feng Xue OS
Added. Thanks, Feng From: Richard Biener Sent: Saturday, October 12, 2024 8:12 PM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Fix inconsistency in fully-masked lane-reducing op generation [PR116985] On Sat, Oct 12, 2024 at 9

[PATCH] vect: Fix inconsistency in fully-masked lane-reducing op generation [PR116985]

2024-10-12 Thread Feng Xue OS
To align vectorized def/use when lane-reducing op is present in loop reduction, we may need to insert extra trivial pass-through copies, which would cause mismatch between lane-reducing vector copy and loop mask index. This could be fixed by computing the right index around a new counter on effecti

Re: [RFC] Generalize formation of lane-reducing ops in loop reduction

2024-08-21 Thread Feng Xue OS
>> >> >> 1. Background >> >> >> >> For loop reduction of accumulating result of a widening operation, the >> >> preferred pattern is lane-reducing operation, if supported by target. >> >> Because >> >> this kind of operation need not preserve intermediate results of widening >> >> operation, and o

[PATCH] vect: Add missed opcodes in vect_get_smallest_scalar_type [PR115228]

2024-08-05 Thread Feng Xue OS
Some opcodes are missed when determining the smallest scalar type for a vectorizable statement. Currently, this bug does not cause any problem, because vect_get_smallest_scalar_type is only used to compute max nunits vectype, and even statement with missed opcode is incorrectly bypassed, the max nu

[PATCH] vect: Allow unsigned-to-signed promotion in vect_look_through_possible_promotion [PR115707]

2024-08-05 Thread Feng Xue OS
The function vect_look_through_possible_promotion() fails to figure out the root definition if the casts involve more than two promotions with a sign change, as in: long a = (long)b; // promotion cast -> int b = (int)c; // promotion cast, sign change -> unsigned short c = ...; For this case, the
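The cast chain quoted above can be checked in plain C. This is only a sketch of why looking through the chain is value-preserving, assuming int is wider than unsigned short (true on common targets); the function names are made up for illustration.

```c
/* Two-step promotion as in the message: unsigned short -> int -> long.
   The middle step changes signedness, but int can hold every
   unsigned short value when int is wider than 16 bits. */
long promote_via_chain(unsigned short c)
{
    int b = (int)c;     /* promotion cast, sign change */
    long a = (long)b;   /* promotion cast */
    return a;
}

/* Direct promotion from the root definition. */
long promote_direct(unsigned short c)
{
    return (long)c;
}
```

Both functions return the same value for every input under that assumption, so treating c as the root definition of a loses nothing.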

Re: [RFC] Generalize formation of lane-reducing ops in loop reduction

2024-08-03 Thread Feng Xue OS
>> 1. Background >> >> For loop reduction of accumulating result of a widening operation, the >> preferred pattern is lane-reducing operation, if supported by target. Because >> this kind of operation need not preserve intermediate results of widening >> operation, and only produces reduced amount

[RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation

2024-07-21 Thread Feng Xue OS
This patch adds a pattern to fold a summation into the last operand of lane- reducing operation when appropriate, which is a supplement to those operation- specific patterns for dot-prod/sad/widen-sum. sum = lane-reducing-op(..., 0) + value; => sum = lane-reducing-op(..., value); Thanks, Feng
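The fold above can be modelled with a scalar stand-in for a lane-reducing operation. The sketch below uses a hypothetical dot_prod_model function whose last operand is the accumulator, plus made-up sample data; it only demonstrates that the two forms produce the same sum, not how the vectorizer represents them.

```c
#include <stddef.h>

/* Scalar model of a dot-product style lane-reducing op:
   widens each element, multiplies, and adds into acc. */
int dot_prod_model(const signed char *a, const signed char *b,
                   size_t n, int acc)
{
    for (size_t i = 0; i < n; i++)
        acc += (int)a[i] * (int)b[i];
    return acc;
}

/* Before the fold: sum = lane-reducing-op(..., 0) + value */
int sum_before(const signed char *a, const signed char *b,
               size_t n, int value)
{
    return dot_prod_model(a, b, n, 0) + value;
}

/* After the fold: sum = lane-reducing-op(..., value) */
int sum_after(const signed char *a, const signed char *b,
              size_t n, int value)
{
    return dot_prod_model(a, b, n, value);
}

/* Made-up sample inputs for trying the two forms. */
const signed char sample_a[4] = { 1, -2, 3, 4 };
const signed char sample_b[4] = { 5, 6, -7, 8 };
```

With the sample data the products sum to 4, so both forms give 14 for value = 10.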

[RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog

2024-07-21 Thread Feng Xue OS
For sum-based loop reduction, its affine closure is composed by statements whose results and derived computation only end up in the reduction, and are not used in any non-linear transform operation. The concept underlies the generalized lane-reducing pattern recognition in the coming patches. As ma

[RFC][PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction statement

2024-07-21 Thread Feng Xue OS
Previously, only simple lane-reducing case is supported, in which one loop reduction statement forms one pattern match: char *d0, *d1, *s0, *s1, *w; for (i) { sum += d0[i] * d1[i]; // sum = DOT_PROD(d0, d1, sum); sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum); sum += w[i
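For reference, a scalar version of such a loop might look like the sketch below. The third accumulation is assumed to be a plain widening sum because the snippet is cut off at that point, and signed char is used instead of char so the result does not depend on the target's char signedness; the array contents are made up.

```c
#include <stdlib.h>

/* One reduction variable fed by three statements, each of which
   matches one lane-reducing pattern in the simple case. */
int mixed_reduction(const signed char *d0, const signed char *d1,
                    const signed char *s0, const signed char *s1,
                    const signed char *w, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += d0[i] * d1[i];          /* dot-product pattern */
        sum += abs(s0[i] - s1[i]);     /* sum-of-absolute-differences pattern */
        sum += w[i];                   /* widening-sum pattern (assumed) */
    }
    return sum;
}

/* Made-up sample inputs. */
const signed char sample_d0[2] = { 1, 2 };
const signed char sample_d1[2] = { 3, 4 };
const signed char sample_s0[2] = { 5, 1 };
const signed char sample_s1[2] = { 2, 6 };
const signed char sample_w[2]  = { 7, 8 };
```

Each statement widens narrow elements and adds the result straight into sum, which is why each one can form a pattern match on its own.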

[RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement

2024-07-21 Thread Feng Xue OS
This patch extends original vect analysis and transform to support a new kind of lane-reducing operation that participates in loop reduction indirectly. The operation itself is not reduction statement, but its value would be accumulated into reduction result finally. Thanks, Feng --- gcc/

[RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

2024-07-21 Thread Feng Xue OS
The work for RFC (https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html) involves not a little code change, so I have to separate it into several batches of patchset. This and the following patches constitute the first batch. Since pattern statement coexists with normal statements in a

[RFC] Generalize formation of lane-reducing ops in loop reduction

2024-07-21 Thread Feng Xue OS
Hi, I composed some patches to generalize lane-reducing (dot-product is a typical representative) pattern recognition, and prepared a RFC document so as to help review. The original intention was to make a complete solution for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440. For sure, th

Re: [PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-17 Thread Feng Xue OS
ke the checking assert unconditional? > > OK with that change. vect_get_num_vectors will ICE anyway > I guess, so at your choice remove the assert completely. > OK, I removed the assert. Thanks, Feng From: Richard Biener Sent: Monday, July 15,

[PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-13 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

[PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-13 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. This patch removes some constraints in reduction analysis to allow mult

[PATCH 2/4] vect: Refit lane-reducing to be normal operation

2024-07-13 Thread Feng Xue OS
Vector stmts number of an operation is calculated based on output vectype. This is over-estimated for lane-reducing operation, which would cause vector def/use mismatched when we want to support loop reduction mixed with lane- reducing and normal operations. One solution is to refit lane-reducing t

[PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-13 Thread Feng Xue OS
Extend original vect_get_num_copies (pure loop-based) to calculate number of vector stmts for slp node regarding a generic vect region. Thanks, Feng --- gcc/ * tree-vectorizer.h (vect_get_num_copies): New overload function. (vect_get_slp_num_vectors): New function. * tree-v

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-13 Thread Feng Xue OS
> > > when that's set instead of SLP_TREE_VECTYPE? As said having wrong > > > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire. > > > > Then the alternative is to limit special handling related to the vec_num > > only > > inside vect_transform_reduction. Is

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Feng Xue OS
YPE? As said having wrong > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire. Then the alternative is to limit special handling related to the vec_num only inside vect_transform_reduction. Is that ok? Or any other suggestion? Thanks, Feng From: Rich

[PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-11 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

[PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-11 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. This patch removes some constraints in reduction analysis to allow mult

[PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Feng Xue OS
Vector stmts number of an operation is calculated based on output vectype. This is over-estimated for lane-reducing operation. Sometimes, to workaround the issue, we have to rely on additional logic to deduce an exactly accurate number by other means. Aiming at the inconvenience, in this patch, we

[PATCH 1/4] vect: Shorten name of macro SLP_TREE_NUMBER_OF_VEC_STMTS

2024-07-11 Thread Feng Xue OS
This patch series is recomposed and split from https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655974.html. As I will add a new field tightly coupled with "vec_stmts_size", following the original naming convention would make the new macro very long. So it is better to choose an equally meaningful but

Re: [PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-27 Thread Feng Xue OS
LP_TREE_LANES (slp_node) == 1)) scalar_shift_arg = false; else if (dt[1] == vect_constant_def || dt[1] == vect_external_def -- 2.17.1 ________ From: Richard Biener Sent: Thursday, June 27, 2024 12:49 AM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org S

[PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-26 Thread Feng Xue OS
Allow shift-by-induction for slp node, when it is single lane, which is aligned with the original loop-based handling. Thanks, Feng --- gcc/tree-vect-stmts.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index ca6052662a3..8

[PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-26 Thread Feng Xue OS
Allow shift-by-induction for slp node, when it is single lane, which is aligned with the original loop-based handling. Thanks, Feng --- gcc/tree-vect-stmts.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index ca6052662a3..8

Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-26 Thread Feng Xue OS
ctions. */ -- 2.17.1 ____________ From: Feng Xue OS Sent: Thursday, June 20, 2024 2:02 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles This patch was updated with some new chang

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-26 Thread Feng Xue OS
s.cc b/gcc/tree-vect-stmts.cc index 840e162c7f0..845647b4399 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_load (vinfo, stmt_info, NU

Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-26 Thread Feng Xue OS
PHI records the input vectype with least lanes. */ - if (lane_reducing) -STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in; enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info); STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type; -- 2.17.1 ___

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-25 Thread Feng Xue OS
>> >> >> - if (slp_node) >> >> + if (slp_node && SLP_TREE_LANES (slp_node) > 1) >> > >> > Hmm, that looks wrong. It looks like SLP_TREE_NUMBER_OF_VEC_STMTS is off >> > instead, which is bad. >> > >> >> nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); >> >>else >> >>

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-23 Thread Feng Xue OS
s - 1 given you use one above > and the other below? Or simply iterate till op.num_ops > and skip i == reduc_index. > >> + for (unsigned i = 0; i < op.num_ops - 1; i++) >> + { >> + gcc_assert (vec_oprnds[i].length () == using_ncopies); >> +

Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-19 Thread Feng Xue OS
lar values of those N reductions. */ -- 2.17.1 ____________ From: Feng Xue OS Sent: Sunday, June 16, 2024 3:32 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles When trans

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-19 Thread Feng Xue OS
662a3..1b73ef01ade 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_load (vinfo, stmt_info, NULL, NULL, node, cost_vec) || vectorizable_store (vinfo, stmt_inf

Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-19 Thread Feng Xue OS
{ 0, 0, 0, 0 }; loop () { sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0); sum_v1 = dot_prod<16 * char>(char_b0, char_b1, sum_v1); sum_v0 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v0); sum_v1 = dot_prod<8 * short>(short_

[PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-16 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

[PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-16 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need t

[PATCH 6/8] vect: Tighten an assertion for lane-reducing in transform

2024-06-16 Thread Feng Xue OS
According to logic of code nearby the assertion, all lane-reducing operations should not appear, not just DOT_PROD_EXPR. Since "use_mask_by_cond_expr_p" treats SAD_EXPR same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR should not be allowed by the following assertion "gcc_assert (commutative_binary_op_p (.

[PATCH 5/8] vect: Use an array to replace 3 relevant variables

2024-06-16 Thread Feng Xue OS
It's better to place 3 relevant independent variables into array, since we have requirement to access them via an index in the following patch. At the same time, this change may get some duplicated code be more compact. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vect_transform_reduction):

[PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-16 Thread Feng Xue OS
The input vectype of reduction PHI statement must be determined before vect cost computation for the reduction. Since lane-reducing operation has different input vectype from normal one, we need to traverse all reduction statements to find out the input vectype with the least lanes, and set tha

[PATCH 3/8] vect: Use one reduction_type local variable

2024-06-16 Thread Feng Xue OS
Two local variables were defined to refer same STMT_VINFO_REDUC_TYPE, better to keep only one. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and replace it to another local variable reduction_type. --- gcc/tree-vect-loop.cc | 8

[PATCH 2/8] vect: Remove duplicated check on reduction operand

2024-06-16 Thread Feng Xue OS
In vectorizable_reduction, one check on a reduction operand via index could be contained by another one check via pointer, so remove the former. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated check. --- gcc/tree-vect-loop.cc | 6 ++

[PATCH 1/8] vect: Add a function to check lane-reducing stmt

2024-06-16 Thread Feng Xue OS
The series of patches are meant to support multiple lane-reducing reduction statements. Since the original ones conflicted with the new single-lane slp node patches, I have reworked most of the patches, and split them as small as possible, which may make code review easier. In the 1st one, I ad

Re: [PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-06-13 Thread Feng Xue OS
able gives the initial scalar values of those N reductions. */ -- 2.17.1 ________ From: Feng Xue OS Sent: Thursday, May 30, 2024 10:56 PM To: Richard Biener Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: [PATCH 6/6] vect: Optimize order of lane-reducing

Re: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-06-13 Thread Feng Xue OS
gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION -- 2.17.1 ____________ From: Feng Xue OS Sent: Thursday, May 30, 2024 10:51 PM To: Richard Biener Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt i

Re: [PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-02 Thread Feng Xue OS
Please see my comments below. Thanks, Feng > On Thu, May 30, 2024 at 4:55 PM Feng Xue OS > wrote: >> >> For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, >> current >> vectorizer could only handle the pattern if the reduction chain does not

Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-31 Thread Feng Xue OS
Ok. Updated as the comments. Thanks, Feng From: Richard Biener Sent: Friday, May 31, 2024 3:29 PM To: Feng Xue OS Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

[PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-05-30 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

[PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need to

[PATCH 4/6] vect: Bind input vectype to lane-reducing operation

2024-05-30 Thread Feng Xue OS
The input vectype is an attribute of lane-reducing operation, instead of reduction PHI that it is associated to, since there might be more than one lane-reducing operations with different type in a loop reduction chain. So bind each lane-reducing operation with its own input type. Thanks, Feng ---

[PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-05-30 Thread Feng Xue OS
Normally, vectorizable checking on statement in a loop reduction chain does not use the reduction PHI information. But some special statements might need it in vectorizable analysis, especially, for multiple lane-reducing operations support later. Thanks, Feng --- gcc/ * tree-vect-loop.cc

[PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html. Partial vectorization checking for vectorizable_reduction is a piece of relatively isolated code, which may be reused by other places. Move the code into a new function for sharing. Thanks, Fen

[PATCH 1/6] vect: Add a function to check lane-reducing code [PR114440]

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html. Checking whether an operation is lane-reducing requires comparing its code against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR). Add a utility function to make source coding for the check handy

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
>> Hi, >> >> The patch was updated with the newest trunk, and also contained some minor >> changes. >> >> I am working on another new feature which is meant to support pattern >> recognition >> of lane-reducing operations in affine closure originated from loop reduction >> variable, >> like: >>

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Feng Xue OS
Ok. Then I will add a TODO comment on "bbs" field to describe it. Thanks, Feng From: Richard Biener Sent: Wednesday, May 29, 2024 3:14 PM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Unify bbs in loop_vec_info and b

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-28 Thread Feng Xue OS
_info_shared *); ~_bb_vec_info (); - /* The region we are operating on. bbs[0] is the entry, excluding - its PHI nodes. In the future we might want to track an explicit - entry edge to cover bbs[0] PHI nodes and have a region entry - insert location. */ - vec bbs; - vec roots; }

Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Feng Xue OS
Changed as the comments. Thanks, Feng From: Richard Biener Sent: Tuesday, May 28, 2024 5:34 PM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060] On Sat

[PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-25 Thread Feng Xue OS
Both derived classes ( loop_vec_info/bb_vec_info) have their own "bbs" field, which have exactly same purpose of recording all basic blocks inside the corresponding vect region, while the fields are composed by different data type, one is normal array, the other is auto_vec. This difference causes

[PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-25 Thread Feng Xue OS
Some utility functions (such as vect_look_through_possible_promotion) that are to find out certain kind of direct or indirect definition SSA for a value, may return the original one of the SSA, not its pattern representative SSA, even pattern is involved. For example, a = (T1) patt_b; pa

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-24 Thread Feng Xue OS
Hi, The patch was updated with the newest trunk, and also contained some minor changes. I am working on another new feature which is meant to support pattern recognition of lane-reducing operations in affine closure originated from loop reduction variable, like: sum += cst1 * dot_prod_1 + c

[PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-04-07 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need t

Re: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2024-01-11 Thread Feng Xue OS
mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE (instance), - instance, &instance->cost_vec, svisited, - visited); - } -} +vect_bb_slp_mark_live_stmts (bb_vinfo); return !vinfo->slp_instances.is_empty (

PING: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2024-01-10 Thread Feng Xue OS
Hi, Richard, Would you please take a look at this patch? Thanks, Feng From: Feng Xue OS Sent: Friday, December 29, 2023 6:28 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091

[PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2023-12-29 Thread Feng Xue OS
This patch is meant to fix over-estimation about SLP vector-to-scalar cost for STMT_VINFO_LIVE_P statement. When pattern recognition is involved, a statement whose definition is consumed in some pattern, may not be included in the final replacement pattern statements, and would be skipped when buil

[PATCH] arm/aarch64: Add bti for all functions [PR106671]

2023-08-02 Thread Feng Xue OS via Gcc-patches
This patch extends option -mbranch-protection=bti with an optional argument as bti[+all] to force compiler to unconditionally insert bti for all functions. Because a direct function call at the stage of compiling might be rewritten to an indirect call with some kind of linker-generated thunk stub a

PING^2: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

2021-10-14 Thread Feng Xue OS via Gcc-patches
Thanks, Feng From: Feng Xue OS Sent: Thursday, September 16, 2021 5:26 PM To: Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org Cc: JiangNing OS Subject: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS This patch

PING^2: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-10-14 Thread Feng Xue OS via Gcc-patches
ttps://gcc.gnu.org/pipermail/gcc/2021-August/237132.html) Thanks, Feng From: Feng Xue OS Sent: Saturday, September 18, 2021 5:38 PM To: Jason Merrill; Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: [PATCH/RFC 1/2] WPD: En

PING: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

2021-09-29 Thread Feng Xue OS via Gcc-patches
Made some minor changes. Thanks, Feng From: Feng Xue OS Sent: Thursday, September 16, 2021 5:26 PM To: Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org Cc: JiangNing OS Subject: [PATCH/RFC 2/2] WPD: Enable whole program

PING: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-29 Thread Feng Xue OS via Gcc-patches
Minor update for some bugfixes and comment wording changes. Thanks, Feng From: Feng Xue OS Sent: Saturday, September 18, 2021 5:38 PM To: Jason Merrill; Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: [PATCH/RFC 1/2] WPD

[PATCH] Fix value uninitialization in vn_reference_insert_pieces [PR102400]

2021-09-22 Thread Feng Xue OS via Gcc-patches
Bootstrapped/regtested on x86_64-linux. Thanks, Feng --- 2021-09-23 Feng Xue gcc/ChangeLog PR tree-optimization/102400 * tree-ssa-sccvn.c (vn_reference_insert_pieces): Initialize result_vdef to zero value. --- gcc/tree-ssa-sccvn.c | 1 + 1 file changed, 1 insertion(+)

[PATCH] Fix null-pointer dereference in delete_dead_or_redundant_call [PR102451]

2021-09-22 Thread Feng Xue OS via Gcc-patches
Bootstrapped/regtested on x86_64-linux and aarch64-linux. Thanks, Feng --- 2021-09-23 Feng Xue gcc/ChangeLog: PR tree-optimization/102451 * tree-ssa-dse.c (delete_dead_or_redundant_call): Record bb of stmt before removal. --- gcc/tree-ssa-dse.c | 5 +++-- 1 file chang

Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-18 Thread Feng Xue OS via Gcc-patches
>On 9/16/21 22:29, Feng Xue OS wrote: >>> On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote: >>>> This and following patches are composed to enable full devirtualization >>>> under whole program assumption (so also called whole-program >>>>

Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-16 Thread Feng Xue OS via Gcc-patches
>On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote: >> This and following patches are composed to enable full devirtualization >> under whole program assumption (so also called whole-program >> devirtualization, WPD for short), which is an enhancement to current >> s

[PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

2021-09-16 Thread Feng Xue OS via Gcc-patches
This patch is to extend applicability of full devirtualization to LTRANS stage. Normally, whole program assumption would not hold when WPA splits whole compilation into more than one LTRANS partition. To avoid information loss for WPD at LTRANS, we will record all vtable nodes and related member

[PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-16 Thread Feng Xue OS via Gcc-patches
This and following patches are composed to enable full devirtualization under whole program assumption (so also called whole-program devirtualization, WPD for short), which is an enhancement to current speculative devirtualization. The base of the optimization is how to identify class type that is

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-10 Thread Feng Xue OS via Gcc-patches
propagates count partially. Thanks, Feng From: Richard Biener Sent: Tuesday, August 10, 2021 10:47 PM To: Xionghu Luo Cc: gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; Feng Xue OS; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org; hubi

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-08 Thread Feng Xue OS via Gcc-patches
Yes. The condition to switch between the two versioned loops is "true", so the first two arguments should be 100% and 0%. It is different from normal loop split, we could not deduce exactly precise probability for condition-based loop split, since cfg inside loop2 would be changed. (invar-branch is replaced to

Question about non-POD class type

2021-05-14 Thread Feng Xue OS via Gcc-patches
For an instance of a non-POD class, can I always assume that any operation on it should be type-safe, any wrong or even trick code to violate this is UB in C++ spec? For example, here are some ways: union { Type1 *p1; Type2 *p2; }; or union { Type1 t1; Type2 t2; }; or void

Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)

2021-05-06 Thread Feng Xue OS via Gcc-patches
>> gcc/ >> PR tree-optimization/98598 >> * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o. >> * common.opt (-ftree-loop-mgo): New option. > > Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a > good > name. -floop-mgo would be better but still I'd h

Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)

2021-04-29 Thread Feng Xue OS via Gcc-patches
>> This patch implements a new loop optimization according to the proposal >> in RFC given at >> https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html. >> So do not repeat the idea in this mail. Hope your comments on it. > > With the caveat that I'm not an optimization expert (but no one else

[PATCH] Fix testcases to avoid plusminus-with-convert pattern (PR 97066)

2020-09-16 Thread Feng Xue OS via Gcc-patches
With the new pattern rule (T)(A) +- (T)(B) -> (T)(A +- B), some testcases are simplified and no longer keep the code pattern that their test checks expect. Minor changes are made to those cases to avoid the simplification effect of the rule. Tested on x86_64-linux and aarch64-linux. Feng --- 2020-09-16 Feng Xu

Re: [PATCH 2/2 V4] Add plusminus-with-convert pattern (PR 94234)

2020-09-15 Thread Feng Xue OS via Gcc-patches
>> Add a rule (T)(A) +- (T)(B) -> (T)(A +- B), which works only when (A +- B) >> could be folded to a simple value. By this rule, a >> plusminus-mult-with-convert >> expression could be handed over to the rule (A * C) +- (B * C) -> (A +- B) * C. > >Please use INTEGRAL_TYPE_P () instead of TREE_CODE ==

Re: Ping: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR 94234)

2020-09-15 Thread Feng Xue OS via Gcc-patches
>> This patch is to handle simplification of plusminus-mult-with-convert >> expression >> as ((T) X) +- ((T) Y), in which at least one of (X, Y) is result of >> multiplication. >> This is done in forwprop pass. We try to transform it to (T) (X +- Y), and >> resort >> to gimple-matcher to fold (X

[PATCH 2/2 V4] Add plusminus-with-convert pattern (PR 94234)

2020-09-15 Thread Feng Xue OS via Gcc-patches
Add a rule (T)(A) +- (T)(B) -> (T)(A +- B), which works only when (A +- B) could be folded to a simple value. By this rule, a plusminus-mult-with-convert expression could be handed over to the rule (A * C) +- (B * C) -> (A +- B) * C. Bootstrapped/regtested on x86_64-linux and aarch64-linux. Feng ---

Re: Ping: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)

2020-09-14 Thread Feng Xue OS via Gcc-patches
l) here. OK with that change. Ok. > >I've tried again to think about sth prettier to cover these kind of >single-use checks but failed to come up with sth. Maybe we need a smart combiner that can deduce cost globally, and remove these single-use specifiers from rule description. Feng

Ping: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)

2020-09-13 Thread Feng Xue OS via Gcc-patches
Thanks, Feng From: Feng Xue OS Sent: Thursday, September 3, 2020 2:06 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234) For pattern A * C +- B * C -> (A +- B) * C, simplification is disab

Ping: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR 94234)

2020-09-13 Thread Feng Xue OS via Gcc-patches
Thanks, Feng From: Feng Xue OS Sent: Thursday, September 3, 2020 5:29 PM To: Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR 94234) Attach patch file. Feng

Re: [PATCH] Fix ICE in ipa-cp due to cost addition overflow (PR 96806)

2020-09-03 Thread Feng Xue OS via Gcc-patches
>> Hi, >> >> On Mon, Aug 31 2020, Feng Xue OS wrote: >> > This patch is to fix a bug that cost that is used to evaluate clone >> > candidate >> > becomes negative due to integer overflow. >> > >> > Feng >> > --- >> &

Re: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR 94234)

2020-09-03 Thread Feng Xue OS via Gcc-patches
Attach patch file. Feng From: Gcc-patches on behalf of Feng Xue OS via Gcc-patches Sent: Thursday, September 3, 2020 5:27 PM To: Richard Biener; gcc-patches@gcc.gnu.org Subject: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR

[PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop (PR 94234)

2020-09-03 Thread Feng Xue OS via Gcc-patches
This patch is to handle simplification of plusminus-mult-with-convert expression as ((T) X) +- ((T) Y), in which at least one of (X, Y) is result of multiplication. This is done in forwprop pass. We try to transform it to (T) (X +- Y), and resort to gimple-matcher to fold (X +- Y) instead of man

[PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)

2020-09-02 Thread Feng Xue OS via Gcc-patches
For pattern A * C +- B * C -> (A +- B) * C, simplification is disabled when A and B are not single-use. This patch is a minor enhancement on the pattern, which allows folding if final result is found to be a simple gimple value (constant/existing SSA). Bootstrapped/regtested on x86_64-linux and aa

Re: [PATCH V2] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

2020-09-01 Thread Feng Xue OS via Gcc-patches
>> >> gcc/ >> >> PR tree-optimization/94234 >> >> * tree-ssa-forwprop.c (simplify_binary_with_convert): New >> >> function. >> >> * (fwprop_ssa_val): Move it before its new caller. >> >> > No * at this line. There's an entry for (pass_forwprop::execute) missing. >> OK. >>

Re: [PATCH V2] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

2020-09-01 Thread Feng Xue OS via Gcc-patches
>> gcc/ >> PR tree-optimization/94234 >> * tree-ssa-forwprop.c (simplify_binary_with_convert): New function. >> * (fwprop_ssa_val): Move it before its new caller. > No * at this line. There's an entry for (pass_forwprop::execute) missing. OK. > I don't think the transfor

PING: [PATCH V2] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

2020-08-31 Thread Feng Xue OS via Gcc-patches
Thanks, Feng From: Feng Xue OS Sent: Wednesday, August 19, 2020 5:17 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org; Marc Glisse Subject: [PATCH V2] Add pattern for pointer-diff on addresses with same base/offset (PR 94234) As Richard's comment,

Re: [PATCH] Fix ICE in ipa-cp due to cost addition overflow (PR 96806)

2020-08-31 Thread Feng Xue OS via Gcc-patches
>>> the component is "ipa," please change that when you commit the patch. >> Mistake has been made, I've pushed it. Is there a way to correct it? git push >> --force? > > There is. You need to wait until tomorrow (after the commit message > gets copied to gcc/ChangeLog by a script) and then push a

Re: [PATCH] Fix ICE in ipa-cp due to cost addition overflow (PR 96806)

2020-08-31 Thread Feng Xue OS via Gcc-patches
>> gcc/ >> PR tree-optimization/96806 > the component is "ipa," please change that when you commit the patch. Mistake has been made, I've pushed it. Is there a way to correct it? git push --force? Thanks, Feng
