Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-22 Thread Feng Xue OS
Michael, > I've only noticed a couple typos, and one minor remark. Typos corrected. > I just wonder why you duplicated these three loops instead of integrating > the real body into the existing LI_FROM_INNERMOST loop. I would have > expected your "if (!optimize_loop_for_size_p && split_loop_on_

Re: Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-23 Thread Feng Xue OS
Thanks for your comment, I will update the case accordingly. Feng From: luoxhu Sent: Wednesday, October 23, 2019 4:02 PM To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH V4] Extend IPA-CP to support

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-23 Thread Feng Xue OS
Patch attached. Feng From: Richard Biener Sent: Wednesday, October 23, 2019 5:04 PM To: Feng Xue OS Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.oc...@theobroma-systems.com Subject: Re: [PATCH V3] Loop split upon

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-23 Thread Feng Xue OS
, October 24, 2019 1:44 PM To: Feng Xue OS; gcc-patches@gcc.gnu.org; Jan Hubicka; Martin Jambor Subject: Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133) Hi, On 2019/10/17 16:23, Feng Xue OS wrote: > IPA does not allow constant propagation on parameter that is used to cont

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-24 Thread Feng Xue OS
Richard, Thanks for your comments. >+ /* For PHI node that is not in loop header, its source operands should >+be defined inside the loop, which are seen as loop variant. */ >+ if (def_bb != loop->header || !skip_head) >+ return false; > so if we have > > for (;;) >

[PATCH V4] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-31 Thread Feng Xue OS
Hi, Richard This is a new patch to support more generalized semi-invariant condition, which uses control dependence analysis. Thanks, Feng From: Feng Xue OS Sent: Friday, October 25, 2019 11:43 AM To: Richard Biener Cc: Michael Matz; Philipp Tomsich

Ping: [PATCH V5] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-04 Thread Feng Xue OS
Hi, Honza & Martin, This is a new patch merged with the newest IPA changes. Would you please take a look at the patch? Together with the other patch on recursive function versioning, we can find more than 30% performance boost on exchange2 in spec2017. So, it will be good if two patches can en

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-05 Thread Feng Xue OS
Hi Martin, Thanks for your review. I updated the patch with your comments. Feng > Sorry that it took so long. Next time, please consider making the > review a bit easier by writing a ChangeLog (yes, I usually read them and > you'll have to write one anyway). >> + class ipcp_param_l

Re: [PATCH V4] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-11-05 Thread Feng Xue OS
> Uh. Note it's not exactly helpful to change algorithms between > reviews, that makes it > just harder :/ > > Btw, I notice you use post-dominance info. Note that we generally do > not keep that > up-to-date with CFG manipulations (and for dominators fast queries are > disabled). > Probably the

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-13 Thread Feng Xue OS
Thanks. And for this issue, we can add a new tracker as a followup task. Feng From: Jan Hubicka Sent: Tuesday, November 12, 2019 8:34 PM To: Feng Xue OS Cc: Martin Jambor; gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH V6] Extend IPA-CP to support

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-13 Thread Feng Xue OS
Please check the attachment, and this patch is based on the previous extended agg-jump-function patch. Thanks, Feng From: Jan Hubicka Sent: Tuesday, November 12, 2019 8:41 PM To: Feng Xue OS Subject: Re: Ping: [PATCH V6] Extend IPA-CP to support

[PATCH] Remove assertion in get_info_about_necessary_edges (PR ipa/93166)

2020-01-19 Thread Feng Xue OS
Bootstrapped/regtested on x86_64-linux and aarch64-linux. Feng --- 2020-01-19 Feng Xue PR ipa/93166 * ipa-cp.c (get_info_about_necessary_edges): Remove value check assertion.From 02e4bea314a0ca0a8befb85c64efcfe422d35cb8 Mon Sep 17 00:00:00 2001 From: Feng Xue Date: Sun

[PATCH] Generalized value pass-through for self-recursive function (ipa/pr93203)

2020-01-25 Thread Feng Xue OS
Besides simple pass-through (aggregate) jump function, arithmetic (aggregate) jump function could also bring same (aggregate) value as parameter passed-in for self-feeding recursive call. For example, f1 (int i)/* normal jump function */ { f1 (i & 1); } S
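
A minimal sketch of the self-feeding case in the snippet above, assuming the argument of the outermost call is already 0 or 1: the arithmetic expression `i & 1` then hands the same value back to every recursive call. The `depth` parameter and the name `f1_same_arg_p` are hypothetical, added only so that the recursion terminates and the property can be observed.

```c
/* Return 1 if every recursive call (up to DEPTH levels) receives the same
   argument as its caller, 0 otherwise.  DEPTH is an illustrative addition;
   the f1 in the message has no such parameter.  */
int
f1_same_arg_p (int i, int depth)
{
  if (depth == 0)
    return 1;

  /* Arithmetic jump function applied to the incoming parameter.  */
  int next = i & 1;

  return next == i && f1_same_arg_p (next, depth - 1);
}
```

Under these assumptions `f1_same_arg_p (1, 5)` and `f1_same_arg_p (0, 5)` both hold, while an outermost argument such as 2 does not feed itself back unchanged.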

[PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203)

2020-01-25 Thread Feng Xue OS
Made some changes. Feng From: Feng Xue OS Sent: Saturday, January 25, 2020 5:54 PM To: mjam...@suse.cz; Jan Hubicka; gcc-patches@gcc.gnu.org Subject: [PATCH] Generalized value pass-through for self-recursive function (ipa/pr93203) Besides simple pass

[PATCH] Fix missed IPA-CP on by-ref argument directly passed through (PR ipa/93429)

2020-01-26 Thread Feng Xue OS
Current IPA does not propagate aggregate constant for by-ref argument if it is simple pass-through of caller parameter. Here is an example, f1 (int *p) { ... = *p; ... } f2 (int *p) { *p = 2; f1 (p); } It is easy to know that in f1(), *p should be 2 after
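
The example in the snippet above can be written out as a small compilable sketch. The `+ 1` and the return values are assumptions added so that the loaded value is observable; only the shape of `f1` and `f2` comes from the message.

```c
/* Callee: loads the value through its by-reference argument.  */
int
f1 (int *p)
{
  return *p + 1;
}

/* Caller: stores a constant through its own parameter, then passes the
   pointer straight through to f1, so *p is known to be 2 inside f1.  */
int
f2 (int *p)
{
  *p = 2;
  return f1 (p);
}
```

With `int x = 0;`, calling `f2 (&x)` leaves `x` equal to 2, and the load in `f1` always sees that constant on this call path.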

Ping* [PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203)

2020-02-02 Thread Feng Xue OS
Thanks, Feng From: Feng Xue OS Sent: Saturday, January 25, 2020 9:50 PM To: mjam...@suse.cz; Jan Hubicka; gcc-patches@gcc.gnu.org Subject: [PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203) Made some changes. Feng

Re: [PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203)

2020-02-09 Thread Feng Xue OS
>> - gcc_checking_assert (item->value); > I've been staring at this for quite a while, trying to figure out how > your patch can put NULL here before I realized it was just a clean-up > :-) Sending such changes independently or pointing them out in the > email/ChangeLog makes review eas

Re: [PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203)

2020-02-11 Thread Feng Xue OS
Christina Sent: Tuesday, February 11, 2020 6:05 PM To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org Cc: nd Subject: RE: [PATCH V2] Generalized value pass-through for self-recursive function (ipa/pr93203) Hi Feng, This patch (commit a0f6a8cb414b687f22c9011a894d5e8e398c4be0) is

[PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-12 Thread Feng Xue OS
self_recursive_pass_through_p and intersect_aggregates_with_edge calls. (cgraph_edge_brings_all_agg_vals_for_node): Add "node" argument to intersect_aggregates_with_edge call. > > From: gcc-patches-ow...@gcc.gnu.org > o

[PATCH] Handle aggregate pass-through for self-recursive call (PR ipa/92794)

2019-12-17 Thread Feng Xue OS
If argument for a self-recursive call is a simple pass-through, the call edge is also considered as source of any value originated from non-recursive call to the function. Scalar pass-through and full aggregate pass-through due to pointer pass-through have also been handled. But we missed another k

Re: [PATCH] Handle aggregate pass-through for self-recursive call (PR ipa/92794)

2019-12-18 Thread Feng Xue OS
>> +static bool >> +self_recursive_agg_pass_through_p (cgraph_edge *cs, ipa_agg_jf_item *jfunc, >> +int i) >> +{ >> + if (cs->caller == cs->callee->function_symbol () > I don't know if self-recursive calls can be interposed at all, if yes > you need to add the av

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

2019-12-30 Thread Feng Xue OS
tches@gcc.gnu.org; seg...@kernel.crashing.org; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org; Feng Xue OS Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808 v2 Changes: 1. Enable proportion orig_sum to the new nodes for self recursive node: new_sum = (orig_sum + ne

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

2019-12-31 Thread Feng Xue OS
esday, December 31, 2019 3:43 PM To: Feng Xue OS; Jan Hubicka; Martin Jambor Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org Subject: Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808 On 2019/12/31 14:43,

[PATCH] Fix a bug that propagation in recursive function uses wrong aggregate lattice (PR ipa/93084)

2020-01-03 Thread Feng Xue OS
When checking a self-recursively generated value for aggregate jump function, wrong aggregate lattice was used, which will cause infinite constant propagation. This patch is composed to fix this issue. 2020-01-03 Feng Xue PR ipa/93084 * ipa-cp.c (self_recursively_generated_p):

[PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-04-07 Thread Feng Xue OS
For lane-reducing operation (dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need t

PING: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2024-01-10 Thread Feng Xue OS
Hi, Richard, Would you please take a look at this patch? Thanks, Feng From: Feng Xue OS Sent: Friday, December 29, 2023 6:28 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091

Re: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2024-01-11 Thread Feng Xue OS
mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE (instance), - instance, &instance->cost_vec, svisited, - visited); - } -} +vect_bb_slp_mark_live_stmts (bb_vinfo); return !vinfo->slp_instances.is_empty (

[PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

2023-12-29 Thread Feng Xue OS
This patch is meant to fix over-estimation of SLP vector-to-scalar cost for STMT_VINFO_LIVE_P statements. When pattern recognition is involved, a statement whose definition is consumed in some pattern may not be included in the final replacement pattern statements, and would be skipped when buil

Re: [PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-27 Thread Feng Xue OS
LP_TREE_LANES (slp_node) == 1)) scalar_shift_arg = false; else if (dt[1] == vect_constant_def || dt[1] == vect_external_def -- 2.17.1 ________ From: Richard Biener Sent: Thursday, June 27, 2024 12:49 AM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org S

[PATCH 1/4] vect: Shorten name of macro SLP_TREE_NUMBER_OF_VEC_STMTS

2024-07-11 Thread Feng Xue OS
This patch series is recomposed and split from https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655974.html. As I will add a new field tightly coupled with "vec_stmts_size", following the original naming convention would make the new macro very long. So better to choose an equally meaningful but

[PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Feng Xue OS
Vector stmts number of an operation is calculated based on output vectype. This is over-estimated for lane-reducing operation. Sometimes, to work around the issue, we have to rely on additional logic to deduce an exactly accurate number by other means. To address this inconvenience, in this patch, we

[PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-11 Thread Feng Xue OS
For lane-reducing operation (dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. This patch removes some constraints in reduction analysis to allow mult

[PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-11 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-11 Thread Feng Xue OS
YPE? As said having wrong > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire. Then the alternative is to limit special handling related to the vec_num only inside vect_transform_reduction. Is that ok? Or any other suggestion? Thanks, Feng From: Rich

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-13 Thread Feng Xue OS
> > when that's set instead of SLP_TREE_VECTYPE? As said having wrong > > > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire. > > > > Then the alternative is to limit special handling related to the vec_num > > only > > inside vect_transform_reduction. Is

[PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-13 Thread Feng Xue OS
Extend original vect_get_num_copies (pure loop-based) to calculate number of vector stmts for slp node regarding a generic vect region. Thanks, Feng --- gcc/ * tree-vectorizer.h (vect_get_num_copies): New overload function. (vect_get_slp_num_vectors): New function. * tree-v

[PATCH 2/4] vect: Refit lane-reducing to be normal operation

2024-07-13 Thread Feng Xue OS
Vector stmts number of an operation is calculated based on output vectype. This is over-estimated for lane-reducing operation, which would cause vector def/use mismatched when we want to support loop reduction mixed with lane- reducing and normal operations. One solution is to refit lane-reducing t

[PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-13 Thread Feng Xue OS
For lane-reducing operation (dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. This patch removes some constraints in reduction analysis to allow mult

[PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-13 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

Re: [PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-17 Thread Feng Xue OS
ke the checking assert unconditional? > > OK with that change. vect_get_num_vectors will ICE anyway > I guess, so at your choice remove the assert completely. > OK, I removed the assert. Thanks, Feng From: Richard Biener Sent: Monday, July 15,

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-24 Thread Feng Xue OS
Hi, The patch was updated with the newest trunk, and also contained some minor changes. I am working on another new feature which is meant to support pattern recognition of lane-reducing operations in affine closure originated from loop reduction variable, like: sum += cst1 * dot_prod_1 + c

[PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-25 Thread Feng Xue OS
Some utility functions (such as vect_look_through_possible_promotion) that find a certain kind of direct or indirect definition SSA for a value may return the original one of the SSA, not its pattern representative SSA, even when a pattern is involved. For example, a = (T1) patt_b; pa

[PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-25 Thread Feng Xue OS
Both derived classes (loop_vec_info/bb_vec_info) have their own "bbs" field, which have exactly the same purpose of recording all basic blocks inside the corresponding vect region, while the fields use different data types: one is a normal array, the other is auto_vec. This difference causes

Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Feng Xue OS
Changed as the comments. Thanks, Feng From: Richard Biener Sent: Tuesday, May 28, 2024 5:34 PM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060] On Sat

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-28 Thread Feng Xue OS
_info_shared *); ~_bb_vec_info (); - /* The region we are operating on. bbs[0] is the entry, excluding - its PHI nodes. In the future we might want to track an explicit - entry edge to cover bbs[0] PHI nodes and have a region entry - insert location. */ - vec bbs; - vec roots; }

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Feng Xue OS
Ok. Then I will add a TODO comment on "bbs" field to describe it. Thanks, Feng From: Richard Biener Sent: Wednesday, May 29, 2024 3:14 PM To: Feng Xue OS Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Unify bbs in loop_vec_info and b

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
>> Hi, >> >> The patch was updated with the newest trunk, and also contained some minor >> changes. >> >> I am working on another new feature which is meant to support pattern >> recognition >> of lane-reducing operations in affine closure originated from loop reduction >> variable, >> like: >>

[PATCH 1/6] vect: Add a function to check lane-reducing code [PR114440]

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html. Checking whether an operation is lane-reducing requires comparing its code against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR). Add a utility function to make source coding for the check handy
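
A standalone sketch of the kind of predicate described above, not the function from the patch. The enum `op_code` and the name `is_lane_reducing_code` are hypothetical stand-ins; only the three lane-reducing code names are taken from the message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the compiler's operation codes.  Only the three
   lane-reducing names come from the patch description.  */
enum op_code
{
  PLUS_EXPR,
  MULT_EXPR,
  DOT_PROD_EXPR,
  WIDEN_SUM_EXPR,
  SAD_EXPR
};

/* Return true if CODE is one of the three lane-reducing kinds, so callers
   need not repeat the three-way comparison.  */
bool
is_lane_reducing_code (enum op_code code)
{
  return code == DOT_PROD_EXPR
	 || code == WIDEN_SUM_EXPR
	 || code == SAD_EXPR;
}
```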

[PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html. Partial vectorization checking for vectorizable_reduction is a piece of relatively isolated code, which may be reused by other places. Move the code into a new function for sharing. Thanks, Fen

[PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-05-30 Thread Feng Xue OS
Normally, vectorizable checking on statement in a loop reduction chain does not use the reduction PHI information. But some special statements might need it in vectorizable analysis, especially, for multiple lane-reducing operations support later. Thanks, Feng --- gcc/ * tree-vect-loop.cc

[PATCH 4/6] vect: Bind input vectype to lane-reducing operation

2024-05-30 Thread Feng Xue OS
The input vectype is an attribute of lane-reducing operation, instead of reduction PHI that it is associated to, since there might be more than one lane-reducing operations with different type in a loop reduction chain. So bind each lane-reducing operation with its own input type. Thanks, Feng ---

[PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
For lane-reducing operation (dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need to

[PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-05-30 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-31 Thread Feng Xue OS
Ok. Updated as the comments. Thanks, Feng From: Richard Biener Sent: Friday, May 31, 2024 3:29 PM To: Feng Xue OS Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: Re: [PATCH 2/6] vect: Split out partial vect checking for reduction into a function

Re: [PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-02 Thread Feng Xue OS
Please see my comments below. Thanks, Feng > On Thu, May 30, 2024 at 4:55 PM Feng Xue OS > wrote: >> >> For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, >> current >> vectorizer could only handle the pattern if the reduction chain does not

Re: [RFC] Generalize formation of lane-reducing ops in loop reduction

2024-08-21 Thread Feng Xue OS
>> >> >> 1. Background >> >> >> >> For loop reduction of accumulating result of a widening operation, the >> >> preferred pattern is lane-reducing operation, if supported by target. >> >> Because >> >> this kind of operation need not preserve intermediate results of widening >> >> operation, and o

Re: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-06-13 Thread Feng Xue OS
gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION -- 2.17.1 ____________ From: Feng Xue OS Sent: Thursday, May 30, 2024 10:51 PM To: Richard Biener Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt i

Re: [PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-06-13 Thread Feng Xue OS
able gives the initial scalar values of those N reductions. */ -- 2.17.1 ________ From: Feng Xue OS Sent: Thursday, May 30, 2024 10:56 PM To: Richard Biener Cc: Tamar Christina; gcc-patches@gcc.gnu.org Subject: [PATCH 6/6] vect: Optimize order of lane-reducing

[PATCH 1/8] vect: Add a function to check lane-reducing stmt

2024-06-16 Thread Feng Xue OS
The series of patches are meant to support multiple lane-reducing reduction statements. Since the original ones conflicted with the new single-lane slp node patches, I have reworked most of the patches, and split them as small as possible, which may make code review easier. In the 1st one, I ad

[PATCH 2/8] vect: Remove duplicated check on reduction operand

2024-06-16 Thread Feng Xue OS
In vectorizable_reduction, one check on a reduction operand via index could be contained by another one check via pointer, so remove the former. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated check. --- gcc/tree-vect-loop.cc | 6 ++

[PATCH 3/8] vect: Use one reduction_type local variable

2024-06-16 Thread Feng Xue OS
Two local variables were defined to refer same STMT_VINFO_REDUC_TYPE, better to keep only one. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and replace it to another local variable reduction_type. --- gcc/tree-vect-loop.cc | 8

[PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-16 Thread Feng Xue OS
The input vectype of reduction PHI statement must be determined before vect cost computation for the reduction. Since lane-reducing operation has different input vectype from normal one, we need to traverse all reduction statements to find out the input vectype with the least lanes, and set tha

[PATCH 5/8] vect: Use an array to replace 3 relevant variables

2024-06-16 Thread Feng Xue OS
It's better to place 3 relevant independent variables into an array, since we need to access them via an index in the following patch. At the same time, this change may make some duplicated code more compact. Thanks, Feng --- gcc/ * tree-vect-loop.cc (vect_transform_reduction):

[PATCH 6/8] vect: Tighten an assertion for lane-reducing in transform

2024-06-16 Thread Feng Xue OS
According to logic of code nearby the assertion, all lane-reducing operations should not appear, not just DOT_PROD_EXPR. Since "use_mask_by_cond_expr_p" treats SAD_EXPR same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR should not be allowed by the following assertion "gcc_assert (commutative_binary_op_p (.

[PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-16 Thread Feng Xue OS
For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current vectorizer could only handle the pattern if the reduction chain does not contain other operation, no matter the other is normal or lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations, we need t

[PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-16 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain, originally, corresponding vectorized statements are generated into def-use cycles starting from 0. The def-use cycle with smaller index, would contain more statements, which means more instruction dependency. For example

Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-19 Thread Feng Xue OS
{ 0, 0, 0, 0 }; loop () { sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0); sum_v1 = dot_prod<16 * char>(char_b0, char_b1, sum_v1); sum_v0 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v0); sum_v1 = dot_prod<8 * short>(short_

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-19 Thread Feng Xue OS
662a3..1b73ef01ade 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_load (vinfo, stmt_info, NULL, NULL, node, cost_vec) || vectorizable_store (vinfo, stmt_inf

Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-19 Thread Feng Xue OS
lar values of those N reductions. */ -- 2.17.1 ____________ From: Feng Xue OS Sent: Sunday, June 16, 2024 3:32 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles When trans

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-23 Thread Feng Xue OS
s - 1 given you use one above > and the other below? Or simply iterate till op.num_ops > and skip i == reduc_index. > >> + for (unsigned i = 0; i < op.num_ops - 1; i++) >> + { >> + gcc_assert (vec_oprnds[i].length () == using_ncopies); >> +

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-25 Thread Feng Xue OS
>> >> >> - if (slp_node) >> >> + if (slp_node && SLP_TREE_LANES (slp_node) > 1) >> > >> > Hmm, that looks wrong. It looks like SLP_TREE_NUMBER_OF_VEC_STMTS is off >> > instead, which is bad. >> > >> >> nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); >> >>else >> >>

Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-26 Thread Feng Xue OS
PHI records the input vectype with least lanes. */ - if (lane_reducing) -STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in; enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info); STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type; -- 2.17.1 ___

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-26 Thread Feng Xue OS
s.cc b/gcc/tree-vect-stmts.cc index 840e162c7f0..845647b4399 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_load (vinfo, stmt_info, NU

Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-06-26 Thread Feng Xue OS
ctions. */ -- 2.17.1 ____________ From: Feng Xue OS Sent: Thursday, June 20, 2024 2:02 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles This patch was updated with some new chang

[PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-26 Thread Feng Xue OS
Allow shift-by-induction for slp node, when it is single lane, which is aligned with the original loop-based handling. Thanks, Feng --- gcc/tree-vect-stmts.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index ca6052662a3..8

[RFC] Generalize formation of lane-reducing ops in loop reduction

2024-07-21 Thread Feng Xue OS
Hi, I composed some patches to generalize lane-reducing (dot-product is a typical representative) pattern recognition, and prepared a RFC document so as to help review. The original intention was to make a complete solution for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440. For sure, th

[RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

2024-07-21 Thread Feng Xue OS
The work for RFC (https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html) involves a considerable amount of code change, so I have to separate it into several batches of patches. This and the following patches constitute the first batch. Since pattern statement coexists with normal statements in a

[RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement

2024-07-21 Thread Feng Xue OS
This patch extends original vect analysis and transform to support a new kind of lane-reducing operation that participates in loop reduction indirectly. The operation itself is not reduction statement, but its value would be accumulated into reduction result finally. Thanks, Feng --- gcc/

[RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog

2024-07-21 Thread Feng Xue OS
For sum-based loop reduction, its affine closure is composed of statements whose results and derived computation only end up in the reduction, and are not used in any non-linear transform operation. The concept underlies the generalized lane-reducing pattern recognition in the coming patches. As ma

[RFC][PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction statement

2024-07-21 Thread Feng Xue OS
Previously, only the simple lane-reducing case was supported, in which one loop reduction statement forms one pattern match: char *d0, *d1, *s0, *s1, *w; for (i) { sum += d0[i] * d1[i]; // sum = DOT_PROD(d0, d1, sum); sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum); sum += w[i

[RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation

2024-07-21 Thread Feng Xue OS
This patch adds a pattern to fold a summation into the last operand of lane- reducing operation when appropriate, which is a supplement to those operation- specific patterns for dot-prod/sad/widen-sum. sum = lane-reducing-op(..., 0) + value; => sum = lane-reducing-op(..., value); Thanks, Feng
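
The folding rule above can be checked with a scalar model. This is only a sketch: `dot_prod_model` is a hypothetical stand-in for a dot-product lane-reducing operation whose last operand is the accumulator, and the element type and function names are assumptions, not taken from the patch.

```c
#include <stddef.h>

/* Scalar model of a dot-product lane-reducing operation: multiply the narrow
   inputs element-wise, widen to int, and add everything into ACC.  */
int
dot_prod_model (const signed char *a, const signed char *b, size_t n, int acc)
{
  for (size_t i = 0; i < n; i++)
    acc += (int) a[i] * (int) b[i];
  return acc;
}

/* Unfolded form: sum = lane-reducing-op (..., 0) + value.  */
int
sum_unfolded (const signed char *a, const signed char *b, size_t n, int value)
{
  return dot_prod_model (a, b, n, 0) + value;
}

/* Folded form: sum = lane-reducing-op (..., value).  */
int
sum_folded (const signed char *a, const signed char *b, size_t n, int value)
{
  return dot_prod_model (a, b, n, value);
}
```

Because the accumulator only takes part in additions, both forms produce the same sum as long as the intermediate values stay within the range of `int`.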

Re: [RFC] Generalize formation of lane-reducing ops in loop reduction

2024-08-03 Thread Feng Xue OS
>> 1. Background >> >> For loop reduction of accumulating result of a widening operation, the >> preferred pattern is lane-reducing operation, if supported by target. Because >> this kind of operation need not preserve intermediate results of widening >> operation, and only produces reduced amount

[PATCH] vect: Allow unsigned-to-signed promotion in vect_look_through_possible_promotion [PR115707]

2024-08-05 Thread Feng Xue OS
The function vect_look_through_possible_promotion() fails to figure out root definition if casts involve more than two promotions with sign change as: long a = (long)b; // promotion cast -> int b = (int)c; // promotion cast, sign change -> unsigned short c = ...; For this case, the
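Why looking through both casts is safe for the quoted chain can be checked directly: every `unsigned short` value fits in `int`, so the sign change at the middle step never alters the value. A minimal sketch, assuming a target where `int` is wider than `short`; the function names are hypothetical and this shows only the value-preservation argument, not what the GCC function does:

```c
#include <assert.h>
#include <limits.h>

/* The cast chain from the message: unsigned short -> int -> long. */
long promote_chained(unsigned short c)
{
  int b = (int) c;    /* promotion cast, sign change */
  long a = (long) b;  /* promotion cast */
  return a;
}

/* Promotion straight from the root definition. */
long promote_direct(unsigned short c)
{
  return (long) c;
}

/* Returns 1 if both forms agree for every unsigned short value. */
int promotions_agree(void)
{
  assert(USHRT_MAX <= INT_MAX);  /* assumption: int is wider than short */
  for (unsigned int v = 0; v <= USHRT_MAX; v++)
    if (promote_chained((unsigned short) v)
        != promote_direct((unsigned short) v))
      return 0;
  return 1;
}
```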

[PATCH] vect: Add missed opcodes in vect_get_smallest_scalar_type [PR115228]

2024-08-05 Thread Feng Xue OS
Some opcodes are missed when determining the smallest scalar type for a vectorizable statement. Currently, this bug does not cause any problem, because vect_get_smallest_scalar_type is only used to compute max nunits vectype, and even statement with missed opcode is incorrectly bypassed, the max nu

[PATCH] Do not propagate self-dependent value (PR ipa/93763)

2020-02-18 Thread Feng Xue OS
Currently, for self-recursive call, we never use value originated from non-passthrough jump function as source to avoid propagation explosion, but self-dependent value is missed. This patch is made to fix the bug. Bootstrapped/regtested on x86_64-linux and aarch64-linux. Feng --- 2020-02-18 Fe

Ping: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-18 Thread Feng Xue OS
Thanks, Feng From: Tamar Christina Sent: Monday, February 17, 2020 4:44 PM To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org Cc: nd Subject: RE: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707) Hi Feng

Re: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-19 Thread Feng Xue OS
This is a simple and nice fix, but could suppress some CP opportunities for self-recursive call. Using the test case as example, the first should be a for-all-context clone, and the call "recur_fn (i, 1, depth + 1)" is replaced with a newly created recursive node. Thus, in the next round of CP it

Re: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-21 Thread Feng Xue OS
It is a good solution. Thanks, Feng From: Martin Jambor Sent: Saturday, February 22, 2020 2:15 AM To: Feng Xue OS; Tamar Christina; Jan Hubicka; gcc-patches@gcc.gnu.org Cc: nd Subject: Re: [PATCH] Fix bug in recursiveness check for function to be cloned

Ping: [PATCH V4] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-09-30 Thread Feng Xue OS
Hi, Honza & Martin, Would you please take some time to review this updated patch? Thanks. Feng From: Feng Xue OS Sent: Wednesday, September 18, 2019 8:41 PM To: Jan Hubicka Cc: Martin Jambor; gcc-patches@gcc.gnu.org Subject: [PATCH V4] General

Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-09-30 Thread Feng Xue OS
Hi Honza & Martin, And also hope your comments on this patch. Thanks. Feng From: Feng Xue OS Sent: Thursday, September 19, 2019 10:30 PM To: Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org Subject: [PATCH V4] Extend IPA-CP to sup

Ping: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-08 Thread Feng Xue OS
Hi, Michael, Would you please take a look at this modified version? Thanks, Feng From: Feng Xue OS Sent: Thursday, September 12, 2019 6:21 PM To: Michael Matz Cc: Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: Ping agian: [PATCH V2] Loop split

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-15 Thread Feng Xue OS
Hi Philipp, This is an updated patch based on comments from Michael, and if he thinks this is ok, we will merge it into trunk. Thanks, Feng From: Philipp Tomsich Sent: Tuesday, October 15, 2019 11:49 PM To: Feng Xue OS Cc: Michael Matz; Richard

Re: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-16 Thread Feng Xue OS
Hi Philipp, This patch is still under code review, might still need some time. Thanks, Feng From: Philipp Tomsich Sent: Wednesday, October 16, 2019 12:05 AM To: Feng Xue OS Cc: Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org; Christoph Müllner

[PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread Feng Xue OS
IPA does not allow constant propagation on parameter that is used to control function recursion. recur_fn (i) { if ( !terminate_recursion (i)) { ... recur_fn (i + 1); ... } ... } This patch is composed to enable multi-versioning for self-recursive function, and ve
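One way to picture multi-versioning of such a function is a chain of hand-written clones, each specialized for one known value of the controlling parameter and calling the next. The sketch below is only an illustration under assumptions: the termination test `i >= 4`, the returned sum, and the clone names `recur_fn_i0` and `recur_fn_i1` are all hypothetical, and the cloning depth of two is chosen arbitrarily.

```c
#include <assert.h>

/* Generic version: the parameter i controls the recursion. */
int recur_fn(int i)
{
  assert(i >= 0);
  if (i >= 4)                 /* stands in for terminate_recursion (i) */
    return 0;
  return i + recur_fn(i + 1);
}

/* Clone specialized for i == 1; falls back to the generic version. */
int recur_fn_i1(void)
{
  return 1 + recur_fn(2);
}

/* Clone specialized for i == 0; calls the next clone in the chain. */
int recur_fn_i0(void)
{
  return 0 + recur_fn_i1();
}
```

A call `recur_fn (0)` with a constant argument could then be redirected to `recur_fn_i0 ()`, where `i` is a known constant in each clone body.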

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread Feng Xue OS
> I noticed similar issue when analyzing the SPEC, self-recursive function is > not versioned and posted my observations in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92074. > Generally, this could be implemented well by your patch, while I am > wondering whether it is OK to convert the recur

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-22 Thread Feng Xue OS
Hi, Michael, Since gcc 10 release is coming, that will be good if we can add this patch before that. Thanks Feng. From: Michael Matz Sent: Wednesday, October 16, 2019 12:01 AM To: Philipp Tomsich Cc: Feng Xue OS; Richard Biener; gcc-patches

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-11-14 Thread Feng Xue OS
Thanks for your review. > In general the patch looks good to me, but I would like Martin Jambor to > comment on the ipa-prop/cp interfaces. However... > +@item ipa-cp-max-recursion-depth > +Maximum depth of recursive cloning for self-recursive function. > + > ... I believe we will need more care

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-11-14 Thread Feng Xue OS
>> Cost model used by self-recursive cloning is mainly based on existing stuffs >> in ipa-cp cloning, size growth and time benefit are considered. But since >> recursive cloning is a more aggressive cloning, we will actually have another >> problem, which is opposite to your concern. By default, c

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-15 Thread Feng Xue OS
aning on dst_ctx? From: gcc-patches-ow...@gcc.gnu.org on behalf of Jan Hubicka Sent: Friday, November 15, 2019 4:09 PM To: Feng Xue OS Cc: Martin Jambor; gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed
