inal_value (class loop *, tree, bool *, bool);
+extern void apply_scev_final_value_replacement (gphi *, tree, bool, bool);
extern bool final_value_replacement_loop (class loop *);
-extern unsigned int scev_const_prop (void);
extern bool expression_expensive_p (tree, bool *);
extern bool
Thanks, and please see my comments below:
>> Currently, if could, scev-cprop unconditionally replaces loop closed ssa with
>> an expression built from loop initial value and loop niter, which might cause
>> redundant code-gen when all interior computations related to IV inside loop
>> are also
Forgotten attaching the patch file.
From: Feng Xue OS
Sent: Friday, December 6, 2024 9:57 PM
To: gcc-patches@gcc.gnu.org; Richard Biener
Subject: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594]
Currently, when it can, scev-cprop unconditionally replaces a loop-closed SSA
with an expression built from the loop initial value and the loop niter, which
might cause redundant code-gen when all interior computations related to the IV
inside the loop are also necessary. As an example, for the below case:
p = init_
This patch refactors the procedure in tree-scalar-evolution.cc in order to
partially export its functionality to other modules, decomposing it into
several relatively independent utility functions.
Thanks,
Feng
---
gcc/
PR tree-optimization/90594
* tree-scalar-evolution.cc (simple
Added.
Thanks,
Feng
From: Richard Biener
Sent: Saturday, October 12, 2024 8:12 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Fix inconsistency in fully-masked lane-reducing op
generation [PR116985]
On Sat, Oct 12, 2024 at 9
To align vectorized def/use when lane-reducing op is present in loop reduction,
we may need to insert extra trivial pass-through copies, which would cause
mismatch between lane-reducing vector copy and loop mask index. This could be
fixed by computing the right index around a new counter on effecti
>>
>> >> 1. Background
>> >>
>> >> For loop reduction of accumulating result of a widening operation, the
>> >> preferred pattern is lane-reducing operation, if supported by target.
>> >> Because
>> >> this kind of operation need not preserve intermediate results of widening
>> >> operation, and o
Some opcodes are missed when determining the smallest scalar type for a
vectorizable statement. Currently, this bug does not cause any problem,
because vect_get_smallest_scalar_type is only used to compute the max-nunits
vectype, and even though a statement with a missed opcode is incorrectly
bypassed, the max nu
The function vect_look_through_possible_promotion() fails to figure out the
root definition if the casts involve more than two promotions with a sign
change, as:
long a = (long)b; // promotion cast
-> int b = (int)c; // promotion cast, sign change
-> unsigned short c = ...;
For this case, the
>> 1. Background
>>
>> For loop reduction of accumulating result of a widening operation, the
>> preferred pattern is lane-reducing operation, if supported by target. Because
>> this kind of operation need not preserve intermediate results of widening
>> operation, and only produces reduced amount
This patch adds a pattern to fold a summation into the last operand of a
lane-reducing operation when appropriate, which is a supplement to those
operation-specific patterns for dot-prod/sad/widen-sum.
sum = lane-reducing-op(..., 0) + value;
=>
sum = lane-reducing-op(..., value);
Thanks,
Feng
For sum-based loop reduction, its affine closure is composed of statements
whose results and derived computations only end up in the reduction, and are
not used in any non-linear transform operation. The concept underlies the
generalized lane-reducing pattern recognition in the coming patches. As
ma
Previously, only the simple lane-reducing case was supported, in which one
loop reduction statement forms one pattern match:
char *d0, *d1, *s0, *s1, *w;
for (i) {
sum += d0[i] * d1[i]; // sum = DOT_PROD(d0, d1, sum);
sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
sum += w[i
This patch extends the original vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly. The operation itself is not a reduction statement, but its value
is finally accumulated into the reduction result.
Thanks,
Feng
---
gcc/
The work for the RFC
(https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html)
involves quite a bit of code change, so I have to separate it into several
batches of patches. This and the following patches constitute the first batch.
Since pattern statements coexist with normal statements in a
Hi,
I composed some patches to generalize lane-reducing (dot-product is a typical
representative) pattern recognition, and prepared an RFC document to help with
review. The original intention was to make a complete solution for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440. For sure, th
ke the checking assert unconditional?
>
> OK with that change. vect_get_num_vectors will ICE anyway
> I guess, so at your choice remove the assert completely.
>
OK, I removed the assert.
Thanks,
Feng
From: Richard Biener
Sent: Monday, July 15,
When transforming multiple lane-reducing operations in a loop reduction chain,
originally, the corresponding vectorized statements are generated into def-use
cycles starting from 0. The def-use cycle with a smaller index would contain
more statements, which means more instruction dependency. For example
For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the
current vectorizer could only handle the pattern if the reduction chain does
not contain any other operation, no matter whether the other is normal or
lane-reducing. This patch removes some constraints in reduction analysis to
allow mult
The vector stmts number of an operation is calculated based on the output
vectype. This is over-estimated for lane-reducing operations, which would
cause vector def/use mismatches when we want to support loop reduction mixed
with lane-reducing and normal operations. One solution is to refit
lane-reducing
t
Extend the original vect_get_num_copies (pure loop-based) to calculate the
number of vector stmts for an slp node regarding a generic vect region.
Thanks,
Feng
---
gcc/
* tree-vectorizer.h (vect_get_num_copies): New overload function.
(vect_get_slp_num_vectors): New function.
* tree-v
> > > when that's set instead of SLP_TREE_VECTYPE? As said having wrong
> > > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire.
> >
> > Then the alternative is to limit special handling related to the vec_num
> > only
> > inside vect_transform_reduction. Is
YPE? As said having wrong
> SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire.
Then the alternative is to limit the special handling related to vec_num to
inside vect_transform_reduction only. Is that ok? Or any other suggestion?
Thanks,
Feng
From: Rich
The vector stmts number of an operation is calculated based on the output
vectype. This is over-estimated for lane-reducing operations. Sometimes, to
work around the issue, we have to rely on additional logic to deduce an
exactly accurate number by other means. Aiming at this inconvenience, in this
patch, we
This patch series is recomposed and split from
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655974.html.
As I will add a new field tightly coupled with "vec_stmts_size", following the
original naming convention would make the new macro very long. So it is better
to choose an equally meaningful but
LP_TREE_LANES (slp_node) == 1))
scalar_shift_arg = false;
else if (dt[1] == vect_constant_def
|| dt[1] == vect_external_def
--
2.17.1
________
From: Richard Biener
Sent: Thursday, June 27, 2024 12:49 AM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
S
Allow shift-by-induction for an slp node when it is single-lane, which is
aligned with the original loop-based handling.
Thanks,
Feng
---
gcc/tree-vect-stmts.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ca6052662a3..8
ctions. */
--
2.17.1
____________
From: Feng Xue OS
Sent: Thursday, June 20, 2024 2:02 PM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in
loop def-use cycles
This patch was updated with some new chang
s.cc b/gcc/tree-vect-stmts.cc
index 840e162c7f0..845647b4399 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo,
NULL, NULL, node, cost_vec)
|| vectorizable_load (vinfo, stmt_info, NU
PHI records the input vectype with least lanes. */
- if (lane_reducing)
-STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
--
2.17.1
___
>>
>> >> - if (slp_node)
>> >> + if (slp_node && SLP_TREE_LANES (slp_node) > 1)
>> >
>> > Hmm, that looks wrong. It looks like SLP_TREE_NUMBER_OF_VEC_STMTS is off
>> > instead, which is bad.
>> >
>> >> nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
>> >>else
>> >>
s - 1 given you use one above
> and the other below? Or simply iterate till op.num_ops
> and skip i == reduc_index.
>
>> + for (unsigned i = 0; i < op.num_ops - 1; i++)
>> + {
>> + gcc_assert (vec_oprnds[i].length () == using_ncopies);
>> +
lar values of those N reductions. */
--
2.17.1
____________
From: Feng Xue OS
Sent: Sunday, June 16, 2024 3:32 PM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop
def-use cycles
When trans
662a3..1b73ef01ade 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo,
NULL, NULL, node, cost_vec)
|| vectorizable_load (vinfo, stmt_info, NULL, NULL, node, cost_vec)
|| vectorizable_store (vinfo, stmt_inf
{ 0, 0, 0, 0 };
loop () {
sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0);
sum_v1 = dot_prod<16 * char>(char_b0, char_b1, sum_v1);
sum_v0 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v0);
sum_v1 = dot_prod<8 * short>(short_
For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the
current vectorizer could only handle the pattern if the reduction chain does
not contain any other operation, no matter whether the other is normal or
lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations,
we need t
According to the logic of the code near the assertion, no lane-reducing
operation should appear, not just DOT_PROD_EXPR. Since "use_mask_by_cond_expr_p"
treats SAD_EXPR the same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR should not be
allowed by the following assertion "gcc_assert (commutative_binary_op_p (.
It's better to place the 3 relevant independent variables into an array, since
we need to access them via an index in the following patch. At the same time,
this change makes some duplicated code more compact.
Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vect_transform_reduction):
The input vectype of the reduction PHI statement must be determined before
vect cost computation for the reduction. Since a lane-reducing operation has a
different input vectype from a normal one, we need to traverse all reduction
statements to find out the input vectype with the least lanes, and set tha
Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
better to keep only one.
Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and
replace it with another local variable reduction_type.
---
gcc/tree-vect-loop.cc | 8
In vectorizable_reduction, one check on a reduction operand via an index could
be subsumed by another check via a pointer, so remove the former.
Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated
check.
---
gcc/tree-vect-loop.cc | 6 ++
This series of patches is meant to support multiple lane-reducing reduction
statements. Since the original ones conflicted with the new single-lane slp
node patches, I have reworked most of the patches, and split them as small as
possible, which may make code review easier.
In the 1st one, I ad
able gives the initial
scalar values of those N reductions. */
--
2.17.1
________
From: Feng Xue OS
Sent: Thursday, May 30, 2024 10:56 PM
To: Richard Biener
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: [PATCH 6/6] vect: Optimize order of lane-reducing
gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION
--
2.17.1
____________
From: Feng Xue OS
Sent: Thursday, May 30, 2024 10:51 PM
To: Richard Biener
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt i
Please see my comments below.
Thanks,
Feng
> On Thu, May 30, 2024 at 4:55 PM Feng Xue OS
> wrote:
>>
>> For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction,
>> current
>> vectorizer could only handle the pattern if the reduction chain does not
OK. Updated as per the comments.
Thanks,
Feng
From: Richard Biener
Sent: Friday, May 31, 2024 3:29 PM
To: Feng Xue OS
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/6] vect: Split out partial vect checking for reduction
into a function
For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the
current vectorizer could only handle the pattern if the reduction chain does
not contain any other operation, no matter whether the other is normal or
lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations,
we need to
The input vectype is an attribute of a lane-reducing operation, instead of the
reduction PHI that it is associated with, since there might be more than one
lane-reducing operation with different types in a loop reduction chain. So
bind each lane-reducing operation with its own input type.
Thanks,
Feng
---
Normally, vectorizable checking on a statement in a loop reduction chain does
not use the reduction PHI information. But some special statements might need
it in vectorizable analysis, especially for the multiple lane-reducing
operations support later.
Thanks,
Feng
---
gcc/
* tree-vect-loop.cc
This is a patch that is split out from
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.
Partial vectorization checking for vectorizable_reduction is a piece of
relatively isolated code, which may be reused by other places. Move the
code into a new function for sharing.
Thanks,
Fen
This is a patch that is split out from
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.
Checking whether an operation is lane-reducing requires comparing its code
against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR). Add a utility
function to make source code for the check handy
>> Hi,
>>
>> The patch was updated with the newest trunk, and also contained some minor
>> changes.
>>
>> I am working on another new feature which is meant to support pattern
>> recognition
>> of lane-reducing operations in affine closure originated from loop reduction
>> variable,
>> like:
>>
Ok. Then I will add a TODO comment on "bbs" field to describe it.
Thanks,
Feng
From: Richard Biener
Sent: Wednesday, May 29, 2024 3:14 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Unify bbs in loop_vec_info and b
_info_shared *);
~_bb_vec_info ();
- /* The region we are operating on. bbs[0] is the entry, excluding
- its PHI nodes. In the future we might want to track an explicit
- entry edge to cover bbs[0] PHI nodes and have a region entry
- insert location. */
- vec bbs;
-
vec roots;
}
Changed as per the comments.
Thanks,
Feng
From: Richard Biener
Sent: Tuesday, May 28, 2024 5:34 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Use vect representative statement instead of
original in patch recog [PR115060]
On Sat
Both derived classes (loop_vec_info/bb_vec_info) have their own "bbs" field,
which has exactly the same purpose of recording all basic blocks inside the
corresponding vect region, while the fields are composed of different data
types: one is a normal array, the other an auto_vec. This difference causes
Some utility functions (such as vect_look_through_possible_promotion) that are
meant to find a certain kind of direct or indirect definition SSA for a value
may return the original SSA, not its pattern representative SSA, even when a
pattern is involved. For example,
a = (T1) patt_b;
pa
Hi,
The patch was updated with the newest trunk, and also contained some minor
changes.
I am working on another new feature which is meant to support pattern
recognition of lane-reducing operations in an affine closure originated from a
loop reduction variable, like:
sum += cst1 * dot_prod_1 + c
For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the
current vectorizer could only handle the pattern if the reduction chain does
not contain any other operation, no matter whether the other is normal or
lane-reducing. Actually, to allow multiple arbitrary lane-reducing operations,
we need t
mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE (instance),
- instance, &instance->cost_vec, svisited,
- visited);
- }
-}
+vect_bb_slp_mark_live_stmts (bb_vinfo);
return !vinfo->slp_instances.is_empty (
Hi, Richard,
Would you please take a look at this patch?
Thanks,
Feng
From: Feng Xue OS
Sent: Friday, December 29, 2023 6:28 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Do not count unused scalar use when marking STMT_VINFO_LIVE_P
[PR113091
This patch is meant to fix the over-estimation of the SLP vector-to-scalar
cost for a STMT_VINFO_LIVE_P statement. When pattern recognition is involved,
a statement whose definition is consumed in some pattern may not be included
in the final replacement pattern statements, and would be skipped when buil
This patch extends the option -mbranch-protection=bti with an optional
argument, as bti[+all], to force the compiler to unconditionally insert bti
for all functions. Because a direct function call at the compiling stage might
be rewritten to an indirect call with some kind of linker-generated thunk stub
a
Thanks,
Feng
From: Feng Xue OS
Sent: Thursday, September 16, 2021 5:26 PM
To: Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org
Cc: JiangNing OS
Subject: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS
This patch
ttps://gcc.gnu.org/pipermail/gcc/2021-August/237132.html)
Thanks,
Feng
From: Feng Xue OS
Sent: Saturday, September 18, 2021 5:38 PM
To: Jason Merrill; Jan Hubicka; mjam...@suse.cz; Richard Biener;
gcc-patches@gcc.gnu.org
Subject: Re: [PATCH/RFC 1/2] WPD: En
Made some minor changes.
Thanks,
Feng
From: Feng Xue OS
Sent: Thursday, September 16, 2021 5:26 PM
To: Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org
Cc: JiangNing OS
Subject: [PATCH/RFC 2/2] WPD: Enable whole program
Minor update for some bugfixes and a comment wording change.
Thanks,
Feng
From: Feng Xue OS
Sent: Saturday, September 18, 2021 5:38 PM
To: Jason Merrill; Jan Hubicka; mjam...@suse.cz; Richard Biener;
gcc-patches@gcc.gnu.org
Subject: Re: [PATCH/RFC 1/2] WPD
Bootstrapped/regtested on x86_64-linux.
Thanks,
Feng
---
2021-09-23 Feng Xue
gcc/ChangeLog
PR tree-optimization/102400
* tree-ssa-sccvn.c (vn_reference_insert_pieces): Initialize
result_vdef to zero value.
---
gcc/tree-ssa-sccvn.c | 1 +
1 file changed, 1 insertion(+)
Bootstrapped/regtested on x86_64-linux and aarch64-linux.
Thanks,
Feng
---
2021-09-23 Feng Xue
gcc/ChangeLog:
PR tree-optimization/102451
* tree-ssa-dse.c (delete_dead_or_redundant_call): Record bb of stmt
before removal.
---
gcc/tree-ssa-dse.c | 5 +++--
1 file chang
>On 9/16/21 22:29, Feng Xue OS wrote:
>>> On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:
>>>> This and following patches are composed to enable full devirtualization
>>>> under whole program assumption (so also called whole-program
>>>&g
>On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:
>> This and following patches are composed to enable full devirtualization
>> under whole program assumption (so also called whole-program
>> devirtualization, WPD for short), which is an enhancement to current
>> s
This patch is to extend the applicability of full devirtualization to the
LTRANS stage. Normally, the whole program assumption would not hold when WPA
splits the whole compilation into more than one LTRANS partition. To avoid
information loss for WPD at LTRANS, we will record all vtable nodes and
related member
This and the following patches are composed to enable full devirtualization
under the whole program assumption (so also called whole-program
devirtualization, WPD for short), which is an enhancement to the current
speculative devirtualization. The basis of the optimization is how to
identify a class type that is
propagates count partially.
Thanks,
Feng
From: Richard Biener
Sent: Tuesday, August 10, 2021 10:47 PM
To: Xionghu Luo
Cc: gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; Feng Xue OS;
wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org;
hubi
Yes. The condition to switch the two versioned loops is "true", so the first
two arguments should be 100% and 0%.
It is different from a normal loop split: we could not deduce an exactly
precise probability for a condition-based loop split, since the cfg inside
loop2 would be changed. (invar-branch is replaced
to
For an instance of a non-POD class, can I always assume that any operation on
it should be type-safe, and that any wrong or even tricky code violating this
is UB per the C++ spec? For example, here are some ways:
union {
Type1 *p1;
Type2 *p2;
};
or
union {
Type1 t1;
Type2 t2;
};
or
void
>> gcc/
>> PR tree-optimization/98598
>> * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o.
>> * common.opt (-ftree-loop-mgo): New option.
>
> Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a
> good
> name. -floop-mgo would be better but still I'd h
>> This patch implements a new loop optimization according to the proposal
>> in RFC given at
>> https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html.
>> So do not repeat the idea in this mail. Hope your comments on it.
>
> With the caveat that I'm not an optimization expert (but no one else
With the new pattern rule (T)(A) +- (T)(B) -> (T)(A +- B), some testcases are
simplified and can no longer keep the expected code pattern as a test check.
Minor changes are made to those cases to avoid the simplification effect of
the rule.
Tested on x86_64-linux and aarch64-linux.
Feng
---
2020-09-16 Feng Xue
>> Add a rule (T)(A) +- (T)(B) -> (T)(A +- B), which works only when (A +- B)
>> could be folded to a simple value. By this rule, a
>> plusminus-mult-with-convert
>> expression could be handed over to the rule (A * C) +- (B * C) -> (A +- B).
>
>Please use INTEGRAL_TYPE_P () instead of TREE_CODE ==
>> This patch is to handle simplification of plusminus-mult-with-convert
>> expression
>> as ((T) X) +- ((T) Y), in which at least one of (X, Y) is result of
>> multiplication.
>> This is done in forwprop pass. We try to transform it to (T) (X +- Y), and
>> resort
>> to gimple-matcher to fold (X
Add a rule (T)(A) +- (T)(B) -> (T)(A +- B), which works only when (A +- B)
could be folded to a simple value. By this rule, a plusminus-mult-with-convert
expression could be handed over to the rule (A * C) +- (B * C) -> (A +- B).
Bootstrapped/regtested on x86_64-linux and aarch64-linux.
Feng
---
l) here. OK with that change.
Ok.
>
>I've tried again to think about sth prettier to cover these kind of
>single-use checks but failed to come up with sth.
Maybe we need a smart combiner that can deduce cost globally and remove these
single-use specifiers from the rule description.
Feng
Thanks,
Feng
From: Feng Xue OS
Sent: Thursday, September 3, 2020 2:06 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)
For pattern A * C +- B * C -> (A +- B) * C, simplification is disab
Thanks,
Feng
From: Feng Xue OS
Sent: Thursday, September 3, 2020 5:29 PM
To: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in
forwprop (PR 94234)
Attach patch file.
Feng
>> Hi,
>>
>> On Mon, Aug 31 2020, Feng Xue OS wrote:
>> > This patch is to fix a bug that cost that is used to evaluate clone
>> > candidate
>> > becomes negative due to integer overflow.
>> >
>> > Feng
>> > ---
>> &
Attach patch file.
Feng
From: Gcc-patches on behalf of Feng Xue OS
via Gcc-patches
Sent: Thursday, September 3, 2020 5:27 PM
To: Richard Biener; gcc-patches@gcc.gnu.org
Subject: [PATCH 2/2 V3] Simplify plusminus-mult-with-convert expr in forwprop
(PR
This patch is to handle simplification of a plusminus-mult-with-convert
expression, as ((T) X) +- ((T) Y), in which at least one of (X, Y) is the
result of a multiplication. This is done in the forwprop pass. We try to
transform it to (T) (X +- Y), and resort to the gimple matcher to fold
(X +- Y) instead of man
For the pattern A * C +- B * C -> (A +- B) * C, simplification is disabled
when A and B are not single-use. This patch is a minor enhancement to the
pattern, which allows folding if the final result is found to be a simple
gimple value (a constant or an existing SSA).
Bootstrapped/regtested on x86_64-linux and aa
>> >> gcc/
>> >> PR tree-optimization/94234
>> >> * tree-ssa-forwprop.c (simplify_binary_with_convert): New
>> >> function.
>> >> * (fwprop_ssa_val): Move it before its new caller.
>>
>> > No * at this line. There's an entry for (pass_forwprop::execute) missing.
>> OK.
>>
>> gcc/
>> PR tree-optimization/94234
>> * tree-ssa-forwprop.c (simplify_binary_with_convert): New function.
>> * (fwprop_ssa_val): Move it before its new caller.
> No * at this line. There's an entry for (pass_forwprop::execute) missing.
OK.
> I don't think the transfor
Thanks,
Feng
From: Feng Xue OS
Sent: Wednesday, August 19, 2020 5:17 PM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org; Marc Glisse
Subject: [PATCH V2] Add pattern for pointer-diff on addresses with same
base/offset (PR 94234)
As Richard's comment,
>>> the component is "ipa," please change that when you commit the patch.
>> Mistake has been made, I'v pushed it. Is there a way to correct it? git push
>> --force?
>
> There is. You need to wait until tomorrow (after the commit message
> gets copied to gcc/ChangeLog by a script) and then push a
>> gcc/
>> PR tree-optimization/96806
> the component is "ipa," please change that when you commit the patch.
A mistake has been made; I've pushed it. Is there a way to correct it? git
push --force?
Thanks,
Feng