[PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-09-14 Thread Di Zhao OS via Gcc-patches
This is a new version of the patch on "nested FMA". Sorry for updating this after so long; I've been studying and writing micro cases to sort out the cause of the regression. First, following the previous discussion (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html): 1. From testi

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-09-04 Thread Di Zhao OS via Gcc-patches
> From: Richard Biener > Sent: Tuesday, August 29, 2023 3:41 PM > To: Jeff Law ; Martin Jambor > Cc: Di Zhao OS ; gcc-patc...@gcc.gnu.org > Subject: Re: [PATCH]

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-30 Thread Di Zhao OS via Gcc-patches
> Cc: Di Zhao OS ; gcc-patc...@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA > > On Tue, A

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Di Zhao OS via Gcc-patches
> Cc: Di Zhao OS ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches wrote:

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Di Zhao OS via Gcc-patches
> On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches wrote: > > > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote: > > > This patch tries to fix the 2% regression in 510.parest_r on ampere1 in the tracker. (Prev

[PATCH] alias-analyis: try to find ADDR_EXPR for SSA_NAME ptr

2023-08-28 Thread Di Zhao OS via Gcc-patches
This patch tries to slightly improve alias analysis between an SSA_NAME and a declaration. For a case like: int array1[10], array2[10]; ptr1 = array1 + x; ptr2 = ptr1 + y; the access *ptr2 should not alias with array2. If we can't disambiguate from points-to information, this patc
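For illustration only (not taken from the patch): a minimal C sketch of the case described above. The function name store_then_load is hypothetical. Since ptr2 is derived only from array1, a store through it cannot change array2 in a valid program, so a compiler that proves this may forward the earlier store to array2[0] to the final load.

```c
#include <stddef.h>

int array1[10], array2[10];

/* Hypothetical helper. ptr2 is computed from array1 only, so in a valid
   program (x + y within bounds) the store through ptr2 cannot modify
   array2. */
int store_then_load(size_t x, size_t y)
{
    int *ptr1 = array1 + x;
    int *ptr2 = ptr1 + y;

    array2[0] = 1;
    *ptr2 = 2;          /* writes array1[x + y], never array2 */
    return array2[0];   /* still 1; foldable once the non-aliasing is proven */
}
```

Calling store_then_load(1, 2) writes 2 into array1[3] and returns 1.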

[PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-28 Thread Di Zhao OS via Gcc-patches
This patch tries to fix the 2% regression in 510.parest_r on ampere1 in the tracker. (Previous discussion is here: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html) 1. Add testcases for the problem. For an op list in the form of "acc = a * b + c * d + acc", currently reassociation d
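For illustration only (not taken from the patch): a minimal C sketch of the op list "acc = a * b + c * d + acc" written with two different groupings. The function names are hypothetical. Assuming the target contracts a multiply that feeds an add into an FMA, the second grouping puts both FMAs on the loop-carried chain through acc, while the first leaves only a single add on it.

```c
#include <stddef.h>

/* Grouping 1: (a*b + c*d) + acc.
   The multiply and the multiply-add do not depend on acc, so only the
   final add is carried from one iteration to the next. */
double acc_products_first(const double *a, const double *b,
                          const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = (a[i] * b[i] + c[i] * d[i]) + acc;
    return acc;
}

/* Grouping 2: a*b + (c*d + acc).
   Both multiply-adds take their addend from the running accumulator, so
   both sit on the dependency chain that crosses the loop backedge. */
double acc_chained(const double *a, const double *b,
                   const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = a[i] * b[i] + (c[i] * d[i] + acc);
    return acc;
}
```

Both functions compute the same sum when no rounding occurs; in general the two groupings can round differently, which is why this kind of regrouping is only done under options that permit floating-point reassociation.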

RE: [PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-18 Thread Di Zhao OS via Gcc-patches
Hi, A few updates to the patch: 1. rank_ops_for_fma: return FMA_STATE_NESTED only for complete FMA chain, since the regression is obvious only in this case. 2. Added new testcase. Thanks, Di Zhao PR tree-optimization/110279 gcc/ChangeLog: * tree-ssa-math-opts.cc (con

[PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-09 Thread Di Zhao OS via Gcc-patches
Hi, The previous version of this patch tried to solve two problems at the same time. For better clarity, I'll separate them and only deal with the "nested" FMA in this version. I plan to propose another patch for avoiding badly shaped FMA (deferring FMA). Other changes: 1. Added new testcases for

RE: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-07-29 Thread Di Zhao OS via Gcc-patches
Cherry-picked this to gcc-13. Thanks, Di Zhao > -Original Message- > From: Richard Sandiford > Sent: Monday, June 26, 2023 10:28 PM > To: Philipp Tomsich > Cc: Di Zhao OS via Gcc-patches ; Di Zhao OS > > Subject: Re: [PATCH] Change fma_reassoc_width tuning for

[PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-10 Thread Di Zhao OS via Gcc-patches
Attached is an updated version of the patch. Based on Philipp's review, some changes: 1. Defined new enum fma_state to describe the state of FMA candidates for a list of operands. (Since the tests seem simple after the change, I didn't add predicates on it.) 2. Changed return type of conve

[PING][PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-07 Thread Di Zhao OS via Gcc-patches
Updated the patch so that it applies. Tested on the SPEC2017 fprate cases again. With options "-funroll-loops -Ofast -flto", the improvements for a 1-copy run are: Ampere1: 508.namd_r 4.26%, 510.parest_r 2.55%, Overall 0.54%. Intel Xeon: 503.bwaves_r 1.3%, 508

[PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-19 Thread Di Zhao OS via Gcc-patches
This patch enables reassociation of floating-point additions on ampere1. This brings about 1% overall benefit on spec2017 fprate cases. (There are minor regressions in 510.parest_r and 508.namd_r, analyzed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .) Bootstrapped and tested on aarc

[PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-06-16 Thread Di Zhao OS via Gcc-patches
This patch is to fix the regressions found in SPEC2017 fprate cases on aarch64. 1. Reused code in pass widening_mul to check for nested FMA chains (those connected by MULT_EXPRs), since re-writing to parallel generates worse code. 2. Avoid re-arranging to produce fewer FMA chains that can be slo

RE: [PATCH] Handle FMA friendly in reassoc pass

2023-06-06 Thread Di Zhao OS via Gcc-patches
Hello Lili Cui, Since I'm also trying to improve this lately, I've tested your patch on several aarch64 machines we have, including neoverse-n1 and ampere1 architectures. However, I haven't reproduced the 6.00% improvement of 503.bwaves_r single copy run you mentioned. Could you share more inform

RE: [RFC][PATCH] Improve generating FMA by adding a widening_mul pass

2023-05-30 Thread Di Zhao OS via Gcc-patches
Sorry I've missed the recent updates on trunk regarding handling FMA. I'll measure again if something in this still helps. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Friday, May 26, 2023 3:15 PM > To: gcc-patches@gcc.gnu.org > Subject: [RFC][PATCH] Improve generating

[RFC][PATCH] Improve generating FMA by adding a widening_mul pass

2023-05-26 Thread Di Zhao OS via Gcc-patches
As GCC's reassociation pass does not have knowledge of FMA, when transforming expression lists to parallel, it reduces the opportunities to generate FMAs. Currently there's a workaround on AArch64 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114), that is, to disable the parallelization with flo
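For illustration only (not taken from the patch): a minimal C sketch of why rewriting a sum of products to parallel form leaves fewer FMA candidates. The function names are hypothetical, and the operation counts in the comments assume that each multiply feeding an add is contracted into one FMA.

```c
/* Serial form: ((p0*q0 + p1*q1) + p2*q2) + p3*q3.
   One plain multiply, then three multiply-adds in a single chain. */
double dot4_serial(const double p[4], const double q[4])
{
    return ((p[0] * q[0] + p[1] * q[1]) + p[2] * q[2]) + p[3] * q[3];
}

/* Parallel form with width 2: (p0*q0 + p1*q1) + (p2*q2 + p3*q3).
   Two plain multiplies, two multiply-adds, and one plain add: the two
   halves are independent, but one FMA opportunity is lost. */
double dot4_parallel(const double p[4], const double q[4])
{
    double left = p[0] * q[0] + p[1] * q[1];
    double right = p[2] * q[2] + p[3] * q[3];
    return left + right;
}
```

The serial form has the longer dependency chain but uses four floating-point operations, while the parallel form shortens the chain at the cost of five.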

PING: [PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-11-15 Thread Di Zhao OS via Gcc-patches
Hi, I saw that Stage 1 of GCC 13 development has just ended. Is this patch still being considered? Or should I bring it up again when general development reopens? Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, October 25, 2022 8:18 AM > To: gcc-patches@gcc.gnu.org > Cc: Richa

[PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-10-24 Thread Di Zhao OS via Gcc-patches
Sorry for the late update. I've been on a vacation and then I spent some time updating and verifying the patch. Attached is a new version of the patch. There are some changes: 1. Store equivalences in a vn_pval chain in vn_ssa_aux, rather than in the expression hash table. (Following Richard's

PING^2: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-09-07 Thread Di Zhao OS via Gcc-patches
Gentle ping again. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, July 12, 2022 2:08 AM > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Richard Biener' > Subject: PING: [PATCH v5] tree-optimization/101186 - extend FRE with > "equivalence map" for condition prediction >

PING: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-07-11 Thread Di Zhao OS via Gcc-patches
Updated the patch in the attachment, so it can apply. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Sunday, May 29, 2022 11:59 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Biener > Subject: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence > map" for

[PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-05-29 Thread Di Zhao OS via Gcc-patches
Hi, attached is a new version of the patch. The changes are: - Skip using temporary equivalences for floating-point values, because folding expressions can generate incorrect values. For example, operations on 0.0 and -0.0 may have different results. - Avoid inserting duplicated back-refs from val
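For illustration only (not taken from the patch): a minimal C sketch of why an equality such as x == y does not make two floating-point values interchangeable. The function name is hypothetical, and IEEE 754 arithmetic is assumed (division by zero yields an infinity with the sign of the zero).

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when x and y compare equal but
   substituting one for the other changes the result of 1.0 / x.
   Assumes IEEE 754 semantics for division by a signed zero. */
bool equal_but_not_interchangeable(double x, double y)
{
    if (x != y)
        return false;
    return (1.0 / x) != (1.0 / y);
}
```

Under that assumption, the pair 0.0 and -0.0 compares equal, yet 1.0 / 0.0 is +infinity and 1.0 / -0.0 is -infinity, so the helper returns true for it and false for an ordinary pair such as 2.0 and 2.0.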

[PATCH v4] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-24 Thread Di Zhao OS via Gcc-patches
Here's a brief summary of the patch: v4 (this version): - In process_bb's condition-prediction code: update equivalence-heads if value-numbers have changed, otherwise some opportunities can be lost. v3 (a few minor updates): - Simplify function record_equiv_from_prev_phi_1 by removing an argument. -

[PATCH v3] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-13 Thread Di Zhao OS via Gcc-patches
A few minor updates on the patch: - Simplify function record_equiv_from_prev_phi_1 by removing an argument. - Fixed two small bugs that could lead to lost optimization opportunities. Thanks, Di Zhao --- Extend FRE with temporary equivalences. 2021-12-13 Di Zhao gcc/ChangeLog: PR tree-op

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-02 Thread Di Zhao OS via Gcc-patches
I'm very sorry, there seems to be an encoding issue in the attachment in my last email. Attached is the new patch. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, November 16, 2021 1:24 AM > To: 'Richard Biener' > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH v

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-11-15 Thread Di Zhao OS via Gcc-patches
Attached is the updated patch. Fixed some errors in testcases. > -Original Message- > From: Richard Biener > Sent: Wednesday, November 10, 2021 5:44 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with > "

[PING] [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-11-07 Thread Di Zhao OS via Gcc-patches
Hi, Gentle ping on this. Di Zhao -Original Message- From: Di Zhao OS Sent: Monday, October 25, 2021 3:03 AM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Hi, Attached is a

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-10-24 Thread Di Zhao OS via Gcc-patches
Hi, Attached is a new version of the patch, mainly for improving performance and simplifying the code. First, regarding the comments: > -Original Message- > From: Richard Biener > Sent: Friday, October 1, 2021 9:00 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH

PING: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-09-30 Thread Di Zhao OS via Gcc-patches
Thanks, Di -Original Message- From: Gcc-patches On Behalf Of Di Zhao OS via Gcc-patches Sent: Friday, September 17, 2021 2:13 AM To: gcc-patches@gcc.gnu.org Subject: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Sorry abou

[PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-09-16 Thread Di Zhao OS via Gcc-patches
Sorry about updating on this after so long. It took me a long time to work out a new plan and pass the tests. The new idea is to use one variable to represent a set of equal variables at some basic block. This variable is called an "equivalence head" or "equiv-head" in the code. (There's no longer a

[PATCH] tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into

2021-09-05 Thread Di Zhao OS via Gcc-patches
If the first predicate value is different and copied, the comparison will then be between val->result and the copied one, which seems to be a bug. That can cause inserting extra vn_pvals. Bootstrapped and tested on x86_64-unknown-linux-gnu. Regards, Di Zhao gcc/ChangeLog: * tree-ssa-s

RE: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-07-18 Thread Di Zhao OS via Gcc-patches
I tried to improve the patch following your advice and to catch more opportunities. Hope it'll be helpful. On 6/24/21 8:29 AM, Richard Biener wrote: > On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches wrote: > > I have some reservations about extending the ad-hoc "