RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-10-08 Thread Di Zhao OS
Attached is a new version of the patch. > -Original Message- > From: Richard Biener > Sent: Friday, October 6, 2023 5:33 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_widt

[PING][PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-10-22 Thread Di Zhao OS
Hello and Ping, Thanks, Di > -Original Message- > From: Di Zhao OS > Sent: Monday, October 9, 2023 12:40 AM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > >

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-14 Thread Di Zhao OS
> -Original Message- > From: Richard Biener > Sent: Wednesday, December 13, 2023 5:01 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > > On Wed, Dec 13, 2023 at

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-17 Thread Di Zhao OS
Hello Thomas, > -Original Message- > From: Thomas Schwinge > Sent: Friday, December 15, 2023 5:46 PM > To: Di Zhao OS ; gcc-patches@gcc.gnu.org > Cc: Richard Biener > Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width >

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-22 Thread Di Zhao OS
Updated the fix in attachment. Is it OK for trunk? Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Sunday, December 17, 2023 8:31 PM > To: Thomas Schwinge ; gcc-patches@gcc.gnu.org > Cc:

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-27 Thread Di Zhao OS
Committed at 6cec7b06b3c8187b36fc05cfd4dd38b42313d727 Thanks, Di > -Original Message- > From: Richard Biener > Sent: Friday, December 22, 2023 11:40 PM > To: Di Zhao OS > Cc: Thomas Schwinge ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/11027

[PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'

2023-12-27 Thread Di Zhao OS
This patch adds a new tuning option 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA', to consider fully pipelined FMAs in reassociation. Also, set this option by default for Ampere CPUs. Tested on aarch64-unknown-linux-gnu. Is this OK for trunk? Thanks, Di Zhao gcc/ChangeLog: * config/aarch64/a

RE: [PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'

2024-01-02 Thread Di Zhao OS
> -Original Message- > From: Richard Sandiford > Sent: Friday, December 29, 2023 6:24 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA' > > Di Zhao OS writes: >

RE: [PATCH] aarch64: modify Ampere CPU tunings on reassociation/FMA

2023-12-01 Thread Di Zhao OS
Hello Richard, Thank you for the review. Fixed the problems and committed to master. Thanks, Di > -Original Message- > From: Richard Earnshaw > Sent: Thursday, November 30, 2023 8:21 PM > To: Di Zhao OS ; gcc-patches@gcc.gnu.org > Cc: Philipp Tomsich > Subject: R

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-13 Thread Di Zhao OS
Hello Richard, > -Original Message- > From: Richard Biener > Sent: Monday, December 11, 2023 7:01 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > > On Wed, Nov

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-11-09 Thread Di Zhao OS
> -Original Message- > From: Richard Biener > Sent: Tuesday, October 31, 2023 9:48 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > > On Sun, Oct 8, 2023 at

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-11-29 Thread Di Zhao OS
> -Original Message- > From: Richard Biener > Sent: Tuesday, November 21, 2023 9:01 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > > On Thu, Nov 9, 2023 at

[PATCH] aarch64: modify Ampere CPU tunings on reassociation/FMA

2023-11-30 Thread Di Zhao OS
This patch modifies tunings for ampere1/ampere1a/ampere1b, to: 1. Allow reassociation on FP additions. 2. Avoid generating loop-dependent FMA chains. Added a tuning option for this. Bootstrapped and tested. Is this ok for trunk? Thanks, Di Zhao gcc/ChangeLog: * config/aarch64/aarch64-t

RE: [PING][PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-07-15 Thread Di Zhao OS
Hi, Shall I push this if there is no objection? Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, June 18, 2024 9:52 AM > To: Jeff Law > Cc: gcc-patches@gcc.gnu.org > Subject: [PING][PATCH] [tree-optimization/110279] fix testcase pr110279-1.c >

[PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-22 Thread Di Zhao OS
The test case is for targets that support FMA. Previously the "target" selector was missing from the dg-final command. Tested on x86_64-pc-linux-gnu. Thanks Di Zhao gcc/testsuite/ChangeLog: * gcc.dg/pr110279-1.c: add target selector. --- gcc/testsuite/gcc.dg/pr110279-1.c | 2 +- 1 file change

RE: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-23 Thread Di Zhao OS
> -Original Message- > From: Jeff Law > Sent: Wednesday, May 22, 2024 11:14 PM > To: Di Zhao OS ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c > > > > On 5/22/24 5:46 AM, Di Zhao OS wrote: > >

[PING][PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-06-17 Thread Di Zhao OS
Is this OK for trunk? Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Thursday, May 23, 2024 5:55 PM > To: Jeff Law > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c > > > -Original

[PATCH] tree-optimization/114760 - check variants of >> and << in loop-niter

2024-05-10 Thread Di Zhao OS
This patch tries to fix pr114760 by checking for the variants explicitly. When recognizing bit counting idiom, include pattern "x * 2" for "x << 1", and "x / 2" for "x >> 1" (given x is unsigned). Bootstrapped and tested on x86_64-linux-gnu. Thanks, Di Zhao --- gcc/ChangeLog: PR tree-op
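As a rough illustration of the equivalence described above (not code from the patch), the following C sketch writes the same bit-counting loops once with shifts and once with the multiply/divide variants. The function names are hypothetical. For unsigned x, "x / 2" gives the same value as "x >> 1", and "x * 2" the same value as "x << 1", since both wrap modulo 2^N.

```c
/* Illustrative only; the function names are hypothetical, not from GCC. */

/* Number of right shifts until x becomes zero (the bit length of x). */
unsigned steps_shr(unsigned x)
{
  unsigned n = 0;
  while (x) {
    x >>= 1;
    n++;
  }
  return n;
}

/* Same loop, with the shift spelled as an unsigned division by 2. */
unsigned steps_div2(unsigned x)
{
  unsigned n = 0;
  while (x) {
    x /= 2;
    n++;
  }
  return n;
}

/* Number of left shifts until every set bit has been pushed out. */
unsigned steps_shl(unsigned x)
{
  unsigned n = 0;
  while (x) {
    x <<= 1;
    n++;
  }
  return n;
}

/* Same loop, with the shift spelled as an unsigned multiplication by 2. */
unsigned steps_mul2(unsigned x)
{
  unsigned n = 0;
  while (x) {
    x *= 2;
    n++;
  }
  return n;
}
```

Each pair returns the same count for every unsigned input, which is why a pass that recognizes the shift form could reasonably also accept the "* 2" and "/ 2" spellings.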

RE: [PATCH] tree-optimization/114760 - check variants of >> and << in loop-niter

2024-05-11 Thread Di Zhao OS
Fixed the problems and committed to trunk. Thanks, Di Zhao > -Original Message- > From: Richard Biener > Sent: Friday, May 10, 2024 8:56 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] tree-optimization/114760 - check variants of >>

RE: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-11-05 Thread Di Zhao OS
Committed to trunk. Thanks, Di Zhao > -Original Message- > From: Jeff Law > Sent: Monday, September 30, 2024 6:28 AM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c > > > >

RE: [RFC][PATCH] Improve generating FMA by adding a widening_mul pass

2023-05-30 Thread Di Zhao OS via Gcc-patches
Sorry, I missed the recent updates on trunk regarding handling FMA. I'll measure again to see whether anything in this still helps. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Friday, May 26, 2023 3:15 PM > To: gcc-patches@gcc.gnu.org > Subject: [RFC][

RE: [PATCH] Handle FMA friendly in reassoc pass

2023-06-06 Thread Di Zhao OS via Gcc-patches
Hello Lili Cui, Since I'm also trying to improve this lately, I've tested your patch on several aarch64 machines we have, including neoverse-n1 and ampere1 architectures. However, I haven't reproduced the 6.00% improvement of 503.bwaves_r single copy run you mentioned. Could you share more inform

RE: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-07-29 Thread Di Zhao OS via Gcc-patches
Cherry-picked this to gcc-13. Thanks, Di Zhao > -Original Message- > From: Richard Sandiford > Sent: Monday, June 26, 2023 10:28 PM > To: Philipp Tomsich > Cc: Di Zhao OS via Gcc-patches ; Di Zhao OS > > Subject: Re: [PATCH] Change fma_reassoc_width tuning for

PING: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-07-11 Thread Di Zhao OS via Gcc-patches
Updated the patch in the attachment, so it can apply. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Sunday, May 29, 2022 11:59 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Biener > Subject: [PATCH v5] tree-optimization/101186 - extend FRE with "

[PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-09 Thread Di Zhao OS via Gcc-patches
Hi, The previous version of this patch tried to solve two problems at the same time. For better clarity, I'll separate them and only deal with the "nested" FMA in this version. I plan to propose another patch for avoiding badly shaped FMA (deferring FMA). Other changes: 1. Added new testcases for

RE: [PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-18 Thread Di Zhao OS via Gcc-patches
/ChangeLog: * gcc.dg/pr110279-1.c: New test. * gcc.dg/pr110279-2.c: New test. * gcc.dg/pr110279-3.c: New test. > -Original Message- > From: Di Zhao OS > Sent: Thursday, August 10, 2023 12:53 AM > To: gcc-patches@gcc.gnu.org > Cc: Richard Biener > Subjec

[PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-06-16 Thread Di Zhao OS via Gcc-patches
This patch is to fix the regressions found in SPEC2017 fprate cases on aarch64. 1. Reused code in pass widening_mul to check for nested FMA chains (those connected by MULT_EXPRs), since re-writing to parallel generates worse code. 2. Avoid re-arranging to produce fewer FMA chains that can be slo

[PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-19 Thread Di Zhao OS via Gcc-patches
This patch enables reassociation of floating-point additions on ampere1. This brings about 1% overall benefit on spec2017 fprate cases. (There are minor regressions in 510.parest_r and 508.namd_r, analyzed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .) Bootstrapped and tested on aarc

[PING][PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-07 Thread Di Zhao OS via Gcc-patches
1.3% 508.namd_r 1.58% overall 0.42% Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Friday, June 16, 2023 4:51 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] tree-optimization/110279- Check for nested FMA chains in > reassoc

[PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-10 Thread Di Zhao OS via Gcc-patches
Attached is an updated version of the patch. Based on Philipp's review, some changes: 1. Defined new enum fma_state to describe the state of FMA candidates for a list of operands. (Since the tests seem simple after the change, I didn't add predicates on it.) 2. Changed return type of conve

[RFC][PATCH] Improve generating FMA by adding a widening_mul pass

2023-05-26 Thread Di Zhao OS via Gcc-patches
As GCC's reassociation pass does not have knowledge of FMA, when transforming expression lists to parallel, it reduces the opportunities to generate FMAs. Currently there's a workaround on AArch64 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114), that is, to disable the parallelization with flo

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-02 Thread Di Zhao OS via Gcc-patches
I'm very sorry, there seems to be an encoding issue in the attachment in my last email. Attached is the new patch. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, November 16, 2021 1:24 AM > To: 'Richard Biener' > Cc: gcc-patches@gcc

[PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-10-24 Thread Di Zhao OS via Gcc-patches
Sorry for the late update. I've been on a vacation and then I spent some time updating and verifying the patch. Attached is a new version of the patch. There are some changes: 1. Store equivalences in a vn_pval chain in vn_ssa_aux, rather than in the expression hash table. (Following Richard's

PING^2: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-09-07 Thread Di Zhao OS via Gcc-patches
Gentle ping again. Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, July 12, 2022 2:08 AM > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Richard Biener' > Subject: PING: [PATCH v5] tree-optimization/101186 - extend FRE with >

[PATCH] tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into

2021-09-05 Thread Di Zhao OS via Gcc-patches
If the first predicate value is different and copied, the comparison will then be between val->result and the copied one, which seems to be a bug. That can cause inserting extra vn_pvals. Bootstrapped and tested on x86_64-unknown-linux-gnu. Regards, Di Zhao gcc/ChangeLog: * tree-ssa-s

[PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-09-16 Thread Di Zhao OS via Gcc-patches
Sorry about updating on this after so long. It took me a long time to work out a new plan and pass the tests. The new idea is to use one variable to represent a set of equal variables at some basic-block. This variable is called an "equivalence head" or "equiv-head" in the code. (There's no longer a

PING: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-09-30 Thread Di Zhao OS via Gcc-patches
Thanks, Di -Original Message- From: Gcc-patches On Behalf Of Di Zhao OS via Gcc-patches Sent: Friday, September 17, 2021 2:13 AM To: gcc-patches@gcc.gnu.org Subject: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Sorry abou

[PING] [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-11-07 Thread Di Zhao OS via Gcc-patches
Hi, Gentle ping on this. Di Zhao -Original Message- From: Di Zhao OS Sent: Monday, October 25, 2021 3:03 AM To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Hi, Att

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-11-15 Thread Di Zhao OS via Gcc-patches
Attached is the updated patch. Fixed some errors in testcases. > -Original Message- > From: Richard Biener > Sent: Wednesday, November 10, 2021 5:44 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod > Subject: Re: [PATCH v2] tree-optimization/101186

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-10-24 Thread Di Zhao OS via Gcc-patches
Hi, Attached is a new version of the patch, mainly for improving performance and simplifying the code. First, regarding the comments: > -Original Message- > From: Richard Biener > Sent: Friday, October 1, 2021 9:00 PM > To: Di Zhao OS > Cc: gcc-patches@gcc.gnu.or

RE: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-07-18 Thread Di Zhao OS via Gcc-patches
I tried to improve the patch following your advice and to catch more opportunities. Hope it'll be helpful. On 6/24/21 8:29 AM, Richard Biener wrote: > On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches patc...@gcc.gnu.org> wrote: > > I have some reservations about extending the ad-hoc "

[PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-28 Thread Di Zhao OS via Gcc-patches
This patch tries to fix the 2% regression in 510.parest_r on ampere1 in the tracker. (Previous discussion is here: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html) 1. Add testcases for the problem. For an op list in the form of "acc = a * b + c * d + acc", currently reassociation d
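To make the op list "acc = a * b + c * d + acc" concrete, here is a hedged C sketch of two ways the same sum can be associated inside a loop. It is not taken from the patch, and the function names are made up for illustration. Whether a compiler actually forms FMAs here depends on the target and on flags; the comments only mark where the loop-carried value acc enters the computation.

```c
#include <stddef.h>

/* Illustrative only; function names are hypothetical.
   acc feeds the inner multiply-add, whose result feeds the outer one, so
   if both become FMAs, both sit on the loop-carried dependency chain. */
double sum_chained(const double *a, const double *b,
                   const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc = a[i] * b[i] + (c[i] * d[i] + acc);
  return acc;
}

/* The two products are combined first, independently of acc; only the
   final addition depends on the value carried from the previous iteration. */
double sum_split(const double *a, const double *b,
                 const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++) {
    double t = a[i] * b[i] + c[i] * d[i];
    acc = t + acc;
  }
  return acc;
}
```

Both functions compute the same mathematical sum; for general floating-point inputs the results can differ in rounding, which is why this kind of reordering is normally only done when reassociation of FP operations is permitted.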

[PATCH] alias-analyis: try to find ADDR_EXPR for SSA_NAME ptr

2023-08-28 Thread Di Zhao OS via Gcc-patches
This patch tries to improve alias-analysis between an SSA_NAME and a declaration a little. For a case like: int array1[10], array2[10]; ptr1 = array1 + x; ptr2 = ptr1 + y; *ptr2 should not alias with array2. If we can't disambiguate from points-to information, this patc
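The case in the message can be written out as a small, complete C function. This is only a sketch of the situation being described, with a hypothetical function name; it shows what a compiler could conclude from the no-alias fact rather than what any GCC version actually does.

```c
/* Illustrative only; store_via_derived_pointer is a hypothetical name. */
int array1[10], array2[10];

int store_via_derived_pointer(int x, int y)
{
  int *ptr1 = array1 + x;
  int *ptr2 = ptr1 + y;   /* still derived from array1 */

  array2[0] = 1;
  *ptr2 = 2;              /* in valid code this can only write into array1 */

  /* If the compiler can prove that *ptr2 does not alias array2, this
     load may be replaced by the constant 1. */
  return array2[0];
}
```

Calling store_via_derived_pointer(1, 2) writes 2 to array1[3] and returns 1, whether or not the load is folded; the optimization only changes how the result is obtained.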

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Di Zhao OS via Gcc-patches
Hi, > -Original Message- > From: Richard Biener > Sent: Tuesday, August 29, 2023 3:41 PM > To: Jeff Law ; Martin Jambor > Cc: Di Zhao OS ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to > reduce cross backedge

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Di Zhao OS via Gcc-patches
Hi, > -Original Message- > From: Richard Biener > Sent: Tuesday, August 29, 2023 4:09 PM > To: Di Zhao OS > Cc: Jeff Law ; Martin Jambor ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to > reduce cross

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-30 Thread Di Zhao OS via Gcc-patches
Hello Richard, > -Original Message- > From: Richard Biener > Sent: Tuesday, August 29, 2023 7:11 PM > To: Di Zhao OS > Cc: Jeff Law ; Martin Jambor ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to > r

RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-09-04 Thread Di Zhao OS via Gcc-patches
> -Original Message- > From: Richard Biener > Sent: Thursday, August 31, 2023 8:23 PM > To: Di Zhao OS > Cc: Jeff Law ; Martin Jambor ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to > reduce cross backedge

[PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-09-14 Thread Di Zhao OS via Gcc-patches
This is a new version of the patch on "nested FMA". Sorry for updating this after so long, I've been studying and writing micro cases to sort out the cause of the regression. First, following previous discussion: (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html) 1. From testi

PING: [PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-11-15 Thread Di Zhao OS via Gcc-patches
Hi, I saw that Stage 1 of GCC 13 development has just ended. So is this being considered? Or should I bring this up when general development is reopened? Thanks, Di Zhao > -Original Message- > From: Di Zhao OS > Sent: Tuesday, October 25, 2022 8:18 AM > To: gcc-patches@gcc.

[PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-05-29 Thread Di Zhao OS via Gcc-patches
Hi, attached is a new version of the patch. The changes are: - Skip using temporary equivalences for floating-point values, because folding expressions can generate incorrect values. For example, operations on 0.0 and -0.0 may have different results. - Avoid inserting duplicated back-refs from val
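The point about 0.0 and -0.0 can be shown with a short C sketch. It assumes IEEE 754 floating-point semantics (for example, a C implementation following Annex F); the helper names are hypothetical. The two zeros compare equal, yet an operation applied to each gives different results, so replacing one with the other based on an equality test is not safe.

```c
/* Illustrative only; assumes IEEE 754 doubles. Names are hypothetical. */

/* Returns 1 if the two values compare equal with ==. */
int compares_equal(double a, double b)
{
  return a == b;
}

/* Reciprocal; under IEEE 754, 1/+0.0 is +infinity and 1/-0.0 is -infinity. */
double reciprocal(double x)
{
  volatile double v = x;  /* volatile keeps the division from being folded */
  return 1.0 / v;
}
```

With these helpers, compares_equal(0.0, -0.0) is true, while reciprocal(0.0) is positive and reciprocal(-0.0) is negative.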

[PATCH v3] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-13 Thread Di Zhao OS via Gcc-patches
A few minor updates on the patch: - Simplify function record_equiv_from_prev_phi_1 by removing an argument. - Fixed two small bugs that can lead to losing optimization opportunities. Thanks, Di Zhao --- Extend FRE with temporary equivalences. 2021-12-13 Di Zhao gcc/ChangeLog: PR tree-op

[PATCH v4] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-12-24 Thread Di Zhao OS via Gcc-patches
Here's a brief summary on the patch: v4 (this version): - In process_bb's condition-prediction code: update equivalence-heads if value-numbers have changed, otherwise some chances can be lost. v3 (a few minor updates): - Simplify function record_equiv_from_prev_phi_1 by removing an argument. -