This is a new version of the patch on "nested FMA".
Sorry for updating this after so long; I've been studying and
writing micro cases to sort out the cause of the regression.
First, following previous discussion:
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
1. From testi
> > > > > > > From: Richard Biener
> > > > > > > Sent: Tuesday, August 29, 2023 3:41 PM
> > > > > > > To: Jeff Law ; Martin Jambor
>
> > > > > > > Cc: Di Zhao OS ; gcc-
> > > patc...@gcc.gnu.org
> > > > > > > Subject: Re: [PATCH]
> > > > Cc: Di Zhao OS ; gcc-
> patc...@gcc.gnu.org
> > > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in
> reassoc
> > > to
> > > > > reduce cross backedge FMA
> > > > >
> > > > > On Tue, A
> Cc: Di Zhao OS ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc
> to
> > > reduce cross backedge FMA
> > >
> > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> > > wrote:
> >
FMA
>
> On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> wrote:
> >
> >
> >
> > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > This patch tries to fix the 2% regression in 510.parest_r on
> > > ampere1 in the tracker. (Prev
This patch tries to improve alias-analysis between an SSA_NAME and
a declaration a little. For a case like:
int array1[10], array2[10];
ptr1 = array1 + x;
ptr2 = ptr1 + y;
, *ptr2 should not alias with array2.
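As a hedged illustration of the case above, the following self-contained C sketch shows why a store through ptr2 cannot change array2. The function name `store_through_ptr2` and its parameters are hypothetical and not taken from the patch; the sketch assumes x + y stays within the bounds of array1, as valid pointer arithmetic requires.

```c
#include <stddef.h>

int array1[10], array2[10];

/* Hypothetical helper: ptr2 is derived only from array1, so the store
   through it cannot modify array2 (assuming x + y is in [0, 10)).  */
int store_through_ptr2(size_t x, size_t y)
{
    int *ptr1 = array1 + x;
    int *ptr2 = ptr1 + y;

    array2[0] = 1;
    *ptr2 = 2;          /* writes array1[x + y] */
    return array2[0];   /* still 1; an alias-aware compiler may fold this */
}
```

If the compiler can prove the two objects are distinct, it is free to reuse the stored value of array2[0] instead of reloading it after the store through ptr2.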
If we can't disambiguate from points-to information, this patc
This patch tries to fix the 2% regression in 510.parest_r on
ampere1 in the tracker. (Previous discussion is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
1. Add testcases for the problem. For an op list in the form of
"acc = a * b + c * d + acc", currently reassociation d
Hi,
A few updates to the patch:
1. rank_ops_for_fma: return FMA_STATE_NESTED only for a complete
FMA chain, since the regression is obvious only in this case.
2. Added new testcase.
Thanks,
Di Zhao
PR tree-optimization/110279
gcc/ChangeLog:
* tree-ssa-math-opts.cc (con
Hi,
The previous version of this patch tries to solve two problems
at the same time. For better clarity, I'll separate them and
only deal with the "nested" FMA in this version. I plan to
propose another patch to avoid badly shaped FMA (deferring FMA).
Other changes:
1. Added new testcases for
Cherry-picked this to gcc-13.
Thanks,
Di Zhao
> -Original Message-
> From: Richard Sandiford
> Sent: Monday, June 26, 2023 10:28 PM
> To: Philipp Tomsich
> Cc: Di Zhao OS via Gcc-patches ; Di Zhao OS
>
> Subject: Re: [PATCH] Change fma_reassoc_width tuning for
Attached is an updated version of the patch.
Based on Philipp's review, some changes:
1. Defined new enum fma_state to describe the state of FMA candidates
for a list of operands. (Since the tests seem simple after the
change, I didn't add predicates on it.)
2. Changed return type of conve
Update the patch so it can apply.
Tested on spec2017 fprate cases again. With options "-funroll-loops -Ofast -flto",
the improvements of the 1-copy run are:
Ampere1:
508.namd_r    4.26%
510.parest_r  2.55%
Overall       0.54%
Intel Xeon:
503.bwaves_r  1.3%
508
This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There
are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
Bootstrapped and tested on aarc
This patch is to fix the regressions found in SPEC2017 fprate cases
on aarch64.
1. Reused code in pass widening_mul to check for nested FMA chains
(those connected by MULT_EXPRs), since re-writing to parallel
generates worse code.
2. Avoid re-arranging to produce fewer FMA chains that can be slo
Hello Lili Cui,
Since I'm also trying to improve this lately, I've tested your patch on
several aarch64 machines we have, including neoverse-n1 and ampere1
architectures. However, I haven't reproduced the 6.00% improvement of
503.bwaves_r single copy run you mentioned. Could you share more inform
Sorry I've missed the recent updates on trunk regarding handling FMA.
I'll measure again if something in this still helps.
Thanks,
Di Zhao
> -Original Message-
> From: Di Zhao OS
> Sent: Friday, May 26, 2023 3:15 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC][PATCH] Improve generating
As GCC's reassociation pass does not have knowledge of FMA, when
transforming expression lists to parallel, it reduces the
opportunities to generate FMAs. Currently there's a workaround
on AArch64 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114),
that is, to disable the parallelization with flo
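A rough sketch of the effect described above, with hypothetical function names and a four-term sum chosen only for illustration: both functions compute the same value, but the comments count how many multiply-adds each association could map to if every `x * y + z` became one FMA. The counts describe the expression shapes only, not what any particular GCC version emits.

```c
/* Serial association: ((a*b + c*d) + e*f) + g*h.
   Shape: 1 multiply, then 3 multiply-adds in one dependent chain.  */
double sum4_serial(double a, double b, double c, double d,
                   double e, double f, double g, double h)
{
    return ((a * b + c * d) + e * f) + g * h;
}

/* Parallel association (width 2): (a*b + c*d) + (e*f + g*h).
   Shape: 2 multiplies, 2 multiply-adds and 1 plain add.  The two
   halves are independent, but one FMA opportunity is lost.  */
double sum4_parallel(double a, double b, double c, double d,
                     double e, double f, double g, double h)
{
    return (a * b + c * d) + (e * f + g * h);
}
```

The parallel form shortens the dependency chain at the cost of one fewer fused operation, which is the trade-off the message refers to.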
Hi,
I saw that Stage 1 of GCC 13 development has just ended. So is this
considered? Or should I bring this up when general development is
reopened?
Thanks,
Di Zhao
> -Original Message-
> From: Di Zhao OS
> Sent: Tuesday, October 25, 2022 8:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richa
Sorry for the late update. I've been on a vacation and then I
spent some time updating and verifying the patch.
Attached is a new version of the patch. There are some changes:
1. Store equivalences in a vn_pval chain in vn_ssa_aux, rather than
in the expression hash table. (Following Richard's
Gentle ping again.
Thanks,
Di Zhao
> -Original Message-
> From: Di Zhao OS
> Sent: Tuesday, July 12, 2022 2:08 AM
> To: 'gcc-patches@gcc.gnu.org'
> Cc: 'Richard Biener'
> Subject: PING: [PATCH v5] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
>
Updated the patch in the attachment, so it can apply.
Thanks,
Di Zhao
> -Original Message-
> From: Di Zhao OS
> Sent: Sunday, May 29, 2022 11:59 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener
> Subject: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence
> map" for
Hi, attached is a new version of the patch. The changes are:
- Skip using temporary equivalences for floating-point values, because
folding expressions can generate incorrect values. For example,
operations on 0.0 and -0.0 may have different results.
- Avoid inserting duplicated back-refs from val
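To illustrate the 0.0 and -0.0 remark in the first item, here is a minimal C sketch, assuming IEEE 754 floating-point arithmetic. The helper name `equal_but_not_interchangeable` is hypothetical. It shows two values that compare equal yet give different results when used in the same expression, which is why substituting one for the other on the basis of an equality test is unsafe.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a == b holds but replacing a with b
   changes the result of 1.0 / a.  Assumes IEEE 754 semantics, where
   1.0 / 0.0 is +infinity and 1.0 / -0.0 is -infinity.  */
bool equal_but_not_interchangeable(double a, double b)
{
    return a == b && 1.0 / a != 1.0 / b;
}
```

Called with 0.0 and -0.0, the comparison `a == b` is true while the two reciprocals have opposite signs.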
Here's a brief summary on the patch:
v4 (this version):
- In process_bb's condition-prediction code: update equivalence-heads if
value-numbers have changed, otherwise some opportunities can be lost.
v3 (a few minor updates):
- Simplify function record_equiv_from_prev_phi_1 by removing an argument.
-
A few minor updates on the patch:
- Simplify function record_equiv_from_prev_phi_1 by removing an argument.
- Fixed two small bugs that can lead to losing optimization opportunities.
Thanks,
Di Zhao
---
Extend FRE with temporary equivalences.
2021-12-13 Di Zhao
gcc/ChangeLog:
PR tree-op
I'm very sorry there seems to be an encoding issue in the attachment
in my last email. Attached is the new patch.
Thanks,
Di Zhao
> -Original Message-
> From: Di Zhao OS
> Sent: Tuesday, November 16, 2021 1:24 AM
> To: 'Richard Biener'
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v
Attached is the updated patch. Fixed some errors in testcases.
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, November 10, 2021 5:44 PM
> To: Di Zhao OS
> Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod
> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> "
Hi,
Gentle ping on this.
Di Zhao
-Original Message-
From: Di Zhao OS
Sent: Monday, October 25, 2021 3:03 AM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence
map" for condition prediction
Hi,
Attached is a
Hi,
Attached is a new version of the patch, mainly for improving performance
and simplifying the code.
First, regarding the comments:
> -Original Message-
> From: Richard Biener
> Sent: Friday, October 1, 2021 9:00 PM
> To: Di Zhao OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH
Thanks,
Di
-Original Message-
From: Gcc-patches
On Behalf Of Di
Zhao OS via Gcc-patches
Sent: Friday, September 17, 2021 2:13 AM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence
map" for condition prediction
Sorry abou
Sorry about updating on this after so long. It took me much time to work out a
new plan and pass the tests.
The new idea is to use one variable to represent a set of equal variables at
some basic-block. This variable is called an "equivalence head" or "equiv-head"
in the code. (There's no longer a
If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one, which seems to be a bug. That can
cause inserting extra vn_pvals.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
Regards,
Di Zhao
gcc/ChangeLog:
* tree-ssa-s
I tried to improve the patch following your advice and to catch more
opportunities. Hope it'll be helpful.
On 6/24/21 8:29 AM, Richard Biener wrote:
> On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches wrote:
>
> I have some reservations about extending the ad-hoc "