> On May 25, 2023, at 03:30, Cui, Lili via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> 
> From: Lili Cui <lili....@intel.com>
> 
> Make some changes in the reassoc pass to make it more friendly to the later
> FMA pass. Using FMA instead of mult + add reduces register pressure and the
> number of instructions retired.
> 
> There are two main changes:
> 1. Put non-mult ops and mult ops alternately at the end of the queue, which
> helps generate more FMAs and reduces the number of FMA opportunities lost when
> breaking the chain.
> 2. Rewrite the rewrite_expr_tree_parallel function to build parallel chains
> according to the given correlation width, preserving as many FMA
> opportunities as possible.
> 
> With the patch applied:
> 
> On ICX:
> 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> 503.bwaves_r   : Improved by 0.60% for single-copy.
> 507.cactuBSSN_r: Improved by 1.10% for single-copy.
> 519.lbm_r      : Improved by 2.21% for single-copy.
> No measurable changes for other benchmarks.
> 
> On aarch64:
> 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> 503.bwaves_r   : Improved by 6.00% for single-copy.
> No measurable changes for other benchmarks.

Hi Cui,

I'm seeing a 4% slowdown on 436.cactusADM from SPEC CPU2006 on
aarch64-linux-gnu (Cortex-A57) when compiling with "-O2 -flto". All other
benchmarks seem neutral to this patch, and I didn't observe the slowdown with
plain -O2 (without LTO) or with -O3.

Is this something interesting to investigate?  I'll be happy to assist.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org