This patch fixes the regressions found in SPEC2017 fprate cases on aarch64.
1. Reuse code from the widening_mul pass to check for nested FMA chains
(those connected by MULT_EXPRs), since rewriting them to parallel
generates worse code.
2. Avoid rearrangements that produce fewer FMA chains, which can be slow.
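
To make the two points concrete, here is a small C sketch of the shapes
involved. It is only an illustration: the function names are made up and
none of it is taken from the patch, and whether the compiler really fuses
each multiply-add depends on the target and on -ffp-contract.

```c
/* Hypothetical illustration only; these names do not appear in the patch. */

/* One addition chain whose operands are products.  Evaluated in
   sequence, each product can fuse with the running sum, giving
   three FMA candidates on one dependency chain.  */
double chain_seq(double a, double b, double c, double d,
                 double e, double f, double g)
{
    double t = a * b + g;
    t = c * d + t;
    return e * f + t;
}

/* The same sum split into two independent partial sums.  This is
   shorter in dependency depth, but the final add is a plain add
   and e * f is a plain multiply, so only two FMA candidates remain.  */
double chain_par(double a, double b, double c, double d,
                 double e, double f, double g)
{
    double t1 = a * b + g;
    double t2 = c * d + e * f;
    return t1 + t2;
}

/* A nested shape: the result of the inner addition chain is a
   multiplicand in the outer addition chain, so the two chains are
   connected by a multiplication.  */
double chain_nested(double a, double b, double c, double d, double e,
                    double x, double y, double z, double w)
{
    double inner = a * b + c * d + e;
    return inner * x + y * z + w;
}
```

chain_seq and chain_par compute the same value for exactly representable
inputs; they differ only in how many multiply-adds can be fused and in
how long the dependency chain is.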
Tested on ampere1 and neoverse-n1: this fixes the regressions in the
1-copy runs of 508.namd_r and 510.parest_r. While I'm still collecting data
on the x86 machines we have, I'd like to know what you think of this.
(Previously I tried to improve FMA handling by adding a widening_mul
pass before reassoc2, since that makes it easier to recognize different
patterns of FMA chains and decide whether to split them. But I suppose
handling them all in the reassoc pass is more efficient.)
Thanks,
Di Zhao
---
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
Support new mode that merely does the checking.
(struct fma_transformation_info): Moved to header.
(class fma_deferring_state): Moved to header.
(convert_mult_to_fma): Add new parameter.
* tree-ssa-math-opts.h (struct fma_transformation_info):
(class fma_deferring_state): Moved from .cc.
(convert_mult_to_fma): Add function decl.
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
(rank_ops_for_fma): Return -1 if nested FMAs are found.
(reassociate_bb): Avoid rewriting to parallel if nested FMAs are found.
pr110279-Check-for-nested-FMA-chains-in-reassoc.diff
