On 11/20/2025 11:03 AM, Konstantinos Eleftheriou wrote:
This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked. Instead of visiting each instruction multiple
times, this is now done only once.
The most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* A single iteration over all nondebug INSNs identifies the
fold-mem-roots. The DEF-chains of the fold-mem-roots are then
walked to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
the to-be-folded INSNs (fold_agnostic_insns, which don't need
to be adjusted, and fold_insns, which need their constant to
be set to zero).
* Introduction of a single-USE mode, which only collects DEFs
that have a single USE and are therefore safe to transform
(the fold-mem-root will be the final USE). This mode is fast
and always runs (unless disabled via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
multiple USEs, provided that every USE is part of some
fold-mem-root's DEF-chain. Analyzing all USEs is expensive,
so this mode is disabled for highly connected CFGs. Note that
multi-USE mode will miss some opportunities that the single-USE
mode finds (e.g. multi-USE mode fails for fold-mem-offsets-3.c).
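To make the single-USE idea concrete, here is a toy sketch (not GCC code; the insn encoding, names, and `fold_single_use` helper are all invented for illustration): starting from a memory-access root, walk the chain of constant additions feeding its address register, fold the constants of DEFs that have exactly one USE into the root's offset, and record which additions can have their constant zeroed.

```python
# Toy model: insns is a dict id -> (op, dest, src, const, n_uses).
# "add_const" means dest = src + const; "load" reads [src + const].
# This only illustrates the single-USE walk, not the real RTL-SSA pass.

def fold_single_use(insns, root):
    """Return (folded_offset, ids whose constant can be set to zero)
    for the load/store insn `root`, walking its address DEF-chain."""
    offset, folded = 0, []
    reg = insns[root][2]  # address register of the memory access
    while True:
        # Find the insn defining the current address register.
        def_id = next((i for i, ins in insns.items()
                       if ins[1] == reg and ins[0] == "add_const"), None)
        if def_id is None:
            break                      # chain ends: no foldable DEF
        _, _, src, const, n_uses = insns[def_id]
        if n_uses != 1:
            break                      # single-USE mode: bail out here
        offset += const                # fold the constant into the root
        folded.append(def_id)          # this add's constant becomes zero
        reg = src                      # continue up the DEF-chain
    return offset, folded

# r1 = r0 + 16 (one USE); r2 = r1 + 8 (one USE); load [r2 + 0]
insns = {
    0: ("add_const", "r1", "r0", 16, 1),
    1: ("add_const", "r2", "r1", 8, 1),
    2: ("load", None, "r2", 0, 0),
}
print(fold_single_use(insns, 2))  # the load can absorb offset 24
```

Multi-USE mode would differ only in the bail-out condition: instead of requiring `n_uses == 1`, it would have to prove that every USE of the DEF lies on some fold-mem-root's DEF-chain.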
The following testing was done:
* Bootstrapped and regtested on aarch64-linux, x86-64-linux and riscv64-linux.
* SPEC CPU 2017 tested on aarch64.
A compile-time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets: 464 s (user time) / 26280384 kBytes (max resident set size)
* -ffold-mem-offsets: 395 s (user time) / 26281388 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode does not have
an impact on the duration or the memory footprint.
SPEC CPU 2017 showed no significant performance impact on aarch64-linux.
Alpha was fine. m68k showed a really weird problem in the
post-processing step; I don't trust its output, so it's queued back up.
Just to be clear, I have no indication that anything bad happened on
m68k due to your patch.
jeff