On 11/20/2025 11:03 AM, Konstantinos Eleftheriou wrote:
This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked.  Instead of visiting each instruction multiple
times, this is now done only once.
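
For readers new to the pass: fold-mem-offsets folds constant address
additions into the offset field of memory accesses.  Roughly, in C-like
pseudocode (a simplified illustration, not taken from the patch):

  /* before */               /* after */
  t0 = sp + 512;             t0 = sp + 0;       /* constant zeroed; the now-
                                                   trivial add is cleaned up
                                                   by later passes */
  t1 = *(t0 + 0);            t1 = *(t0 + 512);  /* offset folded into the
                                                   memory access */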

The most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* A single iteration over all nondebug INSNs identifies the
  fold-mem-roots.  Each fold-mem-root's DEF-chain is then walked
  to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
   the to-be-folded INSNs (fold_agnostic_insns, which don't need
   to be adjusted, and fold_insns, which need their constant to
   be set to zero).
* Introduction of a single-USE mode, which only collects DEFs that
  have a single USE and therefore are safe to transform (the
  fold-mem-root will be the final USE; see the sketch after this
  list).  This mode is fast and will always run (unless disabled
  via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
  multiple USEs, provided that every USE is part of some
  fold-mem-root's DEF-chain.  The analysis of all USEs is expensive,
  so this mode is disabled for highly connected CFGs.  Note that
  multi-USE mode will miss some opportunities that the single-USE
  mode finds (e.g. multi-USE mode fails for fold-mem-offsets-3.c).
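
As a rough illustration of the single-USE walk (a minimal, self-contained
model, not code from the patch: the insn/def structs below are stand-ins
for RTL-SSA's insn_info/def_info, and collect_single_use_chain is a
hypothetical helper):

#include <cstdint>
#include <cstdio>
#include <vector>

struct insn;

struct def
{
  insn *producer;   /* instruction that creates this value */
  int num_uses;     /* RTL-SSA tracks all uses of a def */
};

struct insn
{
  int64_t offset;   /* constant added by this insn (0 if none) */
  def *input;       /* def feeding this insn's address operand */
};

/* Starting from a memory access (the fold-mem-root), walk up the
   DEF-chain.  In single-USE mode a def is only foldable when the
   current insn is its sole use, so zeroing the producer's constant
   cannot affect any other instruction.  */
static int64_t
collect_single_use_chain (insn *root, std::vector<insn *> &fold_insns)
{
  int64_t folded = 0;
  for (def *d = root->input; d && d->num_uses == 1; d = d->producer->input)
    {
      folded += d->producer->offset;
      fold_insns.push_back (d->producer);  /* constant gets set to zero */
    }
  return folded;  /* added to the memory access's displacement */
}

int
main ()
{
  /* Model of:  t0 = sp + 512;  t1 = *(t0 + 0)  */
  insn add = { 512, nullptr };
  def t0 = { &add, 1 };
  insn load = { 0, &t0 };

  std::vector<insn *> fold_insns;
  int64_t disp = load.offset + collect_single_use_chain (&load, fold_insns);
  std::printf ("new displacement: %lld, insns to zero: %zu\n",
               (long long) disp, fold_insns.size ());
}

In multi-USE mode the num_uses == 1 guard would instead have to check that
every USE of the def lies on some fold-mem-root's DEF-chain, which is what
makes that analysis more expensive.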

The following testing was done:
* Bootstrapped and regtested on aarch64-linux, x86_64-linux, and riscv64-linux.
* SPEC CPU 2017 tested on aarch64.

A compile-time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets:  464 s (user time) / 26280384 kBytes (max resident set size)
* -ffold-mem-offsets:     395 s (user time) / 26281388 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode has no impact
on compile time or the memory footprint.

SPEC CPU 2017 showed no significant performance impact on aarch64-linux.
Alpha was fine.  m68k showed a really weird problem in the post-processing
step; I don't trust its output, so it's queued back up.  Just to be clear,
I have no indication anything bad happened on m68k due to your patch.


jeff
