https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99286
Bug ID: 99286
Summary: ivopts don't select the best candidates with -Os
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gengqi at linux dot alibaba.com
Target Milestone: ---

Created attachment 50261
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50261&action=edit
-c -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -fdump-tree-ivopts-details

I have compared the assembly files and object files generated by two different versions of GCC.

One is:
$ /lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/lhome/gengq/riscv64-linux-mastertest/libexec/gcc/riscv64-unknown-linux-gnu/11.0.0/lto-wrapper
Target: riscv64-unknown-linux-gnu
Configured with: /lhome/gengq/riscv-gnu-toolchain-master/riscv-gnu-toolchain/riscv-gcc/configure --target=riscv64-unknown-linux-gnu --prefix=/lhome/gengq/riscv64-linux-mastertest --with-sysroot=/lhome/gengq/riscv64-linux-mastertest/sysroot --with-system-zlib --enable-shared --enable-tls --enable-languages=c,c++,fortran --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libsanitizer --disable-nls --disable-bootstrap --src=.././riscv-gcc --disable-multilib --with-abi=lp64d --with-arch=rv64gc 'CFLAGS_FOR_TARGET=-O2 -mcmodel=medlow' 'CXXFLAGS_FOR_TARGET=-O2 -mcmodel=medlow'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20210209 (experimental) (GCC)

cmd is:
/lhome/gengq/riscv64-linux-mastertest/bin/riscv64-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -c

The other is:
$ /lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/lhome/gengq/riscv64-linux-810test/libexec/gcc/riscv32-unknown-linux-gnu/8.1.0/lto-wrapper
Target: riscv32-unknown-linux-gnu
Configured with: /lhome/gengq/riscv-gnu-toolchain-master/riscv-gnu-toolchain/riscv-gcc/configure --target=riscv32-unknown-linux-gnu --prefix=/lhome/gengq/riscv64-linux-810test --with-sysroot=/lhome/gengq/riscv64-linux-810test/sysroot --with-newlib --without-headers --disable-shared --disable-threads --with-system-zlib --enable-tls --enable-languages=c --disable-libatomic --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libgomp --disable-nls --disable-bootstrap --src=.././riscv-gcc --with-pkgversion= --disable-multilib --with-abi=ilp32d --with-arch=rv32gc 'CFLAGS_FOR_TARGET=-O2 -mcmodel=medlow' 'CXXFLAGS_FOR_TARGET=-O2 -mcmodel=medlow' CC=gcc CXX=g++
Thread model: single
gcc version 8.1.0 ()

cmd is:
/lhome/gengq/riscv64-linux-810test/bin/riscv32-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32d -Os ivopt_os.c -fdump-tree-all-details -c

The code generated by GCC 11.0 is worse than the code generated by GCC 8.1.0. I have done some analysis and I think the difference is due to 'ivopts'. It seems that GCC 11.0 does a more detailed job in 'ivopts'.

For GCC 11.0, there are 2 best candidate sets:
One is equivalent to what GCC 8.1.0 used.
The other is the final choice of GCC 11.0, and its 'cost' is very close to that of the first.

I noticed that:
The second set includes more invariants and fewer induction variables.
The code implementation prefers to use iv, and this preference can sway the final choice because the differences are minimal.

So, why prefer iv? Is there any better treatment here?

From my experience, the inv variables are more atomic and have more potential to be optimized. But this also means that an inv may generate more intermediate variables if it is not optimized.
In this case, we chose to use more invs and also created more intermediate variables, which ended up overflowing the registers.

I'm not sure my analysis has hit the nail on the head, and I'd like to try to find a better solution.
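
To make the trade-off between induction variables and invariants concrete, here is a minimal sketch. It is a hypothetical kernel written for illustration only: it is not the attached ivopt_os.c, the function names are made up, and neither variant is claimed to be what ivopts actually produces for the test case. Both functions compute the same result; they differ only in which values change per iteration (iv) and which must stay live across the loop (inv).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel for illustration only; NOT the attached
   ivopt_os.c, and not a dump of what ivopts generates. */

/* Variant A: more induction variables, fewer invariants.
   Both pointers advance every iteration; the only loop-invariant
   value that must stay live is the end pointer. */
void add_more_ivs(int32_t *dst, const int32_t *src, size_t n)
{
    const int32_t *end = src + n;   /* invariant */
    while (src != end) {
        *dst += *src;               /* the IVs are used directly as addresses */
        dst++;                      /* IV 1 */
        src++;                      /* IV 2 */
    }
}

/* Variant B: fewer induction variables, more invariants.
   A single counter is the only IV; dst, src and n all stay live as
   invariants, and every address is rebuilt as base + scaled index. */
void add_more_invariants(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {   /* single IV */
        dst[i] += src[i];              /* needs dst + 4*i and src + 4*i */
    }
}
```

On a target whose loads and stores take only a base register plus an immediate offset, as in the RISC-V base ISA, variant B has to form dst + 4*i and src + 4*i in temporaries on each iteration unless a later pass simplifies them. Those temporaries are the kind of intermediate variables described above, and they add to register pressure alongside the live invariants.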