| Field | Value |
|---|---|
| Issue | 144858 |
| Summary | [LSR][slow compilation] LSR 10000x slower for some X86-64 architectures than others |
| Labels | new issue |
| Assignees | |
| Reporter | jeanPerier |

When compiling a Fortran test with flang ([source available here](https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0108/0108_0184.f90)), compilation is more than 1000 times slower when targeting most X86-64 CPUs (`icelake-server`, `znver2`, `znver3`, `emeraldrapids`, `skylake`) than when targeting `znver4` or `znver5`.
The slow compilations take more than 100 s, and 99.9% of that time is spent in the "Loop Strength Reduction" pass according to `-ftime-report`.
It looks like a lot of time is spent in `CompareSCEVComplexity` under `llvm::ScalarEvolution::getAddExpr` called from `LSRInstance::GenerateReassociationsImpl`.
Attached are:
- [clang_repro.ll.txt](https://github.com/user-attachments/files/20815720/clang_repro.ll.txt), which contains the IR produced by flang. The compile-time difference can be seen by comparing `clang -O3 -march=znver4` with `clang -O3 -march=icelake-server`.
- [lsr_repro.ll.txt](https://github.com/user-attachments/files/20815717/lsr_repro.ll.txt), which is the IR taken from a slow compilation right before LSR. The slow LSR step can be reproduced from it with `opt -loop-reduce`.
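
The two reproductions above could be scripted roughly as follows. This is a minimal sketch, assuming the attachments were saved locally as `clang_repro.ll` and `lsr_repro.ll` (with the `.txt` suffix removed) and that `clang` and `opt` from the same LLVM build are on `PATH`. The helper names `clang_cmd`, `opt_cmd` and `run_timed` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: local file names and helper names are assumptions,
# not part of the original report.
CLANG="${CLANG:-clang}"
OPT="${OPT:-opt}"

# Print the clang command line for a given -march value ($1).
clang_cmd() {
    printf '%s -O3 -march=%s -ftime-report -c clang_repro.ll -o /dev/null' "$CLANG" "$1"
}

# Print the opt command line that runs only LSR on the pre-LSR IR.
opt_cmd() {
    printf '%s -loop-reduce -time-passes -disable-output lsr_repro.ll' "$OPT"
}

# Run a command line ($1) and report wall-clock time in whole seconds.
run_timed() {
    start=$(date +%s)
    sh -c "$1"
    end=$(date +%s)
    echo "elapsed: $((end - start)) s"
}

# Only run when the tools and inputs are actually present.
if command -v "$CLANG" >/dev/null 2>&1 && [ -f clang_repro.ll ]; then
    for cpu in znver4 icelake-server; do
        echo "== clang -march=$cpu =="
        run_timed "$(clang_cmd "$cpu")"
    done
fi
if command -v "$OPT" >/dev/null 2>&1 && [ -f lsr_repro.ll ]; then
    echo "== opt -loop-reduce =="
    run_timed "$(opt_cmd)"
fi
```

Whole-second resolution is enough here, since the gap being measured is between a near-instant compile and one that takes more than 100 s.
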
Note that `lsr_repro.ll` does not contain CPU-specific attributes, so the architecture does not seem to affect LSR speed directly. Rather, it changes the IR that reaches LSR through the earlier optimizations, and LSR chokes on that IR even though it looks reasonable (a bit more than 1000 IR ops). There is likely quadratic behavior somewhere.
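
One way to double-check that an IR file carries no CPU-specific function attributes is to search it for the `"target-cpu"` and `"target-features"` attribute strings. The sketch below assumes the attachment was saved locally as `lsr_repro.ll`; the helper name `count_cpu_attrs` is hypothetical.

```shell
#!/bin/sh
# Sketch only: the local file name and helper name are assumptions.

# Print the number of lines in IR file $1 that mention a
# "target-cpu" or "target-features" function attribute.
count_cpu_attrs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status.
    grep -c -E '"target-(cpu|features)"' "$1" || true
}

if [ -f lsr_repro.ll ]; then
    echo "lines with CPU-specific attributes: $(count_cpu_attrs lsr_repro.ll)"
fi
```

A count of 0 would be consistent with the observation that the slowdown comes from the shape of the IR rather than from target information visible to LSR.
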
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs