jurahul wrote:
I did two sets of experiments, but the data is inconclusive as to whether this
causes a real compile-time regression.
1. Build MLIR verbosely and capture all mlir-tblgen command lines to a file:
ninja -C build check-mlir --verbose | tee build_log.txt
grep "NATIVE/bin/mlir-tblgen " build_log.txt | cut -d ' ' -f 2- > mlir-tablegen-commands.txt
2. Build both the baseline and new versions of LLVM/MLIR in two different
paths, "upstream_clean" and "upstream_llvm".
3. Use the attached script to run these captured commands with --time-phases
and measure the total time.
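(The attached script is not reproduced here, but its core loop can be sketched
roughly as below. This is a minimal Python sketch, not the actual script; the
`upstream_clean/`/`upstream_llvm/` prefixes and the driver logic are
assumptions based on steps 1-2.)

```python
import os
import statistics
import subprocess
import time

def percent_change(base: float, new: float) -> float:
    """Percent change of `new` relative to `base` -- the last number
    printed after each "Total time" line below."""
    return (new - base) / base * 100.0

def time_command(cmd: str, repetitions: int = 20) -> float:
    """Run one captured mlir-tblgen command line `repetitions` times and
    return the mean wall time in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, capture_output=True, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)

# Hypothetical driver: sum the per-target times for each of the two
# build trees from step 2 and report the aggregate delta.
if os.path.exists("mlir-tablegen-commands.txt"):
    with open("mlir-tablegen-commands.txt") as f:
        commands = [line.strip() for line in f if line.strip()]
    base = sum(time_command("upstream_clean/" + c) for c in commands)
    new = sum(time_command("upstream_llvm/" + c) for c in commands)
    print(f"Total time {base:.4f} {new:.4f}")
    print(f"{percent_change(base, new):.4f}")
```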
4. Establish the baseline variance by running the script to compare the
baseline against itself:
Total time 4.2302 4.2573
0.6406
So the baseline variance is 0.6%, with each command run 20 times. Note that
for individual targets the variance is quite high, up to 100% for some of them.
5. Establish the "new" variance by running the script to compare the new build
against itself:
Total time 4.2829 4.2531
-0.6958
Again, 0.6% variance.
6. Run the baseline against the new build:
Total time 4.1745 4.2864
2.6806
This suggests a 2.6% regression. However, the individual data is quite noisy:
for individual targets, the variance can be quite high, up to 100%.
7. Add a FormatVariadic benchmark to test formatv() with 1-5 substitutions
(which covers the common usage in LLVM), and run it on both baseline and new:
./build/benchmarks/FormatVariadic --benchmark_repetitions=20
Baseline:
BM_FormatVariadic_mean 1063 ns 1063 ns 20
New:
BM_FormatVariadic_mean 1097 ns 1097 ns 20
This is a ~3.2% regression in formatv() alone.
The benchmark I added was:
```C++
#include "benchmark/benchmark.h"
#include "llvm/Support/FormatVariadic.h"
using namespace llvm;
// Benchmark formatv() with 1-5 substitutions.
static void BM_FormatVariadic(benchmark::State &state) {
for (auto _ : state) {
// Exercise formatv() with several valid replacement options.
formatv("{0}", 1).str();
formatv("{0}{1}", 1, 1).str();
formatv("{0}{1}{2}", 1, 1, 1).str();
formatv("{0}{1}{2}{3}", 1, 1, 1, 1).str();
formatv("{0}{1}{2}{3}{4}", 1, 1, 1, 1, 1).str();
}
}
BENCHMARK(BM_FormatVariadic);
BENCHMARK_MAIN();
```
The compile-time data collected from the mlir-tblgen runs is quite noisy for
individual targets, though the aggregated results seem stable. I wonder whether
that means it's not really capturing a small compile-time delta correctly. As
an example:
```
mlir/Dialect/MemRef/IR/MemRefOps.cpp.inc 0.0106 0.0119 12.2642%
mlir/include/mlir/IR/BuiltinOps.cpp.inc 0.0048 0.0042 -12.5000%
```
So within the same run, one target shows +12% and another shows -12%.
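Swings of that magnitude are expected at these tiny absolute times; a quick
arithmetic check using the numbers above (plain Python, for illustration):

```python
def percent_change(base: float, new: float) -> float:
    """Percent change of `new` relative to `base`."""
    return (new - base) / base * 100.0

# Per-target times are only a few milliseconds, so ~1 ms of jitter
# already shows up as a double-digit percentage swing.
print(percent_change(0.0106, 0.0119))  # MemRefOps.cpp.inc: ~ +12.26%
print(percent_change(0.0048, 0.0042))  # BuiltinOps.cpp.inc: ~ -12.5%
```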
The other line of thinking is that this validation is an aid to developers, so
enabling it only in Debug builds may be good enough to catch issues. I am
attaching the script and the captured mlir-tblgen commands used by the script
below.
[mlir-tablegen-commands.txt](https://github.com/user-attachments/files/16770614/mlir-tablegen-commands.txt)
[ct_formatv.txt](https://github.com/user-attachments/files/16770618/ct_formatv.txt)
https://github.com/llvm/llvm-project/pull/105745
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits