[clang] [lld] [llvm] [Windows] Add support for emitting PGO/LTO magic strings in the Windows PE debug directory (PR #114260)

Mikołaj Piróg via cfe-commits Wed, 06 Nov 2024 07:03:01 -0800

mikolaj-pirog wrote:

> > I would like to benchmark `lld` after this change, since I have added a 
> > loop that goes through every section of every object file. Could someone 
> > point in the direction of a good benchmark for that? I was thinking I can 
> > benchmark on the linking of `clang` or any other big project as a reference
> 
> I think benchmarking `clang.exe` itself would be a good testbed. Build the 
> toolchain once in Release (first stage) then in another build folder, build 
> it a second time (second stage), but using `clang-cl.exe` and `lld-link.exe` 
> from the first build folder. Use ninja not MSBuild. Once the second stage has 
> completed, delete `clang.exe` from output folder and pass `ninja clang -v -d 
> keeprsp` on the command-line. That will show the LLD command line which you 
> can re-run and profile. You can also use `lld-link ... --time-trace` and add 
> a more specific `llvm::TimeTraceScope` to enclose the code that parses all 
> the sections.
> 
> If you have trouble building all this I can provide more detailed 
> instructions, please let me know.


Thanks for the suggestions. I have roughly benchmarked this, and the change 
has essentially no impact on lld performance. The overall runtime is 
the same (about 2s for a RelWithDebInfo build). I have also profiled it 
under VTune, and the function in question takes too little time for VTune to 
record any valuable data (it reports 0.0s in both cases). The function 
that calls createMiscChunks (where my change resides), Writer::run(), doesn't 
appear among the ~70 most expensive functions, and it also calls a bunch of 
other stuff on top of createMiscChunks. I have used the `--time-trace` 
functionality to measure more accurately: a script runs the link repeatedly, 
saves the trace file each time, and greps the timer for the whole 
`Writer::run()` function. Here are the results (main first, my change second):
![image](https://github.com/user-attachments/assets/c82822e1-4d89-49cd-896f-59ebc7e54cfd)
![Screenshot 2024-11-06 154022](https://github.com/user-attachments/assets/ae498d52-1409-44d9-b810-a1630604f25b)

So the whole `Writer::run()` function is slower by 10ms (comparing 
best vs best and worst vs worst). I think this change introduces a minuscule 
slowdown compared to the whole linker machinery. I don't think it's 
necessary to measure under `hyperfine` or any other benchmarking tool, given 
the results presented (even if my results are off by 4x, the slowdown is still 
minuscule).
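
For anyone who wants to repeat the measurement, here is a minimal sketch of the kind of script described above, not the one actually used. It assumes the trace files are in the Chrome trace-event JSON format that `--time-trace` writes (a top-level `traceEvents` array whose complete events have `"ph": "X"` and a `dur` in microseconds). The helper names and the event name passed in are hypothetical; check a real trace for the exact name of the timer you care about.

```python
import json


def event_durations_ms(trace, event_name):
    """Return the duration in ms of every complete ("X") event with this exact name."""
    return [
        ev["dur"] / 1000.0  # time-trace durations are in microseconds
        for ev in trace.get("traceEvents", [])
        if ev.get("ph") == "X" and ev.get("name") == event_name
    ]


def summarize(trace_paths, event_name):
    """Sum the named event per trace file, then report best/worst/mean across runs."""
    totals = []
    for path in trace_paths:
        with open(path, encoding="utf-8") as f:
            trace = json.load(f)
        totals.append(sum(event_durations_ms(trace, event_name)))
    return {
        "best": min(totals),
        "worst": max(totals),
        "mean": sum(totals) / len(totals),
    }
```

Calling `summarize` once on the traces from main and once on the traces from the patched build gives the best-vs-best and worst-vs-worst numbers to compare. Matching on the exact event name keeps the aggregated `Total ...` entries in the trace out of the sum.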


https://github.com/llvm/llvm-project/pull/114260
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
