Does gcc have different inlining heuristics on different platforms?

Julian Waters via Gcc Mon, 31 Mar 2025 04:19:25 -0700

Hi all,

I've been trying to chase down an issue that's been driving me insane
for a while now. It has to do with the flatten attribute being
combined with LTO. I've heard that flatten and LTO are a match made in
hell (someone else's words, not mine). From what I observe, several
methods marked as flatten on Linux compile to an acceptable size with
a reasonable amount of inlining. On Windows, however, the exact same
methods have their callees inlined so aggressively that they reach
sizes of 5MB per method! Something seems to differ in how inlining
works on the two platforms. What are the differences (if any) between
Linux and Windows when it comes to inlining, particularly involving
the flatten attribute? Is there a list of differences that is easily
accessible somewhere, or alternatively, is there somewhere in the gcc
source where the heuristics are defined that I can decipher?
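
For readers unfamiliar with the attribute, here is a minimal sketch of
what flatten requests. The function names are hypothetical stand-ins
and are not taken from the code base in question. Per the GCC
documentation, every call inside a function marked flatten is inlined
if possible, so the body can grow far beyond what the ordinary
inlining limits would allow once the callees themselves contain calls.

```cpp
// Hypothetical helpers: they only stand in for real callees.
static int leaf(int x) { return x * 2 + 1; }
static int middle(int x) { return leaf(x) + leaf(x + 1); }

// flatten asks GCC to inline every call in this function's body where
// possible, instead of applying the usual per-call size heuristics.
__attribute__((flatten))
int hot_path(int x) {
    return middle(x) + middle(x + 2);
}
```

With a deep call tree under the flattened function, the final size
depends on how many callees are visible for inlining, which is where
LTO can change the picture.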


Here's one such example of the differences between Linux and Windows
(both were compiled with the same optimization settings, -O3 and
-flto=auto):

Linux:
00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int)

Windows:
0000000296f9b0c0 0000000000642d40 T G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone .constprop.0]
0000000295125480 0000000000630080 T G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
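
To put numbers on the listings, the sketch below converts the size
fields to bytes. It assumes the second column is the symbol size in
hexadecimal, as `nm --print-size` prints it; the helper names are
hypothetical. Under that assumption the Linux symbol is 25225 bytes,
while the two Windows symbols are 6565184 and 6488192 bytes, each a
bit over 6 MB.

```cpp
#include <cstdint>
#include <string>

// Parse a hexadecimal size field as it appears in the listings.
// Assumption: the field is the symbol size in bytes, written in hex.
std::uint64_t parse_hex_size(const std::string& field) {
    return std::stoull(field, nullptr, 16);
}

// Convert a byte count to mebibytes for easier comparison.
double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For example, `to_mib(parse_hex_size("0000000000642d40"))` gives the
size of the constprop clone in MiB.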


Thanks in advance for the help, and for humouring my question.

best regards,
Julian
