Thanks for the quick reply. I'll ask the people working on the Linux parts to try compiling and linking the codebase with -fno-use-linker-plugin to see what happens. It's a bit disheartening to hear that LTO support on Windows is behind Linux, though. I'd help get it up to speed if I could, but I don't even know where to start or look :(
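To compare the results of such an experiment across builds, the symbol sizes can be pulled out of the symbol listings and converted from hex to bytes. The following is only a minimal sketch: it assumes the listings are in the "address size type name" layout of the sample lines quoted below (as produced by something like nm -S --demangle), and the names parse_nm_sizes and largest are hypothetical helpers, not part of any existing tool.

```python
import re

# One listing line: address, size (both hex), a type letter, then the symbol name.
# Assumption: sizes are printed as 8 or 16 hex digits, as in the quoted samples.
NM_LINE = re.compile(r"^([0-9a-fA-F]{8,16})\s+([0-9a-fA-F]{8,16})\s+(\S)\s+(.+)$")


def parse_nm_sizes(text):
    """Return (symbol_name, size_in_bytes) for every line that carries a size."""
    symbols = []
    for line in text.splitlines():
        match = NM_LINE.match(line.strip())
        if match:  # lines without a size column (e.g. undefined symbols) are skipped
            _addr, size_hex, _kind, name = match.groups()
            symbols.append((name, int(size_hex, 16)))
    return symbols


def largest(symbols, count=10):
    """Return the `count` biggest symbols, largest first."""
    return sorted(symbols, key=lambda item: item[1], reverse=True)[:count]


# Sample lines copied from the quoted message, with the mail wrapping undone.
LINUX = (
    "00000000010b12d0 0000000000006289 t "
    "G1ParScanThreadState::trim_queue_to_threshold(unsigned int)"
)
WINDOWS = (
    "0000000296f9b0c0 0000000000642d40 T "
    "G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone .constprop.0]\n"
    "0000000295125480 0000000000630080 T "
    "G1ParScanThreadState::trim_queue_to_threshold(unsigned int)"
)

if __name__ == "__main__":
    for label, dump in (("Linux", LINUX), ("Windows", WINDOWS)):
        for name, size in largest(parse_nm_sizes(dump)):
            print(f"{label:8} {size:>10} bytes  {name}")
```

Running the same script over a listing from a Linux build made with -fno-use-linker-plugin would show whether the flattened methods grow towards the Windows sizes.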
best regards,
Julian

On Mon, Mar 31, 2025 at 8:09 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 1:20 PM Julian Waters via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Hi all,
> >
> > I've been trying to chase down an issue that's been driving me insane
> > for a while now. It has to do with the flatten attribute being
> > combined with LTO. I've heard that flatten and LTO are a match made in
> > hell (Someone else's words, not mine), but from what I observe,
> > several methods marked as flatten on Linux compile to an acceptable
> > size with ok amount of inlining, but on Windows however... The exact
> > same methods marked as flatten have their callees inlined so
> > aggressively that they reach sizes of 5MB per method! Something seems
> > to be different between how inlining works on the 2 platforms, what
> > are the differences (If any) between Linux and Windows when it comes
> > to inlining, particularly involving the flatten attribute? Is there a
> > list of differences that is easily accessible somewhere, or
> > alternatively is there somewhere in the gcc source where the
> > heuristics are defined that I can decipher?
> >
> > Here's one such example of the differences between Linux and Windows
> > (Both were compiled with the same optimization settings, -O3 and
> > -flto=auto):
> >
> > Linux:
> > 00000000010b12d0 0000000000006289 t
> > G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
> >
> > Windows:
> > 0000000296f9b0c0 0000000000642d40 T
> > G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone
> > .constprop.0]
> > 0000000295125480 0000000000630080 T
> > G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
> >
> >
> > Thanks in advance for the help, and for humouring my question
>
> The main difference is that LTO on Linux can use the linker plugin
> to derive information about how TUs are combined while on Windows
> we're using the "collect2 path" which is quite unmaintained and which
> gives imprecise information. This can already result in quite different
> inlining. You can "simulate" that on Linux with -fno-use-linker-plugin
> (only for experimenting, don't use this unless necessary).
>
> Richard.
>
> >
> > best regards,
> > Julian