ychen added a comment. In D126341#3554286 <https://reviews.llvm.org/D126341#3554286>, @rnk wrote:
> Well, I guess we're out of luck, but that seems like a very poorly considered > requirement from the standard. If we can't use comdats for inline variables, > every time you include a header with a dynamically initialized variable, it > will generate extra initialization code in every TU that cannot be optimized > away. This reminds me of the problems people used to have where every TU > including <iostream> emitted extra initialization code. > > I think we have two options: > > 1. Full conformance: Stop using comdats altogether and suffer costs in code > size and startup time > 2. Partial / compromised conformance: Provide ordering guarantees between > global_ctors entries so that we can ensure that inline variables in headers > have the expected initialization order I'd prefer option 2 due to the code size/startup time cost. About enforcing the order in global_ctors (and maybe also global_dtors), how about adding an integer `order` field to each entry in global_ctors. Then let the IR verifier check the order is non-descending (passes are allowed to reorder entries with the same `order`). Another guarantee needed for option 2 is that the linker consistently picks the first COMDAT. I think this is getting out of our hands but this assumption *could* be made in practice? In very rare cases, if a linker changes this behavior, we'll have to rely on user reports to find out. ================ Comment at: clang/lib/CodeGen/CGDeclCXX.cpp:563 + } else if (IsInstantiation || getContext().GetGVALinkageForVariable(D) == GVA_DiscardableODR || D->hasAttr<SelectAnyAttr>()) { ---------------- rsmith wrote: > rnk wrote: > > rsmith wrote: > > > rnk wrote: > > > > @rsmith , if inline global variable initialization has ordering > > > > requirements, we have a bug, because I believe this GVA_DiscardableODR > > > > codepath handles them, and we come through here to give them separate > > > > initializers in llvm.global_ctors. See the example with two separate > > > > global_ctors entries on godbolt: > > > > https://gcc.godbolt.org/z/5d577snqb > > > > > > > > As long as LLVM doesn't provide ordering guarantees about same priority > > > > initializers in global_ctors, inline globals have the same problems as > > > > template instantiations. IMO whatever solution we use to order inline > > > > globals should be used for template instantiations. > > > > > > > > Intuitively, that means LLVM should promise to run global_ctors in > > > > left-to-right order, and if all TUs instantiate initializers in the > > > > same order, everything should behave intuitively. > > > > > > > > The question then becomes, why doesn't this work already? > > > > Intuitively, that means LLVM should promise to run global_ctors in > > > > left-to-right order > > > > > > I don't think that's sufficient, due to the way we use COMDATs to discard > > > duplicate global initializers. Consider: > > > > > > TU 1 defines inline variable A. > > > TU 2 defines inline variable A and then inline variable B. > > > The standard guarantees that A is initialized before B in this scenario. > > > But if (somehow) the linker picks the definition of A from TU 2, but > > > orders the initializers from TU 1 first, then the resulting global_ctors > > > order will be B then A, which is not allowed. > > > > > > Either we need a guarantee that linkers will use the same ordering > > > between objects when picking COMDATs as when concatenating `.init_array`, > > > or we need to stop using the COMDAT trick for them (and either make LLVM > > > respect the `@llvm.global_ctors` order or coalesce all inline variable > > > initializers into the same function we run non-inline initializers from > > > or something). Getting that guarantee seems like the best path to me, > > > since the other option will presumably mean we check the guard variable > > > once on startup for each TU that defines the variable, and it's something > > > I expect linkers already happen to guarantee in practice. > > > > > > > IMO whatever solution we use to order inline globals should be used for > > > > template instantiations. > > > > > > That sounds like it would regress what globalopt is able to optimize, for > > > no gain in conformance nor perhaps any real gain in initialization order > > > guarantees. The inline variable case is different, both because we can > > > guarantee an initialization order and because the standard requires us to > > > do so. Should we add complexity to the compiler whose only purpose is to > > > mask bugs, if that complexity doesn't actually define away the > > > possibility of bad behavior? > > > > > > If we can actually describe a rule that we provide for initialization > > > order of instantiated variables, and we can easily implement that rule > > > and be confident we won't want to substantially weaken it later, and we > > > can thereby assure our users that we will satisfy that rule, then I think > > > that could be interesting, but anything less than that doesn't seem > > > worthwhile to me. > > > > > > > The question then becomes, why doesn't this work already? > > > > > > It looks like it mostly does. > > > > > > One factor here is instantiation order. When we instantiate a variable, > > > we add the variables that it references to our "to be instantiated" list, > > > which is processed later: > > > ``` > > > template<int N> int Fib = Fib<N - 2> + Fib<N - 1>; > > > template<> int Fib<0> = 0; > > > template<> int Fib<1> = 1; > > > ``` > > > * instantiating `Fib<5>` will append `Fib<3>` and `Fib<4>` to the list > > > * then we visit `Fib<3>` and append `Fib<1>` and `Fib<2>` to the list > > > * then we visit `Fib<4>` and add no new entries > > > > > > We pass declarations to the consumer when we're done with the > > > instantiation step. *Sometimes* this includes instantiating variables > > > referenced by that variable, and *sometimes* it doesn't. The difference > > > is whether we're performing "recursive" instantiation or not. > > > > > > When we're performing immediate instantiation of a variable (either > > > because it was explicitly instantiated, or because we might need its > > > value immediately because it might be usable in constant expressions), > > > our instantiation step is non-recursive. We just add declarations to > > > `Sema`'s "to be instantiated at end of TU" list. This is at least a > > > little important semantically: we allow a matching specialization to be > > > declared after the first use wherever possible. In that case, we'll pass > > > a declaration to the consumer before we've instantiated the things it > > > references. > > > > > > When we're performing the end-of-TU instantiation of all referenced > > > template specializations, we do that recursively, and that means that we > > > will instantiate any referenced variables *before* we pass the > > > referencing variable to the AST consumer. > > > > > > You can see this happening here: https://godbolt.org/z/4sj1Y7W4G (look at > > > the order in which we get the warnings: for `Fib<5>` then `x` then > > > `Fib<3>`, then `Fib<2>`, then `Fib<4>`, and note that we initialize > > > `Fib<5>` first, because it's the first thing added to the consumer. Then > > > `Fib<2>`, `Fib<3>`, and `Fib<4>` get passed to the consumer *in that > > > order*, because that is the order in which the instantiations of their > > > definitions *finish*. But `Fib<5>`'s instantiation finishes first, > > > because that's a non-recursive instantiation. > > > > > > Without the explicit instantiation, we pass the variables to the AST > > > consumer in the order `Fib<2>`, `Fib<3>`, `Fib<4>`, `Fib<5>`: > > > https://godbolt.org/z/sax1dvh7z leading to an "intuitive" result. They > > > end up dependency-ordered because that's the order in which their > > > instantiations happen to finish. (Example with a static data member: > > > https://godbolt.org/z/9fsbPvWTP) > > > > > > Another factor (and the reason why my examples are working and @ychen's > > > very similar examples are not) is `CodeGenModule`'s deferred emission of > > > variables. If an instantiated variable has an initializer with no side > > > effects, `CodeGenModule` won't emit it unless it emits a reference to it > > > (there's no point emitting something that will just be discarded by the > > > linker). And `CodeGenModule`'s process for emitting deferred declarations > > > does all kinds of reordering. The way this works is: > > > > > > `CodeGenModule` is handed the globals, sees they don't need to be > > > emitted, and ignores them. Then: > > > > > > * it emits a reference to `Fib<5>` and decides that `Fib<5>` needs to be > > > emitted and adds it to a queue of deferred declarations to emit > > > * it emits the deferred declaration `Fib<5>`, which adds `Fib<3>` and > > > `Fib<4>` to a new queue for things used by `Fib<5>` > > > * it emits the things used by `Fib<5>`: `Fib<3>` and `Fib<4>` > > > * emitting `Fib<3>` adds `Fib<2>` to a new queue for things used by > > > `Fib<3>`, so `Fib<2>` is emitted next > > > * emitting `Fib<4>` adds nothing new to the queue > > > > > > So when the initializers don't have side-effects, the variables are > > > initialized in the order `Fib<5>`, `Fib<3>`, `Fib<2>`, `Fib<4>`. > > > > > > So the "reversing" has at least two sources (beyond anything that > > > globalopt or the linker might do): > > > * non-recursive instantiations in `Sema` will cause an instantiated > > > variable to be passed to the consumer before the things it references; > > > this can mostly be avoided by not using explicit instantiations > > > * instantiated variables with side-effect-free initializers have their > > > initializers emitted on use rather than in instantiation order; this can > > > be avoided by building with `-femit-all-decls` or by adding a dummy > > > side-effect to the initializer > > > > > > If we want to fix just the `CodeGen` side of this, I think the thing to > > > do would be to follow the model that `CodeGen` uses for ordered > > > initialization (it tracks a `DelayedCXXInitPosition` map giving the order > > > in which the variables should be initialized, if their initializers are > > > actually emitted). We could do the same thing for instantiated variables, > > > allocating each one handed to CodeGen a slot which either gets filled in > > > with that initializer, or doesn't get emitted if the variable is not > > > emitted. But, as noted above, I'm not convinced it's worth it unless this > > > leads to some actual user-facing behavior guarantee. > > > TU 1 defines inline variable A. > > > TU 2 defines inline variable A and then inline variable B. > > > The standard guarantees that A is initialized before B in this scenario. > > > But if (somehow) the linker picks the definition of A from TU 2, but > > > orders the initializers from TU 1 first, then the resulting global_ctors > > > order will be B then A, which is not allowed. > > > > Yes, but we only get the ordering guarantee if A is defined before B in all > > TUs, and both MSVC and Clang seem to emit all inline globals with dynamic > > initializers, even if they are not ODR used: > > https://gcc.godbolt.org/z/MhKvGqTez > > > > So, in your example, TU1 must not have a definition of B, meaning there is > > no guarantee of ordering. > > > > All we need to do to fix the inline variable conformance bug is for LLVM to > > guarantee an initialization order in global_ctors. However, the bug isn't > > observable in practice because inline globals require guard variables, and > > globalopt can't optimize those yet. > > > > I don't want to get too deep into the clang implementation details. I trust > > you that it's complicated. My perspective is just that, if we have a > > working ordering solution for inline variables, we should use it, if it is > > simple and low cost, for instantiated variables. It sounds to me like > > ordering instantiations is not cheap and easy, so we shouldn't do it. > > > > And, this change in particular doesn't address many cases in practice. > > > TU 1 defines inline variable A. > > > TU 2 defines inline variable A and then inline variable B. > > > The standard guarantees that A is initialized before B in this scenario. > > > But if (somehow) the linker picks the definition of A from TU 2, but > > > orders the initializers from TU 1 first, then the resulting global_ctors > > > order will be B then A, which is not allowed. > > > > Yes, but we only get the ordering guarantee if A is defined before B in all > > TUs, > > That's not quite right. We get the guarantee if A is defined before B in all > TUs where B is defined. It's OK for there to be TUs where A is defined but B > is not. (See the relevant [language > rule](https://eel.is/c++draft/basic.start.dynamic#3.1), which requires only > that "for every definition of [B] there exists a definition of [A]" and not > the other way around.) > > The intended user model is that it's OK for B to rely on A if they're defined > in the same header file, or if A is defined in a header file that B includes. > And it's OK if there's a source file that includes A's header but not B's. > The intended implementation model is that inline variables are initialized > (as if) in definition order within each TU they are defined in, with a guard > variable preventing repeated initialization. > TU 1 defines inline variable A. > TU 2 defines inline variable A and then inline variable B. > The standard guarantees that A is initialized before B in this scenario. But > if (somehow) the linker picks the definition of A from TU 2, but orders the > initializers from TU 1 first, then the resulting global_ctors order will be B > then A, which is not allowed. If I understand this example correctly, a consistent linker that always picks the last COMDAT would also produce the wrong result. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D126341/new/ https://reviews.llvm.org/D126341 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits