ychen added a comment.
In D126341#3554286 <https://reviews.llvm.org/D126341#3554286>, @rnk wrote:
> Well, I guess we're out of luck, but that seems like a very poorly considered
> requirement from the standard. If we can't use comdats for inline variables,
> every time you include a header with a dynamically initialized variable, it
> will generate extra initialization code in every TU that cannot be optimized
> away. This reminds me of the problems people used to have where every TU
> including <iostream> emitted extra initialization code.
>
> I think we have two options:
>
> 1. Full conformance: Stop using comdats altogether and suffer costs in code
> size and startup time
> 2. Partial / compromised conformance: Provide ordering guarantees between
> global_ctors entries so that we can ensure that inline variables in headers
> have the expected initialization order
I'd prefer option 2 due to the code size/startup time cost. About enforcing the
order in global_ctors (and maybe also global_dtors), how about adding an
integer `order` field to each entry in global_ctors. Then let the IR verifier
check the order is non-descending (passes are allowed to reorder entries with
the same `order`). Another guarantee needed for option 2 is that the linker
consistently picks the first COMDAT. I think this is getting out of our hands
but this assumption *could* be made in practice? In very rare cases, if a
linker changes this behavior, we'll have to rely on user reports to find out.
================
Comment at: clang/lib/CodeGen/CGDeclCXX.cpp:563
+ } else if (IsInstantiation ||
getContext().GetGVALinkageForVariable(D) == GVA_DiscardableODR ||
D->hasAttr<SelectAnyAttr>()) {
----------------
rsmith wrote:
> rnk wrote:
> > rsmith wrote:
> > > rnk wrote:
> > > > @rsmith , if inline global variable initialization has ordering
> > > > requirements, we have a bug, because I believe this GVA_DiscardableODR
> > > > codepath handles them, and we come through here to give them separate
> > > > initializers in llvm.global_ctors. See the example with two separate
> > > > global_ctors entries on godbolt:
> > > > https://gcc.godbolt.org/z/5d577snqb
> > > >
> > > > As long as LLVM doesn't provide ordering guarantees about same priority
> > > > initializers in global_ctors, inline globals have the same problems as
> > > > template instantiations. IMO whatever solution we use to order inline
> > > > globals should be used for template instantiations.
> > > >
> > > > Intuitively, that means LLVM should promise to run global_ctors in
> > > > left-to-right order, and if all TUs instantiate initializers in the
> > > > same order, everything should behave intuitively.
> > > >
> > > > The question then becomes, why doesn't this work already?
> > > > Intuitively, that means LLVM should promise to run global_ctors in
> > > > left-to-right order
> > >
> > > I don't think that's sufficient, due to the way we use COMDATs to discard
> > > duplicate global initializers. Consider:
> > >
> > > TU 1 defines inline variable A.
> > > TU 2 defines inline variable A and then inline variable B.
> > > The standard guarantees that A is initialized before B in this scenario.
> > > But if (somehow) the linker picks the definition of A from TU 2, but
> > > orders the initializers from TU 1 first, then the resulting global_ctors
> > > order will be B then A, which is not allowed.
> > >
> > > Either we need a guarantee that linkers will use the same ordering
> > > between objects when picking COMDATs as when concatenating `.init_array`,
> > > or we need to stop using the COMDAT trick for them (and either make LLVM
> > > respect the `@llvm.global_ctors` order or coalesce all inline variable
> > > initializers into the same function we run non-inline initializers from
> > > or something). Getting that guarantee seems like the best path to me,
> > > since the other option will presumably mean we check the guard variable
> > > once on startup for each TU that defines the variable, and it's something
> > > I expect linkers already happen to guarantee in practice.
> > >
> > > > IMO whatever solution we use to order inline globals should be used for
> > > > template instantiations.
> > >
> > > That sounds like it would regress what globalopt is able to optimize, for
> > > no gain in conformance nor perhaps any real gain in initialization order
> > > guarantees. The inline variable case is different, both because we can
> > > guarantee an initialization order and because the standard requires us to
> > > do so. Should we add complexity to the compiler whose only purpose is to
> > > mask bugs, if that complexity doesn't actually define away the
> > > possibility of bad behavior?
> > >
> > > If we can actually describe a rule that we provide for initialization
> > > order of instantiated variables, and we can easily implement that rule
> > > and be confident we won't want to substantially weaken it later, and we
> > > can thereby assure our users that we will satisfy that rule, then I think
> > > that could be interesting, but anything less than that doesn't seem
> > > worthwhile to me.
> > >
> > > > The question then becomes, why doesn't this work already?
> > >
> > > It looks like it mostly does.
> > >
> > > One factor here is instantiation order. When we instantiate a variable,
> > > we add the variables that it references to our "to be instantiated" list,
> > > which is processed later:
> > > ```
> > > template<int N> int Fib = Fib<N - 2> + Fib<N - 1>;
> > > template<> int Fib<0> = 0;
> > > template<> int Fib<1> = 1;
> > > ```
> > > * instantiating `Fib<5>` will append `Fib<3>` and `Fib<4>` to the list
> > > * then we visit `Fib<3>` and append `Fib<1>` and `Fib<2>` to the list
> > > * then we visit `Fib<4>` and add no new entries
> > >
> > > We pass declarations to the consumer when we're done with the
> > > instantiation step. *Sometimes* this includes instantiating variables
> > > referenced by that variable, and *sometimes* it doesn't. The difference
> > > is whether we're performing "recursive" instantiation or not.
> > >
> > > When we're performing immediate instantiation of a variable (either
> > > because it was explicitly instantiated, or because we might need its
> > > value immediately because it might be usable in constant expressions),
> > > our instantiation step is non-recursive. We just add declarations to
> > > `Sema`'s "to be instantiated at end of TU" list. This is at least a
> > > little important semantically: we allow a matching specialization to be
> > > declared after the first use wherever possible. In that case, we'll pass
> > > a declaration to the consumer before we've instantiated the things it
> > > references.
> > >
> > > When we're performing the end-of-TU instantiation of all referenced
> > > template specializations, we do that recursively, and that means that we
> > > will instantiate any referenced variables *before* we pass the
> > > referencing variable to the AST consumer.
> > >
> > > You can see this happening here: https://godbolt.org/z/4sj1Y7W4G (look at
> > > the order in which we get the warnings: for `Fib<5>` then `x` then
> > > `Fib<3>`, then `Fib<2>`, then `Fib<4>`, and note that we initialize
> > > `Fib<5>` first, because it's the first thing added to the consumer. Then
> > > `Fib<2>`, `Fib<3>`, and `Fib<4>` get passed to the consumer *in that
> > > order*, because that is the order in which the instantiations of their
> > > definitions *finish*. But `Fib<5>`'s instantiation finishes first,
> > > because that's a non-recursive instantiation.
> > >
> > > Without the explicit instantiation, we pass the variables to the AST
> > > consumer in the order `Fib<2>`, `Fib<3>`, `Fib<4>`, `Fib<5>`:
> > > https://godbolt.org/z/sax1dvh7z leading to an "intuitive" result. They
> > > end up dependency-ordered because that's the order in which their
> > > instantiations happen to finish. (Example with a static data member:
> > > https://godbolt.org/z/9fsbPvWTP)
> > >
> > > Another factor (and the reason why my examples are working and @ychen's
> > > very similar examples are not) is `CodeGenModule`'s deferred emission of
> > > variables. If an instantiated variable has an initializer with no side
> > > effects, `CodeGenModule` won't emit it unless it emits a reference to it
> > > (there's no point emitting something that will just be discarded by the
> > > linker). And `CodeGenModule`'s process for emitting deferred declarations
> > > does all kinds of reordering. The way this works is:
> > >
> > > `CodeGenModule` is handed the globals, sees they don't need to be
> > > emitted, and ignores them. Then:
> > >
> > > * it emits a reference to `Fib<5>` and decides that `Fib<5>` needs to be
> > > emitted and adds it to a queue of deferred declarations to emit
> > > * it emits the deferred declaration `Fib<5>`, which adds `Fib<3>` and
> > > `Fib<4>` to a new queue for things used by `Fib<5>`
> > > * it emits the things used by `Fib<5>`: `Fib<3>` and `Fib<4>`
> > > * emitting `Fib<3>` adds `Fib<2>` to a new queue for things used by
> > > `Fib<3>`, so `Fib<2>` is emitted next
> > > * emitting `Fib<4>` adds nothing new to the queue
> > >
> > > So when the initializers don't have side-effects, the variables are
> > > initialized in the order `Fib<5>`, `Fib<3>`, `Fib<2>`, `Fib<4>`.
> > >
> > > So the "reversing" has at least two sources (beyond anything that
> > > globalopt or the linker might do):
> > > * non-recursive instantiations in `Sema` will cause an instantiated
> > > variable to be passed to the consumer before the things it references;
> > > this can mostly be avoided by not using explicit instantiations
> > > * instantiated variables with side-effect-free initializers have their
> > > initializers emitted on use rather than in instantiation order; this can
> > > be avoided by building with `-femit-all-decls` or by adding a dummy
> > > side-effect to the initializer
> > >
> > > If we want to fix just the `CodeGen` side of this, I think the thing to
> > > do would be to follow the model that `CodeGen` uses for ordered
> > > initialization (it tracks a `DelayedCXXInitPosition` map giving the order
> > > in which the variables should be initialized, if their initializers are
> > > actually emitted). We could do the same thing for instantiated variables,
> > > allocating each one handed to CodeGen a slot which either gets filled in
> > > with that initializer, or doesn't get emitted if the variable is not
> > > emitted. But, as noted above, I'm not convinced it's worth it unless this
> > > leads to some actual user-facing behavior guarantee.
> > > TU 1 defines inline variable A.
> > > TU 2 defines inline variable A and then inline variable B.
> > > The standard guarantees that A is initialized before B in this scenario.
> > > But if (somehow) the linker picks the definition of A from TU 2, but
> > > orders the initializers from TU 1 first, then the resulting global_ctors
> > > order will be B then A, which is not allowed.
> >
> > Yes, but we only get the ordering guarantee if A is defined before B in all
> > TUs, and both MSVC and Clang seem to emit all inline globals with dynamic
> > initializers, even if they are not ODR used:
> > https://gcc.godbolt.org/z/MhKvGqTez
> >
> > So, in your example, TU1 must not have a definition of B, meaning there is
> > no guarantee of ordering.
> >
> > All we need to do to fix the inline variable conformance bug is for LLVM to
> > guarantee an initialization order in global_ctors. However, the bug isn't
> > observable in practice because inline globals require guard variables, and
> > globalopt can't optimize those yet.
> >
> > I don't want to get too deep into the clang implementation details. I trust
> > you that it's complicated. My perspective is just that, if we have a
> > working ordering solution for inline variables, we should use it, if it is
> > simple and low cost, for instantiated variables. It sounds to me like
> > ordering instantiations is not cheap and easy, so we shouldn't do it.
> >
> > And, this change in particular doesn't address many cases in practice.
> > > TU 1 defines inline variable A.
> > > TU 2 defines inline variable A and then inline variable B.
> > > The standard guarantees that A is initialized before B in this scenario.
> > > But if (somehow) the linker picks the definition of A from TU 2, but
> > > orders the initializers from TU 1 first, then the resulting global_ctors
> > > order will be B then A, which is not allowed.
> >
> > Yes, but we only get the ordering guarantee if A is defined before B in all
> > TUs,
>
> That's not quite right. We get the guarantee if A is defined before B in all
> TUs where B is defined. It's OK for there to be TUs where A is defined but B
> is not. (See the relevant [language
> rule](https://eel.is/c++draft/basic.start.dynamic#3.1), which requires only
> that "for every definition of [B] there exists a definition of [A]" and not
> the other way around.)
>
> The intended user model is that it's OK for B to rely on A if they're defined
> in the same header file, or if A is defined in a header file that B includes.
> And it's OK if there's a source file that includes A's header but not B's.
> The intended implementation model is that inline variables are initialized
> (as if) in definition order within each TU they are defined in, with a guard
> variable preventing repeated initialization.
> TU 1 defines inline variable A.
> TU 2 defines inline variable A and then inline variable B.
> The standard guarantees that A is initialized before B in this scenario. But
> if (somehow) the linker picks the definition of A from TU 2, but orders the
> initializers from TU 1 first, then the resulting global_ctors order will be B
> then A, which is not allowed.
If I understand this example correctly, a consistent linker that always picks
the last COMDAT would also produce the wrong result.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D126341/new/
https://reviews.llvm.org/D126341
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits