[PATCH] D74361: [Clang] Undef attribute for global variables

Arthur O'Dwyer via Phabricator via cfe-commits Mon, 02 Mar 2020 15:40:48 -0800

Quuxplusone added inline comments.


================
Comment at: clang/test/CodeGenCXX/attr-loader-uninitialized.cpp:23
+// CHECK: @nominally_value_init = global i32 undef
+int nominally_value_init  [[clang::loader_uninitialized]] = 4;
+
----------------
rjmccall wrote:
> JonChesterfield wrote:
> > rjmccall wrote:
> > > Quuxplusone wrote:
> > > > rjmccall wrote:
> > > > > JonChesterfield wrote:
> > > > > > Quuxplusone wrote:
> > > > > > > This test case is identical to line 36 of 
> > > > > > > clang/test/Sema/attr-loader-uninitialized.cpp, where you say you 
> > > > > > > don't want it to compile at all.
> > > > > > > 
> > > > > > > I think you need a clearer idea of how this interacts with 
> > > > > > > initializers. Is it merely supposed to eliminate the 
> > > > > > > //zero-initialization// that happens before the user-specified 
> > > > > > > construction/initialization, or is it supposed to compete with 
> > > > > > > the user-specified construction/initialization?
> > > > > > > 
> > > > > > > That is, for
> > > > > > > 
> > > > > > >     nontrivial unt [[clang::loader_uninitialized]];
> > > > > > > 
> > > > > > > is it merely supposed to call `unt::unt()` on a chunk of undef 
> > > > > > > memory (instead of the usual chunk of zeroed memory), or is it 
> > > > > > > supposed to skip the constructor entirely? And for
> > > > > > > 
> > > > > > >     int x [[clang::loader_uninitialized]] = foo();
> > > > > > > 
> > > > > > > is it merely supposed to call `foo()` and assign the result to a 
> > > > > > > chunk of undef memory (instead of the usual chunk of zeroed 
> > > > > > > memory), or is it supposed to skip the initialization entirely?
> > > > > > I think you commented while the first working piece of sema landed. 
> > > > > > My thinking is relatively clear but my understanding of clang's 
> > > > > > semantic analysis is a work in progress!
> > > > > > 
> > > > > > Initializers (`= foo()`) are straightforward. Error on the basis 
> > > > > > that the attribute effectively means `= undef`, and one should not 
> > > > > > have two initializers. A test case is now added for that (and now 
> > > > > > passes).
> > > > > > 
> > > > > > The codegen I want for a default constructed global marked with 
> > > > > > this variable is:
> > > > > > - global variable allocated, with undef as the original value
> > > > > > - default constructor call synthesized
> > > > > > - said default constructor set up for invocation from crt, before 
> > > > > > main, writing over the undef value
> > > > > > 
> > > > > > Where the default constructor can be optimized as usual, e.g. if it 
> > > > > > always writes a constant, we can init with that constant instead of 
> > > > > > the undef and elide the constructor.
> > > > > > 
> > > > > > I don't have that actually working yet - the constructor call is 
> > > > > > not being emitted, so we just have the undef global.
> > > > > > 
> > > > > > I think it's important to distinguish between the values of the 
> > > > > > bits when the program is loaded and whether constructor/destructors 
> > > > > > are called, as one could want any combination of the two.
> > > > > I think Arthur is suggesting that it would be useful to allow the 
> > > > > attribute to be used in conjunction with an initializer in C++, since 
> > > > > if the initializer has to be run dynamically, we can still 
> > > > > meaningfully suppress the static zero-initialization.   That is, 
> > > > > we've accepted that it's useful to do this when 
> > > > > *default-initializing* a global, but it's actually useful when doing 
> > > > > *any* kind of dynamic initialization.
> > > > > 
> > > > > Maybe we can leave it as an error in C++ when the initializer is a 
> > > > > constant expression.  Although that might be unnecessarily brittle if 
> > > > > e.g. the constructor is `constexpr` in one library version but not 
> > > > > another.
> > > > No, that's exctly what I mean. You seem to be holding two contradictory 
> > > > ideas simultaneously:
> > > > 
> > > >     [[loader_uninitialized]] X x = X{};  // two initializers, therefore 
> > > > error
> > > > 
> > > >     [[loader_uninitialized]] X x {}; // one initializer plus one 
> > > > constructor, therefore fine
> > > > 
> > > > In C++, these two declarations have identical semantics. It doesn't 
> > > > make sense to say that one of them "calls a constructor" and the other 
> > > > one "has an initializer." They're literally the same thing.
> > > > 
> > > > Similarly in both C99 and C++ with plain old ints:
> > > > 
> > > >     [[loader_uninitialized]] int x = foo();
> > > > 
> > > > This means "call foo and dynamically initialize x with the result," 
> > > > just as surely as
> > > > 
> > > >     [[loader_uninitialized]] X x = X();
> > > > 
> > > > means "call X::X and dynamically initialize x with the result." Having 
> > > > one rule for dynamic initializers of primitive types and a separate 
> > > > rule for dynamic initializers of class types doesn't work.
> > > > 
> > > > Furthermore, "dynamic initialization" can be promoted to compile-time:
> > > > 
> > > >     [[loader_uninitialized]] int x = 42;
> > > >     [[loader_uninitialized]] std::string_view x = "hello world";
> > > > 
> > > > It wouldn't make semantic sense to say that one of these has "two 
> > > > initializers" and the other has "one initializer," because both of the 
> > > > initializations end up happening at compile time and getting put into 
> > > > .data.
> > > > Similarly in both C99 and C++ with plain old ints:
> > > 
> > > C does not allow non-constant initialization of global variables.
> > > 
> > > Let  me try to explain what we're thinking here.  If the variable 
> > > provides both an initializer and this attribute, and the compiler can 
> > > successfully perform constant initialization (as it's required to do in 
> > > C, and as it often does in C++), then the attribute has probably become 
> > > meaningless because we're not going to suppress that constant 
> > > initialization.  The compiler strongly prefers to diagnose meaningless 
> > > attributes in some way, because they often indirectly indicate a bug.  
> > > For example, maybe the initializer isn't supposed to be there but it's 
> > > hidden behind a huge mess of macros; or maybe somebody added a constant 
> > > initializer because they changed the variable schema to make that 
> > > possible, and so the programmer should now remove the attribute and the 
> > > dynamic initialization code that went along with it.  Compared to that, 
> > > the C++ use case of "allow a dynamic initializer but suppress static 
> > > zero-initialization" is pretty marginal; that doesn't mean we absolutely 
> > > shouldn't support it, but it does suggest that we shouldn't prioritize it 
> > > over other aspects of the feature, like catching that possible bug.
> > I'm under the impression that the constructs are:
> > ```
> > X x = X{};  // default construct an X and then call the copy constructor at 
> > x
> > X x {}; // default construct an X at the location of x
> > X x; // default construct an X at the location of x
> > ```
> > 
> > The C++ initialisation rules are very complicated so I won't be shocked to 
> > have got that wrong. 
> > 
> > If your toolchain actually runs global constructors then there isn't much 
> > use to marking C++ objects with this attribute, unless said constructor 
> > does nothing and would thus leave the bytes still undef.
> > 
> > Accepting default constructors (or, perhaps only accepting trivial types?) 
> > would allow people to use this attribute with the approximation of C++ 
> > available on systems that don't actually run the global constructors.
> > 
> > Equally, if this seems too complicated, I'm happy to restrict this to C as 
> > that's still an improvement on the asm and/or linker scripts I have 
> > available today.
> > If your toolchain actually runs global constructors then there isn't much 
> > use to marking C++ objects with this attribute, unless said constructor 
> > does nothing and would thus leave the bytes still undef.
> 
> That's not totally true.  C++ still requires globals to be zero-initialized 
> before it runs dynamic initializers, and the dynamic initializer is very 
> likely to make that zero-initialization totally redundant.  Arthur is right 
> that there's no inherent reason that that only applies to dynamic *default* 
> initialization, though, as opposed to any non-constant initialization.
> 
> I'd prefer to avoid making this C-specific; Clang takes a lot of pride in 
> trying to make feature cross-language as much as possible.  We could just 
> make the error about providing both the attribute and an initializer 
> C-specific, though.
Ah, I was wrong when I said "in both C99 and C++" `int x = foo();` would be 
supported. You're right, that's a C++ism. Even C11 doesn't seem to support 
dynamic initializers, not even for function-local static variables.

I suppose I should also mention that I totally don't understand the domain 
here. :) To me, the way you avoid storing an explicit initializer in the ELF 
file is you allocate the variable into .bss instead of .data; and the way you 
avoid paying for zero-initialization at runtime would be that you allocate the 
variable into a hand-crafted section instead of .bss ([example from Keil 
docs](http://www.keil.com/support/man/docs/armcc/armcc_chr1359124243221.htm)). 
In ordinary ELF-land, there's no way to say "I want this variable in .data but 
not initialized," nor "I want this variable in .bss but not zero-initialized" — 
the file format literally can't represent those concepts.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74361/new/

https://reviews.llvm.org/D74361



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D74361: [Clang] Undef attribute for global variables

Reply via email to