rsmith added a comment.

In D68115#1946668 <https://reviews.llvm.org/D68115#1946668>, 
@hubert.reinterpretcast wrote:

> It sounds like we are looking for `-fzero-union-padding`. That's been where 
> the discussion has left off twice for months.


I believe the state of Clang prior to this patch is actually wrong. We reach 
this code regardless of the kind of initialization used for the union, and some 
forms of initialization require zeroing of padding whereas some do not. If we 
want to model this precisely, we'll need to track on `APValue` whether the 
padding bits of the union constant are zeroed or not. For example, given:

  union U { char c; int n; };
  U u = U(); // value-initialization
  U v = U{}; // aggregate initialization

Clang emits `{ i8 0, [3 x i8] undef }` for both `u` and `v`. That's correct for 
`v`, but incorrect for `u`: it should emit `{ i8 0, [3 x i8] zeroinitializer }` 
or perhaps simply `zeroinitializer`, because value-initialization invokes 
zero-initialization, which zeroes the padding.

Fixing this properly is likely not very hard, but it would require more changes 
than are present in this patch. (This patch is conservatively correct, but 
initializes more than we need to initialize.)

We also need to track in `APValue` whether padding bits are zeroed in order to 
correctly support `bit_cast` from structs with padding. Per discussion in 
committee, the intended behavior for `bit_cast` is:

> A bit in the value representation of the result is indeterminate if does not 
> correspond to a bit in the value representation of from or corresponds to a 
> bit of an object that is not within its lifetime or has an indeterminate 
> value ([basic.indet]). For each bit in the value representation of the result 
> that is indeterminate, the smallest object containing that bit has an 
> indeterminate value; the behavior is undefined unless that object is of 
> unsigned ordinary character type or std::byte type. The result does not 
> otherwise contain any indeterminate values.

So in particular:

  struct A { char c; int n; };
  constexpr long n = bit_cast<long>(A()); // ok, 0
  constexpr long m = bit_cast<long>(A{}); // ill-formed, indeterminate value 
due to uninitialized padding between c and n

So I would propose we take the following path:

1. Extend Clang's constant evaluator and `APValue` to track, for a struct or 
union value, whether all padding bits are zeroed. (Should always be `true` for 
a value with no padding bits.)
2. Land this patch behind a flag to zero all padding bits for unions, ideally 
extended to cover struct padding as well as union padding.
3. After doing (1), extend `__builtin_bit_cast` support to properly handle 
padding bits.
4. After doing (1) and (2), extend constant aggregate emission to always zero 
padding when required by the language standard. (If you want, make the flag be 
three-way: never zero, zero as required by language standard, always zero, 
maybe: `-fzero-padding=never` / `-fzero-padding=std`, `-fzero-padding=always`.)

Note that (1) and (2) are independent, so I don't think we need to block this 
patch (2) on the implementation of (1), but we should be aware that we're not 
done here until we do steps (3) and (4).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68115/new/

https://reviews.llvm.org/D68115



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to