On 11/4/21 3:42 AM, Jakub Jelinek via Gcc-patches wrote:
Hi!

When users don't use constexpr everywhere in initialization of namespace
scope non-comdat vars and the initializers aren't constant when FE is
looking at them, the FE performs dynamic initialization of those variables.
But after inlining and some constant propagation, we often end up with
just storing constants into those variables in the _GLOBAL__sub_I_*
constructor.
C++ gives us permission to change some of that dynamic initialization
back into static initialization - https://eel.is/c++draft/basic.start.static#3
For classes that need (dynamic) construction, I believe access to some var
from other dynamic construction before that var is constructed is UB, but
as the example in the above mentioned spot of C++:
inline double fd() { return 1.0; }
extern double d1;
double d2 = d1;     // unspecified:
                     // either statically initialized to 0.0 or
                     // dynamically initialized to 0.0 if d1 is
                     // dynamically initialized, or 1.0 otherwise
double d1 = fd();   // either initialized statically or dynamically to 1.0
some vars can be used before they are dynamically initialized and the
implementation can still optimize those into static initialization.

The following patch attempts to optimize some such cases back into
DECL_INITIAL initializers and where possible (originally const vars without
mutable members) put those vars back to .rodata etc.

Because we put all dynamic initialization from a single TU into one single
function (well, originally one function per priority but typically inline
those back into one function), we can either have a simpler approach
(from the PR it seems that is what LLVM uses) where either we manage to
optimize all dynamic initializers into constant in the TU, or nothing,
or by adding some markup - in the form of a pair of internal functions in
this patch - around each dynamic initialization that can be optimized,
we can optimize each dynamic initialization separately.

The patch adds a new pass that is invoked (through gate check) only on
DECL_ARTIFICIAL DECL_STATIC_CONSTRUCTOR functions, and looks there for
sequences like:
   .DYNAMIC_INIT_START (&b, 0);
   b = 1;
   .DYNAMIC_INIT_END (&b);
or
   .DYNAMIC_INIT_START (&e, 1);
   # DEBUG this => &e.f
   MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
   MEM[(struct S *)&e + 4B].a = 1;
   MEM[(struct S *)&e + 4B].b = 2;
   MEM[(struct S *)&e + 4B].c = 3;
   # DEBUG BEGIN_STMT
   MEM[(struct S *)&e + 4B].d = 6;
   # DEBUG this => NULL
   .DYNAMIC_INIT_END (&e);
(where between the pair of markers everything is either debug stmts or
stores of constants into the variables or their parts).
The pass needs to be done late enough so that after IPA all the needed
constant propagation and perhaps loop unrolling is done, on the other
side should be early enough so that if we can't optimize it, we can
remove those .DYNAMIC_INIT* internal calls that could prevent some
further optimizations (they have fnspec such that they pretend to read
the corresponding variable).

In my work-in-progress patch to diagnose stores into constant
objects (and subobjects) I deal with the same problem.  I had
considered a pair of markers like those above (David Malcolm
suggested a smilar approach as well), but decided to go
a different route, not trusting they could be kept together,
or that they wouldn't be viewed as overly intrusive.  With
it, I have been able to distinguish dynamic initialization
from overwriting stores even at the end of compilation, but
I'd be fine changing that and running the detection earlier.

So if the markers are added for the purpose of optimizing
the dynamic initialization at file scope, could they be added
for those of locals as well?  That way I wouldn't need to add
a separate solution.


Currently the optimization is only able to optimize cases where the whole
variable is stored in a single store (typically scalar variables), or
uses the native_{encode,interpret}* infrastructure to create or update
the CONSTRUCTOR.  This means that except for the first category, we can't
right now handle unions or anything that needs relocations (vars containing
pointers to other vars or references).
I think it would be nice to incrementally add before the native_* fallback
some attempt to just create or update a CONSTRUCTOR if possible.  If we only
see var.a.b.c.d[10].e = const; style of stores, this shouldn't be that hard
as the whole access path is recorded there and we'd just need to decide what
to do with unions if two or more union members are accessed.  And do a deep
copy of the CONSTRUCTOR and try to efficiently update the copy afterwards
(the CONSTRUCTORs should be sorted on increasing offsets of the
members/elements, so doing an ordered vec insertion might not be the best
idea).  But MEM_REFs complicate this, parts or all of the access path
is lost.  For non-unions in most cases we could try to guess which field
it is (do we have some existing function to do that?

The sprintf pass (only, for now) uses field_at_offset() for this.

I vaguely remember
we've been doing that in some cases in the past in some folding but stopped
doing so) but with unions it will be harder or impossible.

As the middle-end can't easily differentiate between const variables without
and with mutable members,

In the work-in-progress patch I mentioned above I've added two
hooks: one to return if a type has a mutable member:

  extern bool lhd_has_mutable (const_tree);

and another if a decl is mutable:

  extern bool lhd_is_mutable (const_tree);

They let the middle end tell (with some effort) if a subobject
is mutable (they're called from field_at_offset()).  The effort
is in recreating the full path to the mmeber for every MEM_REF.
Since mutable is rare, this is a lot of work for usually no gain.
Rather than doing this for every MEM_REF I've been thinking of
caching this info for every struct type in some persistent
structure.  But for my purposes the mutable bit isn't enough.
To tell if a member is const the search needs to be done top
to bottom to detect even non-const membbers of const subobjects
of complete objects (const or not).  So the cache I'm thinking
of would need to be some sort of a sparse mapping from every
byte offset into a struct type to its const member.  I would
love it if the same solution could be used both for
the optimization and for warnings.  (Ideally, the cache would
be populated by the front ends as they parse structs, but it
could also be populated lazily by the middle end.)

I don't know if the optimization you're interested in (or one
like it) might also be applicable to const locals.  Detecting
accidental stores into those is much more likely to find subtle
bugs than just globals, simply because there are more cases,
and because many globals are in read-only memory and so writing
into those will crash.

Martin

Reply via email to