On Wed, 25 Jul 2018, Martin Sebor wrote:
> > BUT - for the string_constant and c_strlen functions we are,
> > in all cases where we return something interesting, able to look
> > at an initializer, which then determines that type. Hopefully.
> > I think the strlen() folding code when it sets SSA ranges
> > now looks at types ...?
> >
> > Consider
> >
> > struct X { int i; char c[4]; int j;};
> > struct Y { char c[16]; };
> >
> > void foo (struct X *p, struct Y *q)
> > {
> > memcpy (p, q, sizeof (struct Y));
> > if (strlen ((char *)(struct Y *)p + 4) < 7)
> > abort ();
> > }
> >
> > here the GIMPLE IL looks like
> >
> > const char * _1;
> >
> > <bb 2> [local count: 1073741825]:
> > _5 = MEM[(char * {ref-all})q_4(D)];
> > MEM[(char * {ref-all})p_6(D)] = _5;
> > _1 = p_6(D) + 4;
> > _2 = __builtin_strlen (_1);
> >
> > and I guess Martin would argue that since p is of type struct X
> > + 4 gets you to c[4] and thus strlen of that cannot be larger
> > than 3. But of course the middle-end doesn't work like that
> > and luckily we do not try to draw such conclusions or we
> > are somehow lucky that for the testcase as written above we do not
> > (I'm not sure whether Martin's changes in this area would derive
> > such conclusions in principle).
>
> Only if the strlen argument were p->c.
>
> > NOTE - we do not know the dynamic type here since we do not know
> > the dynamic type of the memory pointed-to by q! We can only
> > derive that at q+4 there must be some object that we can
> > validly call strlen on (where Martin again thinks strlen
> > imposes constraints that memchr does not - something I do not agree
> > with from a QOI perspective)
>
> The dynamic type is a murky area.

It's well-specified in the middle-end. A store changes the
dynamic type of the stored-to object. If that type is
compatible with the surrounding object's dynamic type, that one
is not affected; if not, the surrounding object's dynamic
type becomes unspecified. There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.

> As you said, above we don't
> know whether *p is an allocated object or not. Strictly speaking,
> we would need to treat it as such. It would basically mean
> throwing out all type information and treating objects simply
> as blobs of bytes. But that's not what GCC or other compilers do
> either.

It is what GCC does unless it sees a store to the memory. Basically,
pointers carry no type information; only (visible!) stores
(and loads to some extent) provide information about dynamic types
of objects (allocated or declared - GCC makes no distinction there).
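
To make the "stores provide the type" point concrete, here is a minimal
sketch, not taken from GCC or its testsuite. The name reuse_storage is
hypothetical, and the caller is assumed to pass malloc'd storage large
enough and suitably aligned for both int and float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: use one allocated block first as an int and
   then as a float.  The void * argument says nothing about what is
   stored there; each store is what establishes the dynamic type.  */
float reuse_storage (void *mem)
{
  assert (mem != NULL);

  int *ip = mem;
  *ip = 42;            /* the store makes the storage an int */
  int seen = *ip;      /* read through the matching type */

  float *fp = mem;
  *fp = 1.5f;          /* this store changes the dynamic type to float */

  /* Read back through the new type only; reading *ip here would
     access the float through an incompatible lvalue.  */
  return *fp + (float) seen;
}
```

Called on a block obtained from malloc (16), the function should return
42 + 1.5. The int read happens before the float store, so every access
goes through the type most recently stored.
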
> For instance, in the modified foo below, GCC eliminates
> the test because it assumes that *p and *q don't overlap. It
> does that because they are members of structs of unrelated types
> access to which cannot alias. I.e., not just the type of
> the access matters (here int and char) but so does the type of
> the enclosing object. If it were otherwise and only the type
> of the access mattered then eliminating the test below wouldn't
> be valid (objects can have their stored value accessed by either
> an lvalue of a compatible type or char).
>
> void foo (struct X *p, struct Y *q)
> {
> int j = p->j;
> q->c[__builtin_offsetof (struct X, j)] = 0;
> if (j != p->j)
> __builtin_abort ();
> }

Here GCC sees both a load and a store, and it derives the
information from those. And yes, it looks at the full access
structure, which contains a dereference of p and of q.
Because of that, and because the store to q->c[]
(which for GCC implies a store to *q!) changes the dynamic
type, GCC can assume that the two accesses do not overlap.
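
For anyone who wants to try this, a compilable sketch of the quoted
example follows. It assumes the struct X and struct Y definitions from
earlier in the thread; j_changed is a hypothetical name, and it returns
the comparison instead of calling abort so the result can be inspected.
Whether a given GCC version and optimization level actually folds the
comparison is not something the sketch itself shows.

```c
#include <assert.h>
#include <stddef.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Hypothetical variant of the quoted foo: returns nonzero if the
   second load of p->j differs from the first.  */
int j_changed (struct X *p, struct Y *q)
{
  /* Guard the index used below; holds for the layouts assumed here.  */
  assert (offsetof (struct X, j) < sizeof (struct Y));

  int j = p->j;                       /* load through a struct X access */
  q->c[offsetof (struct X, j)] = 0;   /* store through a struct Y access */
  return j != p->j;                   /* reload of p->j */
}
```

With p and q pointing to distinct objects the function returns 0 either
way; the interesting question is only what the compiler may assume when
it cannot see where the pointers came from.
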
> Clarifying (and adjusting if necessary) this area is among
> the goals of the C object model proposal and the ongoing study
> group. We have been talking about some of these cases there
> and trying to come up with ways to let code do what it needs
> to do without compromising existing language rules, which was
> the consensus position within WG14 when the study group was
> formed: i.e., to clarify or reaffirm existing rules and, in
> cases of ambiguity or where the standard is unintentionally
> overly permissive, favor tighter rules over looser ones.
There is also the C++ object model and the Ada object model and ...
GCC already has an object model in its middle-end and that is
not going to change. And obviously it was modeled after the
requirements from the languages the middle-end supports. The
latest change was made necessary by C++ (placement new and
storage re-use specifically).
Richard.