On 02/16/2018 04:22 AM, Richard Biener wrote:
On Thu, Feb 15, 2018 at 6:28 PM, Martin Sebor <mse...@gmail.com> wrote:
There are APIs to determine the base object and an offset
into it from all sorts of expressions, including ARRAY_REF,
COMPONENT_REF, and MEM_REF, but none of those I know about
makes it also possible to discover the member being referred
to.

Is there an API that I'm missing or a combination of calls
to some that would let me determine the (approximate) member
and/or element of an aggregate from a MEM_REF expression,
plus the offset from its beginning?

Say, given

  struct A
  {
     void *p;
     char b[3][9];
  } a[2];

and an expression like

  a[1].b[2] + 3

represented as the expr

  MEM_REF (char[9], a, 69)

 &MEM_REF (&a, 69)

you probably mean.

Yes.  I was using the notation from the Wiki
  https://gcc.gnu.org/wiki/MemRef

where offsetof (struct A, a[1].b[2]) == 66

I'd like to be able to determine that expr refers to the field
b of struct A, and more specifically, b[2], plus 3.  It's not
important what the index into the array a is, or any other
arrays on the way to b.
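
As a sanity check on the numbers above, the offsets can be recomputed with offsetof. This is only a sketch: byte_offset is a hypothetical helper, not a GCC function, and the values 66 and 69 assume an LP64 target where sizeof (void *) is 8 and struct A is padded to 40 bytes.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  void *p;
  char b[3][9];
};

/* Byte offset of a[i].b[j][k] from the start of the array a.
   Hypothetical helper for illustration only.  */
size_t
byte_offset (size_t i, size_t j, size_t k)
{
  assert (j < 3 && k < 9);          /* stay inside b[3][9] */
  return i * sizeof (struct A)      /* skip i whole elements of a */
         + offsetof (struct A, b)   /* start of member b */
         + j * sizeof (char[9])     /* row j of b */
         + k;                       /* byte k within the row */
}
```

On such a target byte_offset (1, 2, 0) is 40 + 8 + 18 and byte_offset (1, 2, 3) adds the final 3, which is the constant that appears in the MEM_REF.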

There is code in initializer folding that searches for a field in
a CONSTRUCTOR by base and offset.  There's no existing
helper that gives you exactly what you want -- I guess you'd
ideally want to have a path to the referred object.  But it may
be possible to follow what fold_ctor_reference does and build
such a helper.
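
To make the idea of searching by base and offset concrete, here is a minimal standalone sketch. It does not use GCC's tree API or reproduce fold_ctor_reference. The field_desc table and find_member are hypothetical names invented for this illustration, and the lookup assumes the access stays within a single member.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  void *p;
  char b[3][9];
};

/* Hypothetical description of one member; not a GCC data structure.  */
struct field_desc
{
  const char *name;
  size_t offset;   /* offset of the member within struct A */
  size_t size;     /* size of the member in bytes */
};

static const struct field_desc fields_of_A[] = {
  { "p", offsetof (struct A, p), sizeof (void *) },
  { "b", offsetof (struct A, b), sizeof (char[3][9]) },
};

/* Given byte offset OFF into an array of struct A, return the member
   that contains it and store the offset from the start of that member
   in *REST.  Return NULL if OFF falls into padding.  */
const struct field_desc *
find_member (size_t off, size_t *rest)
{
  assert (rest != NULL);
  /* The index into the enclosing array is not of interest, so drop it.  */
  size_t in_elem = off % sizeof (struct A);
  for (size_t i = 0; i < sizeof fields_of_A / sizeof fields_of_A[0]; i++)
    {
      const struct field_desc *f = &fields_of_A[i];
      if (in_elem >= f->offset && in_elem - f->offset < f->size)
        {
          *rest = in_elem - f->offset;
          return f;
        }
    }
  return NULL;
}
```

For the offset 69 from the example, and assuming an LP64 target, this yields member b with 21 bytes into it, that is b[2] plus 3 (21 / 9 == 2, 21 % 9 == 3).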

Thanks.  I'll see what I can come up with if/when I get to it
in stage 1.


I realize the reference can be ambiguous in some cases (arrays
of structs with multiple array members) and so the result wouldn't
be guaranteed to be 100% reliable.  It would only be used in
diagnostics.  (I think with some effort the type of the MEM_REF
could be used to disambiguate the majority (though not all) of
these references in practice.)

Given you have the address of the MEM_REF in your example above
the type of the MEM_REF doesn't mean anything.

You're right, it doesn't always correspond to the type of
the member.  It does in some cases but those may be uncommon.
Too bad.

I think ambiguity only happens with unions given MEM_REF offsets
are constant.

Note that even the type of 'a' might not be correct as it may have had
a different dynamic type.

So I'm not sure in what context you are trying to use this in diagnostics.

Say I have a struct like this:

  struct A {
    char a[4], b[5];
  };

then in

  extern struct A *a;

  memset (&a[0].a[0] + 14, 0, 3);   // invalid

  memset (&a[1].b[0] + 1, 0, 3);    // valid

both references are the same:

   &MEM_REF[char*, (void *)a + 14];

and there's no way to unambiguously tell which member each refers
to, or even to distinguish the valid one from the other.  MEM_REF
makes the kind of analysis I'm interested in very difficult (or
impossible) to do reliably.
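
The arithmetic behind the two identical references can be spelled out as follows. This is an illustrative sketch only: offset_via_a and offset_via_b are hypothetical helpers, and the static_assert records the assumption that struct A has no padding, so its size is 9.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  char a[4], b[5];
};

/* Assumption: no padding, as on common ABIs.  */
static_assert (sizeof (struct A) == 9, "struct A is expected to be 9 bytes");

/* Byte offset from the base pointer of &p[i].a[0] + n.  */
size_t
offset_via_a (size_t i, size_t n)
{
  return i * sizeof (struct A) + offsetof (struct A, a) + n;
}

/* Byte offset from the base pointer of &p[i].b[0] + n.  */
size_t
offset_via_b (size_t i, size_t n)
{
  return i * sizeof (struct A) + offsetof (struct A, b) + n;
}
```

Here offset_via_a (0, 14) is 0 + 0 + 14 and offset_via_b (1, 1) is 9 + 4 + 1, so both come out as 14 and nothing in the constant offset records which member the source expression named.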

Being able to determine the member is useful in -Wrestrict where
rather than printing the offsets from the base object I'd like
to be able to print the offsets relative to the referenced
member.  Beyond -Wrestrict, identifying the member is key in
detecting writes that span multiple members (e.g., strcpy).
Those could (for example) overwrite a member that's a pointer
to a function and cause code injection.  As it is, GCC has no
way to do that because __builtin_object_size considers the
size of the entire enclosing object, not that of the member.
It does so for the same reason: MEM_REF makes it impossible to
identify the member.

Martin
