On 02/16/2018 04:22 AM, Richard Biener wrote:
On Thu, Feb 15, 2018 at 6:28 PM, Martin Sebor <mse...@gmail.com> wrote:
There are APIs to determine the base object and an offset
into it from all sorts of expressions, including ARRAY_REF,
COMPONENT_REF, and MEM_REF, but none of those I know about
makes it also possible to discover the member being referred
to.
Is there an API that I'm missing or a combination of calls
to some that would let me determine the (approximate) member
and/or element of an aggregate from a MEM_REF expression,
plus the offset from its beginning?
Say, given
struct A
{
void *p;
char b[3][9];
} a[2];
and an expression like
a[1].b[2] + 3
represented as the expr
MEM_REF (char[9], a, 69)
&MEM_REF (&a, 69)
you probably mean.
Yes. I was using the notation from the Wiki
https://gcc.gnu.org/wiki/MemRef
where offsetof (struct A, a[1].b[2]) == 66
I'd like to be able to determine that expr refers to the field
b of struct A, and more specifically, b[2], plus 3. It's not
important what the index into the array a is, or any other
arrays on the way to b.
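For reference, the arithmetic behind the 66 and 69 above can be
spelled out with offsetof. This is only a sketch: the function name
offset_of_a1_b2_plus is made up for illustration, and the concrete
values assume an LP64 target where sizeof (void *) is 8 and
sizeof (struct A) is 40.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  void *p;
  char b[3][9];
};

/* Hypothetical helper: byte offset of a[1].b[2][EXTRA] from the start
   of the array a, computed without forming any pointers.  */
static size_t
offset_of_a1_b2_plus (size_t extra)
{
  /* EXTRA must stay inside the nine-byte element b[2].  */
  assert (extra < sizeof (char[9]));

  return 1 * sizeof (struct A)      /* skip a[0] */
         + offsetof (struct A, b)   /* start of b within a[1] */
         + 2 * sizeof (char[9])     /* skip b[0] and b[1] */
         + extra;                   /* offset within b[2] */
}
```

With the LP64 layout assumed above, offset_of_a1_b2_plus (0) is
40 + 8 + 18 == 66 and offset_of_a1_b2_plus (3) is 69, matching the
constant in the MEM_REF.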
There is code in initializer folding that searches for a field in
a CONSTRUCTOR by base and offset. There's no existing
helper that gives you exactly what you want -- I guess you'd
ideally want to have a path to the referred object. But it may
be possible to follow what fold_ctor_reference does and build
such a helper.
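As a rough illustration of what such a helper might compute, here is
a standalone sketch in plain C. It does not use any GCC internals and
is not modeled on the actual fold_ctor_reference code; the field
table, the struct names and find_member are all hypothetical, and the
layout is hard-coded for the struct A from the example.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  void *p;
  char b[3][9];
};

/* Hypothetical description of one member: its name, offset, total
   size, and element size (same as the total size for non-arrays).  */
struct field_desc
{
  const char *name;
  size_t offset, size, elt_size;
};

static const struct field_desc A_fields[] = {
  { "p", offsetof (struct A, p), sizeof (void *), sizeof (void *) },
  { "b", offsetof (struct A, b), sizeof (char[3][9]), sizeof (char[9]) }
};

/* Result of the lookup.  */
struct member_ref
{
  const char *name;   /* member name, or NULL if OFF lands in padding */
  size_t index;       /* element index within the member */
  size_t rem;         /* bytes past the start of that element */
};

/* Map the byte offset OFF from the start of an array of struct A to
   the member and element it falls into.  */
static struct member_ref
find_member (size_t off)
{
  struct member_ref r = { NULL, 0, 0 };

  /* The index into the outer array is not of interest, so reduce the
     offset to one within a single struct A.  */
  off %= sizeof (struct A);

  for (size_t i = 0; i != sizeof A_fields / sizeof *A_fields; ++i)
    {
      const struct field_desc *f = &A_fields[i];
      assert (f->elt_size != 0);
      if (off >= f->offset && off - f->offset < f->size)
        {
          r.name = f->name;
          r.index = (off - f->offset) / f->elt_size;
          r.rem = (off - f->offset) % f->elt_size;
          break;
        }
    }
  return r;
}
```

For the offset of a[1].b[2] + 3 this reports member b, index 2,
remainder 3. A real helper would have to walk the type's fields
recursively and deal with unions, where more than one member can
match the same offset.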
Thanks. I'll see what I can come up with if/when I get to it
in stage 1.
I realize the reference can be ambiguous in some cases (arrays
of structs with multiple array members) and so the result wouldn't
be guaranteed to be 100% reliable. It would only be used in
diagnostics. (I think with some effort the type of the MEM_REF
could be used to disambiguate the majority (though not all) of
these references in practice.)
Given you have the address of the MEM_REF in your example above
the type of the MEM_REF doesn't mean anything.
You're right, it doesn't always correspond to the type of
the member. It does in some cases but those may be uncommon.
Too bad.
I think ambiguity only happens with unions given MEM_REF offsets
are constant.
Note that even the type of 'a' might not be correct as it may have had
a different dynamic type.
So I'm not sure in what context you are trying to use this in diagnostics.
Say I have a struct like this:
struct A {
char a[4], b[5];
};
then in
extern struct A *a;
memset (&a[0].a[0] + 14, 0, 3); // invalid
memset (&a[1].b[0] + 1, 0, 3); // valid
both references are the same:
&MEM_REF[char*, (void *)a + 14];
and there's no way to unambiguously tell which member each refers
to, or even to distinguish the valid one from the other. MEM_REF
makes the kind of analysis I'm interested in very difficult (or
impossible) to do reliably.
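To make the collision concrete, here is a small sketch that computes
the flat byte offsets with offsetof instead of forming the
(partly invalid) pointers. The helper names flat_offset_a,
flat_offset_b and within_member are made up for illustration, and
the sketch assumes struct A has no padding, so sizeof (struct A)
is 9.

```c
#include <assert.h>
#include <stddef.h>

struct A
{
  char a[4], b[5];
};

/* Flat byte offset of &a[I].a[0] + J from the start of the array.
   J is deliberately not range-checked.  */
static size_t
flat_offset_a (size_t i, size_t j)
{
  return i * sizeof (struct A) + offsetof (struct A, a) + j;
}

/* Likewise for &a[I].b[0] + J.  */
static size_t
flat_offset_b (size_t i, size_t j)
{
  return i * sizeof (struct A) + offsetof (struct A, b) + j;
}

/* Nonzero if N bytes starting at element J stay within a member
   that is MEMBER_SIZE bytes long.  */
static int
within_member (size_t member_size, size_t j, size_t n)
{
  assert (member_size > 0);
  return j <= member_size && n <= member_size - j;
}
```

Under that assumption flat_offset_a (0, 14) and flat_offset_b (1, 1)
are both 14, while within_member (4, 14, 3) is false and
within_member (5, 1, 3) is true. The member-relative information
that separates the two calls is exactly what the flat offset no
longer carries.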
Being able to determine the member is useful in -Wrestrict where
rather than printing the offsets from the base object I'd like
to be able to print the offsets relative to the referenced
member. Beyond -Wrestrict, identifying the member is key in
detecting writes that span multiple members (e.g., strcpy).
Those could (for example) overwrite a member that's a pointer
to a function and cause code injection. As it is, GCC has no
way to do that because __builtin_object_size considers the
size of the entire enclosing object, not that of the member.
For the same reason: MEM_REF makes it impossible.
Martin