Hi,
I previously sent an e-mail asking about the state of points-to
information for structure variables allocated on the heap. It was brought
to my attention that heap variables do not have a size to model, and
therefore IPA-PTA cannot provide field sensitivity for them.
I now understand better how field sensitivity is modeled in IPA-PTA and
why the size is needed to compute the correct solution.
However, I am now trying to compute points-to information for pointer
expressions involving stack-allocated struct variables. The question I
am trying to answer is:
What does `temp->f1` point to in the following simple example, which
uses no heap-allocated memory?
```c
struct A { char* f0; char *f1; struct A *f2;};
int __GIMPLE(startwith("ipa-pta"))
main (int argc, char * * argv)
{
struct A p1;
char * pc;
char c;
char *cast;
struct A*temp;
char *temp2;
int i;
int _27;
i_15 = 1;
pc = &c;
p1.f1 = pc;
p1.f2 = &p1;
_27 = 0;
cast = pc;
temp = p1.f2;
temp2 = temp->f1;
return _27;
}
```
I have two questions regarding this example. First, IPA-PTA determines
that temp2 points to { c p1 }, while I think it should only point to
{ c }, and I'm trying to understand why. Second, I am still unsure how
to get points-to information for pointer expressions like temp->f1.
Details:
IPA-PTA correctly reports that the structure p1 and the structure pointer
temp can each point to { c p1 }:
```
c = { }
p1 = { c p1 } same as temp_33
temp_33 = { c p1 }
```
I believe this is because p1 is the whole struct variable, and temp_33
is also modeling the whole struct variable (in other words, *temp_33+64
points to c and *temp_33+128 points to p1; note that nothing is in field f0).
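To make the offsets 64 and 128 concrete, here is a minimal sketch that computes the bit offsets of the fields of struct A. It assumes that the offsets in the constraint dump are bit offsets and that pointers are 8 bytes wide (as on x86-64). The macro FIELD_BIT_OFFSET and the helper struct_a_field_bit_offset are names made up for this illustration, not anything from GCC.

```c
#include <assert.h>   /* static_assert */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* offsetof, size_t */

struct A { char *f0; char *f1; struct A *f2; };

/* Hypothetical helper macro: offset of a field in bits rather than bytes. */
#define FIELD_BIT_OFFSET(type, field) (offsetof(type, field) * CHAR_BIT)

/* The first member of a struct is always at offset 0. */
static_assert(offsetof(struct A, f0) == 0, "f0 starts the object");

/* Returns the bit offset of field number idx (0, 1 or 2) of struct A,
   or (size_t)-1 for any other index. */
size_t struct_a_field_bit_offset(int idx)
{
    switch (idx) {
    case 0: return FIELD_BIT_OFFSET(struct A, f0);
    case 1: return FIELD_BIT_OFFSET(struct A, f1);
    case 2: return FIELD_BIT_OFFSET(struct A, f2);
    default: return (size_t)-1;
    }
}
```

With 8-byte pointers and no padding, indices 1 and 2 give 64 and 128, which matches the +64 (f1) and +128 (f2) offsets mentioned above.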
However, in the case of temp2, we have the following points-to information:
```
temp2_34 = { c p1 }
```
which I believe is an over-approximation. Looking at the generated
constraints, we see that temp2_34 was given the following constraint:
    temp2_34 = *temp_33 + 64
This means that the function do_sd_constraint should have been used to
compute the points-to information. Looking at the function and adding
some print statements, it is clear to me that the imprecision arises
because temp_33 may point to { c } in its second field. However, isn't
GCC supposed to take field information into account in this case? I
believe that, to make this more precise, get_varinfo would need to
change into something that takes offsets into account and returns the
solution for pointer expressions.
Instead of this line:
    else if (v->may_have_pointers
             && add_graph_edge (graph, lhs, t))
      flag |= bitmap_ior_into (sol, get_varinfo (t)->solution);
something like:
    else if (v->may_have_pointers
             && add_graph_edge (graph, lhs, t))
      flag |= bitmap_ior_into (sol, get_varinfo (t, roffset)->solution);
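As an illustration of the behaviour being asked for, the sketch below is a toy model, not GCC code. It gives each field of p1 its own variable and resolves `lhs = *(ptr + off)` by reading only the field at that offset. Every name in it (toy_var, toy_field_at, toy_load, toy_solve_example) is hypothetical, the offsets assume 8-byte pointers, and points-to sets are plain bitmasks. The statements are applied once in program order, which is enough for this straight-line example; a real solver would iterate to a fixed point.

```c
#include <assert.h>

/* Toy constraint variables: one per scalar, one per field of p1. */
enum { V_C, V_P1_F0, V_P1_F1, V_P1_F2, V_PC, V_TEMP, V_TEMP2, NUM_VARS };

struct toy_var {
    int base;            /* id of the first field variable of the object */
    unsigned bit_offset; /* offset of this field inside the object */
    unsigned solution;   /* points-to set as a bitmask over variable ids */
};

static struct toy_var vars[NUM_VARS];

/* Reset all variables; scalars are their own base at offset 0. */
void toy_init(void)
{
    for (int i = 0; i < NUM_VARS; i++) {
        vars[i].base = i;
        vars[i].bit_offset = 0;
        vars[i].solution = 0;
    }
    vars[V_P1_F1].base = V_P1_F0; vars[V_P1_F1].bit_offset = 64;
    vars[V_P1_F2].base = V_P1_F0; vars[V_P1_F2].bit_offset = 128;
}

/* Variable for the field at bit offset off in the object containing t,
   or -1 if that object has no field there. */
int toy_field_at(int t, unsigned off)
{
    assert(t >= 0 && t < NUM_VARS);
    for (int i = 0; i < NUM_VARS; i++)
        if (vars[i].base == vars[t].base && vars[i].bit_offset == off)
            return i;
    return -1;
}

/* lhs = *(ptr + off): union in only the solutions of the addressed
   fields. Returns nonzero if lhs changed. */
int toy_load(int lhs, int ptr, unsigned off)
{
    unsigned old = vars[lhs].solution;
    for (int t = 0; t < NUM_VARS; t++) {
        if (!(vars[ptr].solution & (1u << t)))
            continue;
        int f = toy_field_at(t, vars[t].bit_offset + off);
        if (f >= 0)
            vars[lhs].solution |= vars[f].solution;
    }
    return vars[lhs].solution != old;
}

/* Applies the statements of the example and returns temp2's solution. */
unsigned toy_solve_example(void)
{
    toy_init();
    vars[V_PC].solution = 1u << V_C;                 /* pc = &c */
    vars[V_P1_F1].solution |= vars[V_PC].solution;   /* p1.f1 = pc */
    vars[V_P1_F2].solution |= 1u << V_P1_F0;         /* p1.f2 = &p1 */
    vars[V_TEMP].solution |= vars[V_P1_F2].solution; /* temp = p1.f2 */
    toy_load(V_TEMP2, V_TEMP, 64);                   /* temp2 = temp->f1 */
    return vars[V_TEMP2].solution;
}
```

In this model temp points only to the start of p1, so the load at offset 64 reads just the f1 variable and temp2 ends up with { c }. If p1 were a single variable covering all three fields, the same load would return { c p1 }, which is the result reported above.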
This seems to be a known issue already, and it might be described
accurately by this comment:
    TODO: Adding offsets to pointer-to-structures can be handled (IE not
    punted on and turned into anything), but isn't. You can just see what
    offset inside the pointed-to struct it's going to access.
So I just want to confirm: does this comment refer concretely to what
I'm trying to do? And does it mean that, to get an API similar to the
one I described, I would need to create new constraint variables (one
new constraint variable for each field of every pointer-to-struct
variable)?
Thanks!