Hi, answering my own question:
It looks like field sensitivity was disabled for this particular
example because I was compiling with -O rather than -O2. (I disabled
-O2 because I needed the GIMPLE to be parsed exactly as I wrote it;
otherwise some things might be optimized away.) After adding
--param=max-fields-for-field-sensitive=$val we get the correct
points-to solution:
c = { }
p1.0+64 = { }
p1.64+64 = { c }
p1.128+64 = { p1.0+64 }
main.clobber = { }
main.use = { p1.64+64 }
i_15 = { NONLOCAL } same as main.arg0
pc_28 = { c }
_27 = { NULL }
cast_32 = { c } same as pc_28
temp_33 = { p1.0+64 } same as p1.128+64
temp2_34 = { c }
Thanks!
On 28/09/2020 14:30, Erick Ochoa wrote:
On 28/09/2020 14:25, Erick Ochoa wrote:
Hi,
previously I sent an e-mail inquiring about the state of points-to
information of structure variables allocated in the heap. It was
brought to my attention that heap variables do not have a size to
model and therefore IPA-PTA is not able to provide field sensitivity.
I now understand better how field sensitivity is modeled in IPA-PTA
and the way size is needed in order to compute the correct solution.
However, I am now trying to compute the points-to analysis for pointer
expressions for stack allocated struct variables. I am trying to
answer the question:
What does `temp->f1` point to in the following simple example, which
has no heap-allocated memory?
```c
struct A { char* f0; char *f1; struct A *f2;};

int __GIMPLE(startwith("ipa-pta"))
main (int argc, char * * argv)
{
  struct A p1;
  char * pc;
  char c;
  char *cast;
  struct A*temp;
  char *temp2;
  int i;
  int _27;

  i_15 = 1;
  pc = &c;
  p1.f1 = pc;
  p1.f2 = &p1;
  _27 = 0;
  cast = pc;
  temp = p1.f2;
  temp2 = temp->f1;
  return _27;
}
```
There are two questions I have regarding this example. The first is
that IPA-PTA determines that temp2 points to { c p1 }, while I think
it should only point to { c }, and I'm trying to understand why.
The second is that I am still unsure how to get points-to
information for pointer expressions like temp->f1.
Details:
IPA-PTA correctly points out that the structure p1 and the structure
pointer temp can both point to c and p1:
```
c = { }
p1 = { c p1 } same as temp_33
temp_33 = { c p1 }
```
I believe this is because p1 models the whole struct variable, and
temp_33 also models the whole struct variable (in other words,
*temp_33+64 points to c and *temp_33+128 points to p1; note that
nothing is stored in field f0).
However, in the case of temp2, we have the following points-to
information:
```
temp2_34 = { c p1 }
```
which I believe is an over-approximation. Looking at the generated
constraints, we see that temp2_34 was assigned the following constraint:
temp2_34 = *temp_33 + 64
That means the function do_sd_constraint should have been used to
compute the points-to information. Looking at that function and
adding some print statements, it is clear to me that the source of
the imprecision is that temp_33 may point to { c } in its second field.
Small correction: temp_33 may point to p1 in its third field.
However, isn't GCC supposed to take into account field
information in this case? I believe that in order to make this more
precise we need a change in the get_varinfo API to something that
takes into account offsets and gets the solution for pointer expressions.
Instead of this line:
    else if (v->may_have_pointers
             && add_graph_edge (graph, lhs, t))
      flag |= bitmap_ior_into (sol, get_varinfo (t)->solution);
something like:
    else if (v->may_have_pointers
             && add_graph_edge (graph, lhs, t))
      flag |= bitmap_ior_into (sol, get_varinfo (t, roffset)->solution);
It seems to me that this is already a known issue, and it might be
described accurately by this comment:
    TODO: Adding offsets to pointer-to-structures can be handled (IE
    not punted on and turned into anything), but isn't. You can just
    see what offset inside the pointed-to struct it's going to access.
So I just want to confirm: does this comment refer concretely to what
I'm trying to do? And does this mean that, to get an API similar to
what I described, I would need to create new constraint variables
(one new constraint variable for each field in every pointer-to-struct
variable)?
Thanks!