Hi, answering my own question:

It looks like field sensitivity was disabled for this particular example because I was compiling with -O as opposed to -O2. (I avoided -O2 because I needed the GIMPLE to be parsed exactly as I wrote it; otherwise some things might be optimized away.) After adding --param=max-fields-for-field-sensitive=$val, we get the correct points-to solution:

c = { }
p1.0+64 = { }
p1.64+64 = { c }
p1.128+64 = { p1.0+64 }
main.clobber = { }
main.use = { p1.64+64 }
i_15 = { NONLOCAL } same as main.arg0
pc_28 = { c }
_27 = { NULL }
cast_32 = { c } same as pc_28
temp_33 = { p1.0+64 } same as p1.128+64
temp2_34 = { c }

Thanks!


On 28/09/2020 14:30, Erick Ochoa wrote:


On 28/09/2020 14:25, Erick Ochoa wrote:
Hi,

previously I sent an e-mail inquiring about the state of points-to information of structure variables allocated in the heap. It was brought to my attention that heap variables do not have a size to model and therefore IPA-PTA is not able to provide field sensitivity.

I now understand better how field sensitivity is modeled in IPA-PTA and the way size is needed in order to compute the correct solution. However, I am now trying to compute the points-to analysis for pointer expressions for stack allocated struct variables. I am trying to answer the question:

What does `temp->f1` point to in the following simple example without heap-allocated memory?

```c
struct A { char* f0; char *f1; struct A *f2;};

int __GIMPLE(startwith("ipa-pta"))
main (int argc, char * * argv)
{
   struct A p1;
   char * pc;
   char c;
   char *cast;
   struct A*temp;
   char *temp2;
   int i;
   int _27;

   i_15 = 1;
   pc = &c;
   p1.f1 = pc;
   p1.f2 = &p1;
   _27 = 0;
   cast = pc;
   temp = p1.f2;
   temp2 = temp->f1;
   return _27;
}
```

I have two questions regarding this example. First, IPA-PTA determines that temp2 points to { c p1 }, while I think it should only point to { c }, and I'm trying to understand why. Second, I am still unsure how to get points-to information for pointer expressions like temp->f1.

Details:

IPA-PTA correctly points out that the structure p1 and the structure pointer temp can both point to { c p1 }:

```
c = { }
p1 = { c p1 } same as temp_33
temp_33 = { c p1 }
```

I believe this is because p1 is the whole struct variable, and temp_33 is also modeling the whole struct variable. (In other words, *temp_33+64 points to c and *temp_33+128 points to p1. Note that nothing is in field f0.)

However, in the case of temp2, we have the following points-to information:


```
temp2_34 = { c p1 }
```

which I believe is an over-approximation. Looking at the constraints generated, we see that temp2_34 was assigned the following constraint:

temp2_34 = *temp_33 + 64

That means the method do_sd_constraint should have been used to compute the correct points-to information. Looking at the method and adding some print statements, it is clear to me that the source of this imprecision is that temp_33 may point to { c } in its second field.

Small correction: temp_33 may point to p1 in its third field.

However, isn't GCC supposed to take into account field information in this case? I believe that in order to make this more precise we need a change in the get_varinfo API to something that takes into account offsets and gets the solution for pointer expressions.

Instead of this line
           else if (v->may_have_pointers
                    && add_graph_edge (graph, lhs, t))
             flag |= bitmap_ior_into (sol, get_varinfo (t)->solution);

something like:

           else if (v->may_have_pointers
                    && add_graph_edge (graph, lhs, t))
             flag |= bitmap_ior_into (sol, get_varinfo (t, roffset)->solution);

It seems to me that this is already a known issue, and it might be described accurately by this comment:

   TODO: Adding offsets to pointer-to-structures can be handled (IE not punted
   on and turned into anything), but isn't.  You can just see what offset
   inside the pointed-to struct it's going to access.

So, I just want to confirm: does this comment refer concretely to what I'm trying to do? And does this mean that, in order to accomplish an API similar to what I described, I would need to create new constraint variables (one new constraint variable for each field in all pointer-to-struct variables)?

Thanks!
