[Bug tree-optimization/65752] Too strong optimizations int -> pointer casts

2015-11-16 Thread jeehoon.kang at sf dot snu.ac.kr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752

Jeehoon Kang  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeehoon.kang at sf dot snu.ac.kr

--- Comment #43 from Jeehoon Kang  ---
(In reply to Alexander Cherepanov from comment #38)
> The evident solution is to not apply this optimization when provenance info
> of the two variables differs. I guess for most integers it will be the same.

IMO tracking provenance info is not a good idea, since it is really
complicated.  First, since integers and pointers can be cast to each other,
not only pointers but also integers would have to carry provenance
information.  Second, tracking provenance info may work for simple examples,
but it is hard even to define the provenance of complex expressions.  For
example, what is the provenance of "e1 - e2"?  Of "2*e"?  Of "e1 XOR e2"?
Of "e1 * e2"?  (even given the provenance info for the integer
subexpressions "e*")
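
To make the ambiguity concrete, here is a minimal sketch (the arrays "a"
and "b" and the function "mix" are hypothetical names introduced just for
illustration); an analysis would have to assign some provenance to each of
the derived integers below, and no single tag obviously fits:

#include <stdint.h>

int a[4], b[4];

uintptr_t mix(void) {
  uintptr_t p = (uintptr_t)&a[0];   /* provenance: a                  */
  uintptr_t q = (uintptr_t)&b[0];   /* provenance: b                  */
  uintptr_t d = p - q;              /* provenance of a difference?    */
  uintptr_t s = 2 * p;              /* of a scaled value?             */
  uintptr_t x = p ^ q;              /* of an XOR of two addresses?    */
  uintptr_t m = p * q;              /* of a product?                  */
  return d + s + x + m;             /* and of their sum?              */
}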



I would rather prefer marking pointers cast to integers as *escaped*, and
forgetting about provenance entirely.  Here are several reasons why this
works well:

- Standard optimizations are supported.  Say we want to support the following
constant propagation example:

void g(void);   /* unknown function, defined elsewhere */

char f() {
  char a = '0';
  g();  // unknown function; may guess the address of "a" and
        // try to access it (but it is always unsuccessful)

  return a; // -> return '0'
}

  Since the address of "a" is not cast to an integer, "a" is private to the
function "f" (i.e., it does not escape from "f"), and "g" cannot access "a".
So we know a == '0' at the return.


- Semantics is simple.  No need to track provenance info for variables.
Once a pointer is cast to an integer, it is just an integer without any
tracked information.

  As a result, the standard integer optimizations we are interested in, such
as the following, are fully supported:

  if (x != y) x = y;   ->   x = y;
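
  For instance, here is a minimal sketch (the function "sync_ptrs" and its
parameters are hypothetical names) of this rewrite firing on integers that
were obtained from pointers; since the casts already mark the pointees of
"p" and "q" as escaped, the rewrite needs no provenance tracking for "x"
and "y":

#include <stdint.h>

void sync_ptrs(int *p, int *q, uintptr_t *out) {
  uintptr_t x = (uintptr_t)p;   /* pointee of p escapes here      */
  uintptr_t y = (uintptr_t)q;   /* pointee of q escapes here      */

  if (x != y)                   /* candidate for plain  x = y;    */
    x = y;

  *out = x;
}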


- Performance degradation due to treating cast pointers as escaped is
insignificant.

  Morally, if a pointer is cast to an integer, its address is regarded as
"global": anyone holding the integer value of the pointer can access the
pointee.  So there will not be much optimization opportunity (or intent) for
pointers cast to integers (see the sketch after this item).

  Of course, this argument should be validated by experiment; yet I am
quite convinced that the performance degradation is insignificant.
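
  As a concrete sketch of the escape rule ("publish" and "observe" are
hypothetical names; "publish" stands for an unknown external function):
once the address of "a" has been handed out as an integer, the callee could
reconstruct the pointer and write through it, so "a" must be treated as
escaped and the final load cannot be folded to 1.

#include <stdint.h>

void publish(uintptr_t addr);   /* unknown external function */

int observe(void) {
  int a = 1;
  publish((uintptr_t)&a);       /* the cast makes "a" escape            */
  return a;                     /* must be reloaded; folding is unsafe  */
}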



I would like to ask what you think about this suggestion.  Note that my
argument here is based on my paper on this issue, where you can find the
formal memory model we propose, proofs that optimization examples are
correct, and a reasoning principle for proving optimizations (see the paper
and the slides):

  A Formal C Memory Model Supporting Integer-Pointer Casts.
  Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov,
  Steve Zdancewic, Viktor Vafeiadis.
  PLDI 2015.
  http://sf.snu.ac.kr/intptrcast/

--- Comment #45 from Jeehoon Kang  ---
> I think this is not true.  For example with MatLab (might be sth else,
> if I don't remember correctly) you are required to pass pointers to
> arrays in two halves in double(!) values (I believe the only function
> argument type they support).  GCC happily makes points-to analysis work 
> through those.

Thank you for giving me an example.  Yet, I think it is a little bit
unfortunate for MatLab (or whatever it is) to pass pointers by packing the
two halves into a double, at least because of the readability problem.  I
think it is beyond the intended usage of the C/C++ language, but I
understand that GCC is the time-honored compiler for various software and
systems.



> The other (unfortunate) thing is that in GCC pointer subtraction
> is always performed on integers, thus for the C source code
> 
>  int idx = ptr1 - ptr2;
> 
> we internally have sth like
> 
>  int idx = ((long)ptr1 - (long)ptr2) / 4;
> 
> so you can't really treat pointers as "escaped" here without loss.

Thank you for the information.  I don't know the GCC internals, so I would
like to ask how much it would cost to introduce a dedicated syntax for
pointer subtraction.  I hope it is not that huge, but I really have no idea.
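
For reference, here is a minimal source-level sketch of the problem quoted
above (the function "count_between" is a hypothetical name): at the C level
the subtraction never mentions integers, so a dedicated pointer-subtraction
operation would not need to treat anything here as an escape, whereas the
current integer form would.

#include <stddef.h>

ptrdiff_t count_between(const int *lo, const int *hi) {
  return hi - lo;               /* source-level form                    */
  /* internal form, per the quote above:
       ((long)hi - (long)lo) / 4
     an "every pointer-to-integer cast escapes" rule would therefore
     (spuriously) mark the pointees of lo and hi as escaped. */
}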



> Note that we've had this (kind of) in the past and it tried to go
> without making pointers escaped at these points but only consider
> casts from integers to pointers as pointing to anything.  But
> that's wrong correctness wise (not then treating the cast to integer
> as escape point).

Treating the cast to integer as an escape point is proven correct by a
machine-checked proof (in Coq) for various standard optimization examples,
such as constant propagation, dead code elimination, dead allocation
elimination, etc.  For more detail, please see the paper mentioned above.



> I also don't think doing the above would solve the cases of equality
> compares of pointes themselves.  (hopefully undefined)

The formal memory model in the paper I mentioned invokes undefined behavior
for the pointer equality comparison example above.  In the formal model, a
pointer is represented as a pair of a memory block identifier (l) and an
offset (o) (cf. the CompCert memory model).  When memory is malloc-ed or
alloca-ed, a fresh memory block identifier is assigned.  A pointer equality
comparison, say of (l, o) and (l', o'), invokes undefined behavior when
l != l'.
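
Here is a minimal sketch of this rule, not the paper's Coq development
("struct ptr_val" and "ptr_eq" are hypothetical names introduced just for
illustration):

#include <stdbool.h>
#include <stdint.h>

struct ptr_val {
  uint64_t block;    /* l: fresh identifier per malloc/alloca           */
  uint64_t offset;   /* o: byte offset within the block                 */
};

/* Writes the comparison result to *eq and returns true when the
   comparison is defined; returns false (undefined behavior in the
   model) when the two pointers belong to different blocks. */
bool ptr_eq(struct ptr_val a, struct ptr_val b, bool *eq) {
  if (a.block != b.block)
    return false;               /* l != l': undefined                   */
  *eq = (a.offset == b.offset);
  return true;
}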

So for the following example (by Alexander Cherepanov):

#include <stdio.h>
#include 

int main() {
  int y, x = 0;
  int *volatile v = &x;
  int *xp = v;
  int *i = &y + 1;

  if (xp != i) {
    printf("hello\n");
    xp = i;
  }

  printf("%d\n", xp == &x);
}

Say y and x are allocated at l1 and l2, respectively.  Then xp = (l2, 0), and i
= (l1, 4).  Thus comparing xp and i invokes undefined behavior, since l1 != l2.