On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin
<martin.uec...@med.uni-goettingen.de> wrote:
>
> On Thursday, 18.04.2019, 11:56 +0200, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > > <martin.uec...@med.uni-goettingen.de> wrote:
> > > >
> > > > On Wednesday, 17.04.2019, 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > > <martin.uec...@med.uni-goettingen.de> wrote:
>
> ....
> > > > Let's consider this example:
> > > >
> > > > int x;
> > > > int y;
> > > > uintptr_t pi = (uintptr_t)&x;
> > > > uintptr_t pj = (uintptr_t)&y;
> > > >
> > > > if (pi + 4 == pj) {
> > > >
> > > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > > >    p[-1] = 1;         // well defined?
> > > > }
> > > >
> > > > If I understand correctly, a pointer obtained from
> > > > pi + 4 would have an "anything" provenance (which is
> > > > fine). But the pointer obtained from 'pj' would have the
> > > > provenance of 'y' so the access to 'x' would not
> > > > be allowed.
> > >
> > > Correct.  This is the most difficult case for us to handle
> > > exactly, also because of the following (is it also valid
> > > under the proposal?)
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > >
> > > while well-handled by GCC in the written form (as you
> > > say, pi + 4 yields "anything" provenance), GCC itself
> > > may transform it into the first variant by noticing
> > > the conditional equivalence and substituting pj for
> > > pi + 4.
>
> Integers are just integers in the proposal, so conditional
> equivalence is not a problem for them. In my opinion this
> is a strength of the proposal. Tracking provenance for
> integers would mean that all computations would be affected
> by such subtle semantic issues (where you cannot even
> replace an integer by an equivalent one). In this
> proposal this is limited to pointers where it at least
> makes some sense.
>
> > > > But according to the preferred version of
> > > > our proposal, the pointer could also be used to
> > > > access 'x' because it is also exposed.
> > > >
> > > > GCC could make pj have an "anything" provenance
> > > > even though it is not modified. (This would break
> > > > some optimizations, such as the one for Matlab.)
> > > >
> > > > Maybe one could also refine this optimization to check
> > > > for additional conditions which rule out the case
> > > > that there is another object the pointer could point
> > > > to.
> > >
> > > The only feasible solution would be to not track
> > > provenance through non-pointers and make
> > > conversions of non-pointers to pointers have
> > > "anything" provenance.
>
> This would be one solution, yes. But you could
> reattach the same provenance if you know that the
> pointer points in the middle of an object (so is
> not a first or one-after pointer) or if you know
> that there is no exposed object directly adjacent
> to this object, etc.
>
> > > The additional issue that appears here though
> > > is that we cannot even turn (int *)(uintptr_t)p
> > > into p anymore, since with the conditional
> > > substitution we can then still arrive at
> > > effectively (&y)[-1] = 1, which is of course
> > > undefined behavior.
> > >
> > > That is, your proposal makes
> > >
> > >  ((int *)(uintptr_t)&y)[-1] = 1
> > >
> > > well-defined (if &y - 1 == &x) but keeps
> > >
> > >   (&y)[-1] = 1
> > >
> > > as undefined which strikes me as a little bit
> > > inconsistent.  If that's true, it's IMHO worth
> > > a defect report and a second consideration.
>
> This is true. But I would not call it inconsistent.
> It is just unusual if you expect that casts to integers
> and back are no-ops.  In this proposal a round-trip has
> the effect of stripping the original provenance and
> attaching a new one (which could be the same as the
> old one).

Well, the standard explicitly says that if you convert
a pointer to an integer (with the same or more precision)
and back you get the same pointer back.  That suggests
(int *)(uintptr_t)&y is a semantic no-op?

> While in this specific scenario this might seem
> unreasonable, there are other examples where you may
> want to be able to get from one object to another,
> and using casts to integers would then be the
> blessed way to express this.

Sure, no arguing about this.  So far this has all been in
the hands of implementors to make uses of this idiom work;
now users will be able to wield the standard's sword :/

> In my opinion, this is also intuitive:
> by casting to an integer, one gets simple discrete
> pointer semantics without provenance.
>
>
> > Similarly, that
> >
> > int x;
> > int y;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (&x + 1 == &y) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> >
> > is undefined, but becomes well-defined once I add a no-op
> >
> >  (uintptr_t)&x;
> >
> > is undesirable.  Can this no-op
> > stmt appear in another function?  Or even in
> > another translation unit (if x and y are global variables)?
> > And does such stmt have to be present (in another
> > TU) to make the example valid in this case?
>
> Without that statement, the example is not valid as the
> address of 'x' is not exposed. With the statement this
> becomes valid and it does not matter where this statement
> appears. Again, I agree that the fact that such a statement
> has a side effect is something one needs to get used to.
>
> But taking the address already has a side effect, which could be
> surprising, doesn't it? If I understood your answer
> above correctly, for GCC you get this side-effect already
> without the cast:
>
> &x;

Well, yes.  But for GCC the important issue is whether
this address-taking is still done after optimization
(at the point we use provenance info to compute points-to sets).
So this plain stmt wouldn't survive and would not make
the example valid.  It's of course a lot harder to put this
into standard wording ;)  (if not impossible...)

I guess there has to be a data dependence between an address-taken
operation and recreating that address (or a derived one to the same
object).  That is, we're trying to support delta-compressing pointers
as often used in shared memory data structures.

But as you've seen already, conditional "dependences" are prone
to break.

> For the statement to appear elsewhere, the address must
> escape first. I would expect a compiler to treat a
> cast to an integer identically to an escaped address.

Sure, (uintptr_t)&a also takes the address of a and passing
that integer to a function makes the address of the object a
escape.

> > To me all this makes requiring exposure through a cast
> > to a non-pointer (or accessing its representation) not
> > in any way more "useful" for an optimizing compiler than
> > modeling exposure through address-taking.
>
> There would be a difference for cases like this:
>
> int x[3];
> int y;
>
> x[0] = 1;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>   int* p = (int*)(pi + 4);
>   p[-1] = 1;
> }
>
> Here 'x' is not exposed in our proposal, so the assignment
> via 'p' is invalid, but the address is taken implicitly.

Via the x[0] - yes.  Unfortunate details of the C standard ;)

> Another example is storage allocated via malloc/alloca
> where there is always a pointer involved but which is
> not automatically exposed in our proposal.

True, but the compiler nevertheless has to assume it is exposed
once that pointer escapes the current function (or TU).  It's
hard to make the validity decision at parsing time, and at
optimization time a stmt like

 (uintptr_t)ptr;

is gone very quickly.

Richard.

>
> Best,
> Martin