On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <peter.sew...@cl.cam.ac.uk> wrote:
>
> On Fri, 12 Apr 2019 at 15:51, Jeff Law <l...@redhat.com> wrote:
> >
> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> > > Dear all,
> > >
> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> > > (the WG14 C memory object model study group) now
> > > have a detailed proposal for pointer provenance semantics, refining
> > > the "provenance not via integers (PNVI)" model presented there.
> > > This will be discussed at the ISO WG14 C standards committee at the
> > > end of April, and comments from the GCC community before then would
> > > be very welcome.   The proposal reconciles the needs of existing code
> > > and the behaviour of existing compilers as well as we can, but it doesn't
> > > exactly match any of the latter, so we'd especially like to know whether
> > > it would be feasible to implement - our hope is that it would only require
> > > minor changes.  It's presented in three documents:
> > >
> > > N2362  Moving to a provenance-aware memory model for C: proposal for C2x
> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> > > This introduces the proposal and gives the proposed change to the standard
> > > text, presented as change-highlighted pages of the standard
> > > (though one might want to read the N2363 examples before going into that).
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> > >
> > > N2363  C provenance semantics: examples.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin 
> > > Uecker.
> > > This explains the proposal and its design choices with discussion of a
> > > series of examples.
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> > >
> > > N2364  C provenance semantics: detailed semantics.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> > > This gives a detailed mathematical semantics for the proposal
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> > >
> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> > > executable version of the semantics, with a web interface that
> > > allows one to explore and visualise the behaviour of small test
> > > programs, stepping through and seeing the abstract-machine
> > > memory state including provenance information.   N2363 compares
> > > the results of this for the example programs with gcc, clang, and icc
> > > results, though the tests are really intended as tests of the semantics
> > > rather than compiler tests, so one has to interpret this with care.
> > THanks.  I just noticed this came up in EuroLLVM as well.    Getting
> > some standards clarity in this space would be good.
> >
> > Richi is in the best position to cover for GCC, but I suspect he's
> > buried with gcc-9 issues as we approach the upcoming release.  Hopefully
> > he'll have time to review this once crunch time has past.  I think more
> > than anything sanity checking the proposal's requirements vs what can be
> > reasonably implmemented is most important at this stage.
>
> Indeed.  We talked with him at the GNU cauldron, without uncovering
> any serious problems, but more detailed review from an implementability
> point of view would be great.   For the UB mailing list we just made
> a brief plain-text summary of the proposal (leaving out all the examples
> and standards diff, and glossing over some details).  I'll paste that
> in below in case it's helpful.  The next WG14 meeting is the week of
> April 29; comments before then would be particularly useful if that's 
> possible.
>
> best,
> Peter
>
> C pointer values are typically represented at runtime as simple
> concrete numeric values, but mainstream compilers routinely exploit
> information about the "provenance" of pointers to reason that they
> cannot alias, and hence to justify optimisations.  This is
> long-standing practice, but exactly what it means (what programmers
> can rely on, and what provenance-based alias analysis is allowed to
> do), has never been nailed down.   That's what the proposal does.
>
>
> The basic idea is to associate a *provenance* with every pointer
> value, identifying the original storage instance (or allocation, in
> other words) that the pointer is derived from.  In more detail:
>
> - We take abstract-machine pointer values to be pairs (pi,a), adding a
>   provenance pi, either @i where i is a storage instance ID, or the
>   *empty* provenance, to their concrete address a.
>
> - On every storage instance creation (of objects with static, thread,
>   automatic, and allocated storage duration), the abstract machine
>   nondeterministically chooses a fresh storage instance ID i (unique
>   across the entire execution), and the resulting pointer value
>   carries that single storage instance ID as its provenance @i.
>
> - Provenance is preserved by pointer arithmetic that adds or subtracts
>   an integer to a pointer.
>
> - At any access via a pointer value, its numeric address must be
>   consistent with its provenance, with undefined behaviour
>   otherwise. In particular:
>
>   -- access via a pointer value which has provenance a single storage
>      instance ID @i must be within the memory footprint of the
>      corresponding original storage instance, which must still be
>      live.
>
>   -- all other accesses, including those via a pointer value with
>      empty provenance, are undefined behaviour.
>
> Regarding such accesses as undefined behaviour is necessary to make
> optimisation based on provenance alias analysis sound: if the standard
> did define behaviour for programs that make provenance-violating
> accesses, e.g.~by adopting a concrete semantics, optimisation based on
> provenance-aware alias analysis would not be sound.  In other words,
> the provenance lets one distinguish a one-past pointer from a pointer
> to the start of an adjacently-allocated object, which otherwise are
> indistinguishable.
>
> All this is for the C abstract machine as defined in the standard:
> compilers might rely on provenance in their alias analysis and
> optimisation, but one would not expect normal implementations to
> record or manipulate provenance at runtime (though dynamic or static
> analysis tools might).
>
>
> Then, to support low-level systems programming, C provides many other
> ways to construct and manipulate pointer values:
>
> - casts of pointers to integer types and back, possibly with integer
>   arithmetic, e.g.~to force alignment, or to store information in
>   unused bits of pointers;
>
> - copying pointer values with memcpy;
>
> - manipulation of the representation bytes of pointers, e.g.~via user
>   code that copies them via char* or unsigned char* accesses;
>
> - type punning between pointer and integer values;
>
> - I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
>   the pointer representation bytes, or pointer/integer casts and
>   integer I/O;
>
> - copying pointer values with realloc; and
>
> - constructing pointer values that embody knowledge established from
>   linking, and from constants that represent the addresses of
>   memory-mapped devices.
>
>
> A satisfactory semantics has to address all these, together with the
> implications on optimisation.  We've explored several, but our main
> proposal is "PNVI-ae-udi" (provenance not via integers,
> address-exposed, user-disambiguation).
>
> This semantics does not track provenance via integers.  Instead, at
> integer-to-pointer cast points, it checks whether the given address
> points within a live object that has previously been *exposed* and, if
> so, recreates the corresponding provenance.
>
> A storage instance is deemed exposed by a cast of a pointer to it to
> an integer type, by a read (at non-pointer type) of the representation
> of the pointer, or by an output of the pointer using %p.

So this is not what GCC implements which tracks provenance through
non-pointer types to a limited extent when only copying is taking place.

Your proposal makes

 int a, b;
 int *p = &a;
 int *q = &b;
 uintptr_t pi = (uintptr_t)p; //expose
 uintptr_t qi = (uintptr_t)q; //expose
 pi += 4;
 if (pi == qi)
   *(int *)pi = 1;

well-defined since (int *)pi now has the provenance of &b.

Note GCC, when tracking provenance of non-pointer type
adds like in

  int *p = &a;
  uintptr_t pi = (uintptr_t)p;
  pi += 4;

considers pi to have provenance "anything" (not sure if you
have something like that) since we add 4 which has provenance
"anything" to pi which has provenance &a.

> The user-disambiguation refinement adds some complexity but supports
> roundtrip casts, from pointer to integer and back, of pointers that
> are one-past a storage instance.

Reply via email to