On 17/04/2019, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <peter.sew...@cl.cam.ac.uk> > wrote: >> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law <l...@redhat.com> wrote: >> > >> > On 4/2/19 2:11 AM, Peter Sewell wrote: >> > > Dear all, >> > > >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we >> > > (the WG14 C memory object model study group) now >> > > have a detailed proposal for pointer provenance semantics, refining >> > > the "provenance not via integers (PNVI)" model presented there. >> > > This will be discussed at the ISO WG14 C standards committee at the >> > > end of April, and comments from the GCC community before then would >> > > be very welcome. The proposal reconciles the needs of existing code >> > > and the behaviour of existing compilers as well as we can, but it >> > > doesn't >> > > exactly match any of the latter, so we'd especially like to know >> > > whether >> > > it would be feasible to implement - our hope is that it would only >> > > require >> > > minor changes. It's presented in three documents: >> > > >> > > N2362 Moving to a provenance-aware memory model for C: proposal for >> > > C2x >> > > by the memory object model study group. Jens Gustedt, Peter Sewell, >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. >> > > This introduces the proposal and gives the proposed change to the >> > > standard >> > > text, presented as change-highlighted pages of the standard >> > > (though one might want to read the N2363 examples before going into >> > > that). >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf >> > > >> > > N2363 C provenance semantics: examples. >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, >> > > Martin Uecker. >> > > This explains the proposal and its design choices with discussion of >> > > a >> > > series of examples. >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf >> > > >> > > N2364 C provenance semantics: detailed semantics. >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes. >> > > This gives a detailed mathematical semantics for the proposal >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf >> > > >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an >> > > executable version of the semantics, with a web interface that >> > > allows one to explore and visualise the behaviour of small test >> > > programs, stepping through and seeing the abstract-machine >> > > memory state including provenance information. N2363 compares >> > > the results of this for the example programs with gcc, clang, and icc >> > > results, though the tests are really intended as tests of the >> > > semantics >> > > rather than compiler tests, so one has to interpret this with care. >> > THanks. I just noticed this came up in EuroLLVM as well. Getting >> > some standards clarity in this space would be good. >> > >> > Richi is in the best position to cover for GCC, but I suspect he's >> > buried with gcc-9 issues as we approach the upcoming release. >> > Hopefully >> > he'll have time to review this once crunch time has past. I think more >> > than anything sanity checking the proposal's requirements vs what can >> > be >> > reasonably implmemented is most important at this stage. >> >> Indeed. We talked with him at the GNU cauldron, without uncovering >> any serious problems, but more detailed review from an implementability >> point of view would be great. For the UB mailing list we just made >> a brief plain-text summary of the proposal (leaving out all the examples >> and standards diff, and glossing over some details). I'll paste that >> in below in case it's helpful. The next WG14 meeting is the week of >> April 29; comments before then would be particularly useful if that's >> possible. >> >> best, >> Peter >> >> C pointer values are typically represented at runtime as simple >> concrete numeric values, but mainstream compilers routinely exploit >> information about the "provenance" of pointers to reason that they >> cannot alias, and hence to justify optimisations. This is >> long-standing practice, but exactly what it means (what programmers >> can rely on, and what provenance-based alias analysis is allowed to >> do), has never been nailed down. That's what the proposal does. >> >> >> The basic idea is to associate a *provenance* with every pointer >> value, identifying the original storage instance (or allocation, in >> other words) that the pointer is derived from. In more detail: >> >> - We take abstract-machine pointer values to be pairs (pi,a), adding a >> provenance pi, either @i where i is a storage instance ID, or the >> *empty* provenance, to their concrete address a. >> >> - On every storage instance creation (of objects with static, thread, >> automatic, and allocated storage duration), the abstract machine >> nondeterministically chooses a fresh storage instance ID i (unique >> across the entire execution), and the resulting pointer value >> carries that single storage instance ID as its provenance @i. >> >> - Provenance is preserved by pointer arithmetic that adds or subtracts >> an integer to a pointer. >> >> - At any access via a pointer value, its numeric address must be >> consistent with its provenance, with undefined behaviour >> otherwise. In particular: >> >> -- access via a pointer value which has provenance a single storage >> instance ID @i must be within the memory footprint of the >> corresponding original storage instance, which must still be >> live. >> >> -- all other accesses, including those via a pointer value with >> empty provenance, are undefined behaviour. >> >> Regarding such accesses as undefined behaviour is necessary to make >> optimisation based on provenance alias analysis sound: if the standard >> did define behaviour for programs that make provenance-violating >> accesses, e.g.~by adopting a concrete semantics, optimisation based on >> provenance-aware alias analysis would not be sound. In other words, >> the provenance lets one distinguish a one-past pointer from a pointer >> to the start of an adjacently-allocated object, which otherwise are >> indistinguishable. >> >> All this is for the C abstract machine as defined in the standard: >> compilers might rely on provenance in their alias analysis and >> optimisation, but one would not expect normal implementations to >> record or manipulate provenance at runtime (though dynamic or static >> analysis tools might). >> >> >> Then, to support low-level systems programming, C provides many other >> ways to construct and manipulate pointer values: >> >> - casts of pointers to integer types and back, possibly with integer >> arithmetic, e.g.~to force alignment, or to store information in >> unused bits of pointers; >> >> - copying pointer values with memcpy; >> >> - manipulation of the representation bytes of pointers, e.g.~via user >> code that copies them via char* or unsigned char* accesses; >> >> - type punning between pointer and integer values; >> >> - I/O, using either fprintf/fscanf and the %p format, fwrite/fread on >> the pointer representation bytes, or pointer/integer casts and >> integer I/O; >> >> - copying pointer values with realloc; and >> >> - constructing pointer values that embody knowledge established from >> linking, and from constants that represent the addresses of >> memory-mapped devices. >> >> >> A satisfactory semantics has to address all these, together with the >> implications on optimisation. We've explored several, but our main >> proposal is "PNVI-ae-udi" (provenance not via integers, >> address-exposed, user-disambiguation). >> >> This semantics does not track provenance via integers. Instead, at >> integer-to-pointer cast points, it checks whether the given address >> points within a live object that has previously been *exposed* and, if >> so, recreates the corresponding provenance. >> >> A storage instance is deemed exposed by a cast of a pointer to it to >> an integer type, by a read (at non-pointer type) of the representation >> of the pointer, or by an output of the pointer using %p. > > So this is not what GCC implements which tracks provenance through > non-pointer types to a limited extent when only copying is taking place. > > Your proposal makes > > int a, b; > int *p = &a; > int *q = &b; > uintptr_t pi = (uintptr_t)p; //expose > uintptr_t qi = (uintptr_t)q; //expose > pi += 4; > if (pi == qi) > *(int *)pi = 1; > > well-defined since (int *)pi now has the provenance of &b.
Yes. (Just to be clear: it's not that we think the above example is desirable in itself, but it's well-defined as a consequence of what we do to make other common idioms, eg pointer bit manipulation, well-defined.) > Note GCC, when tracking provenance of non-pointer type > adds like in > > int *p = &a; > uintptr_t pi = (uintptr_t)p; > pi += 4; > > considers pi to have provenance "anything" (not sure if you > have something like that) since we add 4 which has provenance > "anything" to pi which has provenance &a. We don't at present have a provenance "anything", but if the gcc "anything" means that it's assumed that it might alias with anything, then it looks like gcc's implementing a sound approximation to the proposal here? best, Peter >> The user-disambiguation refinement adds some complexity but supports >> roundtrip casts, from pointer to integer and back, of pointers that >> are one-past a storage instance. >