Re: C provenance semantics proposal
On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
>
> On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:
> > >
> > > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin wrote:
> > > > >
> > > > > > Since
> > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > >
> > > > > This is correct. What the proposal does though is put a limit
> > > > > on where pointers obtained from integers are allowed to point
> > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > "anything" provenances also cannot point to all possible
> > > > > objects.
> > > >
> > > > Yes.  We exclude objects that do not have their address taken
> > > > though (so somewhat similar to your "exposed").
> > >
> > > Also if the address never escapes?
> >
> > Yes.
>
> Then with respect to "expose" it seems GCC implements
> a superset which means it allows some behavior which
> is undefined according to the proposal. So all seems
> well with respect to this part.
>
>
> With respect to tracking provenance through integers
> some changes might be required.
>
> Let's consider this example:
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1; // well defined?
> }
>
> If I understand correctly, a pointer obtained from
> pi + 4 would have a "anything" provenance (which is
> fine). But the pointer obtained from 'pj' would have the
> provenance of 'y' so the access to 'x' would not
> be allowed.

Correct.  This is the most difficult case for us to handle
exactly, also because of the following variant (is it also
valid under the proposal?):

int x;
int y;
uintptr_t pi = (uintptr_t)&x;
uintptr_t pj = (uintptr_t)&y;

if (pi + 4 == pj) {

   int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
   p[-1] = 1; // well defined?
}

While this is handled well by GCC in the written form (as you
say, pi + 4 yields "anything" provenance), GCC itself
may transform it into the first variant by noticing
the conditional equivalence and substituting pj for
pi + 4.

> But according to the preferred version of
> our proposal, the pointer could also be used to
> access 'x' because it is also exposed.
>
> GCC could make pj have a "anything" provenance
> even though it is not modified. (This would break
> some optimization such as the one for Matlab.)
>
> Maybe one could also refine this optimization to check
> for additional conditions which rule out the case
> that there is another object the pointer could point
> to.

The only feasible solution would be to not track
provenance through non-pointers and make
conversions of non-pointers to pointers have
"anything" provenance.

The additional issue that appears here though
is that we cannot even turn (int *)(uintptr_t)p
into p anymore, since with the conditional
substitution we can then still arrive at
effectively (&y)[-1] = 1, which is of course
undefined behavior.

That is, your proposal makes

 ((int *)(uintptr_t)&y)[-1] = 1

well-defined (if &y - 1 == &x) but keeps

  (&y)[-1] = 1

as undefined, which strikes me as a little bit
inconsistent.  If that's true it's IMHO worth
a defect report and second consideration.

Richard.

> Best,
> Martin
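For anyone who wants to experiment with the adjacency test that both variants above rely on, here is a minimal sketch. The helper name `adjacent_by_integer` is hypothetical (it does not come from the thread), and the literal 4 is generalized to `sizeof(int)`. The helper only compares integers; it deliberately does not perform the disputed `p[-1] = 1` store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the thread: performs the
   `pi + 4 == pj` test from the example, with 4 replaced by
   sizeof(int). Only integer values are compared and nothing is
   dereferenced, so no provenance question arises here. */
bool adjacent_by_integer(const int *a, const int *b)
{
    uintptr_t pi = (uintptr_t)a;
    uintptr_t pj = (uintptr_t)b;
    return pi + sizeof(int) == pj;
}
```

Whether two separate objects such as `x` and `y` end up adjacent is entirely up to the implementation. For neighbouring array elements the helper should return true on implementations where the pointer-to-integer conversion simply yields the address, which is an assumption about the platform rather than a guarantee of the standard.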
Re: C provenance semantics proposal
On Thu, Apr 18, 2019 at 11:31 AM Richard Biener wrote:
>
> On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
> >
> > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:
> > > >
> > > > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin wrote:
> > > > > >
> > > > > > > Since
> > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > like a pointer with multiple provenances (which "anything" is),
> > > > > > > just pointers with no provenance (pointing outside of any object),
> > > > > > > right?
> > > > > >
> > > > > > This is correct. What the proposal does though is put a limit
> > > > > > on where pointers obtained from integers are allowed to point
> > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > "anything" provenances also cannot point to all possible
> > > > > > objects.
> > > > >
> > > > > Yes.  We exclude objects that do not have their address taken
> > > > > though (so somewhat similar to your "exposed").
> > > >
> > > > Also if the address never escapes?
> > >
> > > Yes.
> >
> > Then with respect to "expose" it seems GCC implements
> > a superset which means it allows some behavior which
> > is undefined according to the proposal. So all seems
> > well with respect to this part.
> >
> >
> > With respect to tracking provenance through integers
> > some changes might be required.
> >
> > Let's consider this example:
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1; // well defined?
> > }
> >
> > If I understand correctly, a pointer obtained from
> > pi + 4 would have a "anything" provenance (which is
> > fine). But the pointer obtained from 'pj' would have the
> > provenance of 'y' so the access to 'x' would not
> > be allowed.
>
> Correct.  This is the most difficult case for us to handle
> exactly also because (also valid for the proposal?)
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
>    p[-1] = 1; // well defined?
> }
>
> while well-handled by GCC in the written form (as you
> say, pi + 4 yields "anything" provenance), GCC itself
> may transform it into the first variant by noticing
> the conditional equivalence and substituting pj for
> pi + 4.
>
> > But according to the preferred version of
> > our proposal, the pointer could also be used to
> > access 'x' because it is also exposed.
> >
> > GCC could make pj have a "anything" provenance
> > even though it is not modified. (This would break
> > some optimization such as the one for Matlab.)
> >
> > Maybe one could also refine this optimization to check
> > for additional conditions which rule out the case
> > that there is another object the pointer could point
> > to.
>
> The only feasible solution would be to not track
> provenance through non-pointers and make
> conversions of non-pointers to pointers have
> "anything" provenance.
>
> The additional issue that appears here though
> is that we cannot even turn (int *)(uintptr_t)p
> into p anymore since with the conditional
> substitution we can then still arrive at
> effectively (&y)[-1] = 1 which is of course
> undefined behavior.
>
> That is, your proposal makes
>
>  ((int *)(uintptr_t)&y)[-1] = 1
>
> well-defined (if &y - 1 == &x) but keeps
>
>   (&y)[-1] = 1
>
> as undefined which strikes me as a little bit
> inconsistent.  If that's true it's IMHO worth
> a defect report and second consideration.

Similarly, it is undesirable that

int x;
int y;
uintptr_t pj = (uintptr_t)&y;

if (&x + 1 == &y) {

   int* p = (int*)pj; // can be one-after pointer of 'x'
   p[-1] = 1; // well defined?
}

is undefined, but becomes well-defined when I add a no-op

 (uintptr_t)&x;

Can this no-op stmt appear in another function?  Or even in
another translation unit (if x and y are global variables)?
And does such a stmt have to be present (in another
TU) to make the example valid in this case?

To me all this makes requiring exposal through a cast
to a non-pointer (or accessing its representation) not
in any way more "useful" for an optimizing compiler than
modeling exposal through address-taking.

Richard.

> Richard.
>
> > Best,
> > Martin
Re: C provenance semantics proposal
On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
>
> On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
> >
> > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:
> > > >
> > > > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin wrote:
> > > > > >
> > > > > > > Since
> > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > like a pointer with multiple provenances (which "anything" is),
> > > > > > > just pointers with no provenance (pointing outside of any object),
> > > > > > > right?
> > > > > >
> > > > > > This is correct. What the proposal does though is put a limit
> > > > > > on where pointers obtained from integers are allowed to point
> > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > "anything" provenances also cannot point to all possible
> > > > > > objects.
> > > > >
> > > > > Yes.  We exclude objects that do not have their address taken
> > > > > though (so somewhat similar to your "exposed").
> > > >
> > > > Also if the address never escapes?
> > >
> > > Yes.
> >
> > Then with respect to "expose" it seems GCC implements
> > a superset which means it allows some behavior which
> > is undefined according to the proposal. So all seems
> > well with respect to this part.
> >
> >
> > With respect to tracking provenance through integers
> > some changes might be required.
> >
> > Let's consider this example:
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1; // well defined?
> > }
> >
> > If I understand correctly, a pointer obtained from
> > pi + 4 would have a "anything" provenance (which is
> > fine). But the pointer obtained from 'pj' would have the
> > provenance of 'y' so the access to 'x' would not
> > be allowed.
>
> Correct.  This is the most difficult case for us to handle
> exactly also because (also valid for the proposal?)
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
>    p[-1] = 1; // well defined?
> }
>
> while well-handled by GCC in the written form (as you
> say, pi + 4 yields "anything" provenance), GCC itself
> may transform it into the first variant by noticing
> the conditional equivalence and substituting pj for
> pi + 4.

In the proposed semantics, the integers have no provenance info at
all, so pj and pi+4 are interchangeable inside the conditional.

An equality test of two pointers, on the other hand, doesn't
necessarily mean that they are interchangeable.  I don't see any good
way to avoid that in a provenance semantics, where a one-past
pointer might sometimes compare equal to a pointer to an
adjacent object but be illegal for accessing it.

> > But according to the preferred version of
> > our proposal, the pointer could also be used to
> > access 'x' because it is also exposed.
> >
> > GCC could make pj have a "anything" provenance
> > even though it is not modified. (This would break
> > some optimization such as the one for Matlab.)
> >
> > Maybe one could also refine this optimization to check
> > for additional conditions which rule out the case
> > that there is another object the pointer could point
> > to.
>
> The only feasible solution would be to not track
> provenance through non-pointers and make
> conversions of non-pointers to pointers have
> "anything" provenance.
>
> The additional issue that appears here though
> is that we cannot even turn (int *)(uintptr_t)p
> into p anymore since with the conditional
> substitution we can then still arrive at
> effectively (&y)[-1] = 1 which is of course
> undefined behavior.
>
> That is, your proposal makes
>
>  ((int *)(uintptr_t)&y)[-1] = 1
>
> well-defined (if &y - 1 == &x) but keeps
>
>   (&y)[-1] = 1
>
> as undefined

That's true (if x has been exposed).

> which strikes me as a little bit
> inconsistent.  If that's true it's IMHO worth
> a defect report and second consideration.

There's a trade-off here.  We could let a
pointer-to-integer-to-pointer roundtrip recover provenance only if the
pointer is properly within the object, giving empty provenance for a
one-past pointer.  That would fix the above, but it's not clear
whether this would be a bad restriction for existing code.

best,
Peter

> Richard.
>
> > Best,
> > Martin
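The point that equal pointers need not be interchangeable can be illustrated with a small toy model. This is only a sketch under stated assumptions: the types `toy_object` and `toy_pointer` and both functions are invented for illustration and are neither GCC's internal representation nor the proposal's formal definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical names): an object occupies
   [base, base + size) and carries a nonzero allocation id. */
typedef struct {
    uintptr_t base;
    size_t    size;
    int       id;
} toy_object;

/* A pointer is a numeric address plus the id of the object it was
   derived from (0 would mean empty provenance). */
typedef struct {
    uintptr_t addr;
    int       prov;
} toy_pointer;

/* Equality looks at the address only and ignores provenance. */
bool toy_equal(toy_pointer a, toy_pointer b)
{
    return a.addr == b.addr;
}

/* An n-byte access is allowed only if the provenance matches the
   object and the accessed bytes lie inside the object. */
bool toy_access_ok(toy_pointer p, const toy_object *obj, size_t n)
{
    return p.prov == obj->id
        && p.addr >= obj->base
        && n <= obj->size
        && p.addr - obj->base <= obj->size - n;
}
```

With an object `x` at address 100 and an adjacent object `y` at 104, a one-past pointer derived from `x` and a pointer to the start of `y` compare equal under `toy_equal`, yet only the latter passes `toy_access_ok` for `y`. Substituting one for the other after an equality test therefore changes which accesses are allowed in this model.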
Re: C provenance semantics proposal
On Thu, 18 Apr 2019 at 10:56, Richard Biener wrote:
>
> On Thu, Apr 18, 2019 at 11:31 AM Richard Biener wrote:
> >
> > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
> > >
> > > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:
> > > > >
> > > > > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin wrote:
> > > > > > >
> > > > > > > > Since
> > > > > > > > your proposal is based on an abstract machine there isn't
> > > > > > > > anything like a pointer with multiple provenances (which
> > > > > > > > "anything" is), just pointers with no provenance (pointing
> > > > > > > > outside of any object), right?
> > > > > > >
> > > > > > > This is correct. What the proposal does though is put a limit
> > > > > > > on where pointers obtained from integers are allowed to point
> > > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > > "anything" provenances also cannot point to all possible
> > > > > > > objects.
> > > > > >
> > > > > > Yes.  We exclude objects that do not have their address taken
> > > > > > though (so somewhat similar to your "exposed").
> > > > >
> > > > > Also if the address never escapes?
> > > >
> > > > Yes.
> > >
> > > Then with respect to "expose" it seems GCC implements
> > > a superset which means it allows some behavior which
> > > is undefined according to the proposal. So all seems
> > > well with respect to this part.
> > >
> > >
> > > With respect to tracking provenance through integers
> > > some changes might be required.
> > >
> > > Let's consider this example:
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1; // well defined?
> > > }
> > >
> > > If I understand correctly, a pointer obtained from
> > > pi + 4 would have a "anything" provenance (which is
> > > fine). But the pointer obtained from 'pj' would have the
> > > provenance of 'y' so the access to 'x' would not
> > > be allowed.
> >
> > Correct.  This is the most difficult case for us to handle
> > exactly also because (also valid for the proposal?)
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> >    p[-1] = 1; // well defined?
> > }
> >
> > while well-handled by GCC in the written form (as you
> > say, pi + 4 yields "anything" provenance), GCC itself
> > may transform it into the first variant by noticing
> > the conditional equivalence and substituting pj for
> > pi + 4.
> >
> > > But according to the preferred version of
> > > our proposal, the pointer could also be used to
> > > access 'x' because it is also exposed.
> > >
> > > GCC could make pj have a "anything" provenance
> > > even though it is not modified. (This would break
> > > some optimization such as the one for Matlab.)
> > >
> > > Maybe one could also refine this optimization to check
> > > for additional conditions which rule out the case
> > > that there is another object the pointer could point
> > > to.
> >
> > The only feasible solution would be to not track
> > provenance through non-pointers and make
> > conversions of non-pointers to pointers have
> > "anything" provenance.
> >
> > The additional issue that appears here though
> > is that we cannot even turn (int *)(uintptr_t)p
> > into p anymore since with the conditional
> > substitution we can then still arrive at
> > effectively (&y)[-1] = 1 which is of course
> > undefined behavior.
> >
> > That is, your proposal makes
> >
> >  ((int *)(uintptr_t)&y)[-1] = 1
> >
> > well-defined (if &y - 1 == &x) but keeps
> >
> >   (&y)[-1] = 1
> >
> > as undefined which strikes me as a little bit
> > inconsistent.  If that's true it's IMHO worth
> > a defect report and second consideration.
>
> Similarly that
>
> int x;
> int y;
> uintptr_t pj = (uintptr_t)&y;
>
> if (&x + 1 == &y) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1; // well defined?
> }
>
> is undefined but when I add a no-op
>
>  (uintptr_t)&x;
>
> it is well-defined is undesirable.  Can this no-op
> stmt appear in another function?  Or even in
> another translation unit (if x and y are global variables)?
> And does such stmt have to be present (in another
> TU) to make the example valid in this case?

Yes to all that - again, in the variant in which roundtrips
of a one-past pointer are supported.

> To me all this makes requiring exposal through a cast
> to a non-pointer (or accessing its representation) not
> in any way more "useful" for an optimizing compiler than
> modeling exposal through address-taking.

Interesting, thanks.
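The effect of the "no-op" cast discussed above can be sketched in a toy model of exposure. This is a strong simplification and an assumption on my part, not the proposal's formal semantics: `toy_alloc`, `toy_expose` and `toy_cast_then_access_ok` are hypothetical names, and the model only covers the variant in which a roundtrip of a one-past pointer is supported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy allocation (hypothetical): occupies [base, base + size) and
   records whether its address has been exposed by a cast to integer. */
typedef struct {
    uintptr_t base;
    size_t    size;
    bool      exposed;
} toy_alloc;

/* Models the side effect of a statement such as `(uintptr_t)&x;`. */
void toy_expose(toy_alloc *a)
{
    a->exposed = true;
}

/* Models `p = (T *)cast_addr;` followed by an n-byte access at
   `(char *)p + offset`. The cast may pick up the provenance of any
   exposed allocation that contains cast_addr or has it as its
   one-past address; the access is allowed if it lies inside such an
   allocation. */
bool toy_cast_then_access_ok(uintptr_t cast_addr, ptrdiff_t offset, size_t n,
                             const toy_alloc *allocs, size_t count)
{
    uintptr_t target = cast_addr + (uintptr_t)offset;
    for (size_t i = 0; i < count; i++) {
        const toy_alloc *a = &allocs[i];
        if (!a->exposed)
            continue;
        if (cast_addr < a->base || cast_addr - a->base > a->size)
            continue;
        if (target >= a->base && n <= a->size
            && target - a->base <= a->size - n)
            return true;
    }
    return false;
}
```

In this model, with `x` at address 100 (not exposed) and `y` at 104 (exposed), casting 104 to a pointer and accessing 4 bytes at offset -4 is rejected. After `toy_expose` is applied to `x`, the same access is accepted, and it does not matter where in the program the exposing statement was executed.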
Re: [WIP][RFC] split of i386.c
Hello,
the patch looks good to me. I think in the copyright comment you want
to copy all the copyright years of the original i386.c, since the code
is not from 2019. Also, the licence comments are missing in the .h
files.

I would probably put the bigger machine-specific optimization passes
into separate files (these have no reason to be globbed together). It
will also make it more obvious that i386 has a couple of optimization
passes.

Why does scalar chain need to be exported?

I think it would be nice to commit the patch in early stage1 :)

Honza

On Mon, Mar 18, 2019 at 3:19 PM Martin Liška wrote:
> Hi.
>
> I'm sending first version of the split, which has following
> statistics:
>
>  gcc/config.gcc                  |     5 +-
>  gcc/config/i386/i386-builtins.c |  2563 ++
>  gcc/config/i386/i386-builtins.h |   314 ++
>  gcc/config/i386/i386-expand.c   | 19868 +
>  gcc/config/i386/i386-expand.h   |    40 +
>  gcc/config/i386/i386-features.c |  2854 +++
>  gcc/config/i386/i386-features.h |   179 +
>  gcc/config/i386/i386-options.c  |  3678 ++
>  gcc/config/i386/i386-options.h  |    76 +
>  gcc/config/i386/i386-protos.h   |     4 -
>  gcc/config/i386/i386.c          | 62939 ++
>  gcc/config/i386/i386.h          |     9 +
>  gcc/config/i386/t-i386          |    16 +
>  13 files changed, 46542 insertions(+), 46003 deletions(-)
>
> The newly created files have following content:
> - i386-builtins.c - builtin-in handling, __builtin_cpu_is and
>   __builtin_cpu_supports, target pragma handling
> - i386-expand.c - all scalar and vector expansion code
> - i386-features.c - contains isolated target features - vzerotoupper, stv,
>   cet, rpad, multi-versioning,
> - i386-options.c - option related functions, target attribute handling
>
> Now the i386.c file is down in size:
> 23038 gcc/config/i386/i386.c
>
> Question is whether I should continue or not? Remaining content of the
> file is made of various costing functions, print_reg*, various target
> hooks, coff, ms_abi, retpolines, output-functions, etc. I don't see any
> further split point which should define a new file.
>
> Patch:
> https://drive.google.com/file/d/1SiNcR35DHMNBumyg5ltbOozEJ5Q0ajKn/view?usp=sharing
>
> Thoughts?
> Thanks,
> Martin
>
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 11:56 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 11:31 AM Richard Biener wrote:
> >
> > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
> > >
> > > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:

> > > Let's consider this example:
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1; // well defined?
> > > }
> > >
> > > If I understand correctly, a pointer obtained from
> > > pi + 4 would have a "anything" provenance (which is
> > > fine). But the pointer obtained from 'pj' would have the
> > > provenance of 'y' so the access to 'x' would not
> > > be allowed.
> >
> > Correct.  This is the most difficult case for us to handle
> > exactly also because (also valid for the proposal?)
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> >    p[-1] = 1; // well defined?
> > }
> >
> > while well-handled by GCC in the written form (as you
> > say, pi + 4 yields "anything" provenance), GCC itself
> > may transform it into the first variant by noticing
> > the conditional equivalence and substituting pj for
> > pi + 4.

Integers are just integers in the proposal, so conditional
equivalence is not a problem for them. In my opinion this
is a strength of the proposal. Tracking provenance for
integers would mean that all computations would be affected
by such subtle semantics issues (where you cannot even
replace an integer by an equivalent one). In this
proposal this is limited to pointers, where it at least
makes some sense.

> > > But according to the preferred version of
> > > our proposal, the pointer could also be used to
> > > access 'x' because it is also exposed.
> > >
> > > GCC could make pj have a "anything" provenance
> > > even though it is not modified. (This would break
> > > some optimization such as the one for Matlab.)
> > >
> > > Maybe one could also refine this optimization to check
> > > for additional conditions which rule out the case
> > > that there is another object the pointer could point
> > > to.
> >
> > The only feasible solution would be to not track
> > provenance through non-pointers and make
> > conversions of non-pointers to pointers have
> > "anything" provenance.

This would be one solution, yes. But you could
reattach the same provenance if you know that the
pointer points in the middle of an object (so is
not a first or one-after pointer) or if you know
that there is no exposed object directly adjacent
to this object, etc.

> > The additional issue that appears here though
> > is that we cannot even turn (int *)(uintptr_t)p
> > into p anymore since with the conditional
> > substitution we can then still arrive at
> > effectively (&y)[-1] = 1 which is of course
> > undefined behavior.
> >
> > That is, your proposal makes
> >
> >  ((int *)(uintptr_t)&y)[-1] = 1
> >
> > well-defined (if &y - 1 == &x) but keeps
> >
> >   (&y)[-1] = 1
> >
> > as undefined which strikes me as a little bit
> > inconsistent.  If that's true it's IMHO worth
> > a defect report and second consideration.

This is true. But I would not call it inconsistent.
It is just unusual if you expect that casts to integers
and back are no-ops. In this proposal a round-trip has
the effect of stripping the original provenance and
attaching a new one (which could be the same as the
old one).

While in this specific scenario this might seem
unreasonable, there are other examples where you may
want to be able to get from one object to the others,
and using casts to integers would then be the
blessed way to express this.

In my opinion, this is also intuitive:
By casting to an integer one then gets simple discrete
pointer semantics where one does not have provenance.

> Similarly that
>
> int x;
> int y;
> uintptr_t pj = (uintptr_t)&y;
>
> if (&x + 1 == &y) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1; // well defined?
> }
>
> is undefined but when I add a no-op
>
>  (uintptr_t)&x;
>
> it is well-defined is undesirable.  Can this no-op
> stmt appear in another function?  Or even in
> another translation unit (if x and y are global variables)?
> And does such stmt have to be present (in another
> TU) to make the example valid in this case?

Without that statement, the example is not valid as
the address of 'x' is not exposed. With the statement
this becomes valid and it does not matter where this
statement appears.

Again, I agree that the fact that such a statement has
a side effect is something one needs to get used to.
But address-taking already has a side effect which could be
surprisin
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:

> An equality test of two pointers, on the other hand, doesn't necessarily
> mean that they are interchangeable.  I don't see any good way to
> avoid that in a provenance semantics, where a one-past
> pointer might sometimes compare equal to a pointer to an
> adjacent object but be illegal for accessing it.

As I see it, there are essentially four options:

1.) Compilers do not use conditional equivalences for
optimizations of pointers (or only when additional
conditions apply which make it safe).

2.) We make pointer comparison between a pointer
and a one-after pointer of a different object
undefined behaviour.

3.) We make comparison have the side effect that
afterwards any of the two pointers could have any
of the two provenances (with disambiguation
similar to what we have for casts).

4.) Compilers make sure that exposed objects never
are allocated next to each other (as Jens proposed).

None of these options is great.

Best,
Martin
Re: C provenance semantics proposal
On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin wrote:
>
> On Thursday, 18.04.2019, at 11:56 +0200, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener wrote:
> > >
> > > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin wrote:
> > > >
> > > > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote:
> >
> > > > Let's consider this example:
> > > >
> > > > int x;
> > > > int y;
> > > > uintptr_t pi = (uintptr_t)&x;
> > > > uintptr_t pj = (uintptr_t)&y;
> > > >
> > > > if (pi + 4 == pj) {
> > > >
> > > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > > >    p[-1] = 1; // well defined?
> > > > }
> > > >
> > > > If I understand correctly, a pointer obtained from
> > > > pi + 4 would have a "anything" provenance (which is
> > > > fine). But the pointer obtained from 'pj' would have the
> > > > provenance of 'y' so the access to 'x' would not
> > > > be allowed.
> > >
> > > Correct.  This is the most difficult case for us to handle
> > > exactly also because (also valid for the proposal?)
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> > >    p[-1] = 1; // well defined?
> > > }
> > >
> > > while well-handled by GCC in the written form (as you
> > > say, pi + 4 yields "anything" provenance), GCC itself
> > > may transform it into the first variant by noticing
> > > the conditional equivalence and substituting pj for
> > > pi + 4.
>
> Integers are just integers in the proposal, so conditional
> equivalence is not a problem for them. In my opinion this
> is a strength of the proposal. Tracking provenance for
> integers would mean that all computations would be affected
> by such subtle semantics issues (where you can not even
> replace an integer by an equivalent one). In this
> proposal this is limited to pointers where it at least
> makes some sense.
>
> > > > But according to the preferred version of
> > > > our proposal, the pointer could also be used to
> > > > access 'x' because it is also exposed.
> > > >
> > > > GCC could make pj have a "anything" provenance
> > > > even though it is not modified. (This would break
> > > > some optimization such as the one for Matlab.)
> > > >
> > > > Maybe one could also refine this optimization to check
> > > > for additional conditions which rule out the case
> > > > that there is another object the pointer could point
> > > > to.
> > >
> > > The only feasible solution would be to not track
> > > provenance through non-pointers and make
> > > conversions of non-pointers to pointers have
> > > "anything" provenance.
>
> This would be one solution, yes. But you could
> reattach the same provenance if you know that the
> pointer points in the middle of an object (so is
> not a first or one-after pointer) or if you know
> that there is no exposed object directly adjacent
> to this object, etc..
>
> > > The additional issue that appears here though
> > > is that we cannot even turn (int *)(uintptr_t)p
> > > into p anymore since with the conditional
> > > substitution we can then still arrive at
> > > effectively (&y)[-1] = 1 which is of course
> > > undefined behavior.
> > >
> > > That is, your proposal makes
> > >
> > >  ((int *)(uintptr_t)&y)[-1] = 1
> > >
> > > well-defined (if &y - 1 == &x) but keeps
> > >
> > >   (&y)[-1] = 1
> > >
> > > as undefined which strikes me as a little bit
> > > inconsistent.  If that's true it's IMHO worth
> > > a defect report and second consideration.
>
> This is true. But I would not call it inconsistent.
> It is just unusual if you expect that casts to integers
> and back are no-ops. In this proposal a round-trip has
> the effect of stripping the original provenance and
> attaching a new one (which could be the same as the
> old one).

Well, the standard explicitly says that if you convert
a pointer to an integer (with the same or more precision)
and back, you get the same pointer back.  That suggests
(int *)(uintptr_t)&y is a semantic no-op?

> While in this specific scenario this might seem
> unreasonable, there are other examples where you may
> want to be able to get from one object to the others.
> and using casts to integers would then be the
> blessed way to express this.

Sure, no arguing about this.  So far it has been entirely in the
hands of implementors to make uses of this idiom work; now users
will be able to wield the standard's sword :/

> In my opinion, this is also intuitive:
> By casting to an integer one then gets simple discrete
> pointer semantics where one does not have provenance.
>
>
> > Similarly that
> >
> > int x;
> > int y;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (&x + 1 == &y) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1; // well defined?
> > }
> >
>
Re: C provenance semantics proposal
On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin wrote:
>
> On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> > On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
>
>
> > An equality test of two pointers, on the other hand, doesn't necessarily
> > mean that they are interchangeable.  I don't see any good way to
> > avoid that in a provenance semantics, where a one-past
> > pointer might sometimes compare equal to a pointer to an
> > adjacent object but be illegal for accessing it.
>
> As I see it, there are essentially four options:
>
> 1.) Compilers do not use conditional equivalences for
> optimizations of pointers (or only when additional
> conditions apply which make it safe)
>
> 2.) We make pointer comparison between a pointer
> and a one-after pointer of a different object
> undefined behaviour.

Yes please!  OTOH GCC transforms
(uintptr_t)&a != (uintptr_t)(&b+1)
into &a != &b + 1 (for equality compares) and then
doesn't follow this C rule anyway.

> 3.) We make comparison have the side effect that
> afterwards any of the two pointers could have any
> of the two provenances. (with disambiguation
> similar to what we have for casts).
>
> 4.) Compilers make sure that exposed objects never
> are allocated next to each other (as Jens proposed).

5.) While the standard guarantees that (int *)(uintptr_t)p == p,
it does not guarantee that (uintptr_t)&a and (uintptr_t)&b
have a specific relation to each other.  To me this means
that (uintptr_t)(&b + 1) - (uintptr_t)&b is not necessarily
equal to sizeof(b).  (Of course it's a QOI issue if that
doesn't hold.)

> None of these options is great.

Indeed.  But you are now writing down one specific variant
(which isn't great either).  Sometimes having no variant written
down is better than a not-so-great one, even if there
isn't any obviously better one.

That said, GCC's implementation of the proposal might be to
require -fno-tree-pta to follow it.  And even that might not
fully rescue us because of that (int *)(uintptr_t) stripping...

At least I see no way to make use of the "exposed"ness
and thus we have to assume every variable is exposed.

Of course the same would hold if the address-taken variant were
written down in the standard, given that the standard applies to
the source form and not to some intermediate (optimized)
compiler language.

Richard.

>
>
> Best,
> Martin
Re: C provenance semantics proposal
On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
>
> Yes please! OTOH GCC transforms
> (uintptr_t)&a != (uintptr_t)(&b+1)
> into &a != &b + 1 (for equality compares) and then

I think we don't. It was http://gcc.gnu.org/PR88775, but we haven't applied
those changes, because we don't consider the pointer to start of one object
vs. pointer to end of another one case in pointer comparisons (but do
consider it in integral comparisons).

Jakub
Re: C provenance semantics proposal
On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> > > 1.) Compilers do not use conditional equivalences for
> > > optimizations of pointers (or only when additional
> > > conditions apply which make it safe)
> > >
> > > 2.) We make pointer comparison between a pointer
> > > and a one-after pointer of a different object
> > > undefined behaviour.
> >
> > Yes please! OTOH GCC transforms
> > (uintptr_t)&a != (uintptr_t)(&b+1)
> > into &a != &b + 1 (for equality compares) and then
>
> I think we don't. It was http://gcc.gnu.org/PR88775, but we haven't applied
> those changes, because we don't consider the pointer to start of one object
> vs. pointer to end of another one case in pointer comparisons (but do
> consider it in integral comparisons).

That said, in RTL we really don't differentiate between pointers and
integers and we'll need to do something about that one day.

Jakub
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 14:30 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin wrote:
> >
> > On Thursday, 18.04.2019, at 11:56 +0200, Richard Biener wrote:
> > > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener wrote:
> > > >
> > > > The additional issue that appears here though
> > > > is that we cannot even turn (int *)(uintptr_t)p
> > > > into p anymore since with the conditional
> > > > substitution we can then still arrive at
> > > > effectively (&y)[-1] = 1 which is of course
> > > > undefined behavior.
> > > >
> > > > That is, your proposal makes
> > > >
> > > > ((int *)(uintptr_t)&y)[-1] = 1
> > > >
> > > > well-defined (if &y - 1 == &x) but keeps
> > > >
> > > > (&y)[-1] = 1
> > > >
> > > > as undefined which strikes me as a little bit
> > > > inconsistent. If that's true it's IMHO worth
> > > > a defect report and second consideration.
> >
> > This is true. But I would not call it inconsistent.
> > It is just unusual if you expect that casts to integers
> > and back are no-ops. In this proposal a round-trip has
> > the effect of stripping the original provenance and
> > attaching a new one (which could be the same as the
> > old one).
>
> Well, the standard explicitly says that if you convert
> a pointer to an integer (with the same or more precision)
> and back you get the same pointer back. That suggests
> (int *)(uintptr_t)&y is a semantic no-op?

Not quite, it only guarantees that it compares equal (7.20.1.4),
which for pointers is (sadly) not the same.

But our proposal would make it work perfectly from a
programmer's point of view: The pointer you get back can
always be used instead of the original pointer. But because
it is not always clear whether this was a pointer to a first
element or a one-after pointer, it has to work for both.
For the compiler writer this means that it is not the same
pointer but a pointer one knows less about.
> > While in this specific scenario this might seem
> > unreasonable, there are other examples where you may
> > want to be able to get from one object to the others.
> > and using casts to integers would then be the
> > blessed way to express this.
>
> Sure, no arguing about this. So far this all has been in
> the hands of implementors to make uses of this idiom work,
> now users will be able to wield the standard's sword :/

Well, isn't this the point of a standard? But we want
to get this right and this is why we are talking to you.

> > In my opinion, this is also intuitive:
> > By casting to an integer one then gets simple discrete
> > pointer semantics where one does not have provenance.
> >
> >
> > > Similarly that
> > >
> > > int x;
> > > int y;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (&x + 1 == &y) {
> > >
> > >   int* p = (int*)pj; // can be one-after pointer of 'x'
> > >   p[-1] = 1; // well defined?
> > > }
> > >
> > > is undefined but when I add a no-op
> > >
> > > (uintptr_t)&x;
> > >
> > > it is well-defined is undesirable. Can this no-op
> > > stmt appear in another function? Or even in
> > > another translation unit (if x and y are global variables)?
> > > And does such stmt have to be present (in another
> > > TU) to make the example valid in this case?
> >
> > Without that statement, the example is not valid as the
> > address of 'x' is not exposed. With the statement this
> > becomes valid and it does not matter where this statement
> > appears. Again, I agree that the fact that such a statement
> > has a side-effect is something one needs to get used to.
> >
> > But address-taken already has a side-effect which could be
> > surprising, doesn't it? If I understood your answer
> > above correctly, for GCC you get this side-effect already
> > without the cast:
> >
> > &x;
>
> Well, yes. But for GCC the important issue is whether
> this address-taking is still done after optimization
> (at the point we use provenance info to compute points-to sets).
> So this plain stmt wouldn't survive and would not make
> the example valid. It's of course a lot harder to write this
> down into standard wording ;) (if not impossible...)

"it has a side-effect whenever GCC does not optimize it away"
seems unlikely to get accepted in the standard ;-)

One could make a special rule about the statements
with unused results or add some language about "observability".

But couldn't the frontend simply mark the relevant casts?
(e.g. transform into __builtin_expose() or something)

> I guess there has to be a data dependence between an address-taken
> operation and recreating that address (or a derived one to the same
> object). That is, we're trying to support delta-compressing pointers
> as often used in shared memory data structures.
>
> But as you've seen already conditional "dependences" are prone
> to break.

Yes, this is why we do not like it. Even assuming we could make
this sound, it would add a lot of complexity. Limiting
"provenance tracking" to po
Re: C provenance semantics proposal
On 4/18/19 6:50 AM, Jakub Jelinek wrote:
> On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
>> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
>>>> 1.) Compilers do not use conditional equivalences for
>>>> optimizations of pointers (or only when additional
>>>> conditions apply which make it safe)
>>>>
>>>> 2.) We make pointer comparison between a pointer
>>>> and a one-after pointer of a different object
>>>> undefined behaviour.
>>>
>>> Yes please! OTOH GCC transforms
>>> (uintptr_t)&a != (uintptr_t)(&b+1)
>>> into &a != &b + 1 (for equality compares) and then
>>
>> I think we don't. It was http://gcc.gnu.org/PR88775, but we haven't applied
>> those changes, because we don't consider the pointer to start of one object
>> vs. pointer to end of another one case in pointer comparisons (but do
>> consider it in integral comparisons).
>
> That said, in RTL we really don't differentiate between pointers and
> integers and we'll need to do something about that one day.

I'd be happy to get things sorted out up to the RTL transition,
particularly the cases involving equivalences. Distinguishing between
pointers and same-sized integers in RTL will be difficult.

jeff
Re: C provenance semantics proposal
On 4/18/19 6:20 AM, Uecker, Martin wrote:
> On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
>> On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
>
>> An equality test of two pointers, on the other hand, doesn't necessarily
>> mean that they are interchangeable. I don't see any good way to
>> avoid that in a provenance semantics, where a one-past
>> pointer might sometimes compare equal to a pointer to an
>> adjacent object but be illegal for accessing it.
>
> As I see it, there are essentially four options:
>
> 1.) Compilers do not use conditional equivalences for
> optimizations of pointers (or only when additional
> conditions apply which make it safe)

I know this will hit DOM and CSE. I wouldn't be surprised if it touches
VRP as well, maybe PTA. It seems simple enough though :-)

>
> 2.) We make pointer comparison between a pointer
> and a one-after pointer of a different object
> undefined behaviour.

I generally like this as well, though I suspect it probably makes a lot
of currently well defined code undefined.

>
> 3.) We make comparison have the side effect that
> afterwards any of the two pointers could have any
> of the two provenances. (with disambiguation
> similar to what we have for casts).

This could have some interesting effects on PTA. Richi?

>
> 4.) Compilers make sure that exposed objects never
> are allocated next to each other (as Jens proposed).

Ugh. Not sure how you enforce that. Consider that the compiler may
ultimately have no control over layout of data in static storage.

jeff
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 14:42 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin wrote:
> >
> > On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> > > On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
> >
> > > An equality test of two pointers, on the other hand, doesn't necessarily
> > > mean that they are interchangeable. I don't see any good way to
> > > avoid that in a provenance semantics, where a one-past
> > > pointer might sometimes compare equal to a pointer to an
> > > adjacent object but be illegal for accessing it.
> >
> > As I see it, there are essentially four options:
> >
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
>
> Yes please! OTOH GCC transforms
> (uintptr_t)&a != (uintptr_t)(&b+1)
> into &a != &b + 1 (for equality compares) and then
> doesn't follow this C rule anyways.

I know this would be the best option from the point of view
of a compiler writer.

My concern is that this adds a trap for programmers. You can
then compare arbitrary pointers except in this specific
special case.

> > 3.) We make comparison have the side effect that
> > afterwards any of the two pointers could have any
> > of the two provenances. (with disambiguation
> > similar to what we have for casts).
> >
> > 4.) Compilers make sure that exposed objects never
> > are allocated next to each other (as Jens proposed).
>
> 5.) While the standard guarantees that (int *)(uintptr_t)p == p
> it does not guarantee that (uintptr_t)&a and (uintptr_t)&b
> have a specific relation to each other. To me this means
> that (uintptr_t)(&b + 1) - (uintptr_t)&b is not necessarily
> equal to sizeof(b). (of course it's a QOI issue if that doesn't
> hold)

I think a direct mapping from addresses to integers is the only
thing which is feasible in most cases. But maybe the compiler
could actually move 'a' and 'b' away from each other?

> > None of these options is great.
>
> Indeed. But you are now writing down one specific variant
> (which isn't great either). Sometimes no written down variant
> is better than a not so great one, even if there isn't any obviously
> greater one.

I am not sure everybody would agree.

> That said, GCC's implementation of the proposal might be
> to require -fno-tree-pta to follow it. And even that might not
> fully rescue us because of that (int *)(uintptr_t) stripping...

Don't strip it then? The FE could add a marker.

Best,
Martin

> At least I see no way to make use of the "exposed"ness
> and thus we have to assume every variable is exposed.
> Of course similar if the address-taken variant would be
> written down in the standard given the standard applies
> to the source form and not some intermediate (optimized)
> compiler language.
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 07:42 -0600, Jeff Law wrote:
> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> > > On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:

...

> > 4.) Compilers make sure that exposed objects never
> > are allocated next to each other (as Jens proposed).
>
> Ugh. Not sure how you enforce that. Consider that the compiler may
> ultimately have no control over layout of data in static storage.

Maybe only where it matters? I assume the biggest benefit
is for local variables and there the compiler has full control.

For arbitrary pointers coming from somewhere, one has no provenance
information anyway.

Best,
Martin
Re: C provenance semantics proposal
On Thu, 18 Apr 2019 at 14:54, Uecker, Martin wrote:
>
> On Thursday, 18.04.2019, at 07:42 -0600, Jeff Law wrote:
> > On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > > On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> > > > On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
>
> ...
> > > 4.) Compilers make sure that exposed objects never
> > > are allocated next to each other (as Jens proposed).
> >
> > Ugh. Not sure how you enforce that. Consider that the compiler may
> > ultimately have no control over layout of data in static storage.
>
> Maybe only where it matters? I assume the biggest benefit
> is for local variables and there the compiler has full control.
>
> For arbitrary pointers coming from somewhere, one has no provenance
> information anyway.

that's not quite true - one does know that it can't have the same provenance
as anything created more recently than the incoming pointer

> Best,
> Martin
Re: C provenance semantics proposal
On Thursday, 18.04.2019, at 15:49 +0100, Peter Sewell wrote:
> On Thu, 18 Apr 2019 at 14:54, Uecker, Martin wrote:
> >
> > On Thursday, 18.04.2019, at 07:42 -0600, Jeff Law wrote:
> > > On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > > > On Thursday, 18.04.2019, at 11:45 +0100, Peter Sewell wrote:
> > > > > On Thu, 18 Apr 2019 at 10:32, Richard Biener wrote:
> >
> > ...
> > > > 4.) Compilers make sure that exposed objects never
> > > > are allocated next to each other (as Jens proposed).
> > >
> > > Ugh. Not sure how you enforce that. Consider that the compiler may
> > > ultimately have no control over layout of data in static storage.
> >
> > Maybe only where it matters? I assume the biggest benefit
> > is for local variables and there the compiler has full control.
> >
> > For arbitrary pointers coming from somewhere, one has no provenance
> > information anyway.
>
> that's not quite true - one does know that it can't have the same provenance
> as anything created more recently than the incoming pointer

Good point. But then the objects cannot be next to each
other anyway.

Best,
Martin
gcc-7-20190418 is now available
Snapshot gcc-7-20190418 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20190418/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 270448

You'll find:

 gcc-7-20190418.tar.xz                Complete GCC

  SHA256=082381f532d4d11244de3d772dfc8c7b22f86a7f17c10fbab749c0db0bc16ce6
  SHA1=afc1c4e7f1dcef83d4eb2099aa38ba73797a7afd

Diffs from 7-20190411 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.