On Mon, May 17, 2021 at 1:46 PM Uecker, Martin <martin.uec...@med.uni-goettingen.de> wrote: > > > > Am Montag, den 17.05.2021, 09:44 +0200 schrieb Richard Biener: > > On Mon, May 17, 2021 at 9:32 AM Uecker, Martin > > <martin.uec...@med.uni-goettingen.de> wrote: > > > > > > > > > Am Montag, den 17.05.2021, 09:08 +0200 schrieb Richard Biener: > > > > On Sun, May 16, 2021 at 8:57 AM Uecker, Martin > > > > <martin.uec...@med.uni-goettingen.de> wrote: > > > > > > > > > > Hi Richard, > > > > > > > > > > I noticed that GCC 11 has different behavior in the following > > > > > example relative to 10.2 with -O2. I wonder whether this > > > > > is an intentional change and if yes, what are the rules? > > > > > > > > Yes, this is a fix for the long-standing PR57359 where we > > > > failed to preserve the order of stores when applying store- > > > > motion. > > > > > > > > The basic rule is that stores may not be re-ordered based on > > > > TBAA since a store changes the dynamic type of a memory > > > > location. That notably happens for C/C++ programs re-using > > > > allocated storage and for C++ programs using placement > > > > new. > > > > > > > > It looked like a not so important bug since all other compilers > > > > fall into the same trap, nevertheless they are all wrong in > > > > doing so (and so was GCC). All work fine at -O0 of course. > > > > > > > > Richard. > > > > > > This is excellent news! We were already thinking about > > > how one could change the effective type rules in C, but > > > it is not clear how this could have been done. > > > > Yes, indeed. It also turned out the solution found for GCC > > has zero performance impact in practice. The idea is simply > > to re-do dependent stores in program order on each loop exit. > > Without this "trick" fixing the bug has severe impact on SPEC > > (fixing as in disabling the invalid store-motion optimizations). > > Nice! > > Was moving these stores out of a loop the only > case where GCC did this incorrect store-motions?
Those are the ones I'm aware of, yes. > The other areas where the effective type > rules appear unclear and/or not matching reality are: > > - effective type of aggregates when they are incrementally > constructed or changed (not sure whether this is relevant > for optimization in GCC) So like struct S { int i; }; (int *)p = 1; ... = *(struct S *)p; or ... = ((struct S*)p)->i; this and cases with multiple members should work fine with GCC. > - access paths, i.e. is 'i.p' an access to 'i' or is > 'a[3]' an access to 'a' with array type (assuming 'a' > is an lvalue of array type before it decays and not > a pointer). And if yes, when does this access happen? > I think GCC relies on this. Yes, GCC assumes that i.p is an access to 'i' (but not in address-taking context like &i.p). Likewise for ((struct S *)p)->i > - initial sequence rule We do not honor this, but in practice if you have struct S1 { int i; float f; }; struct S2 { int i; double g; }; then accessing the initial sequence as *(int *) works, but struct S1 s1 = {}; int i = ((struct S2 *)&s1)->i; does not because we consider it an access as 'S2' due to the previous point. If it were just an access to 'i' it would work again. > Anything else? The most recent interesting case from the C++ world is when they allowed struct S { int n; std::byte a[16]; } S s; new (s.a) T(); S s2; s2 = s; thus construct an arbitrary type into an aggregate member of std::byte (or char type I believe). With GCCs memory model copying this like s2.a = s.a would have worked (because std::byte and char alias everything). But when you do s2 = s then the accesses as 'struct S' do not TBAA conflict with say 'double' (if you make T == double). For this reason the C++ frontend now drops TBAA for all std::byte/char array containing aggregeates ... I do know this kind of stuff is used in practice in C code as well but usually C people do not use aggregate copying but instead use memcpy (which would be fine). So maybe just an anecdote but eventually such situation occurs in (future) C as well. Richard. > Best, > Martin > > > > > Richard. > > > > > Best, > > > Martin > > > > > > > > > > > > > > Thanks! > > > > > Martin > > > > > > > > > > https://godbolt.org/z/57res7ax1 > > > > > > > > > > #include <stdio.h> > > > > > #include <stdlib.h> > > > > > > > > > > > > > > > __attribute__((__noinline__, __weak__)) > > > > > void f(long unk, void *pa, void *pa2, void *pb, long *x) { > > > > > for (long i = 0; i < unk; ++i) { > > > > > int oldy = *(int *)pa; > > > > > *(float *)pb = 42; > > > > > *(int *)pa2 = oldy ^ x[i]; > > > > > } > > > > > } > > > > > > > > > > int main(void) { > > > > > void *p = malloc(sizeof(int)); > > > > > *(int *)p = 13; > > > > > f(1, p, p, p, (long []){ 0 }); > > > > > printf("*pa(%d)\n", *(int *)p); > > > > > }