https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436
Richard Biener <rguenth at gcc dot gnu.org> changed:

                 What|Removed                        |Added
----------------------------------------------------------------------------
       Ever confirmed|0                              |1
     Target Milestone|---                            |11.3
             Priority|P3                             |P2
             Keywords|                               |missed-optimization
             Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
     Last reconfirmed|                               |2021-09-22
               Status|UNCONFIRMED                    |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Memory reference 3: numb_moves
Memory reference 4: _24->from
...
Querying dependency of refs 3 and 4: dependent.
Querying SM WAW dependencies of ref 3 in loop 1: dependent

The issue is that we require conditionally executed stores to be independent of all other stores, because we cannot re-issue the other stores on exit in the proper order. In this case, however, the dependent stores are executed under the same condition and are ordered in such a way that we don't have to re-issue any dependent store. We fail to handle this special case since the store-motion rewrite that fixed the TBAA issues.
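To illustrate the special case, here is a hand-written C sketch (not GCC's actual output) of conditional store motion applied to the trivial loop shape: the value of `p` is kept in a local, a flag records whether the loop stored to `p` at all, and a single conditional store is emitted on exit. Because the store to `p` is the last store in the conditional block, nothing has to be re-issued after it. The names `foo_sm`, `p_reg`, `p_stored` and `same_result` are hypothetical, and the sketch assumes `q` never points at `p` (otherwise keeping `p` in a local would already be invalid).

```c
#include <stdbool.h>
#include <string.h>

unsigned p;

/* Trivial case: the dependent store to *q precedes the store to p
   under the same condition.  */
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          *(q++) = 1.;
          p = a + 1;
        }
    }
}

/* Hypothetical store-motion result: p lives in p_reg inside the loop and
   is written back once on exit, only if the loop stored to it.  */
void foo_sm (float *q)
{
  unsigned p_reg = p;      /* load of p hoisted out of the loop */
  bool p_stored = false;   /* did any iteration store to p? */
  for (int i = 0; i < 256; ++i)
    {
      if (p_reg)
        {
          unsigned a = p_reg;
          *(q++) = 1.;     /* stays in the loop, before the (sunk) store to p */
          p_reg = a + 1;
          p_stored = true;
        }
    }
  if (p_stored)            /* no store to p is introduced if the loop never stored */
    p = p_reg;
}

/* Runs both versions from the same initial p on separate buffers and
   compares the final value of p and the buffer contents.  */
bool same_result (unsigned p0)
{
  float buf_a[256] = { 0 }, buf_b[256] = { 0 };
  p = p0;
  foo (buf_a);
  unsigned p_a = p;
  p = p0;
  foo_sm (buf_b);
  unsigned p_b = p;
  return p_a == p_b && memcmp (buf_a, buf_b, sizeof buf_a) == 0;
}
```

The flag matters: with `p == 0` the original loop never writes `p`, so an unconditional write-back on exit would introduce a store that did not exist before.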
Smaller testcase where we can just issue the conditional store to 'p':

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          *(q++) = 1.;
          p = a + 1;
        }
    }
}

The following is much more difficult to handle (we have to issue a conditional sequence of two stores, remember the location the non-invariant store stored to _and_ verify we can re-emit that store out of order, and remember the value stored):

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          p = a + 1;
          *(q++) = 1.;
        }
    }
}

A bit easier (the store we have to re-issue is always executed after the last conditional store):

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          p = a + 1;
        }
      *(q++) = 1.;
    }
}

Impossible / invalid:

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      *(q++) = 1.;
      if (p)
        {
          unsigned a = p;
          p = a + 1;
        }
    }
}

I will see how difficult it is to teach the already intertwined code the "trivial" case, and whether the "a bit easier" case falls out naturally.
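For the "a bit easier" case, a hand-written sketch (not GCC's implementation) of what re-issuing on exit could look like: the conditional store to `p` is sunk to the loop exit, and the last store to `*q`, which always follows it in program order, is re-issued afterwards from a remembered address so it remains the final write. The names `bar`, `bar_sm`, `last_q` and `same_result_bar` are hypothetical. As in the trivial case, it assumes `q` never points at `p` for the loads, so with the separate buffers used below the re-issued store is redundant and only demonstrates the ordering.

```c
#include <stdbool.h>
#include <string.h>

unsigned p;

/* "A bit easier" case: the store to *q is unconditional and always
   follows the conditional store to p.  */
void bar (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          p = a + 1;
        }
      *(q++) = 1.;
    }
}

/* Hypothetical store-motion result: sink the conditional store to p,
   then re-issue the last store to *q after it.  */
void bar_sm (float *q)
{
  unsigned p_reg = p;      /* p kept in a local inside the loop */
  bool p_stored = false;   /* did any iteration store to p? */
  float *last_q = q;       /* address of the most recent store to *q */
  for (int i = 0; i < 256; ++i)
    {
      if (p_reg)
        {
          unsigned a = p_reg;
          p_reg = a + 1;
          p_stored = true;
        }
      last_q = q;
      *(q++) = 1.;
    }
  if (p_stored)
    p = p_reg;             /* sunk conditional store */
  /* Re-issue the final store to *q (remembered address and value) so it
     stays after the store to p.  Only valid because the loop body always
     runs at least once, so *last_q really was written.  */
  *last_q = 1.;
}

/* Runs both versions from the same initial p on separate buffers and
   compares the final value of p and the buffer contents.  */
bool same_result_bar (unsigned p0)
{
  float buf_a[256] = { 0 }, buf_b[256] = { 0 };
  p = p0;
  bar (buf_a);
  unsigned p_a = p;
  p = p0;
  bar_sm (buf_b);
  unsigned p_b = p;
  return p_a == p_b && memcmp (buf_a, buf_b, sizeof buf_a) == 0;
}
```

In the "much more difficult" variant the store to `*q` is itself conditional, so the exit sequence would have to become `if (p_stored) { p = p_reg; *last_q = 1.; }`, which is the conditional two-store sequence described above.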