https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70359
Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |aldyh at gcc dot gnu.org

--- Comment #30 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Created attachment 43597
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43597&action=edit
untested patch implementing suggestion in comment 26

The attached untested patch attempts to implement the suggestion in comment 26
of replacing the out-of-loop pre-inc with post-inc values.

Richi, is this more or less what you had in mind?

Assuming this:

  LOOP:
    # p_8 = PHI <p_16(2), p_19(3)>
    ...
    p_19 = p_8 + 4294967295;
    goto LOOP:

The patch replaces:

    p_22 = p_8 + 4294967294;
    MEM[(char *)p_19 + 4294967295B] = 45;

into:

    p_22 = p_19 + 4294967295;
    *p_22 = 45;

This allows the backend to use auto-dec in two places:

    strb r1, [r4, #-1]!
    ...
    strblt r3, [r4, #-1]!

...reducing the byte count from 116 to 104, but just shy of the 96 needed to
eliminate the regression. I will discuss the missing bytes in a follow-up
comment, as they are unrelated to this IV adjustment patch.

It is worth noting that x86 also benefits from a reduction of 3 bytes with this
patch, as we remove 2 lea instructions: one within the loop, and one before
returning. Thus, I believe this is a regression across the board, or at least
in multiple architectures.

A few comments...

While I see the benefit of hijacking insert_backedge_copies() for this, I am
not a big fan of changing the IL after the last tree dump (*t.optimized), as
the modified IL would only be visible in *r.expand. Could we perhaps move this
to another spot? Say after the last forwprop pass, or perhaps right before
expand? Or perhaps have a *t.final dump right before expand?

As mentioned, this is only a proof of concept. I made the test rather
restrictive. I suppose we could relax the conditions and generalize it a bit.
There are comments throughout showing what I had in mind.
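
For readers following the IL rewrite above, here is a minimal C sketch of why the two forms are equivalent. It is not taken from the attached patch. It assumes the offsets in the dump are 32-bit sizetype values, so 4294967295 stands for -1 and 4294967294 for -2. The helper names (as_signed, emit_before, emit_after, forms_agree) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit unsigned offset from the dump as a signed value,
   without relying on implementation-defined narrowing conversions.  */
static int32_t as_signed(uint32_t off)
{
  if (off > (uint32_t)INT32_MAX)
    return -(int32_t)(UINT32_MAX - off) - 1;
  return (int32_t)off;
}

/* Form before the patch: p_22 is based on the pre-decrement value p_8,
   and the store goes through p_19 plus an offset.  */
static char *emit_before(char *p_8)
{
  char *p_19 = p_8 + as_signed(4294967295u);   /* p_19 = p_8 - 1 */
  char *p_22 = p_8 + as_signed(4294967294u);   /* p_22 = p_8 - 2 */
  p_19[as_signed(4294967295u)] = 45;           /* MEM[p_19 - 1] = 45 */
  return p_22;
}

/* Form after the patch: p_22 is based on the post-decrement value p_19,
   and the store goes directly through p_22.  */
static char *emit_after(char *p_8)
{
  char *p_19 = p_8 + as_signed(4294967295u);   /* p_19 = p_8 - 1 */
  char *p_22 = p_19 + as_signed(4294967295u);  /* p_22 = p_19 - 1 */
  *p_22 = 45;
  return p_22;
}

/* Returns 1 when both forms yield the same pointer offset and the same
   buffer contents, 0 otherwise.  */
static int forms_agree(void)
{
  char a[8], b[8];
  memset(a, 'x', sizeof a);
  memset(b, 'x', sizeof b);
  char *ra = emit_before(a + 4);
  char *rb = emit_after(b + 4);
  return (ra - a) == (rb - b)
         && memcmp(a, b, sizeof a) == 0
         && a[2] == 45;
}
```

In the second form every address is a single step of -1 from the previous pointer, which is the shape a pre-decrement addressing mode such as [r4, #-1]! can absorb. Whether the backend actually selects it depends on the target and on later passes.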