https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59393
Jeffrey A. Law <law at redhat dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |law at redhat dot com --- Comment #7 from Jeffrey A. Law <law at redhat dot com> --- I was looking at this and noticed we have several sequences like _18 = l_11 >> 16; _19 = _18 & 255; _20 = _19 + 256; _21 = _20 * 8; There's variations in the constants, but the pattern repeats regularly. My first thought was to rewrite that as _18 = l_11 >> 13; _19 = _18 & 0x7f8; _20 = _19 + 0x800; That seemed to be slightly worse on x86_64. I'd already noticed that the addition was setting bits we knew to be zero, so it could be rewritten using an IOR like this: _18 = l_11 >> 13; _19 = _18 & 0x7f8; _20 = _19 | 0x800; In isolation, that looked good on x86_64. So my thought was that we may have an gcc-7 improvement that could be made for this code. But then I coded up a quick pattern in match.pd and tested it and the resulting assembly code was considerably worse on x86_64 for the benchmark code. There's a couple things in play here on x86_64. In the benchmark code these are address computations. The *8 and +256 in the original sequence can be a part of the effective address in the memory reference. Furthermore, the masking is a 2 byte movzbl in the original sequence, but a 3 byte and # in the later sequences. This negates all the gain by using IOR instead of PLUS, which was shorter for x86_64. mips16 does slightly better with the second sequence, saving ~76 bytes on the included testcase. However, given how highly dependent this is on the target's addressing modes, match.pd is probably not the place to attack this problem. Combine is likely a better place, using either a generic splitting sequence that self-tunes via rtx_cost. Or via a target specific splitter. The closest we get right now is this combine attempt: (set (reg:SI 1077) (plus:SI (ashift:SI (and:SI (lshiftrt:SI (reg:SI 1073) (const_int 8 [0x8])) (reg:SI 1074)) (const_int 2 [0x2])) (const_int 1024 [0x400]))) reg:SI 1074 is (const_int 255), but we can't blindly substitute in because reg 1074 has other uses as seen by this attempt: (parallel [ (set (reg:SI 1077) (plus:SI (and:SI (ashift:SI (reg:SI 1072) (const_int 2 [0x2])) (const_int 1020 [0x3fc])) (const_int 1024 [0x400]))) (set (reg:SI 1074) (const_int 255 [0xff])) ])