https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85637
Bug ID: 85637 Summary: Unneeded store of member variables in inner loop Product: gcc Version: 7.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: petschy at gmail dot com Target Milestone: --- Created attachment 44058 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44058&action=edit source Attached a simple Adler32 checksum class. When updating with an array of bytes, the inner loop just accumulates the two sums, then the modulo is done in the outer loop. This way the cost of the two modulos is amortized. At the start the two member variables are loaded into registers, however, they are stored back to memory in each inner loop iteration. Then, also at the end after the modulo, but before the end of the outer loop. There is only one exit from the function. Why not store the registers back just once right before the ret? Dump of assembler code for function Adler32::Update(void const*, unsigned int): 0x0000000000400500 <+0>: test %edx,%edx 0x0000000000400502 <+2>: je 0x400578 <Adler32::Update(void const*, unsigned int)+120> 0x0000000000400504 <+4>: mov (%rdi),%ecx ; ecx is m_s1 0x0000000000400506 <+6>: mov 0x4(%rdi),%r8d ; r8d is m_s2 0x000000000040050a <+10>: mov $0x80078071,%r10d 0x0000000000400510 <+16>: xor %r9d,%r9d 0x0000000000400513 <+19>: cmp $0x15af,%edx 0x0000000000400519 <+25>: jbe 0x400527 <Adler32::Update(void const*, unsigned int)+39> 0x000000000040051b <+27>: lea -0x15b0(%rdx),%r9d 0x0000000000400522 <+34>: mov $0x15b0,%edx 0x0000000000400527 <+39>: lea -0x1(%rdx),%eax 0x000000000040052a <+42>: lea 0x1(%rsi,%rax,1),%rdx 0x000000000040052f <+47>: nop 0x0000000000400530 <+48>: add $0x1,%rsi 0x0000000000400534 <+52>: movzbl -0x1(%rsi),%eax 0x0000000000400538 <+56>: add %eax,%ecx ; m_s1 += *buf 0x000000000040053a <+58>: add %ecx,%r8d ; m_s2 += m_s1 0x000000000040053d <+61>: cmp %rdx,%rsi 0x0000000000400540 <+64>: mov %ecx,(%rdi) ; !!! unneeded store 0x0000000000400542 <+66>: mov %r8d,0x4(%rdi) ; !!! ditto 0x0000000000400546 <+70>: jne 0x400530 <Adler32::Update(void const*, unsigned int)+48> 0x0000000000400548 <+72>: mov %ecx,%eax 0x000000000040054a <+74>: mul %r10d 0x000000000040054d <+77>: mov %r8d,%eax 0x0000000000400550 <+80>: shr $0xf,%edx 0x0000000000400553 <+83>: imul $0xfff1,%edx,%edx 0x0000000000400559 <+89>: sub %edx,%ecx 0x000000000040055b <+91>: mul %r10d 0x000000000040055e <+94>: mov %ecx,(%rdi) ; !!! this could be done after the jne at +118 0x0000000000400560 <+96>: shr $0xf,%edx 0x0000000000400563 <+99>: imul $0xfff1,%edx,%edx 0x0000000000400569 <+105>: sub %edx,%r8d 0x000000000040056c <+108>: test %r9d,%r9d 0x000000000040056f <+111>: mov %r9d,%edx 0x0000000000400572 <+114>: mov %r8d,0x4(%rdi) ; !!! ditto 0x0000000000400576 <+118>: jne 0x400510 <Adler32::Update(void const*, unsigned int)+16> 0x0000000000400578 <+120>: repz retq The above code is generated w/ 7.3.1, 6.3.1 generates the exact same code. 8.0.1 and 8.1.1 generates somewhat different code, longer by 32 bytes, but the placing of the stores are the same. The size difference is odd, but I'll open another bug for that. Platform: AMD64 (FX-8150), Debian 9.4 $ g++-6.3.1 -v Using built-in specs. COLLECT_GCC=g++-6.3.1 COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-6.3.1 --disable-bootstrap --enable-checking=release CFLAGS='-O2 -march=native' CXXFLAGS='-O2 -march=native' Thread model: posix gcc version 6.3.1 20170120 (GCC) $ g++-7.3.1 -v Using built-in specs. COLLECT_GCC=g++-7.3.1 COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.3.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-7.3.1 --disable-bootstrap CFLAGS='-O2 -march=native -mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' Thread model: posix gcc version 7.3.1 20180429 (GCC) $ g++-8.0.1 -v Using built-in specs. COLLECT_GCC=g++-8.0.1 COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-8.0.1 --disable-bootstrap CFLAGS='-O2 -march=native' CXXFLAGS='-O2 -march=native' Thread model: posix gcc version 8.0.1 20180214 (experimental) (GCC) $ g++-8.1.1 -v Using built-in specs. COLLECT_GCC=g++-8.1.1 COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.1.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-8.1.1 --disable-bootstrap CFLAGS='-O2 -march=native -mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' Thread model: posix gcc version 8.1.1 20180502 (GCC)