https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104632
Bug ID: 104632
Summary: Missed optimization about backward reads
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: lh_mouse at 126 dot com
Target Milestone: ---
Target: x86_64-linux-gnu

This is a piece of code simplified from a Boyer-Moore-Horspool implementation: https://gcc.godbolt.org/z/766GYM8xf

```c++
// In real code this was
// `load_le32_backwards(::std::reverse_iterator<const unsigned char*> ptr)`
unsigned
load_le32_backwards(const unsigned char* ptr)
  {
    unsigned word = ptr[-1];
    word = word << 8 | ptr[-2];
    word = word << 8 | ptr[-3];
    word = word << 8 | ptr[-4];
    return word;
  }
```

This is equivalent to `return ((unsigned*)ptr)[-1];` on x86_64, but GCC fails to optimize it into a single load.

GCC output:

```
load_le32_backwards(unsigned char const*):
        movzx   edx, BYTE PTR [rdi-1]
        movzx   eax, BYTE PTR [rdi-2]
        sal     edx, 8
        or      eax, edx
        movzx   edx, BYTE PTR [rdi-3]
        sal     eax, 8
        or      edx, eax
        movzx   eax, BYTE PTR [rdi-4]
        sal     edx, 8
        or      eax, edx
        ret
```

Clang output:

```
load_le32_backwards(unsigned char const*):  # @load_le32_backwards(unsigned char const*)
        mov     eax, dword ptr [rdi - 4]
        ret
```