https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109127
Bug ID: 109127
Summary: More advanced constexpr value compile time evaluation
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---

Hello,

I'd like to report an idea that could improve application performance. It concerns `constexpr` math that can be performed at compile time. To some degree the C++ compiler already performs this optimization, but in my more realistic example it does not.

Let's start with a simple example that explains the idea and that works. The following function serializes a `constexpr` unsigned value into a string. The output is not quite right, because the digits come out reversed, but we will get to that later.

```cpp
// The expected output is "543\0"
void foo1(char* ptr)
{
    constexpr unsigned Tag = 345;

    auto v = Tag;
    do
    {
        *ptr++ = (v % 10) + '0';
        v /= 10;
    }
    while(v);

    *ptr = 0;
}
```

The generated assembly is as follows:

```asm
foo1(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```

That is good enough. I would replace the load from `.LC0` with an immediate value though, so that the CPU does not have to access another memory location:

```asm
        ; bytes '5', '4', '3', 0 in little-endian order
        mov     eax, 0x00333435   ; instead of mov eax, DWORD PTR .LC0[rip]
```

Now I change the code a bit to use base-16 math. This is an intermediate step before we get to the real code:

```cpp
void foo2(char* ptr)
{
    constexpr unsigned Tag = 0xF345;

    auto v = Tag;
    while(v != 0xF)
    {
        *ptr++ = (v % 16) + '0';
        v /= 16;
    }

    *ptr = 0;
}
```

The assembly is the same as above, which is good.
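
The bytes in `.LC0` can be checked independently of the optimizer. The following is a minimal sketch, assuming C++17; `Buf`, `serialize_dec`, `serialize_hex`, `kDec` and `kHex` are hypothetical names that do not appear in the code above. It wraps the two loops in `constexpr` functions and lets `static_assert` confirm that both produce "543\0".

```cpp
#include <array>

// Hypothetical buffer type: three digits plus the terminating '\0'.
using Buf = std::array<char, 4>;

// The loop from foo1, writing into a fixed-size buffer instead of a char*.
constexpr Buf serialize_dec(unsigned v)
{
    Buf out{};          // zero-initialized, so the terminator is already in place
    unsigned i = 0;
    do
    {
        out[i++] = static_cast<char>((v % 10) + '0');
        v /= 10;
    }
    while(v && i < out.size() - 1);
    return out;
}

// The loop from foo2, with 0xF as the stop value.
constexpr Buf serialize_hex(unsigned v)
{
    Buf out{};
    unsigned i = 0;
    while(v != 0xF && i < out.size() - 1)
    {
        out[i++] = static_cast<char>((v % 16) + '0');
        v /= 16;
    }
    return out;
}

// Initializing constexpr variables requires evaluation at compile time.
constexpr Buf kDec = serialize_dec(345);
constexpr Buf kHex = serialize_hex(0xF345);

static_assert(kDec[0] == '5' && kDec[1] == '4' && kDec[2] == '3' && kDec[3] == '\0');
static_assert(kHex[0] == '5' && kHex[1] == '4' && kHex[2] == '3' && kHex[3] == '\0');
```

This only shows that the language can evaluate both loops as constant expressions. It says nothing about what the optimizer does with the original `foo1` and `foo2`, where the folding is an optimization rather than a requirement.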

What does not work is the next step: if I reverse the order of the digits first, the compiler no longer performs the `constexpr` math at compile time:

```cpp
void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;

    // Convert 0x543 -> 0xF345
    auto v = Tag;
    auto reversed = 0xFu; // 0xF is a stop value
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }

    // Now serialize 0xF345 into "543\0"
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly output is as follows:

```asm
foo3(char*):
        mov     eax, 62277
.L2:
        mov     edx, eax
        add     rdi, 1
        shr     eax, 4
        and     edx, 15
        add     edx, 48
        mov     BYTE PTR [rdi-1], dl
        cmp     eax, 15
        jne     .L2
        mov     BYTE PTR [rdi], 0
        ret
```

The reversal itself has been folded into the constant 62277 (0xF345), but the `.L2` serialization loop remains, although it could also be evaluated during compilation.

The workaround is to force the compiler to compute the reversed value by storing it in a `constexpr` variable:

```cpp
constexpr unsigned reverse(unsigned v)
{
    auto reversed = 0xFu;
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;
    constexpr unsigned ReversedTag = reverse(Tag);

    auto reversed = ReversedTag;
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly is back to normal:

```asm
foo3(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```
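
The same workaround can be taken one step further by moving the serialization loop into a `constexpr` function as well, so that the whole string is a compile-time constant and nothing is left for the optimizer to fold. The following is a minimal sketch, assuming C++17; `TagText`, `serialize_tag`, `kTagText` and `foo4` are hypothetical names, and only `reverse` is taken from the code above. I have not checked which instructions any particular GCC version emits for it.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// reverse() as in the workaround: reverses the hex digits of v and
// keeps 0xF on top as the stop value.
constexpr unsigned reverse(unsigned v)
{
    auto reversed = 0xFu;
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

// Hypothetical result type: the serialized digits plus their length.
struct TagText
{
    std::array<char, 8> bytes{};  // up to 7 digits followed by '\0'
    std::size_t size = 0;         // bytes to copy, terminator included
};

// Hypothetical helper: the serialization loop from foo3 as a constexpr function.
constexpr TagText serialize_tag(unsigned tag)
{
    TagText t{};
    unsigned r = reverse(tag);
    while(r != 0xF && t.size < t.bytes.size() - 1)
    {
        t.bytes[t.size++] = static_cast<char>((r % 16) + '0');
        r /= 16;
    }
    ++t.size;  // count the '\0' left over from zero-initialization
    return t;
}

// The constexpr variable forces the evaluation to happen at compile time.
constexpr TagText kTagText = serialize_tag(0x543);
static_assert(kTagText.size == 4);
static_assert(kTagText.bytes[0] == '5' && kTagText.bytes[1] == '4' &&
              kTagText.bytes[2] == '3' && kTagText.bytes[3] == '\0');

// The caller must provide at least kTagText.size bytes.
void foo4(char* ptr)
{
    std::memcpy(ptr, kTagText.bytes.data(), kTagText.size);
}
```

The point of the request remains that the compiler could do this without help: the first `foo3` contains no runtime input either, so in principle its loop can be folded the same way.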