https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78182
            Bug ID: 78182
           Summary: Missed optimizations: "fused" byte stores
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rureclonic at thraml dot com
  Target Milestone: ---

Consider the following program, which writes '000000000001' into `text`:

    constexpr unsigned long Size = 12;
    char text[Size];

    void foo() {
        auto value = 1u;
        for (auto i = 0ul; i < Size; ++i) {
            text[Size - i - 1] = (value % 10) + '0';
            value /= 10;
        }
    }

The codegen for foo with `g++-7 -std=c++1z -O3 -march=corei7-avx` is (49 and 48 are the ASCII codes of '1' and '0'):

    foo():
            mov     BYTE PTR text[rip+11], 49
            mov     BYTE PTR text[rip+10], 48
            mov     BYTE PTR text[rip+9], 48
            mov     BYTE PTR text[rip+8], 48
            mov     BYTE PTR text[rip+7], 48
            mov     BYTE PTR text[rip+6], 48
            mov     BYTE PTR text[rip+5], 48
            mov     BYTE PTR text[rip+4], 48
            mov     BYTE PTR text[rip+3], 48
            mov     BYTE PTR text[rip+2], 48
            mov     BYTE PTR text[rip+1], 48
            mov     BYTE PTR text[rip], 48
            ret

Why can't the last 9 movs be combined into wider stores?
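For comparison, the wider stores being asked about can be written by hand. This is a minimal sketch under stated assumptions, not GCC output: `foo_fused` and `check_same_result` are hypothetical names, and the sketch hard-codes the constant string that GCC has already folded the loop into (visible in the immediates of the listing above). It writes text[0..7] with one 8-byte `memcpy` and text[8..11] with one 4-byte `memcpy`; at -O2 and above, GCC and Clang typically lower small fixed-size `memcpy` calls like these to a single wide `mov` each, though that is not guaranteed.

```cpp
#include <cassert>
#include <cstring>

constexpr unsigned long Size = 12;
char text[Size];

// The original loop from the report: writes "000000000001" into text.
void foo() {
    auto value = 1u;
    for (auto i = 0ul; i < Size; ++i) {
        text[Size - i - 1] = (value % 10) + '0';
        value /= 10;
    }
}

// Hypothetical hand-fused equivalent: the same 12 bytes written as one
// 8-byte copy followed by one 4-byte copy instead of 12 byte stores.
void foo_fused() {
    static const char digits[] = "000000000001";  // 12 chars + NUL
    std::memcpy(text, digits, 8);                 // text[0..7]:  eight '0'
    std::memcpy(text + 8, digits + 8, 4);         // text[8..11]: "0001"
}

// Hypothetical check: both versions must leave identical bytes in text.
void check_same_result() {
    foo();
    char expected[Size];
    std::memcpy(expected, text, Size);
    std::memset(text, 0, Size);
    foo_fused();
    assert(std::memcmp(expected, text, Size) == 0);
}
```

Going through `memcpy` rather than casting `text` to a pointer to a wider integer type avoids strict-aliasing and alignment problems, while still letting the compiler emit one store per copy.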