https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78182

            Bug ID: 78182
           Summary: Missed optimizations: "fused" byte stores
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rureclonic at thraml dot com
  Target Milestone: ---

Consider the following program, which writes '000000000001' in `text`.

  constexpr unsigned long Size = 12;
  char text[Size];

  void foo() {
    auto value = 1u;

    for (auto i = 0ul; i < Size; ++i) {
      text[Size - i - 1] = (value % 10) + '0';
      value /= 10;
    }
  }

The codegen for foo with `g++-7 -std=c++1z -O3 -march=corei7-avx` is:

  foo():
        mov     BYTE PTR text[rip+11], 49
        mov     BYTE PTR text[rip+10], 48
        mov     BYTE PTR text[rip+9], 48
        mov     BYTE PTR text[rip+8], 48
        mov     BYTE PTR text[rip+7], 48
        mov     BYTE PTR text[rip+6], 48
        mov     BYTE PTR text[rip+5], 48
        mov     BYTE PTR text[rip+4], 48
        mov     BYTE PTR text[rip+3], 48
        mov     BYTE PTR text[rip+2], 48
        mov     BYTE PTR text[rip+1], 48
        mov     BYTE PTR text[rip], 48
        ret

Why can't the last 11 movs, which all store the same byte (48, i.e. '0'), be
combined into wider stores?
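
For illustration, here is a hand-written sketch of the kind of result I would hope for. The name `foo_fused` and the split into one 8-byte and one 4-byte copy are my own assumptions, not something GCC emits today. The constants are written as char arrays so the sketch does not depend on endianness; I would expect (but have not verified for every target) that each fixed-size memcpy gets lowered to a single wide store.

```cpp
#include <cstring>

constexpr unsigned long Size = 12;
char text[Size];

// Original function from this report, unchanged.
void foo() {
  auto value = 1u;

  for (auto i = 0ul; i < Size; ++i) {
    text[Size - i - 1] = (value % 10) + '0';
    value /= 10;
  }
}

// Hypothetical hand-fused variant (not GCC output): writes the same
// 12 bytes with two fixed-size copies instead of 12 byte stores.
void foo_fused() {
  static constexpr char low[8]  = {'0', '0', '0', '0', '0', '0', '0', '0'};
  static constexpr char high[4] = {'0', '0', '0', '1'};

  std::memcpy(text, low, sizeof low);        // bytes 0..7
  std::memcpy(text + 8, high, sizeof high);  // bytes 8..11
}
```

Both functions leave the same 12 bytes in `text`, so the transformation would be purely a codegen matter.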
