https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84013

            Bug ID: 84013
           Summary: wrong __restrict clique with inline asm operand
           Product: gcc
           Version: 7.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: katsunori.kumatani at gmail dot com
  Target Milestone: ---

Using a __restrict parameter in an asm results in the wrong clique assigned to
the MEM_REF if the same pointer is used without an asm (where it has the
correct clique). This happens since GCC 6.4.0 and up (including GCC 7 and
trunk). GCC 6.3.0 assigns the clique differently (and even worse, in some
cases, even generates wrong code), for 6.4.0 a fix was backported but the fix
still has wrong cliques. Consider this snippet:

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;
  asm volatile("":"+m"(*b));
  *a = 0;
  return *b;
}

Compile with -O2 -fipa-pta (the clique is _always_ wrong, but -fipa-pta is
needed for demonstration for some reason, I don't know why). The output on
x86_64:

GCC 6.3.0:
  movq $0, (%rdi)
  movq (%rsi), %rax
  ret

GCC 6.4.0 / GCC 7 / trunk:
  movq $0, (%rdi)
  movq $0, (%rdi)             # redundant
  movq (%rsi), %rax
  ret

As you can see, the RTL DSE pass doesn't remove the redundant store *a, because
the clique is wrong and thinks the asm can alias with *a, which is not the
case, because they're both __restrict. GCC 6.3.0 generates "better" code
because it has the other bug 79552 (which is fixed), so that's not a solution.

I'll comment with the cliques I got when inspecting after the "optimized" pass
(final gimple pass):

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;                     // clique 1 base 1
  asm volatile("":"+m"(*b));  // clique 0 base 0 (wrong)
  *a = 0;                     // clique 1 base 1
  return *b;                  // clique 1 base 2 (what it should be)
}

The asm should obviously have clique 1 base 2 to match the return, since it's
the same pointer. To me, this is clearly a bug. I don't know if it's a missed
optimization or can even produce wrong code in certain cases, however.

Note that if you remove the return (i.e. any use of the *b pointer other than
the asm), you get this:

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;                     // clique 1 base 1
  asm volatile("":"+m"(*b));  // clique 1 base 0
  *a = 0;                     // clique 1 base 1
}

Which is correct because it has the same clique as *a (1), but the base 0
worries me a bit, shouldn't it be base 2 as well just like before?

Note that the cliques are _always_ wrong, but the redundant output appears only
with -fipa-pta (even though not even one call is involved), not sure why (not
sure how it can remove the redundant store in that case though).

Reply via email to