https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84013
Bug ID: 84013
Summary: wrong __restrict clique with inline asm operand
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: katsunori.kumatani at gmail dot com
Target Milestone: ---
Using a __restrict parameter in an asm operand results in the wrong clique being
assigned to the asm's MEM_REF if the same pointer is also used outside the asm
(where it gets the correct clique). This happens in GCC 6.4.0 and later
(including GCC 7 and trunk). GCC 6.3.0 assigns the clique differently (and, even
worse, generates wrong code in some cases). A fix for that was backported to
6.4.0, but the cliques are still wrong with the fix. Consider this snippet:
auto abc(int** __restrict a, int** __restrict b)
{
    *a = 0;
    asm volatile("":"+m"(*b));
    *a = 0;
    return *b;
}
Compile with -O2 -fipa-pta. The clique is _always_ wrong, but for reasons I
don't understand, -fipa-pta is needed to make the effect visible in the
generated code. The output on x86_64:
GCC 6.3.0:
movq $0, (%rdi)
movq (%rsi), %rax
ret
GCC 6.4.0 / GCC 7 / trunk:
movq $0, (%rdi)
movq $0, (%rdi) # redundant
movq (%rsi), %rax
ret
As you can see, the RTL DSE pass doesn't remove the redundant first store to *a:
because the clique on the asm operand is wrong, GCC thinks the asm can alias
*a, which is not the case since both pointers are __restrict. GCC 6.3.0
generates "better" code only because it has the other bug, 79552 (since fixed),
so that's not a solution.
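For comparison, here is a sketch of a control variant in which the asm is
replaced by an ordinary store through b, so that every access is a plain
MEM_REF. The name abc_plain, the check_abc_plain helper and the noinline
attribute are hypothetical additions, not part of the original test case, and
no assembly output for this variant has been checked here. The expectation is
only that, with correct cliques, the first store to *a is dead.

```cpp
#include <cassert>

// Hypothetical control variant of abc(): the inline asm is replaced by an
// ordinary store through b, so all accesses are plain MEM_REFs.
// noinline is an assumption, added so the function is compiled on its own.
__attribute__((noinline))
int* abc_plain(int** __restrict a, int** __restrict b)
{
    *a = nullptr;   // candidate dead store, as in the original test case
    *b = nullptr;   // plain store instead of asm volatile("":"+m"(*b))
    *a = nullptr;
    return *b;
}

// Runtime sanity check. a and b point to distinct objects, as __restrict
// requires, so the call has well-defined behavior.
void check_abc_plain()
{
    int x = 1, y = 2;
    int* pa = &x;
    int* pb = &y;
    int* r = abc_plain(&pa, &pb);
    assert(pa == nullptr);
    assert(pb == nullptr);
    assert(r == nullptr);
}
```

Comparing the -O2 -fipa-pta output of this variant with that of abc() should
show whether the extra store is specific to the asm operand.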
Here is the snippet annotated with the cliques I saw when inspecting the result
of the "optimized" pass (the final GIMPLE pass):
auto abc(int** __restrict a, int** __restrict b)
{
    *a = 0;                      // clique 1 base 1
    asm volatile("":"+m"(*b));   // clique 0 base 0 (wrong)
    *a = 0;                      // clique 1 base 1
    return *b;                   // clique 1 base 2 (what it should be)
}
The asm should obviously have clique 1 base 2 to match the return, since it's
the same pointer. To me, this is clearly a bug. I don't know, however, whether
it is only a missed optimization or whether it can also produce wrong code in
certain cases.
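Whether the wrong clique can lead to wrong code is left open here. A minimal
runtime harness along the following lines could be used to look for a
miscompilation of the test case. It is only a sketch: check_abc is a
hypothetical helper, the noinline attribute is an addition to the original
function, and a passing run says nothing about other callers, flags, or
compiler versions.

```cpp
#include <cassert>

// The test case from this report. noinline is an addition (an assumption of
// this harness) so that the call is not folded into the caller.
__attribute__((noinline))
auto abc(int** __restrict a, int** __restrict b)
{
    *a = 0;
    asm volatile("":"+m"(*b));  // empty asm: *b is read and written, value unchanged
    *a = 0;
    return *b;
}

// Hypothetical helper: checks the observable behavior of abc() when a and b
// point to distinct objects, as __restrict requires.
void check_abc()
{
    int x = 1, y = 2;
    int* pa = &x;
    int* pb = &y;
    int* r = abc(&pa, &pb);
    assert(pa == nullptr);  // *a was set to 0
    assert(pb == &y);       // the empty asm must leave *b untouched
    assert(r == &y);        // the returned value is the current *b
}
```

Building this with and without -fipa-pta and calling check_abc() from main
would at least catch a miscompilation of this particular call pattern.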
Note that if you remove the return (i.e. any use of the *b pointer other than
the asm), you get this:
auto abc(int** __restrict a, int** __restrict b)
{
    *a = 0;                      // clique 1 base 1
    asm volatile("":"+m"(*b));   // clique 1 base 0
    *a = 0;                      // clique 1 base 1
}
This is correct in that it has the same clique as *a (1), but the base 0
worries me a bit. Shouldn't it be base 2 here as well, just like before?
Note that the cliques are _always_ wrong, but the redundant store shows up in
the output only with -fipa-pta (even though not a single call is involved). I'm
not sure why, nor how GCC manages to remove the redundant store without
-fipa-pta.