[Bug rtl-optimization/83272] Unnecessary mask instruction generated

slash.tmp at free dot fr Tue, 05 Dec 2017 01:07:36 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272


--- Comment #3 from Mason <slash.tmp at free dot fr> ---
I think Jakub is right about an interaction between movzbl and shrb.

unsigned long long foo1(unsigned char *p) { return *p; }

foo1:
        movzbl  (%rdi), %eax
        ret

I.e. gcc "knows" that movzbl clears the upper bits of RAX. However...

unsigned long long foo2(unsigned char *p) { return *p / 2; }

foo2:
        movzbl  (%rdi), %eax
        shrb    %al
        movzbl  %al, %eax
        ret

gcc seems to think that shrb might "mess up" some bits in RAX, which then
need to be cleared again through an extra movzbl.

"shr %al" is encoded as "d0 e8" while "shr %eax" is "d1 e8".
Both encodings are two bytes long; the former is not more efficient than
the latter.

As Marc Glisse pointed out, a workaround is to have gcc hold the
intermediate result in a 32-bit unsigned temporary:

unsigned long long foo3(unsigned char *p)
{
        unsigned int temp = *p;
        return temp / 2;
}

foo3:
        movzbl  (%rdi), %eax
        shrl    %eax
        ret

gcc "knows" that shrl does not "disturb" the upper bits of RAX, so no extra
movzbl is required.


According to the "integer promotions" rules, *p is promoted to int in the
expression (*p / 2) used in foo2. However...

unsigned long long foo4(unsigned char *p)
{
        int temp = *p;
        return temp / 2;
}

foo4:
        movzbl  (%rdi), %eax
        sarl    %eax
        cltq
        ret


An 8-bit unsigned char promoted to int or unsigned int has the same
non-negative value, so sarl and shrl give the same result here and the
cltq is redundant.
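
For what it's worth, the promotion in foo2 can be confirmed at compile time.
This is only a sketch, assuming a C11 compiler (for _Generic and
static_assert); half_is_int is a made-up helper name, not part of the
testcase.

```c
#include <assert.h>

/* C11: the operands of '/' undergo integer promotion, so the
   expression has type int even though one operand is unsigned char. */
static_assert(_Generic((unsigned char)200 / 2, int: 1, default: 0),
              "unsigned char / 2 is computed as int");

/* Hypothetical helper: returns 1 if (c / 2) has type int, else 0. */
int half_is_int(unsigned char c)
{
        return _Generic(c / 2, int: 1, default: 0);
}
```

If the static_assert compiles, the division in foo2 is an int division, just
like the one in foo4.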

I would expect the same code to be generated for foo2, foo3 and foo4,
yet we get three different code sequences.
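
To back up the claim that the three functions are interchangeable, here is a
small exhaustive check over all 256 byte values. It assumes an 8-bit
unsigned char; check_all_inputs is a hypothetical helper added for
illustration, while foo2, foo3 and foo4 are copied from above.

```c
#include <assert.h>

/* The three variants from this comment, copied unchanged. */
unsigned long long foo2(unsigned char *p) { return *p / 2; }

unsigned long long foo3(unsigned char *p)
{
        unsigned int temp = *p;
        return temp / 2;
}

unsigned long long foo4(unsigned char *p)
{
        int temp = *p;
        return temp / 2;
}

/* Hypothetical helper: asserts that all three variants agree for
   every possible byte value and returns the number of inputs checked. */
int check_all_inputs(void)
{
        int checked = 0;

        for (int v = 0; v <= 255; v++) {
                unsigned char c = (unsigned char)v;
                unsigned long long expected = (unsigned long long)v / 2;

                assert(foo2(&c) == expected);
                assert(foo3(&c) == expected);
                assert(foo4(&c) == expected);
                checked++;
        }
        return checked;
}
```

Since no input distinguishes the three, the differences in the generated
code are purely a matter of what the optimizers manage to prove.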
