https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201

            Bug ID: 96201
           Summary: x86 movsd/movsq string instructions and alignment
                    inference
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michaeljclark at mac dot com
  Target Milestone: ---

Taking the time to record some observations and extract minimal test code for
alignment (inference) and x86 string instruction selection.

GCC9 and GCC10 do not generate x86 string instructions in some cases,
apparently because the compiler believes the addresses are not aligned.

GCC10 appears to have an additional issue whereby x86 string instructions are
not selected unless the address is aligned to twice the natural alignment.

Two observations:

* (GCC9/10) integer alignment is not inferred from expressions, e.g. x & ~3
* (GCC10) __builtin_assume_aligned appears to require double the alignment
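
To make the first observation concrete: masking off the low two bits always
yields a multiple of 4, which is the fact the compiler would need to infer.
The sketch below only illustrates that arithmetic; the helper names
align_down4 and is_aligned4 are hypothetical and not part of the test case.

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a multiple of 4 by
   clearing its two low bits, as the x & ~3 expression does. */
static inline uintptr_t align_down4(uintptr_t x)
{
    return x & ~(uintptr_t)3;
}

/* Hypothetical helper: nonzero if x is a multiple of 4. */
static inline int is_aligned4(uintptr_t x)
{
    return (x & 3) == 0;
}
```

For any x, is_aligned4(align_down4(x)) holds, so a pointer formed from such a
value is known to be 4-byte aligned without any annotation.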

The double alignment issue was observed with both int/movsd and long/movsq
whereby GCC10 will only generate movsd or movsq if the alignment is double the
type's natural alignment. The test case here is for int.


--- BEGIN SAMPLE CODE ---

void f1(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~3l );
    int *dn = (int*)( (long)(d    ) & ~3l );
    int *de = (int*)( (long)(d + n) & ~3l );

    while (dn < de) *dn++ = *sn++;
}

void f2(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~7l );
    int *dn = (int*)( (long)(d    ) & ~7l );
    int *de = (int*)( (long)(d + n) & ~7l );

    while (dn < de) *dn++ = *sn++;
}

void f3(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)( (long)(s    ) & ~3l ), 4 );
    int *dn = __builtin_assume_aligned( (int*)( (long)(d    ) & ~3l ), 4 );
    int *de = __builtin_assume_aligned( (int*)( (long)(d + n) & ~3l ), 4 );

    while (dn < de) *dn++ = *sn++;
}

void f4(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)((long)(s    ) & ~3l ), 8 );
    int *dn = __builtin_assume_aligned( (int*)((long)(d    ) & ~3l ), 8 );
    int *de = __builtin_assume_aligned( (int*)((long)(d + n) & ~3l ), 8 );

    while (dn < de) *dn++ = *sn++;
}

--- END SAMPLE CODE ---
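
The long/movsq case mentioned above is not included in the sample. A
counterpart might look like the sketch below, assuming an LP64 target where
long is 8 bytes. The names g3 and g4 are hypothetical, and this code was not
part of the report. Unlike f4, g4 masks with ~15l so that the promised 16-byte
alignment actually holds.

```c
/* Hypothetical long counterpart of f3: natural (8-byte) alignment.
   Assumes LP64, i.e. sizeof(long) == 8 and pointers fit in a long. */
void g3(long d, long s, unsigned n)
{
    long *sn = __builtin_assume_aligned( (long*)( (long)(s    ) & ~7l ), 8 );
    long *dn = __builtin_assume_aligned( (long*)( (long)(d    ) & ~7l ), 8 );
    long *de = __builtin_assume_aligned( (long*)( (long)(d + n) & ~7l ), 8 );

    while (dn < de) *dn++ = *sn++;
}

/* Hypothetical long counterpart of f4: double the natural alignment. */
void g4(long d, long s, unsigned n)
{
    long *sn = __builtin_assume_aligned( (long*)( (long)(s    ) & ~15l ), 16 );
    long *dn = __builtin_assume_aligned( (long*)( (long)(d    ) & ~15l ), 16 );
    long *de = __builtin_assume_aligned( (long*)( (long)(d + n) & ~15l ), 16 );

    while (dn < de) *dn++ = *sn++;
}
```

If the doubled-alignment behaviour applies here as described, GCC10 would be
expected to select movsq only for g4.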


GCC9 generates this for f1 and f2, and GCC10 generates this for f1, f2 and f3:

.Ln:
        leaq    (%rax,%rsi), %rcx
        movq    %rax, %rdx
        addq    $4, %rax
        movl    (%rcx), %ecx
        movl    %ecx, (%rdx)
        cmpq    %rax, %rdi
        ja      .Ln

GCC9 generates this for f3 and f4, and GCC10 generates this only for f4
(movsl is the AT&T mnemonic for movsd):

.Ln:
        movsl
        cmpq    %rdi, %rdx
        ja      .Ln
