https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201
Bug ID: 96201
Summary: x86 movsd/movsq string instructions and alignment inference
Product: gcc
Version: 10.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: michaeljclark at mac dot com
Target Milestone: ---

Taking the time to record some observations and extract minimal test code for alignment (inference) and x86 string instruction selection.

GCC9 and GCC10 do not generate x86 string instructions in some cases, apparently because the compiler believes the addresses are not aligned. GCC10 appears to have an additional issue whereby x86 string instructions are not selected unless the address is aligned to twice the natural alignment.

Two observations:

* (GCC9/10) integer alignment is not inferred from expressions, i.e. x & ~3
* (GCC10) __builtin_assume_aligned appears to require double the alignment

The double alignment issue was observed with both int/movsd and long/movsq: GCC10 will only generate movsd or movsq if the alignment is double the type's natural alignment. The test case here is for int.

--- BEGIN SAMPLE CODE ---
void f1(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~3l );
    int *dn = (int*)( (long)(d    ) & ~3l );
    int *de = (int*)( (long)(d + n) & ~3l );

    while (dn < de) *dn++ = *sn++;
}

void f2(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~7l );
    int *dn = (int*)( (long)(d    ) & ~7l );
    int *de = (int*)( (long)(d + n) & ~7l );

    while (dn < de) *dn++ = *sn++;
}

void f3(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)( (long)(s    ) & ~3l ), 4 );
    int *dn = __builtin_assume_aligned( (int*)( (long)(d    ) & ~3l ), 4 );
    int *de = __builtin_assume_aligned( (int*)( (long)(d + n) & ~3l ), 4 );

    while (dn < de) *dn++ = *sn++;
}

void f4(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)( (long)(s    ) & ~3l ), 8 );
    int *dn = __builtin_assume_aligned( (int*)( (long)(d    ) & ~3l ), 8 );
    int *de = __builtin_assume_aligned( (int*)( (long)(d + n) & ~3l ), 8 );

    while (dn < de) *dn++ = *sn++;
}
--- END SAMPLE CODE ---

GCC9 generates this for f1, f2 and GCC10 generates this for f1, f2, f3:

.Ln:
        leaq    (%rax,%rsi), %rcx
        movq    %rax, %rdx
        addq    $4, %rax
        movl    (%rcx), %ecx
        movl    %ecx, (%rdx)
        cmpq    %rax, %rdi
        ja      .Ln

GCC9 generates this for f3, f4 and GCC10 generates this only for f4:

.Ln:
        movsl
        cmpq    %rdi, %rdx
        ja      .Ln
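
For anyone wanting to sanity-check the premise that the mask alone establishes the alignment, here is a minimal sketch. It is not part of the original test case: align_down and copy_words are hypothetical names, and it uses uintptr_t rather than long for the address arithmetic. It only checks at run time that x & ~3 yields a 4-byte-aligned address and that the f1-style loop copies the expected number of words. It makes no claim about which instructions any GCC version selects for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address down to a multiple of align (align must be a power of two).
   For align == 4 this is the same mask as the x & ~3 expression in f1. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* Hypothetical variant of f1: same masking and copy loop, plus run-time
   checks that the masked pointers really are 4-byte aligned.
   Returns the number of ints copied. */
static size_t copy_words(uintptr_t d, uintptr_t s, unsigned n)
{
    int *sn = (int *)align_down(s, 4);
    int *dn = (int *)align_down(d, 4);
    int *de = (int *)align_down(d + n, 4);
    size_t copied = 0;

    /* The low two bits are cleared by the mask, so these always hold. */
    assert(((uintptr_t)sn & 3) == 0);
    assert(((uintptr_t)dn & 3) == 0);
    assert(((uintptr_t)de & 3) == 0);

    while (dn < de) {
        *dn++ = *sn++;
        copied++;
    }
    return copied;
}
```

Calling copy_words with the addresses of two int arrays and a byte count n copies n / 4 words. The assertions never fire because the mask guarantees the low bits are zero, which is the property the report expects the compiler to infer without __builtin_assume_aligned.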