This report was prompted by a message on LKML that suggested hand-crafting
memset: http://lkml.org/lkml/2007/8/17/309 . I wondered whether the code
generated for __builtin_memset was good enough to be used instead of
hand-crafted code. I tested with (Debian) GCC 3.4.6, 4.1.3, 4.2.1, and a
snapshot of GCC 4.3. The results are all similar, so I will only show them
for GCC 4.2 on x86-64. Everything was compiled with -O3.
First, the __builtin_memset code:
void fill1(char *s, int a)
{
  __builtin_memset(s, a, 15);
}
GCC generates:
0: 40 0f b6 c6 movzbl %sil,%eax
4: 48 ba 01 01 01 01 01 mov $0x101010101010101,%rdx
b: 01 01 01
e: 40 0f b6 ce movzbl %sil,%ecx
12: 48 0f af c2 imul %rdx,%rax
16: 40 88 77 0e mov %sil,0xe(%rdi)
1a: 48 89 07 mov %rax,(%rdi)
1d: 40 0f b6 c6 movzbl %sil,%eax
21: 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax
27: 89 47 08 mov %eax,0x8(%rdi)
2a: 89 c8 mov %ecx,%eax
2c: c1 e0 08 shl $0x8,%eax
2f: 01 c8 add %ecx,%eax
31: 66 89 47 0c mov %ax,0xc(%rdi)
35: c3 retq
Notice that GCC first computes %sil * 0x0101010101010101 and puts it into
%rax, then it computes %sil * 0x01010101 and puts it into %eax (where it
already was, as the low half of the previous product), then it computes
%sil * 0x0101 and puts it into %ax (where it already was, again).
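The redundancy can be checked in plain C. The following is only a sketch, and the helper names splat8 and narrow_patterns_are_truncations are hypothetical (they do not come from GCC). It verifies, for every byte value, that the 4-byte, 2-byte, and 1-byte patterns are simply the low bits of the 8-byte product:

```c
#include <stdint.h>

/* Hypothetical helper: replicate byte b into all 8 bytes of a 64-bit word. */
static uint64_t splat8(unsigned char b)
{
    return (uint64_t)b * UINT64_C(0x0101010101010101);
}

/* Returns 1 if, for every byte value, the narrower patterns that GCC
   recomputes are equal to truncations of the 64-bit product. */
static int narrow_patterns_are_truncations(void)
{
    for (unsigned b = 0; b < 256; b++) {
        uint64_t wide = splat8((unsigned char)b);
        /* Same computation as "imul $0x1010101,%eax,%eax". */
        uint32_t mid = (uint32_t)b * UINT32_C(0x01010101);
        /* Same computation as the "shl $0x8" / "add" pair. */
        uint16_t low = (uint16_t)((b << 8) + b);

        if ((uint32_t)wide != mid)
            return 0;
        if ((uint16_t)wide != low)
            return 0;
        if ((unsigned char)wide != b)
            return 0;
    }
    return 1;
}
```

Calling narrow_patterns_are_truncations() from a small main() should return 1, which is why a single multiplication is enough for all four stores.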
Second, some code where multiplication results are reused:
void fill2(char *s, int a)
{
  unsigned long long int v = (unsigned char)a * 0x0101010101010101ull;
  *(unsigned long long int *)s = v;
  *(unsigned *)(s + 8) = v;
  *(unsigned short *)(s + 12) = v;
  *(s + 14) = v;
}
GCC generates:
0: 40 0f b6 f6 movzbl %sil,%esi
4: 48 b8 01 01 01 01 01 mov $0x101010101010101,%rax
b: 01 01 01
e: 48 0f af f0 imul %rax,%rsi
12: 48 89 37 mov %rsi,(%rdi)
15: 89 77 08 mov %esi,0x8(%rdi)
18: 66 89 77 0c mov %si,0xc(%rdi)
1c: 40 88 77 0e mov %sil,0xe(%rdi)
20: c3 retq
The function is 21 bytes smaller (33 bytes instead of 54, about -39%), it
does not need two additional registers (%rcx and %rdx), and it will not be
slower.
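To check that a hand-written fill of this kind really behaves like memset(s, a, 15), a small comparison can help. This is only a sketch under stated assumptions: fill2_portable and fill2_matches_memset are hypothetical names, and the stores go through memcpy instead of pointer casts so that the test does not depend on alignment or aliasing rules. It is not the code GCC was given above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of the hand-written fill: one multiplication,
   then stores of 8, 4, 2 and 1 bytes covering offsets 0 through 14. */
static void fill2_portable(char *s, int a)
{
    uint64_t v = (uint64_t)(unsigned char)a * UINT64_C(0x0101010101010101);
    uint32_t v4 = (uint32_t)v;          /* low 4 bytes of the pattern */
    uint16_t v2 = (uint16_t)v;          /* low 2 bytes of the pattern */
    unsigned char v1 = (unsigned char)v;

    memcpy(s, &v, 8);                   /* bytes 0..7   */
    memcpy(s + 8, &v4, 4);              /* bytes 8..11  */
    memcpy(s + 12, &v2, 2);             /* bytes 12..13 */
    memcpy(s + 14, &v1, 1);             /* byte  14     */
}

/* Returns 1 if fill2_portable matches memset(s, a, 15) for every byte
   value and leaves the 16th byte untouched. */
static int fill2_matches_memset(void)
{
    for (int a = 0; a < 256; a++) {
        char got[16], want[16];

        /* Pre-fill both buffers with a sentinel that differs from a. */
        memset(got, ~a, sizeof got);
        memset(want, ~a, sizeof want);

        fill2_portable(got, a);
        memset(want, a, 15);

        if (memcmp(got, want, sizeof got) != 0)
            return 0;
    }
    return 1;
}
```

Since every byte of the pattern is identical, the truncated stores do not depend on byte order.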
The same issue arises on x86_32. There, the hand-written code (using 32-bit
integers this time) is 14 bytes smaller for memset(,,15).
--
Summary: Redundant multiplications for memset
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guillaume dot melquiond at ens-lyon dot fr
GCC target triplet: x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33103