The following testcase from PR target/33329 shows a problem where gcc doesn't fold vector arithmetic operations with constant arguments into a load of a vector constant.
For clarity, sse4 will be used, but the same problem is present on sse2.

--cut here--
extern void g (int *);

void f (void)
{
  int tabs[8], tabcount;

  for (tabcount = 1; tabcount <= 8; tabcount += 7)
    {
      int i;
      for (i = 0; i < 8; i++)
        tabs[i] = 2 * i;
      g (tabs);
    }
}
--cut here--

produces (gcc -O2 -msse4 -ftree-vectorize):

.LCFI2:
        movdqa  .LC0(%rip), %xmm1
        leaq    16(%rsp), %rbp
        movdqa  .LC1(%rip), %xmm0
        paddd   .LC2(%rip), %xmm1
        pmulld  %xmm1, %xmm0    # 19    *sse4_1_mulv4si3        [length = 4]
        movdqa  %xmm0, (%rsp)
.L2:
        movdqa  .LC3(%rip), %xmm0       # 54
        movq    %rbp, %rdi
        addl    $1, %ebx
        movdqa  (%rsp), %xmm2           # 55
        movdqa  %xmm0, (%rbp)
        movdqa  %xmm2, 16(%rbp)
        call    g
        cmpl    $2, %ebx
        jne     .L2

All instructions above the loop have constant arguments. This is evident from the combine RTL dump, where insn 19 is represented using the following RTX:

(insn 19 17 25 2 pr33329.c:13 (set (reg:V4SI 78)
        (mult:V4SI (reg:V4SI 77)
            (reg:V4SI 73))) 1136 {*sse4_1_mulv4si3} (expr_list:REG_DEAD (reg:V4SI 73)
        (expr_list:REG_EQUAL (const_vector:V4SI [
                    (const_int 8 [0x8])
                    (const_int 10 [0xa])
                    (const_int 12 [0xc])
                    (const_int 14 [0xe])
                ])
            (nil))))

Actually, gcc has already calculated the correct const_vector value, but it looks like it doesn't know what to do with it. For optimal code, insn #55 should load the vector constant from the constant pool in the same way as insn #54.

--
           Summary: Vector RTL arithmetic operations with constant arguments
                    are not fully folded.
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33353