> > Bootstrap/make check/Specs2k are passing on i686 and x86_64. Thanks for returning to this!
glibc has a quite comprehensive testsuite for stringops.  It may be useful
to test it with -minline-all-stringops -mstringop-strategy=vector_loop.
I tested the patch on my Core notebook and my memcpy micro benchmark.
The vector loop is not a win, since apparently we do not produce any SSE
code for 64-bit compilation.  What CPUs and block sizes is this intended
for?

Also the internal loop with -march=native seems to come out as:

.L7:
	movq	(%rsi,%r8), %rax
	movq	8(%rsi,%r8), %rdx
	movq	48(%rsi,%r8), %r9
	movq	56(%rsi,%r8), %r10
	movdqu	16(%rsi,%r8), %xmm3
	movdqu	32(%rsi,%r8), %xmm1
	movq	%rax, (%rdi,%r8)
	movq	%rdx, 8(%rdi,%r8)
	movdqa	%xmm3, 16(%rdi,%r8)
	movdqa	%xmm1, 32(%rdi,%r8)
	movq	%r9, 48(%rdi,%r8)
	movq	%r10, 56(%rdi,%r8)
	addq	$64, %r8
	cmpq	%r11, %r8

It is not that much of an SSE enablement, since the RA seems to home the
variables in integer registers.  Could you please look into it?

> Changelog entry:
>
> 2013-04-10  Michael Zolotukhin  <michael.v.zolotuk...@gmail.com>
>
> 	* config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
> 	* config/i386/i386.c (expand_set_or_movmem_via_loop): Use
> 	adjust_address instead of change_address to keep info about alignment.
> 	(emit_strmov): Remove.
> 	(emit_memmov): New function.
> 	(expand_movmem_epilogue): Refactor to properly handle bigger sizes.
> 	(expand_setmem_epilogue): Likewise and return updated rtx for
> 	destination.
> 	(expand_constant_movmem_prologue): Likewise and return updated rtx for
> 	destination and source.
> 	(decide_alignment): Refactor, handle vector_loop.
> 	(ix86_expand_movmem): Likewise.
> 	(ix86_expand_setmem): Likewise.
> 	* config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
> 	* emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
 	  expr = inner;
 	}
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      tree base = TREE_OPERAND (expr, 0);
+      tree byte_offset = TREE_OPERAND (expr, 1);
+      if (TREE_CODE (base) != ADDR_EXPR
+	  || TREE_CODE (byte_offset) != INTEGER_CST)
+	return -1;
+      if (!DECL_P (TREE_OPERAND (base, 0))
+	  || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

You can use TYPE_ALIGN here?  In general, can't we replace all the GIMPLE
handling by get_object_alignment?

+	return -1;
+      offset += tree_low_cst (byte_offset, 1);
+    }
   else
     return -1;

This change ought to go in independently; I can not review it.

I will make a first pass over the patch shortly, but please send an updated
patch fixing the problem with the integer registers.

Honza
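As a sketch of the get_object_alignment suggestion above: the hand-written MEM_REF case (and in principle the whole GIMPLE walk in get_mem_align_offset) could be expressed via get_object_alignment_1 from gcc/builtins.c. This is an untested, hypothetical fragment against the GCC-internal API of that era, not a reviewed patch:

```c
/* Hypothetical replacement for the hand-written MEM_REF case in
   get_mem_align_offset, using get_object_alignment_1 (gcc/builtins.c).
   Untested sketch; the exact return-value semantics of
   get_object_alignment_1 should be checked against the tree version.  */
else
  {
    unsigned int obj_align;
    unsigned HOST_WIDE_INT bitpos;

    /* get_object_alignment_1 walks the reference for us; it reports the
       known alignment of the object and the bit offset from it.  */
    get_object_alignment_1 (expr, &obj_align, &bitpos);
    if (obj_align < align || bitpos % BITS_PER_UNIT != 0)
      return -1;
    offset += bitpos / BITS_PER_UNIT;
  }
```

The attraction of this form is that it reuses the one canonical alignment walker instead of duplicating per-tree-code logic in emit-rtl.c.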