>
> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
Thanks for returning to this!
glibc has a quite comprehensive testsuite for string operations. It may be useful to test it
with -minline-all-stringops -mstringop-strategy=vector_loop.
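For a quick check, a constant-size copy compiled with those flags makes the selected
expansion easy to inspect in the generated assembly (a minimal sketch only; the file
name and buffer size here are arbitrary):

/* probe.c -- minimal sketch for eyeballing the inline expansion.
   Compile e.g. with:
     gcc -O2 -minline-all-stringops -mstringop-strategy=vector_loop -S probe.c  */
#include <string.h>

char dst[1024], src[1024];

void
copy_1k (void)
{
  /* Constant size, so the copy is expanded inline and the chosen
     stringop strategy shows up directly in probe.s.  */
  memcpy (dst, src, sizeof dst);
}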
I tested the patch on my Core notebook with my memcpy micro benchmark.
The vector loop is not a win, since apparently we do not produce any SSE code for 64-bit
compilation. What CPUs and block sizes is this intended for?
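(Roughly, the micro benchmark is of the following shape -- a minimal sketch only; the
block size and iteration count are placeholders, not the values actually measured with:)

/* bench.c -- toy memcpy throughput probe, sketch only.  */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BLOCK 64
#define ITERS 10000000L

static char dst[BLOCK], src[BLOCK];

int
main (void)
{
  long i;
  clock_t start, end;

  start = clock ();
  for (i = 0; i < ITERS; i++)
    {
      memcpy (dst, src, BLOCK);
      /* Keep the copy from being optimized away.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  end = clock ();
  printf ("%ld MB copied in %.2f s\n",
          BLOCK * ITERS / (1024 * 1024),
          (double) (end - start) / CLOCKS_PER_SEC);
  return 0;
}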
Also, the inner loop with -march=native seems to come out as:
.L7:
movq (%rsi,%r8), %rax
movq 8(%rsi,%r8), %rdx
movq 48(%rsi,%r8), %r9
movq 56(%rsi,%r8), %r10
movdqu 16(%rsi,%r8), %xmm3
movdqu 32(%rsi,%r8), %xmm1
movq %rax, (%rdi,%r8)
movq %rdx, 8(%rdi,%r8)
movdqa %xmm3, 16(%rdi,%r8)
movdqa %xmm1, 32(%rdi,%r8)
movq %r9, 48(%rdi,%r8)
movq %r10, 56(%rdi,%r8)
addq $64, %r8
cmpq %r11, %r8
It is not that much of an SSE enablement, since the RA seems to home the variables in
integer registers.
Could you please look into it?
>
> Changelog entry:
>
> 2013-04-10 Michael Zolotukhin <[email protected]>
>
> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
> adjust_address instead of change_address to keep info about alignment.
> (emit_strmov): Remove.
> (emit_memmov): New function.
> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
> (expand_movmem_epilogue): Likewise and return updated rtx for
> destination.
> (expand_constant_movmem_prologue): Likewise and return updated rtx for
> destination and source.
> (decide_alignment): Refactor, handle vector_loop.
> (ix86_expand_movmem): Likewise.
> (ix86_expand_setmem): Likewise.
> * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
expr = inner;
}
}
+ else if (TREE_CODE (expr) == MEM_REF)
+ {
+ tree base = TREE_OPERAND (expr, 0);
+ tree byte_offset = TREE_OPERAND (expr, 1);
+ if (TREE_CODE (base) != ADDR_EXPR
+ || TREE_CODE (byte_offset) != INTEGER_CST)
+ return -1;
+ if (!DECL_P (TREE_OPERAND (base, 0))
+ || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)
Could you use TYPE_ALIGN here? In general, can't we replace all the GIMPLE
handling with get_object_alignment?
+ return -1;
+ offset += tree_low_cst (byte_offset, 1);
+ }
else
return -1;
This change ought to go in independently. I cannot review it.
I will take a first look over the patch shortly, but please send an updated patch
fixing the problem with the integer regs.
Honza