> 
> Bootstrap/make check/Spec2k are passing on i686 and x86_64.
Thanks for returning to this!

glibc has a quite comprehensive testsuite for string operations.  It may be
useful to test it with -minline-all-stringops -mstringop-strategy=vector_loop.

I tested the patch on my Core notebook and my memcpy microbenchmark.  The
vector loop is not a win, since apparently we do not produce any SSE code for
64-bit compilation.  What CPUs and block sizes is this intended for?
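
For reference, the microbenchmark is essentially of this shape (a simplified
sketch, not the exact harness; SIZE, ITERS and the alignment are placeholders):

/* memcpy microbenchmark sketch.  Compile e.g. with:
   gcc -O2 -march=native -minline-all-stringops \
       -mstringop-strategy=vector_loop bench.c  */
#include <string.h>
#include <stdio.h>

#define SIZE 8192
#define ITERS 1000000

static char src[SIZE] __attribute__ ((aligned (64)));
static char dst[SIZE] __attribute__ ((aligned (64)));

int
main (void)
{
  for (long i = 0; i < ITERS; i++)
    {
      memcpy (dst, src, SIZE);
      /* Compiler barrier so the copy is not hoisted out of the loop.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  printf ("%d\n", dst[0]);
  return 0;
}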

Also, the inner loop with -march=native seems to come out as:
.L7:
        movq    (%rsi,%r8), %rax
        movq    8(%rsi,%r8), %rdx
        movq    48(%rsi,%r8), %r9
        movq    56(%rsi,%r8), %r10
        movdqu  16(%rsi,%r8), %xmm3
        movdqu  32(%rsi,%r8), %xmm1
        movq    %rax, (%rdi,%r8)
        movq    %rdx, 8(%rdi,%r8)
        movdqa  %xmm3, 16(%rdi,%r8)
        movdqa  %xmm1, 32(%rdi,%r8)
        movq    %r9, 48(%rdi,%r8)
        movq    %r10, 56(%rdi,%r8)
        addq    $64, %r8
        cmpq    %r11, %r8

It is not that much of an SSE enablement, since the RA seems to home the
variables in integer registers.  Could you please look into it?
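
A minimal kernel along these lines should show the same kind of loop
(a sketch; copy_block and BLOCK are placeholder names/values):

/* Reproducer sketch for the partially-SSE loop above.  Compile with:
   gcc -O2 -march=native -mstringop-strategy=vector_loop -S reproducer.c  */
#include <string.h>

#define BLOCK 4096

void
copy_block (char *restrict dst, const char *restrict src)
{
  /* A known constant size lets the i386 backend pick the inline
     expansion strategy at compile time.  */
  memcpy (dst, src, BLOCK);
}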
> 
> Changelog entry:
> 
> 2013-04-10  Michael Zolotukhin  <michael.v.zolotuk...@gmail.com>
> 
>         * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
>         * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
>         adjust_address instead of change_address to keep info about alignment.
>         (emit_strmov): Remove.
>         (emit_memmov): New function.
>         (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
>         (expand_movmem_prologue): Likewise and return updated rtx for
>         destination.
>         (expand_constant_movmem_prologue): Likewise and return updated rtx for
>         destination and source.
>         (decide_alignment): Refactor, handle vector_loop.
>         (ix86_expand_movmem): Likewise.
>         (ix86_expand_setmem): Likewise.
>         * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
>         * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
          expr = inner;
        }
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      tree base = TREE_OPERAND (expr, 0);
+      tree byte_offset = TREE_OPERAND (expr, 1);
+      if (TREE_CODE (base) != ADDR_EXPR
+         || TREE_CODE (byte_offset) != INTEGER_CST)
+       return -1;
+      if (!DECL_P (TREE_OPERAND (base, 0))
+         || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

Can you use TYPE_ALIGN here?  In general, can't we replace all the GIMPLE
handling by get_object_alignment?

+       return -1;
+      offset += tree_low_cst (byte_offset, 1);
+    }
   else
     return -1;
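
To expand on the get_object_alignment question above, I am thinking of
something along these lines (an untested sketch; I am writing the
get_object_alignment_1 interface from memory, so please double-check the
exact contract):

/* Untested sketch: let get_object_alignment_1 walk the expression
   instead of hand-rolling the MEM_REF case.  It gives a known lower
   bound on the alignment of EXPR in bits plus the bit offset from
   that alignment (if I remember the interface correctly).  */
unsigned int expr_align;
unsigned HOST_WIDE_INT bitpos;

get_object_alignment_1 (expr, &expr_align, &bitpos);
if (expr_align < align || bitpos % BITS_PER_UNIT != 0)
  return -1;
offset += bitpos / BITS_PER_UNIT;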
 
This change ought to go in independently; I cannot review it.
I will make a first pass over the patch shortly, but please send an updated
patch fixing the problem with integer regs.

Honza
