On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
> Sorry for sending this again; I forgot to attach the patch.
>
> On 11 July 2011 23:50, Michael Zolotukhin
> <michael.v.zolotuk...@gmail.com> wrote:
>> The attached patch enables the use of vector instructions when expanding
>> memmove/memset.
>>
>> A new algorithm for move-mode selection is implemented for move_by_pieces
>> and store_by_pieces. The x86-specific ix86_expand_movmem and
>> ix86_expand_setmem are changed in a similar way, and the x86 cost-model
>> parameters are slightly adjusted to support this. The implementation checks
>> whether the array's alignment is known at compile time and chooses the
>> expansion algorithm and move mode accordingly.
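A minimal sketch of the idea in the paragraph above, assuming a simple greedy strategy: split the block into 16-byte (one SSE register), 8-, 4-, 2- and 1-byte pieces, widest first, and skip widths that exceed the alignment known at compile time when misaligned access is slow. The function name plan_moves, the width table and the unaligned_is_slow flag are illustrative assumptions, not the patch's code; the real move_by_pieces works on machine modes, and the ChangeLog also lists cost functions (compute_aligned_cost, compute_unaligned_cost) that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not GCC's move_by_pieces: plan how to copy or set
   SIZE bytes using pieces of 16, 8, 4, 2 and 1 bytes, widest first.
   ALIGN is the alignment in bytes known at compile time (0 = unknown).
   If UNALIGNED_IS_SLOW is nonzero, pieces wider than the known alignment
   are avoided.  Writes the chosen widths to PIECES (which needs room for
   up to SIZE entries) and returns how many were written.  */
static size_t
plan_moves (size_t size, size_t align, int unaligned_is_slow,
            size_t pieces[], size_t max_pieces)
{
  static const size_t widths[] = { 16, 8, 4, 2, 1 };
  size_t n = 0;

  /* A known alignment must be a power of two.  */
  assert (align == 0 || (align & (align - 1)) == 0);

  for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++)
    {
      size_t w = widths[i];

      /* Byte moves are always allowed; wider ones need enough known
         alignment unless misaligned access is cheap.  Going widest-first
         keeps every offset a multiple of the current width, so an
         aligned start stays aligned for all later pieces.  */
      if (w > 1 && unaligned_is_slow && align < w)
        continue;

      while (size >= w)
        {
          assert (n < max_pieces);
          pieces[n++] = w;
          size -= w;
        }
    }
  return n;
}
```

For example, 27 bytes at a known 16-byte alignment become pieces of 16, 8, 2 and 1 bytes. With unknown alignment on a target where misaligned access is slow, the same call degrades to 27 byte moves, which suggests why the patch also adds alignment prologue and epilogue generation for bigger sizes.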
>>
>> Bootstrapped; there are two new failures, caused by incorrect tests (see
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503). The new implementation
>> gives quite a big performance gain on memset/memcpy in some cases.
>>
>> A bunch of new tests are added to verify the implementation.
>>
>> Is it ok for trunk?
>>
>> Changelog:
>>
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>>
>>     * config/i386/i386.h (processor_costs): Add second dimension to
>>     stringop_algs array.
>>     (clear_ratio): Tune value to improve performance.
>>     * config/i386/i386.c (cost models): Initialize second dimension of
>>     stringop_algs arrays.  Tune cost model in atom_cost, generic32_cost
>>     and generic64_cost.
>>     (ix86_expand_move): Add support for vector moves that use half of a
>>     vector register.
>>     (expand_set_or_movmem_via_loop_with_iter): New function.
>>     (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
>>     different loops produced by this function.
>>     (emit_strset): New function.
>>     (promote_duplicated_reg): Add support for vector modes, add
>>     declaration.
>>     (promote_duplicated_reg_to_size): Likewise.
>>     (expand_movmem_epilogue): Add epilogue generation for bigger sizes.
>>     (expand_setmem_epilogue): Likewise.
>>     (expand_movmem_prologue): Likewise for prologue.
>>     (expand_setmem_prologue): Likewise.
>>     (expand_constant_movmem_prologue): Likewise.
>>     (expand_constant_setmem_prologue): Likewise.
>>     (decide_alg): Add new argument align_unknown.  Fix algorithm of
>>     strategy selection if TARGET_INLINE_ALL_STRINGOPS is set.
>>     (decide_alignment): Update desired alignment according to chosen move
>>     mode.
>>     (ix86_expand_movmem): Change unrolled_loop strategy to use SSE-moves.
>>     (ix86_expand_setmem): Likewise.
>>     (ix86_slow_unaligned_access): Implementation of new hook
>>     slow_unaligned_access.
>>     (ix86_promote_rtx_for_memset): Implementation of new hook
>>     promote_rtx_for_memset.
>>     * config/i386/sse.md (sse2_loadq): Add expand for sse2_loadq.
>>     (vec_dupv4si): Add expand for vec_dupv4si.
>>     (vec_dupv2di): Add expand for vec_dupv2di.
>>     * emit-rtl.c (adjust_address_1): Improve algorithm for determining
>>     alignment of address+offset.
>>     (get_mem_align_offset): Add handling of MEM_REFs.
>>     * expr.c (compute_align_by_offset): New function.
>>     (move_by_pieces_insn): New function.
>>     (widest_mode_for_unaligned_mov): New function.
>>     (widest_mode_for_aligned_mov): New function.
>>     (widest_int_mode_for_size): Change type of size from int to
>>     HOST_WIDE_INT.
>>     (set_by_pieces_1): New function (new algorithm of memset expanding).
>>     (set_by_pieces_2): New function.
>>     (generate_move_with_mode): New function for set_by_pieces.
>>     (alignment_for_piecewise_move): Use hook slow_unaligned_access instead
>>     of macro SLOW_UNALIGNED_ACCESS.
>>     (emit_group_load_1): Likewise.
>>     (emit_group_store): Likewise.
>>     (emit_push_insn): Likewise.
>>     (store_field): Likewise.
>>     (expand_expr_real_1): Likewise.
>>     (compute_aligned_cost): New function.
>>     (compute_unaligned_cost): New function.
>>     (vector_mode_for_mode): New function.
>>     (vector_extensions_used_for_mode): New function.
>>     (move_by_pieces): New algorithm of memmove expanding.
>>     (move_by_pieces_ninsns): Update according to changes in
>>     move_by_pieces.
>>     (move_by_pieces_1): Remove as unused.
>>     (store_by_pieces): New algorithm for memset expanding.
>>     (clear_by_pieces): Likewise.
>>     (store_by_pieces_1): Remove incorrect parameters' attributes.
>>     * expr.h (compute_align_by_offset): Add declaration.
>>     * rtl.h (vector_extensions_used_for_mode): Add declaration.
>>     * builtins.c (expand_builtin_memset_args): Update according to changes
>>     in set_by_pieces.
>>     * target.def (DEFHOOK): Add hooks slow_unaligned_access and
>>     promote_rtx_for_memset.
>>     * targhooks.c (default_slow_unaligned_access): Add default hook
>>     implementation.
>>     (default_promote_rtx_for_memset): Likewise.
>>     * targhooks.h (default_slow_unaligned_access): Add prototype.
>>     (default_promote_rtx_for_memset): Likewise.
>>     * cse.c (cse_insn): Stop forward propagation of vector constants.
>>     * fwprop.c (forward_propagate_and_simplify): Likewise.
>>     * doc/tm.texi (SLOW_UNALIGNED_ACCESS): Remove documentation for deleted
>>     macro SLOW_UNALIGNED_ACCESS.
>>     (TARGET_SLOW_UNALIGNED_ACCESS): Add documentation on new hook.
>>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>     * doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Likewise.
>>     (TARGET_SLOW_UNALIGNED_ACCESS): Likewise.
>>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>
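The emit-rtl.c and expr.c entries above (adjust_address_1, compute_align_by_offset) concern how much alignment survives when a constant offset is added to an address whose alignment is known. The usual rule is sketched below; the helper name and the use of byte units are assumptions for illustration, not the patch's actual code.

```c
#include <stddef.h>

/* Hypothetical helper (not the patch's compute_align_by_offset).
   BASE_ALIGN: power-of-two alignment, in bytes, known for the base address.
   OFFSET: constant byte offset added to it.
   Returns the alignment still guaranteed for BASE + OFFSET.  */
static size_t
align_of_base_plus_offset (size_t base_align, size_t offset)
{
  size_t offset_align;

  /* A zero offset does not change the alignment.  */
  if (offset == 0)
    return base_align;

  /* The largest power of two dividing OFFSET is its lowest set bit;
     unsigned negation makes OFFSET & -OFFSET isolate it.  */
  offset_align = offset & -offset;

  /* The sum is only guaranteed to be a multiple of the smaller of the
     two power-of-two alignments.  */
  return offset_align < base_align ? offset_align : base_align;
}
```

For instance, a 16-byte-aligned base plus an offset of 24 is only guaranteed 8-byte alignment, which limits which wide moves can be used at that point without a misaligned access.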
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>>
>>     * testsuite/gcc.target/i386/memset-s64-a0-1.c: New testcase.
>>     * testsuite/gcc.target/i386/memset-s64-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-5.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-6.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-7.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-8.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-5.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-6.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-9.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-10.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-11.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-7.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-8.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-12.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-4.c: Ditto.
>>
>

Please don't use -m32/-m64 in testcases directly.
You should use

/* { dg-do compile { target { ! ia32 } } } */

for 64bit insns and

/* { dg-do compile { target { ia32 } } } */

for 32bit insns.
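A hedged skeleton of such a testcase: the file contents, the dg-options value and the names buf and clear_buf are assumptions, not one of the submitted tests. It only shows selecting the 64-bit run through the effective target instead of putting -m64 in dg-options; a 32-bit variant would use { target { ia32 } } instead.

```c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2" } */

#include <string.h>

/* Fixed-size buffer, so the memset length is a compile-time constant
   that the compiler may expand inline.  */
char buf[64];

void
clear_buf (void)
{
  memset (buf, 0, sizeof buf);
}
```

A real test would add dg-final scan-assembler checks for the instructions the new expansion is expected to emit; those patterns are not shown here.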


-- 
H.J.
