On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
> Sorry for sending this again; I forgot to attach the patch.
>
> On 11 July 2011 23:50, Michael Zolotukhin
> <michael.v.zolotuk...@gmail.com> wrote:
>> The attached patch enables the use of vector instructions when expanding
>> memmove/memset.
>>
>> A new move-mode selection algorithm is implemented for move_by_pieces and
>> store_by_pieces.  The x86-specific ix86_expand_movmem and
>> ix86_expand_setmem are changed in a similar way, and the x86 cost-model
>> parameters are slightly adjusted to support this.  The implementation
>> checks whether the array's alignment is known at compile time and chooses
>> the expansion algorithm and move mode accordingly.
>>
>> Bootstrapped.  There are two new failures, caused by incorrect tests (see
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503).  The new
>> implementation gives a significant performance gain on memset/memcpy in
>> some cases.
>>
>> A number of new tests are added to verify the implementation.
>>
>> Is it OK for trunk?
>>
>> ChangeLog:
>>
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>>
>>	* config/i386/i386.h (processor_costs): Add second dimension to
>>	stringop_algs array.
>>	(clear_ratio): Tune value to improve performance.
>>	* config/i386/i386.c (cost models): Initialize second dimension of
>>	stringop_algs arrays.  Tune cost model in atom_cost, generic32_cost
>>	and generic64_cost.
>>	(ix86_expand_move): Add support for vector moves that use half of a
>>	vector register.
>>	(expand_set_or_movmem_via_loop_with_iter): New function.
>>	(expand_set_or_movmem_via_loop): Enable reuse of the same iters in
>>	different loops produced by this function.
>>	(emit_strset): New function.
>>	(promote_duplicated_reg): Add support for vector modes, add
>>	declaration.
>>	(promote_duplicated_reg_to_size): Likewise.
>>	(expand_movmem_epilogue): Add epilogue generation for bigger sizes.
>>	(expand_setmem_epilogue): Likewise.
>>	(expand_movmem_prologue): Likewise for prologue.
>>	(expand_setmem_prologue): Likewise.
>>	(expand_constant_movmem_prologue): Likewise.
>>	(expand_constant_setmem_prologue): Likewise.
>>	(decide_alg): Add new argument align_unknown.  Fix strategy
>>	selection when TARGET_INLINE_ALL_STRINGOPS is set.
>>	(decide_alignment): Update desired alignment according to chosen move
>>	mode.
>>	(ix86_expand_movmem): Change unrolled_loop strategy to use SSE moves.
>>	(ix86_expand_setmem): Likewise.
>>	(ix86_slow_unaligned_access): Implementation of new hook
>>	slow_unaligned_access.
>>	(ix86_promote_rtx_for_memset): Implementation of new hook
>>	promote_rtx_for_memset.
>>	* config/i386/sse.md (sse2_loadq): Add expand for sse2_loadq.
>>	(vec_dupv4si): Add expand for vec_dupv4si.
>>	(vec_dupv2di): Add expand for vec_dupv2di.
>>	* emit-rtl.c (adjust_address_1): Improve algorithm for determining
>>	alignment of address+offset.
>>	(get_mem_align_offset): Add handling of MEM_REFs.
>>	* expr.c (compute_align_by_offset): New function.
>>	(move_by_pieces_insn): New function.
>>	(widest_mode_for_unaligned_mov): New function.
>>	(widest_mode_for_aligned_mov): New function.
>>	(widest_int_mode_for_size): Change type of size from int to
>>	HOST_WIDE_INT.
>>	(set_by_pieces_1): New function (new algorithm for memset expansion).
>>	(set_by_pieces_2): New function.
>>	(generate_move_with_mode): New function for set_by_pieces.
>>	(alignment_for_piecewise_move): Use hook slow_unaligned_access instead
>>	of macro SLOW_UNALIGNED_ACCESS.
>>	(emit_group_load_1): Likewise.
>>	(emit_group_store): Likewise.
>>	(emit_push_insn): Likewise.
>>	(store_field): Likewise.
>>	(expand_expr_real_1): Likewise.
>>	(compute_aligned_cost): New function.
>>	(compute_unaligned_cost): New function.
>>	(vector_mode_for_mode): New function.
>>	(vector_extensions_used_for_mode): New function.
>>	(move_by_pieces): New algorithm for memmove expansion.
>>	(move_by_pieces_ninsns): Update according to changes in
>>	move_by_pieces.
>>	(move_by_pieces_1): Remove as unused.
>>	(store_by_pieces): New algorithm for memset expansion.
>>	(clear_by_pieces): Likewise.
>>	(store_by_pieces_1): Remove incorrect parameter attributes.
>>	* expr.h (compute_align_by_offset): Add declaration.
>>	* rtl.h (vector_extensions_used_for_mode): Add declaration.
>>	* builtins.c (expand_builtin_memset_args): Update according to changes
>>	in set_by_pieces.
>>	* target.def (DEFHOOK): Add hooks slow_unaligned_access and
>>	promote_rtx_for_memset.
>>	* targhooks.c (default_slow_unaligned_access): Add default hook
>>	implementation.
>>	(default_promote_rtx_for_memset): Likewise.
>>	* targhooks.h (default_slow_unaligned_access): Add prototype.
>>	(default_promote_rtx_for_memset): Likewise.
>>	* cse.c (cse_insn): Stop forward propagation of vector constants.
>>	* fwprop.c (forward_propagate_and_simplify): Likewise.
>>	* doc/tm.texi (SLOW_UNALIGNED_ACCESS): Remove documentation for deleted
>>	macro SLOW_UNALIGNED_ACCESS.
>>	(TARGET_SLOW_UNALIGNED_ACCESS): Add documentation for new hook.
>>	(TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>	* doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Likewise.
>>	(TARGET_SLOW_UNALIGNED_ACCESS): Likewise.
>>	(TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>>
>>	* testsuite/gcc.target/i386/memset-s64-a0-1.c: New testcase.
>>	* testsuite/gcc.target/i386/memset-s64-a0-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s16-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s16-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a0-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a0-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a0-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s3072-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s3072-a1-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s3072-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s3072-au-1.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-5.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s16-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s16-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-6.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a0-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a0-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a0-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s3072-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s3072-a1-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s3072-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s3072-au-2.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-7.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-8.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-5.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-6.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s16-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s16-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-9.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a0-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-au-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-au-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a0-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a0-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a1-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-au-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-au-3.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-10.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-11.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-7.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s768-a0-8.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s16-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s16-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a0-12.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a0-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s64-au-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s64-au-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a0-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a0-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memcpy-s512-a1-4.c: Ditto.
>>	* testsuite/gcc.target/i386/memset-s512-au-4.c: Ditto.
>>
>
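The patch description says the expansion picks its move mode from the compile-time alignment and the (constant) size, and the ChangeLog mentions promoting the memset value to a wider register and computing alignment by offset. The C sketch below is a hypothetical model of that idea only, not the patch's code: `replicate_byte`, `align_at_offset`, `pick_chunk` and `set_by_pieces_sketch` are invented names, the greedy widest-chunk policy is an assumption, and real expansion emits RTL moves rather than calling memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate VAL into every byte of a 64-bit word: the scalar analogue of
   promoting a memset value to a wider (or vector) register.  */
static uint64_t
replicate_byte (unsigned char val)
{
  return (uint64_t) val * 0x0101010101010101ULL;
}

/* Alignment (in bytes) still guaranteed at BASE + OFFSET when BASE is
   known to be ALIGN-aligned.  ALIGN must be a power of two.  */
static size_t
align_at_offset (size_t align, size_t offset)
{
  size_t low = offset & -offset;   /* lowest set bit of OFFSET */
  return (offset == 0 || low >= align) ? align : low;
}

/* Widest power-of-two chunk, at most MAX_CHUNK bytes, that fits in
   REMAINING bytes and, if unaligned accesses are slow, does not exceed
   the alignment KNOWN_ALIGN at the current offset.  */
static size_t
pick_chunk (size_t remaining, size_t known_align, size_t max_chunk,
            int slow_unaligned)
{
  size_t chunk = max_chunk;
  if (slow_unaligned && known_align < chunk)
    chunk = known_align;
  while (chunk > remaining)
    chunk /= 2;
  return chunk;
}

/* Hypothetical model of a piecewise memset with a compile-time SIZE and a
   known destination alignment ALIGN.  Each iteration "emits" one store of
   the chosen width; the widths are recorded in PLAN when it is non-NULL.
   Returns the number of stores.  */
static size_t
set_by_pieces_sketch (unsigned char *dst, unsigned char val, size_t size,
                      size_t align, size_t max_chunk, int slow_unaligned,
                      size_t *plan)
{
  unsigned char pattern[16];
  uint64_t word = replicate_byte (val);
  size_t offset = 0, n = 0;

  /* MAX_CHUNK and ALIGN must be nonzero powers of two.  */
  assert (max_chunk >= 1 && max_chunk <= sizeof pattern);
  assert ((max_chunk & (max_chunk - 1)) == 0);
  assert (align >= 1 && (align & (align - 1)) == 0);
  memcpy (pattern, &word, sizeof word);
  memcpy (pattern + sizeof word, &word, sizeof word);

  while (offset < size)
    {
      size_t chunk = pick_chunk (size - offset,
                                 align_at_offset (align, offset),
                                 max_chunk, slow_unaligned);
      memcpy (dst + offset, pattern, chunk);   /* one store of CHUNK bytes */
      if (plan)
        plan[n] = chunk;
      n++;
      offset += chunk;
    }
  return n;
}
```

Under these assumptions, a 23-byte memset with 16-byte known alignment, 16-byte maximum moves and slow unaligned access becomes stores of 16, 4, 2 and 1 bytes. With unknown alignment (ALIGN = 1) the model falls back to single-byte stores, unless unaligned access is cheap, in which case an 8-byte set is a single store.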

Please don't use -m32/-m64 in testcases directly.  You should use

/* { dg-do compile { target { ! ia32 } } } */

for 64-bit insns and

/* { dg-do compile { target { ia32 } } } */

for 32-bit insns.

--
H.J.
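As an illustration of the review comment, here is a hypothetical compile test (not one of the patch's actual files; `dst` and `do_memset` are invented names). The effective-target keyword `ia32` is true only for 32-bit x86 code generation, so `{ target { ! ia32 } }` keeps a test that checks 64-bit instructions from running in a 32-bit configuration. The test is then skipped rather than forced into another mode with `-m64`.

```c
/* Hypothetical 64-bit-only variant of a memset compile test.  Instead of
   passing -m64 in dg-options, the test is restricted to targets that are
   not 32-bit x86; a 32-bit variant would use
   { dg-do compile { target { ia32 } } } instead.  */
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2" } */

#include <string.h>

char dst[64];

void
do_memset (void)
{
  /* Constant size on a global array whose alignment is known at compile
     time: the kind of call the new expansion targets.  */
  memset (dst, 0, sizeof dst);
}

/* Any dg-final scan-assembler checks for the expected move instructions
   would follow here; the exact patterns depend on the tuning selected.  */
```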