On Fri, Aug 1, 2014 at 2:21 AM, <[email protected]> wrote:
>
>
>> On Jun 6, 2014, at 1:50 AM, James Greenhalgh <[email protected]>
>> wrote:
>>
>>
>> Hi,
>>
>> The move_by_pieces infrastructure performs a copy by repeatedly trying
>> the largest safe copy it can make. So for a 15-byte copy we might see:
>>
>>  offset   amount   bytes copied
>>  0        8        0-7
>>  8        4        8-11
>>  12       2        12-13
>>  14       1        14
>>
>> However, we can implement a 15-byte copy as so:
>>
>>  offset   amount   bytes copied
>>  0        8        0-7
>>  7        8        7-14
>>
>> Which can prove more efficient for both space and speed.
>>
>> In this patch we set MOVE_RATIO low to avoid using move_by_pieces, and
>> implement the movmem pattern name to expand small block copy cases. Note
>> that this optimization does not apply to -mstrict-align targets, which
>> must continue copying byte-by-byte.
>
> Why not change move_by_pieces instead of having target-specific code? That
> seems like a better option. You can check the slow-unaligned-access target
> macro to see whether you want to do this optimization too. As I mentioned in
> the other email, make sure you check the volatility of the from and to
> operands before doing this optimization.
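To make the overlapping scheme quoted above concrete, here is a hand-written
C sketch of the same idea (illustration only, not part of the patch; the
function name is made up, and it assumes unaligned 8-byte accesses are cheap):

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Copy N bytes, 8 <= N <= 16, with two 8-byte moves whose middle bytes
   may overlap, instead of an 8/4/2/1 sequence.  */
static void
copy_8_to_16 (void *to, const void *from, size_t n)
{
  uint64_t head, tail;
  memcpy (&head, from, 8);                         /* bytes 0 .. 7      */
  memcpy (&tail, (const char *) from + n - 8, 8);  /* bytes n-8 .. n-1  */
  memcpy (to, &head, 8);
  memcpy ((char *) to + n - 8, &tail, 8);
}

For n == 15 this performs exactly the two moves in the second table above
(offsets 0 and 7).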
Attached is the patch which does what I mentioned. I also changed
store_by_pieces to implement a similar optimization there (for memset
and strcpy). Also, since I used SLOW_UNALIGNED_ACCESS, this is a
generic optimization.
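For reference, the kind of source this affects looks roughly like the
following (my own example, not from the patch; whether the by-pieces paths
are used depends on the target's MOVE_RATIO/SET_RATIO and alignment
settings):

struct fifteen { char buf[15]; };

void
copy_it (struct fifteen *d, const struct fifteen *s)
{
  /* 15-byte block move; can go through move_by_pieces.  */
  *d = *s;
}

void
clear_it (struct fifteen *d)
{
  /* 15-byte clear; can go through the clear/store_by_pieces path.  */
  __builtin_memset (d, 0, sizeof *d);
}

On targets where unaligned accesses are not slow, the 15-byte tail is now
handled with one extra overlapping 8-byte access instead of separate 4-,
2- and 1-byte pieces.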
I tested an earlier version on x86_64-linux-gnu and am in the middle of
bootstrapping and testing this one.
Thanks,
Andrew Pinski
* expr.c (move_by_pieces): Take the minimum of max_size and len to
speed things up and to take advantage of the mode in move_by_pieces_1.
(move_by_pieces_1): Read/write the leftover bytes using overlapping
memory locations to reduce the number of reads/writes.
(store_by_pieces_1): Take the minimum of max_size and len to speed
things up and to take advantage of the mode in store_by_pieces_2.
(store_by_pieces_2): Write the leftover bytes using overlapping
memory locations to reduce the number of writes.
>
> Thanks,
> Andrew
>
>
>>
>> Setting MOVE_RATIO low in this way causes two tests to begin failing;
>> both are documented in their test cases as expected to fail for low
>> MOVE_RATIO targets, which do not allow certain tree-level optimizations.
>>
>> Bootstrapped on aarch64-unknown-linux-gnu with no issues.
>>
>> OK for trunk?
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2014-06-06 James Greenhalgh <[email protected]>
>>
>> * config/aarch64/aarch64-protos.h (aarch64_expand_movmem): New.
>> * config/aarch64/aarch64.c (aarch64_move_pointer): New.
>> (aarch64_progress_pointer): Likewise.
>> (aarch64_copy_one_part_and_move_pointers): Likewise.
>> (aarch64_expand_movmem): Likewise.
>> * config/aarch64/aarch64.h (MOVE_RATIO): Set low.
>> * config/aarch64/aarch64.md (movmem<mode>): New.
>>
>> gcc/testsuite/
>>
>> 2014-06-06 James Greenhalgh <[email protected]>
>>
>> * gcc.dg/tree-ssa/pr42585.c: Skip for AArch64.
>> * gcc.dg/tree-ssa/sra-12.c: Likewise.
>> <0001-AArch64-Implement-movmem-for-the-benefit-of-inline-m.patch>
Index: expr.c
===================================================================
--- expr.c (revision 213306)
+++ expr.c (working copy)
@@ -876,6 +876,9 @@ move_by_pieces (rtx to, rtx from, unsign
if (data.reverse) data.offset = len;
data.len = len;
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (len + 1) ? (len + 1) : max_size;
+
/* If copying requires more than two move insns,
copy addresses to registers (to make displacements shorter)
and use post-increment if available. */
@@ -1073,6 +1076,32 @@ move_by_pieces_1 (insn_gen_fn genfun, ma
data->len -= size;
}
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit the move.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      /* Not a stack push.  */
+      && data->to
+      /* Neither side is volatile memory.  */
+      && !MEM_VOLATILE_P (data->to)
+      && !MEM_VOLATILE_P (data->from)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      /* No incrementing of the to or from.  */
+      && data->explicit_inc_to == 0
+      && data->explicit_inc_from == 0
+      /* No auto-incrementing of the to or from.  */
+      && !data->autinc_to
+      && !data->autinc_from
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      from1 = adjust_address (data->from, mode, offset);
+      emit_insn ((*genfun) (to1, from1));
+      data->len = 0;
+    }
}
/* Emit code to move a block Y to a block X. This may be done with
@@ -2636,6 +2665,9 @@ store_by_pieces_1 (struct store_by_piece
if (data->reverse)
data->offset = data->len;
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (data->len + 1) ? (data->len + 1) : max_size;
+
/* If storing requires more than two move insns,
copy addresses to registers (to make displacements shorter)
and use post-increment if available. */
@@ -2733,6 +2765,24 @@ store_by_pieces_2 (insn_gen_fn genfun, m
data->len -= size;
}
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit that constant.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      && !MEM_VOLATILE_P (data->to)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      && data->explicit_inc_to == 0
+      && !data->autinc_to
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      cst = (*data->constfun) (data->constfundata, offset, mode);
+      emit_insn ((*genfun) (to1, cst));
+      data->len = 0;
+    }
}
/* Write zeros through the storage of OBJECT. If OBJECT has BLKmode, SIZE is
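
To spell out the new leftover check with concrete numbers (my reading of
the patch, not text from it): for a 15-byte copy using 8-byte word moves,
the main loop emits one move and leaves data->len == 7 with size == 8, so
ceil_log2 (7) == 3 == exact_log2 (8) and the condition holds; the code
then backs the offset up by size - data->len == 1 and emits one more
8-byte move:

  /* 15-byte copy, 8-byte mode (size == 8):
       first move:  offset 0, covers bytes 0-7, data->len becomes 7
       leftover:    ceil_log2 (7) == exact_log2 (8) == 3, so overlap is OK
       back up:     offset = 8 - (8 - 7) = 7
       final move:  offset 7, covers bytes 7-14, data->len becomes 0  */

This matches the two-move sequence for the 15-byte example in James'
original mail.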