Sudakshina Das writes:
> Apologies for the delay. I have attached another version of the patch.
> I have disabled the test cases for ILP32. This is only because the function body
> check fails: there is an additional unsigned extension instruction for the src
> pointer in every test (uxtw x0, w0). The actual inlining is not different.
Yeah, agree that's the best way of handling the ILP32 difference.
> […]
> +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant. Without
> + -mstrict-align, make decisions in "setmem". Otherwise follow a sensible
> + default: when optimizing for size adjust the ratio to account for the
nit: should just be one space after “:”
> […]
> @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
> return true;
> }
>
> +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> + *src is a register we have created with the duplicated value to be set.  */
“*src” -> SRC, since there's no dereference now.
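For readers less familiar with the memset expansion: SRC holds the fill byte replicated across the whole register, so a block of any supported size can be stored straight from it. A minimal scalar sketch of that idea follows; the helper names (replicate_byte, set_one_block) are hypothetical, and a 64-bit integer stands in for whatever register the patch actually creates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate BYTE into every byte of a 64-bit value.
   This is a scalar stand-in for the register holding the duplicated
   value that the patch passes as SRC.  */
static uint64_t
replicate_byte (uint8_t byte)
{
  return (uint64_t) byte * UINT64_C (0x0101010101010101);
}

/* Hypothetical helper: store SIZE bytes of VAL at DST and return DST
   advanced past them, loosely mirroring "set one block and progress
   pointer".  Because every byte of VAL is the same, taking the first
   SIZE bytes is correct on either endianness.  */
static unsigned char *
set_one_block (unsigned char *dst, uint64_t val, size_t size)
{
  assert (size == 1 || size == 2 || size == 4 || size == 8);
  memcpy (dst, &val, size);
  return dst + size;
}
```

Setting 15 bytes with this helper alone takes four stores (8 + 4 + 2 + 1); the overlapping trailing-store logic in the patch further down is what brings that down to two.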
> […]
> + /* In case we are optimizing for size or if the core does not
> + want to use STP Q regs, lower the max_set_size. */
> + max_set_size = (!speed_p
> + || (aarch64_tune_params.extra_tuning_flags
> + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> + ? max_set_size/2 : max_set_size;
Formatting nit: should be a space either side of “/”.
> + while (n > 0)
> + {
> + /* Find the largest mode in which to do the copy in without
> + over writing. */
s/in without/without/
> + opt_scalar_int_mode mode_iter;
> + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> + cur_mode = mode_iter.require ();
> +
> + gcc_assert (cur_mode != BLKmode);
> +
> + mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> + aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
> +
> + n -= mode_bits;
> +
> + /* Do certain trailing copies as overlapping if it's going to be
> + cheaper. i.e. less instructions to do so. For instance doing a 15
> + byte copy it's more efficient to do two overlapping 8 byte copies than
> + 8 + 4 + 2 + 1. */
> + if (n > 0 && n < copy_limit / 2)
> + {
> + next_mode = smallest_mode_for_size (n, MODE_INT);
> + int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
Sorry for the runaround, but looking at this again, I'm a bit worried
that we only indirectly test that n_bits is within the length of the
original set. I guess it is, because if n < copy_limit / 2 then
n < mode_bits, and so n_bits will never exceed mode_bits. I think
it might be worth adding an assert to make that “clearer” (maybe
only to me, probably obvious to everyone else):
gcc_assert (n_bits <= mode_bits);
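A standalone model of the loop's size arithmetic makes the argument easy to check exhaustively with the proposed assert in place. This is not GCC code: the list of integer mode sizes (8 to 256 bits), the copy_limit values and the helper names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed model: integer mode sizes in bits, smallest first, standing in
   for what FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT) visits.  */
static const int int_mode_bits[] = { 8, 16, 32, 64, 128, 256 };
#define NUM_MODES (sizeof int_mode_bits / sizeof int_mode_bits[0])

/* Model of smallest_mode_for_size: the first mode with at least N bits.  */
static int
smallest_mode_bits_for_size (int n)
{
  for (size_t i = 0; i < NUM_MODES; i++)
    if (int_mode_bits[i] >= n)
      return int_mode_bits[i];
  assert (0 && "no mode large enough");
  return -1;
}

/* Simulate the stores for a set of LEN_BYTES bytes, using at most
   COPY_LIMIT bits per store.  Returns the number of stores; if VERBOSE,
   prints each store as "offset:size" in bytes.  */
static int
simulate_setmem (int len_bytes, int copy_limit, int verbose)
{
  int n = len_bytes * 8;   /* Remaining bits, as in the patch.  */
  int offset = 0;          /* Byte offset of DST from the start.  */
  int stores = 0;

  while (n > 0)
    {
      /* Largest mode that fits without overwriting.  */
      int limit = n < copy_limit ? n : copy_limit;
      int mode_bits = 0;
      for (size_t i = 0; i < NUM_MODES; i++)
        if (int_mode_bits[i] <= limit)
          mode_bits = int_mode_bits[i];
      assert (mode_bits != 0);

      int start = offset;
      if (verbose)
        printf ("%d:%d ", start, mode_bits / 8);
      offset += mode_bits / 8;
      stores++;
      n -= mode_bits;

      /* Overlapping trailing store: move DST back so that a single
         store of n_bits covers the remaining N bits.  */
      if (n > 0 && n < copy_limit / 2)
        {
          int n_bits = smallest_mode_bits_for_size (n);
          assert (n_bits <= mode_bits);   /* The suggested gcc_assert.  */
          offset += (n - n_bits) / 8;
          /* Given the assert above, the overlapping store starts no
             earlier than the store just emitted.  */
          assert (offset >= start);
          n = n_bits;
        }
    }
  if (verbose)
    printf ("\n");
  /* The last store ends exactly at the end of the original set.  */
  assert (offset == len_bytes);
  return stores;
}
```

For example, simulate_setmem (15, 256, 1) prints "0:8 7:8", i.e. the two overlapping 8-byte stores described in the patch comment, and looping it over a range of lengths exercises the assert for every remainder.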
OK with those changes, thanks.
Richard
> + dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> + n = n_bits;
> + }
> + }
> +
> + return true;
> +}
> +
> +
> /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> SImode stores. Handle the case when the constant has identical
> bottom and top halves. This is beneficial when the two stores can be