On 07/13/16 11:14, Kyrill Tkachov wrote:
Hi all,
The most common way to load and store a TImode value on aarch64 is to
perform an LDP/STP of two X-registers.
This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it
validates the offset in an address used
to load a TImode value. It passes TImode down to the
aarch64_offset_7bit_signed_scaled_p check, which rejects
offsets that are not a multiple of the mode size of TImode (16).
However, this is too conservative: X-reg LDP/STP
instructions accept immediate offsets that are a multiple of 8.
Also, considering that the definition of
aarch64_offset_7bit_signed_scaled_p is:

  return (offset >= -64 * GET_MODE_SIZE (mode)
          && offset < 64 * GET_MODE_SIZE (mode)
          && offset % GET_MODE_SIZE (mode) == 0);
I think the range check may even be wrong for TImode, as this will
accept offsets in the range [-1024, 1024)
(as long as they are a multiple of 16),
whereas X-reg LDP/STP instructions only accept offsets in the range
[-512, 512).
So, since the check is for an X-reg LDP/STP address, we should be
passing down DImode.
This patch does that and enables more aggressive generation of REG+IMM
addressing modes for 64-bit-aligned
TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
bar:
        add     x1, x1, 8
        add     x0, x0, 8
        ldp     x2, x3, [x1]
        stp     x2, x3, [x0]
        ret
whereas with this patch we generate:
bar:
        ldp     x2, x3, [x1, 8]
        stp     x2, x3, [x0, 8]
        ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
LGTM
--
Evandro Menezes