Ping 😊
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org>
> On Behalf Of Tamar Christina
> Sent: Tuesday, July 24, 2018 17:34
> To: Richard Biener <rguent...@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> i...@airs.com; amo...@gmail.com; berg...@vnet.ibm.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> slow_unaligned_access and no padding.
>
> Hi Richard,
>
> Thanks for the review!
>
> The 07/23/2018 18:46, Richard Biener wrote:
> > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> > <tamar.christ...@arm.com> wrote:
> > >Hi All,
> > >
> > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > >safe to do so, by calculating the bitsize per iteration as the
> > >largest copy allowed that does not read more than the number of bits
> > >left to copy.
> > >
> > >Strictly speaking, this copying is only done if:
> > >
> > > 1. the target supports fast unaligned access, and
> > > 2. no padding is being used.
> > >
> > >This should avoid the issues of the first patch (PR85123) but still
> > >work for targets where it is safe to do so.
> > >
> > >Original patch
> > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > >Previous respin
> > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > >
> > >
> > >This produces for the copying of a 3 byte structure:
> > >
> > >fun3:
> > >	adrp	x1, .LANCHOR0
> > >	add	x1, x1, :lo12:.LANCHOR0
> > >	mov	x0, 0
> > >	sub	sp, sp, #16
> > >	ldrh	w2, [x1, 16]
> > >	ldrb	w1, [x1, 18]
> > >	add	sp, sp, 16
> > >	bfi	x0, x2, 0, 16
> > >	bfi	x0, x1, 16, 8
> > >	ret
> > >
> > >whereas before it was producing
> > >
> > >fun3:
> > >	adrp	x0, .LANCHOR0
> > >	add	x2, x0, :lo12:.LANCHOR0
> > >	sub	sp, sp, #16
> > >	ldrh	w1, [x0, #:lo12:.LANCHOR0]
> > >	ldrb	w0, [x2, 2]
> > >	strh	w1, [sp, 8]
> > >	strb	w0, [sp, 10]
> > >	ldr	w0, [sp, 8]
> > >	add	sp, sp, 16
> > >	ret
> > >
> > >Cross compiled and regtested on
> > >  aarch64_be-none-elf
> > >  armeb-none-eabi
> > >and found no issues.
> > >
> > >Bootstrapped and regtested on
> > >  aarch64-none-linux-gnu
> > >  x86_64-pc-linux-gnu
> > >  powerpc64-unknown-linux-gnu
> > >  arm-none-linux-gnueabihf
> > >
> > >and found no issues.
> > >
> > >OK for trunk?
> >
> > How does this affect store-to-load forwarding when the source is
> > initialized piecewise? IMHO we should avoid larger loads but generate
> > larger stores when possible.
> >
> > How do non-x86 architectures behave with respect to STLF?
>
> I should have made it more explicit in my cover letter, but this only
> covers reg to reg copies, so store-to-load forwarding shouldn't really
> come into play here, unless I'm missing something.
>
> The example in my patch shows that the loads from mem are mostly
> unaffected.
>
> For x86 the change is also quite significant; e.g. for a 5 byte struct
> it used to generate
>
> fun5:
> 	movl	foo5(%rip), %eax
> 	movl	%eax, %edi
> 	movzbl	%al, %edx
> 	movzbl	%ah, %eax
> 	movb	%al, %dh
> 	movzbl	foo5+2(%rip), %eax
> 	shrl	$24, %edi
> 	salq	$16, %rax
> 	movq	%rax, %rsi
> 	movzbl	%dil, %eax
> 	salq	$24, %rax
> 	movq	%rax, %rcx
> 	movq	%rdx, %rax
> 	movzbl	foo5+4(%rip), %edx
> 	orq	%rsi, %rax
> 	salq	$32, %rdx
> 	orq	%rcx, %rax
> 	orq	%rdx, %rax
> 	ret
>
> instead of
>
> fun5:
> 	movzbl	foo5+4(%rip), %eax
> 	salq	$32, %rax
> 	movq	%rax, %rdx
> 	movl	foo5(%rip), %eax
> 	orq	%rdx, %rax
> 	ret
>
> so the loads themselves are unaffected.
>
> Thanks,
> Tamar
>
> > Richard.
> > >Thanks,
> > >Tamar
> > >
> > >gcc/
> > >2018-07-23  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > >	* expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
>
> --
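For readers following the thread: the copy-size selection described in the
quoted patch summary can be sketched in a few lines of standalone C. This
is a simplified illustration of the idea, not the actual expr.c code, and
the struct layouts (s3, s5) are guesses at padding-free testcases of the
kind that would produce the quoted fun3/fun5 assembly; the real testcase
is not included in this mail.

#include <stdio.h>

/* Sketch: pick the widest power-of-two chunk, in bits, that does not
   read past the bits still left to copy.  This mirrors the "maximum
   copy allowed" selection described above; it presumes the target has
   fast unaligned access and the value contains no padding.  */
static unsigned int
next_copy_bitsize (unsigned int bits_left, unsigned int max_bitsize)
{
  unsigned int bitsize = max_bitsize;   /* e.g. the word size, 64.  */
  while (bitsize > bits_left && bitsize > 8)
    bitsize /= 2;
  return bitsize;
}

/* Hypothetical testcases: 3- and 5-byte structs with no padding,
   returned by value, of the kind that exercises copy_blkmode_to_reg.  */
struct s3 { char a, b, c; };
struct s5 { char a[5]; };

int
main (void)
{
  /* For a 3-byte (24-bit) struct with a 64-bit word, the loop yields a
     16-bit copy followed by an 8-bit copy, matching the ldrh/ldrb pair
     in the new fun3 code quoted above.  */
  unsigned int left = 24;
  while (left > 0)
    {
      unsigned int step = next_copy_bitsize (left, 64);
      printf ("copy %u bits\n", step);
      left -= step;
    }
  return 0;
}

Running the same loop with left = 40 (the 5-byte case) gives a 32-bit copy
plus an 8-bit copy, which lines up with the movl/movzbl pair in the
improved x86 fun5 code quoted above.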