Ping 😊
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org>
> On Behalf Of Tamar Christina
> Sent: Tuesday, July 24, 2018 17:34
> To: Richard Biener <rguent...@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> i...@airs.com; amo...@gmail.com; berg...@vnet.ibm.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> slow_unaligned_access and no padding.
>
> Hi Richard,
>
> Thanks for the review!
>
> The 07/23/2018 18:46, Richard Biener wrote:
> > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> > <tamar.christ...@arm.com> wrote:
> > >Hi All,
> > >
> > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > >safe to do so, by calculating the bitsize per iteration as the
> > >largest copy allowed that does not read more than the number of bits
> > >left to copy.
> > >
> > >Strictly speaking, this copying is only done if:
> > >
> > > 1. the target supports fast unaligned access, and
> > > 2. no padding is being used.
> > >
> > >This should avoid the issues of the first patch (PR85123) but still
> > >work for targets where it is safe to do so.
> > >
> > >Original patch
> > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > >Previous respin
> > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > >
> > >
> > >This produces for the copying of a 3 byte structure:
> > >
> > >fun3:
> > >	adrp	x1, .LANCHOR0
> > >	add	x1, x1, :lo12:.LANCHOR0
> > >	mov	x0, 0
> > >	sub	sp, sp, #16
> > >	ldrh	w2, [x1, 16]
> > >	ldrb	w1, [x1, 18]
> > >	add	sp, sp, 16
> > >	bfi	x0, x2, 0, 16
> > >	bfi	x0, x1, 16, 8
> > >	ret
> > >
> > >whereas before it was producing
> > >
> > >fun3:
> > >	adrp	x0, .LANCHOR0
> > >	add	x2, x0, :lo12:.LANCHOR0
> > >	sub	sp, sp, #16
> > >	ldrh	w1, [x0, #:lo12:.LANCHOR0]
> > >	ldrb	w0, [x2, 2]
> > >	strh	w1, [sp, 8]
> > >	strb	w0, [sp, 10]
> > >	ldr	w0, [sp, 8]
> > >	add	sp, sp, 16
> > >	ret
> > >
> > >Cross compiled and regtested on
> > >  aarch64_be-none-elf
> > >  armeb-none-eabi
> > >and found no issues.
> > >
> > >Bootstrapped and regtested on
> > >  aarch64-none-linux-gnu
> > >  x86_64-pc-linux-gnu
> > >  powerpc64-unknown-linux-gnu
> > >  arm-none-linux-gnueabihf
> > >
> > >and found no issues.
> > >
> > >OK for trunk?
> >
> > How does this affect store-to-load forwarding when the source is
> > initialized piecewise? IMHO we should avoid larger loads but generate
> > larger stores when possible.
> >
> > How do non-x86 architectures behave with respect to STLF?
>
> I should have made it more explicit in my cover letter, but this only
> covers reg to reg copies, so store-to-load forwarding shouldn't really
> come into play here, unless I'm missing something.
>
> The example in my patch shows that the loads from mem are mostly
> unaffected.
>
> For x86 the change is also quite significant; e.g. for a 5 byte struct
> it used to generate
>
> fun5:
> 	movl	foo5(%rip), %eax
> 	movl	%eax, %edi
> 	movzbl	%al, %edx
> 	movzbl	%ah, %eax
> 	movb	%al, %dh
> 	movzbl	foo5+2(%rip), %eax
> 	shrl	$24, %edi
> 	salq	$16, %rax
> 	movq	%rax, %rsi
> 	movzbl	%dil, %eax
> 	salq	$24, %rax
> 	movq	%rax, %rcx
> 	movq	%rdx, %rax
> 	movzbl	foo5+4(%rip), %edx
> 	orq	%rsi, %rax
> 	salq	$32, %rdx
> 	orq	%rcx, %rax
> 	orq	%rdx, %rax
> 	ret
>
> instead of
>
> fun5:
> 	movzbl	foo5+4(%rip), %eax
> 	salq	$32, %rax
> 	movq	%rax, %rdx
> 	movl	foo5(%rip), %eax
> 	orq	%rdx, %rax
> 	ret
>
> so the loads themselves are unaffected.
>
> Thanks,
> Tamar
>
> > Richard.
> > >Thanks,
> > >Tamar
> > >
> > >gcc/
> > >2018-07-23  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > >	* expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
>
> --
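For readers following the thread: the copy-size selection described in the
quoted patch summary can be sketched in a few lines of standalone C. This
is a simplified illustration of the idea, not the actual expr.c code, and
the struct layouts (s3, s5) are guesses at padding-free testcases of the
kind that would produce the quoted fun3/fun5 assembly; the real testcase
is not included in this mail.

#include <stdio.h>

/* Sketch: pick the widest power-of-two chunk, in bits, that does not
   read past the bits still left to copy.  This mirrors the "maximum
   copy allowed" selection described above; it presumes the target has
   fast unaligned access and the value contains no padding.  */
static unsigned int
next_copy_bitsize (unsigned int bits_left, unsigned int max_bitsize)
{
  unsigned int bitsize = max_bitsize;   /* e.g. the word size, 64.  */
  while (bitsize > bits_left && bitsize > 8)
    bitsize /= 2;
  return bitsize;
}

/* Hypothetical testcases: 3- and 5-byte structs with no padding,
   returned by value, of the kind that exercises copy_blkmode_to_reg.  */
struct s3 { char a, b, c; };
struct s5 { char a[5]; };

int
main (void)
{
  /* For a 3-byte (24-bit) struct with a 64-bit word, the loop yields a
     16-bit copy followed by an 8-bit copy, matching the ldrh/ldrb pair
     in the new fun3 code quoted above.  */
  unsigned int left = 24;
  while (left > 0)
    {
      unsigned int step = next_copy_bitsize (left, 64);
      printf ("copy %u bits\n", step);
      left -= step;
    }
  return 0;
}

Running the same loop with left = 40 (the 5-byte case) gives a 32-bit copy
plus an 8-bit copy, which lines up with the movl/movzbl pair in the
improved x86 fun5 code quoted above.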