On Tue, Jul 13, 2021 at 2:59 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Tue, Jul 13, 2021 at 2:19 PM Christoph Müllner via Gcc
> <gcc@gcc.gnu.org> wrote:
> >
> > On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <ol...@adacore.com> wrote:
> > >
> > > On Jul 12, 2021, Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
> > >
> > > > * Why does the generic by-pieces infrastructure have a higher priority
> > > > than the target-specific expansion via INSNs like setmem?
> > >
> > > by-pieces was not affected by the recent change, and IMHO it generally
> > > makes sense for it to have priority over setmem.  It generates only
> > > straight-line code for constant-sized blocks.  Even if you can beat that
> > > with some machine-specific logic, you'll probably end up generating
> > > equivalent code at least in some cases, and then, you probably want to
> > > carefully tune the settings that select one or the other, or disable
> > > by-pieces altogether.
> > >
> > >
> > > by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> > > looping constructs, if any are available, so setmem takes precedence.
> > >
> > > My testing involved bringing it ahead of the insns, to exercise the code
> > > more thoroughly even on x86*, but the submitted patch only used
> > > by-multiple-pieces as a fallback.
> >
> > Let me give you an example of what by-pieces does on RISC-V (RV64GC).
> > The following code...
> >
> > void* do_memset0_8 (void *p)
> > {
> >     return memset (p, 0, 8);
> > }
> >
> > void* do_memset0_15 (void *p)
> > {
> >     return memset (p, 0, 15);
> > }
> >
> > ...becomes (you can verify this with Compiler Explorer):
> >
> > do_memset0_8(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         ret
> > do_memset0_15(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         sb      zero,8(a0)
> >         sb      zero,9(a0)
> >         sb      zero,10(a0)
> >         sb      zero,11(a0)
> >         sb      zero,12(a0)
> >         sb      zero,13(a0)
> >         sb      zero,14(a0)
> >         ret
> >
> > Here is what a setmemsi expansion in the backend can do (when
> > unaligned access is cheap):
> >
> > 000000000000003c <do_memset0_8>:
> >   3c:   00053023                sd      zero,0(a0)
> >   40:   8082                    ret
> >
> > 000000000000007e <do_memset0_15>:
> >   7e:   00053023                sd      zero,0(a0)
> >   82:   000533a3                sd      zero,7(a0)
> >   86:   8082                    ret
> >
> > Is there a way to generate similar code with the by-pieces infrastructure?
>
> Sure - tell it unaligned access is cheap.  See alignment_for_piecewise_move
> and how it uses slow_unaligned_access.

Thanks for the pointer.
I already knew about slow_unaligned_access, but I was not aware of
overlap_op_by_pieces_p.
Enabling both produces exactly the code shown above.
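
For reference, here is a minimal sketch of what that means in a
RISC-V-style backend.  The two target hooks are the real GCC hooks;
the uarch_has_fast_unaligned flag is a hypothetical tuning knob:

/* Report unaligned access as cheap, so by-pieces picks wide modes
   (alignment_for_piecewise_move consults this hook).  */
static bool
riscv_slow_unaligned_access (machine_mode mode ATTRIBUTE_UNUSED,
                             unsigned int align ATTRIBUTE_UNUSED)
{
  return !uarch_has_fast_unaligned;  /* hypothetical tuning knob */
}

/* Let by-pieces emit overlapping stores, so e.g. a 15-byte memset
   becomes two overlapping 8-byte stores instead of 15 byte stores.  */
static bool
riscv_overlap_op_by_pieces (void)
{
  return uarch_has_fast_unaligned;  /* hypothetical tuning knob */
}

#undef TARGET_SLOW_UNALIGNED_ACCESS
#define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
#undef TARGET_OVERLAP_OP_BY_PIECES_P
#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces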

Thanks,
Christoph

> > > > * And if there are no particular reasons, would it be acceptable to
> > > > change the order?
> > >
> > > I suppose moving insns ahead of by-pieces might break careful tuning of
> > > multiple platforms, so I'd rather we did not make that change.
> >
> > Only platforms that have "setmemsi" implemented would be affected.
> > And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
> > have a carefully tuned implementation of the setmem expansion.
> > I can't imagine that these setmem expansions produce worse code
> > than the by-pieces infrastructure (which has less knowledge about
> > the target).
> >
> > Thanks,
> > Christoph
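
For context, the setmem expansion those backends implement hangs off
the standard "setmem" named pattern.  A minimal sketch of such an
expander in a machine description (riscv_expand_setmem is a
hypothetical helper that either emits a tuned sequence or gives up):

(define_expand "setmemsi"
  [(set (match_operand:BLK 0 "memory_operand")    ;; destination block
        (match_operand 2 "nonmemory_operand"))    ;; fill value
   (use (match_operand:SI 1 "general_operand"))   ;; length in bytes
   (use (match_operand:SI 3 "const_int_operand"))] ;; known alignment
  ""
{
  if (riscv_expand_setmem (operands[0], operands[1],
                           operands[2], operands[3]))
    DONE;
  /* FAIL falls back to the generic expansion (by-pieces or libcall).  */
  FAIL;
})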
