On Tue, Jul 13, 2021 at 2:19 PM Christoph Müllner via Gcc
<gcc@gcc.gnu.org> wrote:
>
> On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <ol...@adacore.com> wrote:
> >
> > On Jul 12, 2021, Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
> >
> > > * Why does the generic by-pieces infrastructure have a higher priority
> > > than the target-specific expansion via INSNs like setmem?
> >
> > by-pieces was not affected by the recent change, and IMHO it generally
> > makes sense for it to have priority over setmem.  It generates only
> > straight-line code for constant-sized blocks.  Even if you can beat that
> > with some machine-specific logic, you'll probably end up generating
> > equivalent code at least in some cases, and then, you probably want to
> > carefully tune the settings that select one or the other, or disable
> > by-pieces altogether.
> >
> >
> > by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> > looping constructs, if any are available, so setmem takes precedence.
> >
> > My testing involved bringing it ahead of the insns, to exercise the code
> > more thoroughly even on x86*, but the submitted patch only used
> > by-multiple-pieces as a fallback.
>
> Let me give you an example of what by-pieces does on RISC-V (RV64GC).
> The following code...
>
> void* do_memset0_8 (void *p)
> {
>     return memset (p, 0, 8);
> }
>
> void* do_memset0_15 (void *p)
> {
>     return memset (p, 0, 15);
> }
>
> ...becomes (you can verify this with Compiler Explorer):
>
> do_memset0_8(void*):
>         sb      zero,0(a0)
>         sb      zero,1(a0)
>         sb      zero,2(a0)
>         sb      zero,3(a0)
>         sb      zero,4(a0)
>         sb      zero,5(a0)
>         sb      zero,6(a0)
>         sb      zero,7(a0)
>         ret
> do_memset0_15(void*):
>         sb      zero,0(a0)
>         sb      zero,1(a0)
>         sb      zero,2(a0)
>         sb      zero,3(a0)
>         sb      zero,4(a0)
>         sb      zero,5(a0)
>         sb      zero,6(a0)
>         sb      zero,7(a0)
>         sb      zero,8(a0)
>         sb      zero,9(a0)
>         sb      zero,10(a0)
>         sb      zero,11(a0)
>         sb      zero,12(a0)
>         sb      zero,13(a0)
>         sb      zero,14(a0)
>         ret
>
> Here is what a setmemsi expansion in the backend can do (in case
> unaligned access is cheap):
>
> 000000000000003c <do_memset0_8>:
>   3c:   00053023                sd      zero,0(a0)
>   40:   8082                    ret
>
> 000000000000007e <do_memset0_15>:
>   7e:   00053023                sd      zero,0(a0)
>   82:   000533a3                sd      zero,7(a0)
>   86:   8082                    ret
>
> Is there a way to generate similar code with the by-pieces infrastructure?

Sure - tell it unaligned access is cheap.  See alignment_for_piecewise_move
and how it uses slow_unaligned_access.

Richard.
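[For reference, the hook Richard points at is TARGET_SLOW_UNALIGNED_ACCESS; a port signals cheap unaligned access by returning false from it, which lets alignment_for_piecewise_move pick wider modes. The fragment below is only a sketch of the shape of such a definition; `unaligned_access_is_cheap` is a hypothetical tuning flag, not an existing GCC symbol.]

/* Sketch, not a drop-in patch.  In a real port the predicate would be
   driven by the selected -mtune model or an option like -mstrict-align.  */
static bool
example_slow_unaligned_access (machine_mode mode, unsigned int align)
{
  /* 'unaligned_access_is_cheap' is hypothetical: true when the
     microarchitecture handles misaligned loads/stores efficiently.  */
  return !unaligned_access_is_cheap;
}

#undef TARGET_SLOW_UNALIGNED_ACCESS
#define TARGET_SLOW_UNALIGNED_ACCESS example_slow_unaligned_access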

> > > * And if there are no particular reasons, would it be acceptable to
> > > change the order?
> >
> > I suppose moving insns ahead of by-pieces might break careful tuning of
> > multiple platforms, so I'd rather we did not make that change.
>
> Only platforms that implement "setmemsi" would be affected.
> And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
> have carefully tuned implementations of the setmem expansion.
> I can't imagine that these setmem expansions produce worse code
> than the by-pieces infrastructure (which has less knowledge about
> the target).
>
> Thanks,
> Christoph
