Re: Priority of builtins expansion strategies

Christoph Müllner via Gcc Tue, 13 Jul 2021 05:19:00 -0700

On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <[email protected]> wrote:
>
> On Jul 12, 2021, Christoph Müllner <[email protected]> wrote:
>
> > * Why does the generic by-pieces infrastructure have a higher priority
> > than the target-specific expansion via INSNs like setmem?
>
> by-pieces was not affected by the recent change, and IMHO it generally
> makes sense for it to have priority over setmem.  It generates only
> straight-line code for constant-sized blocks.  Even if you can beat that
> with some machine-specific logic, you'll probably end up generating
> equivalent code at least in some cases, and then, you probably want to
> carefully tune the settings that select one or the other, or disable
> by-pieces altogether.
>
>
> by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> looping constructs, if any are available, so setmem takes precedence.
>
> My testing involved bringing it ahead of the insns, to exercise the code
> more thoroughly even on x86*, but the submitted patch only used
> by-multiple-pieces as a fallback.


Let me give you an example of what by-pieces does on RISC-V (RV64GC).
The following code...

#include <string.h>

void* do_memset0_8 (void *p)
{
    return memset (p, 0, 8);
}

void* do_memset0_15 (void *p)
{
    return memset (p, 0, 15);
}

...becomes (you can verify this with Compiler Explorer):

do_memset0_8(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        ret
do_memset0_15(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        sb      zero,8(a0)
        sb      zero,9(a0)
        sb      zero,10(a0)
        sb      zero,11(a0)
        sb      zero,12(a0)
        sb      zero,13(a0)
        sb      zero,14(a0)
        ret
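One way to see why the output above consists only of single-byte stores is the simplified model below. It assumes (an assumption, not something shown in the output) that unaligned accesses are treated as slow and that the only known alignment of p is one byte, so no store wider than the known alignment is used. plan_by_pieces is a hypothetical helper, and like the output above it never emits overlapping stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of splitting a constant-size store into pieces.
   Each piece is the widest power-of-two width that fits the remaining
   bytes, does not exceed max_width and, when unaligned accesses are
   assumed slow, does not exceed the destination's known alignment.
   Writes the chosen widths to widths[] and returns how many there are. */
static size_t
plan_by_pieces(size_t size, size_t max_width, size_t known_align,
               bool slow_unaligned, size_t widths[], size_t cap)
{
    size_t n = 0;
    size_t limit = max_width;

    assert(max_width > 0 && known_align > 0);
    if (slow_unaligned && known_align < limit)
        limit = known_align;          /* e.g. 1 for a plain void *p */

    while (size > 0) {
        size_t w = limit;
        while (w > size)              /* shrink until the piece fits */
            w /= 2;
        assert(n < cap);
        widths[n++] = w;
        size -= w;                    /* no overlap: pieces are disjoint */
    }
    return n;
}
```

With a known alignment of 1 and slow unaligned access, a 15-byte memset becomes 15 one-byte pieces; with 8-byte alignment it would become pieces of 8, 4, 2 and 1 bytes.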

Here is what a setmemsi expansion in the backend can produce (if
unaligned access is cheap):

000000000000003c <do_memset0_8>:
  3c:   00053023                sd      zero,0(a0)
  40:   8082                    ret

000000000000007e <do_memset0_15>:
  7e:   00053023                sd      zero,0(a0)
  82:   000533a3                sd      zero,7(a0)
  86:   8082                    ret
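The second sequence covers 15 bytes with two 8-byte stores that overlap at byte 7. The same idea can be sketched in portable C. clear_8_to_16 is a hypothetical helper, and memcpy is used so that the possibly unaligned 8-byte stores are well-defined; whether each memcpy becomes a single sd depends on the target treating unaligned access as cheap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear n bytes (8 <= n <= 16) at p using two
   possibly overlapping 8-byte stores, mirroring "sd zero,0(a0)" and
   "sd zero,7(a0)" for n == 15. */
static void
clear_8_to_16(void *p, size_t n)
{
    const uint64_t zero = 0;
    unsigned char *d = p;

    assert(n >= 8 && n <= 16);
    memcpy(d, &zero, sizeof zero);          /* bytes [0, 8) */
    memcpy(d + n - 8, &zero, sizeof zero);  /* bytes [n-8, n), may overlap */
}
```

For any n from 8 to 16 the two stores together cover exactly bytes 0 through n-1, so no byte loop is needed.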

Is there a way to generate similar code with the by-pieces infrastructure?

> > * And if there are no particular reasons, would it be acceptable to
> > change the order?
>
> I suppose moving insns ahead of by-pieces might break careful tuning of
> multiple platforms, so I'd rather we did not make that change.

Only platforms that implement "setmemsi" would be affected.
And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
have a carefully tuned implementation of the setmem expansion. I can't
imagine that these setmem expansions produce worse code than the
by-pieces infrastructure (which has less knowledge about the target).

Thanks,
Christoph
