On Tue, Jul 13, 2021 at 2:19 PM Christoph Müllner via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <ol...@adacore.com> wrote:
> >
> > On Jul 12, 2021, Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
> >
> > > * Why does the generic by-pieces infrastructure have a higher priority
> > >   than the target-specific expansion via INSNs like setmem?
> >
> > by-pieces was not affected by the recent change, and IMHO it generally
> > makes sense for it to have priority over setmem.  It generates only
> > straight-line code for constant-sized blocks.  Even if you can beat that
> > with some machine-specific logic, you'll probably end up generating
> > equivalent code at least in some cases, and then, you probably want to
> > carefully tune the settings that select one or the other, or disable
> > by-pieces altogether.
> >
> > by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> > looping constructs, if any are available, so setmem takes precedence.
> >
> > My testing involved bringing it ahead of the insns, to exercise the code
> > more thoroughly even on x86*, but the submitted patch only used
> > by-multiple-pieces as a fallback.
>
> Let me give you an example of what by-pieces does on RISC-V (RV64GC).
> The following code...
>
> void* do_memset0_8 (void *p)
> {
>   return memset (p, 0, 8);
> }
>
> void* do_memset0_15 (void *p)
> {
>   return memset (p, 0, 15);
> }
>
> ...becomes (you can validate that with compiler explorer):
>
> do_memset0_8(void*):
>         sb      zero,0(a0)
>         sb      zero,1(a0)
>         sb      zero,2(a0)
>         sb      zero,3(a0)
>         sb      zero,4(a0)
>         sb      zero,5(a0)
>         sb      zero,6(a0)
>         sb      zero,7(a0)
>         ret
> do_memset0_15(void*):
>         sb      zero,0(a0)
>         sb      zero,1(a0)
>         sb      zero,2(a0)
>         sb      zero,3(a0)
>         sb      zero,4(a0)
>         sb      zero,5(a0)
>         sb      zero,6(a0)
>         sb      zero,7(a0)
>         sb      zero,8(a0)
>         sb      zero,9(a0)
>         sb      zero,10(a0)
>         sb      zero,11(a0)
>         sb      zero,12(a0)
>         sb      zero,13(a0)
>         sb      zero,14(a0)
>         ret
>
> Here is what a setmemsi expansion in the backend can do (in case
> unaligned access is cheap):
>
> 000000000000003c <do_memset0_8>:
>   3c: 00053023    sd      zero,0(a0)
>   40: 8082        ret
>
> 000000000000007e <do_memset0_15>:
>   7e: 00053023    sd      zero,0(a0)
>   82: 000533a3    sd      zero,7(a0)
>   86: 8082        ret
>
> Is there a way to generate similar code with the by-pieces infrastructure?
Sure - tell it unaligned access is cheap.  See alignment_for_piecewise_move
and how it uses slow_unaligned_access.

Richard.

> > > * And if there are no particular reasons, would it be acceptable to
> > >   change the order?
> >
> > I suppose moving insns ahead of by-pieces might break careful tuning of
> > multiple platforms, so I'd rather we did not make that change.
>
> Only platforms that have "setmemsi" implemented would be affected.
> And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
> have a carefully tuned implementation of the setmem expansion.  I can't
> imagine that these setmem expansions produce less optimal code than the
> by-pieces infrastructure (which has less knowledge about the target).
>
> Thanks,
> Christoph
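
For illustration, here is a minimal sketch of the target-hook side of that hint,
i.e. how a backend can report that misaligned access is cheap so that by-pieces
(via alignment_for_piecewise_move and slow_unaligned_access) is allowed to pick
modes wider than the known alignment.  TARGET_SLOW_UNALIGNED_ACCESS is the real
GCC hook; the example_* identifiers and the tuning flag below are placeholders,
not the actual riscv.cc code:

/* Illustrative sketch only -- not the actual RISC-V backend code.
   A backend describes the cost of misaligned memory access through
   the TARGET_SLOW_UNALIGNED_ACCESS hook; alignment_for_piecewise_move
   consults it when choosing the widest mode for by-pieces expansion.  */

/* Placeholder for a flag that would be derived from -mtune or
   -m[no-]strict-align.  */
static bool example_unaligned_access_is_fast = true;

static bool
example_slow_unaligned_access (machine_mode mode ATTRIBUTE_UNUSED,
                               unsigned int align ATTRIBUTE_UNUSED)
{
  /* Returning false means "unaligned access is not slow", so an
     8-byte memset can be expanded as a single sd instead of eight
     sb instructions.  */
  return !example_unaligned_access_is_fast;
}

#undef TARGET_SLOW_UNALIGNED_ACCESS
#define TARGET_SLOW_UNALIGNED_ACCESS example_slow_unaligned_access

Whether the 15-byte case then also collapses into two overlapping sd stores,
as the setmemsi expansion above does, is a separate question from the mode
choice; the hook only removes the alignment restriction.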