On Tue, Jul 13, 2021 at 2:59 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Tue, Jul 13, 2021 at 2:19 PM Christoph Müllner via Gcc
> <gcc@gcc.gnu.org> wrote:
> >
> > On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <ol...@adacore.com> wrote:
> > >
> > > On Jul 12, 2021, Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
> > >
> > > > * Why does the generic by-pieces infrastructure have a higher priority
> > > > than the target-specific expansion via INSNs like setmem?
> > >
> > > by-pieces was not affected by the recent change, and IMHO it generally
> > > makes sense for it to have priority over setmem.  It generates only
> > > straight-line code for constant-sized blocks.  Even if you can beat that
> > > with some machine-specific logic, you'll probably end up generating
> > > equivalent code at least in some cases, and then, you probably want to
> > > carefully tune the settings that select one or the other, or disable
> > > by-pieces altogether.
> > >
> > > by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> > > looping constructs, if any are available, so setmem takes precedence.
> > >
> > > My testing involved bringing it ahead of the insns, to exercise the code
> > > more thoroughly even on x86*, but the submitted patch only used
> > > by-multiple-pieces as a fallback.
> >
> > Let me give you an example of what by-pieces does on RISC-V (RV64GC).
> > The following code...
> >
> > void* do_memset0_8 (void *p)
> > {
> >   return memset (p, 0, 8);
> > }
> >
> > void* do_memset0_15 (void *p)
> > {
> >   return memset (p, 0, 15);
> > }
> >
> > ...becomes (you can validate that with compiler explorer):
> >
> > do_memset0_8(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         ret
> > do_memset0_15(void*):
> >         sb      zero,0(a0)
> >         sb      zero,1(a0)
> >         sb      zero,2(a0)
> >         sb      zero,3(a0)
> >         sb      zero,4(a0)
> >         sb      zero,5(a0)
> >         sb      zero,6(a0)
> >         sb      zero,7(a0)
> >         sb      zero,8(a0)
> >         sb      zero,9(a0)
> >         sb      zero,10(a0)
> >         sb      zero,11(a0)
> >         sb      zero,12(a0)
> >         sb      zero,13(a0)
> >         sb      zero,14(a0)
> >         ret
> >
> > Here is what a setmemsi expansion in the backend can do (in case
> > unaligned access is cheap):
> >
> > 000000000000003c <do_memset0_8>:
> >   3c: 00053023    sd      zero,0(a0)
> >   40: 8082        ret
> >
> > 000000000000007e <do_memset0_15>:
> >   7e: 00053023    sd      zero,0(a0)
> >   82: 000533a3    sd      zero,7(a0)
> >   86: 8082        ret
> >
> > Is there a way to generate similar code with the by-pieces infrastructure?
>
> Sure - tell it unaligned access is cheap.  See alignment_for_piecewise_move
> and how it uses slow_unaligned_access.
Thanks for the pointer.
I already knew about slow_unaligned_access, but I was not aware of
overlap_op_by_pieces_p.
Enabling both gives exactly the same code as above (a rough sketch of
the hooks I enabled is at the end of this mail).

Thanks,
Christoph

> > > > * And if there are no particular reasons, would it be acceptable to
> > > > change the order?
> > >
> > > I suppose moving insns ahead of by-pieces might break careful tuning of
> > > multiple platforms, so I'd rather we did not make that change.
> >
> > Only platforms that have "setmemsi" implemented would be affected.
> > And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
> > have a carefully tuned implementation of the setmem expansion.
> > I can't imagine that these setmem expansions produce worse code than
> > the by-pieces infrastructure (which has less knowledge about the
> > target).
> >
> > Thanks,
> > Christoph
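
For reference, here is roughly what I enabled in the backend to get the
result above.  This is a minimal sketch rather than the actual patch:
riscv_slow_unaligned_access and riscv_slow_unaligned_access_p already exist
in riscv.c, while riscv_overlap_op_by_pieces is a hypothetical helper named
here only for illustration.

/* Report unaligned access as cheap so by-pieces picks wide (possibly
   unaligned) stores instead of byte-sized ones.  */
static bool
riscv_slow_unaligned_access (machine_mode, unsigned int)
{
  return riscv_slow_unaligned_access_p;
}

/* Allow the last piece to overlap the previous one, e.g. two
   overlapping 8-byte stores for a 15-byte memset.  */
static bool
riscv_overlap_op_by_pieces (void)
{
  return !riscv_slow_unaligned_access_p;
}

#undef TARGET_SLOW_UNALIGNED_ACCESS
#define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access

#undef TARGET_OVERLAP_OP_BY_PIECES_P
#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces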