On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On Jul 12, 2021, Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
>
> > * Why does the generic by-pieces infrastructure have a higher priority
> > than the target-specific expansion via INSNs like setmem?
>
> by-pieces was not affected by the recent change, and IMHO it generally
> makes sense for it to have priority over setmem.  It generates only
> straight-line code for constant-sized blocks.  Even if you can beat that
> with some machine-specific logic, you'll probably end up generating
> equivalent code at least in some cases, and then, you probably want to
> carefully tune the settings that select one or the other, or disable
> by-pieces altogether.
>
>
> by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> looping constructs, if any are available, so setmem takes precedence.
>
> My testing involved bringing it ahead of the insns, to exercise the code
> more thoroughly even on x86*, but the submitted patch only used
> by-multiple-pieces as a fallback.

Let me give you an example of what by-pieces does on RISC-V (RV64GC).
The following code...

    void* do_memset0_8 (void *p)
    {
      return memset (p, 0, 8);
    }

    void* do_memset0_15 (void *p)
    {
      return memset (p, 0, 15);
    }

...becomes (you can validate that with compiler explorer):

    do_memset0_8(void*):
            sb      zero,0(a0)
            sb      zero,1(a0)
            sb      zero,2(a0)
            sb      zero,3(a0)
            sb      zero,4(a0)
            sb      zero,5(a0)
            sb      zero,6(a0)
            sb      zero,7(a0)
            ret
    do_memset0_15(void*):
            sb      zero,0(a0)
            sb      zero,1(a0)
            sb      zero,2(a0)
            sb      zero,3(a0)
            sb      zero,4(a0)
            sb      zero,5(a0)
            sb      zero,6(a0)
            sb      zero,7(a0)
            sb      zero,8(a0)
            sb      zero,9(a0)
            sb      zero,10(a0)
            sb      zero,11(a0)
            sb      zero,12(a0)
            sb      zero,13(a0)
            sb      zero,14(a0)
            ret

Here is what a setmemsi expansion in the backend can do (in case
unaligned access is cheap):

    000000000000003c <do_memset0_8>:
      3c:   00053023                sd      zero,0(a0)
      40:   8082                    ret

    000000000000007e <do_memset0_15>:
      7e:   00053023                sd      zero,0(a0)
      82:   000533a3                sd      zero,7(a0)
      86:   8082                    ret

Is there a way to generate similar code with the by-pieces infrastructure?

> > * And if there are no particular reasons, would it be acceptable to
> > change the order?
>
> I suppose moving insns ahead of by-pieces might break careful tuning of
> multiple platforms, so I'd rather we did not make that change.

Only platforms that have "setmemsi" implemented would be affected.
And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
have a carefully tuned implementation of the setmem expansion.
I can't imagine that these setmem expansions produce less optimal
code than the by-pieces infrastructure (which has less knowledge
about the target).

Thanks,
Christoph
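
P.S. The two-store sequence for do_memset0_15 relies on overlapping
stores: the second 8-byte store starts at offset 7, so byte 7 is written
twice and bytes 0..14 end up zero. Below is a minimal C sketch of that
idea, assuming unaligned 8-byte accesses are allowed and cheap on the
target. The helper name zero15_overlapping is hypothetical (it is not
part of GCC or of the patch under discussion), and memcpy is used only
to express the unaligned stores in well-defined C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: zero exactly 15 bytes
   using two 8-byte stores that overlap in one byte. */
static void zero15_overlapping(void *p)
{
    const uint64_t zero = 0;
    unsigned char *b = p;

    memcpy(b, &zero, sizeof zero);      /* writes bytes 0..7  */
    memcpy(b + 7, &zero, sizeof zero);  /* writes bytes 7..14 */
}
```

Whether a compiler turns each fixed-size memcpy into a single sd
depends on the target and its tuning, so the sketch shows the store
pattern, not guaranteed code generation.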