Hi,

I'm working on some platform-specific optimizations for memset/memcpy/strcpy/strncpy. However, I am having difficulty understanding how my code should be integrated. Initially, I took inspiration from rs6000-string.c, where I see expansion code for named patterns like setmemsi or cmpstrsi. However, that expansion code is not always called. Instead, the generic by-pieces infrastructure is tried first.

To understand what I mean, let's have a look at memset (expand_builtin_memset_args). The backend can provide a tailored code sequence by expanding setmem. However, there is also a generic solution available using the by-pieces infrastructure. The generic by-pieces infrastructure has a higher priority than the target-specific setmem expansion. In contrast, the recently added by-multiple-pieces infrastructure has a lower priority than setmem.

See:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=39ab139b7e1c06c98d2db1aef2b3a6095ffbec63;hb=HEAD#l7004

The same observation is true for most (all?) other uses of builtins.

The current priority requires me to duplicate the condition that decides whether my optimization can be applied in the following places:
1) in TARGET_USE_BY_PIECES_INFRASTRUCTURE_P () to block by-pieces
2) in the setmem expansion to gate the optimization

As I would expect a target-specific mechanism to be preferred over a generic one, my questions are:
* Why does the generic by-pieces infrastructure have a higher priority than the target-specific expansion via INSNs like setmem?
* And if there are no particular reasons, would it be acceptable to change the order?

Thanks,
Christoph
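
To illustrate the duplication described above, here is a simplified, self-contained model of the dispatch order. It is not GCC code: the enum, all function names, and the thresholds are hypothetical stand-ins chosen only to mirror the order described in this message (by-pieces, then setmem, then by-multiple-pieces, then a library call).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names or signatures are GCC's.  */
enum strategy { BY_PIECES, SETMEM_PATTERN, BY_MULTIPLE_PIECES, LIBCALL };

/* The target's applicability condition (arbitrary example values).  */
static bool
my_opt_applies (size_t len, unsigned align)
{
  return len >= 16 && align >= 8;
}

/* Stand-in for TARGET_USE_BY_PIECES_INFRASTRUCTURE_P: place 1) where the
   condition is needed, purely to keep by-pieces from winning.  */
static bool
model_use_by_pieces_p (size_t len, unsigned align)
{
  if (my_opt_applies (len, align))
    return false;
  return len <= 32;	/* arbitrary generic by-pieces limit */
}

/* Stand-in for the setmem expander: place 2) where the same condition
   is needed, this time to gate the optimization itself.  */
static bool
model_expand_setmem (size_t len, unsigned align)
{
  return my_opt_applies (len, align);
}

/* Models the priority order described above.  For a non-constant length,
   LEN is treated as a known upper bound.  */
static enum strategy
choose_memset_strategy (bool len_is_constant, size_t len, unsigned align)
{
  assert (align != 0);

  if (len_is_constant && model_use_by_pieces_p (len, align))
    return BY_PIECES;			/* generic, tried first */
  if (model_expand_setmem (len, align))
    return SETMEM_PATTERN;		/* target-specific, tried second */
  if (!len_is_constant && len <= 64)	/* arbitrary limit */
    return BY_MULTIPLE_PIECES;		/* generic, tried third */
  return LIBCALL;
}
```

In this model, dropping the my_opt_applies () check from model_use_by_pieces_p () would make choose_memset_strategy (true, 24, 8) pick BY_PIECES instead of SETMEM_PATTERN. If the setmem expansion were tried first, the check in the hook would not be needed at all.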