Hi,

I'm working on some platform-specific optimizations for
memset/memcpy/strcpy/strncpy.
However, I am having difficulty understanding how my code should be
integrated.
Initially, I was inspired by rs6000-string.c, where I see expansion
code for named patterns
like setmemsi or cmpstrsi. However, that expansion code is not always called.
Instead, the generic by-pieces infrastructure is tried first.

To understand what I mean, let's have a look at memset
(expand_builtin_memset_args).
The backend can provide a tailored code sequence by expanding setmem.
However, there is also a generic solution available using the
by-pieces infrastructure.
The generic by-pieces infrastructure has a higher priority than the
target-specific setmem
expansion. However, the recently added by-multiple-pieces
infrastructure has lower priority
than setmem.
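
The ordering described above can be modelled with a small standalone
sketch. This is not GCC code: the enum, the struct and the example
predicates are hypothetical stand-ins, and the final library-call
fallback is my assumption about what happens when nothing else applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Strategies in the order they are tried; names are illustrative only. */
enum memset_strategy {
  STRAT_BY_PIECES,           /* generic by-pieces infrastructure */
  STRAT_SETMEM_PATTERN,      /* target-specific setmem expansion */
  STRAT_BY_MULTIPLE_PIECES,  /* recently added by-multiple-pieces */
  STRAT_LIBCALL              /* assumed fallback: call memset */
};

/* Hypothetical model of what a target answers; not a GCC structure. */
struct target_model {
  bool (*use_by_pieces_p) (size_t len);    /* models the by-pieces hook */
  bool (*setmem_applies_p) (size_t len);   /* models setmem not FAILing */
  bool (*multiple_pieces_p) (size_t len);  /* models by-multiple-pieces */
};

/* Return the first strategy that accepts LEN, in the current order. */
enum memset_strategy
choose_memset_strategy (const struct target_model *t, size_t len)
{
  assert (t != NULL);
  if (t->use_by_pieces_p && t->use_by_pieces_p (len))
    return STRAT_BY_PIECES;
  if (t->setmem_applies_p && t->setmem_applies_p (len))
    return STRAT_SETMEM_PATTERN;
  if (t->multiple_pieces_p && t->multiple_pieces_p (len))
    return STRAT_BY_MULTIPLE_PIECES;
  return STRAT_LIBCALL;
}

/* Example predicates with made-up thresholds.  */
bool example_by_pieces_p (size_t len) { return len <= 32; }
bool example_setmem_p (size_t len) { return len >= 16; }
bool example_never_p (size_t len) { (void) len; return false; }
```

With these made-up thresholds, a 24-byte memset is claimed by by-pieces
even though the setmem predicate would also accept it; setmem is only
reached for sizes that by-pieces rejects.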

See:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=39ab139b7e1c06c98d2db1aef2b3a6095ffbec63;hb=HEAD#l7004

The same observation is true for most (all?) other uses of builtins.

The current priority requires me to duplicate the condition that
decides whether my optimization
can be applied in the following places:
1) in TARGET_USE_BY_PIECES_INFRASTRUCTURE_P () to block by-pieces
2) in the setmem expansion to gate the optimization
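
A minimal sketch of the duplication, assuming the condition is factored
into one shared predicate that both places call. Everything here is a
standalone model: the function names, the thresholds, the simplified
enum and the stand-in for the default hook are hypothetical and not
taken from GCC or from my actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the operation kind passed to the hook.  */
enum by_pieces_op_model { MODEL_SET_BY_PIECES, MODEL_MOVE_BY_PIECES };

/* The condition itself (made-up thresholds; ALIGN is in bytes here).  */
static bool
my_setmem_profitable_p (unsigned long size, unsigned int align)
{
  return size >= 16 && align >= 8;
}

/* Hypothetical stand-in for the default by-pieces decision.  */
static bool
model_default_use_by_pieces_p (unsigned long size)
{
  return size <= 32;
}

/* Place 1: models TARGET_USE_BY_PIECES_INFRASTRUCTURE_P.  It must
   return false whenever the setmem expansion should get its chance.  */
bool
my_use_by_pieces_infrastructure_p (unsigned long size, unsigned int align,
                                   enum by_pieces_op_model op, bool speed_p)
{
  if (op == MODEL_SET_BY_PIECES && speed_p
      && my_setmem_profitable_p (size, align))
    return false;
  return model_default_use_by_pieces_p (size);
}

/* Place 2: models the setmem expander.  Returning false stands for
   FAIL, i.e. falling through to the next strategy.  */
bool
my_expand_setmem (unsigned long size, unsigned int align)
{
  if (!my_setmem_profitable_p (size, align))
    return false;
  /* The tailored instruction sequence would be emitted here.  */
  return true;
}
```

Even with a shared predicate, the two call sites have to stay in sync:
if the hook forgets a case, by-pieces takes it before the expander is
ever asked.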

As I would expect that a target-specific mechanism is preferred over
a generic mechanism,
my questions are:
* Why does the generic by-pieces infrastructure have a higher priority
than the target-specific expansion via INSNs like setmem?
* And if there are no particular reasons, would it be acceptable to
change the order?

Thanks,
Christoph
