On Mon, Dec 12, 2022 at 02:44:04PM +0100, Alejandro Colomar via Gcc wrote:
> > I don't see any problem with the code snippets you provided.
>
> Well, then the optimization may be the other way around (although I question
> why it is implemented that way, and not the other way around, but I'm not a
> hardware or libc guy, so there may be reasons).
>
> If calling memcpy(3) is better, then the code calling mempcpy(3) could be
> expanded inline to call it (but I doubt it).
>
> If calling mempcpy(3) is better, then the hand-made pattern resembling
> mempcpy(3) should probably be merged as a call to mempcpy(3).
>
> But acting different on equivalent calls to both of them seems inconsistent
> to me, unless you trust the programmer to know better how to optimize, that
> is...
I think that is the case, plus there is the question of whether one can use
a non-standard function to implement a standard function (and whether that
would be triggered by seeing the expected prototype for the non-standard
function).

Otherwise, whether mempcpy in libc is implemented as memcpy plus a tweak of
the return value or has its own implementation is heavily dependent on the
target and changes over time, so hardcoding that in gcc is problematic.
For -Os, a mempcpy call might very well be smaller, even if the library side
is then slower.

	Jakub