https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88626
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
In my application (quite a bit bigger than the testcase), the optimized dump shows that the function is inlined when it does not contain the __builtin_constant_p code. When I add the __builtin_constant_p code (__builtin_constant_p should essentially always be false in this case), a lot of calls remain.

Writing __attribute__((always_inline)) on the function "fixes" the performance issue: it has no measurable impact on the original code, and it gives the _bcp code the same performance as the code without _bcp. However, the attribute is not a real solution; "always" is too strong...
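
For readers without the testcase at hand, the pattern being discussed might look roughly like the sketch below. This is not the code from the bug report: the names mul_general, mul_bcp, mul_bcp_forced and sum_scaled are hypothetical, and the k == 1 shortcut is just an assumed example of a __builtin_constant_p fast path. It needs GCC or a compatible compiler.

```cpp
// Hypothetical illustration only; not the testcase attached to the bug.

// General path: stands in for the original code without _bcp.
inline int mul_general(int x, int k) { return x * k; }

// Variant with a __builtin_constant_p shortcut. The shortcut can only
// trigger if, after inlining, the compiler proves that k is the constant 1.
// With a runtime k, the condition folds to false and only the general
// path remains.
inline int mul_bcp(int x, int k) {
    if (__builtin_constant_p(k) && k == 1)
        return x;  // fast path for a provably constant k == 1
    return mul_general(x, k);
}

// Same body with the workaround from the comment: inlining is forced
// instead of being left to the inliner's heuristics.
__attribute__((always_inline)) inline int mul_bcp_forced(int x, int k) {
    if (__builtin_constant_p(k) && k == 1)
        return x;
    return mul_general(x, k);
}

// Caller with a runtime k, so __builtin_constant_p(k) should be false here.
int sum_scaled(const int* v, int n, int k) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += mul_bcp_forced(v[i], k);
    return s;
}
```

All three functions compute the same value; the only difference is how the inlining decision is made. In a function this small the inliner would most likely inline everything anyway, so the sketch shows the shape of the code rather than reproducing the regression. To compare variants in a larger program, one could compile with -O2 -fdump-tree-optimized and check how many calls remain in the dump.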