Together with the preparatory compiler patches, this patch restores
unrolling in std::__find_if, but this time relies on the compiler to do
it by using:

  #pragma GCC unroll 4

This should recover most of the performance lost relative to the
hand-unrolled version, while leaving the loop vectorizable with the WIP
alignment peeling enhancements.

On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
performance).

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

libstdc++-v3/ChangeLog:

	PR libstdc++/116140
	* include/bits/stl_algobase.h (std::__find_if): Add #pragma to
	request GCC to unroll the loop.
---
 libstdc++-v3/include/bits/stl_algobase.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 27f6c377ad6..f13662fc448 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -2104,6 +2104,7 @@ _GLIBCXX_END_NAMESPACE_ALGO
     inline _Iterator
     __find_if(_Iterator __first, _Iterator __last, _Predicate __pred)
     {
+#pragma GCC unroll 4
       while (__first != __last && !__pred(__first))
 	++__first;
       return __first;