https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108823
Bug ID: 108823
Summary: ranges::transform could be smarter with two sized
ranges
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: barry.revzin at gmail dot com
Target Milestone: ---
>From StackOverflow (https://stackoverflow.com/q/75464599/2069064):
#include <algorithm>
#include <ranges>
#include <vector>
#include <algorithm>
#include <ranges>
#include <vector>
std::vector<int> fn1(std::vector<int> u, std::vector<int> const& v) {
#ifdef RANGES
std::ranges::transform(u, v, u.begin(), std::plus<int>{});
#else
std::transform(u.begin(), u.end(), v.begin(), u.begin(), std::plus<int>{});
#endif
return u;
}
Without RANGES defined, this generates vectorized code. With RANGES, it does
not. These aren't exactly identical, since without RANGES the function is UB if
u.size() > v.size() while with RANGES it's fine - but the RANGES implementation
is still suboptimal.
Rewriting the RANGES impl to:
auto sz = std::min(u.size(), v.size());
std::ranges::transform(
std::ranges::subrange(u.begin(), u.begin() + sz),
std::ranges::subrange(v.begin(), std::unreachable_sentinel),
u.begin(),
std::plus<int>{});
gets the code to vectorize again. This is probably because the loop is simply:
»···for (; __first1 != __last1 && __first2 != __last2;
»··· ++__first1, (void)++__first2, ++__result)
The two conditions probably throws the optimizer, where if the algorithm is
written as a single condition (as the rewrite reduces to, since __first2 !=
__last2 becomes true), it's easier to optimize.