https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121912
Bug ID: 121912
Summary: std::search optimizations for bidirectional and random
access iterators.
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Target Milestone: ---
Given std::search(haystack, haystack_end, needle, needle_end) we always do a
linear search through the whole haystack.
If haystack and needle are both random access iterators, we can return early if
at any time during the algo:
(haystack_end - haystack) < (needle_end - needle)
If haystack is random access but needle is not, we can still optimize the algo
by keeping track of a lower bound estimate for distance(needle, needle_end).
Every time we find *needle in the haystack and start iterating to see if the
rest of [needle,needle_end) matches, we can count how many matches there are,
and update our lower bound estimate for distance(needle, needle_end). Then we
can return early if haystack_end - haystack is smaller than that estimate.
Even if haystack is not random access but is bidirectional, we can _still_
optimize the algo. Every time we match an element from the needle we can
decrement a copy of haystack_end and on the next iteration use that decremented
iterator as the end for the next loop's find_if(haystack, end, *needle).
There's never any need for that find_if call to look all the way to
haystack_end, because by the time we reach the for (;;) loop we know that the
needle is at least length 2.
We can also apply these optimizations to ranges::search, using the
sized_sentinel_for concept to extend the optimizations to sized
non-random-access ranges.