On Fri, Jan 06, 2017 at 03:16:12PM +0000, Jonathan Wakely wrote: > On 06/01/17 16:09 +0100, Jakub Jelinek wrote: > > On Fri, Jan 06, 2017 at 08:42:15AM -0600, Aditya Kumar wrote: > > > Yes, we do. > > > Sorry for the mistake, it happened because I first wrote this for > > > libcxx (https://reviews.llvm.org/D27068) and while porting that line > > > got missed. > > > > Shouldn't find at least in the case where it is narrow char string > > just use C library memmem? That implements a Two-Way searching algorithm > > with some improvements from Boyer-Moore. > > Otherwise, consider what even your modified version will do for > > > > #include <string> > > int main() { > > (std::string(10000000, 'a')+"b").find(std::string(1000000, 'a')+"b"); > > } > > Yes, I have an incomplete local patch that checks for memmem and uses > it where possible. This change would still be valuable for non-GNU > targets though.
The description of Two-Way algorithm is in http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260 Boyer-Moore in http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm Of course, as you know the lengths of both strings, you can quite cheaply decide whether it is worth spending some time building data structures to speed up the following search or not (but so does memmem already, e.g. handle with no searching zero length needle, or needle longer than haystack). Or for short needles start with memchr of the first char, while for longer needles don't do that etc. > > Or does the C++ library need to reinvent everything implemented in the C > > library? > > In this case yes, because strstr doesn't take the length of the > strings into account, and memmem isn't standard. Because std::string > knows its length we can do better than strstr for some cases, such as > std::string(1000, 'a').find(std::string(1001, 'a')). Jakub