https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103798
Bug ID: 103798 Summary: Missed optimization: char_traits<char>::find (and thus string_view::find_first_of) is slow when invoked with short strings Product: gcc Version: 11.2.1 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: zhaoweiliew at gmail dot com Target Milestone: --- Given the following code: #include <string_view> size_t findFirstE_slow(std::string_view sv) { return sv.find_first_of("eE"); } size_t findFirstE_fast(std::string_view sv) { auto it{sv.begin()}; for (; it != sv.end() && *it != 'e' && *it != 'E'; ++it) ; return it == sv.end() ? std::string_view::npos : size_t(it - sv.begin()); } x86-64 gcc (trunk) -std=c++20 -O3 generates the following assembly: .LC0: .string "eE" findFirstE_slow(std::basic_string_view<char, std::char_traits<char> >): push r12 push rbp push rbx test rdi, rdi je .L4 mov rbx, rdi mov rbp, rsi xor r12d, r12d jmp .L3 .L8: add r12, 1 cmp rbx, r12 je .L4 .L3: movsx esi, BYTE PTR [rbp+0+r12] mov edx, 2 mov edi, OFFSET FLAT:.LC0 call memchr test rax, rax je .L8 mov rax, r12 pop rbx pop rbp pop r12 ret .L4: mov r12, -1 pop rbx pop rbp mov rax, r12 pop r12 ret findFirstE_fast(std::basic_string_view<char, std::char_traits<char> >): add rdi, rsi cmp rdi, rsi je .L13 mov rax, rsi jmp .L12 .L15: add rax, 1 cmp rdi, rax je .L13 .L12: movzx edx, BYTE PTR [rax] and edx, -33 cmp dl, 69 jne .L15 sub rax, rsi ret .L13: mov rax, -1 ret Even though both findFirstE_slow() and findFirstE_fast() are logically equivalent, findFirstE_slow() calls memchr() for every character in the input string. findFirstE_fast() does what we would reasonably expect, comparing every character in the input string with 'e' and 'E'. In practice, when the set of characters to be found is small, findFirstE_slow() runs noticeably slower than findFirstE_fast(). This seems to be a missed optimization in char_traits<char>::find(), which affects several find-related methods in string_view. Relevant StackOverflow question with quick-bench output, Compiler Explorer output, and more discussion: https://stackoverflow.com/questions/70433152/missed-optimization-with-string-viewfind-first-of