https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103798
Bug ID: 103798
Summary: Missed optimization: char_traits<char>::find (and thus
string_view::find_first_of) is slow when invoked with
short strings
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: zhaoweiliew at gmail dot com
Target Milestone: ---
Given the following code:
#include <string_view>
size_t findFirstE_slow(std::string_view sv) {
return sv.find_first_of("eE");
}
size_t findFirstE_fast(std::string_view sv) {
auto it{sv.begin()};
for (; it != sv.end() && *it != 'e' && *it != 'E'; ++it)
;
return it == sv.end() ? std::string_view::npos : size_t(it - sv.begin());
}
x86-64 gcc (trunk) -std=c++20 -O3 generates the following assembly:
.LC0:
.string "eE"
findFirstE_slow(std::basic_string_view<char, std::char_traits<char> >):
push r12
push rbp
push rbx
test rdi, rdi
je .L4
mov rbx, rdi
mov rbp, rsi
xor r12d, r12d
jmp .L3
.L8:
add r12, 1
cmp rbx, r12
je .L4
.L3:
movsx esi, BYTE PTR [rbp+0+r12]
mov edx, 2
mov edi, OFFSET FLAT:.LC0
call memchr
test rax, rax
je .L8
mov rax, r12
pop rbx
pop rbp
pop r12
ret
.L4:
mov r12, -1
pop rbx
pop rbp
mov rax, r12
pop r12
ret
findFirstE_fast(std::basic_string_view<char, std::char_traits<char> >):
add rdi, rsi
cmp rdi, rsi
je .L13
mov rax, rsi
jmp .L12
.L15:
add rax, 1
cmp rdi, rax
je .L13
.L12:
movzx edx, BYTE PTR [rax]
and edx, -33
cmp dl, 69
jne .L15
sub rax, rsi
ret
.L13:
mov rax, -1
ret
Even though both findFirstE_slow() and findFirstE_fast() are logically
equivalent, findFirstE_slow() calls memchr() for every character in the input
string.
findFirstE_fast() does what we would reasonably expect, comparing every
character in the input string with 'e' and 'E'.
In practice, when the set of characters to be found is small, findFirstE_slow()
runs noticeably slower than findFirstE_fast(). This seems to be a missed
optimization in char_traits<char>::find(), which affects several find-related
methods in string_view.
Relevant StackOverflow question with quick-bench output, Compiler Explorer
output, and more discussion:
https://stackoverflow.com/questions/70433152/missed-optimization-with-string-viewfind-first-of