https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125540
Bug ID: 125540
Summary: [15 Regression] Vectorizer regression: *std::max_element falls back to scalar cmovl argmax loop
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: torsten.mandel at sap dot com
Target Milestone: ---
Keywords: missed-optimization
Known to work: 13.2.1
Known to fail: 15.2.0
--- Description ---
GCC 13 vectorizes `*std::max_element(begin, end)` to a SIMD vpmaxsd reduction
when the returned iterator is only dereferenced (never escapes). GCC 15
regresses to a scalar cmovl-based argmax loop for the same code.
The regression affects both `std::vector<int>::iterator` and raw pointer
iterators (via `std::span<const int>`), and reproduces at every x86 ISA level
tested (SSE4.1, AVX2, AVX512F). Semantically equivalent value-recurrence
loops and `std::ranges::max` continue to vectorize on GCC 15.
--- Reproducer (argmax_repro.cpp) ---
#include <algorithm>
#include <ranges>
#include <span>
#include <vector>
int iter_vec(const std::vector<int>& v) {
    return *std::max_element(v.begin(), v.end());
}
int iter_span(std::span<const int> s) {
    return *std::max_element(s.begin(), s.end());
}
int loop_vec(const std::vector<int>& v) {
    int m = v.front();
    for (int x : v) if (x > m) m = x;
    return m;
}
int loop_span(std::span<const int> s) {
    int m = s.front();
    for (int x : s) if (x > m) m = x;
    return m;
}
int ranges_vec(const std::vector<int>& v) {
    return std::ranges::max(v);
}
int ranges_span(std::span<const int> s) {
    return std::ranges::max(s);
}
--- Command to reproduce ---
g++ -std=c++20 -O2 -mavx2 -S -o argmax_repro.s argmax_repro.cpp
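Then inspect the bodies of iter_vec / iter_span for vpmaxsd. The count below
covers the whole file, including the four functions that still vectorize, so a
nonzero count alone does not rule out the regression:
grep -c vpmaxsd argmax_repro.s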
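A per-function tally makes the GCC 13 / GCC 15 comparison easier to read than a single file-wide number. The helper below is only a sketch and is not part of the reproducer: `count_per_function` is a hypothetical name, and it assumes GCC's usual assembly layout, where function labels start in column 0 and end with `:` while local labels such as `.L4:` begin with a dot.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count the lines containing `needle` under each function label of a
// GCC-style assembly listing.
// Assumption: a function label starts in column 0, does not begin with '.'
// or '#', and ends with ':'. Local labels (".L4:") and indented directives
// are not treated as function starts.
std::map<std::string, int> count_per_function(std::istream& in,
                                              const std::string& needle) {
    std::map<std::string, int> counts;
    std::string line;
    std::string current;  // label of the function being scanned, if any
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        const char c = line.front();
        const bool is_label = c != '.' && c != '#' && c != ' ' && c != '\t' &&
                              line.back() == ':';
        if (is_label) {
            current = line.substr(0, line.size() - 1);
            counts[current];  // list the function even if it has zero hits
        } else if (!current.empty() &&
                   line.find(needle) != std::string::npos) {
            ++counts[current];
        }
    }
    return counts;
}
```

Feeding it a `std::ifstream` opened on argmax_repro.s with the needle "vpmaxsd" should, on an affected compiler, report zero for the two iter_* symbols and nonzero counts for the other four.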
--- Expected result ---
All six functions emit vpmaxsd instructions (SIMD max reduction), as GCC 13
does. The iterator in std::max_element is only dereferenced at the call site
and never escapes, so the pointer recurrence is dead and the loop is
semantically a value max.
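The equivalence claimed here can be sketched with two hand-written loops. This is an illustration only, not libstdc++'s actual implementation: `max_via_argmax` and `max_via_value` are hypothetical names, and the first function merely mirrors the general shape of a max_element loop (a pointer to the best element is the loop-carried state).

```cpp
#include <cassert>

// Shape of the std::max_element loop: the loop-carried state is a pointer to
// the best element seen so far (an argmax); the value is loaded at the end.
inline int max_via_argmax(const int* first, const int* last) {
    assert(first != last);  // dereferencing the end pointer is undefined
    const int* best = first;
    for (const int* p = first + 1; p != last; ++p)
        if (*best < *p) best = p;  // carries the pointer across iterations
    return *best;
}

// Same result with the value itself as the only loop-carried state. This is
// the form a compiler can use when the pointer is only dereferenced once
// after the loop and is otherwise unused.
inline int max_via_value(const int* first, const int* last) {
    assert(first != last);
    int best = *first;
    for (const int* p = first + 1; p != last; ++p)
        if (best < *p) best = *p;
    return best;
}
```

Both functions return the same int for every non-empty range. Which element the pointer refers to in case of ties only matters if the pointer itself is observed, which is not the case in iter_vec and iter_span.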
--- Actual result (GCC 15) ---
iter_vec and iter_span emit a scalar cmovl argmax loop with zero vpmaxsd
instructions. The remaining four functions (loop_vec, loop_span, ranges_vec,
ranges_span) continue to vectorize correctly.
--- Assembly analysis (AVX2, -O2) ---
iter_vec on GCC 13.2.1 — SIMD reduction via vpmaxsd:
_Z8iter_vecRKSt6vectorIiSaIiEE:
movq 8(%rdi), %rdx
movq (%rdi), %rsi
cmpq %rdx, %rsi
je .L2
vmovd (%rsi), %xmm0
leaq 4(%rsi), %rax
vmovd %xmm0, %ecx
cmpq %rax, %rdx
je .L1
movq %rdx, %rcx
subq %rax, %rcx
andl $4, %ecx
je .L4
vmovd (%rax), %xmm1
leaq 8(%rsi), %rax
vpmaxsd %xmm1, %xmm0, %xmm0 ; <-- SIMD integer max
vmovd %xmm0, %ecx
cmpq %rax, %rdx
je .L1
.L4:
vmovd (%rax), %xmm1
addq $8, %rax
vpmaxsd %xmm1, %xmm0, %xmm0 ; <-- SIMD integer max
vmovd -4(%rax), %xmm1
vpmaxsd %xmm1, %xmm0, %xmm0 ; <-- SIMD integer max
vmovd %xmm0, %ecx
cmpq %rax, %rdx
jne .L4
.L1:
movl %ecx, %eax
ret
iter_vec on GCC 15.2.0 — scalar cmovl argmax (REGRESSION):
_Z8iter_vecRKSt6vectorIiSaIiEE:
movq 8(%rdi), %rcx
movq (%rdi), %rdx
cmpq %rcx, %rdx
je .L2
leaq 4(%rdx), %rax
cmpq %rax, %rcx
je .L2
.L4:
movl (%rax), %esi
cmpl %esi, (%rdx)
cmovl %rax, %rdx ; carries the POINTER across iterations
addq $4, %rax
cmpq %rax, %rcx
jne .L4
.L2:
movl (%rdx), %eax
ret
The iter_span function shows the identical pattern on each compiler,
confirming the iterator type (vector::iterator vs raw const int*) is not the
discriminator.