https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
Here's some big-picture data. Compiled with clang, which seems to ignore these
store-to-load forwarding (STLF) issues.

no-slp:

42641.91 tps ( 75.1 allocs/op,  12.1 tasks/op,   44929 insns/op)
42446.41 tps ( 75.1 allocs/op,  12.1 tasks/op,   44870 insns/op)
42495.03 tps ( 75.1 allocs/op,  12.1 tasks/op,   44931 insns/op)
42703.40 tps ( 75.1 allocs/op,  12.1 tasks/op,   44916 insns/op)
42798.98 tps ( 75.1 allocs/op,  12.1 tasks/op,   44963 insns/op)

slp:

41536.46 tps ( 75.1 allocs/op,  12.1 tasks/op,   44828 insns/op)
41482.05 tps ( 75.1 allocs/op,  12.1 tasks/op,   44802 insns/op)
41707.23 tps ( 75.1 allocs/op,  12.1 tasks/op,   44874 insns/op)
41811.10 tps ( 75.1 allocs/op,  12.1 tasks/op,   44847 insns/op)
41764.39 tps ( 75.1 allocs/op,  12.1 tasks/op,   44846 insns/op)

So slp definitely has a negative impact on ops/sec (tps), even though it reduces
instructions/op: every slp run is slower than every no-slp run.

This is on an older machine (newer ones have ~5X perf, with 3X higher IPC and
the rest due to higher frequency).
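
To put rough numbers on the comparison above, here is a small sketch that averages the five runs of each configuration and reports the relative change. The figures are copied from the comment. The names `NO_SLP`, `SLP`, `summarize` and `pct_change` are made up for this illustration and are not part of the benchmark tooling.

```python
from statistics import mean

# (tps, insns/op) pairs copied from the runs quoted in the comment.
NO_SLP = [
    (42641.91, 44929),
    (42446.41, 44870),
    (42495.03, 44931),
    (42703.40, 44916),
    (42798.98, 44963),
]
SLP = [
    (41536.46, 44828),
    (41482.05, 44802),
    (41707.23, 44874),
    (41811.10, 44847),
    (41764.39, 44846),
]


def summarize(runs):
    """Return (mean tps, mean insns/op) for a list of (tps, insns/op) runs."""
    return mean(r[0] for r in runs), mean(r[1] for r in runs)


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


no_slp_tps, no_slp_insns = summarize(NO_SLP)
slp_tps, slp_insns = summarize(SLP)

print(f"tps change with slp:      {pct_change(slp_tps, no_slp_tps):+.2f}%")
print(f"insns/op change with slp: {pct_change(slp_insns, no_slp_insns):+.2f}%")
```

On these ten runs this works out to roughly 2.2% lower tps with slp against roughly 0.2% fewer instructions per op. Five runs per configuration is a small sample, so treat the averages as indicative only.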