https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Dec 2021, avi at scylladb dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
>
> --- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
> Here's some big-picture data. Compiled with clang, which seems to ignore these
> STLF issues.
>
> no-slp:
>
> 42641.91 tps ( 75.1 allocs/op, 12.1 tasks/op, 44929 insns/op)
> 42446.41 tps ( 75.1 allocs/op, 12.1 tasks/op, 44870 insns/op)
> 42495.03 tps ( 75.1 allocs/op, 12.1 tasks/op, 44931 insns/op)
> 42703.40 tps ( 75.1 allocs/op, 12.1 tasks/op, 44916 insns/op)
> 42798.98 tps ( 75.1 allocs/op, 12.1 tasks/op, 44963 insns/op)
>
> slp:
>
> 41536.46 tps ( 75.1 allocs/op, 12.1 tasks/op, 44828 insns/op)
> 41482.05 tps ( 75.1 allocs/op, 12.1 tasks/op, 44802 insns/op)
> 41707.23 tps ( 75.1 allocs/op, 12.1 tasks/op, 44874 insns/op)
> 41811.10 tps ( 75.1 allocs/op, 12.1 tasks/op, 44847 insns/op)
> 41764.39 tps ( 75.1 allocs/op, 12.1 tasks/op, 44846 insns/op)
>
> So slp definitely has negative impact on ops/sec, even though it reduces
> instructions/op. This is on an older machine (newer ones have ~5X perf, with
> 3X higher IPC and the rest due to higher frequency).

Is that with the function inlined? Can you show the argument setup code
at the caller side?
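
For reference, the size of the effect in the quoted figures can be checked with a short script. This is only a sketch: the numbers are copied from comment #5, while the variable names and the choice of a plain arithmetic mean over the five runs are assumptions made here for illustration, not something either poster specified. With only five runs per configuration it says nothing about statistical significance.

```python
from statistics import mean

# (tps, insns/op) pairs copied from the quoted comment #5.
no_slp = [
    (42641.91, 44929),
    (42446.41, 44870),
    (42495.03, 44931),
    (42703.40, 44916),
    (42798.98, 44963),
]
slp = [
    (41536.46, 44828),
    (41482.05, 44802),
    (41707.23, 44874),
    (41811.10, 44847),
    (41764.39, 44846),
]


def summarize(runs):
    """Return (mean tps, mean insns/op) for a list of (tps, insns) runs."""
    return mean(r[0] for r in runs), mean(r[1] for r in runs)


no_slp_tps, no_slp_insns = summarize(no_slp)
slp_tps, slp_insns = summarize(slp)

# Relative change going from no-slp to slp, in percent of the no-slp mean.
tps_drop_pct = 100.0 * (no_slp_tps - slp_tps) / no_slp_tps
insns_drop_pct = 100.0 * (no_slp_insns - slp_insns) / no_slp_insns

# True when every slp run is slower than every no-slp run.
ranges_disjoint = max(r[0] for r in slp) < min(r[0] for r in no_slp)

print(f"mean tps:      no-slp {no_slp_tps:.2f}  slp {slp_tps:.2f}  drop {tps_drop_pct:.2f}%")
print(f"mean insns/op: no-slp {no_slp_insns:.1f}  slp {slp_insns:.1f}  drop {insns_drop_pct:.2f}%")
print(f"slp runs all slower than no-slp runs: {ranges_disjoint}")
```

On these figures the throughput loss is roughly an order of magnitude larger, in relative terms, than the instruction-count saving, which is consistent with the observation in comment #5 that slp lowers tps despite executing fewer instructions per op.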