https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Dec 2021, avi at scylladb dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> 
> --- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
> Here's some big-picture data. Compiled with clang, which seems to ignore these
> STLF (store-to-load forwarding) issues.
> 
> no-slp:
> 
> 42641.91 tps ( 75.1 allocs/op,  12.1 tasks/op,   44929 insns/op)
> 42446.41 tps ( 75.1 allocs/op,  12.1 tasks/op,   44870 insns/op)
> 42495.03 tps ( 75.1 allocs/op,  12.1 tasks/op,   44931 insns/op)
> 42703.40 tps ( 75.1 allocs/op,  12.1 tasks/op,   44916 insns/op)
> 42798.98 tps ( 75.1 allocs/op,  12.1 tasks/op,   44963 insns/op)
> 
> slp:
> 
> 41536.46 tps ( 75.1 allocs/op,  12.1 tasks/op,   44828 insns/op)
> 41482.05 tps ( 75.1 allocs/op,  12.1 tasks/op,   44802 insns/op)
> 41707.23 tps ( 75.1 allocs/op,  12.1 tasks/op,   44874 insns/op)
> 41811.10 tps ( 75.1 allocs/op,  12.1 tasks/op,   44847 insns/op)
> 41764.39 tps ( 75.1 allocs/op,  12.1 tasks/op,   44846 insns/op)
> 
> So SLP definitely has a negative impact on ops/sec, even though it reduces
> instructions/op. This is on an older machine (newer ones have ~5X perf, with 3X
> higher IPC and the rest due to higher frequency).

Is that with the function inlined?  Can you show the argument setup
code at the caller side?
