On Mon, 2015-09-28 at 13:28 +0300, Kirill Tkhai wrote:
> Looks like NAK may be better, because it saves L1 cache, while the patch
> always invalidates it.
Yeah, bounce hurts more when there's no concurrency win waiting to be
collected. This mixed load wasn't a great choice, but it turned out to
be pretty interesting. Something waking a gaggle of waiters on a busy
big socket should do very bad things.
> Could you tell me whether you run pgbench with just -cX -jY -T30, or with
> something special? I've tried it, but the spread of the results varies a
> lot from run to run.
pgbench -T $testtime -j 1 -S -c $clients
> > Ok, that's what I want to see, full repeat.
> > master = twiddle
> > master+ = twiddle+patch
> >
> > concurrent tbench 4 + pgbench, 2 minutes per client count (i4790+smt)
> >                           master                      master+
> > pgbench              1      2      3    avg        1      2      3    avg    comp
> > clients 1 tps =  18599  18627  18532  18586    17480  17682  17606  17589    .946
> > clients 2 tps =  32344  32313  32408  32355    25167  26140  23730  25012    .773
> > clients 4 tps =  52593  51390  51095  51692    22983  23046  22427  22818    .441
> > clients 8 tps =  70354  69583  70107  70014    66924  66672  69310  67635    .966
> >
> >
> > Hrm... turn the tables, measure tbench while pgbench 4 client load runs
> > endlessly.
> >
> >                           master                      master+
> > tbench               1      2      3    avg        1      2      3    avg    comp
> > pairs 1 MB/s =     430    426    436    430      481    481    494    485   1.127
> > pairs 2 MB/s =    1083   1085   1072   1080     1086   1090   1083   1086   1.005
> > pairs 4 MB/s =    1725   1697   1729   1717     2023   2002   2006   2010   1.170
> > pairs 8 MB/s =    2740   2631   2700   2690     3016   2977   3071   3021   1.123
> >
> > tbench without competition
> >                   master  master+    comp
> > pairs 1 MB/s =       694      692    .997
> > pairs 2 MB/s =      1268     1259    .992
> > pairs 4 MB/s =      2210     2165    .979
> > pairs 8 MB/s =      3586     3526    .983  (yawn, all within routine variance)
>
> Hm, it seems tbench with competition does better only because a busy system
> makes tbench processes get woken on the same cpu.
Yeah. When the box is really full, select_idle_sibling() (obviously) turns
into a waste of cycles, but even as you approach that, especially when
filling the box with identical copies of nearly fully synchronous high
frequency localhost packet blasters, stacking is a win.
What bent my head up a bit was the combined effect of making wake_wide()
really keep pgbench from collapsing, then adding the affine wakeup grant
for tbench. It's not at all clear to me why the 2- and 4-client pgbench
runs would be so demolished.
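For anyone re-deriving the avg and comp columns in the tables above from the
raw runs, here is a minimal sketch. It assumes avg is the integer-truncated
mean of the three runs and comp is master+ avg divided by master avg,
truncated (not rounded) to three decimals. That convention reproduces every
row above, but it is inferred from the numbers, not stated anywhere; the
helper name comp_row is hypothetical.

```shell
#!/bin/sh
# comp_row (hypothetical helper): given three master runs followed by three
# master+ runs, print the master avg, the master+ avg and the comp ratio.
# Assumption: avg = int(mean of 3 runs), comp = master+ avg / master avg,
# truncated to three decimals (inferred from the tables, not documented).
comp_row() {
    awk -v a1="$1" -v a2="$2" -v a3="$3" \
        -v b1="$4" -v b2="$5" -v b3="$6" 'BEGIN {
        ma = int((a1 + a2 + a3) / 3)        # master avg, truncated
        pa = int((b1 + b2 + b3) / 3)        # master+ avg, truncated
        comp = int(1000 * pa / ma) / 1000   # ratio truncated to 3 places
        printf "%d %d %.3f\n", ma, pa, comp
    }'
}

comp_row 18599 18627 18532 17480 17682 17606   # pgbench, clients 1 tps
comp_row 52593 51390 51095 22983 23046 22427   # pgbench, clients 4 tps
comp_row 430 426 436 481 481 494               # tbench, pairs 1 MB/s
```

This prints `18586 17589 0.946`, `51692 22818 0.441` and `430 485 1.127`,
matching the corresponding rows. The same truncated ratio also gives the
no-competition column, e.g. 2165/2210 -> .979.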
-Mike