On 15.02.12 20:47, Alexander Motin wrote:
> On 02/14/12 00:38, Alexander Motin wrote:
>> I see no much point in committing them sequentially, as they are quite
>> orthogonal. I need to make one decision. I am going on small vacation
>> next week. It will give time for thoughts to settle. May be I indeed
>> just clean previous patch a bit and commit it when I get back. I've
>> spent too much time trying to make these things formal and so far
>> results are not bad, but also not so brilliant as I would like. May be
>> it is indeed time to step back and try some more simple solution.
>
> I've decided to stop those cache black magic practices and focus on
> things that really exist in this world -- SMT and CPU load. I've dropped
> most of cache related things from the patch and made the rest of things
> more strict and predictable:
> http://people.freebsd.org/~mav/sched.htt34.patch
>
> This patch adds check to skip fast previous CPU selection if it's SMT
> neighbor is in use, not just if no SMT present as in previous patches.
>
> I've took affinity/preference algorithm from the first patch and
> improved it. That makes pickcpu() to prefer previous core or it's
> neighbors in case of equal load. That is very simple to keep it, but
> still should give cache hits.
>
> I've changed the general algorithm of topology tree processing. First I
> am looking for idle core on the same last-level cache as before, with
> affinity to previous core or it's neighbors on higher level caches.
> Original code could put additional thread on already busy core, while
> next socket is completely idle. Now if there is no idle core on this
> cache, then all other CPUs are checked.
>
> CPU groups comparison now done in two steps: first, same as before,
> compared summary load of all cores; but now, if it is equal, I am
> comparing load of the less/most loaded cores. That should allow to
> differentiate whether load 2 really means 1+1 or 2+0. In that case group
> with 2+0 will be taken as more loaded than one with 1+1, making group
> choice more grounded and predictable.
>
> I've added randomization in case if all above factors are equal.
>
> As before I've tested this on Core i7-870 with 4 physical and 8 logical
> cores and Atom D525 with 2 physical and 4 logical cores. On Core i7 I've
> got speedup up to 10-15% in super-smack MySQL and PostgreSQL indexed
> select for 2-8 threads and no penalty in other cases. pbzip2 shows up to
> 13% performance increase for 2-5 threads and no penalty in other cases.
>
> Tests on Atom show mostly about the same performance as before in
> database benchmarks: faster for 1 thread, slower for 2-3 and about the
> same for other cases. Single stream network performance improved same as
> for the first patch. That CPU is quite difficult to handle as with mix
> of effective SMT and lack of L3 cache different scheduling approaches
> give different results in different situations.
>
> Specific performance numbers can be found here:
> http://people.freebsd.org/~mav/bench.ods
> Every point there includes at least 5 samples and except pbzip2 test
> that is quite unstable with previous sources all are statistically valid.
>
> Florian is now running alternative set of benchmarks on dual-socket
> hardware without SMT.
>
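
For readers following the thread, the two-step group comparison described in the quoted message can be sketched roughly as below. This is only an illustration in Python, not code from the patch: the names group_key and pick_least_loaded are hypothetical, each group is assumed to be a non-empty list of per-core loads, and using the busiest core as the second comparison step is one reading of "less/most loaded cores".

```python
import random


def group_key(core_loads):
    """Comparison key for a CPU group (hypothetical helper).

    Step 1: summary load of all cores in the group.
    Step 2: load of the busiest core, used only when step 1 ties.
    """
    return (sum(core_loads), max(core_loads))


def pick_least_loaded(groups, rng=random):
    """Return the index of the least loaded group.

    Groups that tie on both steps are chosen between at random,
    mirroring the randomization mentioned in the quoted message.
    """
    best = min(group_key(g) for g in groups)
    candidates = [i for i, g in enumerate(groups) if group_key(g) == best]
    return rng.choice(candidates)
```

With this key, a group with loads 2+0 compares as more loaded than one with loads 1+1 even though both sum to 2, so pick_least_loaded([[2, 0], [1, 1]]) selects the second group.
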
I have updated my PostgreSQL [1] and pbzip2 [2] benchmarks. You should be
looking for "ULE+mav-htt33". On a system without HTT this patch is at least
as good as stock ULE and in some cases it's a nice improvement.

Florian

[1] https://docs.google.com/spreadsheet/ccc?key=0Ai0N1xDe3uNAdDRxcVFiYjNMSnJWOTZhUWVWWlBlemc&hl=en_US&pli=1#gid=4
[2] https://docs.google.com/spreadsheet/ccc?key=0Ai0N1xDe3uNAdDRxcVFiYjNMSnJWOTZhUWVWWlBlemc&hl=en_US&pli=1#gid=2

