Hi Kristian,

just out of curiosity: is it possible to find out which functions cause the highest number of icache misses? Could it have anything to do with branch misprediction?

Regards,
Sergey

On Fri, Jan 24, 2014 at 03:51:25PM +0100, Kristian Nielsen wrote:
> I have been analysing CPU bottlenecks in single-threaded sysbench read-only
> load. I found that icache misses are the main bottleneck, and that
> profile-guided compiler optimisation (PGO) with GCC gives a large speedup, 25%
> or more.
>
> (More details in my blog posts:
>
>     http://kristiannielsen.livejournal.com/17676.html
>     http://kristiannielsen.livejournal.com/18168.html
> )
>
> Now I would like to ask for some discussions/help in how to get this
> implemented in practice. It involves changing the build process for our
> binaries: First compile with gcc --coverage, then run some profile workload,
> then recompile with -fprofile-use.
>
> I implemented a simple program to generate some profile load:
>
>     https://github.com/knielsen/gen_profile_load
>
> It runs a bunch of simple insert/select/update/delete, with different
> combinations of storage engine, binlog format, and client API. It is designed
> to run inside the build tree and handle starting and stopping the server being
> tested, so it is pretty close to a working setup. These commands work to
> generate a binary that is faster due to PGO:
>
>     mkdir bld
>     cd bld
>     cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 \
>       -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>       -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" \
>       -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" \
>       ..
>     make
>
>     tests/gen_profile_load
>
>     cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 \
>       -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>       -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" \
>       -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction"
>     make
>
> So all the pieces really are there, it should be possible to implement it. But
> we need to find a good way to integrate it into our build system.
>
> The best would be to integrate it into our cmake files.
>
> The gen_profile_load.c could go into tests/, ideally we would build both a
> static and dynamically linked version (so we get PGO for both libmysqlclient.a
> and libmysqlclient.so). Can anyone help me get cmake to do that?
>
> And it would be cool if we could get the above procedure to work completely
> within cmake, so that the user could just do:
>
>     cmake -DWITH_PGO ... ; make
>
> and cmake would itself handle first building with --coverage, then running
> gen_profile_load.static and gen_profile_load.dynamic, then rebuilding with
> -fprofile-use. Does anyone know if this is possible with cmake, and if so could
> help implement it?
>
> But alternatively, we could integrate a double build, like the commands above,
> into the buildbot scripts (.deb, .rpm, bintar).
>
> Any comments? Here are some more points:
>
>  - I tested that gen_profile_load gives a good speedup of sysbench read-only
>    (around 30%, so still very significant even though it generates a different
>    and more varied load).
>
>  - As another test, I removed all SELECT from gen_profile_load, and ran the
>    resulting PGO binary with sysbench read-only. This still gave a fair
>    speedup, despite the PGO load being completely different from the benchmark
>    load. This gives me confidence that the PGO should not cause performance
>    regressions in cases not covered well by gen_profile_load.
>
>  - More tests would be nice, of course. Axel, would you be able to build some
>    binaries following the above procedure, and test some different random
>    benchmarks? Anything that is easy to run could be interesting, both to test
>    for improvement, and to check against regressions.
>
>  - We probably need a recent GCC version to get good results. I used GCC
>    version 4.7.2. Maybe we should install this GCC version in all the VMs we
>    use to build binaries?
>
>  - Should we do this in 5.5? I think we might want to. The speedup is quite
>    significant, and it seems very safe - no code modifications are involved,
>    only different compiler options.
>
> Any thoughts? Volunteers for helping with the cmake or buildbot parts?
>
>  - Kristian.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : [email protected]
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
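
For the buildbot "double build" variant mentioned in the quoted message, the sequence of commands could perhaps be driven by a small script. The following is only a rough sketch in Python, not something taken from the thread: the function names (`cmake_args`, `pgo_build_plan`, `run_plan`) are made up for illustration, and it assumes the script is started from an existing build directory (`bld`) with the source tree one level up. Passing `..` to the second cmake call is also an assumption; the original commands omit it.

```python
import shlex
import subprocess

# Flags common to both phases, taken from the commands in the quoted mail.
BASE_FLAGS = "-Wno-maybe-uninitialized -g -O3"


def cmake_args(phase_flags, source_dir=".."):
    """Build the cmake command line for one phase (hypothetical helper)."""
    flags = f"{BASE_FLAGS} {phase_flags}"
    return [
        "cmake",
        "-DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1",
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",
        f"-DCMAKE_C_FLAGS_RELWITHDEBINFO={flags}",
        f"-DCMAKE_CXX_FLAGS_RELWITHDEBINFO={flags}",
        source_dir,  # assumption: source tree is the parent directory
    ]


def pgo_build_plan(source_dir=".."):
    """Return the five steps of the two-phase PGO build as argument lists."""
    return [
        cmake_args("--coverage", source_dir),       # 1. instrumented configure
        ["make"],                                   # 2. instrumented build
        ["tests/gen_profile_load"],                 # 3. generate profile data
        cmake_args("-fprofile-use -fprofile-correction", source_dir),
        ["make"],                                   # 5. optimised rebuild
    ]


def run_plan(plan, build_dir="."):
    """Run each step in build_dir, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, cwd=build_dir, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in pgo_build_plan():
        print(shlex.join(cmd))
```

Running the script as-is only prints the commands; calling `run_plan(pgo_build_plan())` from inside the build directory would actually execute them.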

