https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
--- Comment #16 from Daniel Borkmann <daniel at iogearbox dot net> ---
(In reply to Martin Liška from comment #15)
> (In reply to Daniel Borkmann from comment #12)
> > I've been looking into this issue quite recently and improved the benchmark
> > tool a bit along the way. There need to be multiple considerations wrt to
> > traversing the switch cases, the case is here is doing round robin, but
> > additional distributions / tests could be added. Pushed here just in case:
> > https://github.com/borkmann/microbenchmark
>
> Thanks a lot for the benchmark.
>
> > Numbers I'm getting are stable:
> >
> > * Xeon E3-1240, packet.net c1.small.x86 instance:
> >
> > # make prep
> > [...]
> > # make
> > gcc -g -I. -O2 -c -o test.o test.c
> > gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20 -c -o switch-no-table.o switch-no-table.c
> > gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
> > gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
> > gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> > taskset 1 ./test
> > no retpoline : 6098325270
> > no jump table: 6298192058 (no retpoline: 103.28%)
> > jump table   : 22081802856 (no retpoline: 362.10%, no jump table: 350.61%)
> > # make
> > taskset 1 ./test
> > no retpoline : 6098439816
> > no jump table: 6298242270 (no retpoline: 103.28%)
> > jump table   : 22107872854 (no retpoline: 362.52%, no jump table: 351.02%)
> > # make
> > taskset 1 ./test
> > no retpoline : 6098187038
> > no jump table: 6298308128 (no retpoline: 103.28%)
> > jump table   : 22071053524 (no retpoline: 361.93%, no jump table: 350.43%)
> >
> > * Xeon Gold 5120, packet.net m2.xlarge.x86 instance:
> >
> > # make prep
> > [...]
> > # make
> > gcc -g -I. -O2 -c -o test.o test.c
> > gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20 -c -o switch-no-table.o switch-no-table.c
> > gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
> > gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
> > gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> > taskset 1 ./test
> > no retpoline : 5450356814
> > no jump table: 5620673036 (no retpoline: 103.12%)
> > jump table   : 21448285314 (no retpoline: 393.52%, no jump table: 381.60%)
> > # make
> > taskset 1 ./test
> > no retpoline : 5450356100
> > no jump table: 5620678302 (no retpoline: 103.12%)
> > jump table   : 21448119720 (no retpoline: 393.52%, no jump table: 381.59%)
> > # make
> > taskset 1 ./test
> > no retpoline : 5450331258
> > no jump table: 5620839740 (no retpoline: 103.13%)
> > jump table   : 21446922902 (no retpoline: 393.50%, no jump table: 381.56%)
>
> I can confirm the numbers. I've got:
> model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>
> taskset 1 ./test
> no retpoline : 4311969467
> no jump table: 5146081372 (no retpoline: 119.34%)
> jump table   : 18845846887 (no retpoline: 437.06%, no jump table: 366.22%)

Ok, great, thanks for testing on your side as well!

> > I've also looked into clang for their -mretpoline flag, and they generally
> > turn off jump table generation in this case. For gcc, the s390 folks
> > implemented a target override for the default case-values-threshold to raise
> > it to 20.
>
> Note that GCC has similar parameter:
>
> --param case-values-threshold
> The smallest number of different values for which it is best
> to use a jump-table instead of a tree of conditional branches. If the value
> is 0, use the default for the machine. The default is 0.

Yeah, I know, I've used it above for the test case (see the gcc cmdline parts).

> For 20 branches, I've got even worse numbers:
> https://github.com/marxin/microbenchmark-1/tree/retpoline-table
>
> taskset 1 ./test
> no retpoline : 5096377521
> no jump table: 5169400990 (no retpoline: 101.43%)
> jump table   : 28830137876 (no retpoline: 565.70%, no jump table: 557.71%)
>
> So are you suggesting to disable jump tables with retpolines at all?
I leave that up to you guys, but at minimum I would probably implement
something like what the s390 folks did for gcc in commit db7a90aa0de5
("S/390: Disable prediction of indirect branches"); see
s390_case_values_threshold(), which does:

+unsigned int
+s390_case_values_threshold (void)
+{
+  /* Disabling branch prediction for indirect jumps makes jump tables
+     much more expensive.  */
+  if (TARGET_INDIRECT_BRANCH_NOBP_JUMP)
+    return 20;
+
+  return default_case_values_threshold ();
+}

> > For x86 something similar could be done. Anyway, H.J. Lu asked me
> > to reopen this issue (but seems like I cannot make this change from my
> > account).
>
> Yep, I would need an account ending with @gcc.org to change a bug.
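
To make the x86 suggestion a bit more concrete, here is a minimal standalone sketch of the same idea. It is not GCC code: case_values_threshold_sketch(), default_threshold_stub() and would_use_jump_table() are hypothetical names, the default of 4 is only an assumed placeholder, and the "number of cases >= threshold" test is a simplification of the real switch lowering, which weighs more than the case count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target-independent default threshold.
   The value 4 is an assumption made for this sketch only.  */
unsigned int
default_threshold_stub (void)
{
  return 4;
}

/* Hypothetical x86 analogue of s390_case_values_threshold (): when
   indirect branches are routed through a thunk (retpoline), require
   more case values before a jump table is considered worthwhile.  */
unsigned int
case_values_threshold_sketch (bool indirect_branch_thunk)
{
  if (indirect_branch_thunk)
    return 20;

  return default_threshold_stub ();
}

/* Simplified decision: a switch is a jump table candidate only if it
   has at least as many case values as the threshold.  */
bool
would_use_jump_table (unsigned int ncases, bool indirect_branch_thunk)
{
  return ncases >= case_values_threshold_sketch (indirect_branch_thunk);
}
```

With these assumptions, a 10-case switch would still be a jump table candidate without retpolines, but would fall back to a tree of conditional branches with -mindirect-branch=thunk, which is what --param=case-values-threshold=20 did by hand in the benchmark builds above.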