https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED

--- Comment #15 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Daniel Borkmann from comment #12)
> I've been looking into this issue quite recently and improved the benchmark
> tool a bit along the way. There need to be multiple considerations wrt to
> traversing the switch cases, the case is here is doing round robin, but
> additional distributions / tests could be added. Pushed here just in case:
> https://github.com/borkmann/microbenchmark

Thanks a lot for the benchmark.

>
> Numbers I'm getting are stable:
>
> * Xeon E3-1240, packet.net c1.small.x86 instance:
>
> # make prep
> [...]
> # make
> gcc -g -I. -O2 -c -o test.o test.c
> gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20
> -c -o switch-no-table.o switch-no-table.c
> gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
> gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
> gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> taskset 1 ./test
> no retpoline : 6098325270
> no jump table: 6298192058 (no retpoline: 103.28%)
> jump table   : 22081802856 (no retpoline: 362.10%, no jump table:
> 350.61%)
> # make
> taskset 1 ./test
> no retpoline : 6098439816
> no jump table: 6298242270 (no retpoline: 103.28%)
> jump table   : 22107872854 (no retpoline: 362.52%, no jump table:
> 351.02%)
> # make
> taskset 1 ./test
> no retpoline : 6098187038
> no jump table: 6298308128 (no retpoline: 103.28%)
> jump table   : 22071053524 (no retpoline: 361.93%, no jump table:
> 350.43%)
>
> * Xeon Gold 5120, packet.net m2.xlarge.x86 instance:
>
> # make prep
> [...]
> # make
> gcc -g -I. -O2 -c -o test.o test.c
> gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20
> -c -o switch-no-table.o switch-no-table.c
> gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
> gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
> gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> taskset 1 ./test
> no retpoline : 5450356814
> no jump table: 5620673036 (no retpoline: 103.12%)
> jump table   : 21448285314 (no retpoline: 393.52%, no jump table:
> 381.60%)
> # make
> taskset 1 ./test
> no retpoline : 5450356100
> no jump table: 5620678302 (no retpoline: 103.12%)
> jump table   : 21448119720 (no retpoline: 393.52%, no jump table:
> 381.59%)
> # make
> taskset 1 ./test
> no retpoline : 5450331258
> no jump table: 5620839740 (no retpoline: 103.13%)
> jump table   : 21446922902 (no retpoline: 393.50%, no jump table:
> 381.56%)

I can confirm the numbers. I've got:

model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

taskset 1 ./test
no retpoline : 4311969467
no jump table: 5146081372 (no retpoline: 119.34%)
jump table   : 18845846887 (no retpoline: 437.06%, no jump table: 366.22%)

>
> I've also looked into clang for their -mretpoline flag, and they generally
> turn off jump table generation in this case. For gcc, the s390 folks
> implemented a target override for the default case-values-threshold to raise
> it to 20.

Note that GCC has similar parameter:

--param case-values-threshold
    The smallest number of different values for which it is best to use a
    jump-table instead of a tree of conditional branches. If the value is 0,
    use the default for the machine. The default is 0.

For 20 branches, I've got even worse numbers:
https://github.com/marxin/microbenchmark-1/tree/retpoline-table

taskset 1 ./test
no retpoline : 5096377521
no jump table: 5169400990 (no retpoline: 101.43%)
jump table   : 28830137876 (no retpoline: 565.70%, no jump table: 557.71%)

So are you suggesting to disable jump tables with retpolines at all?

> For x86 something similar could be done. Anyway, H.J. Lu asked me
> to reopen this issue (but seems like I cannot make this change from my
> account).

Yep, I would need an account ending with @gcc.org to change a bug.
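
For readers without the benchmark checked out, here is a minimal sketch of the kind of code being measured: a dense switch driven round robin, as described in comment #12. It is not the source from either repository linked above. The names dispatch and run_round_robin, the choice of 8 cases, and the per-case operations are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical dense switch with 8 consecutive case values.
 * Whether GCC lowers this to a jump table or to a tree of conditional
 * branches depends on the target default and on
 * --param=case-values-threshold; this sketch does not force either.
 */
uint64_t dispatch(unsigned op, uint64_t acc)
{
    switch (op) {
    case 0: return acc + 1;
    case 1: return acc ^ 0x55;
    case 2: return acc * 3;
    case 3: return acc - 7;
    case 4: return acc << 1;
    case 5: return acc | 0x10;
    case 6: return acc + 11;
    case 7: return acc >> 1;
    default: return acc;   /* out-of-range values leave acc unchanged */
    }
}

/*
 * Round-robin driver: visits case 0, 1, ..., 7, 0, 1, ... in order and
 * threads the result through so the calls cannot be optimized away.
 */
uint64_t run_round_robin(uint64_t iterations)
{
    uint64_t acc = 1;
    for (uint64_t i = 0; i < iterations; i++)
        acc = dispatch((unsigned)(i % 8), acc);
    return acc;
}
```

To compare code generation, such a file could be built with the same flags as in the make output quoted above, once with -O2 -mindirect-branch=thunk and once with --param=case-values-threshold=20 added, and the two results inspected with objdump -d. With only 8 cases, the second build would be expected to avoid a jump table, but that should be checked against the actual disassembly and not assumed.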