https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

Daniel Borkmann <daniel at iogearbox dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel at iogearbox dot net

--- Comment #12 from Daniel Borkmann <daniel at iogearbox dot net> ---
I've been looking into this issue recently and improved the benchmark
tool a bit along the way. There are multiple considerations with respect to
traversing the switch cases; the case here is doing round robin, but
additional distributions / tests could be added. Pushed here just in case:
https://github.com/borkmann/microbenchmark
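
To illustrate what round-robin traversal of the switch cases means, here is a
minimal sketch. It is not the code from the repository above: dispatch(),
run_round_robin() and NUM_CASES are hypothetical names, and the case bodies
are arbitrary operations chosen only so that each case does something
different.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CASES 8u

/* The switch below is written for exactly eight dense case values. */
static_assert(NUM_CASES == 8u, "switch in dispatch() covers cases 0..7");

/* Dense switch: a candidate for jump table lowering by the compiler. */
uint64_t dispatch(unsigned int op, uint64_t acc)
{
	switch (op) {
	case 0: return acc + 1;
	case 1: return acc ^ 0x55;
	case 2: return acc * 3;
	case 3: return acc - 7;
	case 4: return acc << 1;
	case 5: return acc | 0x10;
	case 6: return acc >> 2;
	case 7: return acc & 0xff;
	default: return acc;
	}
}

/* Round robin: iteration i selects case i % NUM_CASES, so every case
 * is visited equally often and in a fixed, repeating order. */
uint64_t run_round_robin(uint64_t iterations)
{
	uint64_t acc = 0;

	for (uint64_t i = 0; i < iterations; i++)
		acc = dispatch((unsigned int)(i % NUM_CASES), acc);
	return acc;
}
```

A fixed repeating order like this is easy for a branch predictor to learn,
which is one reason other distributions (for example random or skewed case
selection) would be worth testing as well.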

Numbers I'm getting are stable:

* Xeon E3-1240, packet.net c1.small.x86 instance:

 # make prep
 [...]
 # make
 gcc -g -I. -O2   -c -o test.o test.c
 gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20   -c -o switch-no-table.o switch-no-table.c
 gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
 gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
 gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
 taskset 1 ./test
 no retpoline :      6098325270
 no jump table:      6298192058 (no retpoline: 103.28%)
 jump table   :     22081802856 (no retpoline: 362.10%, no jump table: 350.61%)
 # make
 taskset 1 ./test
 no retpoline :      6098439816
 no jump table:      6298242270 (no retpoline: 103.28%)
 jump table   :     22107872854 (no retpoline: 362.52%, no jump table: 351.02%)
 # make
 taskset 1 ./test
 no retpoline :      6098187038
 no jump table:      6298308128 (no retpoline: 103.28%)
 jump table   :     22071053524 (no retpoline: 361.93%, no jump table: 350.43%)

* Xeon Gold 5120, packet.net m2.xlarge.x86 instance:

 # make prep
 [...]
 # make
 gcc -g -I. -O2   -c -o test.o test.c
 gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20   -c -o switch-no-table.o switch-no-table.c
 gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
 gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
 gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
 taskset 1 ./test
 no retpoline :      5450356814
 no jump table:      5620673036 (no retpoline: 103.12%)
 jump table   :     21448285314 (no retpoline: 393.52%, no jump table: 381.60%)
 # make
 taskset 1 ./test
 no retpoline :      5450356100
 no jump table:      5620678302 (no retpoline: 103.12%)
 jump table   :     21448119720 (no retpoline: 393.52%, no jump table: 381.59%)
 # make
 taskset 1 ./test
 no retpoline :      5450331258
 no jump table:      5620839740 (no retpoline: 103.13%)
 jump table   :     21446922902 (no retpoline: 393.50%, no jump table: 381.56%)
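
The percentages in parentheses appear to be plain ratios of the raw counts,
i.e. the value on that line divided by the named baseline. A small sketch of
that arithmetic, assuming this reading is right (pct_of() is a hypothetical
helper, not a function from the benchmark tool):

```c
#include <assert.h>
#include <stdint.h>

/* Express value as a percentage of baseline, e.g. the "jump table"
 * count relative to the "no retpoline" count. */
double pct_of(uint64_t value, uint64_t baseline)
{
	assert(baseline != 0);
	return 100.0 * (double)value / (double)baseline;
}
```

With the first E3-1240 run, pct_of(22081802856, 6098325270) comes out at
roughly 362.10, which matches the figure printed for the jump table case
relative to no retpoline.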

I've also looked into clang for their -mretpoline flag, and they generally turn
off jump table generation in this case. For gcc, the s390 folks implemented a
target override for the default case-values-threshold to raise it to 20. For
x86 something similar could be done. Anyway, H.J. Lu asked me to reopen this
issue (but it seems I cannot make this change from my account).
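
For readers less familiar with the two lowering strategies, the following is
a conceptual sketch in plain C, not actual GCC output. via_table() models
table-based dispatch with an explicit function pointer array, so each
dispatch is an indirect branch, which is the kind of branch that
-mindirect-branch=thunk routes through a retpoline thunk. via_compares()
models what the compiler emits instead when the number of case values is
below case-values-threshold: a short tree of compares and direct branches.
All function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*handler_fn)(uint64_t);

static uint64_t h_add(uint64_t acc) { return acc + 1; }
static uint64_t h_xor(uint64_t acc) { return acc ^ 0x55; }
static uint64_t h_mul(uint64_t acc) { return acc * 3; }
static uint64_t h_sub(uint64_t acc) { return acc - 7; }

static const handler_fn handlers[] = { h_add, h_xor, h_mul, h_sub };
#define NUM_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

/* Table-style dispatch: bounds check, then one indirect call. */
uint64_t via_table(unsigned int op, uint64_t acc)
{
	if (op >= NUM_HANDLERS)
		return acc;             /* default case */
	return handlers[op](acc);       /* indirect branch */
}

/* Compare-style dispatch: a small decision tree of direct branches. */
uint64_t via_compares(unsigned int op, uint64_t acc)
{
	if (op < 2) {
		if (op == 0)
			return h_add(acc);
		return h_xor(acc);
	}
	if (op == 2)
		return h_mul(acc);
	if (op == 3)
		return h_sub(acc);
	return acc;                     /* default case */
}
```

Both functions return the same result for every input; only the kind of
branch used to get there differs, and that difference is what the numbers
above are measuring once retpolines are enabled.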
