https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED

--- Comment #15 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Daniel Borkmann from comment #12)
> I've been looking into this issue quite recently and improved the benchmark
> tool a bit along the way. There are multiple considerations wrt
> traversing the switch cases; the case here is doing round robin, but
> additional distributions / tests could be added. Pushed here just in case:
> https://github.com/borkmann/microbenchmark

Thanks a lot for the benchmark.

> 
> Numbers I'm getting are stable:
> 
> * Xeon E3-1240, packet.net c1.small.x86 instance:
> 
>  # make prep
>  [...]
>  # make
>  gcc -g -I. -O2   -c -o test.o test.c
>  gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20  
> -c -o switch-no-table.o switch-no-table.c
>  gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
>  gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
>  gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
>  taskset 1 ./test
>  no retpoline :      6098325270
>  no jump table:      6298192058 (no retpoline: 103.28%)
>  jump table   :     22081802856 (no retpoline: 362.10%, no jump table:
> 350.61%)
>  # make
>  taskset 1 ./test
>  no retpoline :      6098439816
>  no jump table:      6298242270 (no retpoline: 103.28%)
>  jump table   :     22107872854 (no retpoline: 362.52%, no jump table:
> 351.02%)
>  # make
>  taskset 1 ./test
>  no retpoline :      6098187038
>  no jump table:      6298308128 (no retpoline: 103.28%)
>  jump table   :     22071053524 (no retpoline: 361.93%, no jump table:
> 350.43%)
> 
> * Xeon Gold 5120, packet.net m2.xlarge.x86 instance:
> 
>  # make prep
>  [...]
>  # make
>  gcc -g -I. -O2   -c -o test.o test.c
>  gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20  
> -c -o switch-no-table.o switch-no-table.c
>  gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
>  gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
>  gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
>  taskset 1 ./test
>  no retpoline :      5450356814
>  no jump table:      5620673036 (no retpoline: 103.12%)
>  jump table   :     21448285314 (no retpoline: 393.52%, no jump table:
> 381.60%)
>  # make
>  taskset 1 ./test
>  no retpoline :      5450356100
>  no jump table:      5620678302 (no retpoline: 103.12%)
>  jump table   :     21448119720 (no retpoline: 393.52%, no jump table:
> 381.59%)
>  # make
>  taskset 1 ./test
>  no retpoline :      5450331258
>  no jump table:      5620839740 (no retpoline: 103.13%)
>  jump table   :     21446922902 (no retpoline: 393.50%, no jump table:
> 381.56%)

I can confirm the numbers. I've got:
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

taskset 1 ./test
no retpoline :      4311969467
no jump table:      5146081372 (no retpoline: 119.34%)
jump table   :     18845846887 (no retpoline: 437.06%, no jump table: 366.22%)
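
The percentages look like plain ratios of the measured counts against a baseline. A minimal sketch of that arithmetic, assuming this is how the tool derives them (percent_of is a hypothetical helper, not a function from the benchmark repository):

```c
/* Hypothetical helper, not part of the benchmark tool: expresses one
 * measured count as a percentage of a baseline count. */
double percent_of(unsigned long long value, unsigned long long baseline)
{
    return 100.0 * (double)value / (double)baseline;
}
```

With the i7-4770 numbers above, percent_of(18845846887ULL, 4311969467ULL) comes out at roughly 437.06, which matches the "no retpoline" figure on the jump table line.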


> 
> I've also looked into clang for their -mretpoline flag, and they generally
> turn off jump table generation in this case. For gcc, the s390 folks
> implemented a target override for the default case-values-threshold to raise
> it to 20. 

Note that GCC has a similar parameter:

--param case-values-threshold
    The smallest number of different values for which it is best to use a
    jump-table instead of a tree of conditional branches.  If the value is 0,
    use the default for the machine.  The default is 0.

For 20 branches, I've got even worse numbers:
https://github.com/marxin/microbenchmark-1/tree/retpoline-table

taskset 1 ./test
no retpoline :      5096377521
no jump table:      5169400990 (no retpoline: 101.43%)
jump table   :     28830137876 (no retpoline: 565.70%, no jump table: 557.71%)

So are you suggesting that jump tables be disabled entirely when retpolines are in use?

> For x86 something similar could be done. Anyway, H.J. Lu asked me
> to reopen this issue (but seems like I cannot make this change from my
> account).

Yep, you would need an account ending with @gcc.gnu.org to change a bug.
