https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

--- Comment #16 from Daniel Borkmann <daniel at iogearbox dot net> ---
(In reply to Martin Liška from comment #15)
> (In reply to Daniel Borkmann from comment #12)
> > I've been looking into this issue quite recently and improved the benchmark
> > tool a bit along the way. There are multiple considerations with regard to
> > traversing the switch cases; the case here is doing round robin, but
> > additional distributions / tests could be added. Pushed here just in case:
> > https://github.com/borkmann/microbenchmark
> 
> Thanks a lot for the benchmark.
> 
> > Numbers I'm getting are stable:
> > 
> > * Xeon E3-1240, packet.net c1.small.x86 instance:
> > 
> >  # make prep
> >  [...]
> >  # make
> >  gcc -g -I. -O2   -c -o test.o test.c
> >  gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20  
> > -c -o switch-no-table.o switch-no-table.c
> >  gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
> >  gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
> >  gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> >  taskset 1 ./test
> >  no retpoline :      6098325270
> >  no jump table:      6298192058 (no retpoline: 103.28%)
> >  jump table   :     22081802856 (no retpoline: 362.10%, no jump table:
> > 350.61%)
> >  # make
> >  taskset 1 ./test
> >  no retpoline :      6098439816
> >  no jump table:      6298242270 (no retpoline: 103.28%)
> >  jump table   :     22107872854 (no retpoline: 362.52%, no jump table:
> > 351.02%)
> >  # make
> >  taskset 1 ./test
> >  no retpoline :      6098187038
> >  no jump table:      6298308128 (no retpoline: 103.28%)
> >  jump table   :     22071053524 (no retpoline: 361.93%, no jump table:
> > 350.43%)
> > 
> > * Xeon Gold 5120, packet.net m2.xlarge.x86 instance:
> > 
> >  # make prep
> >  [...]
> >  # make
> >  gcc -g -I. -O2   -c -o test.o test.c
> >  gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20  
> > -c -o switch-no-table.o switch-no-table.c
> >  gcc -g -I. -O2 -mindirect-branch=thunk   -c -o switch.o switch.c
> >  gcc -g -I. -O2   -c -o switch-no-retpol.o switch-no-retpol.c
> >  gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
> >  taskset 1 ./test
> >  no retpoline :      5450356814
> >  no jump table:      5620673036 (no retpoline: 103.12%)
> >  jump table   :     21448285314 (no retpoline: 393.52%, no jump table:
> > 381.60%)
> >  # make
> >  taskset 1 ./test
> >  no retpoline :      5450356100
> >  no jump table:      5620678302 (no retpoline: 103.12%)
> >  jump table   :     21448119720 (no retpoline: 393.52%, no jump table:
> > 381.59%)
> >  # make
> >  taskset 1 ./test
> >  no retpoline :      5450331258
> >  no jump table:      5620839740 (no retpoline: 103.13%)
> >  jump table   :     21446922902 (no retpoline: 393.50%, no jump table:
> > 381.56%)
> 
> I can confirm the numbers. I've got:
> model name    : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
> 
> taskset 1 ./test
> no retpoline :      4311969467
> no jump table:      5146081372 (no retpoline: 119.34%)
> jump table   :     18845846887 (no retpoline: 437.06%, no jump table:
> 366.22%)

Ok, great, thanks for testing on your side as well!
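
For anyone cross-checking the percentages in the output above, the arithmetic
is just the ratio of two counters. A minimal sketch follows; the function name
percent_of is made up for illustration and is not taken from the benchmark
tool, which may compute or round this differently.

```c
#include <stdio.h>

/* Hypothetical helper (not from the benchmark repository): expresses one
   measured counter as a percentage of a baseline counter.  */
double
percent_of (unsigned long long value, unsigned long long baseline)
{
  return 100.0 * (double) value / (double) baseline;
}

/* Prints one result line in roughly the style of the tool's output.  */
void
print_relative (const char *label, unsigned long long value,
                unsigned long long baseline)
{
  printf ("%s: %20llu (baseline: %.2f%%)\n", label, value,
          percent_of (value, baseline));
}
```

Feeding it the first Xeon E3-1240 run (6298192058 against 6098325270) gives a
value that rounds to the 103.28% shown in the quoted output.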

> > I've also looked into clang for their -mretpoline flag, and they generally
> > turn off jump table generation in this case. For gcc, the s390 folks
> > implemented a target override for the default case-values-threshold to raise
> > it to 20. 
> 
> Note that GCC has a similar parameter:
> 
> --param case-values-threshold
>                The smallest number of different values for which it is best
> to use a jump-table instead of a tree of conditional branches.  If the value
> is 0, use the default for the machine.  The default is 0.

Yeah, I know, I've used it above for the test case (see the gcc cmdline parts).
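
To make the discussion concrete, here is a minimal sketch of the kind of dense
switch this parameter affects. It is not the code from the benchmark
repository; the function names and the operations in each case are invented
for illustration. With only 8 case values, raising case-values-threshold to 20
would be expected to keep GCC from emitting a jump table for it, though the
actual code generation depends on the target and optimization level.

```c
/* Hypothetical dense switch: 8 consecutive case values.  Whether this
   becomes a jump table or a tree of conditional branches is what
   --param=case-values-threshold influences.  */
unsigned int
dispatch (unsigned int op, unsigned int acc)
{
  switch (op)
    {
    case 0: return acc + 1;
    case 1: return acc ^ 0x5a;
    case 2: return acc << 1;
    case 3: return acc - 3;
    case 4: return acc | 0x10;
    case 5: return acc >> 2;
    case 6: return acc * 7;
    case 7: return ~acc;
    default: return acc;
    }
}

/* Visits the cases in round-robin order, feeding each result into the
   next call so the calls cannot be dropped as dead code.  */
unsigned int
run_round_robin (unsigned long iterations)
{
  unsigned int acc = 0;
  for (unsigned long i = 0; i < iterations; i++)
    acc = dispatch ((unsigned int) (i % 8), acc);
  return acc;
}
```

Round robin is the friendliest pattern for a branch predictor; a random or
skewed distribution of op values would be a separate test.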

> For 20 branches, I've got even worse numbers:
> https://github.com/marxin/microbenchmark-1/tree/retpoline-table
> 
> taskset 1 ./test
> no retpoline :      5096377521
> no jump table:      5169400990 (no retpoline: 101.43%)
> jump table   :     28830137876 (no retpoline: 565.70%, no jump table:
> 557.71%)
> 
> So are you suggesting to disable jump tables with retpolines at all?

I leave that up to you guys, but I would at minimum probably implement something
like s390 folks did for gcc, commit db7a90aa0de5 ("S/390: Disable prediction of
indirect branches"), see s390_case_values_threshold() which does:

+unsigned int
+s390_case_values_threshold (void)
+{
+  /* Disabling branch prediction for indirect jumps makes jump tables
+     much more expensive.  */
+  if (TARGET_INDIRECT_BRANCH_NOBP_JUMP)
+    return 20;
+
+  return default_case_values_threshold ();
+}
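
A rough idea of what an x86 analogue might look like is sketched here as a
standalone program, not as a patch. Everything in it is an assumption: the
name x86_case_values_threshold_sketch, the retpoline_enabled flag standing in
for whatever condition the backend would test for -mindirect-branch, and the
stubbed default, whose value of 4 is a placeholder rather than a claim about
what any given target returns.

```c
#include <stdbool.h>

/* Stand-in for the backend condition that would indicate indirect
   branches are converted to thunks (hypothetical, not a GCC variable).  */
static bool retpoline_enabled;

/* Stub for GCC's default_case_values_threshold (); the returned value
   is a placeholder chosen for this sketch.  */
static unsigned int
default_case_values_threshold_stub (void)
{
  return 4;
}

/* Hypothetical x86 counterpart of s390_case_values_threshold ():
   require more case values before a jump table is considered
   worthwhile when indirect jumps go through a retpoline.  */
unsigned int
x86_case_values_threshold_sketch (void)
{
  if (retpoline_enabled)
    return 20;

  return default_case_values_threshold_stub ();
}
```

The value 20 is simply copied from the s390 change; the right cut-over point
for x86 would have to come from measurements like the ones above.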

> > For x86 something similar could be done. Anyway, H.J. Lu asked me
> > to reopen this issue (but seems like I cannot make this change from my
> > account).
> 
> Yep, I would need an account ending with @gcc.gnu.org to change a bug.
