"Richard Biener" <richard.guent...@gmail.com> wrote:
> On Mon, Aug 24, 2020 at 1:22 PM Stefan Kanthak <stefan.kant...@nexgo.de>
> wrote:
>>
>> "Richard Biener" <richard.guent...@gmail.com> wrote:
>>
>> > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak <stefan.kant...@nexgo.de>
>> > wrote:
>> >>
>> >> "Allan Sandfeld Jensen" <li...@carewolf.com> wrote:
>> >>
>> >> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:

[...]

> Whether or not the branch is predicted taken does not matter, what
> matters is that the continuation is not data dependent on the branch
> target computation and thus can execute in parallel to it.

My benchmark shows that this doesn't matter!

>> > The proposed change turns the control into a data dependence which
>> > constrains instruction scheduling and retirement.
>>
>> It doesn't matter: the branch has the same data dependency too!
>>
>> > Indeed a mispredicted branch will likely be more costly.
>>
>> And no branch is even better: the branch predictor has a limited capacity,
>> so every removed branch instruction can help improve its efficiency.
>>
>> > x86 CPUs do not perform data speculation.
>>
>> >>         mov     ecx, edi
>> >>         movabs  rax, 4294977024
>> >>         shr     rax, cl
>> >>         xor     edi, edi
>> >>         cmp     ecx, 33
>> >>         setb    dil
>> >>         and     eax, edi
>>
>> I already presented measured numbers: with random data, the branch-free
>> code is faster, with ordered data the original code.
>>
>> Left column 1 billion sequential characters
>>     for (int i=1000000000; i; --i) ...(i);
>> right column 1 billion random characters, in cycles per character:
>
> I guess feeding it Real Text (TM) is the only relevant benchmark,
> doing sth like
>
>   for (;;)
>      cnt[isWhitespace(*ptr++)]++;

I approximated that using a PRNG...

>> GCC:           2.4    3.4
>> branch-free:   3.0    2.5
>
> I'd call that unconclusive data - you also failed to show your test data
> is somehow relevant.

Since nobody can predict real world data all test data are irrelevant,
somehow. I thus call your argument a NULL argument.

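For readers following the quoted assembly, here is a rough C sketch of what that branch-free sequence appears to compute. The constant 4294977024 is 0x100002600, i.e. bits 9, 10, 13 and 32 set, which matches TAB, LF, CR and space. The names WS_MASK, is_whitespace_branchfree, is_whitespace_reference and ws_selfcheck are made up for this illustration and are not from the thread. Whether a given compiler turns this C into branch-free code is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n is set iff character code n counts as whitespace here:
   bit 9 (TAB), bit 10 (LF), bit 13 (CR), bit 32 (space).
   0x100002600 == 4294977024, the constant in the quoted movabs. */
#define WS_MASK UINT64_C(0x100002600)

/* Hypothetical C rendering of the quoted asm: shift the mask right by the
   character code, then AND with the result of the (c < 33) comparison.
   The "& 63" mirrors how a 64-bit x86 shr treats its count and keeps the
   shift well defined in C. */
static int is_whitespace_branchfree(unsigned char c)
{
    return (int)((WS_MASK >> (c & 63)) & (uint64_t)(c < 33));
}

/* Straightforward comparison-based version, used only as a reference. */
static int is_whitespace_reference(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Check that both versions agree on all 256 character codes. */
static void ws_selfcheck(void)
{
    for (int c = 0; c < 256; ++c)
        assert(is_whitespace_branchfree((unsigned char)c) ==
               is_whitespace_reference((unsigned char)c));
}
```

Without the (c < 33) term, a code such as 96 would wrongly test as whitespace, because 96 & 63 is 32; that is the role of the cmp/setb pair in the assembly.
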
> We do know that mispredicted branches are bad.
> You show well-predicted branches are good.

Wrong: I show that no branches are still better.

> By simple statistics singling out 4 out of 255 values will make the
> branches well-predicted.

Your statistic is wrong:
1. the branch singles out 224 of 256 values, i.e. 7/8 of all data;
2. whitespace lies in the 1/8 which is not singled out.

>> Now perform a linear interpolation and find the break-even point at
>> p=0.4, with p=0 for ordered data and p=1 for random data, or just use
>> the average of these numbers: 2.9 cycles vs. 2.75 cycles.
>> That's small, but measurable!

Stefan
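
The interpolation mentioned above can be written out from the four measured numbers (GCC: 2.4 and 3.4, branch-free: 3.0 and 2.5 cycles per character). The symbols t_GCC, t_bf and p* are introduced here only for illustration, and the straight-line model between the two measured endpoints is the assumption made in the message, not a measured result.

```latex
\begin{align*}
t_{\mathrm{GCC}}(p) &= 2.4 + (3.4 - 2.4)\,p = 2.4 + 1.0\,p \\
t_{\mathrm{bf}}(p)  &= 3.0 + (2.5 - 3.0)\,p = 3.0 - 0.5\,p \\
t_{\mathrm{GCC}}(p^{*}) = t_{\mathrm{bf}}(p^{*})
  &\iff 1.5\,p^{*} = 0.6 \iff p^{*} = 0.4 \\
t_{\mathrm{GCC}}(0.5) &= \tfrac{2.4 + 3.4}{2} = 2.9, \qquad
t_{\mathrm{bf}}(0.5) = \tfrac{3.0 + 2.5}{2} = 2.75
\end{align*}
```

Under this model the branch-free code comes out ahead for any p above 0.4, and the plain average corresponds to p = 0.5.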