On Mon, Oct 5, 2015 at 9:48 PM, Dave Hansen <[email protected]> wrote:
>
> Although I was probably wrong about the source of the overhead, the
> point still remains that the prefaulting is eating cycles for no
> practical benefit.

Yeah, no, I'm not disagreeing with that part. I'm just more in the "at
this point in the rc series we are probably better off reverting" camp.

Your ext4 patch may well fix the issue, and be the right thing to do
(_regardless_ of the revert, in fact - while it might make the revert
unnecessary, it might also be a good idea even if we do revert).

The subtlety of this just worries me, and the reason I'd still be
inclined to revert is simply "it's been that way a long time, the safe
thing is to go back and take this slow".

> With "-e cycles:pp":
>>        │      sub    $0x8,%rsp
>>  24.57 │      stac
>>  15.49 │      mov    (%rcx),%sil
>>  29.06 │      clac
>>   2.24 │      test   %eax,%eax
>>   8.77 │      mov    %sil,-0x1(%rbp)
>>   2.22 │    ↓ jne    66
>>        │      movslq %edx,%rdx

Ok, so it really is the stac/clac that is the bulk of the cost. Hmm.

You're right that the loop there will only be executed once for your
case, so moving the stac/clac outside probably doesn't help. It
*might* still make a difference just for microarchitectural reasons
(i.e. they may cause more trouble just because they are close to an
instruction that depends on them), but it's questionable.
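
To make the loop argument concrete, here is a rough user-space sketch, not
the actual kernel code. The real stac/clac instructions are privileged, so
fake_stac()/fake_clac(), read_user_byte() and both touch_pages_*() functions
are hypothetical stand-ins that only count how often the toggles would run.
It says nothing about the microarchitectural cost, only about the counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: real STAC/CLAC are privileged, so these only
 * track state and count how many times they would have executed. */
static unsigned long toggles;   /* simulated stac + clac executions */
static int ac_open;             /* 1 while "user access" is permitted */

static void fake_stac(void) { ac_open = 1; toggles++; }
static void fake_clac(void) { ac_open = 0; toggles++; }

/* Simulated user read: only legal inside a stac/clac window. */
static char read_user_byte(const char *p)
{
    assert(ac_open);
    return *p;
}

/* Shape 1: open and close the window around every single access. */
static char touch_pages_per_access(const char *buf, size_t len, size_t page)
{
    char sink = 0;

    for (size_t off = 0; off < len; off += page) {
        fake_stac();
        sink ^= read_user_byte(buf + off);
        fake_clac();
    }
    return sink;
}

/* Shape 2: open the window once around the whole loop. */
static char touch_pages_hoisted(const char *buf, size_t len, size_t page)
{
    char sink = 0;

    fake_stac();
    for (size_t off = 0; off < len; off += page)
        sink ^= read_user_byte(buf + off);
    fake_clac();
    return sink;
}
```

With a buffer that fits in one page, both shapes execute exactly one
stac/clac pair, which is why hoisting would not reduce the count in the
single-iteration case. The counts only diverge once the loop runs more
than once.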

It is a bit worrisome to see that those things are so expensive. Right
now almost all user accesses will cause *lots* of clac/stac stuff.

I originally asked Intel to do SMAP using a segment prefix, but that
was not to be.

              Linus
