-----Original Message-----
From: Qing Zhao <[email protected]>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <[email protected]>
Cc: Segher Boessenkool <[email protected]>, Jakub Jelinek
<[email protected]>, Uros Bizjak <[email protected]>, "Rodriguez Bahena, Victor"
<[email protected]>, GCC Patches <[email protected]>
Subject: Re: PING [Patch][Middle-end]Add
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 3, 2020, at 12:13 PM, Kees Cook <[email protected]> wrote:
>
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>> On average, all the options starting with “used_…” (i.e., only the
>> registers that are used in the routine will be zeroed) have very low runtime
>> overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
>> If all the registers will be zeroed, the runtime overhead is bigger:
>> all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on
>> average.
>> Looks like the overhead of zeroing vector registers is much bigger.
>>
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough;
>> the runtime overhead with this is very small.
>
> That looks great; thanks for doing those tests!
>
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)
The top 3 benchmarks that have the most overhead from this option are:
531.deepsjeng_r, 541.leela_r, and 511.povray_r.
All of them are C++ benchmarks.
I guess that the most important reason is the smaller routine size in
general (especially at the hot execution path or loops).
As a result, the overhead of these additional zeroing instructions in each
routine will be relatively higher.
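
To make the point about small routines concrete, here is a rough sketch in C.
It assumes a GCC release in which this work is available as the
zero_call_used_regs function attribute (GCC 11 or later); on other compilers
the macro expands to nothing. The routine popcount32, the macro ZERO_USED_GPR
and the helper relative_overhead are hypothetical names chosen for
illustration, and the helper is only a crude instruction-count ratio, not a
measurement of cycles.

```c
#include <stdint.h>

/* Use the attribute only where the compiler knows it; otherwise no-op.
   ZERO_USED_GPR is a hypothetical macro name used for this sketch. */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPR
#  define ZERO_USED_GPR
#endif

/* Hypothetical tiny routine: the body is only a handful of instructions,
   so any zeroing instructions added before the return are a large
   fraction of the total work done per call. */
ZERO_USED_GPR
uint32_t popcount32(uint32_t x)
{
    uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Crude model (an assumption, not measured data): extra zeroing
   instructions divided by the instructions in the routine body. */
double relative_overhead(unsigned zeroing_insns, unsigned body_insns)
{
    return (double)zeroing_insns / (double)body_insns;
}
```

Under this model, three zeroing instructions on a 10-instruction routine come
to 30% extra instructions, while the same three on a 300-instruction routine
come to 1%, which matches the intuition that small, hot routines are hit
hardest.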
Qing
I think that overhead is expected in benchmarks like 541.leela_r, which according to
https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html is a benchmark
for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern
recognition). Adding -fzero-call-used-regs incurs an overhead each time a
function is called, and in areas like game tree search the number of calls is
high.
Qing, thanks a lot for the measurements. I am not sure whether this is the limit of
overhead the community is willing to accept for the extra security (as a GCC
user, I would be willing to accept it).
Regards
Victor
>
> --
> Kees Cook