On 11/17/2017 01:37 PM, Jeff Law wrote:
> ISTM the better way to drive this is to query the branch probabilities.
> It'd probably be simpler too. Is there some reason that's not a good
> solution?
(a) I'd have to learn how to do that
(b) in the case where the condition is just a null check,
ma.cc.046t.profile_estimate considers the memset reachable 53.47% of
the time (see defect 83023). When the condition is 'ptr && some_bool',
we think it reachable 33% of the time.
It's not clear to me what a sensible threshold might be. I suppose more
realistic probabilities are 99.99% in the first case and 50% in the
second case?
(c) the performance skew is presumably proportional to the size
parameter. A small size is probably swamped by the allocation effort
itself. With a large size, the memset cost might start to dominate.
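For concreteness, the two guard shapes discussed in (b) look roughly like
the sketch below. The function names are made up for illustration and are
not taken from the testcase; compiling something like this with
-fdump-tree-profile_estimate should show what probability the estimator
assigns to each memset, though the exact figures will depend on the
compiler version.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Case 1: the memset is guarded only by a null check.
   (Hypothetical name, for illustration only.)  */
void *
alloc_zeroed_null_check (size_t size)
{
  void *ptr = malloc (size);
  if (ptr)
    memset (ptr, 0, size);
  return ptr;
}

/* Case 2: the memset is guarded by 'ptr && some_bool'.
   (Hypothetical name, for illustration only.)  */
void *
alloc_maybe_zeroed (size_t size, bool some_bool)
{
  void *ptr = malloc (size);
  if (ptr && some_bool)
    memset (ptr, 0, size);
  return ptr;
}
```

In the first function the memset is skipped only when malloc fails, which
is why something near 99.99% seems a more realistic estimate than 53.47%.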
Profiling shows that it is the kernel burning this in flushing the tlb
during a syscall.
My guess is that the usage pattern repeatedly allocates and frees a
large chunk of uninitialized memory. That ends up not involving
syscalls at all. With the change to use calloc, each of those
allocations turns into a large TLB churn to get read-as-zero anonymous
pages. And possibly similar churn returning freed pages to the OS.
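A minimal sketch of the pattern I have in mind, assuming a nonzero size.
The names churn_malloc and churn_calloc are hypothetical, and whether the
calloc variant really goes to the kernel on each iteration depends on the
allocator and its thresholds, so treat this as something to time or strace
rather than as a demonstration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Repeatedly allocate and free one large uninitialized chunk.
   Returns the number of successful allocations.  Assumes size > 0.  */
size_t
churn_malloc (size_t size, size_t iterations)
{
  size_t ok = 0;
  for (size_t i = 0; i < iterations; i++)
    {
      volatile unsigned char *p = malloc (size);
      if (!p)
        continue;
      p[0] = 1;			/* Touch the chunk so it is really used.  */
      ok++;
      free ((void *) p);
    }
  return ok;
}

/* Same loop, but asking for zeroed memory each time.  Counts only the
   allocations whose first byte reads back as zero.  Assumes size > 0.  */
size_t
churn_calloc (size_t size, size_t iterations)
{
  size_t ok = 0;
  for (size_t i = 0; i < iterations; i++)
    {
      volatile unsigned char *p = calloc (1, size);
      if (!p)
        continue;
      if (p[0] == 0)		/* calloc guarantees zeroed contents.  */
        ok++;
      free ((void *) p);
    }
  return ok;
}
```

Timing the two loops with a size of several megabytes, or counting
syscalls under strace, would be one way to check the guess.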
nathan
--
Nathan Sidwell