On 11/17/2017 01:37 PM, Jeff Law wrote:

> ISTM the better way to drive this is to query the branch probabilities.
> It'd probably be simpler too.  Is there some reason that's not a good
> solution?

(a) I'd have to learn how to do that

(b) In the case where the condition is just a null check, ma.cc.046t.profile_estimate considers the memset reachable 53.47% of the time (see defect 83023).

When the condition is 'ptr && some_bool', we consider it reachable 33% of the time.
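For concreteness, the two source shapes being compared probably look something like the following. This is a sketch that assumes the memset follows a malloc of the same size; the function names are hypothetical and not taken from the actual test case. Folding into calloc is a clear win for the first function, but for the second it would force zeroing even when some_bool is false.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical first case: the memset is guarded only by a null
   check on the freshly allocated pointer.  */
void *alloc_zeroed_if_nonnull(size_t size)
{
    void *ptr = malloc(size);
    if (ptr)                        /* just a null check */
        memset(ptr, 0, size);
    return ptr;
}

/* Hypothetical second case: the memset also depends on an unrelated
   flag, so it is genuinely conditional.  */
void *alloc_maybe_zeroed(size_t size, bool some_bool)
{
    void *ptr = malloc(size);
    if (ptr && some_bool)
        memset(ptr, 0, size);
    return ptr;                     /* uninitialized when !some_bool */
}
```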

It's not clear to me what a sensible threshold might be. I suppose more realistic probabilities would be 99.99% in the first case and 50% in the second?
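Driving the decision from branch probabilities would amount to a check along these lines. This is only a sketch: the function name and the 0.9 threshold are hypothetical placeholders, and it says nothing about how GCC actually exposes the probability. With the current estimates, neither 53.47% nor 33% would pass such a threshold, whereas the more realistic 99.99% for the null-check case would.

```c
#include <stdbool.h>

/* Hypothetical threshold: only fold malloc+memset into calloc when the
   memset is estimated to run almost every time the allocation does.
   The value is a placeholder, not a tuned number.  */
#define CALLOC_FOLD_THRESHOLD 0.9

/* prob_memset: estimated probability, in [0, 1], that the memset
   executes once the allocation has been made.  */
static bool should_fold_to_calloc(double prob_memset)
{
    return prob_memset >= CALLOC_FOLD_THRESHOLD;
}
```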

(c) The performance skew is presumably proportional to the size parameter. A small size is probably swamped by the allocation effort itself; with a large size, the memset cost might start to dominate. Profiling shows the kernel burning this time flushing the TLB during a syscall.

My guess is that the usage pattern repeatedly allocates and frees a large chunk of uninitialized memory. That ends up not involving syscalls at all. With the change to use calloc, each of those allocations turns into heavy TLB churn to obtain read-as-zero anonymous pages, and possibly similar churn when returning freed pages to the OS.
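A minimal reproducer for the guessed pattern might look like the following, assuming the hot path simply allocates a large block, barely touches it, and frees it. The churn function and its parameters are hypothetical. Timing both variants under a profiler with a large size would be one way to test the guess; no measurements are implied here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Escaping each buffer through a volatile global stops the compiler
   from deleting the malloc/free pair as dead code.  */
static void *volatile sink;

/* Hypothetical reproducer: repeatedly allocate SIZE bytes, touch one
   byte, and free.  With use_calloc == false the buffer stays
   uninitialized; with use_calloc == true each allocation must return
   zeroed memory.  Returns how many allocations succeeded.  */
size_t churn(size_t size, size_t iterations, bool use_calloc)
{
    size_t ok = 0;
    for (size_t i = 0; i < iterations; i++) {
        unsigned char *buf = use_calloc ? calloc(size, 1) : malloc(size);
        if (!buf)
            continue;
        /* The caller itself does no work proportional to size.  */
        buf[0] = (unsigned char)i;
        sink = buf;
        ok++;
        free(buf);
    }
    return ok;
}
```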

nathan

--
Nathan Sidwell
