On Wed, Dec 28, 2016 at 07:40:54PM +0100, Lluís Vilanova wrote: > Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory > accesses), making it feasible to statically enable them by default on all QEMU > builds. > > Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input > (medium size - suns.pl) and the guest_mem_before event: > > * vanilla, statically disabled > real 0m2,259s > user 0m2,252s > sys 0m0,004s > > * vanilla, statically enabled (overhead: 2.18x) > real 0m4,921s > user 0m4,912s > sys 0m0,008s > > * multi-tb, statically disabled (overhead: 0.99x) [within noise range] > real 0m2,228s > user 0m2,216s > sys 0m0,008s > > * multi-tb, statically enabled (overhead: 0.99x) [within noise range] > real 0m2,229s > user 0m2,224s > sys 0m0,004s > > > Right now, events with the 'tcg' property always generate TCG code to trace > that > event at guest code execution time, where the event's dynamic state is > checked. > > This series adds a performance optimization where TCG code for events with the > 'tcg' and 'vcpu' properties is not generated if the event is dynamically > disabled. This optimization raises two issues: > > * An event can be dynamically disabled/enabled after the corresponding TCG > code > has been generated (i.e., a new TB with the corresponding code should be > used). > > * Each vCPU can have a different dynamic state for the same event (i.e., > tracing > the memory accesses of only one process pinned to a vCPU). > > To handle both issues, this series integrates the dynamic tracing event state > into the TB hashing function, so that vCPUs tracing different events will use > separate TBs. Note that only events with the 'vcpu' property are used for > hashing (as stored in the bitmap of CPUState->trace_dstate). > > This makes dynamic event state changes on vCPUs very efficient, since they can > use TBs produced by other vCPUs while on the same event state combination (or > produced by the same vCPU, earlier). > > Discarded alternatives: > > * Emitting TCG code to check if an event needs tracing, where we should still > move the tracing call code to either a cold path (making tracing performance > worse), or leave it inlined (making non-tracing performance worse). > > * Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling > it on a single vCPU will impact the performance of all other vCPUs that are > not tracing that event. > > Signed-off-by: Lluís Vilanova <vilan...@ac.upc.edu> > --- > > Changes in v6 > ============= > > * Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson]. > > > Changes in v5 > ============= > > * Move define into "qemu-common.h" to allow compilation of tests. > > > Changes in v4 > ============= > > * Incorporate trace_dstate into the TB hashing function instead of using > multiple physical TB caches [suggested by Richard Henderson]. > > > Changes in v3 > ============= > > * Rebase on 0737f32daf. > * Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi]. > * Refactor trace_get_vcpu_event_count() to be inlinable. > * Optimize cpu_tb_cache_set_requested() (hottest path). > > > Changes in v2 > ============= > > * Fix bitmap copy in cpu_tb_cache_set_apply(). > * Split generated code re-alignment into a separate patch [Daniel P. > Berrange]. > > > Lluís Vilanova (7): > exec: [tcg] Refactor flush of per-CPU virtual TB cache > trace: Make trace_get_vcpu_event_count() inlinable > trace: [tcg] Delay changes to dynamic state when translating > exec: [tcg] Use different TBs according to the vCPU's dynamic tracing > state > trace: [tcg] Do not generate TCG code to trace dinamically-disabled > events > trace: [tcg,trivial] Re-align generated code > trace: [trivial] Statically enable all guest events > > > cpu-exec.c | 52 > +++++++++++++++++++++++++++--- > cputlb.c | 2 + > include/exec/exec-all.h | 11 ++++++ > include/exec/tb-hash-xx.h | 8 ++++- > include/exec/tb-hash.h | 5 ++- > include/qemu-common.h | 3 ++ > include/qom/cpu.h | 7 ++++ > qom/cpu.c | 4 ++ > scripts/tracetool/__init__.py | 1 + > scripts/tracetool/backend/dtrace.py | 2 + > scripts/tracetool/backend/ftrace.py | 20 ++++++------ > scripts/tracetool/backend/log.py | 17 +++++----- > scripts/tracetool/backend/simple.py | 2 + > scripts/tracetool/backend/syslog.py | 6 ++- > scripts/tracetool/backend/ust.py | 2 + > scripts/tracetool/format/h.py | 24 ++++++++++---- > scripts/tracetool/format/tcg_h.py | 19 +++++++++-- > scripts/tracetool/format/tcg_helper_c.py | 3 +- > tests/qht-bench.c | 2 + > trace-events | 6 ++- > trace/control-internal.h | 5 +++ > trace/control-target.c | 14 +++++++- > trace/control.c | 9 +---- > trace/control.h | 5 ++- > translate-all.c | 30 +++++++++++++---- > 25 files changed, 195 insertions(+), 64 deletions(-) > > > To: qemu-devel@nongnu.org > Cc: Stefan Hajnoczi <stefa...@redhat.com> > Cc: Eduardo Habkost <ehabk...@redhat.com> > Cc: Eric Blake <ebl...@redhat.com>
The tracing aspects seem fine. I have left a comment regarding thread-safety. I'll merge it once Richard Henderson has had time to review it from a TCG perspective.
signature.asc
Description: PGP signature