On Mon, Sep 18, 2017 at 18:42:55 +0100, Peter Maydell wrote:
> On 18 September 2017 at 18:09, Lluís Vilanova <[email protected]> wrote:
> > It also means we won't be able to "conditionally" instrument instructions
> > (e.g.,
> > based on their opcode, address range, etc.).
>
> You can still do that, it's just less efficient (your
> condition-check happens in the callout to the instrumentation
> plugin). We can add "filter" options later if we need them
> (which I would rather do than have translate-time callbacks).
My initial reaction to not having translation-time hooks was that it'd
be too slow. However, thinking about it a bit more I suspect it might
not matter much. Other tools such as Pin/DynamoRIO have these hooks,
and in their case it makes sense because one can choose to, instead of
having one callback per executed instruction, get a callback per
executed "trace" (a trace is a chain of TBs). Crucially, these tools do
not go through an intermediate IR, and as a result are about 10X faster
than QEMU.
Therefore, given that QEMU is comparatively slow, we might find that
per-executed-instruction callbacks do not end up slowing things down
significantly.
I like the idea of discussing an API alone, but there is value in having
an implementation to go with it, so that we can explore the perf
trade-offs involved. I was busy with other things the last 10 days or so,
but by the end of this week I hope I'll be able to share some
perf numbers on the per-TB vs. per-instruction callback issue.
Thanks,
Emilio