Re: [RFC PATCH v2 0/7] tcg-plugins: add hooks for interrupts, exceptions and traps

Pierrick Bouvier Wed, 23 Oct 2024 08:16:46 -0700

Hi Julian,

On 10/23/24 05:56, Julian Ganz wrote:

Hi, Pierrick,


resent as I was too stupid to hit reply instead of reply-all.

October 22, 2024 at 11:15 PM, "Pierrick Bouvier" wrote:


On 10/22/24 01:21, Julian Ganz wrote:


Hi, Pierrick,
  October 21, 2024 at 11:59 PM, "Pierrick Bouvier" wrote:


  Maybe we could have a single API like:

  enum qemu_plugin_cf_event_type {
  QEMU_PLUGIN_CF_INTERRUPT;
  QEMU_PLUGIN_CF_TRAP;
  QEMU_PLUGIN_CF_SEMIHOSTING;
  };

  I have considered such an enum, as an input for the callback, as a
  parameter of the registration function, and both. Of course, if you were
  to add a selection parameter for the registration function, you likely
  want OR-able flags.
  An additional input for the callback type would obviously require a new
  function type just for that callback. Since the callbacks are somewhat
  similar to the VCPU init, exit, resume, ... ones it felt appropriate
  to use the same function type for all of them.

I tend to disagree on that. IMHO, it's better to reduce number of API entries 
instead of callback types.
It's easy for a user to understand how to implement a given callback, while 
it's hard to understand which API you need for which thing.

For the syscall cbs, we already have a specific callback. So why not here?


<snip>

As for the registration, it may make the registration a bit more
  convenient and maybe keep the API clutter a bit lower, but not by that
  much.

It's ok for the user. But I think it's more complicated to extend, when we'll 
want to introduce control flow API in the future. Do we want 5 or 6 different 
callbacks when people want to track fully control flow from a plugin?


Ok, I'll introduce an enum and combine the three callbacks in the next
iteration then.

typedef struct {
  enum qemu_plugin_cf_event_type ev;
  union {
  data_for_interrupt interrupt;
  data_for_trap trap;
  data_for_semihosting semihosting;
  } qemu_plugin_cf_event;
  /* data_for_... could contain things like from/to addresses, interrupt id, 
... */

  I don't think this is a good idea.
  Traps are just too diverse, imo there is too little overlap between
  different architectures, with the sole exception maybe being the PC
  prior to the trap. "Interrupt id" sounds like a reasonably common
  concept, but then you would need to define a mapping for each and every
  architecture. What integer type do you use? In RISC-V, for example,
  exceptions and interrupt "ids" are differentiated via the most
  significant bit. Dou keep that or do you zero it? And then there's
  ring/privilage mode, cause (sometimes for each mode), ...

I didn't want to open the per architecture pandora box :).
I don't think the plugin API itself should deal with per architecture
details like meaning of a given id. I was just thinking to push this "raw" 
information to the plugin, that may/may not use architecture specific knowledge to do its 
work. We already have plugins that have similar per architecture knowledge 
(contrib/plugins/howvec.c) and it's ok in some specific cases.


But how would such an interface look? The last PC aside, what would you
include, and how? A GArray with named items that are itself just opaque
blobs?

I was not thinking about a new interface for this. Having the "raw"interrupt id is enough for a plugin to do useful things, by havingknowledge of which architecture it's instrumenting.

And what would be the benefit compared to just querying the respective
target specific registers through qemu_plugin_read_register? Which btw.
is what we were going to do for our use-case. Even the example you
brought up (howvec) uses querying functions and doesn't expect to get
all the info via parameters.


You're right, but it's because it's querying instruction data.

I may be wrong on that, but at translation time, we may or may not beinterested in accessing tb/insn data.

However, for control flow analysis, beyond a simple counting plugin, weprobably want to access further data almost everytime.

I see it closer from syscall instrumentation, which pushes the syscallid, and all register values, instead of letting the user poke it. Makesmore sense compared to that?

But having something like from/to address seems useful to start. Even if we 
don't provide it for all events yet, it's ok.


Yes, I certainly see the advantages of having either the last PC or the
would-be-next PC as they are sufficiently universal. You can usually
retrieve them from target-specific registers, but that may be more
complicated in practice. In the case of RISC-V for example, the value
of the EPC differs between interrupts and exceptions.

To the opposite of interrupt id, a PC is something universal bydefinition, and with a single meaning across architecture. However,accessing it by name varies per architecture, and even per sub events,as you are stating for RISC-V.

That PC value should also be easy enough to obtain at the hook call
sites. We only need to store the (old) PC before doing the setup. The
"to-address" is the current PC at the time the callback is invoked.
Anything else would be a bug. I was going to write that you can
already query that in a plugin through a dedicated helper function
but apparently I misremembered.

I'll include this in the next iteration.

It would also complicate call sites for hooks in target code. You'd
  either need awkwardly long function signitures or setup code for that
  struct. Both are things you don't want, as a hook call site should
  never distract from the actual logic surrounding them. You could
  probably have something reasonable in Rust, using a builder/command
  pattern. But in C this would require too much boiler plate code than
  I'd be comfortable with.

We can have one "builder" function per data type, with fixed parameters (no 
varargs), it's reasonable and would scale well with new entries/data information.


I'm still not on board on preparing a more complex data type. For the
next iteration I'd rather stick to a simple function receiving the
"type" of event and the PCs. That may not be extensible, but I don't see
any benefit in shoehorning inheritelntly target-specifc information into
a complex struct.

It's a good compromise for now, and avoid introducing a more complexdata type.

If this is a hard requirement, I'll of course still do so.

My goal was not to hijack this series, or put the burden on you, but totalk about which direction we want to go.As long as we start with a single callback (vs one per event), I'm fineto go slowly and iterate in the future.

Regards,
Julian


Thanks Julian!

Re: [RFC PATCH v2 0/7] tcg-plugins: add hooks for interrupts, exceptions and traps

Reply via email to