On 10/23/24 09:12, Julian Ganz wrote:
Hi, Pierrick,

October 23, 2024 at 5:16 PM, "Pierrick Bouvier" wrote:

Hi Julian,

On 10/23/24 05:56, Julian Ganz wrote:

  October 22, 2024 at 11:15 PM, "Pierrick Bouvier" wrote:


On 10/22/24 01:21, Julian Ganz wrote:


  Ok, I'll introduce an enum and combine the three callbacks in the next
  iteration then.
  typedef struct {
      enum qemu_plugin_cf_event_type ev;
      union {
          data_for_interrupt interrupt;
          data_for_trap trap;
          data_for_semihosting semihosting;
      };
  } qemu_plugin_cf_event;
  /* data_for_... could contain things like from/to addresses, interrupt id, ... */

  I don't think this is a good idea.
  Traps are just too diverse, imo there is too little overlap between
  different architectures, with the sole exception maybe being the PC
  prior to the trap. "Interrupt id" sounds like a reasonably common
  concept, but then you would need to define a mapping for each and every
  architecture. What integer type do you use? In RISC-V, for example,
  exceptions and interrupt "ids" are differentiated via the most
  significant bit. Do you keep that or do you zero it? And then there's
  ring/privilege mode, cause (sometimes for each mode), ...
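
To illustrate the RISC-V point above, here is a minimal sketch of how a plugin could split a raw cause value itself. The helper and type names are hypothetical and not part of the QEMU plugin API; the only assumption about the architecture is the one stated above, that the most significant bit of the cause value (bit XLEN-1) marks an interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for illustration; not part of qemu-plugin.h. */
typedef struct {
    bool is_interrupt;  /* true if the most significant bit was set */
    uint64_t code;      /* cause value with that bit cleared */
} riscv_cause;

/*
 * Split a raw RISC-V cause value into the interrupt flag and the code.
 * xlen is the register width of the target (32 or 64).
 */
static riscv_cause riscv_split_cause(uint64_t raw, unsigned xlen)
{
    uint64_t irq_bit = UINT64_C(1) << (xlen - 1);
    riscv_cause c = {
        .is_interrupt = (raw & irq_bit) != 0,
        .code = raw & (irq_bit - 1),
    };
    return c;
}
```

A generic "interrupt id" parameter would have to pick one of these representations (raw, or flag plus code) for every architecture, which is the mapping problem described above.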


I didn't want to open the per-architecture Pandora's box :).
I don't think the plugin API itself should deal with per-architecture
details like the meaning of a given id. I was just thinking of pushing this "raw"
information to the plugin, which may or may not use architecture-specific
knowledge to do its work. We already have plugins with similar per-architecture
knowledge (contrib/plugins/howvec.c), and that's ok in some specific cases.

  But how would such an interface look? The last PC aside, what would you
  include, and how? A GArray with named items that are itself just opaque
  blobs?

I was not thinking about a new interface for this. Having the "raw" interrupt 
id is enough for a plugin to do useful things, by having knowledge of which architecture 
it's instrumenting.

But what would the "raw" interrupt id even be for a given
architecture? I don't think you can answer this question with "obviously
this _one_ integer" for all of them.


Choosing interrupt id as an example of what we want to include was confusing, and brought more conversation than I expected.

I wanted to point out that we may want different data per event, rather than talk specifically about interrupts or architecture-specific data. To still answer you, I was thinking of sharing the interrupt number for x86_64, or the interrupt id for aarch64. It's probably too naive, so let's drop this and move on :).


And what would be the benefit compared to just querying the respective
  target specific registers through qemu_plugin_read_register? Which btw.
  is what we were going to do for our use-case. Even the example you
  brought up (howvec) uses querying functions and doesn't expect to get
  all the info via parameters.

You're right, but it's because it's querying instruction data.
I may be wrong on that, but at translation time, we may or may not be 
interested in accessing tb/insn data.

However, for control flow analysis, beyond a simple counting plugin, we
probably want to access further data almost every time.

I see it as closer to syscall instrumentation, which pushes the syscall id and
all register values, instead of letting the user poke for them. Does it make
more sense when compared to that?

Yes, but then you are in "GArray of named, potentially complex value"
territory again. And the comparison with syscalls also falls apart when
you consider that, for syscalls, they are well defined and enumerated
identically for at least a variety of targets, while the same kind of
"enumeration", if it even exists, is in completely different order for
every architecture.


We can restrict ourselves to from/to PCs for now. Mentioning the interrupt id as an example was a mistake.



But having something like from/to address seems useful to start. Even if we 
don't provide it for all events yet, it's ok.

  Yes, I certainly see the advantages of having either the last PC or the
  would-be-next PC as they are sufficiently universal. You can usually
  retrieve them from target-specific registers, but that may be more
  complicated in practice. In the case of RISC-V for example, the value
  of the EPC differs between interrupts and exceptions.

Unlike an interrupt id, a PC is something universal by definition, with a
single meaning across architectures. However, accessing it by name varies per
architecture, and even per sub-event, as you are stating for RISC-V.

Yes. And for that very reason I would not pass "the EPC" to a callback
but a clearly defined, target-agnostic value such as:

| The PC of the instruction that would have been executed next, were it
| not for that event

or

| The PC of the instruction that was executed before the event occurred

And unlike interrupt ids, the plugin API already has a precedent for
what type to use: uint64_t


Looks good. As said in the previous message, we can drop the complex data type idea.

So we could have something like:

/* plugin side */
void on_cf_event(enum qemu_plugin_cf_event_type type, uint64_t from, uint64_t to) {
   ...
}

/* API side */
void qemu_plugin_register_vcpu_cf_cb(
    qemu_plugin_id_t id, enum qemu_plugin_cf_event_type type, qemu_plugin_vcpu_cf_cb_t cb);

We thus would have a new callback type qemu_plugin_vcpu_cf_cb_t added.

For registering several events, we might define the enum values with one bit per type, so we can directly OR them together to register several types at once (same idea as the existing qemu_plugin_mem_rw, but for more values, with a specific ALL value covering all possibilities).
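
As a rough sketch of the bit-per-type idea, assuming three event types: all names below (cf_event_type, cf_cb_t, cf_registration, cf_dispatch) are hypothetical stand-ins and do not exist in qemu-plugin.h. Values are combined with bitwise OR, which gives the same result as XOR for distinct single-bit values but stays correct if a type is listed twice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types, one bit each, plus an ALL mask. */
enum cf_event_type {
    CF_EVENT_INTERRUPT   = 1 << 0,
    CF_EVENT_TRAP        = 1 << 1,
    CF_EVENT_SEMIHOSTING = 1 << 2,
    CF_EVENT_ALL = CF_EVENT_INTERRUPT | CF_EVENT_TRAP | CF_EVENT_SEMIHOSTING,
};

/* Hypothetical callback type: event type plus target-agnostic from/to PCs. */
typedef void (*cf_cb_t)(enum cf_event_type type, uint64_t from, uint64_t to);

/* Hypothetical registration record: which event types the callback wants. */
struct cf_registration {
    unsigned mask;
    cf_cb_t cb;
};

/* Call the callback only if it was registered for this event type. */
static bool cf_dispatch(const struct cf_registration *reg,
                        enum cf_event_type type, uint64_t from, uint64_t to)
{
    if (reg->cb == NULL || (reg->mask & (unsigned)type) == 0) {
        return false;
    }
    reg->cb(type, from, to);
    return true;
}

/* Example plugin-side callback: a simple event counter. */
static unsigned event_count;

static void count_event(enum cf_event_type type, uint64_t from, uint64_t to)
{
    (void)type;
    (void)from;
    (void)to;
    event_count++;
}
```

A plugin interested in interrupts and traps only would register with `CF_EVENT_INTERRUPT | CF_EVENT_TRAP`, and one interested in everything would pass `CF_EVENT_ALL`.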

Does that match what you were thinking?

Regards,
Julian
