On 10/23/24 09:12, Julian Ganz wrote:
Hi, Pierrick,
October 23, 2024 at 5:16 PM, "Pierrick Bouvier" wrote:
Hi Julian,
On 10/23/24 05:56, Julian Ganz wrote:
October 22, 2024 at 11:15 PM, "Pierrick Bouvier" wrote:
On 10/22/24 01:21, Julian Ganz wrote:
Ok, I'll introduce an enum and combine the three callbacks in the next
iteration then.
typedef struct {
    enum qemu_plugin_cf_event_type ev;
    union {
        data_for_interrupt interrupt;
        data_for_trap trap;
        data_for_semihosting semihosting;
    };
} qemu_plugin_cf_event;
/* data_for_... could contain things like from/to addresses, interrupt id,
 * ... */
I don't think this is a good idea.
Traps are just too diverse, imo there is too little overlap between
different architectures, with the sole exception maybe being the PC
prior to the trap. "Interrupt id" sounds like a reasonably common
concept, but then you would need to define a mapping for each and every
architecture. What integer type do you use? In RISC-V, for example,
exceptions and interrupt "ids" are differentiated via the most
significant bit. Do you keep that or do you zero it? And then there's
ring/privilege mode, cause (sometimes for each mode), ...
I didn't want to open the per architecture pandora box :).
I don't think the plugin API itself should deal with per architecture
details like meaning of a given id. I was just thinking to push this "raw"
information to the plugin, which may or may not use architecture-specific knowledge to do its
work. We already have plugins that have similar per architecture knowledge
(contrib/plugins/howvec.c) and it's ok in some specific cases.
But how would such an interface look? The last PC aside, what would you
include, and how? A GArray with named items that are itself just opaque
blobs?
I was not thinking about a new interface for this. Having the "raw" interrupt
id is enough for a plugin to do useful things, by having knowledge of which architecture
it's instrumenting.
But what would the "raw" interrupt id even be for a given
architecture? I don't think you can answer this question with "obviously
this _one_ integer" for all of them.
Choosing interrupt id as an example of what we want to include was
confusing, and brought more conversation than I expected.
I wanted to point that we may want different data per event, instead of
talking specifically of interrupt or per architecture specific data.
To still answer you, I was thinking of sharing the interrupt number for x86_64,
or the interrupt id for aarch64. It's probably too naive, so let's drop this
and move on :).
And what would be the benefit compared to just querying the respective
target specific registers through qemu_plugin_read_register? Which btw.
is what we were going to do for our use-case. Even the example you
brought up (howvec) uses querying functions and doesn't expect to get
all the info via parameters.
You're right, but it's because it's querying instruction data.
I may be wrong on that, but at translation time, we may or may not be
interested in accessing tb/insn data.
However, for control flow analysis, beyond a simple counting plugin, we
probably want to access further data almost every time.
I see it as closer to syscall instrumentation, which pushes the syscall id and
all register values, instead of letting the user poke at them. Does it make more sense
compared to that?
Yes, but then you are in "GArray of named, potentially complex value"
territory again. And the comparison with syscalls also falls apart when
you consider that, for syscalls, they are well defined and enumerated
identically for at least a variety of targets, while the same kind of
"enumeration", if it even exists, is in completely different order for
every architecture.
We can restrict to having from/to PCs for now. Mentioning interrupt id
as example was a mistake.
But having something like from/to address seems useful to start. Even if we
don't provide it for all events yet, it's ok.
Yes, I certainly see the advantages of having either the last PC or the
would-be-next PC as they are sufficiently universal. You can usually
retrieve them from target-specific registers, but that may be more
complicated in practice. In the case of RISC-V for example, the value
of the EPC differs between interrupts and exceptions.
As opposed to the interrupt id, a PC is something universal by definition, and
has a single meaning across architectures. However, accessing it by name varies
per architecture, and even per sub-event, as you are stating for RISC-V.
Yes. And for that very reason I would not pass "the EPC" to a callback
but a clearly defined, target-agnostic value such as:
| The PC of the instruction that would have been executed next, were it
| not for that event
or
| The PC of the instruction that was executed before the event occurred
And unlike interrupt ids, the plugin API already has a precedent for
what type to use: uint64_t
Looks good. As said in previous message, we can drop the complex data
type idea.
So we could have something like:
/* plugin side */
void on_cf_event(enum qemu_plugin_cf_event_type type, uint64_t from, uint64_t to) {
...
}
/* API side */
void qemu_plugin_register_vcpu_cf_cb(
    qemu_plugin_id_t id, enum qemu_plugin_cf_event_type type,
    qemu_plugin_vcpu_cf_cb_t cb);
We thus would have a new callback type qemu_plugin_vcpu_cf_cb_t added.
For registering several events, we might define enum values with one bit
per type, so we can directly OR the enum values together to register
several types (same idea as the existing qemu_plugin_mem_rw, but with more
values, and a specific ALL value covering all possibilities).
Does that match what you were thinking?
Regards,
Julian