Daniel Henrique Barboza <[email protected]> writes: > On 8/7/21 11:06 AM, Markus Armbruster wrote: >> Daniel Henrique Barboza <[email protected]> writes: >> >>> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal >>> error path, signalling that the hotunplug process wasn't successful. >>> This allow us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical() >>> to signal this error to the management layer. >>> >>> We also have another error path in spapr_memory_unplug_rollback() for >>> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs >>> in the hotunplug error path, but it will reconfigure them. Let's send >>> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the >>> case of older kernels. >>> >>> Reviewed-by: Greg Kurz <[email protected]> >>> Signed-off-by: Daniel Henrique Barboza <[email protected]> >>> --- >>> hw/ppc/spapr.c | 9 ++++++++- >>> hw/ppc/spapr_drc.c | 18 ++++++++++++------ >>> 2 files changed, 20 insertions(+), 7 deletions(-) >>> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>> index 1611d7ab05..5459f9a7e9 100644 >>> --- a/hw/ppc/spapr.c >>> +++ b/hw/ppc/spapr.c >>> @@ -29,6 +29,7 @@ >>> #include "qemu/datadir.h" >>> #include "qapi/error.h" >>> #include "qapi/qapi-events-machine.h" >>> +#include "qapi/qapi-events-qdev.h" >>> #include "qapi/visitor.h" >>> #include "sysemu/sysemu.h" >>> #include "sysemu/hostmem.h" >>> @@ -3686,13 +3687,19 @@ void spapr_memory_unplug_rollback(SpaprMachineState >>> *spapr, DeviceState *dev) >>> /* >>> * Tell QAPI that something happened and the memory >>> - * hotunplug wasn't successful. >>> + * hotunplug wasn't successful. Keep sending >>> + * MEM_UNPLUG_ERROR even while sending DEVICE_UNPLUG_ERROR >>> + * until the deprecation MEM_UNPLUG_ERROR is due. >>> */ >>> if (dev->id) { >>> qapi_error = g_strdup_printf("Memory hotunplug rejected by the >>> guest " >>> "for device %s", dev->id); >>> qapi_event_send_mem_unplug_error(dev->id, qapi_error); >>> } >>> + >>> + qapi_event_send_device_unplug_error(!!dev->id, dev->id, >>> + dev->canonical_path, >>> + qapi_error != NULL, qapi_error); >>> } >>> >> When dev->id is null, we send something like >> >> {"event": "DEVICE_UNPLUG_ERROR", >> "data": {"path": "/machine/..."}, >> "timestamp": ...} >> >> Unless I'm missing something, this is all the information the management >> application really needs. >> >> When dev->id is non-null, we add to "data": >> >> "device": "dev123", >> "msg": "Memory hotunplug rejected by the guest for device >> dev123", >> >> I'm fine with emitting the device ID when we have it. >> >> What's the intended use of "msg"? >> >> Could DEVICE_UNPLUG_ERROR ever be emitted for this device with a >> different "msg"? > > > It won't have a different 'msg' for the current use of the event in both ppc64 > and x86. It'll always be the same '<dev> hotunplug rejected by the guest' > message. > > The idea is that a future caller might want to insert a more informative > message, such as "hotunplug failed: memory is being used by kernel space" > or any other more specific condition. But then I guess we can argue that, > if that time comes, one can just add this new optional 'msg' member in this > event, and for now we can live without it. > > Would you oppose to renaming this new event to "DEVICE_UNPLUG_GUEST_ERROR" > and then remove the 'msg' member? I guess this rename would make it clearer > for management that we're reporting a guest side error, making any further > clarifications via 'msg' unneeded.
No objection.
