On Wed, Jan 04, 2017 at 11:33:36AM +0100, Jan Kiszka wrote:
> On 2017-01-03 07:15, Peter Xu wrote:
> > On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
> >> On 2016-06-26 03:48, Peter Xu wrote:
> >>> On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> >>>> On 2016-06-25 15:18, Peter Xu wrote:
> >>>>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> >>>
> >>> [...]
> >>>
> >>>>> I have a thought on how to implement the "sink" you have mentioned:
> >>>>>
> >>>>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> >>>>> called:
> >>>>>
> >>>>> KVM_IRQ_ROUTING_EVENTFD
> >>>>
> >>>> Not really, because all sources are either using eventfds, which you
> >>>> can also terminate in user space (already done for vhost and vfio in
> >>>> certain scenarios - IIRC), or originate there anyway (IOAPIC).
> >>>
> >>> But how should we handle the cases where the interrupt path is
> >>> entirely in the kernel?
> >>
> >> There are none which we can't redirect (only full in-kernel irqchip
> >> would have some, but that's unsupported anyway).
> >>
> >>> For vhost, data is transferred entirely inside the kernel when split
> >>> irqchip and irqfd are used: when vhost gets data, it triggers the
> >>> irqfd to deliver the interrupt to KVM. The whole path stays in the
> >>> kernel.
> >>>
> >>> For vfio, vfio_msihandler() handles the hardware IRQ and then
> >>> triggers the irqfd towards KVM as well. Again, that seems to happen
> >>> entirely in kernel space, with no chance to intercept it either.
> >>>
> >>> Please correct me if I am wrong.
> >>
> >> Look at what vhost is doing, e.g.: when a virtqueue is masked, it
> >> installs an event notifier that records incoming events in a pending
> >> state field. When it's unmasked, the corresponding KVM irqfd is
> >> installed.
> >
> > Hmm, I think it's time I pick this topic up again... :)
> >
> > Since it's been half a year since the last post in this thread (I
> > believe this thread is the so-called "cold data" and should be stored
> > on tapes already... sorry for the long delay), I'd like to give a
> > quick summary: interrupt remapping still cannot work well when the
> > guest programs a faulty interrupt entry - when that happens, we
> > should inject a VT-d fault rather than keep silent.
> >
> > Jan's suggestion above should be a good solution in that it only
> > needs to touch the QEMU part - that's its biggest benefit, AFAIU.
> > OTOH, IMO we would need to modify all the KVM irqfd users for this
> > fix (pci-assign, ioapic, ivshmem, vfio-pci, virtio) - all these
> > devices would have to init a "fault sink" eventfd, and when we detect
> > the specific irqfd install error, we install the "fault sink"
> > instead. What's worse, any new device with irqfd support would need
> > to implement the same error handling logic as well. Am I
> > understanding it correctly? If so, isn't that awkward?
> >
> > Now I am reconsidering my KVM_IRQ_ROUTING_EVENTFD proposal - with
> > that, we would not need to worry about the users of KVM irqfd, and
> > the error handling would be done automatically even for new irqfd
> > users. The disadvantage is of course that we need to touch both QEMU
> > and KVM, including the KVM API (though I think only a very small
> > change in KVM would be needed), and I am not sure whether that would
> > be worth it.
> >
> > Or is there a better way to do it?
> >
> > Hope I didn't miss anything. Comments are welcome!
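
Just to make the "terminate the eventfd in user space" part concrete
for myself, below is a minimal standalone sketch of the fallback path
at the plain syscall level (vtd_report_fault() is only a made-up
stand-in, not an existing QEMU helper):

    #include <poll.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /* Stand-in for whatever the emulated IOMMU would do; not a real
     * QEMU helper. */
    static void vtd_report_fault(void)
    {
        printf("would inject a VT-d IR fault instead of the interrupt\n");
    }

    int main(void)
    {
        int fd = eventfd(0, EFD_NONBLOCK);
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        uint64_t val = 1;

        /* The "device" fires its interrupt by signalling the eventfd,
         * exactly as vhost/vfio would. */
        if (write(fd, &val, sizeof(val)) != sizeof(val)) {
            return 1;
        }

        /* Nobody handed this fd to KVM as an irqfd, so user space owns
         * it: drain the event and raise a fault instead of injecting. */
        if (poll(&pfd, 1, 0) > 0 &&
            read(fd, &val, sizeof(val)) == sizeof(val)) {
            vtd_report_fault();
        }

        close(fd);
        return 0;
    }

In QEMU this would of course go through the main loop rather than a
bare poll(2), but the idea is the same: user space keeps ownership of
the fd and can turn each event into a fault report.
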
> >
>
> I don't have the details in mind anymore, but I suppose the only
> alternative to fixing the QEMU boilerplate code issue with a new KVM
> kernel interface is abstracting the common patterns in QEMU that all
> the irqfd users share and solving that topic once. It might turn out,
> though, that the existing kernel interface prevents this...
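
Something like the below is what I would picture - a single arming
helper that every irqfd user goes through, so the fallback logic lives
in exactly one place. Every identifier here is invented for the sketch
and does not match current QEMU code:

    /* Hypothetical plumbing, standing in for the real QEMU/KVM calls. */
    extern int kvm_try_add_route(int eventfd);   /* <0 if KVM rejects */
    extern void watch_fd_in_mainloop(int fd,
                                     void (*cb)(void *opaque),
                                     void *opaque);

    typedef struct IrqfdSource {
        int eventfd;                      /* fd the device signals    */
        int kvm_virq;                     /* kernel route, -1 if sunk */
        void (*fault_sink)(void *opaque); /* userspace fallback       */
        void *opaque;
    } IrqfdSource;

    /* Try the fast path first; if KVM rejects the route (e.g. because
     * interrupt remapping refused the MSI address/data), terminate the
     * eventfd in user space and report a fault per event instead of
     * delivering an interrupt. */
    static int irqfd_source_arm(IrqfdSource *src)
    {
        src->kvm_virq = kvm_try_add_route(src->eventfd);
        if (src->kvm_virq >= 0) {
            return 0;             /* kernel delivers interrupts */
        }
        src->kvm_virq = -1;
        watch_fd_in_mainloop(src->eventfd, src->fault_sink, src->opaque);
        return 0;
    }

A new device with irqfd support would then just fill in an IrqfdSource
and never see the error handling at all.
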
Hmm, (after a quick glance) I am just afraid that I might need to touch
a lot of code in QEMU even to provide such a common layer for this
single fault-tolerance feature. Let me think it over again...

Thanks Jan!

-- peterx
