On Mon, Mar 04, 2019 at 11:54:38PM -0800, Mike Larkin wrote:
> On Mon, Mar 04, 2019 at 01:25:16PM +0200, Wictor Lund wrote:
> > Hi misc@!
> >
> > I have figured out that it is possible to get vmd(8) into a state where
> > 1) com1_dev.rcv_pending != 0
> > 2) there is data pending on com1_dev.fd
> > 3) the guest doesn't seem to care
> >
> > This results in a locked up situation where com_rcv_event() is called on
> > indefinitely. It seems to me that an interrupt is lost somewhere, leading
> > to a situation where the guest OS is happily ignorant of the available data,
> > while the vmm is waiting for the guest to eat it up.
> >
> > This has made it impossible to install Linux via the serial console on
> > vmm(4). It seems that people previously have reported "freezing" problems
> > in vmm(4) form time to time, but when reported no one else have been able to
> > reproduce it.
> >
> > I have solved the problem for myself by changing com_rcv_event() to the
> > following:
> >
> > static void
> > com_rcv_event(int fd, short kind, void *arg)
> > {
> > mutex_lock(&com1_dev.mutex);
> >
> > /*
> > * We already have other data pending to be received. The data that
> > * has become available now will be moved to the com port later.
> > */
> > if (com1_dev.rcv_pending) {
> > /* If pending interrupt, inject */
> > if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> > utrace("comrcv injintr", &com1_dev.regs.lsr,
> > sizeof(com1_dev.regs.lsr));
> > /* XXX: vcpu_id */
> > vcpu_assert_pic_irq((uintptr_t) arg, 0,
> > com1_dev.irq);
> > vcpu_deassert_pic_irq((uintptr_t) arg, 0,
> > com1_dev.irq);
> > }
> > mutex_unlock(&com1_dev.mutex);
> > return;
> > }
> > if (com1_dev.regs.lsr & LSR_RXRDY)
> > com1_dev.rcv_pending = 1;
> > else {
> > com_rcv(&com1_dev, (uintptr_t) arg, 0);
> >
> > /* If pending interrupt, inject */
> > if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> > /* XXX: vcpu_id */
> > vcpu_assert_pic_irq((uintptr_t) arg, 0,
> > com1_dev.irq);
> > vcpu_deassert_pic_irq((uintptr_t) arg, 0,
> > com1_dev.irq);
> > }
> > }
> >
> > mutex_unlock(&com1_dev.mutex);
> > }
> >
> > However, I have little experience in the interrupt behaviour on x86. I'm
> > also aware of that there has been an attempt to fix this behaviour [1].
> >
> > I think the problem is that when com_rcv() is called from
> > vcpu_process_com_data(), the interrupt is triggered using vcpu_exit_inout(),
> > which was not touched in the previous attempt [1] to fix the "freezing"
> > problem. vcpu_exit_inout() still uses a simple vcpu_assert_pic_irq() call
> > to trigger the interrupt while for example com_rcv_event() uses the
> > vcpu_assert_pic_irq(); vcpu_deassert_pic_irq() sequence to trigger it.
> >
> > With my modifications to com_rcv_event() I was able to install not only
> > alpine linux, but even debian using the serial console. Without the
> > modification I can't even install alpine linux via the serial console.
> >
> > Any thoughts on this? If people think my change is a sound one, I can make
> > a proper patch for it. If people think the change is unsound, I would have
> > to look into changing vcpu_exit_inout() and probably extend the interface to
> > it to decide how the interrupt should be triggered.
> >
> > 1. https://marc.info/?l=openbsd-cvs&m=153115270302514&w=2
> >
> > --
> > Wictor Lund
> >
>
> Thanks Wictor!
>
> Can you make a proper diff and resend please?
The diff is appended below.
I noticed that uint8_t elcr[2] is handled under pthread_mutex_t pic_mtx in
i8259_deassert_irq() (which is called from vcpu_deassert_pic_irq()), while
elcr is not handled under pic_mtx in pic_set_elcr() and vcpu_exit_elcr().
This may or may not be of importance because i8259_deassert_irq() is a no-op
depending on the value of elcr.
I could write a diff immediately that makes those functions take the mutex,
but I haven't studied the locking strategies in vmd(8) well enough to know
whether it may cause a deadlock.
--
Wictor Lund
diff --git a/usr.sbin/vmd/ns8250.c b/usr.sbin/vmd/ns8250.c
index 6ed614b0a9f..3efbf357702 100644
--- a/usr.sbin/vmd/ns8250.c
+++ b/usr.sbin/vmd/ns8250.c
@@ -115,6 +115,12 @@ com_rcv_event(int fd, short kind, void *arg)
* has become available now will be moved to the com port later.
*/
if (com1_dev.rcv_pending) {
+ /* If no interrupt is currently pending, inject one */
+ if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
+ /* XXX: vcpu_id */
+ vcpu_assert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
+ vcpu_deassert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
+ }
mutex_unlock(&com1_dev.mutex);
return;
}