On Mon, Mar 04, 2019 at 01:25:16PM +0200, Wictor Lund wrote:
> Hi misc@!
>
> I have figured out that it is possible to get vmd(8) into a state where
> 1) com1_dev.rcv_pending != 0
> 2) there is data pending on com1_dev.fd
> 3) the guest doesn't seem to care
>
> This results in a locked-up situation where com_rcv_event() is called
> indefinitely. It seems to me that an interrupt is lost somewhere, leading
> to a situation where the guest OS is happily ignorant of the available data,
> while the vmm is waiting for the guest to eat it up.
>
> This has made it impossible to install Linux via the serial console on
> vmm(4). It seems that people have previously reported "freezing" problems
> in vmm(4) from time to time, but no one else has been able to reproduce
> them.
>
> I have solved the problem for myself by changing com_rcv_event() to the
> following:
>
> static void
> com_rcv_event(int fd, short kind, void *arg)
> {
> 	mutex_lock(&com1_dev.mutex);
>
> 	/*
> 	 * We already have other data pending to be received. The data that
> 	 * has become available now will be moved to the com port later.
> 	 */
> 	if (com1_dev.rcv_pending) {
> 		/* If pending interrupt, inject */
> 		if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> 			utrace("comrcv injintr", &com1_dev.regs.lsr,
> 			    sizeof(com1_dev.regs.lsr));
> 			/* XXX: vcpu_id */
> 			vcpu_assert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
> 			vcpu_deassert_pic_irq((uintptr_t) arg, 0,
> 			    com1_dev.irq);
> 		}
> 		mutex_unlock(&com1_dev.mutex);
> 		return;
> 	}
> 	if (com1_dev.regs.lsr & LSR_RXRDY)
> 		com1_dev.rcv_pending = 1;
> 	else {
> 		com_rcv(&com1_dev, (uintptr_t) arg, 0);
>
> 		/* If pending interrupt, inject */
> 		if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
> 			/* XXX: vcpu_id */
> 			vcpu_assert_pic_irq((uintptr_t) arg, 0, com1_dev.irq);
> 			vcpu_deassert_pic_irq((uintptr_t) arg, 0,
> 			    com1_dev.irq);
> 		}
> 	}
>
> 	mutex_unlock(&com1_dev.mutex);
> }
>
> However, I have little experience with interrupt behaviour on x86. I'm
> also aware that there has been an attempt to fix this behaviour [1].
>
> I think the problem is that when com_rcv() is called from
> vcpu_process_com_data(), the interrupt is triggered using vcpu_exit_inout(),
> which was not touched in the previous attempt [1] to fix the "freezing"
> problem. vcpu_exit_inout() still uses a simple vcpu_assert_pic_irq() call
> to trigger the interrupt while for example com_rcv_event() uses the
> vcpu_assert_pic_irq(); vcpu_deassert_pic_irq() sequence to trigger it.
>
> With my modifications to com_rcv_event() I was able to install not only
> Alpine Linux, but even Debian using the serial console. Without the
> modification I can't even install Alpine Linux via the serial console.
>
> Any thoughts on this? If people think my change is a sound one, I can make
> a proper patch for it. If people think the change is unsound, I would have
> to look into changing vcpu_exit_inout() and probably extending its
> interface to select how the interrupt should be triggered.
>
> 1. https://marc.info/?l=openbsd-cvs&m=153115270302514&w=2
>
> --
> Wictor Lund
>
Thanks Wictor!
Can you make a proper diff and resend please?
-ml