On Wed, Nov 7, 2018 at 12:16 AM Rishi <[email protected]> wrote:
>
>
> On Tue, Nov 6, 2018 at 10:41 PM Rishi <[email protected]> wrote:
>
>>
>>
>> On Tue, Nov 6, 2018 at 5:47 PM Wei Liu <[email protected]> wrote:
>>
>>> On Tue, Nov 06, 2018 at 03:31:31PM +0530, Rishi wrote:
>>> >
>>> > So after looking at the stack trace, it appears that the CPU was
>>> > getting stuck in xen_hypercall_xen_version.
>>>
>>> That hypercall is used when a PV kernel (re-)enables interrupts. See
>>> xen_irq_enable. The purpose is to force the kernel to switch into the
>>> hypervisor.
>>>
>>> >
>>> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
>>> >
>>> >
>>> > [30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
>>> >
>>> > [30569.588186] Kernel panic - not syncing: softlockup: hung tasks
>>> >
>>> > [30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L 4.19.1 #1
>>> >
>>> > [30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
>>> >
>>> > [30569.598356] Call Trace:
>>> >
>>> > [30569.599597] <IRQ>
>>> >
>>> > [30569.600920] dump_stack+0x5a/0x73
>>> >
>>> > [30569.602998] panic+0xe8/0x249
>>> >
>>> > [30569.604806] watchdog_timer_fn+0x200/0x230
>>> >
>>> > [30569.607029] ? softlockup_fn+0x40/0x40
>>> >
>>> > [30569.609246] __hrtimer_run_queues+0x133/0x270
>>> >
>>> > [30569.611712] hrtimer_interrupt+0xfb/0x260
>>> >
>>> > [30569.613800] xen_timer_interrupt+0x1b/0x30
>>> >
>>> > [30569.616972] __handle_irq_event_percpu+0x69/0x1a0
>>> >
>>> > [30569.619831] handle_irq_event_percpu+0x30/0x70
>>> >
>>> > [30569.622382] handle_percpu_irq+0x34/0x50
>>> >
>>> > [30569.625048] generic_handle_irq+0x1e/0x30
>>> >
>>> > [30569.627216] __evtchn_fifo_handle_events+0x163/0x1a0
>>> >
>>> > [30569.629955] __xen_evtchn_do_upcall+0x41/0x70
>>> >
>>> > [30569.632612] xen_evtchn_do_upcall+0x27/0x50
>>> >
>>> > [30569.635136] xen_do_hypervisor_callback+0x29/0x40
>>> >
>>> > [30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>>>
>>> What is the asm code for this RIP?
>>>
>>>
>>> Wei.
>>>
>>
>> The crash is resolved by appending "noirqbalance" to the Xen command
>> line. This way all dom0 CPUs remain available, but IRQs are not balanced
>> by Xen.
>>
>> Even though I'm running the irqbalance service in dom0, the IRQs do not
>> seem to be moving. (This is from the dom0 perspective; I do not yet know
>> whether it follows the Xen IRQs.)
>>
>> I tried objdump; while I do have the function in the output, there is no
>> asm code for it. It's just "...":
>>
>> ffffffff81001220 <xen_hypercall_xen_version>:
>>
>> ...
>>
>>
>> ffffffff81001240 <xen_hypercall_console_io>:
>>
>> ...
>>
>> All "hypercalls" appear similarly.
>>
>
> How frequently can that hypercall / xen_irq_enable() be hit? Several
> times per second, or only once in a while?
> During my tests, the system runs stably unless I'm downloading a large
> file. Files around a GB in size download without a crash, but the system
> crashes when the file is larger than that. I'm using wget to download a
> 2.1 GB file.
>
> Is there a way I can simulate the PV kernel's (re-)enabling of interrupts
> in a controlled fashion, using a kernel module?
>
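
One way to exercise that path in a controlled fashion might be a trivial
module that toggles local interrupts in a loop; on a PV guest the enable
side goes through the paravirt op and ends up in xen_irq_enable(). A
minimal, untested sketch (the module name, loop count and delay are made
up for illustration):

/*
 * irqtoggle.c - hypothetical sketch, not tested.  Repeatedly mask and
 * unmask local interrupts so that xen_irq_enable() (and, whenever an
 * event became pending meanwhile, xen_force_evtchn_callback()) is
 * exercised in a controlled way.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/irqflags.h>
#include <linux/delay.h>
#include <linux/sched.h>

static int __init irqtoggle_init(void)
{
	unsigned long i;

	for (i = 0; i < 100000; i++) {
		local_irq_disable();
		udelay(10);		/* let an event become pending */
		local_irq_enable();	/* paravirt op -> xen_irq_enable() on PV */
		if (!(i % 1000))
			cond_resched();	/* don't trip the softlockup watchdog */
	}
	pr_info("irqtoggle: done\n");
	return 0;
}

static void __exit irqtoggle_exit(void)
{
}

module_init(irqtoggle_init);
module_exit(irqtoggle_exit);
MODULE_LICENSE("GPL");
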
In case this is on the right track, here is the disassembly of
xen_force_evtchn_callback:
ffffffff8101ab70 <xen_force_evtchn_callback>:
ffffffff8101ab70:       31 ff                   xor    %edi,%edi
ffffffff8101ab72:       31 f6                   xor    %esi,%esi
ffffffff8101ab74:       e8 a7 66 fe ff          callq  ffffffff81001220 <xen_hypercall_xen_version>
ffffffff8101ab79:       c3                      retq
ffffffff8101ab7a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
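
For reference, that disassembly matches the C side; roughly, from memory
of arch/x86/xen/irq.c in 4.19-era kernels (so treat it as approximate),
the callback is forced with a dummy hypercall whose only purpose is to
trap into Xen:

/* Sketch from memory of arch/x86/xen/irq.c, not copied verbatim. */
static void xen_force_evtchn_callback(void)
{
	(void)HYPERVISOR_xen_version(0, NULL);
}

The two xor instructions above zero the two arguments (cmd = 0,
arg = NULL), which is why the soft-lockup RIP lands in
xen_hypercall_xen_version.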
It seems I'm hitting the following code from xen_irq_enable:

	barrier(); /* unmask then check (avoid races) */
	if (unlikely(vcpu->evtchn_upcall_pending))
		xen_force_evtchn_callback();
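
For context, the surrounding function (again a sketch from memory of
arch/x86/xen/irq.c around 4.19, so approximate) clears the per-VCPU
upcall mask first and only then checks for events that arrived while
interrupts were masked:

/* Approximate reconstruction, not a verbatim copy of the kernel source. */
asmlinkage __visible void xen_irq_enable(void)
{
	struct vcpu_info *vcpu;

	/* Stay on this VCPU while we unmask and check its event flags. */
	preempt_disable();

	vcpu = this_cpu_read(xen_vcpu);
	vcpu->evtchn_upcall_mask = 0;	/* "interrupts on" for a PV guest */

	barrier(); /* unmask then check (avoid races) */
	if (unlikely(vcpu->evtchn_upcall_pending))
		xen_force_evtchn_callback();

	preempt_enable();
}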
The code says unlikely, yet it is being called. And I found the following
structure:
struct vcpu_info {
	/*
	 * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
	 * a pending notification for a particular VCPU. It is then cleared
	 * by the guest OS /before/ checking for pending work, thus avoiding
	 * a set-and-check race. Note that the mask is only accessed by Xen
	 * on the CPU that is currently hosting the VCPU. This means that the
	 * pending and mask flags can be updated by the guest without special
	 * synchronisation (i.e., no need for the x86 LOCK prefix).
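
(The paste above got cut off; for completeness, the fields of struct
vcpu_info that follow that comment, quoted from memory of
include/xen/interface/xen.h and therefore approximate, are:)

	 * [rest of comment snipped]
	 */
	uint8_t evtchn_upcall_pending;
	uint8_t evtchn_upcall_mask;
	xen_ulong_t evtchn_pending_sel;
	struct arch_vcpu_info arch;
	struct pvclock_vcpu_time_info time;
};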
Let me know if the thread is being spammed by such intermediate updates.
_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/xen-devel