On Mon, Nov 5, 2018 at 6:32 PM Rishi <[email protected]> wrote:

>
>
> On Mon, Nov 5, 2018 at 6:29 PM Rishi <[email protected]> wrote:
>
>> Yes, I'm porting patches from the 4.4 queue and already have a working 4.9
>> kernel along with blktap. I have tested networking and disk IO in it.
>>
>> There are roughly 415 patches against 4.4, of which some 210+ are already
>> applied in 4.9 and some 220+ are already applied in 4.14. I don't have
>> numbers for 4.19 yet.
>>
>> Essentially I'm down to a single-digit number of patches at the moment for
>> a working kernel 4.9 setup. I know there will be mishaps since I'm not
>> applying all the patches, but my experiment is to see how close we can stay
>> to the mainline kernel, and which patches kernel.org could accept.
>>
>>
>>
>> On Mon, Nov 5, 2018 at 6:19 PM Wei Liu <[email protected]> wrote:
>>
>>> I forgot to say: please don't top-post.
>>>
>>> On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
>>> > I'm using a XenServer host with XCP-NG on it as an HVM guest. I added
>>> > xencons=tty console=ttyS0 to the XCP-NG dom0 kernel command line to
>>> > obtain a serial console. I'm working on building a more recent dom0
>>> > kernel for improved support of Ceph in XenServer/XCP-NG.
>>>
>>> This is an interesting setup. I don't think you can expect to just drop
>>> a new kernel into XenServer/XCP-NG and have it work flawlessly. What
>>> did you do with the patch queue XenServer carries for 4.4?
>>>
>>> Also, have you got a working baseline? I.e. did the stock 4.4 kernel
>>> work?
>>>
>>> Wei.
>>>
>>> >
>>> >
>>> >
>>> > On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <[email protected]> wrote:
>>> >
>>> > > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
>>> > > > Yes, I'm running it in an HVM domU for development purposes.
>>> > >
>>> > > What is your exact setup?
>>> > >
>>> > > Wei.
>>> > >
>>> > > >
>>> > > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <[email protected]> wrote:
>>> > > >
>>> > > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
>>> > > > > > Alright, I got the serial console and the following is the crash
>>> > > > > > log. Thank you for pointing that out.
>>> > > > > >
>>> > > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ksoftirqd/2:22]
>>> > > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
>>> > > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G             L    4.19.1 #1
>>> > > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
>>> > > > >
>>> > > > > Is this serial log from the host? It says it is running as an HVM
>>> > > > > DomU. Maybe you have mistaken the guest serial log for the host
>>> > > > > serial log?
>>> > > > >
>>> > > > > This indicates your machine runs XenServer, which has its own patch
>>> > > > > queues on top of upstream Xen. You may also want to report to the
>>> > > > > xs-devel mailing list.
>>> > > > >
>>> > > > > Wei.
>>> > > > >
>>> > >
>>>
>>
>
> Sorry, I'll avoid top-posting from now on.
>

So after examining the stack trace, it appears that the CPU was getting stuck
in xen_hypercall_xen_version:

[30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[30569.588186] Kernel panic - not syncing: softlockup: hung tasks
[30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L    4.19.1 #1
[30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
[30569.598356] Call Trace:
[30569.599597]  <IRQ>
[30569.600920]  dump_stack+0x5a/0x73
[30569.602998]  panic+0xe8/0x249
[30569.604806]  watchdog_timer_fn+0x200/0x230
[30569.607029]  ? softlockup_fn+0x40/0x40
[30569.609246]  __hrtimer_run_queues+0x133/0x270
[30569.611712]  hrtimer_interrupt+0xfb/0x260
[30569.613800]  xen_timer_interrupt+0x1b/0x30
[30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
[30569.619831]  handle_irq_event_percpu+0x30/0x70
[30569.622382]  handle_percpu_irq+0x34/0x50
[30569.625048]  generic_handle_irq+0x1e/0x30
[30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
[30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
[30569.632612]  xen_evtchn_do_upcall+0x27/0x50
[30569.635136]  xen_do_hypervisor_callback+0x29/0x40
[30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
[30569.641302] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[30569.651998] RSP: e02b:ffff8800b6203e10 EFLAGS: 00000246
[30569.655077] RAX: 0000000000040007 RBX: ffff8800ae41a898 RCX: ffffffff8100122a
[30569.659226] RDX: ffffc900400080ff RSI: 0000000000000000 RDI: 0000000000000000
[30569.663480] RBP: ffff8800ae41a890 R08: 0000000000000000 R09: 0000000000000000
[30569.667943] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000080000600
[30569.672057] R13: 000000000000001d R14: 00000000000001d0 R15: 000000000000001d
[30569.675911]  ? xen_hypercall_xen_version+0xa/0x20
[30569.678470]  ? xen_force_evtchn_callback+0x9/0x10
[30569.681495]  ? check_events+0x12/0x20
[30569.683738]  ? xen_restore_fl_direct+0x1f/0x20
[30569.686632]  ? _raw_spin_unlock_irqrestore+0x14/0x20
[30569.689166]  ? cp_rx_poll+0x427/0x4d0 [8139cp]
[30569.691519]  ? net_rx_action+0x171/0x3a0
[30569.694219]  ? __do_softirq+0x11e/0x295
[30569.696442]  ? irq_exit+0x62/0xb0
[30569.698251]  ? xen_evtchn_do_upcall+0x2c/0x50
[30569.701037]  ? xen_do_hypervisor_callback+0x29/0x40
[30569.704439]  </IRQ>
[30569.705731]  ? xen_hypercall_sched_op+0xa/0x20
[30569.708766]  ? xen_hypercall_sched_op+0xa/0x20
[30569.711344]  ? xen_safe_halt+0xc/0x20
[30569.713353]  ? default_idle+0x80/0x140
[30569.715345]  ? do_idle+0x13a/0x250
[30569.717216]  ? cpu_startup_entry+0x6f/0x80
[30569.719511]  ? start_kernel+0x4f6/0x516
[30569.721681]  ? set_init_arg+0x57/0x57
[30569.723985]  ? xen_start_kernel+0x575/0x57f
[30569.726453] Kernel Offset: disabled


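As a sanity check on that RIP, the Code: bytes around the fault decode to the
normal 64-bit PV hypercall stub (hand-decoded by me, so treat the annotation
accordingly):

  51                    push   %rcx
  41 53                 push   %r11
  b8 11 00 00 00        mov    $0x11,%eax    # 0x11 == 17 == __HYPERVISOR_xen_version
  0f 05                 syscall
  41 5b                 pop    %r11
  59                    pop    %rcx
  c3                    ret

So RIP really is inside the xen_version hypercall stub, not in corrupted code.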

So I wrote a kernel module to call the xen_version hypercall directly; it ran
successfully and returned the version.
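
For reference, a minimal sketch of that test module (function and module names
are mine, and it assumes an x86 kernel built with CONFIG_XEN):

#include <linux/module.h>
#include <xen/interface/version.h>
#include <asm/xen/hypercall.h>

/* Issue the same hypercall the stuck stub performs:
 * __HYPERVISOR_xen_version (17), with cmd XENVER_version. */
static int __init xenver_test_init(void)
{
	int ver = HYPERVISOR_xen_version(XENVER_version, NULL);

	/* XENVER_version returns major << 16 | minor. */
	pr_info("xenver_test: Xen version %d.%d\n", ver >> 16, ver & 0xffff);
	return 0;
}

static void __exit xenver_test_exit(void)
{
}

module_init(xenver_test_init);
module_exit(xenver_test_exit);
MODULE_LICENSE("GPL");

The module loads and prints the version fine, so the hypercall itself
completes when issued from process context.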


What else should I be checking for?