I suspect the following has occurred:

https://github.com/torvalds/linux/blob/v3.16/kernel/sched/fair.c#L4829
        if (!cfs_rq->nr_running)
                goto idle;

        put_prev_task(rq, prev);

        do {
                se = pick_next_entity(cfs_rq, NULL);
                set_next_entity(cfs_rq, se);
                cfs_rq = group_cfs_rq(se);
        } while (cfs_rq);

Checking the dumps shows that `nr_running` is in fact 0 at the time of
the crash:

crash> cfs_rq.nr_running ffff883ffedf3140
  nr_running = 0

Following the call chain, `put_prev_task` dispatches to the
class-specific `put_prev_task_fair`, which in turn calls
`put_prev_entity`. As part of that function, it does the following:

        /* throttle cfs_rqs exceeding runtime */
        check_cfs_rq_runtime(cfs_rq);

Tracing `check_cfs_rq_runtime` shows that it calls `throttle_cfs_rq`,
which modifies `nr_running`. If `nr_running` was non-zero at the
initial check but dropped to 0 during `put_prev_task`, the system
crashes when it then tries to pick an entity and cannot find any.
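
To make the suspected ordering problem concrete, here is a minimal
userspace sketch. It is not kernel code: every `toy_*` name, the
`runtime_remaining` field, and the result enum are hypothetical
stand-ins, and throttling is simplified to "the queue ends up with no
runnable entities". It only mirrors the check / `put_prev_task` / pick
order from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the scheduler structures; NOT the real definitions. */
struct toy_entity {
    int id;
};

struct toy_cfs_rq {
    unsigned int nr_running;       /* runnable entities on this queue */
    long runtime_remaining;        /* assumed bandwidth budget */
    struct toy_entity *first;      /* next runnable entity, or NULL */
};

enum toy_result {
    TOY_IDLE,        /* nr_running was 0 at the initial check */
    TOY_PICKED,      /* an entity was found */
    TOY_WOULD_OOPS   /* check passed, but the pick found nothing */
};

/* Simplified model of throttling: the queue stops offering entities. */
static void toy_throttle(struct toy_cfs_rq *q)
{
    q->nr_running = 0;
    q->first = NULL;
}

/* Models put_prev_task() reaching the runtime check and throttling. */
static void toy_put_prev_task(struct toy_cfs_rq *q)
{
    assert(q != NULL);
    if (q->runtime_remaining <= 0)
        toy_throttle(q);
}

/* Models the pick step: returns NULL when nothing is runnable. */
static struct toy_entity *toy_pick_next_entity(struct toy_cfs_rq *q)
{
    return q->nr_running ? q->first : NULL;
}

/* Same order as the quoted snippet: check, put_prev_task, then pick. */
static enum toy_result toy_pick_next_task(struct toy_cfs_rq *q)
{
    assert(q != NULL);
    if (!q->nr_running)
        return TOY_IDLE;

    toy_put_prev_task(q);          /* may zero nr_running after the check */

    /* The real code would use the result without a NULL check. */
    return toy_pick_next_entity(q) ? TOY_PICKED : TOY_WOULD_OOPS;
}
```

Calling `toy_pick_next_task` on a queue with `nr_running = 1` and no
remaining runtime passes the initial check, gets throttled inside
`toy_put_prev_task`, and ends in `TOY_WOULD_OOPS`, which is the
sequence I suspect is happening here.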

This is my first time debugging kernel issues, so this is only my
current theory, but comments are welcome.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1458045

Title:
  KVM and CFS bandwidth control causes kernel crashes (oops)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1458045/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
