While looking for improvements to cpu-on-off-test.sh, I investigated
the path that is taken on Azure. A good tool for debugging these
failures is cpuhp tracing.

echo 1 > /sys/kernel/debug/tracing/events/cpuhp/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace_pipe &
echo 0 > /sys/devices/system/cpu/cpu1/online

            bash-2660    [001] .... 10306.404476: cpuhp_enter: cpu: 0001 target: 143 step: 213 (cpuhp_kick_ap_work)
         cpuhp/1-14      [001] .... 10306.404506: cpuhp_enter: cpu: 0001 target: 143 step: 212 (sched_cpu_deactivate)
         cpuhp/1-14      [001] .... 10306.420476: cpuhp_exit:  cpu: 0001  state: 211 step: 212 ret: 0
         cpuhp/1-14      [001] .... 10306.420502: cpuhp_enter: cpu: 0001 target: 143 step: 189 (msr_device_destroy [msr])
         cpuhp/1-14      [001] .... 10306.420623: cpuhp_exit:  cpu: 0001  state: 188 step: 189 ret: 0
         cpuhp/1-14      [001] .... 10306.420629: cpuhp_enter: cpu: 0001 target: 143 step: 187 (mce_cpu_pre_down)
         cpuhp/1-14      [001] .... 10306.420653: cpuhp_exit:  cpu: 0001  state: 186 step: 187 ret: 0
         cpuhp/1-14      [001] .... 10306.420654: cpuhp_enter: cpu: 0001 target: 143 step: 184 (hv_synic_cleanup)
         cpuhp/1-14      [001] .... 10306.420655: cpuhp_exit:  cpu: 0001  state: 183 step: 184 ret: -16
         cpuhp/1-14      [001] .... 10306.426523: cpuhp_enter: cpu: 0001 target: 213 step: 185 (compute_batch_value)
         cpuhp/1-14      [001] .... 10306.426541: cpuhp_exit:  cpu: 0001  state: 185 step: 185 ret: 0
         cpuhp/1-14      [001] .... 10306.426546: cpuhp_enter: cpu: 0001 target: 213 step: 186 (acpi_soft_cpu_online)
         cpuhp/1-14      [001] .... 10306.426553: cpuhp_exit:  cpu: 0001  state: 186 step: 186 ret: 0
         cpuhp/1-14      [001] .... 10306.426553: cpuhp_enter: cpu: 0001 target: 213 step: 187 (mce_cpu_online)
         cpuhp/1-14      [001] .... 10306.426597: cpuhp_exit:  cpu: 0001  state: 187 step: 187 ret: 0
         cpuhp/1-14      [001] .... 10306.426606: cpuhp_enter: cpu: 0001 target: 213 step: 188 (console_cpu_notify)
         cpuhp/1-14      [001] .... 10306.426607: cpuhp_exit:  cpu: 0001  state: 188 step: 188 ret: 0
         cpuhp/1-14      [001] .... 10306.426611: cpuhp_enter: cpu: 0001 target: 213 step: 189 (msr_device_create [msr])
         cpuhp/1-14      [001] .... 10306.426665: cpuhp_exit:  cpu: 0001  state: 189 step: 189 ret: 0
         cpuhp/1-14      [001] .... 10306.426667: cpuhp_enter: cpu: 0001 target: 213 step: 212 (sched_cpu_activate)
         cpuhp/1-14      [001] .... 10306.426670: cpuhp_exit:  cpu: 0001  state: 212 step: 212 ret: 0
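
In a longer trace the failing step is easy to miss. As a rough sketch
(not something used in the original investigation), a small awk filter
can pair each cpuhp_exit with a non-zero ret with the callback name from
the matching cpuhp_enter. The function name cpuhp_failed_steps is
hypothetical, and the field layout is assumed to be the one shown in the
trace above.

```shell
# Hypothetical helper: list cpuhp steps whose callback returned non-zero.
# Reads trace text on stdin; it matches on the "step: ..." fragments only,
# so it copes with both wrapped and unwrapped trace lines.
cpuhp_failed_steps() {
    awk '
    # cpuhp_enter carries "step: N (callback)": remember the callback name.
    match($0, /step: [0-9]+ \([^)]*\)/) {
        s = substr($0, RSTART, RLENGTH)
        split(s, parts, " ")
        step = parts[2]
        sub(/^[^(]*\(/, "", s)
        sub(/\)$/, "", s)
        name[step] = s
    }
    # cpuhp_exit carries "step: N ret: R": report it when R is non-zero.
    match($0, /step: [0-9]+ ret: -?[0-9]+/) {
        s = substr($0, RSTART, RLENGTH)
        split(s, parts, " ")
        if (parts[4] + 0 != 0)
            printf "step %s (%s) failed with ret %s\n", parts[2], name[parts[2]], parts[4]
    }'
}
```

It could be fed from the trace buffer after the offline attempt, for
example: cpuhp_failed_steps < /sys/kernel/debug/tracing/trace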


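Notice the -16 (-EBUSY) failure in hv_synic_cleanup, which matches the
"Device or resource busy" error reported by the test. After it, the
steps with target 213 roll the CPU back to the online state.
hv_synic_cleanup can fail for two reasons: the CPU is VMBUS_CONNECT_CPU,
which is 0 and so not the case here, or the CPU is the target_cpu of a
vmbus_channel.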

# grep 1 /sys/bus/vmbus/devices/*/channels/*/cpu
/sys/bus/vmbus/devices/00000000-0001-8899-0000-000000000000/channels/3/cpu:1
/sys/bus/vmbus/devices/000d3a6e-002e-000d-3a6e-002e000d3a6e/channels/15/cpu:1
/sys/bus/vmbus/devices/f8b3781b-1e82-4818-a1c3-63d806ec15bb/channels/13/cpu:1
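
The grep above matches any value containing a "1", so on a larger
instance it would also list channels bound to CPU 10, 11 and so on. A
minimal sketch of an exact-match check follows. The function name
vmbus_channels_on_cpu and its optional sysfs-root argument are
hypothetical; only the channels/*/cpu path layout comes from the output
above.

```shell
# Hypothetical helper: print the vmbus channel "cpu" files whose value is
# exactly the given CPU number.
# $1 = CPU number, $2 = sysfs root (defaults to /sys; overridable so the
# function can be pointed at a test directory).
vmbus_channels_on_cpu() {
    cpu=$1
    root=${2:-/sys}
    for f in "$root"/bus/vmbus/devices/*/channels/*/cpu; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -r "$f" ] || continue
        # Compare the whole value, so CPU 1 does not also match 10, 11, ...
        if [ "$(cat "$f")" = "$cpu" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running vmbus_channels_on_cpu 1 before the test and getting any output
would suggest, per the analysis above, that offlining cpu1 is going to
fail with -EBUSY on this kernel.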

https://bugs.launchpad.net/bugs/1923114

Title:
  ubuntu_kernel_selftests: ./cpu-on-off-test.sh: line 94: echo: write
  error: Device or resource busy

Status in ubuntu-kernel-tests:
  In Progress
Status in linux-azure package in Ubuntu:
  New
Status in linux-azure-4.15 package in Ubuntu:
  New
Status in linux-azure source package in Trusty:
  New
Status in linux-azure-4.15 source package in Trusty:
  New
Status in linux-azure source package in Xenial:
  New
Status in linux-azure-4.15 source package in Xenial:
  New
Status in linux-azure source package in Bionic:
  New
Status in linux-azure-4.15 source package in Bionic:
  New
Status in linux-azure source package in Groovy:
  New
Status in linux-azure-4.15 source package in Groovy:
  New

Bug description:
  Test cpu-hotplug from ubuntu_kernel_selftests failed with
  bionic:linux-azure-4.15 running on a Basic A2 with 2 cores (besides
  other instance types):

  selftests: cpu-on-off-test.sh
  ========================================
  pid 28041's current affinity mask: 3
  pid 28041's new affinity mask: 1
  CPU online/offline summary:
  present_cpus = 0-1 present_max = 1
  Cpus in online state: 0-1
  Cpus in offline state: 0
  Limited scope test: one hotplug cpu
  (leaves cpu in the original state):
  online to offline to online: cpu 1
  not ok 1..1 selftests: cpu-on-off-test.sh [FAIL]
  ./cpu-on-off-test.sh: line 94: echo: write error: Device or resource busy
  offline_cpu_expect_success 1: unexpected fail

  http://10.246.72.46/4.15.0-1112.124~16.04.1-azure/xenial-linux-azure-azure-amd64-4.15.0-Basic_A2-ubuntu_kernel_selftests/ubuntu_kernel_selftests/results/ubuntu_kernel_selftests.cpu-hotplug/debug/ubuntu_kernel_selftests.cpu-hotplug.DEBUG.html

  The problem happens in
  "autotest-client-tests/ubuntu_kernel_selftests/cpu-on-off-test.sh"
  when executing:

          echo 0 > $SYSFS/devices/system/cpu/cpu$1/online
