** Also affects: linux-aws (Ubuntu Impish)
   Importance: Undecided
       Status: New

** Changed in: linux-aws (Ubuntu Impish)
   Importance: Undecided => Medium

** Changed in: linux-aws (Ubuntu Impish)
       Status: New => In Progress

** Changed in: linux-aws (Ubuntu Impish)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Changed in: linux-aws (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1966969

Title:
  linux-aws: Xen: Issues with detaching volume

Status in linux-aws package in Ubuntu:
  Invalid
Status in linux-aws source package in Impish:
  In Progress

Bug description:
  SRU Justification

  [Impact]

  We are observing an issue where a secondary volume gets stuck in the detaching state. This is observed with the latest Canonical, Ubuntu EKS Node OS (k8s_1.19), 20.04 LTS, amd64 focal image built on 2022-03-08, on Xen instance types (e.g. m4, t2).
  AMI in eu-west-1 : ami-0f4ffbcba23a6c434
  AMI in us-east-1 : ami-021feb4aa3b3c59c3

  When terminating an instance with a stuck volume, the shutdown process is interrupted by a hung Xen task:
  [ 847.895334] INFO: task xenwatch:188 blocked for more than 483 seconds.
  [ 847.901573] Not tainted 5.13.0-1017-aws #19~20.04.1-Ubuntu
  [ 847.907144] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 847.914462] task:xenwatch state:D stack: 0 pid: 188 ppid: 2 flags:0x00004000
  [ 847.914467] Call Trace:
  [ 847.914469] <TASK>
  [ 847.914472] __schedule+0x2ee/0x900
  [ 847.914478] schedule+0x4f/0xc0
  [ 847.914479] schedule_preempt_disabled+0xe/0x10
  [ 847.914482] __mutex_lock.isra.0+0x183/0x4d0
  [ 847.914486] __mutex_lock_slowpath+0x13/0x20
  [ 847.914487] mutex_lock+0x32/0x40
  [ 847.914489] del_gendisk+0x90/0x200
  [ 847.914493] xlvbd_release_gendisk+0x72/0xc0
  [ 847.914499] blkback_changed+0x101/0x210
  [ 847.914502] xenbus_otherend_changed+0x8f/0x130
  [ 847.914507] backend_changed+0x13/0x20
  [ 847.914510] xenwatch_thread+0xa6/0x180
  [ 847.914513] ? wait_woken+0x80/0x80
  [ 847.914517] ? test_reply.isra.0+0x40/0x40
  [ 847.914520] kthread+0x12b/0x150
  [ 847.914523] ? set_kthread_struct+0x40/0x40
  [ 847.914525] ret_from_fork+0x22/0x30
  [ 847.914531] </TASK>

  This looks like the xenwatch thread is waiting on a Xen block device to be released.
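
  The hung-task report above can be picked out of a kernel log mechanically. The following is a minimal sketch, not part of the original report: the function names `find_hung_tasks` and `is_blkfront_detach_hang` are hypothetical, and the check simply assumes that a blocked `xenwatch` task with `del_gendisk` and `blkback_changed` in its call trace corresponds to the hang described here.

```python
import re

# Hung-task warning line, e.g.
# "[ 847.895334] INFO: task xenwatch:188 blocked for more than 483 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g. "[ 847.914489] del_gendisk+0x90/0x200".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_TASK_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "seconds": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only reliable frames that belong to an open report.
        if frame and current is not None and not frame["unreliable"]:
            current["frames"].append(frame["sym"])
    return reports


def is_blkfront_detach_hang(report):
    """Heuristic (an assumption): xenwatch blocked inside del_gendisk."""
    frames = report["frames"]
    return (
        report["comm"] == "xenwatch"
        and "del_gendisk" in frames
        and "blkback_changed" in frames
    )
```

  Feeding the captured console output of an affected instance to `find_hung_tasks` and filtering with `is_blkfront_detach_hang` gives a quick way to confirm that a stuck detach matches this bug rather than some other hang.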

  The following steps were used to reproduce the issue:
  * Created an m4 or t2 (Xen) instance with the latest AMI for Canonical, Ubuntu EKS Node OS (k8s_1.19), 20.04 LTS, amd64
  * Created a filesystem on volume
  * Mounted volume through OS
  * Unmounted volume in OS
  * Detached volume from AWS console
  * Volume gets stuck.
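
  The reproduction steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the report detached the volume from the AWS console, whereas this uses the AWS CLI instead; the device name, mount point, volume ID, and ext4 filesystem type are placeholders that the report does not specify; and `build_repro_commands` is a hypothetical helper that only builds the command lines without running them.

```python
import shlex


def build_repro_commands(device, mount_point, volume_id, fstype="ext4"):
    """Build the command lines for the reproduction, without running them.

    Returns (in_guest, from_admin_host): commands to run as root inside
    the instance, then commands to run wherever the AWS CLI is configured.
    """
    in_guest = [
        ["mkfs", "-t", fstype, device],   # create a filesystem on the volume
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],   # mount the volume through the OS
        ["umount", mount_point],          # unmount it again
    ]
    from_admin_host = [
        # Assumption: CLI equivalent of detaching from the AWS console.
        ["aws", "ec2", "detach-volume", "--volume-id", volume_id],
        # Check the attachment state; a volume that stays in "detaching"
        # is the symptom described in this report.
        ["aws", "ec2", "describe-volumes", "--volume-ids", volume_id,
         "--query", "Volumes[0].Attachments[0].State", "--output", "text"],
    ]
    return in_guest, from_admin_host


if __name__ == "__main__":
    # All three arguments are placeholder values, not taken from the report.
    guest, admin = build_repro_commands(
        "/dev/xvdf", "/mnt/repro", "vol-0123456789abcdef0"
    )
    for argv in guest + admin:
        print(shlex.join(argv))
```

  Printing the commands rather than executing them keeps the sketch safe to run anywhere; the guest-side commands need root on the affected instance.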

  Our internal team found this upstream commit:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=05d69d950d9d84218fc9beafd02dea1f6a70e09e

  [...] and a del_gendisk from the block device release
  method, which will deadlock.

  The commit has a Fixes: tag referring to a commit from 5.13, so this
  could be the root cause. In our testing, we observed that this commit
  fixes the issue.

  [Test Plan]

  Amazon tested

  [Where things could go wrong]

  Detaching volumes could fail in new and bizarre ways

  [Other Info]

  SF: #00331175

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1966969/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
