From cloud-init point of view the solution now implemented make sense:
to run it before the apt-daily-upgrade. However, I wanted to add that
there are other use cases as well such as SSM documents being executed
on instances. These can be executed in batch at any time and may also
require installation of packages and thus interfere with these
unattended upgrades.

The execution of documents is not linked directly to cloud-init and may
be ran after the instances has been booted, so this falls in the other
category of having some kind queuing system or at least a centralized
way to obtain a lock to be able to use apt. At the moment there are
dozens of different possibilities how to get a mutex to be able to
execute apt, but somehow we couldn't find a bullet proof way that works
*every time*.

So maybe this does not really fit into this ticket, but to address that
this is only a partial fix to a bigger problem.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to apt in Ubuntu.
https://bugs.launchpad.net/bugs/1693361

Title:
  cloud-init sometimes fails on dpkg lock due to concurrent apt-
  daily.service execution

Status in APT:
  Fix Released
Status in cloud-init:
  Fix Released
Status in apt package in Ubuntu:
  Invalid
Status in cloud-init package in Ubuntu:
  Fix Released
Status in cloud-init source package in Xenial:
  Fix Released
Status in cloud-init source package in Yakkety:
  Won't Fix
Status in cloud-init source package in Zesty:
  Fix Released
Status in cloud-init source package in Artful:
  Fix Released

Bug description:
  === Begin SRU Template ===
  [Impact]
  A cloud-config that contains packages to install (see below) or
  'package_upgrade' will run 'apt-get update'.  That can sometimes fail as a
  result of contention with the apt-daily.service that updates that information.

  Cloud-config showing the problem is just like:

    $ cat my.yaml
    #cloud-config
    packages: ['hello']

  [Test Case]
  lxc-proposed-snapshot is
    
https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
  It publishes an image to lxd with proposed enabled and cloud-init upgraded.

  a.) launch an instance with proposed version of cloud-init and some user-data.
     This is platform independent.  The test case demonstrates lxd.
     $ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
         "package_upgrade: true" > config.yaml
     $ release=xenial
     $ ref=proposed-$release
     $ ./lxc-proposed-snapshot --proposed --publish $release $ref;

  b.) start the instance
     $ name=$release-1693361
     $ lxc launch my-xenial "--config=user.user-data=$(cat config.yaml)
     $ sleep 1
     $ lxc exec $name -- tail -f /var/log/cloud-init.log 
/var/log/cloud-init-output.log
     # watch this boot.

   c.) Look for evidence of systemd failure
     journalctl -o short-precise | grep -i break
     journalctl -o short-precise | grep -i order

  [Regression Potential]
  Regression chance here is low.  Its possible that ordering loops
  could occur.  When that does happen, journalctl will mention it.  
Unfortunately
  in such cases systemd somewhat randomly picks a service to kil so behavior
  is somewhat undefined.

  [Other Info]
  Upstream commit at
    https://git.launchpad.net/cloud-init/commit/?id=11121fe4

  === End SRU Template ===

  apt-daily is now a systemd service rather than being invoked by
  cron.daily.  If one builds a custom AMI it is possible that the apt-
  daily.timer will fire during boot.  This can fire at the same time
  cloud-init is running and if cloud-init loses the race the invocation
  of apt (e.g. use of "packages:" in the config) will fail.

  There is a lot of discussion online about this change to apt-daily
  (e.g. unattended upgrades happening during business hours, delaying
  boot, etc.) and discussion of potential systemd changes regarding
  timers firing during boot (c.f.
  https://github.com/systemd/systemd/issues/5659).

  While it would be better to solve this in apt itself, I suggest that
  cloud-init be defensive when calling apt and implement some retry
  mechanism.

  Various instances of people running into this issue:

  https://github.com/chef/bento/issues/609
  https://clusterhq.atlassian.net/browse/FLOC-4486
  https://github.com/boxcutter/ubuntu/issues/73
  
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image

To manage notifications about this bug go to:
https://bugs.launchpad.net/apt/+bug/1693361/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to