** Description changed:

+ = PROBLEM =
+ 
+ The version of Upstart currently in trusty (1.12.1-0ubuntu4.2) suffers
+ from a couple of problems which in combination could make upgrading
+ difficult on systems with high uptimes.
+ 
+ The 2 issues are:
+ 
+ == bug 901038 ==
+ 
+ The issue here is that telinit in trusty is not fully synchronous. This
+ is not normally a problem. However if Upstart is re-exec'ed using
+ 'telinit u', as is done by the maintainer scripts for the following 
packages...
+ 
+   - libc6
+   - libdbus-1-3
+   - libjson-c2
+   - libnih1
+   - libnih-dbus1
+   - libselinux1
+   - libsepol1
+   - upstart
+ 
+ ... and if that operation is slow (it is normally extremely fast),
+ subsequent package upgrades (part of the same apt-get run) which *also*
+ need to restart Upstart may fail since 'telinit u' is unable to connect
+ to PID 1 as a result of Upstart still being in the process of
+ re-exec'ing from the last call to 'telinit u'.
+ 
+ 
+ == bug 1338637 ==
+ 
+ This bug only really affects server systems which have high uptimes. If
+ a re-exec is triggered via 'telinit u' as part of a package upgrade (see
+ above), Upstart will consume 2 additional inotify watches. Although this
+ doesn't affect the correct behaviour of Upstart, it does have two
+ repercussions:
+ 
+ a) It wastes inotify watches.
+ b) It slows down an Upstart re-exec ('telinit u').
+ 
+ Note that the slow-down for a vanilla Trusty server may not be
+ detectable unless the value of /proc/sys/fs/inotify/max_user_instances
+ has been raised above the default value of 128.
+ 
+ 
+ = FIX DETAILS =
+ 
+ == bug 901038 ==
+ 
+ The fix for this bug now makes 'telinit u' block until the re-exec
+ operation has completed fully.
+ 
+ Technically, it is not possible to make 'telinit u' synchronous since
+ because D-Bus connections cannot be serialised and since the 'telinit u'
+ request is made via a D-Bus connection, when Upstart re-exec's, it has
+ to sever all D-Bus connections, including the 'telinit u' D-Bus
+ connection. As such, 'telinit u' now performs the following operations:
+ 
+   - Requests synchronously that Upstart re-exec itself.
+   - Polls Upstart "forever" by attempting to connect to PID 1 and if
+     that operation fails, waiting for a period, the retrying.
+ 
+ The code is well commented to explain this less-than-ideal but
+ nonetheless essential poll operation:
+ 
+   http://bazaar.launchpad.net/~upstart-
+ devel/upstart/trunk/view/head:/util/telinit.c#L171
+ 
+ == bug 1338637 ==
+ 
+ This was a simple bug fix.
+ 
+ 
+ = FIX AVAILABILITY =
+ 
+ TBD.
+ 
+ 
+ = IMPACT =
+ 
+ On a high uptime server, the more times that 'telinit u' is called (as
+ the result of normal apt-get updates to any of the packages listed under
+ bug 901038 above), the slower the operation will take to complete, and
+ thus the likelihood of bug 901038 being seen will increase.
+ 
+ 
+ = JUSTIFICATION =
+ 
+ System updates need to "just work". However, as outlined above, the
+ longer a systems uptime, the more likely it is to be affected by this
+ issue which increases the likelihood of a system update failure.
+ 
+ This issue needs to be fixed as soon as possible to minimise issues for
+ trusty users and administrators, particularly before any potential
+ upgrade to Utopic.
+ 
+ 
+ = TEST CASE =
+ 
+ == To demonstrate bug 1338637 ==
+ 
+ $ sudo telinit u
+ $ sudo ls -l /proc/1/fd|grep inotify|wc -l
+ 
+ The correct output from the above 2 commands should be simply "2".
+ 
+ However, on trusty systems, if the commands above are run after a fresh
+ boot, the value will in all likelihood be "4". Also, every subsequent
+ 'telinit u' will increase the number displayed by 2.
+ 
+ == To demonstrate bug 901038 ==
+ 
+ for i in $(seq 10); do telinit u; done
+ 
+ There should be no output from the command-line above, but on a trusty
+ system the chances are that one or more lines will display like this:
+ 
+ $ for i in $(seq 5); do sudo telinit u;done
+ telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused
+ telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused
+ telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused
+ 
+ == To prove the overall problem has been fixed ==
+ 
+ 1) Download the attached "force-reexec.sh" onto a trusty system.
+ 2) Make executable:
+ 
+    $ chmod 755 ./force-reexec.sh
+ 
+ 3) Run script as root, specifying a number of iterations (100) and
+    saving output to file "./typescript.bad":
+ 
+    $ script -c 'sudo ./force-reexec.sh 100' typescript.bad
+ 
+ 4) Upgrade to latest version of Upstart that fixes the 2 bugs.
+ 
+ 5) Reboot.
+ 
+ 6) Re-run the script
+ 
+    $ script -c 'sudo ./force-reexec.sh 100' typescript.good
+ 
+ 7) Check the results:
+ 
+    - file "typescript.bad" will show an increasing number of watches
+      and *MAY* show a slowly increasing restart time (depending on 
+      the value of /proc/sys/fs/inotify/max_user_instances (see above)).
+ 
+   - file "typescript.good" should consistently show 2 watches and a
+     restart time of 0 on average.
+ 
+ 
+ [REGRESSION POTENTIAL]
+ 
+ The only theoretically potential issue is what happens if the continuous
+ poll performed by 'telinit u' never completes. However, this should
+ never happen since:
+ 
+ 1) Upstart checks to ensure that it can serialise its own state *before*
+    it actually performs the re-exec. If it cannot for some reason (the
+    only possibility here is critically low memory), it will
+    automatically degrade to a stateless re-exec. A stateless re-exec can
+    only fail if the on-disk "/sbin/init" binary or associated libraries
+    are somehow corrupted.
+ 
+ 2) If, after checking that its own state can be serialised, the actual
+    re-exec operation fails "mid-flight", again, Upstart will
+    automatically revert to performing a simple stateless re-exec which
+    can only fail if the on-disk "/sbin/init" binary or associated
+    libraries are somehow corrupted.
+ 
+ 
+ = OTHER INFORMATION =
+ 
+ == Upgrade Procedure ==
+ 
+ Note that updating a system to the latest version of Upstart that fixes
+ the two bugs outlined above will stop the number of inotify watches
+ growing, but will NOT bring the value down to the expected "2".
+ 
+ As such, to correct the problem fully, it is necessary to reboot the
+ system after successfully upgrading the Upstart package.
+ 
+ == D-Bus Daemon ==
+ 
+ Note that although Upstart makes heavy use of D-Bus, it does not require
+ a D-Bus daemon to be running. Specifically, 'telinit u' communicates
+ with PID 1 via a private abstract D-Bus socket, so is immune from issues
+ with dbus-daemon(1).
+ 
+ 
+ 
+ = Original Description =
+ 
  $ sudo ls -al /proc/1/fd|grep anon|wc -l
  2
  
  $ i=0; while [ $i -lt 1024 ]; do sudo telinit u; i=$((i+1)); done
  
  $ sudo ls -al /proc/1/fd|grep anon|wc -l
  106

** Attachment added: "force-reexec.sh : script that re-exec's upstart and 
demonstrates both bugs."
   
https://bugs.launchpad.net/upstart/+bug/1338637/+attachment/4207315/+files/force-reexec.sh

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1338637

Title:
  continuous re-exec can result in a build-up of inotify fds [SRU]

To manage notifications about this bug go to:
https://bugs.launchpad.net/upstart/+bug/1338637/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to