** Description changed:

= PROBLEM =

The version of Upstart currently in trusty (1.12.1-0ubuntu4.2) suffers
from a couple of problems which, in combination, could make upgrading
difficult on systems with high uptimes.

The two issues are:

== bug 901038 ==

The issue here is that telinit in trusty is not fully synchronous. This
is not normally a problem. However, if Upstart is re-exec'ed using
'telinit u', as is done by the maintainer scripts for the following
packages...

  - libc6
  - libdbus-1-3
  - libjson-c2
  - libnih1
  - libnih-dbus1
  - libselinux1
  - libsepol1
  - upstart

... and if that operation is slow (it is normally extremely fast),
subsequent package upgrades (part of the same apt-get run) which *also*
need to restart Upstart may fail, since 'telinit u' is unable to connect
to PID 1 while Upstart is still re-exec'ing from the previous call to
'telinit u'.

== bug 1338637 ==

This bug only really affects server systems which have high uptimes. If
a re-exec is triggered via 'telinit u' as part of a package upgrade (see
above), Upstart will consume 2 additional inotify watches. Although this
does not affect the correct behaviour of Upstart, it does have two
repercussions:

  a) It wastes inotify watches.
  b) It slows down an Upstart re-exec ('telinit u').

Note that the slow-down for a vanilla Trusty server may not be
detectable unless the value of /proc/sys/fs/inotify/max_user_instances
has been raised above the default value of 128.

= FIX DETAILS =

== bug 901038 ==

The fix for this bug now makes 'telinit u' block until the re-exec
operation has completed fully.
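Blocking until the re-exec has completed amounts to retrying a connection attempt until PID 1 accepts it again. A minimal POSIX shell sketch of that retry pattern is shown here; it is not the telinit.c code. The function name `retry_until`, its arguments and the bounded attempt count are assumptions made for illustration (telinit itself polls without a limit), and the probe command is passed in so that the loop can be exercised without Upstart.

```shell
#!/bin/sh
# retry_until MAX_TRIES DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, sleeping DELAY seconds
# between failed attempts. Returns 0 as soon as CMD succeeds, or 1 after
# MAX_TRIES failed attempts. (telinit polls "forever"; this sketch is
# bounded so that a wedged PID 1 cannot hang the caller indefinitely.)
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Discard the probe's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$attempt" -ge "$max_tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, a caller could run 'retry_until 50 1 initctl version' after a re-exec, on the assumption that 'initctl version' fails for as long as PID 1 is still re-exec'ing and refusing connections.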

Technically, it is not possible to make 'telinit u' synchronous: the
'telinit u' request is made over a D-Bus connection, D-Bus connections
cannot be serialised, and so when Upstart re-execs it has to sever all
D-Bus connections, including the 'telinit u' connection itself. As
such, 'telinit u' now performs the following operations:

  - Requests synchronously that Upstart re-exec itself.
  - Polls Upstart "forever" by attempting to connect to PID 1 and, if
    that operation fails, waiting for a period, then retrying.

The code is well commented to explain this less-than-ideal but
nonetheless essential poll operation:

  http://bazaar.launchpad.net/~upstart-devel/upstart/trunk/view/head:/util/telinit.c#L171

== bug 1338637 ==

This was a simple bug fix.

= FIX AVAILABILITY =

TBD.

= IMPACT =

On a high-uptime server, the more times 'telinit u' is called (as the
result of normal apt-get updates to any of the packages listed under
bug 901038 above), the longer the operation takes to complete, and thus
the more likely bug 901038 is to be seen.

= JUSTIFICATION =

System updates need to "just work". However, as outlined above, the
longer a system's uptime, the more likely it is to be affected by this
issue, which increases the likelihood of a system update failure.

This issue needs to be fixed as soon as possible to minimise issues for
trusty users and administrators, particularly before any potential
upgrade to Utopic.

= TEST CASE =

== To demonstrate bug 1338637 ==

  $ sudo telinit u
  $ sudo ls -l /proc/1/fd|grep inotify|wc -l

The correct output from the above two commands should simply be "2".

However, on trusty systems, if the commands above are run after a fresh
boot, the value will in all likelihood be "4". Also, every subsequent
'telinit u' will increase the number displayed by 2.
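The commands above count the inotify descriptors held by PID 1 by parsing 'ls -l' output. A slightly more robust variant reads the descriptor symlinks directly. The sketch below is a hypothetical helper, `count_inotify_fds`; it assumes a Linux /proc in which inotify descriptors appear as "anon_inode:inotify" and, like the 'ls' command, it must be run as root to inspect PID 1.

```shell
#!/bin/sh
# count_inotify_fds PID
# Hypothetical helper: print the number of inotify file descriptors held
# by process PID. Requires read access to /proc/PID/fd (root for PID 1);
# if the directory is missing or unreadable, it prints 0.
count_inotify_fds() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # An unmatched glob leaves the literal pattern; skip it.
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        case "$(readlink "$fd" 2>/dev/null)" in
            anon_inode:inotify) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}
```

For example, running 'telinit u' followed by 'count_inotify_fds 1' as root would be expected to print 2 on a fixed system, matching the expected output above.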

== To demonstrate bug 901038 ==

  for i in $(seq 10); do sudo telinit u; done

There should be no output from the command-line above, but on a trusty
system the chances are that one or more lines like the following will
be displayed:

  $ for i in $(seq 5); do sudo telinit u;done
  telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused
  telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused
  telinit: Failed to connect to socket /com/ubuntu/upstart: Connection refused

== To prove the overall problem has been fixed ==

  1) Download the attached "force-reexec.sh" onto a trusty system.
  2) Make it executable:

       $ chmod 755 ./force-reexec.sh

  3) Run the script as root, specifying a number of iterations (100) and
     saving the output to the file "./typescript.bad":

       $ script -c 'sudo ./force-reexec.sh 100' typescript.bad

  4) Upgrade to the latest version of Upstart that fixes the two bugs.

  5) Reboot.

  6) Re-run the script:

       $ script -c 'sudo ./force-reexec.sh 100' typescript.good

  7) Check the results:

     - The file "typescript.bad" will show an increasing number of
       watches and *MAY* show a slowly increasing restart time
       (depending on the value of
       /proc/sys/fs/inotify/max_user_instances; see above).

     - The file "typescript.good" should consistently show 2 watches and
       a restart time of 0 on average.

[REGRESSION POTENTIAL]

The only theoretical issue is what happens if the continuous poll
performed by 'telinit u' never completes. However, this should never
happen since:

  1) Upstart checks that it can serialise its own state *before* it
     actually performs the re-exec. If it cannot for some reason (the
     only possibility here is critically low memory), it will
     automatically degrade to a stateless re-exec. A stateless re-exec
     can only fail if the on-disk "/sbin/init" binary or associated
     libraries are somehow corrupted.

  2) If, after checking that its own state can be serialised, the actual
     re-exec operation fails "mid-flight", Upstart will again
     automatically revert to performing a simple stateless re-exec,
     which can only fail if the on-disk "/sbin/init" binary or
     associated libraries are somehow corrupted.

= OTHER INFORMATION =

== Upgrade Procedure ==

Note that updating a system to the latest version of Upstart that fixes
the two bugs outlined above will stop the number of inotify watches
growing, but will NOT bring the value down to the expected "2".

As such, to correct the problem fully, it is necessary to reboot the
system after successfully upgrading the Upstart package.

== D-Bus Daemon ==

Note that although Upstart makes heavy use of D-Bus, it does not require
a D-Bus daemon to be running. Specifically, 'telinit u' communicates
with PID 1 via a private abstract D-Bus socket, so it is immune to
issues with dbus-daemon(1).

= Original Description =

  $ sudo ls -al /proc/1/fd|grep anon|wc -l
  2
  $ i=0; while [ $i -lt 1024 ]; do sudo telinit u; i=$((i+1)); done
  $ sudo ls -al /proc/1/fd|grep anon|wc -l
  106
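The loop in the original description can be extended to record how long each re-exec takes, which is the kind of "restart time" the attached force-reexec.sh reports. The sketch below is hypothetical and is not the attached script: `time_each` and its arguments are invented for illustration, timings come from 'date +%s' and so have one-second resolution, and the command to time is passed in so that the loop can be tried without Upstart.

```shell
#!/bin/sh
# time_each N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print one line per run,
# "ITERATION SECONDS", where SECONDS is the wall-clock duration of that
# run at one-second resolution. Returns 1 if any run failed.
time_each() {
    n=$1
    shift
    i=1
    status=0
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" || status=1
        end=$(date +%s)
        echo "$i $((end - start))"
        i=$((i + 1))
    done
    return "$status"
}
```

Run as root, 'time_each 100 telinit u' prints one line per re-exec; with the fix applied, the second column would be expected to stay at 0 on average.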

** Attachment added: "force-reexec.sh : script that re-exec's upstart and demonstrates both bugs."
   https://bugs.launchpad.net/upstart/+bug/1338637/+attachment/4207315/+files/force-reexec.sh

https://bugs.launchpad.net/bugs/1338637

Title:
  continuous re-exec can result in a build-up of inotify fds [SRU]