Package: aptitude-robot
Severity: grave
Version: 1.3.4-1
Tags: patch

Hi,

Since at least the 4th of December 2014 I have noticed hanging
aptitude-robot-session processes on our four machines already running
Jessie. The hangs seem to happen every time there are package updates
available.

The issue appeared only recently: we have been running
aptitude-robot-session via cron on Jessie as well as Wheezy systems for
quite a while now. One installation dates back to the 22nd of September
and another one to at least July, when the aptitude-robot version
currently in Jessie was already installed.

Today I was able to reduce the issue to a simple "yes '' | aptitude
install -y" with pending and scheduled updates:

# ps auxwwwf | egrep --color '[a]pt|[d]pkg|[d]ebconf|[y]es'
root      9967  0.0  0.0   8228   656 pts/3    S+   20:28   0:00  |       \_ yes
root      9968 14.3  2.3 212780 76112 pts/3    Sl+  20:28   0:01  |       \_ aptitude install -y
root     10065  2.3  0.0      0     0 ?        Zs   20:28   0:00  |           \_ [dpkg] <defunct>
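
To check several machines for this state, a filter along the following
lines might help. It is only a sketch: the function name
zombie_dpkg_children is made up for illustration, and it assumes a
procps-style "ps -eo pid,ppid,stat,comm" listing on stdin.

```shell
# Hypothetical helper: read "PID PPID STAT COMMAND" lines on stdin and
# print "PID PPID" for every dpkg process in zombie (Z) state.
zombie_dpkg_children() {
    awk '$3 ~ /^Z/ && $4 == "dpkg" { print $1, $2 }'
}

# Possible usage (assumes procps ps):
#   ps -eo pid,ppid,stat,comm | zombie_dpkg_children
```

The PPID column then points at the aptitude process that is still
waiting.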

My test setup for this issue was a Jessie installation, with dh-exec as
the package to be upgraded. For that I downloaded the dh-exec package
for my architecture, one version below the current version in Jessie,
from http://snapshot.debian.org/package/dh-exec/0.13/.
(Jessie currently has dh-exec 0.14.)

Then I called the following commands to simulate what makes
aptitude-robot-session hang:

# dpkg -i dh-exec_0.13_amd64.deb
# aptitude install --schedule-only dh-exec
# yes '' | aptitude install -y

If a process hangs as mentioned above, the following suffices to unhang
the process:

# ls -l /proc/9968/fd | fgrep /dev/pts
# cat /dev/pts/10

(I.e. check which /dev/pts device aptitude has open and cat that device.
Both aptitude-robot-session and the cat will exit at about the same
time.)
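
The lookup step above can be sketched as a small helper. The function
name pts_of_pid is hypothetical, and the sketch assumes a Linux /proc
filesystem; it only lists the devices and leaves the "cat" to the
admin.

```shell
# Hypothetical helper: print every /dev/pts/* device that the process
# with PID $1 has open, one per line, without duplicates.
pts_of_pid() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /dev/pts/*) printf '%s\n' "$target" ;;
        esac
    done | sort -u
}

# Possible usage with the PID from the ps output above:
#   pts_of_pid 9968
```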

So the main issue is that aptitude is connected to some terminal and
wants to get rid of some output, but nothing consumes that output.

I'm not yet 100% sure about the correct fix, mostly because I still have
no idea which change made this issue appear.

Besides the manual workaround mentioned above, I've found two other
changes which cause the issue to vanish in the example above:

1) If I remove the "yes '' |" the issue is gone.

   But the "yes" is necessary for the cases which are not covered by
   --force-conf{def,old} e.g. first installs with config files already
   present. See commit 169ee18d77a6a80248bdbd1d95cf626638219cb5 and the
   changelog entry for 1.2.15-1.

2) If I add a "< /dev/null" after the aptitude call, the issue is gone as well:

   yes '' | aptitude install -y < /dev/null

   The "< /dev/null" was actually present in aptitude-robot-session
   until the "yes '' |" was added, but it seems that nowadays both are
   necessary to avoid the drawbacks of (1) mentioned above.
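
One thing that may be worth checking while testing this combination: in
a POSIX shell, a redirection written on a command is applied after the
pipe has been connected, so the redirection determines that command's
stdin. The sketch below uses "head" as a stand-in for aptitude; the
function names are made up for illustration.

```shell
# Stand-in experiment: how many lines does the right-hand command read
# from stdin when it is fed by "yes" and also has "< /dev/null"?
stdin_lines_with_redirect() {
    yes '' 2>/dev/null | head -n 3 < /dev/null | wc -l | tr -d ' '
}

# Same pipeline without the redirection, for comparison.
stdin_lines_without_redirect() {
    yes '' 2>/dev/null | head -n 3 | wc -l | tr -d ' '
}

stdin_lines_with_redirect      # prints 0
stdin_lines_without_redirect   # prints 3
```

With the redirection, the stand-in never sees the empty lines from
"yes". Whether that matters for the conffile prompts from (1) is not
something this sketch can show; it only illustrates the shell semantics.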

So I assume that the following patch will fix the issue:

diff --git a/aptitude-robot-session b/aptitude-robot-session
index 213ce85..39dbfbb 100755
--- a/aptitude-robot-session
+++ b/aptitude-robot-session
@@ -67,7 +67,8 @@ export DEBIAN_PRIORITY
 nice yes '' | \
 /usr/sbin/aptitude-robot -y -q "$@" \
     -o DPkg::Options::=--force-confdef \
-    -o DPkg::Options::=--force-confold
+    -o DPkg::Options::=--force-confold \
+    < /dev/null
 
 if [ -n "$POST_SESSION_HOOK" ]; then
     $POST_SESSION_HOOK
diff --git a/debian/changelog b/debian/changelog
index ca9a7ee..5baf5df 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+aptitude-robot (1.3.4.1-1) UNRELEASED; urgency=medium
+
+  * Fix hanging aptitude-robot-session processes with zombie dpkg children
+    by reintroducing "< /dev/null".
+
+ -- Axel Beckert <a...@debian.org>  Sat, 13 Dec 2014 20:48:10 +0100
+
 aptitude-robot (1.3.4-1) unstable; urgency=low
 
   [ Axel Beckert ]
diff --git a/configure.ac b/configure.ac
index 9325322..c31aef5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,7 +1,7 @@
 AC_PREREQ([2.67])dnl require version in Debian squeeze (or higher)
 AC_INIT(
     [Aptitude Robot],
-    [1.3.4],
+    [1.3.4.1],
     [el...@heebs.ch],
     [aptitude-robot],
     [https://github.com/elmar/aptitude-robot.git]

But since I have so far only tested it by calling aptitude manually as
shown above, this needs some more testing over the next few days.

Additional things I checked but which didn't seem to make a difference:

* Checking all dpkg versions since 1.17.13, because dpkg in
  Testing/Jessie was upgraded from 1.17.13 to 1.17.21 on the 3rd of
  December and the first noticed occurrence of the issue was on the
  morning of the 4th of December.
* Changing needrestart's configuration to $nrconf{restart} = 'a' and
  $nrconf{ui} = 'NeedRestart::UI::stdio';
* Purging needrestart

[Bug report written on a different system than the one where the issue
 occurred.]

-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (990, 'unstable'), (600, 'testing'), (110, 'experimental'), (109, 'buildd-unstable'), (109, 'buildd-experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.17.0-trunk-amd64 (SMP w/4 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

