On 16.06.2015 at 17:21, Faidon Liambotis wrote:
> Hi,
>
> Any news about this? Can I help in any way?
>
> Thanks,
> Faidon
>
> On Fri, May 29, 2015 at 05:41:24PM +0300, Faidon Liambotis wrote:
>> On Mon, Apr 06, 2015 at 08:50:58PM +0100, Ben Hutchings wrote:
>>> It looks the same as this problem:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
>>> http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123/
>>
>> I just encountered this bug while trying to install jessie on a Dell
>> PowerEdge R610 with a SAS 6/iR (fairly recent, much more than 1950s).
>> The kernel crashes while in d-i and installation fails. I also tried
>> with a nightly d-i with Linux 4.0 -- same issue.
>>
>> Ironically, I found this bug report, clicked through the referenced
>> links, only to discover I had previously investigated this when
>> installing a similar server with Ubuntu 14.04 and I've even replied to
>> the Launchpad bug above... I can confirm it's the exact same bug. Note
>> that it was also covered by LWN(!): https://lwn.net/Articles/611226/
>>
>> It's disappointing that this bug hasn't been fixed yet upstream and
>> especially the part where mptsas' error handling is broken and the
>> kernel crashes instead of gracefully failing. This is a different,
>> secondary, bug that is just triggered by the timeout.
>>
>> In any case, there seems to have been /some/ improvement upstream on
>> this. systemd has increased the timeout from 30s to 60s (2e92633) and
>> subsequently to 180s (b5338a1), in commits that are both included in
>> v217. They have also made this a kernel command-line option
>> (udev.event-timeout & rd.udev.event-timeout) but those are more invasive
>> patches.
>>
>> My working servers with Ubuntu 12.04 & 14.04 indicate on their dmesg
>> that the probe time is somewhere between 18-31s, so 180s would
>> definitely fix the effect of this bug.
>>
>> The commits above aren't directly backportable to v215 as the upstream
>> code has changed significantly but the very simple patch attached is the
>> equivalent fix for v215 (it's untested, though).
>>
>> This affects a large number of Dell systems (~100 alone in my case) and
>> there is no practical workaround, so it'd be great if this was fixed in
>> a jessie point release.
>>
>> Best,
>> Faidon
>
>> diff --git a/src/udev/udevd.c b/src/udev/udevd.c
>> index a45d324..072499c 100644
>> --- a/src/udev/udevd.c
>> +++ b/src/udev/udevd.c
>> @@ -1415,7 +1415,7 @@ int main(int argc, char *argv[])
>>                          if (worker->state != WORKER_RUNNING)
>>                                  continue;
>>
>> -                        if ((now(CLOCK_MONOTONIC) - worker->event_start_usec) > 30 * USEC_PER_SEC) {
>> +                        if ((now(CLOCK_MONOTONIC) - worker->event_start_usec) > 180 * USEC_PER_SEC) {
>>                                  log_error("worker [%u] %s timeout; kill it", worker->pid,
>>                                            worker->event ? worker->event->devpath : "<idle>");
>>                                  kill(worker->pid, SIGKILL);
>
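For anyone skimming the quoted patch: the check it touches compares the time elapsed on a monotonic clock against a fixed limit. Below is a minimal standalone sketch of that comparison. The helper name event_timed_out is made up for illustration and is not a function in systemd; USEC_PER_SEC is redefined locally so the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microseconds per second; defined locally to mirror the constant
 * of the same name used in the quoted patch. */
#define USEC_PER_SEC UINT64_C(1000000)

/*
 * Hypothetical helper, not part of systemd: returns true when an event
 * that started at start_usec has been running for strictly longer than
 * timeout_sec seconds at time now_usec. Both timestamps are assumed to
 * come from the same monotonic clock, so now_usec >= start_usec.
 */
static bool event_timed_out(uint64_t now_usec, uint64_t start_usec,
                            uint64_t timeout_sec)
{
    return (now_usec - start_usec) > timeout_sec * USEC_PER_SEC;
}
```

With the 18-31s probe times reported above, a 31s probe exceeds a 30s limit and the worker would be killed, while the same probe stays well inside a 180s limit.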
Looking more into this, this patch might actually not be sufficient /
the right fix and we might need the following instead:

$ git diff
diff --git a/src/udev/udev-event.c b/src/udev/udev-event.c
index 5213a4a..66d8c40 100644
--- a/src/udev/udev-event.c
+++ b/src/udev/udev-event.c
@@ -48,7 +48,7 @@ struct udev_event *udev_event_new(struct udev_device *dev)
         udev_list_init(udev, &event->seclabel_list, false);
         event->fd_signal = -1;
         event->birth_usec = now(CLOCK_MONOTONIC);
-        event->timeout_usec = 30 * 1000 * 1000;
+        event->timeout_usec = 180 * 1000 * 1000;
         return event;
 }

Anyone willing to test this patch? I can provide pre-built packages for
i386 and amd64.

Note: if we want to get this into 8.2, this should happen quickly. The
deadline for 8.2 is this weekend.

Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?