Package: ganeti
Version: 2.11.6-1~bpo70+1
Severity: important

Dear Maintainer,

on the ganeti master node I found that luxid dies regularly. This
happens every 1 to 1.5 days, possibly depending on the number of
commands it deals with. A simple /etc/init.d/ganeti restart "fixes" it
and one can use the cluster again, but obviously that is not a good
solution.

The last thing in the log when this happens is

------------------------------------------------------------------------
2015-05-07 08:50:05,904644000000 CEST: ganeti-luxid pid=175797 INFO Rereading job 51157
2015-05-07 08:50:05,905380000000 CEST: ganeti-luxid pid=175797 INFO Finished jobs: (51157,JOB_STATUS_SUCCESS)
2015-05-07 08:54:21,256650000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257120000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,257706000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257828000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258016000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258163000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258431000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258550000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258755000000 CEST: ganeti-luxid pid=175797 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-07 08:55:03,053666000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled QueryGroups
2015-05-07 08:55:03,400245000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:04,784546000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,230757000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,574790000000 CEST: ganeti-luxid pid=175797 INFO New jobs enqueued: 51158
2015-05-07 08:55:05,575083000000 CEST: ganeti-luxid pid=175797 INFO Starting jobs: 51158
2015-05-07 08:55:05,575246000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled SubmitJob
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------

Note that the part where starting jobs fails with "Resource temporarily
unavailable" also happens at other times, at which luxid does NOT die.
So while that is another thing to look at (wtf, there are enough
resources), I don't think it is the cause. Especially as the failure
happens without that too:

------------------------------------------------------------------------
2015-05-13 08:40:05,864418000000 CEST: ganeti-luxid pid=844184 INFO Finished jobs: (51941,JOB_STATUS_SUCCESS)
2015-05-13 08:42:47,965607000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,965896000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,966336000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966457000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,966727000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966844000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,967252000000 CEST: ganeti-luxid pid=844184 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-13 08:42:47,967498000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,967651000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:45:02,422865000000 CEST: ganeti-luxid pid=844184 INFO Successfully handled QueryGroups
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------

-- System Information:
Debian Release: 7.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
bye Joerg
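
Diagnostic note: the final error says a descriptor numbered 1025 was passed to select, which only handles 0--1024. One possible way to check whether luxid is slowly accumulating descriptors before it dies is to count the entries under /proc/<pid>/fd at intervals. The following is only a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target process's fd directory; the function names and the 80% warning margin are hypothetical and not part of ganeti.

```python
import os

# Upper bound quoted in the luxid error message: "select (0--1024)".
FD_SETSIZE = 1024


def count_open_fds(pid):
    """Return the number of open file descriptors of a process.

    Assumes Linux: each open descriptor is one entry in /proc/<pid>/fd.
    Reading another user's process usually requires root.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


def near_select_limit(fd_count, limit=FD_SETSIZE, margin=0.8):
    """Return True once fd_count has reached margin * limit.

    The 0.8 margin is an arbitrary, hypothetical warning threshold.
    """
    return fd_count >= limit * margin
```

Calling count_open_fds with the luxid pid from the log (for example from a cron job, run as root) and recording the result over a day would show whether the count climbs steadily towards 1024, which would point to a descriptor leak rather than to load.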