Package: ganeti
Version: 2.11.6-1~bpo70+1
Severity: important

Dear Maintainer,

On the ganeti master node, I found that luxid dies regularly. This
happens after somewhere between 1 and 1.5 days of uptime, possibly
depending on the number of commands it handles. A simple
/etc/init.d/ganeti restart "fixes" it and the cluster is usable again,
but obviously that is not a good solution.

The last entries in the log when this happens are:

------------------------------------------------------------------------
2015-05-07 08:50:05,904644000000 CEST: ganeti-luxid pid=175797 INFO Rereading job 51157
2015-05-07 08:50:05,905380000000 CEST: ganeti-luxid pid=175797 INFO Finished jobs: (51157,JOB_STATUS_SUCCESS)
2015-05-07 08:54:21,256650000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257120000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,257706000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257828000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258016000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258163000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258431000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258550000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs:
2015-05-07 08:54:21,258755000000 CEST: ganeti-luxid pid=175797 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-07 08:55:03,053666000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled QueryGroups
2015-05-07 08:55:03,400245000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:04,784546000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,230757000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,574790000000 CEST: ganeti-luxid pid=175797 INFO New jobs enqueued: 51158
2015-05-07 08:55:05,575083000000 CEST: ganeti-luxid pid=175797 INFO Starting jobs: 51158
2015-05-07 08:55:05,575246000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled SubmitJob
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------

Note that the jobs failing to start with "Resource temporarily
unavailable" also happens at other times, when luxid does NOT die.
That is another thing to look at (it is odd, as there are enough
resources), but I don't think it is the cause. Especially as the
failure happens without that too:

------------------------------------------------------------------------
2015-05-13 08:40:05,864418000000 CEST: ganeti-luxid pid=844184 INFO Finished jobs: (51941,JOB_STATUS_SUCCESS)
2015-05-13 08:42:47,965607000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,965896000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,966336000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966457000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,966727000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966844000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:42:47,967252000000 CEST: ganeti-luxid pid=844184 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-13 08:42:47,967498000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,967651000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs:
2015-05-13 08:45:02,422865000000 CEST: ganeti-luxid pid=844184 INFO Successfully handled QueryGroups
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------
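
Since the final message complains about file descriptor 1025 being
outside the select() range of 0--1024, one way to check whether luxid
slowly accumulates descriptors would be to watch /proc/<pid>/fd on the
master node. The following Python sketch is only an illustration of
that idea and is not part of ganeti: the function names are made up,
it assumes a Linux /proc filesystem, and it needs enough privileges to
read the luxid process's fd directory.

```python
import os
import sys
from collections import Counter

# Upper bound quoted in the luxid error message ("select (0--1024)").
FD_SETSIZE = 1024


def fd_targets(pid):
    """Return the link target of every open descriptor of `pid` (Linux /proc)."""
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def summarize(targets):
    """Count descriptors by kind: 'socket', 'pipe', 'anon_inode' or 'file'."""
    kinds = Counter()
    for target in targets:
        prefix = target.split(":", 1)[0]
        if prefix in ("socket", "pipe", "anon_inode"):
            kinds[prefix] += 1
        else:
            kinds["file"] += 1
    return kinds


def report(pid):
    """Build a one-line summary suitable for logging from cron."""
    targets = fd_targets(pid)
    return "pid %d: %d open fds (select limit %d), by kind: %s" % (
        pid, len(targets), FD_SETSIZE, dict(summarize(targets)))


if __name__ == "__main__":
    # Pass the luxid pid as first argument; defaults to this process itself.
    has_pid = len(sys.argv) > 1 and sys.argv[1].isdigit()
    pid = int(sys.argv[1]) if has_pid else os.getpid()
    try:
        print(report(pid))
    except OSError as err:
        print("cannot inspect pid %d: %s" % (pid, err))
```

Run periodically with the luxid pid as argument, a count that climbs
steadily towards 1024 (and which kind of descriptor is growing) would
indicate a descriptor leak rather than a genuine resource shortage.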


-- System Information:
Debian Release: 7.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
bye Joerg

