Sorry, I overlooked this info: "The server has no problems, plenty of memory and a fast diskarray (SAS->SATA). Never technical problems with this server. And it worked without problems for a long period."
That tells us your system runs on a physical box rather than a virtual machine. I am afraid you've got a hardware problem of some sort. I advise you to start checking all hardware components (or just replace the box).

Regards,
Kuba

On Sun, 2011-02-27 at 12:57 +0100, Ruud Baart wrote:
> Problem:
> For a customer we have used LDAP for many years. Last year suddenly the
> slapd service just stopped without any traces in the logfiles. After a
> restart of slapd everything works fine again. But the problem was there:
> it was not an incident, now and then slapd just stops and always without
> any traces in the logfiles. Sometimes three times a day, sometimes a week
> without a failure. I can't find a pattern or any relation to any other
> service on the linux server.
>
> Environment:
> - Several (debian squeeze) servers, several windows servers. We use the
> bdb database backend.
> - There is one master LDAP server which provides syncprov and two
> replica LDAP servers (syncrepl). The master server is the most
> intensively used (mainly samba as primary domain controller: a few
> hundred useraccounts, lots of groupaccounts, workstations, acl's, etc.),
> one of the replicas is not very busy but handles the mail for all users
> (lookup: amavis, postfix, courier-imap, mailaccount settings etc). The
> third replica is not busy at all, it is a remote location.
> - Total LDAP is 3700 dn's, slapcat produces a file of 7,3 Mb.
> - It is only the master LDAP which stops suddenly. I have never seen a
> failure of a replica LDAP.
>
> Because I have no clear idea about the problem I have no idea which
> technical details are relevant:
> DB_CONFIG
> ===========
> set_cachesize 0 10485760 1
> set_lk_max_objects 10000
> set_lk_max_locks 10000
> set_lk_max_lockers 10000
> set_lg_dir /home/ldap-dbd
> The database is stored on an ext3 filesystem, kernel 2.6.32. The server
> has no problems, plenty of memory and a fast diskarray (SAS->SATA).
> Never technical problems with this server. And it worked without
> problems for a long period. Nothing has changed in the environment or
> the LDAP setup (except of course the upgrade to debian squeeze but
> the problem was already there).
>
> What we have tried:
> - upgrade from openldap 2.4.17 (debian lenny+backports) to openldap
> 2.4.23 (debian squeeze). I saw in the release notes that problems
> related to syncrepl were solved. Therefore we waited for version 2.4.23
> to become available in debian. This upgrade made no difference.
> - reindex, rebuilt the directory. When I rebuild the LDAP with a clean
> LDIF file on the master LDAP or another machine with ldapadd there is
> not one error or warning.
>
> The workaround for the moment:
> I have written a process monitor (perl daemon) which monitors the slapd
> daemon and if it suddenly stops, slapd is restarted. It is of course not
> a solution but the 300 users can work. If slapd stops without a restart
> within 1 minute a few hundred people can't work because samba stops working.
>
> I would like to receive suggestions what we can do to find the problem.
> Because there is no pattern, nothing in the logfiles I don't know where
> to start.
>
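
When a daemon disappears without writing anything to its own log, the kernel log is often the only place a trace survives (a segfault, the OOM killer, a machine check or a disk error). As a possible starting point for the hardware check, here is a minimal Python sketch that filters kernel log lines for such signs. The regular expressions are illustrative assumptions, since the exact message wording differs between kernel versions, and the function names (classify, scan, scan_file) are made up for this example.

```python
import re

# Illustrative patterns only: kernel message wording varies by version,
# so treat these as assumptions and adjust them to what your kernel prints.
PATTERNS = {
    "oom_kill": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "segfault": re.compile(r"segfault at|general protection", re.IGNORECASE),
    "machine_check": re.compile(r"machine check|hardware error|\bmce\b", re.IGNORECASE),
    "disk_io": re.compile(r"I/O error", re.IGNORECASE),
}


def classify(line):
    """Return the names of all patterns that match one log line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]


def scan(lines, process="slapd"):
    """Yield (category, line) for suspicious lines.

    OOM and segfault lines are reported only when they mention `process`;
    machine-check and disk errors are always reported, because they point
    at the box itself rather than at one daemon.
    """
    for line in lines:
        line = line.rstrip("\n")
        for category in classify(line):
            if category in ("oom_kill", "segfault") and process not in line:
                continue
            yield category, line


def scan_file(path, process="slapd"):
    """Print every suspicious line found in the log file at `path`."""
    with open(path, errors="replace") as fh:
        for category, line in scan(fh, process):
            print(f"[{category}] {line}")
```

On Debian the kernel log usually lives in /var/log/kern.log, so scan_file("/var/log/kern.log") would be one way to run it. An empty result does not prove the hardware is fine; it only means none of these particular patterns appeared in that file.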
