Hi,

I am running a 4-way multi-master configuration with a number of slaves
in remote locations. I am currently running openldap 2.4.33 on top of
CentOS 6.3 (I built 2.4.33 from a modified base CentOS 6 spec file). I
was originally running the stock CentOS openldap 2.4.23 in N-way
multi-master mode with plain syncrepl, but I was having problems keeping
the masters and slaves in perfect sync. Other than that, 2.4.23 had been
running stably since last spring. I'll try to be brief about what has
happened since Feb 1.

* I upgraded the 4 masters to 2.4.33 and kept the syncrepl
configuration. The syncrepl masters were using RefreshAndPersist while
the slave consumers were using RefreshOnly.
* After the upgrade the 2.4.33 masters began locking up: they did not
refuse connections, but they did not return query results either. This
would happen 3-4 times per day. When one master locked up, all the
masters would lock up. The slaves did not appear to be affected.
* I downgraded back to 2.4.23 on all of the masters, only to have the
lock-ups continue.
* I slapcat'ed the database on one master and blew away the databases on
all the other masters and slaves and rebuilt everything. I rebuilt one
master and one slave and rsync'ed the slapd.d directory where needed.
Then I started each master one-by-one to validate that they mirrored the
databases correctly. Then I repeated this on the slaves. Unfortunately
the masters would continue to lock up as above.
* So, seeing that the lock-ups were occurring regardless of the openldap
version I decided to go back to 2.4.33 and make the move to
delta-replication.
* This past weekend I finally got delta-replication working. I ran the
slapcat, rebuild slapd.d, slapadd sequence on one master and rsync'ed
slapd.d to each of the other masters one at a time. All was well and all
databases were in perfect sync.
* Unfortunately the masters would continue to lock, accepting
connections but never servicing the request so all queries would hang.
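
One way to check the "perfect sync" state mentioned in the list above is
to compare the contextCSN values stored at the suffix on each master.
The following is only a minimal Python sketch, not part of the setup
described here: it assumes the ldapsearch client is installed and that
the suffix entry is readable after StartTLS, and the function names are
made up for illustration.

```python
import subprocess


def parse_context_csns(ldif_text):
    """Collect the contextCSN values found in `ldapsearch -LLL` output."""
    csns = set()
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.lower() == "contextcsn":
            csns.add(value.strip())
    return csns


def fetch_context_csns(uri, base="dc=o3bnetworks.net"):
    """Read contextCSN from the suffix entry of one server (needs ldapsearch)."""
    result = subprocess.run(
        ["ldapsearch", "-x", "-ZZ", "-LLL", "-H", uri,
         "-b", base, "-s", "base", "contextCSN"],
        capture_output=True, text=True, check=True, timeout=15,
    )
    return parse_context_csns(result.stdout)


def out_of_sync(csns_by_host):
    """Return the hosts whose contextCSN set differs from the first host's."""
    hosts = list(csns_by_host)
    if not hosts:
        return []
    reference = csns_by_host[hosts[0]]
    return [h for h in hosts[1:] if csns_by_host[h] != reference]


# Illustration with made-up CSN values (not taken from the real servers).
sample = {
    "auth1noc.man": {"20130211120000.000000Z#000000#001#000000"},
    "auth2noc.man": {"20130211120000.000000Z#000000#001#000000"},
    "auth1noc.btz": {"20130211115500.000000Z#000000#001#000000"},
}
print(out_of_sync(sample))  # ['auth1noc.btz']
```

In a real check the dictionary would be filled by calling
fetch_context_csns() once per master URI. The values can differ briefly
while a change is still propagating, so a single mismatch is not proof
of a problem.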

Looking at this again today I noticed that my masters were all running
at near 100% CPU while still servicing queries. Depending on the number
of CPUs, only one or two threads would be running this high. Attaching
to one of those threads with strace -tt -p <pid-of-thread> shows it
spewing out:

18:52:05.713266 sched_yield()           = 0
18:52:05.713323 sched_yield()           = 0
18:52:05.713380 sched_yield()           = 0
18:52:05.713438 sched_yield()           = 0
18:52:05.713495 sched_yield()           = 0
18:52:05.713553 sched_yield()           = 0
18:52:05.713611 sched_yield()           = 0
18:52:05.713668 sched_yield()           = 0
18:52:05.713726 sched_yield()           = 0
18:52:05.713783 sched_yield()           = 0
18:52:05.713840 sched_yield()           = 0
18:52:05.713898 sched_yield()           = 0

I haven't yet correlated this with the slapd daemons hanging.
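
To put a number on how tight this loop is, the timestamps in the strace
output above can be differenced. A minimal Python sketch, assuming the
strace -tt line format shown above; the helper name yield_gaps_us is
made up for illustration:

```python
from datetime import datetime, timedelta


def yield_gaps_us(lines):
    """Return the gaps, in microseconds, between consecutive strace -tt lines."""
    stamps = [
        datetime.strptime(line.split()[0], "%H:%M:%S.%f")
        for line in lines
        if line.strip()
    ]
    one_us = timedelta(microseconds=1)
    return [(later - earlier) // one_us
            for earlier, later in zip(stamps, stamps[1:])]


# First four lines of the strace output quoted above.
sample = """\
18:52:05.713266 sched_yield()           = 0
18:52:05.713323 sched_yield()           = 0
18:52:05.713380 sched_yield()           = 0
18:52:05.713438 sched_yield()           = 0
""".splitlines()

print(yield_gaps_us(sample))  # [57, 57, 58]
```

A gap of roughly 57 microseconds per call means the thread is calling
sched_yield() back to back rather than sleeping, which is consistent
with the near-100% CPU usage.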

There is nothing interesting in the logs when the slapd daemons hang.
Again, when one master hangs they all hang. I restart each master one by
one; on some occasions restarting one master is enough for the others to
start servicing again, and other times it takes two or three restarts to
get all of the masters servicing again. The only gain with
delta-replication is that they now hang only once a day, usually after I
have gone home.

For now I have implemented a small script, run from cron every two
minutes, that tests whether the slapd daemon is hung by doing a simple
ldapsearch and, if so, restarts it. This is done on all four masters. My
database is not large at all, with only ~100 users, but it is critical
as it is the backend authentication for everything, including remote
access.
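
For illustration, a watchdog of this kind could look roughly like the
Python sketch below. This is not the actual script: the probe command,
the 10-second timeout and the "service slapd restart" command are all
assumptions, and a probe that fails quickly (for example, connection
refused) is deliberately not treated as a hang.

```python
import subprocess

# Assumed probe: anonymous root DSE search over StartTLS against the local slapd.
PROBE = ["ldapsearch", "-x", "-ZZ", "-LLL", "-H", "ldap://localhost",
         "-b", "", "-s", "base", "namingContexts"]
# Assumed restart command for a CentOS 6 style init script.
RESTART = ["service", "slapd", "restart"]


def is_hung(probe_cmd, timeout_s=10.0):
    """Return True if the probe command does not finish within timeout_s."""
    try:
        subprocess.run(probe_cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return True
    return False


def check_and_restart(probe_cmd=PROBE, restart_cmd=RESTART, timeout_s=10.0):
    """Restart slapd if the probe hangs; return True if a restart was issued."""
    if is_hung(probe_cmd, timeout_s):
        subprocess.run(restart_cmd, check=False)
        return True
    return False
```

A cron job would simply call check_and_restart() every two minutes. The
timeout has to be shorter than the cron interval so that runs do not
pile up while slapd is hung.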

Here is the slapcat of my cn=config database (minus the schemas and
operational attributes). It is a fairly typical delta-replication
configuration. The accesslog database uses hdb, as that is what most (if
not all) of the accesslog examples show. The main database is bdb.

Any suggestions would be greatly appreciated.

Regards,
Bob 
--bs

dn: cn=config
objectClass: olcGlobal
cn: config
olcConfigFile: slapd.conf
olcConfigDir: slapd.d
olcArgsFile: /var/run/openldap/slapd.args
olcAttributeOptions: lang-
olcAuthzPolicy: none
olcConcurrency: 0
olcConnMaxPendingAuth: 1000
olcGentleHUP: FALSE
olcIdleTimeout: 0
olcIndexSubstrIfMaxLen: 4
olcIndexSubstrIfMinLen: 2
olcIndexSubstrAnyLen: 4
olcIndexSubstrAnyStep: 2
olcIndexIntLen: 4
olcLocalSSF: 71
olcPidFile: /var/run/openldap/slapd.pid
olcReadOnly: FALSE
olcSaslSecProps: noplain,noanonymous
olcSecurity: tls=1
olcServerID: 1 ldap://auth1noc.man.o3b.local
olcServerID: 2 ldap://auth2noc.man.o3b.local
olcServerID: 3 ldap://auth1noc.btz.o3b.local
olcServerID: 4 ldap://auth2noc.btz.o3b.local
olcServerID: 5 ldap://auth1gw.nma.o3b.local
olcServerID: 6 ldap://auth2gw.nma.o3b.local
olcServerID: 7 ldap://auth1gw.sun.o3b.local
olcServerID: 8 ldap://auth2gw.sun.o3b.local
olcServerID: 9 ldap://auth1gw.per.o3b.local
olcServerID: 10 ldap://auth2gw.per.o3b.local
olcSockbufMaxIncoming: 262143
olcSockbufMaxIncomingAuth: 16777215
olcThreads: 16
olcTLSCipherSuite: HIGH:MEDIUM:SSLv2
olcTLSCertificateFile: /etc/openldap/cacerts/auth-o3b.crt
olcTLSCertificateKeyFile: /etc/openldap/cacerts/auth-o3b.key
olcTLSCRLCheck: none
olcToolThreads: 1
olcWriteTimeout: 0
olcTLSCACertificateFile: /etc/pki/tls/certs/o3b-master-ca.crt
olcTLSVerifyClient: never
olcLogLevel: sync
olcConnMaxPending: 101

dn: cn=module{0},cn=config
objectClass: olcModuleList
cn: module{0}
olcModulePath: /usr/lib64/openldap
olcModuleLoad: {0}syncprov.la
olcModuleLoad: {1}memberof.la
olcModuleLoad: {2}ppolicy.la
olcModuleLoad: {3}accesslog.la

dn: olcDatabase={-1}frontend,cn=config
objectClass: olcDatabaseConfig
objectClass: olcFrontendConfig
olcDatabase: {-1}frontend
olcAccess: {0}to dn.base=""  by * read
olcAccess: {1}to dn.subtree="cn=monitor"  by
  dn.base="cn=rootdn,dc=o3bnetworks.net" read
olcAccess: {2}to dn.base="cn=subschema"  by * read
olcAddContentAcl: FALSE
olcLastMod: TRUE
olcMaxDerefDepth: 0
olcReadOnly: FALSE
olcSchemaDN: cn=Subschema
olcSecurity: tls=1
olcMonitoring: FALSE
olcPasswordHash: {SSHA}

dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcAccess: {0}to *  by dn.base="cn=rootdn,dc=o3bnetworks.net" write  by
  dn.base="cn=syncdn,dc=o3bnetworks.net" read  by * none
olcAddContentAcl: TRUE
olcLastMod: TRUE
olcLimits: {0}dn.base="cn=rootdn,dc=o3bnetworks.net" size=unlimited
  time=unlimited
olcLimits: {1}dn.base="cn=syncdn,dc=o3bnetworks.net" size=unlimited
  time=unlimited
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcRootDN: cn=config
olcMirrorMode: TRUE
olcMonitoring: FALSE
olcRootPW:: ***
olcSyncrepl: {0}rid=001 provider=ldap://auth1noc.man.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="cn=config" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {1}rid=002 provider=ldap://auth2noc.man.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="cn=config" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {2}rid=003 provider=ldap://auth1noc.btz.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="cn=config" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {3}rid=004 provider=ldap://auth2noc.btz.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="cn=config" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog

dn: olcOverlay={0}syncprov,olcDatabase={0}config,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {1}syncprov
olcSpCheckpoint: 1000 60

dn: olcOverlay={1}accesslog,olcDatabase={0}config,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {1}accesslog
olcAccessLogDB: cn=accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogPurge: 2+00:00 1+00:00

dn: olcDatabase={1}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcConfig
objectClass: top
objectClass: olcHdbConfig
olcDbDirectory: /var/lib/ldap/accesslog
olcSuffix: cn=accesslog
olcDbConfig: [Deleted]
aXIgLXEgb3B0aW9uKS4g
olcAddContentAcl: FALSE
olcDbCacheFree: 1
olcDbCacheSize: 1000
olcAccess: {0}to *  by self write  by
  dn.base="cn=rootdn,dc=o3bnetworks.net" read by
  dn.base="cn=authdn,dc=o3bnetworks.net" read  by
  dn.base="cn=syncdn,dc=o3bnetworks.net" read
olcDbDirtyRead: FALSE
olcDbIDLcacheSize: 0
olcDbDNcacheSize: 0
olcDbIndex: default eq
olcMaxDerefDepth: 15
olcLimits: {0}dn.base="cn=syncdn,dc=o3bnetworks.net" size=unlimited
  time=unlimited
olcDbSearchStack: 16
olcLastMod: TRUE
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbNoSync: FALSE
olcDbShmKey: 0
olcReadOnly: FALSE
olcSecurity: tls=1
olcRootDN: cn=accesslogdn
olcDatabase: {1}hdb

dn: olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE

dn: olcDatabase={3}monitor,cn=config
objectClass: olcDatabaseConfig
olcAddContentAcl: FALSE
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcRootDN: cn=monitor,cn=Monitor
olcRootPW:: bW9uaXRvcg==
olcSecurity: tls=1
olcMonitoring: FALSE
olcDatabase: {3}monitor

dn: olcDatabase={3}bdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcSuffix: dc=o3bnetworks.net
olcAddContentAcl: FALSE
olcLastMod: TRUE
olcLimits: {0}dn.base="cn=syncdn,dc=o3bnetworks.net" size=unlimited
  time=unlimited
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcRootDN: cn=rootdn,dc=o3bnetworks.net
olcRootPW:: ***
olcSecurity: tls=1
olcMirrorMode: TRUE
olcMonitoring: TRUE
olcDbDirectory: /var/lib/ldap
olcDbConfig: [Deleted]
olcDbNoSync: FALSE
olcDbDirtyRead: FALSE
olcDbIDLcacheSize: 0
olcDbIndex: objectClass pres,eq
olcDbIndex: cn pres,eq,sub
olcDbIndex: uid pres,eq,sub
olcDbIndex: uidNumber pres,eq
olcDbIndex: gidNumber pres,eq
olcDbIndex: memberUid pres,eq,sub
olcDbIndex: displayName pres,eq,sub
olcDbIndex: sambaSID pres,eq,sub
olcDbIndex: sambaDomainName pres,eq
olcDbIndex: sambaGroupType pres,eq
olcDbIndex: ou pres,eq,sub
olcDbIndex: sambaSIDList pres,eq
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbSearchStack: 16
olcDbShmKey: 0
olcDbCacheFree: 1
olcDbDNcacheSize: 0
olcAccess: {0}to *  by self write  by
  group/groupOfNames/member.exact="cn=ldapadmins,dc=o3bnetworks.net" write  by
  dn.base="cn=authdn,dc=o3bnetworks.net" read  by
  dn.base="cn=syncdn,dc=o3bnetworks.net" read  by users read  by
  anonymous read
olcDbCacheSize: 1000
olcDatabase: {3}bdb
olcSyncrepl: {0}rid=011 provider=ldap://auth1noc.man.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="dc=o3bnetworks.net" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {1}rid=012 provider=ldap://auth2noc.man.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  searchbase="dc=o3bnetworks.net" scope=sub schemachecking=off
  type=refreshAndPersist retry="5 5 300 +" logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {2}rid=013 provider=ldap://auth1noc.btz.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  filter="(objectclass=*)" searchbase="dc=o3bnetworks.net" scope=sub
  schemachecking=off type=refreshAndPersist retry="5 5 300 +"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
olcSyncrepl: {3}rid=014 provider=ldap://auth2noc.btz.o3b.local
  bindmethod=simple binddn="cn=syncdn,dc=o3bnetworks.net"
  credentials="33jJ9nSkSD" keepalive=0:5:0 starttls=yes
  tls_reqcert=allow tls_cipher_suite=HIGH:MEDIUM:SSLv2
  filter="(objectclass=*)" searchbase="dc=o3bnetworks.net" scope=sub
  schemachecking=off type=refreshAndPersist retry="5 5 300 +"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog

dn: olcOverlay={0}memberof,olcDatabase={3}bdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcMemberOf
olcOverlay: {0}memberof
olcMemberOfDangling: ignore
olcMemberOfRefInt: FALSE

dn: olcOverlay={1}syncprov,olcDatabase={3}bdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {1}syncprov
olcSpCheckpoint: 1000 60

dn: olcOverlay={2}ppolicy,olcDatabase={3}bdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcConfig
objectClass: top
objectClass: olcPPolicyConfig
olcOverlay: {2}ppolicy
olcPPolicyDefault: cn=O3b,ou=Password,ou=Policy,dc=o3bnetworks.net

dn: olcOverlay={3}accesslog,olcDatabase={3}bdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {3}accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogDB: cn=accesslog
olcAccessLogPurge: 2+00:00 1+00:00


