Mircea Baciu wrote:
> Hi,
>
> I have an issue with a consumer replication starting to fail until OpenLDAP
> is restarted.
>
> My setup consists of a pair of on-prem MirrorMode replicated providers (only
> one is active at a given time using a virtual IP managed by Keepalived), and
> one off-site (AWS) consumer. The providers use a dedicated port (LDAPS on
> 1636) for their own replication, as well as for the consumer to connect to
> them, so the consumer has access to both servers, regardless of where the
> providers' virtual IP is residing.
>
> All the connections happen over LDAPS, and the syncrepl configs have the
> tls_reqcert=allow option.
>
> The providers are always in sync and I'm able to make one or the other
> the "active" one with ease. The consumer does the initial sync and stays in
> sync for a while, but I find it often (almost daily) out of sync. I see error
> messages on both the consumer and provider side:

Sounds like an issue in the TLS layer. You should increase the debug level on
both provider and consumer to see if there are any TLS-specific error messages
being generated. If you have cn=monitor configured you can set the debuglevel
using ldapmodify, so no need to restart the servers for it to take effect.
That'll let you see the problem as it's occurring.
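
As a rough illustration of changing the log level at runtime: the olcSyncrepl
attributes quoted below suggest these servers use cn=config, where olcLogLevel
can be modified without a restart. The sketch below is an assumption, not
taken from the original message. It assumes the change is applied with
something like "ldapmodify -Y EXTERNAL -H ldapi:/// -f loglevel.ldif" (the
file name and the ldapi access are hypothetical), and the choice of levels is
only a starting point; "trace" is far more verbose but may be needed to see
the TLS handshake details.

```ldif
# Assumed example: raise logging on a running slapd via cn=config.
# Apply on both the consumer and the providers, then watch syslog
# while the consumer is failing to bind.
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: conns stats sync
```

Once the messages have been captured, the previous olcLogLevel value can be
restored the same way.
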
>
> On the consumer (every minute):
> Sep 20 08:19:31 <consumer> slapd[1440]: slap_client_connect:
> URI=ldaps://<provider1>:1636/
> DN="uid=replication,ou=sysaccounts,dc=example,dc=com"
> ldap_sasl_bind_s failed (-1)
> Sep 20 08:19:31 <consumer> slapd[1440]: do_syncrepl: rid=001 rc -1 retrying
> Sep 20 08:19:31 <consumer> slapd[1440]: slap_client_connect:
> URI=ldaps://<provider2>:1636/
> DN="uid=replication,ou=sysaccounts,dc=example,dc=com"
> ldap_sasl_bind_s failed (-1)
> Sep 20 08:19:31 <consumer> slapd[1440]: do_syncrepl: rid=002 rc -1 retrying
> Sep 20 08:20:31 <consumer> slapd[1440]: slap_client_connect:
> URI=ldaps://<provider1>:1636/
> DN="uid=replication,ou=sysaccounts,dc=example,dc=com"
> ldap_sasl_bind_s failed (-1)
> Sep 20 08:20:31 <consumer> slapd[1440]: do_syncrepl: rid=001 rc -1 retrying
> Sep 20 08:20:31 <consumer> slapd[1440]: slap_client_connect:
> URI=ldaps://<provider2>:1636/
> DN="uid=replication,ou=sysaccounts,dc=example,dc=com"
> ldap_sasl_bind_s failed (-1)
> Sep 20 08:20:31 <consumer> slapd[1440]: do_syncrepl: rid=002 rc -1 retrying
>
> On the provider (every minute):
> Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 ACCEPT from
> IP=<consumer IP>:45438 (IP=<provider1 IP>:1636)
> Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 TLS established
> tls_ssf=256 ssf=256
> Sep 20 08:19:31 <provider1> slapd[1057]: conn=11242 fd=14 closed (connection
> lost)
> Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 ACCEPT from
> IP=<consumer IP>:45458 (IP=<provider1 IP>:1636)
> Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 TLS established
> tls_ssf=256 ssf=256
> Sep 20 08:20:31 <provider1> slapd[1057]: conn=11243 fd=14 closed (connection
> lost)
>
> Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 ACCEPT from
> IP=<consumer IP>:41706 (IP=<provider2 IP>:1636)
> Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 TLS established
> tls_ssf=256 ssf=256
> Sep 20 08:19:31 <provider2> slapd[1051]: conn=215893 fd=18 closed (connection
> lost)
> Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 ACCEPT from
> IP=<consumer IP>:41726 (IP=<provider2 IP>:1636)
> Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 TLS established
> tls_ssf=256 ssf=256
> Sep 20 08:20:31 <provider2> slapd[1051]: conn=215898 fd=18 closed (connection
> lost)
>
> There must be something wrong on the consumer side since when the issue
> starts, the consumer is not able to connect to either provider.
>
> Once I restart the consumer, it quickly resyncs and works just fine, for a
> while.
>
> The providers are OpenLDAP 2.4.44 (openldap-2.4.44-24.el7_9.x86_64), running
> on RHEL 7.
> The consumer is OpenLDAP 2.4.44 (openldap-2.4.44-24.el7_9.x86_64), running on
> CentOS 7.
>
> The consumer syncrepl config is:
> olcSyncrepl: {0}rid=001
> provider=ldaps://<provider1>:1636/
> searchbase="dc=example,dc=com"
> type=refreshAndPersist
> retry="60 +"
> timeout=1
> bindmethod=simple
> binddn="uid=replication,ou=SysAccounts,dc=example,dc=com"
> credentials=<credentials>
> tls_reqcert=allow
> olcSyncrepl: {1}rid=002
> provider=ldaps://<provider2>:1636/
> searchbase="dc=example,dc=com"
> type=refreshAndPersist
> retry="60 +"
> timeout=1
> bindmethod=simple
> binddn="uid=replication,ou=SysAccounts,dc=example,dc=com"
> credentials=<credentials>
> tls_reqcert=allow
>
> The "uid=replication,ou=SysAccounts,dc=example,dc=com" DN has full read-only
> permissions for the entire "dc=example,dc=com" tree.
>
> Any idea on what might be my issue here?
>
> Thank you,
> Mircea
> --
> Mircea Baciu | Senior Unix Systems Administrator
> Simmons University | 300 The Fenway | Boston, MA 02115 | 617-521-2194
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/