This January, Mark Potter first experienced what is now killing my new
freeipa setup as well. Once the primary server on which dns records
are being programatically added sees named-pcks11 putting "writeback to
ldap failed" in the log: 'systemctl' reports nothing amiss yet ns-slapd
becomes entirely unresponsive and all that depend upon dirsrv
hangs/times out.
There is nothing in any of the log entries detailing failure, even with
ample memory and storage space and processor resources. There are
perhaps 10 hosts in the little lab sandbox, while the system does
service DNS requests from the public internet.
I looked through the 'changelogs' and could find nothing that seems to
address this, is there a 'work-around' known? I'd like to avoid using
'bleeding edge' versions of freeipa.
The details of Mark's discussion are here:
https://lists.fedoraproject.org/archives/list/[email protected]/thread/WZKFEUAYYABTF5BIFQTNGPIYEJTXXJPY/
What's common to his and my case is that the problem arises during an
'initial programmatic load' of many DNS records in a few dozen zones
(often ~200 / in a *.arpa domain), even when slowed to not more than
one every 8 seconds. But only when at least one replication agreement
exists. It's possible the fact that dnssec is enabled and so 'adding a
record' generates further signature set computing activity either
provides ldap/bind9 server load that exposes the race-condition-feeling
problem or (less likely) is directly involved in the cause.
I agree with Mark's quote: "Are we going to see everything fail in a
spectacular manner and is there anything I can do to mitigate the
failure with adding DNS entries as I still need to complete the addition
and have ~5k per zone left for two zones." He 'worked around' the
problem by adding a record at a time and slowly (without knowing what
'slowly' means in seconds). I'm unable to guess what 'slowly' means in
my setup, but it appears to be more than 8 seconds / update, so fails
for me.
Maybe some kind of 'flow control' is required, or a some way to
determine whether all replication activity and triggered further
'signature computation' dns activity has concluded after writing a new
dns record?
The only 'way out' is to restart ipa. Of some interest, when that is
attempted after these failures, ns-slapd uses 100% CPU, dirsrv hang for
several minutes, often longer than the systemctl shutdown / reboot
process allows before forcibly killing the process. This is 100%
repeatable. After a few dozen records are loaded, ns-slapd just stops
responding. The program adding the records is using the python ldap2
interface native on the local host.
Some further details:
Aug 06 15:58:58 registry1.1.somedomain.com named-pkcs11[1497]: zone
177.168.192.in-addr.arpa/IN: sending notifies (serial 1691355538)
Aug 06 15:58:58 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_show: Summary: None args: ('7.somedomain.com.',
'email19')
Aug 06 15:59:01 registry1.1.somedomain.com named-pkcs11[1497]:
checkhints: unable to get root NS rrset from cache: not found
Aug 06 15:59:06 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_add: Summary: None
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_find: Summary: None args:
('177.168.192.in-addr.arpa.',), 144 records total
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_show: Summary: None args:
('177.168.192.in-addr.arpa.', '147')
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_find: Summary: None args:
('6.0.c.0.0.0.0.0.0.0.0.1.0.0.c.f.ip6.arpa.',), 22 records total
Aug 06 15:59:07 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:15:59:07.848595593 -0500] - WARN - content-sync-plugin -
sync_update_persist_betxn_pre_op - DB retried operation targets
"changenumber=59921,cn=changelog" (op=0x7f6b6fc1f400 idx_pl=0) => op not c>
Aug 06 15:59:16 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_add: Summary: None
Aug 06 15:59:16 registry1.1.somedomain.com myprogram.py[35998]: rssmain:
ipa command dnsrecord_show: Summary: None args: ('7.somedomain.com.',
'email19')
Aug 06 15:59:32 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (unsigned): serial (1691355547) write back to LDAP
failed
Aug 06 15:59:33 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): serial 1691355547 (unsigned 1691355547)
Aug 06 15:59:33 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): sending notifies (serial 1691355547)
Aug 06 16:01:01 registry1.1.somedomain.com CROND[52537]: (root) CMD
(run-parts /etc/cron.hourly)
Aug 06 16:01:01 registry1.1.somedomain.com run-parts[52540]:
(/etc/cron.hourly) starting 0anacron
Aug 06 16:01:01 registry1.1.somedomain.com run-parts[52546]:
(/etc/cron.hourly) finished 0anacron
Aug 06 16:02:05 registry1.1.somedomain.com sssd_kcm[19097]: Shutting
down (status = 0)
Aug 06 16:02:05 registry1.1.somedomain.com systemd[1]: sssd-kcm.service:
Succeeded.
Aug 06 16:02:20 registry1.1.somedomain.com systemd[1]: loclibrary.mount:
Succeeded.
Aug 06 16:02:20 registry1.1.somedomain.com systemd[1]: qflibrary.mount:
Succeeded.
Aug 06 16:07:37 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:07:37 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:08:34 registry1.1.somedomain.com named-pkcs11[1497]: zone
9.somedomain.com/IN (signed): Key 9.somedomain.com/RSASHA256/60467
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:08:34 registry1.1.somedomain.com named-pkcs11[1497]: zone
9.somedomain.com/IN (signed): sending notifies (serial 1691350608)
Aug 06 16:09:08 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:08 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:18 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:18 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:30 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:30 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:11:05 registry1.1.somedomain.com sssd_be[659]: Backend is offline
Aug 06 16:11:38 registry1.1.somedomain.com named-pkcs11[1497]: zone
3.somedomain.com/IN (signed): Key 3.somedomain.com/RSASHA256/18257
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:11:38 registry1.1.somedomain.com named-pkcs11[1497]: zone
3.somedomain.com/IN (signed): sending notifies (serial 1691350754)
Aug 06 16:12:37 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): Key 7.somedomain.com/RSASHA256/40511
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:12:37 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): sending notifies (serial 1691355548)
Aug 06 16:20:01 registry1.1.somedomain.com CROND[52573]: (root) CMD
(test -f /var/lock/subsys/ods-enforcerd && kill -s SIGHUP `cat
/var/run/opendnssec/enforcerd.pid` > /dev/null 2> /dev/null)
Aug 06 16:29:21 registry1.1.somedomain.com systemd[1]: Stopping 389
Directory Server 1-somedomain-COM....
Aug 06 16:29:21 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:21.659348276 -0500] - INFO - op_thread_cleanup -
slapd shutting down - signaling operation threads - op stack size 0 max
work q size 600 max work q stack size 4
Aug 06 16:29:21 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:21.735442997 -0500] - INFO - slapd_daemon - slapd
shutting down - waiting for 16 threads to terminate
Aug 06 16:29:22 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:22.278851959 -0500] - ERR - NSMMReplicationPlugin -
release_replica - agmt="cn=meToregistry2.1.somedomain.com"
(registry2:389): Attempting to release replica, but unable to receive endRe>
Aug 06 16:29:22 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:22.307800436 -0500] - WARN - NSMMReplicationPlugin -
acquire_replica - agmt="cn=meToregistry2.1.somedomain.com"
(registry2:389): Unable to receive the response for a startReplication ext>
Aug 06 16:34:03 registry1.1.somedomain.com ods-enforcerd[2124]:
[enforcer] update zone: _domainkey.anotherdomain.com
Aug 06 16:34:04 registry1.1.somedomain.com ods-enforcerd[2124]:
[signconf_cmd] performing signconf for zone _domainkey.anotherdomain.com
Aug 06 16:34:04 registry1.1.somedomain.com ods-enforcerd[2124]:
[signconf_cmd] signconf done for zone _domainkey.anotherdomain.com,
notifying signer
Aug 06 16:34:04 registry1.1.somedomain.com systemd[1]: Started IPA
OpenDNSSEC Signer replacement.
Aug 06 16:34:07 registry1.1.somedomain.com ipa-ods-exporter[52623]:
ipa-ods-exporter: INFO To increase debugging set debug=True in
dns.conf See default.conf(5) for details
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
Traceback (most recent call last):
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line
1085, in error_handler
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
yield
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line
1257, in gssapi_bind
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
'', auth_tokens, server_controls, client_controls)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
476, in sasl_interactive_bind_s
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
return
self._ldap_call(self._l.sasl_interactive_bind_s,who,auth,RequestControlTuples(serverctrls),RequestControlTuples(clientctrls),sasl_flags)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
340, in _ldap_call
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
reraise(exc_type, exc_value, exc_traceback)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/compat.py", line 46, in
reraise
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
raise exc_value
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
324, in _ldap_call
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
result = func(*args,**kwargs)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
ldap.SERVER_DOWN: {'result': -1, 'desc': "Can't contact LDAP server",
'errno': 111, 'ctrls': [], 'info': 'Connection refused'}
Name : ipa-server-common
Version : 4.9.11
Release : 6.module_el8.8.0+3593+1210bde8.alma.1
Architecture : noarch
Size : 621 k
Source : ipa-4.9.11-6.module_el8.8.0+3593+1210bde8.alma.1.src.rpm
Repository : appstream
Summary : Common files used by IPA server
...
Name : ipa-server-dns
Version : 4.9.11
Release : 6.module_el8.8.0+3593+1210bde8.alma.1
Architecture : noarch
Size : 197 k
Source : ipa-4.9.11-6.module_el8.8.0+3593+1210bde8.alma.1.src.rpm
Repository : appstream
[root@registry2 ~]# uname -r
4.18.0-477.15.1.el8_8.x86_64
_______________________________________________
FreeIPA-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedorahosted.org/archives/list/[email protected]
Do not reply to spam, report it:
https://pagure.io/fedora-infrastructure/new_issue