On 03/07/2017 11:29 AM, Christopher Young wrote:
> Thank you very much for the response!
>
> To start:
> ----
> [root@orldc-prod-ipa01 ~]# rpm -qa 389-ds-base
> 389-ds-base-1.3.5.10-18.el7_3.x86_64
> ----

You are on the latest version with the latest replication fixes.

> So, I believe a good part of my problem is that I'm not _positive_ which replica is good at this point (though my directory really isn't that huge).
>
> Do you have any pointers on a good method of comparing the directory data between them? I was wondering if anyone knows of any tools to facilitate that. I was thinking that it might make sense for me to dump the DB and restore, but I really don't know that procedure. As I mentioned, my directory really isn't that large at all; however, I'm not positive of the best step-by-step method to proceed. (I know I'm not helping things :) )

Heh, well only you know what your data should be. You can always run db2ldif.pl on each server and compare the ldif files that are generated. Then pick the one you think is the most up to date.
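A minimal sketch of that comparison. The instance name slapd-PASSUR-LOCAL is an assumption (check /etc/dirsrv/ for yours), and two tiny hypothetical LDIF files stand in here for the real exports so the comparison step can be seen end to end:

```shell
# On each server, export the userRoot backend (instance name is an
# assumption -- adjust to what you find under /etc/dirsrv/):
#   /usr/lib64/dirsrv/slapd-PASSUR-LOCAL/db2ldif.pl \
#       -D "cn=Directory Manager" -w - -n userRoot -a /tmp/$(hostname -s).ldif
# Then copy both exports to one host and compare them.

# Hypothetical sample exports standing in for the real files:
cat > /tmp/ipa01.ldif <<'EOF'
dn: uid=alice,cn=users,cn=accounts,dc=passur,dc=local
uid: alice
EOF
cat > /tmp/ipa02.ldif <<'EOF'
dn: uid=alice,cn=users,cn=accounts,dc=passur,dc=local
uid: alice

dn: uid=bob,cn=users,cn=accounts,dc=passur,dc=local
uid: bob
EOF

# Compare entry DNs first; sorting makes diff ignore entry order.
grep '^dn: ' /tmp/ipa01.ldif | sort > /tmp/ipa01.dns
grep '^dn: ' /tmp/ipa02.ldif | sort > /tmp/ipa02.dns
diff -u /tmp/ipa01.dns /tmp/ipa02.dns || true
```

Here the diff would show the second entry present only on ipa02; on real data you would also want to diff attribute contents, not just DNs. Once you've picked the good server, the re-initialize step would be run on each stale replica along the lines of "ipa-replica-manage re-initialize --from orldc-prod-ipa01.passur.local" (hostname assumed from this thread).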
https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/10/html/Administration_Guide/Populating_Directory_Databases-Exporting_Data.html#Exporting-db2ldif

Once you decide on a server, then you need to reinitialize all the other servers/replicas from the "good" one. Use "ipa-replica-manage re-initialize" for this:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Identity_Management_Guide/ipa-replica-manage.html#initialize

That's it. Good luck,
Mark

> Would it be acceptable to just 'assume' one of the replicas is good (taking the risk of whatever missing pieces I'll have to deal with), completely removing the others, and then rebuilding the replicas from scratch?
>
> If I go that route, what are the potential pitfalls?
>
> I want to decide on an approach and try and resolve this once and for all.
>
> Thanks again! It really is appreciated as I've been frustrated with this for a while now.
>
> -- Chris
>
> On Tue, Mar 7, 2017 at 8:45 AM, Mark Reynolds <[email protected]> wrote:
>> What version of 389-ds-base are you using?
>>
>> rpm -qa | grep 389-ds-base
>>
>> comments below..
>>
>> On 03/06/2017 02:37 PM, Christopher Young wrote:
>>
>> I've seen similar posts, but in the interest of asking fresh and trying to understand what is going on, I thought I would ask for advice on how best to handle this situation.
>>
>> In the interest of providing some history: I have three (3) FreeIPA servers. Everything is running 4.4.0 now. The originals (orldc-prod-ipa01, orldc-prod-ipa02) were upgraded from the 3.x branch quite a while back. Everything had been working fine, however I ran into a replication issue (that I _think_ may have been a result of IPv6 being disabled by my default Ansible roles). I thought I had resolved that by reinitializing the 2nd replica, orldc-prod-ipa02.
>>
>> In any case, I feel like the replication has never been fully stable since then, and I have all types of errors in messages that indicate something is off. I had since introduced a 3rd replica such that the agreements would look like so:
>>
>> orldc-prod-ipa01 -> orldc-prod-ipa02 -> bohdc-prod-ipa01
>>
>> It feels like orldc-prod-ipa02 & bohdc-prod-ipa01 are out of sync. I've tried reinitializing them in order, but with no positive results. At this point, I feel like I'm ready to 'bite the bullet', tear them down quickly (remove them from IPA, delete the local DBs/directories), and rebuild them from scratch.
>>
>> I want to minimize my impact as much as possible (which I can somewhat do by redirecting LDAP/DNS requests via my load-balancers temporarily) and do this right.
>>
>> (Getting to the point...)
>>
>> I'd like advice on the order of operations to do this. Given the errors (I'll include samples at the bottom of this message), does it make sense for me to remove the replicas on bohdc-prod-ipa01 & orldc-prod-ipa02 (in that order), wipe out any directories/residual pieces (I'd need some idea of what to do there), and then create new replicas? -OR- Should I export/back up the LDAP DB and rebuild everything from scratch?
>>
>> I need advice and ideas. Furthermore, if there is someone with experience in this that would be interested in making a little money on the side, let me know, because having an extra brain and set of hands would be welcome.
>>
>> DETAILS:
>> =================
>>
>> ERRORS I see on orldc-prod-ipa01 (the one whose LDAP DB seems the most up-to-date since my changes are usually directed at it):
>> ------
>> Mar 6 14:36:24 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:24.434956575 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa : INFO LDAP bind...
>> Mar 6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa : INFO Commencing sync process
>> Mar 6 14:36:26 orldc-prod-ipa01 ipa-dnskeysyncd: ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO Initial LDAP dump is done, sychronizing with ODS and BIND
>> Mar 6 14:36:27 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:27.799519203 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:30 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:30.994760069 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:34 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:34.940115481 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of '56.10.in-addr.arpa/IN': AXFR-style IXFR started
>> Mar 6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of '56.10.in-addr.arpa/IN': AXFR-style IXFR ended
>> Mar 6 14:36:37 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:37.977875463 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:40 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:40.999275184 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:36:45 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:45.211260414 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> ------
>>
>> These messages indicate that the replica does not have the same database as the master. So either the master or the replica needs to be reinitialized. More on this below...
>>
>> Errors on orldc-prod-ipa02:
>> ------
>> Mar 6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa : INFO Commencing sync process
>> Mar 6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO Initial LDAP dump is done, sychronizing with ODS and BIND
>> Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.934405274 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.937278142 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.939434025 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>>
>> These are harmless "errors" which have been removed in newer versions of 389-ds-base.
>>
>> Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.882795654 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
>> Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.886029272 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
>>
>> This "could" also be a known issue that is fixed in newer versions of 389-ds-base. Or this is a valid error message due to the replica being stale for a very long time and records actually being purged from the changelog before they were replicated.
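One way to tell which case you are in is to read each agreement's status straight out of cn=config. This is a sketch, not from the thread; the bind DN and localhost URL are assumptions, and the attribute names are the standard 389-ds replication-agreement status attributes:

```shell
# On each master, dump the per-agreement replication status.
# Prompts for the Directory Manager password; a stuck agreement shows a
# non-zero/error nsds5replicaLastUpdateStatus and a stale LastUpdateEnd.
ldapsearch -o ldif-wrap=no -x -H ldap://localhost:389 \
    -D "cn=Directory Manager" -W \
    -b "cn=mapping tree,cn=config" \
    "(objectClass=nsds5ReplicationAgreement)" \
    nsDS5ReplicaHost nsds5replicaLastUpdateStatus nsds5replicaLastUpdateEnd
```

Comparing that output across orldc-prod-ipa01, orldc-prod-ipa02, and bohdc-prod-ipa01 should make it obvious which agreements are actually halted versus merely noisy.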
>>
>> Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.888679268 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
>> Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.960804253 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>>
>> Okay, so your replication agreements/servers are not in sync. I suspect you created a new replica and used that to initialize a valid replica, which broke things. Something like that. You need to find a "good" replica server and reinitialize the other replicas from that server. These errors need to be addressed ASAP, as they are halting replication for those agreements, which explains the "instability" you are describing.
>>
>> Mark
>>
>> Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.960622608 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.968927168 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.976952118 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:09 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:09.972315877 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.034810948 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
>> Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.040020359 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
>> Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.042846879 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
>> Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.013253769 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.021514225 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.027521508 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.110566247 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>> Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.179819300 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
>> Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.188353328 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
>> Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.196463928 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
>> Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.068292919 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.071241757 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.073793922 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
>> ------
>>
>> Thanks in advance!!!
>>
>> -- Chris

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
