John Desantis wrote:
> Hello again,
>
> I was just wondering if there was an update on this thread?
>
> Since it is just one machine having an issue, do you (Rob and Rich) think a re-initialization from the master on the affected host would clear the clog? I have left it alone since Mark was brought into the discussion.
A re-init won't help because the RUVs are stored outside of the replicated data.

rob

>
> Thank you!
> John DeSantis
>
> 2014-10-23 9:34 GMT-04:00 Rich Megginson <[email protected]>:
>> On 10/23/2014 07:01 AM, John Desantis wrote:
>>>
>>> Rob and Rich,
>>>
>>>>> ipa-replica-manage del should have cleaned things up. You can clear out old RUVs with ipa-replica-manage too via list-ruv and clean-ruv. You use list-ruv to get the id# to clean and clean-ruv to do the actual cleaning.
>>>>
>>>> I remember having previously tried this task, but it had failed on older RUVs which were not even active (the KDC was under some strain, so ipa queries were timing out). However, I ran it again and have been able to delete all but one RUV referencing the previous replica (that last task is still running).
>>>>
>>>> I'll report back once the task finishes or fails.
>>>
>>> The last RUV is "stuck" on another replica. It fails with the following error:
>>>
>>> [23/Oct/2014:08:55:09 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Initiating CleanAllRUV Task...
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Retrieving maxcsn...
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Found maxcsn (5447f861000000180000)
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (24)...
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
>>> [23/Oct/2014:08:55:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
>>> [23/Oct/2014:08:55:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replica maxcsn (5447f56b000200180000) is not caught up with deleted replica's maxcsn (5447f861000000180000)
>>> [23/Oct/2014:08:55:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replica not caught up (agmt="cn=meToiparepbackup.our.personal.domain" (iparepbackup:389))
>>> [23/Oct/2014:08:55:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas caught up, retrying in 10 seconds
>>> [23/Oct/2014:08:55:23 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replica maxcsn (5447f56b000200180000) is not caught up with deleted replica's maxcsn (5447f861000000180000)
>>> [23/Oct/2014:08:55:23 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replica not caught up (agmt="cn=meToiparepbackup.our.personal.domain" (iparepbackup:389))
>>> [23/Oct/2014:08:55:23 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas caught up, retrying in 20 seconds
>>>
>>> I then aborted the task, since the retry interval climbed to 14400 seconds.
>>
>> Mark, do you know what is going on here?
>>
>>> Would this be a simple re-initialization from the master on the host "iparepbackup"?
>>>
>>> Thanks,
>>> John DeSantis
>>>
>>> 2014-10-22 16:03 GMT-04:00 John Desantis <[email protected]>:
>>>>
>>>> Rob and Rich,
>>>>
>>>>> ipa-replica-manage del should have cleaned things up. You can clear out old RUVs with ipa-replica-manage too via list-ruv and clean-ruv. You use list-ruv to get the id# to clean and clean-ruv to do the actual cleaning.
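[For reference, a minimal sketch of the list-ruv/clean-ruv workflow described in the quoted paragraph above, run as root on an IPA master. The replica ID 24 is just the rid from the log above, and the abort-clean-ruv subcommand may or may not be present depending on the ipa-replica-manage version in use:]

# ipa-replica-manage list-ruv
(note the ID of the stale RUV entry, e.g. 24)
# ipa-replica-manage clean-ruv 24
(if the version supports it, a stuck cleaning task can be stopped with)
# ipa-replica-manage abort-clean-ruv 24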
>>>>
>>>> I remember having previously tried this task, but it had failed on older RUVs which were not even active (the KDC was under some strain, so ipa queries were timing out). However, I ran it again and have been able to delete all but one RUV referencing the previous replica (that last task is still running).
>>>>
>>>> I'll report back once the task finishes or fails.
>>>>
>>>> Thanks,
>>>> John DeSantis
>>>>
>>>>
>>>> 2014-10-22 15:49 GMT-04:00 Rob Crittenden <[email protected]>:
>>>>>
>>>>> Rich Megginson wrote:
>>>>>>
>>>>>> On 10/22/2014 12:55 PM, John Desantis wrote:
>>>>>>>
>>>>>>> Richard,
>>>>>>>
>>>>>>>> You should remove the unused RUV elements. I'm not sure why they were not cleaned. You may have to use cleanallruv manually.
>>>>>>>>
>>>>>>>> https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_Replication-Solving_Common_Replication_Conflicts.html#cleanruv
>>>>>>>>
>>>>>>>> note - use the cleanallruv procedure, not cleanruv.
>>>>>>>
>>>>>>> I'll try that, thanks for the guidance.
>>>>>>>
>>>>>>>> What is the real problem you have? Did replication stop working? Are you getting error messages?
>>>>>>>
>>>>>>> I cannot get the host to be a replica. Each time I run `ipa-replica-install replica-info-host-in-question.our.personal.domain.gpg' it fails. I had assumed it was due to the fact that the host was already a replica, but had to be taken offline due to a hard disk failing. The machine was re-provisioned after the new hard drive was installed.
>>>>>>
>>>>>> Ok. I don't know if we have a documented procedure for that case. I assumed that if you first ran ipa-replica-manage del, then ipa-replica-prepare, then ipa-replica-install, that would take care of that.
>>>>>
>>>>> ipa-replica-manage del should have cleaned things up. You can clear out old RUVs with ipa-replica-manage too via list-ruv and clean-ruv. You use list-ruv to get the id# to clean and clean-ruv to do the actual cleaning.
>>>>>
>>>>>>> When I enabled extra debugging during the installation process, the initial error was that the dirsrv instance couldn't be started. I checked into this and found that there were missing files in the /etc/dirsrv/slapd-BLAH directory. I was then able to start dirsrv after copying some schema files from another replica. The install did move forward but then failed with Apache and its IPA configuration.
>>>>>>>
>>>>>>> I performed several uninstalls and re-installs, and at one point I got error code 3 from ipa-replica-install, which is why I was thinking that the old RUVs and tombstones were to blame.
>>>>>>
>>>>>> It could be. I'm really not sure what the problem is at this point.
>>>>>
>>>>> I think we'd need to see ipareplica-install.log to know for sure. It could be the sort of thing where it fails early but doesn't kill the install, so the last error is a red herring.
>>>>>
>>>>> rob
>>>>>
>>>>>>> Thanks,
>>>>>>> John DeSantis
>>>>>>>
>>>>>>>
>>>>>>> 2014-10-22 12:51 GMT-04:00 Rich Megginson <[email protected]>:
>>>>>>>>
>>>>>>>> On 10/22/2014 10:31 AM, John Desantis wrote:
>>>>>>>>>
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> You helped me before in #freeipa, so I appreciate the assistance again.
>>>>>>>>>
>>>>>>>>>> What version of 389 are you using?
>>>>>>>>>> rpm -q 389-ds-base
>>>>>>>>>
>>>>>>>>> 389-ds-base-1.2.11.15-34.el6_5
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> John DeSantis
>>>>>>>>>
>>>>>>>>> 2014-10-22 12:09 GMT-04:00 Rich Megginson <[email protected]>:
>>>>>>>>>>
>>>>>>>>>> On 10/22/2014 09:42 AM, John Desantis wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> First and foremost, a big "thank you!" to the FreeIPA developers for a great product!
>>>>>>>>>>>
>>>>>>>>>>> Now, to the point!
>>>>>>>>>>>
>>>>>>>>>>> We're trying to re-provision a previous replica using the standard documentation via the Red Hat site:
>>>>>>>>>>>
>>>>>>>>>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Identity_Management_Guide/Setting_up_IPA_Replicas.html
>>>>>>>>>>>
>>>>>>>>>>> However, we're running into errors during the import process. The errors are varied and occur at random steps; there was an issue with NTP or HTTP or LDAP, etc. This did not happen when we promoted a separate node to become a replica.
>>>>>>>>>>>
>>>>>>>>>>> We had previously removed the replica via `ipa-replica-manage del` and ensured that no trace of it being a replica existed: removed DNS records and verified that the host enrollment was not present. I did not use the "--force" and "--cleanup" options.
>>>>>>>>>>
>>>>>>>>>> What version of 389 are you using?
>>>>>>>>>> rpm -q 389-ds-base
>>>>>>>>
>>>>>>>> You should remove the unused RUV elements. I'm not sure why they were not cleaned. You may have to use cleanallruv manually.
>>>>>>>>
>>>>>>>> https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_Replication-Solving_Common_Replication_Conflicts.html#cleanruv
>>>>>>>>
>>>>>>>> note - use the cleanallruv procedure, not cleanruv.
>>>>>>>>
>>>>>>>>>>> When I check RUVs against the host in question, there are several. I also queried the tombstones against the host and found two entries which have valid hex time stamps; coincidentally, out of the 9 tombstone entries, 2 have "nsds50ruv" time stamps.
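[An aside on reading those hex time stamps, not from the original thread: the first eight hex digits of a 389-ds CSN such as 5447f861000000180000 are a Unix timestamp, followed by the sequence number, replica ID, and sub-sequence number, so the RUV ranges and maxcsn values in this thread can be dated from a shell. A sketch, assuming bash and GNU date:]

# date -u -d @$((16#5447f861))
(prints something like: Wed Oct 22 18:33:05 UTC 2014)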
>>>>>>>>>>> I'll paste sanitized output below:
>>>>>>>>>>>
>>>>>>>>>>> # ldapsearch -x -W -LLL -D "cn=directory manager" -b "dc=our,dc=personal,dc=domain" '(objectclass=nsTombstone)'
>>>>>>>>>>> Enter LDAP Password:
>>>>>>>>>>> dn:
>>>>>>>>>>> nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=our,dc=personal,dc=domain
>>>>>>>>>>> objectClass: top
>>>>>>>>>>> objectClass: nsTombstone
>>>>>>>>>>> objectClass: extensibleobject
>>>>>>>>>>> nsds50ruv: {replicageneration} 50ef13ae000000040000
>>>>>>>>>>> nsds50ruv: {replica 4 ldap://master.our.personal.domain:389} 5164d147000000040000 5447bda8000100040000
>>>>>>>>>>> nsds50ruv: {replica 22 ldap://separatenode.our.personal.domain:389} 54107f9f000000160000 54436b25000000160000
>>>>>>>>>>> nsds50ruv: {replica 21 ldap://iparepbackup.our.personal.domain:389} 51b734de000000150000 51b734ef000200150000
>>>>>>>>>>> nsds50ruv: {replica 19 ldap://host-in-question.our.personal.domain:389} 510d56c9000100130000 510d82be000200130000
>>>>>>>>>>> nsds50ruv: {replica 18 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 17 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 16 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 15 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 14 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 13 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 12 ldap://host-in-question.our.personal.domain:389}
>>>>>>>>>>> nsds50ruv: {replica 23 ldap://host-in-question.our.personal.domain:389} 54187702000200170000 5418789a000000170000
>>>>>>>>>>> dc: our
>>>>>>>>>>> nsruvReplicaLastModified: {replica 4 ldap://master.our.personal.domain:389} 5447bce8
>>>>>>>>>>> nsruvReplicaLastModified: {replica 22 ldap://separatenode.our.personal.domain:389} 54436a5e
>>>>>>>>>>> nsruvReplicaLastModified: {replica 21 ldap://iparepbackup.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 19 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 18 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 17 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 16 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 15 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 14 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 13 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 12 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>> nsruvReplicaLastModified: {replica 23 ldap://host-in-question.our.personal.domain:389} 00000000
>>>>>>>>>>>
>>>>>>>>>>> dn:
>>>>>>>>>>> nsuniqueid=c08a2803-5b5a11e2-a527ce8b-8fa47d35,cn=host-in-question.our.personal.domain,cn=masters,cn=ipa,cn=etc,dc=our,dc=personal,dc=domain
>>>>>>>>>>> objectClass: top
>>>>>>>>>>> objectClass: nsContainer
>>>>>>>>>>> objectClass: nsTombstone
>>>>>>>>>>> cn: host-in-question.our.personal.domain
>>>>>>>>>>> nsParentUniqueId: e6fa9418-5b5711e2-a1a9825b-daf5b5b0
>>>>>>>>>>>
>>>>>>>>>>> dn:
>>>>>>>>>>> nsuniqueid=664c4383-6d6311e2-8db6e946-de27dd8d,cn=host-in-question.our.personal.domain,cn=masters,cn=ipa,cn=etc,dc=our,dc=personal,dc=domain
>>>>>>>>>>> objectClass: top
>>>>>>>>>>> objectClass: nsContainer
>>>>>>>>>>> objectClass: nsTombstone
>>>>>>>>>>> cn: host-in-question.our.personal.domain
>>>>>>>>>>> nsParentUniqueId: e6fa9418-5b5711e2-a1a9825b-daf5b5b0
>>>>>>>>>>>
>>>>>>>>>>> As you can see, the "host-in-question" has many RUVs, of which two appear to be "active", and there are two tombstone entries which I believe (pardon my ignorance) possibly correlate with the "active" entries of the "host-in-question".
>>>>>>>>>>>
>>>>>>>>>>> Do these two tombstone entries need to be deleted with ldapdelete before we can re-provision "host-in-question" and add it back as a replica?
>>>>>>>>
>>>>>>>> No, you cannot delete tombstones manually. They will be cleaned up at some point by the dirsrv tombstone reap thread, and they should not be interfering with anything.
>>>>>>>>
>>>>>>>> What is the real problem you have? Did replication stop working? Are you getting error messages?
>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> John DeSantis

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go To http://freeipa.org for more info on the project
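[A closing sketch of the "manual cleanallruv" route Rich points to in the thread (the Red Hat Directory Server link above): on 389-ds the task is created by adding an entry under cn=cleanallruv,cn=tasks,cn=config on the master. The file name below is illustrative; the replica ID 24 and the suffix are taken from this thread; replica-force-cleaning, where the 389-ds-base build supports it, can be set to yes so the task does not wait for replicas that can never catch up, which is roughly the stuck situation logged earlier. Something like:]

# cat > clean-rid24.ldif <<EOF
dn: cn=clean 24,cn=cleanallruv,cn=tasks,cn=config
objectclass: extensibleObject
cn: clean 24
replica-base-dn: dc=our,dc=personal,dc=domain
replica-id: 24
replica-force-cleaning: no
EOF
# ldapmodify -a -x -D "cn=directory manager" -W -f clean-rid24.ldif

[Progress is then reported by NSMMReplicationPlugin in the directory server errors log, in the same form as the CleanAllRUV Task messages quoted earlier in the thread.]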
