I followed the instructions that would give me a core dump, and for some reason, I don't see one in /var/log/dirsrv/slapd-EXAMPLE-COM/, even though the Disorderly Shutdown message still shows up in the logs. When I explicitly request those attributes, I get "-1 Total update abortedLDAP error: Can't contact LDAP server" for nsds5ReplicaLastInitStatus (see below). Access logs stop completely on the replica after the time that you mentioned.
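Since no core file appears in /var/log/dirsrv/slapd-EXAMPLE-COM/ even though the Disorderly Shutdown message keeps showing up, it may be worth confirming that the kernel and the process limits allow a core at all. This is a minimal sketch assuming Linux; check_core_settings and core_limit_of are made-up helper names. It only reads /proc and changes nothing, and it does not replace the steps on the port389 Debugging_Crashes FAQ page.

```shell
#!/bin/sh
# Read-only sketch: print the settings that decide whether a crashing
# ns-slapd can leave a core file. Assumes Linux /proc; changes nothing.
# Helper names are hypothetical.

# "Max core file size" line from /proc/<pid>/limits for one process.
core_limit_of() {
  grep '^Max core file size' "/proc/$1/limits"
}

check_core_settings() {
  # A leading "|" means cores are piped to a helper (for example abrt)
  # instead of being written relative to the process's working directory.
  echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
  # 0 means a process that changed credentials (ns-slapd starts as root
  # and then switches to its service user) does not dump core.
  echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
  # Limits of each running ns-slapd; a soft limit of 0 also blocks cores.
  for pid in $(pgrep -x ns-slapd 2>/dev/null); do
    echo "ns-slapd $pid: $(core_limit_of "$pid")"
  done
}
```

Run check_core_settings on the replica after dirsrv has been restarted. If core_pattern starts with "|", the core, if any, ends up wherever that helper stores crashes rather than in the slapd log directory.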
======================================================
[root@ipa2 ipaserver]# ldapsearch ldaps://ipa.example.com:636 -D 'cn=Directory Manager' -w ##### -b 'cn=meToipa2.example.com,cn=replica,cn=dc\=example\,dc\=com,cn=mapping tree,cn=config' '(objectClass=*)' -s base nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5ReplicaLastInitEnd
# extended LDIF
#
# LDAPv3
# base <cn=meToipa2.example.com,cn=replica,cn=dc\=example\,dc\=com,cn=mapping tree,cn=config> with scope baseObject
# filter: (objectclass=*)
# requesting: ldaps://ipa.example.com:636 (objectClass=*) nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5ReplicaLastInitEnd
#

# meToipa2.example.com, replica, dc\3Dexample\2Cdc\3Dcom, mapping tree, config
dn: cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cd
 c\3Dcom,cn=mapping tree,cn=config
nsds5ReplicaLastInitStart: 20140401092800Z
nsds5replicaUpdateInProgress: FALSE
nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error: Can't contact L
 DAP server
cn: meToipa2.example.com
nsds5ReplicaLastInitEnd: 20140401092804Z

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson <[email protected]> wrote:

> On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>
> Okay, I updated the gist and extended some of the logs (ipa2-errors does
> stop at 20:50:21). I'll follow up when I have the debug stuff in place.
>
> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>
>
> Another strange thing - it looks as if the initial replica init completes
> successfully.
>
> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".
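Two details of the ldapsearch output at the top of this message are easy to misread. The status value is split into "Can't contact L" and " DAP server" because ldapsearch folds long LDIF lines, and a continuation line starts with a single space. Also, judging by the "# requesting:" line, the ldaps:// URI was treated as an attribute name: OpenLDAP's ldapsearch expects a URI after -H, so that search went to whatever server the client's ldap.conf defaults to. Below is a minimal shell sketch for reading the status; the helper names are hypothetical, and it assumes an OpenLDAP 2.4 client (older builds may lack -o ldif-wrap=no).

```shell
#!/bin/sh
# Sketch: pull nsds5ReplicaLastInitStatus out of ldapsearch LDIF output.
# Helper names are hypothetical; only ldapsearch and awk are real tools.

# Join LDIF continuation lines: a line starting with one space continues
# the previous line, and that single space is dropped.
unfold_ldif() {
  awk '
    /^ / { cur = cur substr($0, 2); next }
    NR > 1 { print cur }
    { cur = $0 }
    END { if (NR > 0) print cur }
  '
}

# Value of nsds5ReplicaLastInitStatus (LDAP attribute names are
# case-insensitive, so the match is done in lower case).
last_init_status() {
  unfold_ldif | awk '
    tolower($0) ~ /^nsds5replicalastinitstatus: / { sub(/^[^:]*: /, ""); print }
  '
}

# Leading numeric code of the status ("-1" in the failing case above).
last_init_code() {
  last_init_status | awk '{ print $1 }'
}

# Hypothetical wrapper around the query from the message: the URI goes
# after -H, -x requests a simple bind, -W prompts for the password, and
# -o ldif-wrap=no turns off line folding.
query_agreement() {
  ldapsearch -x -H ldaps://ipa.example.com:636 -D 'cn=Directory Manager' -W \
    -o ldif-wrap=no -s base \
    -b 'cn=meToipa2.example.com,cn=replica,cn=dc\=example\,dc\=com,cn=mapping tree,cn=config' \
    '(objectClass=*)' nsds5ReplicaLastInitStatus nsds5replicaUpdateInProgress
}
```

For example, query_agreement | last_init_code prints only the leading code of the status. unfold_ldif is still useful on saved output that was captured with the default folding.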
>
> On the replica:
>
> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is going offline; disabling replication
> [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished; cleaning up...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete. Post-processing...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating numSubordinates complete.
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete. Processed 453 entries in 3 seconds. (151.00 entries/sec)
> [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is coming online; enabling replication
>
> On the master, access log:
>
> [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD dn="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"
>
> This is the operation that triggers the replica init.
> Then ipa-replica-install polls for agreement status:
> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config" scope=0 filter="(objectClass=*)" attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101 nentries=1 etime=0
> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config" scope=0 filter="(objectClass=*)" attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101 nentries=1 etime=0
> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config" scope=0 filter="(objectClass=*)" attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101 nentries=1 etime=0
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config" scope=0 filter="(objectClass=*)" attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101 nentries=1 etime=1
>
> Something happens here. The replica init is done, according to the
> replica error log.
> We don't have the replica access log from around this
> time to see exactly when the connection was closed, but looking at the ipa
> code, it would appear that ipa did not see a status of "Total update
> succeeded". Not sure why the master would not have reported that, unless
> there was some problem getting back the status from the replica.
>
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>
> Then ipa-replica-install closes the connection and reports the error.
>
> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <[email protected]> wrote:
>
>> On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>
>> Okay. Updated the gist with the additional logs:
>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>>
>> 1) Dirsrv is crashing:
>> [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751 starting up
>> [02/Apr/2014:20:49:54 +0000] - Db home directory is not set. Possibly nsslapd-directory (optionally nsslapd-db-home-directory) is missing in the config file.
>> [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache was 710029312 and is now 8000000
>> [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751 starting up
>> [02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
>> [02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All Interfaces port 389 for LDAP requests
>>
>> Please use the instructions at
>> http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump and
>> stack trace.
>>
>> 2) The first occurrence of the connection error is at
>> [02/Apr/2014:20:52:38 +0000], but there isn't anything in the consumer error
>> log after [02/Apr/2014:20:50:21 +0000] or in the consumer access log after
>> [02/Apr/2014:20:50:22 +0000].
>>
>>
>> On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson <[email protected]> wrote:
>>
>>> On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>
>>> Okay, I ran it with debug on. The output is quite large. I'm not sure
>>> what the etiquette is for posting large logs, so I threw it on gist here:
>>> https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
>>>
>>> Let me know if I should copy it into the thread instead.
>>>
>>>
>>> Ok. Now can you post excerpts from the dirsrv errors log from both the
>>> master replica and the replica from around the time of the failure?
>>>
>>>
>>> On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson <[email protected]> wrote:
>>>
>>>> On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>
>>>> My apologies. I mistakenly ran the failing ldapsearch from an
>>>> unprivileged user (couldn't read slapd-EXAMPLE-COM directory). Running as
>>>> root, it now works just fine (same result as the one that worked). SSL
>>>> seems not to be the issue. Also, I haven't changed the SSL certs since I
>>>> first set up the master.
>>>>
>>>> I have been doing the replica side things from scratch (even so far
>>>> as starting with a new machine). For the master side, I have just been
>>>> re-preparing the replica. I hope I don't have to start from scratch with
>>>> the master replica.
>>>>
>>>>
>>>> I guess the next step would be to do the ipa-replica-install using
>>>> -ddd and review the extra debug information that comes out.
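Point 2 above rests on spotting where each log goes quiet (the consumer's error log after 20:50:21, its access log after 20:50:22) relative to the first connection error at 20:52:38. A small shell sketch for that comparison, assuming the default 389-DS "[DD/Mon/YYYY:HH:MM:SS +ZZZZ]" prefix and logs that share one UTC offset (the offset is ignored); ds_ts_key, last_entry_time and entries_after are hypothetical names.

```shell
#!/bin/sh
# Sketch: find where a 389-DS access or errors log goes quiet.
# Assumes the fixed "[DD/Mon/YYYY:HH:MM:SS +ZZZZ]" prefix; the UTC offset
# is ignored, so only compare logs written with the same offset.

# awk snippet shared by the helpers: key(s) turns the bracketed timestamp
# at the start of s into a sortable YYYYMMDDHHMMSS string.
DS_KEY_AWK='
function key(s) {
  return substr(s, 9, 4) mon[substr(s, 5, 3)] substr(s, 2, 2) substr(s, 14, 2) substr(s, 17, 2) substr(s, 20, 2)
}
BEGIN {
  split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
  for (i = 1; i <= 12; i++) mon[names[i]] = sprintf("%02d", i)
}
'

# Sortable key for one timestamp, e.g. "[02/Apr/2014:20:50:21 +0000]".
ds_ts_key() {
  printf '%s\n' "$1" | awk "$DS_KEY_AWK"'{ print key($0) }'
}

# Timestamp of the last entry in a log file.
last_entry_time() {
  grep -o '^\[[^]]*]' "$1" | tail -n 1
}

# Log lines stamped strictly later than the given bracketed timestamp.
entries_after() {
  awk -v since="$2" "$DS_KEY_AWK"'/^\[/ && key($0) > key(since) { print }' "$1"
}
```

For example, last_entry_time /var/log/dirsrv/slapd-EXAMPLE-COM/errors on the replica gives the point where it went silent; passing that timestamp to entries_after on the master's errors log then shows what the master recorded afterwards.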
>>>>
>>>> On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden <[email protected]> wrote:
>>>>
>>>>> Rich Megginson wrote:
>>>>>
>>>>>> On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
>>>>>>
>>>>>>> Okay, we might be on to something:
>>>>>>>
>>>>>>> ipa -> ipa2
>>>>>>> ================================
>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ -h ipa2.example.com -s base -b "" 'objectclass=*' vendorVersion
>>>>>>> dn:
>>>>>>> vendorVersion: 389-Directory/1.3.1.22.a1 B2014.073.1751
>>>>>>> ================================
>>>>>>>
>>>>>>> ipa2 -> ipa
>>>>>>> ================================
>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ -h ipa.example.com -s base -b "" 'objectclass=*' vendorVersion
>>>>>>> ldap_start_tls: Connect error (-11)
>>>>>>> additional info: TLS error -8172:Peer's certificate issuer has been marked as not trusted by the user.
>>>>>>> ================================
>>>>>>>
>>>>>>> The original IPA trusts the replica (since it signed the cert, I
>>>>>>> assume), but the replica doesn't trust the main IPA server. I guess
>>>>>>> the -ZZ option would have shown me the failure that I missed in my
>>>>>>> initial ldapsearch tests.
>>>>>>>
>>>>>> -Z[Z]  Issue StartTLS (Transport Layer Security) extended operation.
>>>>>>        If you use -ZZ, the command will require the operation to be
>>>>>>        successful.
>>>>>>
>>>>>> i.e. use SSL, and force a successful handshake
>>>>>>
>>>>>>
>>>>>>> Anyway, what's the best way to remedy this in a way that makes IPA
>>>>>>> happy? (I've found that LDAP can have different requirements on which
>>>>>>> certs go where).
>>>>>>>
>>>>>>
>>>>>> I'm not sure.
>>>>>> ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>> is supposed to take care of installing the CA cert properly for you.
>>>>>> If you try to hack it and install the CA cert manually, you will probably
>>>>>> miss something else that ipa install did not do.
>>>>>>
>>>>>> I think the only way to ensure that you have a properly configured ipa
>>>>>> server + replicas is to get all of the ipa commands completing
>>>>>> successfully.
>>>>>>
>>>>>> Which means going back to the drawing board and starting over from
>>>>>> scratch.
>>>>>>
>>>>>
>>>>> You can compare the certs that each side is using with:
>>>>>
>>>>> # certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>
>>>>> Did you by chance replace the SSL server certs that IPA uses on your
>>>>> working master?
>>>>>
>>>>> rob
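Rob's suggestion, comparing the certs each side uses, can be done by saving the certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM listing from both servers and diffing a normalised form. This is a sketch under stated assumptions: each listing has certutil's usual two header lines followed by one "nickname  trust-flags" row per certificate, and the helper names and file names are made up for illustration. It only reports differences and changes nothing, in line with the advice above not to install the CA cert by hand.

```shell
#!/bin/sh
# Sketch: compare `certutil -L` listings taken on two servers, e.g.
#   certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM > ipa.certs    (on ipa)
#   certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM > ipa2.certs   (on ipa2)
# File and function names are hypothetical; copy both files to one host.

# Normalise a listing to sorted "nickname|trust-flags" lines, dropping
# blank lines and the two header lines certutil prints.
cert_trust_lines() {
  awk '
    NF == 0 { next }
    /^Certificate Nickname/ || /^[ \t]*SSL,S\/MIME/ { next }
    {
      trust = $NF
      nick = $0
      sub(/[ \t]+[^ \t]+[ \t]*$/, "", nick)   # strip the trust column
      print nick "|" trust
    }
  ' "$1" | sort
}

# diff the normalised listings; exit status 0 means they match.
compare_cert_dbs() {
  left=$(mktemp) right=$(mktemp)
  cert_trust_lines "$1" > "$left"
  cert_trust_lines "$2" > "$right"
  diff "$left" "$right"
  rc=$?
  rm -f "$left" "$right"
  return "$rc"
}
```

For the ipa2 -> ipa failure, compare_cert_dbs ipa.certs ipa2.certs would point at a CA row that is missing on one side or carries different trust flags (for example CT,C,C on one host only).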
_______________________________________________
Freeipa-users mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/freeipa-users
