Re: [Freeipa-users] IPA Replica Issues (Total update abortedLDAP error: Can't contact LDAP server)

Nevada Sanchez Fri, 04 Apr 2014 06:20:15 -0700

I followed the instructions that would give me a core dump, and for some
reason, I don't see one in /var/log/dirsrv/slapd-EXAMPLE-COM/, even though
I still see the Disorderly shutdown still shows up in the logs. I know that
when I explicitly request those attributes, I get "-1 Total update
abortedLDAP error: Can't contact LDAP server" for nds5ReplicaLastInitStatus
(see below). Access logs stop completely on the replica after the time that
you mentioned.


======================================================
[root@ipa2 ipaserver]# ldapsearch  ldaps://ipa.example.com:636 -D
'cn=Directory Manager' -w ##### -b
'cn=meToipa2.example.com<http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
tree,cn=config' '(objectClass=*)' -s base nsds5ReplicaLastInitStart
nsds5replicaUpdateInProgress nsds5ReplicaLastInitStatus cn
nsds5BeginReplicaRefresh nsds5ReplicaLastInitEnd
# extended LDIF
#
# LDAPv3
# base <cn=meToipa2.example.com
<http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
tree,cn=config> with scope baseObject
# filter: (objectclass=*)
# requesting: ldaps://ipa.example.com:636 (objectClass=*)
nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
nsds5ReplicaLastInitEnd
#

# meToipa2.example.com <http://metoipa2.example.com/>, replica,
dc\3Dexample\2Cdc\3Dcom,
  mapping tree, config
dn: cn=meToipa2.example.com <http://metoipa2.example.com/>
,cn=replica,cn=dc\3Dexample\2Cd
 c\3Dcom,cn=mapping tree,cn=config
nsds5ReplicaLastInitStart: 20140401092800Z
nsds5replicaUpdateInProgress: FALSE
nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error: Can't
contact L
 DAP server
cn: meToipa2.example.com <http://metoipa2.example.com/>
nsds5ReplicaLastInitEnd: 20140401092804Z

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1


On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson <[email protected]> wrote:

>  On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>
> Okay, I updated the gist and extended some of the logs (ipa2-errors does
> stop at 20:50:21). I'll follow up when I have the debug stuff in place.
>
>  https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>
>
> Another strange thing - it looks as if the initial replica init completes
> successfully.
>
> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total
> update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".
>
> On the replica:
>
> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
> multimaster_be_state_change: replica dc=example,dc=com is going offline;
> disabling replication
> [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to access the
> database
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished; cleaning
> up...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete.
> Post-processing...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating numSubordinates
> complete.
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
> [02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete. Processed
> 453 entries in 3 seconds. (151.00 entries/sec)
> [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
> multimaster_be_state_change: replica dc=example,dc=com is coming online;
> enabling replication
>
> On the master, access log:
>
> [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD dn="cn=
> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
> tree,cn=config"
>
> This is the operation that triggers the replica init.  Then
> ipa-replica-install polls for agreement status:
> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH base="cn=
> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
> tree,cn=config" scope=0 filter="(objectClass=*)"
> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101
> nentries=1 etime=0
> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH base="cn=
> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
> tree,cn=config" scope=0 filter="(objectClass=*)"
> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101
> nentries=1 etime=0
> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH base="cn=
> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
> tree,cn=config" scope=0 filter="(objectClass=*)"
> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101
> nentries=1 etime=0
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH base="cn=
> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
> tree,cn=config" scope=0 filter="(objectClass=*)"
> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5replicaLastInitEnd"
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101
> nentries=1 etime=1
>
> Something happens here.  The replica init is done, according to the
> replica error log.  We don't have the replica access log from around this
> time to see exactly when the connection was closed, but looking at the ipa
> code, it would appear that ipa did not see a status of "Total update
> succeeded".  Not sure why the master would not have reported that, unless
> there was some problem getting back the status from the replica.
>
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>
> Then ipa-replica-install closes the connection and reports the error.
>
>
>
>
> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <[email protected]>wrote:
>
>>  On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>
>> Okay. Updated the gist with the additional logs:
>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>>
>>
>>  1) Dirsrv is crashing:
>> [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>> starting up
>> [02/Apr/2014:20:49:54 +0000] - Db home directory is not set. Possibly
>> nsslapd-directory (optionally nsslapd-db-home-directory) is missing in the
>> config file.
>> [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache was
>> 710029312 and is now 8000000
>> [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>> starting up
>> [02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last time
>> Directory Server was running, recovering database.
>> [02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All Interfaces
>> port 389 for LDAP requests
>>
>> Please use the instructions at
>> http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump and
>> stack trace.
>>
>> 2) The first occurrence of the connection error is at
>> [02/Apr/2014:20:52:38 +0000] but there isn't anything in the consumer error
>> log after [02/Apr/2014:20:50:21 +0000] and in the consumer access log after
>> [02/Apr/2014:20:50:22 +0000]
>>
>>
>>   On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson <[email protected]>wrote:
>>
>>>  On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>
>>> Okay, I ran it with debug on. The output is quite large. I'm not sure
>>> what the etiquette is for posting large logs, so I threw it on gist here:
>>> https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt<http://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt>
>>>
>>>  Let me know if I should copy it into the thread instead.
>>>
>>>
>>>  Ok.  Now can you post excerpts from the dirsrv errors log from both the
>>> master replica and the replica from around the time of the failure?
>>>
>>>
>>>
>>>
>>> On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson <[email protected]>wrote:
>>>
>>>>  On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>
>>>> My apologies. I mistakenly ran the failing ldapsearch from an
>>>> unpriviliged user (couldn't read slapd-EXAMPLE-COM directory). Running as
>>>> root, it now works just fine (same result as the one that worked). SSL
>>>> seems to not be the issue. Also, I haven't change the SSL certs since I
>>>> first set up the master.
>>>>
>>>>  I have been doing the replica side things from scratch (even so far
>>>> as starting with a new machine). For the master side, I have just been
>>>> re-preparing the replica. I hope I don't have to start from scratch with
>>>> the master replica.
>>>>
>>>>
>>>>  I guess the next step would be to do the ipa-replica-install using
>>>> -ddd and review the extra debug information that comes out.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden <[email protected]>wrote:
>>>>
>>>>> Rich Megginson wrote:
>>>>>
>>>>>>  On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
>>>>>>
>>>>>>>  Okay, we might be on to something:
>>>>>>>
>>>>>>> ipa -> ipa2
>>>>>>> ================================
>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>  -h ipa2.example.com <http://ipa2.example.com> -s base -b ""
>>>>>>>
>>>>>>> 'objectclass=*' vendorVersion
>>>>>>> dn:
>>>>>>> vendorVersion: 389-Directory/1.3.1.22.a1 B2014.073.1751
>>>>>>> ================================
>>>>>>>
>>>>>>> ipa2 -> ipa
>>>>>>> ================================
>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>  -h ipa.example.com <http://ipa.example.com> -s base -b ""
>>>>>>>
>>>>>>> 'objectclass=*' vendorVersion
>>>>>>> ldap_start_tls: Connect error (-11)
>>>>>>> additional info: TLS error -8172:Peer's certificate issuer has been
>>>>>>> marked as not trusted by the user.
>>>>>>> ================================
>>>>>>>
>>>>>>> The original IPA trusts the replica (since it signed the cert, I
>>>>>>> assume), but the replica doesn't trust the main IPA server. I guess
>>>>>>> the ZZ option would have shown me the failure that I missed in my
>>>>>>> initial ldapsearch tests.
>>>>>>>
>>>>>>          -Z[Z]  Issue StartTLS (Transport Layer Security) extended
>>>>>> operation. If
>>>>>>                you  use  -ZZ, the command will require the operation
>>>>>> to
>>>>>> be suc-
>>>>>>                cessful.
>>>>>>
>>>>>> i.e. use SSL, and force a successful handshake
>>>>>>
>>>>>>
>>>>>>> Anyway, what's the best way to remedy this in a way that makes IPA
>>>>>>> happy? (I've found that LDAP can have different requirements on which
>>>>>>> certs go where).
>>>>>>>
>>>>>>
>>>>>> I'm not sure.
>>>>>> ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>> is supposed to take care of installing the CA cert properly for you.
>>>>>> If
>>>>>> you try to hack it and install the CA cert manually, you will probably
>>>>>> miss something else that ipa install did not do.
>>>>>>
>>>>>> I think the only way to ensure that you have a properly configured ipa
>>>>>> server + replicas is to get all of the ipa commands completing
>>>>>> successfully.
>>>>>>
>>>>>> Which means going back to the drawing board and starting over from
>>>>>> scratch.
>>>>>>
>>>>>
>>>>> You can compare the certs that each side is using with:
>>>>>
>>>>> # certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>
>>>>> Did you by chance replace the SSL server certs that IPA uses on your
>>>>> working master?
>>>>>
>>>>> rob
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

_______________________________________________
Freeipa-users mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] IPA Replica Issues (Total update abortedLDAP error: Can't contact LDAP server)

Reply via email to