On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
Okay, I updated the gist and extended some of the logs (ipa2-errors
does stop at 20:50:21). I'll follow up when I have the debug stuff in
place.
https://gist.github.com/nevsan/8b6f78d7396963dc5f70
Another strange thing - it looks as if the initial replica init
completes successfully.
[02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total
update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".
On the replica:
[02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
multimaster_be_state_change: replica dc=example,dc=com is going offline;
disabling replication
[02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access
the database
[02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished;
cleaning up...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
[02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete.
Post-processing...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Generating
numSubordinates complete.
[02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete.
Processed 453 entries in 3 seconds. (151.00 entries/sec)
[02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
multimaster_be_state_change: replica dc=example,dc=com is coming online;
enabling replication
On the master, access log:
[02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD
dn="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
tree,cn=config"
This is the operation that triggers the replica init. Then
ipa-replica-install polls for agreement status:
[02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
tree,cn=config" scope=0 filter="(objectClass=*)"
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101
nentries=1 etime=0
[02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
tree,cn=config" scope=0 filter="(objectClass=*)"
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101
nentries=1 etime=0
[02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
tree,cn=config" scope=0 filter="(objectClass=*)"
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101
nentries=1 etime=0
[02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
tree,cn=config" scope=0 filter="(objectClass=*)"
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101
nentries=1 etime=1
Something happens here. The replica init is done, according to the
replica error log. We don't have the replica access log from around
this time to see exactly when the connection was closed, but looking at
the ipa code, it would appear that ipa did not see a status of "Total
update succeeded". Not sure why the master would not have reported
that, unless there was some problem getting back the status from the
replica.
[02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
[02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
Then ipa-replica-install closes the connection and reports the error.
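
For reference, the decision a poller has to make from each of those SRCH results can be sketched roughly as below. This is only an illustration in Python, not the actual ipa-replica-install code: the function name `classify_init`, the plain-dict input, and the order of the checks are assumptions. Only the attribute names and the "Total update succeeded" string come from the access log and the discussion above.

```python
def classify_init(attrs):
    """Classify one poll of the replication agreement entry.

    attrs: dict of attribute name -> string value (hypothetical shape;
    a real LDAP client would return lists of values instead).
    Returns "running", "succeeded", or "failed".
    """
    # LDAP attribute names are case-insensitive, so normalise the keys.
    lowered = {k.lower(): v for k, v in attrs.items()}
    refresh = lowered.get("nsds5beginreplicarefresh")
    in_progress = lowered.get("nsds5replicaupdateinprogress", "FALSE")
    status = lowered.get("nsds5replicalastinitstatus", "")

    # Assumption: while the refresh attribute is still present, the
    # total update has not finished yet.
    if refresh:
        return "running"
    if "Total update succeeded" in status:
        return "succeeded"
    # Assumption: no status yet, or an update still flagged as in
    # progress, means keep polling rather than give up.
    if not status or in_progress.upper() == "TRUE":
        return "running"
    return "failed"
```

Under logic of this kind, once the refresh attribute is gone, any status other than "Total update succeeded" is treated as a failure, which would be consistent with the installer giving up if the master never reported that string.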
On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <[email protected]> wrote:
On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
Okay. Updated the gist with the additional logs:
https://gist.github.com/nevsan/8b6f78d7396963dc5f70
1) Dirsrv is crashing:
[02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1
B2014.073.1751 starting up
[02/Apr/2014:20:49:54 +0000] - Db home directory is not set.
Possibly nsslapd-directory (optionally nsslapd-db-home-directory)
is missing in the config file.
[02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache
was 710029312 and is now 8000000
[02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1
B2014.073.1751 starting up
[02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last
time Directory Server was running, recovering database.
[02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All
Interfaces port 389 for LDAP requests
Please use the instructions at
http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump
and stack trace.
2) The first occurrence of the connection error is at
[02/Apr/2014:20:52:38 +0000] but there isn't anything in the
consumer error log after [02/Apr/2014:20:50:21 +0000] and in the
consumer access log after [02/Apr/2014:20:50:22 +0000]
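
To put a number on that gap, the two timestamps quoted above can be parsed and subtracted. A minimal sketch in Python; the helper name `parse_ds_timestamp` is made up for illustration, and the timestamp layout is assumed from the log lines in this thread.

```python
from datetime import datetime

def parse_ds_timestamp(text):
    """Parse a dirsrv log timestamp like '[02/Apr/2014:20:50:21 +0000]'."""
    # %b expects English month abbreviations (C/English locale assumed).
    return datetime.strptime(text.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

# Last entry in the consumer error log vs. first connection error.
last_consumer_error = parse_ds_timestamp("[02/Apr/2014:20:50:21 +0000]")
first_conn_error = parse_ds_timestamp("[02/Apr/2014:20:52:38 +0000]")

gap = first_conn_error - last_consumer_error
print(int(gap.total_seconds()))  # 137
```

So the consumer error log goes quiet a little over two minutes before the first connection error shows up.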
On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson <[email protected]> wrote:
On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
Okay, I ran it with debug on. The output is quite large. I'm
not sure what the etiquette is for posting large logs, so I
threw it on gist here:
https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
Let me know if I should copy it into the thread instead.
Ok. Now can you post excerpts from the dirsrv errors log
from both the master replica and the replica from around the
time of the failure?
On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson <[email protected]> wrote:
On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
My apologies. I mistakenly ran the failing ldapsearch
as an unprivileged user (couldn't read the
slapd-EXAMPLE-COM directory). Running as root, it now
works just fine (same result as the one that worked).
SSL seems not to be the issue. Also, I haven't changed
the SSL certs since I first set up the master.
I have been doing the replica side things from scratch
(even so far as starting with a new machine). For the
master side, I have just been re-preparing the replica.
I hope I don't have to start from scratch with the
master replica.
I guess the next step would be to do the
ipa-replica-install using -ddd and review the extra
debug information that comes out.
On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden <[email protected]> wrote:
Rich Megginson wrote:
On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
Okay, we might be on to something:
ipa -> ipa2
================================
$ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM \
    ldapsearch -xLLLZZ -h ipa2.example.com -s base -b "" \
    'objectclass=*' vendorVersion
dn:
vendorVersion: 389-Directory/1.3.1.22.a1 B2014.073.1751
================================
ipa2 -> ipa
================================
$ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM \
    ldapsearch -xLLLZZ -h ipa.example.com -s base -b "" \
    'objectclass=*' vendorVersion
ldap_start_tls: Connect error (-11)
        additional info: TLS error -8172:Peer's certificate issuer has been marked as not trusted by the user.
================================
The original IPA trusts the replica (since it signed the cert, I
assume), but the replica doesn't trust the main IPA server. I guess
the ZZ option would have shown me the failure that I missed in my
initial ldapsearch tests.
-Z[Z]  Issue StartTLS (Transport Layer Security) extended
       operation. If you use -ZZ, the command will require the
       operation to be successful.
i.e. use TLS (via StartTLS), and require a successful handshake
Anyway, what's the best way to remedy this in a way that makes IPA
happy? (I've found that LDAP can have different requirements on which
certs go where).
I'm not sure. ipa-server-install/ipa-replica-prepare/ipa-replica-install
is supposed to take care of installing the CA cert properly for you. If
you try to hack it and install the CA cert manually, you will probably
miss something else that ipa install did not do.
I think the only way to ensure that you have a properly configured ipa
server + replicas is to get all of the ipa commands completing successfully.
Which means going back to the drawing board and starting over from scratch.
You can compare the certs that each side is using with:
# certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
Did you by chance replace the SSL server certs that
IPA uses on your working master?
rob
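
If it helps, the two `certutil -L` listings can be compared mechanically instead of by eye. Below is a rough sketch in Python that parses the nickname / trust-attribute table and reports differences. The function names are hypothetical, the column layout is assumed from typical `certutil -L` output, and the sample listings are made up for illustration; they are NOT output from the servers in this thread.

```python
import re

# Trust flags look like "CT,C,C" or "u,u,u" (any part may be empty).
TRUST_RE = re.compile(r"^[A-Za-z]*,[A-Za-z]*,[A-Za-z]*$")

def parse_certutil_list(text):
    """Return {nickname: trust_flags} from `certutil -L` style output."""
    certs = {}
    for line in text.splitlines():
        # The last whitespace-separated token is the trust column;
        # header and blank lines do not match TRUST_RE and are skipped.
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and TRUST_RE.match(parts[1]):
            certs[parts[0].strip()] = parts[1]
    return certs

def diff_trust(a, b):
    """Nicknames whose trust flags differ or that exist on one side only."""
    return {n: (a.get(n), b.get(n))
            for n in sorted(set(a) | set(b)) if a.get(n) != b.get(n)}

# Made-up sample listings (hypothetical nicknames and flags).
MASTER = """
Certificate Nickname                        Trust Attributes
                                            SSL,S/MIME,JAR/XPI

EXAMPLE.COM IPA CA                          CT,C,C
Server-Cert                                 u,u,u
"""
REPLICA = """
Certificate Nickname                        Trust Attributes
                                            SSL,S/MIME,JAR/XPI

EXAMPLE.COM IPA CA                          ,,
Server-Cert                                 u,u,u
"""

print(diff_trust(parse_certutil_list(MASTER), parse_certutil_list(REPLICA)))
```

A CA entry that shows up with different trust flags on one side, or is missing there entirely, would be the first thing to look at given the "issuer has been marked as not trusted" error.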
_______________________________________________
Freeipa-users mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/freeipa-users