Re: Replication issue during performance test with MMR configuration and LastBind enabled

falgon . comp Thu, 23 Nov 2023 13:39:04 -0800

Quanah Gibson-Mount wrote:
> --On Tuesday, November 7, 2023 12:56 PM +0000 falgon.comp(a)gmail.com wrote:
> 
> >  Hello, sorry for the delay.
> >  Thank's for the answers,
> > 
> > > Generally, with something like lastbind, you'll run into collissions of
> > > the  timestamp, which will cause a lot of havoc with replication.  It is
> > > not the  only case where this can occur.  I highly advise reading the
> > > caveats in the  admin guide about MPR replication.
> > 
> >  Yes, that's what we thought at first, but with the various tests we've
> >  carried out, we're doubtful about the collision problem. When testing
> >  with a single account that BIND more than 500 times per second, we can't
> >  reproduce the problem. The same applies to 10 accounts looping at 500
> >  BIND/s. 
> So I'm looking at your configuration and have some question:
> 
> a) olcPasswordCryptSaltFormat: $6$rounds=10000$%.16s -> Why are you using 
> crypt passwords?  OpenLDAP ships with multiple, secure module for password 
> hashing, such as argon2.  I'd advise using that.  Note that crypt is 
> non-portable.
> 
> 
> b) olcLogLevel: stats sync
> 
> This generally should be:
> 
> olcLogLevel: stats
> olcLogLevel: sync
> 
> c) olcPasswordHash: {CRYPT} -> See (a)
> 
> d) I'd suggest not using a root password at all for cn=config, and use 
> EXTERNAL auth over ldapi.  If you are going to use one, upgrade to argon2
> 
> e) Why do you have separate credentions for the monitor db?
> 
> f) Delete this index: olcDbIndex: pwdLastSuccess eq,pres
> 
> g) olcSpReloadHint: TRUE -> This setting should *not* be on the main DB, 
> delete it from
> dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
> 
> h) For your benchmark test, this is probably not frequent enough, as the 
> purge will never run since you're saying only data > 1 day old:
> olcAccessLogPurge: 01+00:00 00+04:00
> 
> i) For the accesslog DB, are you sure this is a large enough size? 
> olcDbMaxSize: 2147483648 or are you hitting 2GB?
> 
> 
> 
> Also it appears you're running this test on two slapds running on the same 
> server?  That's an incredibly bad idea, since the I/O will conflict 
> massively between the two processes writing to disk.


Hello, thank you for the answer and for taking the time to read Meheni's 
config files.
I can answer all of your questions:

a) + c) Why are you using crypt passwords? 
- We're using Crypt because we're migrating from an old solution to OpenLDAP, 
and Crypt is the most secure option that is compatible for us.

b) olcLogLevel: stats sync
- We are running our tests with stats only. Meheni probably left this setting 
in while checking things before sending the config here.

d) I'd suggest not using a root password at all for cn=config
- Thank you for this suggestion; we will probably try it.

e) Why do you have separate credentions for the monitor db?
- Sorry, I don't understand the word "credentions". Do you mean 
credentials?

f) Delete this index: olcDbIndex: pwdLastSuccess eq,pres
- This index is used in some filters, and we have also been trying another 
architecture with 1 provider and multiple consumers plus a referalForward. 
But yes, this is a good idea; we will run our tests without this index. With 
300+ BIND/s this index is constantly recalculated. Thanks.

g) olcSpReloadHint: TRUE -> This setting should *not* be on the main DB,
- Thanks, yes, we have it on both DBs; we will delete it.

h) For your benchmark test, this is probably not frequent enough, as the purge 
will never run since you're saying only data
- We've run endurance tests to include purging. This setting is from a month 
ago, and we have changed it multiple times to test different setups. To 
include the purge during tests, we currently set it to 00+01:00 00+00:03. 
In the final configuration we will probably set it to: 03+00:00 00+00:03. We 
found that purging every 3 minutes reduced the impact on performance.
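
For reference, the purge values discussed above can be decoded with a small 
sketch. It assumes each field follows the [ddd+]hh:mm[:ss] layout described 
for logpurge in slapo-accesslog(5), with the first field being the maximum 
age of log entries and the second the interval between purge runs. The 
function names are hypothetical and not part of OpenLDAP.

```python
import re

# Assumed layout of one olcAccessLogPurge field: [ddd+]hh:mm[:ss]
_FIELD = re.compile(r"^(?:(\d+)\+)?(\d+):(\d+)(?::(\d+))?$")


def purge_field_to_seconds(field: str) -> int:
    """Convert one purge field such as '03+00:00' to seconds."""
    m = _FIELD.match(field)
    if m is None:
        raise ValueError(f"unrecognized purge field: {field!r}")
    # Missing optional parts (days, seconds) count as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_purge(value: str) -> tuple[int, int]:
    """Split an 'age interval' value and return both parts in seconds."""
    age, interval = value.split()
    return purge_field_to_seconds(age), purge_field_to_seconds(interval)


# The three settings mentioned in this thread.
for setting in ("01+00:00 00+04:00", "00+01:00 00+00:03", "03+00:00 00+00:03"):
    age, interval = parse_purge(setting)
    print(f"{setting}: keep {age / 3600:g} h, purge every {interval / 60:g} min")
```

Read this way, the originally shared value keeps one day of entries and purges 
every four hours, while the two later values purge every three minutes.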

i) + last question : For the accesslog DB, are you sure this is a large enough 
size? Also it appears you're running this test on two slapds running on the 
same server? 

- This comes from Meheni's configuration: we cleaned up our configuration 
files for privacy reasons before sharing them here. (He tried to reproduce 
the issue on his virtual machine and succeeded.)
We are running our tests on 4 servers with only one slapd per server.
And currently the accesslog size on each server is: 64424509440
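
To put the two sizes side by side, here is a minimal sketch that converts 
them from bytes to GiB. It assumes both numbers are olcDbMaxSize values in 
bytes; the helper name and the labels are hypothetical.

```python
GIB = 2 ** 30  # bytes per gibibyte


def to_gib(size_bytes: int) -> float:
    """Convert a size in bytes to GiB."""
    return size_bytes / GIB


# Value from the shared config versus the value reported for the test servers.
for label, size in (("shared config", 2147483648), ("test servers", 64424509440)):
    print(f"{label}: {size} bytes = {to_gib(size):g} GiB")
```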

Further information: we tested the consumer/provider mode before MPR, but it 
didn't meet our needs. We get better performance with the current 
configuration. (All that remains is to find a solution to the replication 
problem.)

I am also reposting a previous question here: what exact messages or error 
messages should we expect to find in the event of a collision problem?

Thanks again for your time and your help.
