On Wed, Nov 11, 2009 at 3:01 PM, Howard Chu <[email protected]> wrote:
> Edward Capriolo wrote:
>> On Thu, Nov 5, 2009 at 5:25 AM, Torsten Schlabach (Tascel eG)
>> <[email protected]> wrote:
>>> Hi Quanah!
>>>
>>>> I suggest you go read the CHANGES log for what has been fixed between
>>>> 2.4.11 and the latest stable 2.4.19.
>>>
>>> I need to say, it worries me a bit that for problems with a core feature
>>> which has been around for quite some time, the answer is more often that
>>> I like to hear: You need to use the latest version released last week /
>>> month or so.
>>>
>>> I have indeed read the CHANGES and seen that some issues have been
>>> fixed. I have no idea if we are affected by those issues or now.
>>>
>>> Also how would I know that *now* in 2.4.19 all problems are fixed and
>>> the answer next week won't be: You need to use 2.4.20.
>>>
>>> But as this is a FOSS project and not a product we pay for, we
>>> understand that we should not blame people but try and help if we find a
>>> a problem.
>>>
>>> For that reason I have asked in my email for help on *understanding* and
>>> *diagnosing* problems to have a chance to contribute in case we will
>>> find any new issues.
>>>
>>> Also our customers may not like it if in case of a problem we tell them:
>>> Let's wait if in some weeks a new release will come which will fix it or
>>> not. So I'd rather be in a position to get my hands dirty myself in case
>>> of problems.
>>>
>>> Regards,
>>> Torsten
>>>
>>>
>>> Quanah Gibson-Mount schrieb:
>>>> --On Wednesday, November 04, 2009 1:12 PM +0100 "Torsten Schlabach
>>>> (Tascel eG)" <[email protected]> wrote:
>>>>
>>>>> Hi all!
>>>>>
>>>>> I am currently trying to chase some problems in an n-way multi-master
>>>>> setup with three servers. We have used the instructions at
>>>>>
>>>>> http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master
>>>>>
>>>>> as our guidance and we are using OpenLDAP version 2.4.11.
>>>>
>>>> I suggest you go read the CHANGES log for what has been fixed between
>>>> 2.4.11 and the latest stable 2.4.19.
>>>>
>>>> --Quanah
>>>>
>>>> --
>>>>
>>>> Quanah Gibson-Mount
>>>> Principal Software Engineer
>>>> Zimbra, Inc
>>>> --------------------
>>>> Zimbra :: the leader in open source messaging and collaboration
>>>
>>
>>>> Also how would I know that *now* in 2.4.19 all problems are fixed and
>>>> the answer next week won't be: You need to use 2.4.20.
>>
>> Testing reveals the presence of bugs, not the absence :) So no one
>> can every say version x.y.z is certified bug free.
>>
>> However, I do tend to agree, in that my MM just flaked out, and there
>> is not much load/write/update going on so I am a bit worried.
>>
>> I am not trying to put down OpenLDAP but iplanet/fedora directory
>> server/389 support up to a 4 way MM implementation and I have found
>> the replication rock solid even under high load. So if MM is your
>> requirement that may be a more valid option.
>
> The historical evidence disagrees with your assertion. Even at this late date,
> FDS MMR still breaks irrecoverably.
>
> https://www.redhat.com/archives/fedora-directory-users/2009-November/msg00056.html
>
>
> How many years have they been flogging this feature? They still haven't got it
> right. They can't.
>
> MMR is inherently flawed, as we have been saying for years.
>
> http://www.watersprings.org/pub/id/draft-zeilenga-ldup-harmful-02.txt
>
> We have implemented it in OpenLDAP mainly for political reasons, not because
> we changed our minds and now believe it to be technically sound. It is not. We
> developed and recommend MirrorMode because the only safe way to do replication
> is by preserving single-master consistency.
>
>>>> The answer is quite simple: do not use multimaster replication in a
>>>> production environment. In most cases the requirement for multimaster
>>>> replication is just based on poor directory design.
>>
>> Dieter, I do not agree with that. You can't blame a user for using a
>> feature. It is not marked as experimental anymore so people are going
>> to use it. Once it fails you can't call them a "Poor Directory
>> Designer" for using it.
>>
>> http://www.openldap.org/faq/data/cache/1240.html
>
> If they have implemented MMR without reading all of the warnings, they are
> certainly poor designers for not becoming fully informed of the topic before
> deploying it. If they have implemented MMR after reading all of the warnings,
> they made a conscious choice.
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>

I understand that OpenLDAP does not do distributed locking, so I do not
expect it to be ACID compliant. Fedora Directory Server/389 has a "last
update wins" policy, which is a much more optimistic strategy, but it
worked for what I was doing.

Since I joined this mailing list about a month ago, after my problems
started, I have seen at least 4 other threads with similar issues.

http://www.openldap.org/lists/openldap-software/200911/msg00015.html
http://www.openldap.org/lists/openldap-software/200911/msg00021.html
...

Upgrading to 2.4.19 is suggested as a resolution, and I found another
thread with a bigger problem in that version.

As to the link you have posted:

https://www.redhat.com/archives/fedora-directory-users/2009-November/msg00056.html

It is very easy to quickly search a mailing list and find some people
having problems with software. That does not prove FDS has many MM
problems. I personally ran a two-node FDS instance with very active
WRITE/UPDATE for two years and had only a few isolated problems.

>> If they have implemented MMR without reading all of the warnings,
>> they are certainly poor designers for not becoming fully informed of the
>> topic before deploying it.

From my perspective, the reliability of M-M OpenLDAP on 2.4.16 is
brittle. I am not the only one having problems. Your comment seems to
suggest I did not read enough. I would upgrade to 2.4.19, but someone
else on this list is having problems with that version, so that does not
seem like a safe option.

Since I installed OpenLDAP on two lightly trafficked nodes:

1) One node locked up
2) After lockup/restart the nodes did not re-establish two way
   replication connection
3) I have out-of-sync data (which I do not believe was added during the
   downtime caused by 1)

Linking to an RFC and implying that I "Don't read enough" is wrong. If my
light usage is bringing obvious bugs to light, and I am not the only one
having these issues, then not enough testing is being done on the
software development side.

As an administrator I ran 'make test' and watched
test050-syncrepl-multimaster complete. That, coupled with the fact that
multi-master is no longer labeled as an "experimental" feature, led me
to believe it worked reasonably well.

The RFC makes no mention of my #2 problem, 'After lockup/restart the
nodes did not re-establish two way replication connection'. Is that
supposed to be the fault of the user? This is obviously a bug or an edge
case. This is not the fault of a user not reading enough.

That, I think, is where the frustration is. People are willing to accept
the failure cases covered in the RFC, but the RFC is not a blanket
statement of "We told you not to run this" for every bug that appears.
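
For anyone unfamiliar with the term, the "last update wins" policy
mentioned above can be sketched roughly as follows. This is only a
minimal Python illustration of the general idea, not code from FDS/389
or OpenLDAP: the Write record, the replica_id tie-breaker, and the
resolve/merge function names are all assumptions made up for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One attribute value plus the metadata used to order writes (hypothetical)."""
    value: str
    timestamp: float  # when the originating master accepted the write
    replica_id: int   # assumed tie-breaker for identical timestamps


def resolve(a: Write, b: Write) -> Write:
    """Pick the winner under last-update-wins: newest timestamp, then highest replica_id."""
    return max(a, b, key=lambda w: (w.timestamp, w.replica_id))


def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas' views of one entry, attribute by attribute."""
    merged = dict(local)
    for attr, incoming in remote.items():
        current = merged.get(attr)
        # No local value yet: take the incoming one; otherwise newest wins.
        merged[attr] = incoming if current is None else resolve(current, incoming)
    return merged


# Two masters accepted conflicting writes to the same attribute.
node_a = {"mail": Write("old@example.com", 100.0, 1)}
node_b = {"mail": Write("new@example.com", 105.0, 2),
          "cn": Write("Ed", 90.0, 2)}

print(merge(node_a, node_b)["mail"].value)
```

The older write is silently discarded, which is why the strategy is
optimistic: both masters converge on the same value without any locking,
at the cost of losing one of the two conflicting updates.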
