On Mon, Aug 10, 2015 at 05:19:22PM -0700, Brian Wright wrote:
> We're trying to solve the problem of how to recover/replace a failed
> node in a system containing a very large number of records and bring
> it back into the cluster as quickly as possible. We're also trying
> to resolve how to ensure that replication works consistently on
> restart.
In terms of recovering a failed node, the very fastest method is to use a
database backup made with mdb_copy. The output from that command is a
file that can be used directly as an MDB database, so all you have to do
is put it in place and restart slapd. Even if the backup is a day or two
old, the replication process should bring in the more recent changes from
another server.

If your servers have identical software, you can even take a backup from
one server and install it on another one. That gives you a quick way of
copying in very fresh data.

Note that mdb_copy is not installed by default. For safety, you must use
a binary built from the same OpenLDAP distribution as your slapd. You will
find the source for the MDB tools in openldap-2.4.*/libraries/liblmdb.

There are some caveats with mdb_copy. In particular, it can cause database
bloat if run on a server that has a heavy write load at the time.

Andrew
-- 
-----------------------------------------------------------------------
|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |
-----------------------------------------------------------------------
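For illustration only, the mdb_copy backup-and-restore cycle described in the reply might be scripted roughly as below. This is a minimal Python sketch, not part of OpenLDAP. The function names, the example paths, and the assumption that the operator stops slapd before restoring and restarts it afterwards are all hypothetical. The sketch relies only on the mdb_copy command-line form `mdb_copy [-c] srcpath dstpath`, where srcpath is the LMDB database directory and dstpath is an existing directory that receives a data.mdb file.

```python
import shutil
import subprocess
from pathlib import Path


def backup_cmd(db_dir: str, backup_dir: str, compact: bool = False) -> list[str]:
    """Build an mdb_copy command line (hypothetical helper).

    db_dir is the slapd MDB database directory and backup_dir the
    destination directory. Use an mdb_copy built from the same OpenLDAP
    source tree as slapd. -c asks mdb_copy for a compacted copy that
    omits free pages.
    """
    cmd = ["mdb_copy"]
    if compact:
        cmd.append("-c")
    cmd += [db_dir, backup_dir]
    return cmd


def run_backup(db_dir: str, backup_dir: str, compact: bool = False) -> None:
    """Snapshot db_dir into backup_dir (hypothetical helper)."""
    # mdb_copy writes data.mdb into the destination, which must already exist.
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(backup_cmd(db_dir, backup_dir, compact), check=True)


def restore(backup_dir: str, db_dir: str) -> Path:
    """Put a backed-up data.mdb in place on the failed node.

    Assumes slapd is stopped on this node; start it again afterwards and
    let replication fetch the changes made since the backup.
    """
    src = Path(backup_dir) / "data.mdb"
    dst_dir = Path(db_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / "data.mdb"
    shutil.copy2(src, dst)  # copies contents and timestamps
    return dst
```

The restored data.mdb must be readable and writable by the account slapd runs as. Copying the file from another server with identical software works the same way, just with a backup_dir fetched from that server.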
