Hi Erick,

Thank you for the advice. Given what you've told us, we've modified our plan to include three physical boxes: one master indexer and two slaves. I have a question, however.
Suppose that the master node goes down and we promote the designated slave to take over. The new master node will then have to start running the delta-import queries to grab new transactions.

What happens when we bring the old master node back online? Its index version will be out of sync. If we switched the slave nodes back to it, I believe that the slaves would perform a full index download.

What if we configured the main master node to also be a slave of the slaves? So, in the above scenario, once the original master node comes back online, it will sync its index with the new master node, and we can switch it back seamlessly.

On Wed, Jun 15, 2011 at 2:57 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> More hardware <G>...
>
> Here's one scenario...
>
> If you set up a master and two slaves, and then front the slaves
> with a load balancer your system will be more robust.
>
> In the event a slave goes down, all search requests will be handled
> by the remaining slave while you create a new slave, have it replicate
> once, then let the load balancer know about the new machine...
>
> If your master goes down, you can pretty quickly promote one of your
> slaves to become the new master, then create a new slave as above.
>
> Once the new master is in place, you have your delta query re-index
> all of the data that's changed since last known good commit that was
> replicated to your slave.
>
> Hope this helps.
> Erick
>
> On Wed, Jun 15, 2011 at 3:05 PM, Kyle Lee <randall.kyle....@gmail.com> wrote:
> > Hello,
> >
> > Our development team is currently looking into migrating our search system
> > to Apache Solr, and we would greatly appreciate some advice on setup. We are
> > indexing approximately two hundred million database rows. We add about a
> > hundred thousand new rows throughout the day. These new database rows must
> > be searchable within two minutes of their receipt.
> >
> > We don't want the indexing to bog down the searcher, so our thought is to
> > have two Solr servers running on different machines in a replication setup.
> > The first Solr instance will be the indexer. It will use the
> > DataImportHandler to index the delta and have autocommit enabled to prevent
> > overzealous commit rates. Index optimization will take place during
> > scheduled periods. The second Solr instance (the slave) will be the primary
> > searcher and will have its indexes stored on RAIDed solid state drives.
> >
> > What we are concerned about is failover. Our searches are mission-critical.
> > If the primary searcher goes down for whatever reason, our search service
> > will automatically shunt queries over to the indexer node instead. Indexing
> > is equally critical, though. If the indexer dies, we need to have a warm
> > failover standing by. Is there a recommended way to automate master node
> > failover in Solr replication? I've begun looking into ZooKeeper, but I
> > wasn't sure if this was the best approach.
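
For the failback question at the top of this message, one possible shape for the check is sketched below in Python. It assumes each node exposes Solr's ReplicationHandler at /replication, that its indexversion command reports indexversion and generation, and that fetchindex accepts a masterUrl override. The host names, the helper names (replication_url, needs_resync, failback_requests) and the sample state values are hypothetical, and the sketch only builds the request URLs rather than sending them.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, **params):
    """Build a ReplicationHandler request URL for the given command."""
    # wt=json is an assumption; drop it to get the default XML response.
    query = {"command": command, "wt": "json"}
    query.update(params)
    return core_url.rstrip("/") + "/replication?" + urlencode(query)


def needs_resync(returning, acting):
    """True when the returning node's index differs from the acting master's."""
    keys = ("indexversion", "generation")
    return any(returning[k] != acting[k] for k in keys)


def failback_requests(old_master_url, new_master_url, old_state, new_state):
    """List the URLs to call so the old master catches up before failback."""
    if not needs_resync(old_state, new_state):
        return []
    # Point the old master at the acting master for a one-off pull.
    source = new_master_url.rstrip("/") + "/replication"
    return [replication_url(old_master_url, "fetchindex", masterUrl=source)]


if __name__ == "__main__":
    # Hypothetical values, as if read from command=indexversion on each node.
    old = {"indexversion": 1308160000000, "generation": 41}
    new = {"indexversion": 1308170000000, "generation": 57}
    for url in failback_requests(
        "http://old-master:8983/solr", "http://new-master:8983/solr", old, new
    ):
        print(url)
```

Whether the slaves can then be switched back to the original master without a full index copy is something to verify on a test setup before relying on it.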