All,

At my current customer we have developed a custom federator that federates queries between Endeca and Solr to ease the transition from an extremely large Endeca index (TBs of data) to Solr. (Endeca is similar to Solr in terms of search, faceted navigation, etc.)
During this transition we need to support multi-datacenter failover, which we have historically handled via load balancers with the appropriate failover configurations (think F5). We are currently applying our data loads to multiple datacenters to ensure data consistency. (Each datacenter has a stand-alone SolrCloud instance with its own redundancy/failover.)

I am curious to see how the community handles multi-datacenter failover at the presentation layer (datacenter A goes down and we want to fail over to B). SolrCloud handles failures inside a single datacenter, but I haven't seen a definitive "answer" as to how to handle failover across datacenters.

At this point the only two options I can come up with are:

1) Fail the entire datacenter if SolrCloud goes offline (GUI, index, etc. all go offline). This is problematic because some portion of user activity will fail: queries that are in transit will not complete.

2) Implement failover at the custom federator level. To do so we would need to detect a failure at datacenter A within our federator, then query datacenter B to fulfill the user request, then potentially fail the entire datacenter A once all transactions in flight against A have been fulfilled.

Since we look up the active Solr instance via ZooKeeper (SolrCloud) per datacenter, I don't see any reasonable means of failing over to another datacenter if a given SolrCloud instance goes down.

Any thoughts are welcome at this point.

Thanks,
Jaime
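
For illustration, option 2 could look roughly like the Python sketch below. It is only a sketch under stated assumptions, not the actual federator: the names `FailoverFederator`, `DatacenterUnavailable`, `AllDatacentersFailed`, and `failure_threshold` are hypothetical, and each datacenter is represented by a plain callable that stands in for whatever client performs the ZooKeeper lookup and Solr query for that datacenter.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class DatacenterUnavailable(Exception):
    """Hypothetical error a backend raises when its SolrCloud cannot serve a query."""


class AllDatacentersFailed(Exception):
    """Raised when no configured datacenter could fulfill the request."""


class FailoverFederator:
    """Try datacenters in preference order and fall back on failure.

    A datacenter that fails `failure_threshold` times in a row is treated as
    unhealthy and is only tried after all healthy datacenters.
    """

    def __init__(
        self,
        backends: Sequence[Tuple[str, Callable[[str], Any]]],
        failure_threshold: int = 3,
    ) -> None:
        # backends: ordered (name, query_fn) pairs, preferred datacenter first.
        self.backends: List[Tuple[str, Callable[[str], Any]]] = list(backends)
        self.failure_threshold = failure_threshold
        self.failures: Dict[str, int] = {name: 0 for name, _ in self.backends}

    def is_healthy(self, name: str) -> bool:
        return self.failures[name] < self.failure_threshold

    def query(self, q: str) -> Tuple[str, Any]:
        # Stable sort: healthy datacenters keep their configured order and
        # come first; unhealthy ones are kept as a last resort.
        ordered = sorted(self.backends, key=lambda b: not self.is_healthy(b[0]))
        for name, query_fn in ordered:
            try:
                result = query_fn(q)
            except DatacenterUnavailable:
                # Count the failure and move on to the next datacenter.
                self.failures[name] += 1
                continue
            # A success resets the consecutive-failure counter.
            self.failures[name] = 0
            return name, result
        raise AllDatacentersFailed(f"no datacenter could serve query {q!r}")
```

With two backends registered as `[("A", query_a), ("B", query_b)]`, a request that fails against A is retried against B within the same call, so the user request still completes. Once A has failed `failure_threshold` times in a row, new requests go to B first, which approximates "failing" datacenter A at the federator without involving the load balancer.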