You're not missing anything that I know of. The best I've been able to come up with so far is to treat the disparate DCs as separate clusters. Your ingestion process needs to know enough to send updates to both DCs, but that's the only point of contact.
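For what it's worth, here's a rough sketch of what I mean by the ingestion process being the only point of contact, assuming plain SolrJ (CloudSolrServer from the 4.x client) and two completely separate clusters, each with its own ZK ensemble. The ZK host strings and the collection name are just placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DualDcIndexer {
    public static void main(String[] args) throws Exception {
        // Each DC is its own SolrCloud cluster with its own ZooKeeper ensemble.
        // Hostnames below are placeholders, not a recommended layout.
        CloudSolrServer dc1 = new CloudSolrServer("zk1.dc1:2181,zk2.dc1:2181,zk3.dc1:2181");
        CloudSolrServer dc2 = new CloudSolrServer("zk1.dc2:2181,zk2.dc2:2181,zk3.dc2:2181");
        dc1.setDefaultCollection("collection1");
        dc2.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example document");

        // The indexer is the only thing that knows about both clusters:
        // every update goes to each DC independently.
        dc1.add(doc);
        dc2.add(doc);
        dc1.commit();
        dc2.commit();

        dc1.shutdown();
        dc2.shutdown();
    }
}

In practice you'd want to queue and retry updates for whichever DC is slow or unreachable rather than fail the whole batch, but that's the shape of it.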
The problem I see here is not only with inter-DC communications, but with ZK. Let's say ZK1 is in DC1 and ZK2 and ZK3 are in DC2. Now any time the connection between the DCs is lost, DC1 is down since it can't see a ZK quorum (there's a quick sketch of the quorum arithmetic after the quoted message below). I know of one person who put the ZK nodes in three separate DCs to help with that problem.

But the bottom line is that SolrCloud chatters amongst its nodes and you have no good way to control it. Either you accept the latency between DCs or use separate clusters, as far as I know. I do know there are some JIRAs about making SolrCloud "rack aware" which may address this, but I don't think they're in place yet.

Best
Erick

On Tue, Feb 5, 2013 at 10:17 AM, Michael Tracey <mtra...@biblio.com> wrote:
> Hey all, new to Solr 4.x, and am wondering if there is any way that I
> could have a single collection (single or multiple shards) replicated into
> two datacenters, where only 1 solr instance in each datacenter communicate.
> (for example, 4 servers in one DC, 4 servers in another datacenter and
> only one in each DC communicate).
>
> From everything I've seen, all zookeepers and replicas must have access to
> all other members. Is there something I'm missing?
>
> Thanks,
>
> M.
>
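To make the quorum point concrete, here's a minimal sketch of ZooKeeper's majority rule, using the hypothetical 1-and-2 split from above (the placements and numbers are just that example, nothing official):

public class ZkQuorumSketch {
    // ZooKeeper needs a strict majority of the ensemble to be reachable.
    static boolean hasQuorum(int reachable, int ensembleSize) {
        return reachable > ensembleSize / 2;
    }

    public static void main(String[] args) {
        int ensembleSize = 3; // ZK1 in DC1; ZK2 and ZK3 in DC2
        // Inter-DC link drops: DC1 can reach only ZK1, DC2 can reach ZK2 and ZK3.
        System.out.println("DC1 sees 1 of 3 -> quorum? " + hasQuorum(1, ensembleSize)); // false, DC1 is down
        System.out.println("DC2 sees 2 of 3 -> quorum? " + hasQuorum(2, ensembleSize)); // true, DC2 keeps running
        // With one ZK node in each of three DCs, losing any single DC (or link to it)
        // still leaves 2 of 3 visible to the survivors, which is why that layout helps.
    }
}

Of course spreading ZK across three DCs only helps ZK availability; the Solr nodes themselves are still chattering over whatever links you have, so the latency point stands.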