Erick, thanks. I now do see segment files in an index.<timestamp> directory at the replicas. Not sure why they were not getting populated earlier.
I have a couple more questions; the second is more elaborate, so let me know if I should move it to a separate thread.

(1) Adding documents in SolrCloud is excruciatingly slow. It takes about 30-50 seconds to add a batch of 100 documents to the primary (and about twice that to add 200, etc.), but just ~10 seconds to add 5K documents in batches of 200 on a standalone Solr 4 server. The log files indicate that the primary is timing out with messages like the one below, and Cloud->Graph in the UI shows the other two replicas in orange after starting green.

org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:7574/solr

Any idea why?

(2) I am seriously considering using symbolic links for a replicated Solr setup with completely independent instances on a *single machine*. Tell me if I am thinking about this incorrectly. Here is my reasoning:

(a) Master/slave replication in 3.6 simply seems old school, as it doesn't have the nice consistency properties of SolrCloud. Polling, say, every 20 seconds means I don't know exactly how up to date each replica is, which will complicate my request redistribution.

(b) SolrCloud seems like a great alternative to master/slave replication. But it seems slow (see 1) and, having played with it, I don't feel comfortable with the maturity of the ZK integration (or my comprehension of it) in Solr 4 alpha.

(c) Symbolic links seem like the fastest and most space-efficient solution *provided* there is only a single writer, which is just fine for me.

I plan to run completely separate Solr instances with one designated as the primary and do the following operations in sequence:

Add a batch to the primary and commit
--> From each replica's index directory, remove all symlinks and re-create symlinks to the segment files in the primary (but not the write.lock file)
--> Call update?commit=true to force replicas to reload their in-memory index
--> Do whatever read-only processing is required on the batch using the primary and all replicas, by manually (randomly) distributing read requests
--> Repeat sequence.

Is there any downside to 2(c) (other than maintaining a trivial script to manage symlinks and call commit)? I tested it on small index sizes and it seems to work fine. The throughput improves with more replicas (for 2-4 replicas), as a single replica is not enough to saturate the machine (due to high query latency).

Am I overlooking something in this setup? Overall, I need high throughput and minimal latency from the time a document is added to the time it is available at a replica. SolrCloud's automated request redirection, consistency, and fault tolerance are awesome for a physically distributed setup, but I don't see how it beats 2(c) in a single-writer, single-machine, replicated setup.

AV

On Jul 9, 2012, at 9:43 AM, Erick Erickson [via Lucene] wrote:

> No, you're misunderstanding the setup. Each replica has a complete
> index. Updates get automatically forwarded to _both_ nodes for a
> particular shard. So, when a doc comes in to be indexed, it gets
> sent to the leader for, say, shard1. From there:
> 1> it gets indexed on the leader
> 2> it gets forwarded to the replica(s) where it gets indexed locally.
>
> Each replica has a complete index (for that shard).
>
> There is no master/slave setup any more. And you do
> _not_ have to configure replication.
>
> Best
> Erick
>
> On Sun, Jul 8, 2012 at 1:03 PM, avenka <[hidden email]> wrote:
>
> > I am trying to wrap my head around replication in SolrCloud.
> > I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication
> > for high query throughput. The setup at the URL above appears to maintain
> > just one copy of the index at the primary node (instead of a replicated
> > index as in a master/slave configuration). Will I still get roughly an
> > n-fold increase in query throughput with n replicas? And if so, why would
> > one do master/slave replication with multiple copies of the index at all?
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html
> > Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993960.html
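
For illustration only, the symlink refresh and commit steps described in the message above could look roughly like the Python sketch below. It is not the poster's actual script. The function names, the directory arguments, and the replica base URL are hypothetical. The sketch assumes that each replica index directory holds nothing but symlinks plus its own write.lock, and that update?commit=true can be requested with a plain HTTP GET against the replica.

```python
import os
import urllib.request
from pathlib import Path
from typing import List

LOCK_FILE = "write.lock"  # never linked: only the primary writes to its index


def refresh_symlinks(primary_index: str, replica_index: str) -> List[str]:
    """Re-point a replica index directory at the primary's current files.

    Assumption: the replica directory contains only symlinks (plus its own
    write.lock), so a name clash with a regular file is not handled here.
    """
    primary = Path(primary_index).resolve()
    replica = Path(replica_index)
    replica.mkdir(parents=True, exist_ok=True)

    # Remove all existing symlinks in the replica's index directory.
    for entry in replica.iterdir():
        if entry.is_symlink():
            entry.unlink()

    # Re-create symlinks to every regular file in the primary except the lock.
    linked = []
    for entry in sorted(primary.iterdir()):
        if entry.is_file() and entry.name != LOCK_FILE:
            os.symlink(entry, replica / entry.name)
            linked.append(entry.name)
    return linked


def commit(replica_base_url: str, timeout: float = 30.0) -> int:
    """Request update?commit=true on a replica; returns the HTTP status code.

    replica_base_url is an assumed value such as "http://localhost:7574/solr".
    """
    url = replica_base_url.rstrip("/") + "/update?commit=true"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

One thing the sketch makes visible: between the unlink loop and the re-link loop the replica directory is briefly empty of segment files, so it assumes no replica reopens its index until the following commit call.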