Jason, thanks for raising this!

Erick, this is something I have wanted to discuss for a long time. Frankly speaking, the question is:
if old-school (master/slave) search deployments don't comply with the SolrCloud/ElasticSearch vision, does that mean they are wrong?

Let me enumerate the traits of 'old-school search':
- the number of docs is not large enough for sharding to pay off in terms of search latency;
- index updates are infrequent, typically nightly bulk loads;
- the search index is not the SOR (system of record); it is a secondary system that provides the search service, though still significant for the enterprise;
- there is an SOR, the primary system, which is some kind of CMS or RDBMS, or a CMS that publishes through an RDBMS, etc.

Does that look like your system? If not, click the Delete button!

// for the few people who are still reading:

Here is what I get with SolrCloud in this case:
- I can decide not to deal with sharding. Good! I set numShards=1 and buy more (VM) instances to get more replicas and increase throughput;
- I start the nightly reindex: delQ *:*, add(....), commit(). In this case all my instances spend resources indexing the same docs instead of handling search requests. BAD#1;
- even if I supply a long Iterable<SolrInputDocument>, DistributedUpdateProcessor forwards documents one by one rather than in large chunks, which leads to many small segments. E.g. with a 100 MB RAM buffer and 10 servlet container threads I get a sequence of 10 MB segments;
- each of these flushes also pushes part of the current (memory-mapped) index out of RAM, which hurts search latency. BAD#2;
- when indexing is over I have many small segments, and then The Merge starts, which also pushes the current index out of RAM. BAD#3.

In summary: I waste resources indexing the same stuff on the searcher nodes, and as a side effect the latency impact lasts longer.
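To put rough numbers on the small-segments point, here is a back-of-envelope sketch in Java. It assumes the RAM buffer is split evenly across the concurrent indexing threads and that a flushed segment is about as large as its share of the buffer, which is a simplification. The class name, the method names, and the 5 GB index size are made up for illustration; none of this is Lucene or Solr API.

```java
// Back-of-envelope model only: hypothetical names, not Lucene or Solr API.
final class SegmentEstimate {

    // Approximate size of each flushed segment, assuming the RAM buffer
    // is shared evenly by the concurrent indexing threads.
    static double approxSegmentSizeMb(double ramBufferMb, int indexingThreads) {
        if (ramBufferMb <= 0 || indexingThreads <= 0) {
            throw new IllegalArgumentException("buffer size and thread count must be positive");
        }
        return ramBufferMb / indexingThreads;
    }

    // Approximate number of segments on disk after a full reindex,
    // before any merge has run.
    static long approxSegmentCount(double indexSizeMb, double ramBufferMb, int indexingThreads) {
        double segmentMb = approxSegmentSizeMb(ramBufferMb, indexingThreads);
        return (long) Math.ceil(indexSizeMb / segmentMb);
    }

    public static void main(String[] args) {
        double ramBufferMb = 100;   // RAM buffer from the example above
        int threads = 10;           // servlet container threads from the example above
        double indexSizeMb = 5_000; // assumed total index size, purely illustrative

        System.out.printf("segment size: about %.0f MB%n",
                approxSegmentSizeMb(ramBufferMb, threads));
        System.out.printf("segments before merging: about %d%n",
                approxSegmentCount(indexSizeMb, ramBufferMb, threads));
    }
}
```

Under these assumptions every searcher node ends up with hundreds of small segments to merge, which is the work I would rather do once on a dedicated indexing box.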
How I want to do it:
- in the cloud, I add small instances as replicas on demand to adjust to the workload dynamically;
- when I need to reindex (full import), I can rent a super cool VM instance with a many-way CPU and run indexing on it;
- if it blows up, no problem: I can run the full import from my CMS/DB again from the beginning, or I can run two imports simultaneously;
- after indexing has finished, I can push the index to the searchers, or start new searchers with the index mounted on them.

Please tell me where I'm wrong, whether it is in SolrCloud features, 'cloud' economics, hardware/VMware architecture, or Lucene internals. Can Jason and I adjust SolrCloud to our 'old-school' pattern?

Thanks for sharing your opinion!

On Thu, Dec 6, 2012 at 7:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> First, forget about master/slave with SolrCloud! Leaders really exist to
> resolve conflicts, the old notion of M/S replication is largely irrelevant.
>
> Updates can go to any node in the cluster, leader, replica, whatever. The
> node forwards the doc to the correct leader based on a hash of the
> <uniqueKey>, which then forwards the raw document to all replicas. Then all
> the replicas index the document separately. Note that this is true on
> multi-document packets too. You can't get NRT with the old-style
> replication process where the master indexes the doc and then the _index_
> is replicated...
>
> As for your second question, it sounds like you want to go from
> numShards=2, say to numShards=3. You can't do that as it stands. There are
> two approaches:
> 1> "shard splitting" which would redistribute the documents to a new set of
> shards
> 2> pluggable hashing which allows you to specify the code that does the
> shard assignment.
> Neither of these is available yet, although <2> is imminent. There is
> active work on <1>, but I don't think that will be ready as soon.
>
> Best
> Erick
>
>
> On Tue, Dec 4, 2012 at 11:21 PM, Jason <hialo...@gmail.com> wrote:
>
> > I'm using master and slave servers for scaling.
> > The master is dedicated to indexing and the slave to searching.
> > Now, I'm planning to move to SolrCloud.
> > It has a leader and replicas.
> > The leader acts like the master and the replicas act like slaves. Is that right?
> > So, I'm wondering about two things.
> >
> > First,
> > How can I assign a dedicated server for indexing in SolrCloud?
> >
> > Second,
> > Consider that I'm using a two-shard cluster with shard replicas
> > <http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas>
> > and I need to add one more shard with replicas.
> > In this case, the existing two shards and replicas will already have many docs.
> > So, I want to index new docs into the new shard only.
> > How can I do this?
> >
> > Actually, I don't understand SolrCloud perfectly.
> > So, my questions may be ridiculous.
> > Any input is welcome.
> > Thanks,
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/how-to-assign-dedicated-server-for-indexing-and-add-more-shard-in-SolrCloud-tp4024404.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>