Re: how to assign dedicated server for indexing and add more shard in SolrCloud

Mikhail Khludnev Thu, 06 Dec 2012 12:36:50 -0800

Jason,
Thanks for raising it!

Erick,
That's what I have wanted to discuss for a long time. Frankly speaking, the
question is:


If old-school (master/slave) search deployments don't comply with the vision
of SolrCloud/ElasticSearch, does that mean they are wrong?

Let me enumerate the traits of 'old-school search':
- the number of docs is not large enough to make sharding pay off in terms
of search latency;
- index updates are infrequent, typically nightly bulk loads;
- the search index is not the SOR (system of record); it's a secondary
system that provides the search service, though still significant for the
enterprise;
- there is an SOR, the primary system, which is some kind of CMS or RDBMS,
or a CMS that publishes through an RDBMS, etc.

Does this look like your system? If not, click the Delete button!

// for the few people who are still reading:

Here is what I get with SolrCloud in this case:
- I can decide not to deal with sharding. Good! I set numShards=1 and buy
more (VM) instances to have more replicas and increase throughput;
- I start the nightly reindex: delQ *:*, add(....), commit();
- in this case all my instances spend resources indexing the same docs
instead of handling search requests: BAD#1;
- even if I'm able to supply a long Iterable<SolrInputDocument>,
DistributedUpdateProcessor will forward documents one by one, not in large
chunks, which leads to many small segments; e.g. if I have a 100MB RAM
buffer and 10 servlet container threads, I'll get a sequence of 10MB
segments;
- each of these flushes also evicts part of the current index that is
mapped into RAM, which hurts search latency: BAD#2;
- when indexing is over I have many small segments, and then The Merge
starts, which also evicts the current index from RAM: BAD#3.

In summary: I waste resources indexing the same stuff on searcher nodes, and
as a side effect I get a longer period of degraded search latency.
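
A back-of-envelope sketch of the numbers above, in Python. It takes the
figures from this message at face value and assumes the RAM buffer is shared
evenly between the concurrently indexing threads, which simplifies what
Lucene really does. The 5000 MB index size and the 8 replicas in the usage
lines are made-up inputs, and every function name is hypothetical.

```python
import math


def flushed_segment_mb(ram_buffer_mb: float, indexing_threads: int) -> float:
    """Approximate size of one flushed segment.

    Assumption: the RAM buffer is shared evenly by the threads that index
    concurrently, so each flush writes roughly buffer / threads.
    """
    return ram_buffer_mb / indexing_threads


def segments_before_merge(index_mb: float, ram_buffer_mb: float,
                          indexing_threads: int) -> int:
    """Number of small segments left behind if no merge runs during indexing."""
    per_segment = flushed_segment_mb(ram_buffer_mb, indexing_threads)
    return math.ceil(index_mb / per_segment)


def indexing_passes(replicas: int, dedicated_indexer: bool) -> int:
    """How many times the same documents get indexed across the deployment."""
    return 1 if dedicated_indexer else replicas


if __name__ == "__main__":
    # Figures from the message: 100 MB buffer, 10 container threads.
    print(flushed_segment_mb(100, 10))
    # Hypothetical inputs: a 5000 MB index and 8 searcher replicas.
    print(segments_before_merge(5000, 100, 10))
    print(indexing_passes(8, dedicated_indexer=False))
    print(indexing_passes(8, dedicated_indexer=True))
```

The point of the last function is only that the indexing work scales with
the number of replicas when every replica indexes the raw documents, and
stays constant when one node builds the index and the others receive it.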

How I want to do it:
 - in the cloud I add small instances as replicas on demand to adjust to
the workload dynamically;
 - when I need to reindex (full import), I can rent a super cool VM instance
with many CPUs and run indexing on it;
 - if it blows up, no problem: I can run the full import from my CMS/DB
again from the beginning, or I can run two imports simultaneously;
 - after indexing has finished, I can push the index to the searchers or
start new ones and mount the index on them.
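
The last step ("build elsewhere, then publish") can be sketched at the
filesystem level in Python. This is not a Solr or SolrCloud API, only an
illustration of the pattern, assuming a POSIX filesystem where renaming a
symlink over another one is atomic. The directory names, the marker file and
the publish_index helper are all hypothetical, and a real searcher would
still have to reopen its index after the swap.

```python
import os
import tempfile


def publish_index(new_index_dir: str, live_link: str) -> None:
    """Atomically repoint the symlink live_link at new_index_dir.

    Assumption: searchers read the index through live_link, so they see
    either the old generation or the new one, never a half-copied index.
    """
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up after an earlier failed attempt
    os.symlink(os.path.abspath(new_index_dir), tmp_link)
    os.replace(tmp_link, live_link)  # rename is atomic on POSIX


def demo() -> str:
    """Build two index generations off to the side and publish each one."""
    root = tempfile.mkdtemp()
    live = os.path.join(root, "live")
    for generation in ("index-1", "index-2"):
        build_dir = os.path.join(root, generation)
        os.mkdir(build_dir)
        # Stand-in for the real index files written by the indexing VM.
        with open(os.path.join(build_dir, "marker.txt"), "w") as marker:
            marker.write(generation)
        publish_index(build_dir, live)
    with open(os.path.join(live, "marker.txt")) as marker:
        return marker.read()


if __name__ == "__main__":
    print(demo())
```

If the indexing VM dies halfway through, the live link still points at the
previous generation, which matches the "no problem, run the full import
again" property described above.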

Please tell me where I'm wrong, whether it's in SolrCloud features, 'cloud'
economics, hardware/VMware architecture or Lucene internals. Can Jason and I
adjust SolrCloud to our 'old-school' pattern?

Thanks for sharing your opinion!



On Thu, Dec 6, 2012 at 7:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> First, forget about master/slave with SolrCloud! Leaders really exist to
> resolve conflicts, the old notion of M/S replication is largely irrelevant.
>
> Updates can go to any node in the cluster, leader, replica, whatever. The
> node forwards the doc to the correct leader based on a hash of the
> <uniqueKey>, which then forwards the raw document to all replicas. Then all
> the replicas index the document separately. Note that this is true on
> multi-document packets too. You can't get NRT with the old-style
> replication process where the master indexes the doc and then the _index_
> is replicated...
>
> As for your second question, it sounds like you want to go from
> numShards=2, say to numShards=3. You can't do that as it stands. There are
> two approaches:
> 1> "shard splitting" which would redistribute the documents to a new set of
> shards
> 2> pluggable hashing which allows you to specify the code that does the
> shard assignment.
> Neither of these are available yet, although <2> is imminent. There is
> active work on <1>, but I don't think that will be ready as soon.
>
> Best
> Erick
>
>
> On Tue, Dec 4, 2012 at 11:21 PM, Jason <hialo...@gmail.com> wrote:
>
> > I'm using master and slave servers for scaling.
> > The master is dedicated to indexing and the slave is for searching.
> > Now I'm planning to move to SolrCloud.
> > It has a leader and replicas.
> > The leader acts like the master and the replicas act like slaves. Is that right?
> > So, I'm wondering about two things.
> >
> > First,
> > How can I assign a dedicated server for indexing in SolrCloud?
> >
> > Second,
> > Consider that I'm using a two-shard cluster with shard replicas
> > <http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas>
> > and I need to extend it with one more shard with replicas.
> > In this case, the existing two shards and replicas will already have many
> > docs, so I want to index new docs into the new shard only.
> > How can I do this?
> >
> > Actually, I don't fully understand SolrCloud,
> > so my questions may be ridiculous.
> > Any inputs are welcome.
> > Thanks,
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/how-to-assign-dedicated-server-for-indexing-and-add-more-shard-in-SolrCloud-tp4024404.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
