Agree. Thanks also for clarifying. It helps.
On 09/29/2011 08:50 AM, Yury Kats wrote:
On 9/29/2011 7:22 AM, Darren Govoni wrote:
That was kinda my point. The "new" cloud implementation
is not about replication, nor should it be, but rather about
horizontal scalability where "nodes" manage different parts
of a unified index.
It's about many things. You stated one, but there are other goals,
one of them being tolerance to node outages. In a cloud, when
one of your many nodes fails, you don't want to stop querying and
indexing. For that to happen, you need to maintain redundant copies
of the same pieces of the index, hence you need to replicate.
One of the design goals of the "new" cloud
implementation is for this to happen more or less automatically.
True, but there is a big gap between the goals and the current state.
Right now there is distributed search, but no distributed indexing,
auto-sharding, or auto-replication. So if you want to use SolrCloud
now (as many of us do), you need to do a number of things yourself,
even if they might be done by SolrCloud automatically in the future.
To me that means one does not have to manually distribute
documents or enforce replication as Yury suggests.
To me, replication is different from what was being asked.
And perhaps I misunderstood the original question.
Yury's response introduced the term "core" where the original
person was referring to "nodes". For all I know, those are two
different things in the new cloud design terminology (I believe they are).
I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)
A shard is a slice of the index. An index is managed/stored in a core.
Nodes are Solr instances, usually separate physical machines.
Each node can host multiple shards, and each shard can consist of
multiple cores.
However, all cores within the same shard must have the same content.
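To make the terminology concrete, here is a minimal SolrJ sketch of a manual distributed search across two shards, each shard living in one core hosted on its own node. The host and core names (node1, node2, shard1, shard2) are hypothetical, and the 3.x-era CommonsHttpSolrServer client is assumed:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchSketch {
    public static void main(String[] args) throws Exception {
        // The query is sent to one core; the shards parameter fans it out
        // to one core per shard, each hosted on a different node.
        // Host and core names are hypothetical.
        SolrServer server = new CommonsHttpSolrServer("http://node1:8983/solr/shard1");

        SolrQuery query = new SolrQuery("*:*");
        // One comma-separated entry per shard of the index.
        query.set("shards", "node1:8983/solr/shard1,node2:8983/solr/shard2");

        QueryResponse response = server.query(query);
        System.out.println("Total hits across both shards: "
                + response.getResults().getNumFound());
    }
}

Whether the cores sit on one node or two is invisible to the query; what matters is that each entry in the shards list covers a distinct slice of the index.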
This constraint is where the OP ran into the problem. The OP had one shard,
consisting of two cores on two nodes. Since there is no distributed indexing yet,
all documents were indexed into a single core. However, there is distributed
search, therefore queries were sent randomly to different cores of the same shard.
Since one core in the shard had documents and the other didn't, the query result
was random.
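One way to confirm that diagnosis is to query each core directly, bypassing distributed search, and compare document counts. A rough SolrJ sketch, again with hypothetical host and core names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CompareCoreCounts {
    public static void main(String[] args) throws Exception {
        // Point at each core of the shard directly, one per node (hypothetical URLs).
        SolrServer coreA = new CommonsHttpSolrServer("http://node1:8983/solr/core1");
        SolrServer coreB = new CommonsHttpSolrServer("http://node2:8983/solr/core1");

        SolrQuery matchAll = new SolrQuery("*:*");

        long countA = coreA.query(matchAll).getResults().getNumFound();
        long countB = coreB.query(matchAll).getResults().getNumFound();

        // If the counts differ, the cores are out of sync and distributed
        // queries will return different results depending on which core is hit.
        System.out.println("core on node1: " + countA + ", core on node2: " + countB);
    }
}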
To solve this problem, the OP must make sure all cores within the same shard
(be they on the same node or not) have the same content. This can currently be
achieved by:
a) setting up replication between cores: you index into one core and the other
core replicates the content
b) indexing into both cores (see the sketch below)
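For option (b), a minimal SolrJ sketch just adds and commits each document to both cores. The node/core names are hypothetical, and the 3.x-era CommonsHttpSolrServer client is assumed:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexIntoBothCores {
    public static void main(String[] args) throws Exception {
        // Both cores belong to the same shard, so they must receive the same
        // documents (host and core names are hypothetical).
        SolrServer coreA = new CommonsHttpSolrServer("http://node1:8983/solr/core1");
        SolrServer coreB = new CommonsHttpSolrServer("http://node2:8983/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example document");

        // Send the identical document to every core in the shard, then commit.
        coreA.add(doc);
        coreB.add(doc);
        coreA.commit();
        coreB.commit();
    }
}

Option (a) avoids the double writes by letting Solr's built-in ReplicationHandler copy the index from the core you write to into the other core; either way the two cores end up with identical content.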
Hope this clarifies.