That was kinda my point. The "new" cloud implementation
is not about replication, nor should it be; rather, it is about
horizontal scalability, where "nodes" manage different parts
of a unified index. One of the design goals of the "new" cloud
implementation is for this to happen more or less automatically.

To me, that means one does not have to manually distribute
documents or enforce replication as Yury suggests.
Replication is a different thing from what was being asked.
Then again, perhaps I misunderstood the original question.

Yury's response introduced the term "core" where the original
person was referring to "nodes". For all I know, those are two
different things in the new cloud design terminology (I believe they are).

I guess understanding "cores" vs. "nodes" vs. "shards" is helpful. :)

cheers!
Darren


On 09/29/2011 12:00 AM, Pulkit Singhal wrote:
@Darren: I feel that the question itself is misleading. Creating
shards is meant to separate out the data ... not keep the exact same
copy of it.

I think the two-node setup that Sam attempted misled him and
us into thinking that configuring two nodes which are to be named
"shard1" ... somehow means that they are instantly replicated too ...
this is not the case! I can see how this misunderstanding can develop
as I too was confused until Yury cleared it up.

@Sam: If you are interested in performing a quick exercise to
understand the pieces involved for replication rather than sharding
... perhaps this link would be of help in taking you through it:
http://pulkitsinghal.blogspot.com/2011/09/setup-solr-master-slave-replication.html

- Pulkit

2011/9/27 Yury Kats <yuryk...@yahoo.com>:
On 9/27/2011 5:16 PM, Darren Govoni wrote:
On 09/27/2011 05:05 PM, Yury Kats wrote:
You need to either submit the docs to both nodes, or have a replication
setup between the two. Otherwise they are not in sync.

I hope that's not the case. :/ My understanding (or maybe hope) is that
the new Solr Cloud implementation will support auto-sharding and
distributed indexing. This means that shards will receive different
documents regardless of which node received the submitted document
(spread evenly based on a hash<->node assignment). Distributed queries
will thus merge all the Solr shard/node responses.
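The hash-based routing described above, together with a distributed query that merges every shard's partial results, can be sketched as a toy Python model. This is a hypothetical illustration and not SolrCloud's code. The names `shard_for` and `ShardedIndex` are invented, and so is the choice of an MD5 prefix modulo the shard count, which is used here only for simplicity.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical routing: map a doc id to a shard via a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class ShardedIndex:
    """Toy model: each shard holds a disjoint subset of the documents."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        # One dict per shard: doc id -> text
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, doc_id: str, text: str) -> int:
        """Route the document to exactly one shard, whichever node received it."""
        s = shard_for(doc_id, self.num_shards)
        self.shards[s][doc_id] = text
        return s

    def search(self, term: str) -> list:
        """Fan the query out to every shard and merge the partial hit lists."""
        hits = []
        for shard in self.shards:
            hits.extend(d for d, t in shard.items() if term in t.split())
        return sorted(hits)


idx = ShardedIndex(2)
for i in range(10):
    idx.add(f"doc{i}", "solr cloud" if i % 2 == 0 else "other text")
print(idx.search("solr"))  # the merged result spans both shards
```

Each document exists on only one shard, so sharding alone gives no redundancy. Losing a shard loses its documents, which is why replication is a separate concern.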

All cores in the same shard must somehow have the same index.
Only then can you continue servicing searches when individual cores
fail. Auto-sharding and distributed indexing don't have anything to
do with this.

In the future, SolrCloud may be managing replication between cores
in the same shard automatically. But right now it does not.
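The point about cores within one shard can be illustrated with a similar toy model. This is a hypothetical sketch, not a Solr API: `Shard`, `add_to_one`, `add_to_all`, and `in_sync` are invented names. `add_to_all` stands in for either submitting the docs to both nodes or having replication set up between them.

```python
class Shard:
    """Hypothetical model of one shard served by several cores (replicas)."""

    def __init__(self, num_cores: int):
        # Each core keeps its own copy of the shard's index: doc id -> text
        self.cores = [dict() for _ in range(num_cores)]
        self.alive = [True] * num_cores

    def add_to_one(self, core: int, doc_id: str, text: str) -> None:
        """No replication: only the core that received the doc indexes it."""
        self.cores[core][doc_id] = text

    def add_to_all(self, doc_id: str, text: str) -> None:
        """Submit to every core (or replicate): all cores index the doc."""
        for core in self.cores:
            core[doc_id] = text

    def in_sync(self) -> bool:
        """True when every core holds an identical index."""
        return all(core == self.cores[0] for core in self.cores)

    def search(self, term: str) -> list:
        """Serve the query from the first live core, as a failover would."""
        for core, up in zip(self.cores, self.alive):
            if up:
                return sorted(d for d, t in core.items() if term in t.split())
        raise RuntimeError("no live core in this shard")


unsynced = Shard(2)
unsynced.add_to_one(0, "a", "solr")
unsynced.alive[0] = False  # the only core that has the doc goes down

synced = Shard(2)
synced.add_to_all("a", "solr")
synced.alive[0] = False    # the surviving core still has the doc
```

With `add_to_one`, taking core 0 down makes the document disappear from search results. With `add_to_all`, the surviving core still returns it. Keeping the cores of a shard identical is a separate problem from deciding which shard a document belongs to.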

