Re: Terminology question: Core vs. Collection vs...

Walter Underwood Thu, 03 Jan 2013 08:58:44 -0800

A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1 
means you have exactly one copy of that shard.
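
As a rough illustration of that arithmetic (a sketch only: total_cores and extra_copies_per_shard are hypothetical helpers, not part of any Solr API, and it assumes every shard gets the same number of copies):

```python
def total_cores(num_shards: int, replication_factor: int) -> int:
    """Total physical cores in a collection.

    Assumes replicationFactor counts ALL copies of a shard, leader included,
    which is the meaning discussed in this thread.
    """
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must both be >= 1")
    return num_shards * replication_factor


def extra_copies_per_shard(replication_factor: int) -> int:
    """Copies beyond the first one, i.e. the redundancy you actually get."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return replication_factor - 1
```

Under these assumptions, total_cores(1, 1) is one core with no redundancy, while total_cores(2, 3) is six cores, each shard having two copies beyond the first.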


I think that recycling the term "replication" within Solr was confusing, but it 
is a bit late to change that. 

wunder

On Jan 3, 2013, at 7:33 AM, Mark Miller wrote:

> This has pretty much become the standard across other distributed systems and 
> in the literat…err…books.
> 
> I first implemented it as you mention you'd like, but Yonik correctly pointed 
> out that we were going against the grain.
> 
> - Mark
> 
> On Jan 3, 2013, at 10:01 AM, Per Steffensen <[email protected]> wrote:
> 
>> For the same reasons that "Replica" shouldn't be called "Replica" (it 
>> requires too long an explanation to agree that it is an OK name), 
>> "replicationFactor" shouldn't be called "replicationFactor" as long as it 
>> refers to the TOTAL number of cores you get for your "Shard". 
>> "replicationFactor" would be an ok name if replicationFactor=0 meant one 
>> core, replicationFactor=1 meant two cores etc., but as long as 
>> replicationFactor=1 means one core, replicationFactor=2 means two cores, it 
>> is bad naming (you will not get any replication with replicationFactor=1 - 
>> WTF!?!?). If we want to insist that you specify the total number of cores at 
>> least use "replicaPerShard" instead of "replicationFactor", or even better 
>> rename "Replica" to "Shard-instance" and use "instancesPerShard" instead of 
>> "replicationFactor".
>> 
>> Regards, Per Steffensen
>> 
>> On 1/3/13 3:52 PM, Per Steffensen wrote:
>>> Hi
>>> 
>>> Here is my version - I do not believe the explanations so far have been 
>>> very clear
>>> 
>>> We have the following concepts (here I will try to explain what each 
>>> concept covers without naming it - it's hard)
>>> 1) Machines (virtual or physical) running Solr server JVMs (one machine can 
>>> run several Solr server JVMs if you like)
>>> 2) Solr server JVMs
>>> 3) Logical "stores" where you can add/update/delete data-instances (closest 
>>> to "logical" tables in RDBMS)
>>> 4) Logical "slices" of a store (closest to non-overlapping "logical" sets 
>>> of rows for the "logical" table in a RDBMS)
>>> 5) Physical instances of "slices" (a physical (disk/memory) instance of a 
>>> "logical" slice). This is where data actually goes on disk - the logical 
>>> "stores" and "slices" above are just non-physical concepts
>>> 
>>> Terminology
>>> 1) Believe we have no name for this (except of course machine :-) ), even 
>>> though Jack claims that this is called a "node". Maybe sometimes it is 
>>> called a "node", but I believe "node" is more often used to refer to a 
>>> "Solr server JVM".
>>> 2) "Node"
>>> 3) "Collection"
>>> 4) "Shard". Used to be called "Slice" but I believe now it is officially 
>>> called "Shard". I agree with that change, because I believe most of the 
>>> industry also uses the term "Shard" for this logical/non-physical concept 
>>> - it just needs to be reflected across documentation and code
>>> 5) "Replica". Used to be called "Shard" but I believe now it is officially 
>>> called "Replica". I certainly do not agree with the name "Replica", because 
>>> it suggests that it is a copy of an "original", but it isn't. I would prefer 
>>> "Shard-instance" here, to avoid the confusion. I understand that you can 
>>> argue (if you argue long enough) that "Replica" is a fine name, but you 
>>> really need the explanation to understand why "Replica" can be defended as 
>>> the name for this. It is not immediately obvious what this is as long as it 
>>> is called "Replica". A "Replica" is basically a Solr Cloud managed Core, and 
>>> behind every Replica/Core lives a physical Lucene index. So a Replica (=Core) 
>>> contains/maintains a Lucene index behind the scenes. The term "Replica" also 
>>> needs to be reflected across documentation and code.
>>> 
>>> Regards, Per Steffensen
>> 
> 

--
Walter Underwood
[email protected]
