Re: Terminology question: Core vs. Collection vs...

Mark Miller Thu, 03 Jan 2013 07:33:50 -0800

This has pretty much become the standard across other distributed systems and 
in the literat…err…books.


I first implemented it as you mention you'd like, but Yonik correctly pointed 
out that we were going against the grain.

- Mark

On Jan 3, 2013, at 10:01 AM, Per Steffensen <st...@designware.dk> wrote:

> For the same reasons that "Replica" shouldn't be called "Replica" (it requires 
> too long an explanation to agree that it is an OK name), "replicationFactor" 
> shouldn't be called "replicationFactor" as long as it refers to the TOTAL 
> number of cores you get for your "Shard". "replicationFactor" would be an OK 
> name if replicationFactor=0 meant one core, replicationFactor=1 meant two 
> cores, etc., but as long as replicationFactor=1 means one core and 
> replicationFactor=2 means two cores, it is bad naming (you will not get any 
> replication with replicationFactor=1 - WTF!?!?). If we want to insist that 
> you specify the total number of cores, at least use "replicaPerShard" instead 
> of "replicationFactor", or even better rename "Replica" to "Shard-instance" 
> and use "instancesPerShard" instead of "replicationFactor".
> 
> Regards, Per Steffensen
> 
> On 1/3/13 3:52 PM, Per Steffensen wrote:
>> Hi
>> 
>> Here is my version - I do not believe the explanations so far have been 
>> very clear
>> 
>> We have the following concepts (here I will try to explain what each 
>> concept covers without naming it - it's hard)
>> 1) Machines (virtual or physical) running Solr server JVMs (one machine can 
>> run several Solr server JVMs if you like)
>> 2) Solr server JVMs
>> 3) Logical "stores" where you can add/update/delete data-instances (closest 
>> to "logical" tables in RDBMS)
>> 4) Logical "slices" of a store (closest to non-overlapping "logical" sets of 
>> rows for the "logical" table in a RDBMS)
>> 5) Physical instances of "slices" (a physical (disk/memory) instance of a 
>> "logical" slice). This is where data actually goes on disk - the logical 
>> "stores" and "slices" above are just non-physical concepts
>> 
>> Terminology
>> 1) Believe we have no name for this (except of course machine :-) ), even 
>> though Jack claims that this is called a "node". Maybe sometimes it is 
>> called a "node", but I believe "node" is more often used to refer to a "Solr 
>> server JVM".
>> 2) "Node"
>> 3) "Collection"
>> 4) "Shard". Used to be called "Slice" but I believe now it is officially 
>> called "Shard". I agree with that change, because I believe most of the 
>> industry also uses the term "Shard" for this logical/non-physical concept - 
>> it just needs to be reflected across documentation and code
>> 5) "Replica". Used to be called "Shard" but I believe now it is officially 
>> called "Replica". I certainly do not agree with the name "Replica", because 
>> it suggests that it is a copy of an "original", but it isn't. I would prefer 
>> "Shard-instance" here, to avoid the confusion. I understand that you can 
>> argue (if you argue long enough) that "Replica" is a fine name, but you 
>> really need the explanation to understand why "Replica" can be defended as 
>> the name for this. It is not immediately obvious what this is as long as it 
>> is called "Replica". A "Replica" is basically a Solr Cloud managed Core, and 
>> behind every Replica/Core lives a physical Lucene index. So a Replica 
>> (=Core) contains/maintains a Lucene index behind the scenes. The term 
>> "Replica" also needs to be reflected across documentation and code.
>> 
>> Regards, Per Steffensen
>
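
To make the terminology in the quoted messages concrete, here is a minimal
sketch of the Collection / Shard / Replica hierarchy and of the
"replicationFactor" semantics under discussion (the value is the TOTAL number
of cores per shard). This is not Solr code: the class names, build_collection,
extra_copies, the round-robin placement, and the core naming pattern are all
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Physical instance of a shard: a core backed by a Lucene index."""
    name: str
    node: str  # the Solr server JVM ("node") hosting this core


@dataclass
class Shard:
    """Logical, non-overlapping slice of a collection."""
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    """Logical store that documents are added to, updated in, or deleted from."""
    name: str
    shards: list = field(default_factory=list)


def build_collection(name, num_shards, replication_factor, nodes):
    """Lay out a collection where replication_factor is the TOTAL number
    of cores per shard (the semantics debated in this thread)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not nodes:
        raise ValueError("at least one node is required")
    collection = Collection(name)
    slot = 0
    for s in range(1, num_shards + 1):
        shard = Shard(f"shard{s}")
        for r in range(1, replication_factor + 1):
            # Hypothetical round-robin placement across nodes.
            node = nodes[slot % len(nodes)]
            shard.replicas.append(Replica(f"{name}_shard{s}_replica{r}", node))
            slot += 1
        collection.shards.append(shard)
    return collection


def extra_copies(replication_factor):
    """Number of additional copies beyond the first core of a shard."""
    return replication_factor - 1


if __name__ == "__main__":
    single = build_collection("books", 2, 1, ["node1", "node2"])
    for shard in single.shards:
        print(shard.name, [r.name for r in shard.replicas])
    print("extra copies with replicationFactor=1:", extra_copies(1))
```

With replication_factor=1 each shard gets exactly one core and no additional
copy, which is the case Per objects to; under his proposed semantics the same
layout would be written as replicationFactor=0.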
