Yes, in the context of SolrCloud, "Node" = "Solr server JVM".
So, "node" is an instance of Solr, which can support multiple cores and
multiple collections - or at least shards of multiple collections.
-- Jack Krupansky
-----Original Message-----
From: Per Steffensen
Sent: Thursday, January 03, 2013 9:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...
Hi
Here is my version - I do not believe the explanations so far have been very clear.
We have the following concepts (here I will try to explain what each
concept covers without naming it - it's hard)
1) Machines (virtual or physical) running Solr server JVMs (one machine
can run several Solr server JVMs if you like)
2) Solr server JVMs
3) Logical "stores" where you can add/update/delete data-instances
(closest to "logical" tables in RDBMS)
4) Logical "slices" of a store (closest to non-overlapping "logical"
sets of rows for the "logical" table in a RDBMS)
5) Physical instances of "slices" (a physical (disk/memory) instance of
a "logical" slice). This is where data actually goes on disk - the
logical "stores" and "slices" above are just non-physical concepts
Terminology
1) Believe we have no name for this (except of course machine :-) ),
even though Jack claims that this is called a "node". Maybe sometimes it
is called a "node", but I believe "node" is more often used to refer to
a "Solr server JVM".
2) "Node"
3) "Collection"
4) "Shard". Used to be called "Slice" but I believe now it is officially
called "Shard". I agree with that change, because I believe most of the
industry also uses the term "Shard" for this logical/non-physical
concept - it just needs to be reflected across documentation and code
5) "Replica". Used to be called "Shard" but I believe now it is
officially called "Replica". I certainly do not agree with the name
"Replica", because it suggests that it is a copy of an "original", but
it isn't. I would prefer "Shard-instance" here, to avoid the confusion. I
understand that you can argue (if you argue long enough) that "Replica"
is a fine name, but you really need the explanation to understand why
"Replica" can be defended as the name for this. It is not immediately
obvious what this is as long as it is called "Replica". A "Replica" is
basically a SolrCloud-managed Core, and behind every Replica/Core lives
a physical Lucene index. So a Replica (= Core) contains/maintains a Lucene
index behind the scenes. The term "Replica" also needs to be reflected
across documentation and code.
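
To make the nesting of the five concepts above concrete, here is a rough
sketch in Python. It is not Solr code and does not use any Solr API; every
class, field, and instance name below is made up purely for illustration,
and the layout (two nodes on one machine, two shards, two replicas each) is
an assumed example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Term 2: a Solr server JVM. Several nodes may share one machine (term 1)."""
    name: str
    machine: str


@dataclass
class Replica:
    """Term 5: a physical shard instance, i.e. a core with its own Lucene index."""
    core_name: str
    node: Node


@dataclass
class Shard:
    """Term 4: a logical slice of a collection; it holds no data by itself."""
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """Term 3: a logical store, split into non-overlapping shards."""
    name: str
    shards: List[Shard] = field(default_factory=list)

    def cores_on(self, node: Node) -> List[str]:
        """Names of the cores this collection keeps on the given node."""
        return [r.core_name
                for s in self.shards
                for r in s.replicas
                if r.node is node]


# Hypothetical layout: two nodes on the same machine, one collection with
# two shards, each shard stored as two replicas (one per node).
node_a = Node("node-a", machine="host1")
node_b = Node("node-b", machine="host1")
products = Collection("products", shards=[
    Shard("shard1", [Replica("products_shard1_r1", node_a),
                     Replica("products_shard1_r2", node_b)]),
    Shard("shard2", [Replica("products_shard2_r1", node_a),
                     Replica("products_shard2_r2", node_b)]),
])

print(products.cores_on(node_a))
# ['products_shard1_r1', 'products_shard2_r1']
```

Note that in this sketch no replica is marked as the "original" of its
shard: both instances of shard1 are peers, which is the reason the name
"Replica" can be misleading.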
Regards, Per Steffensen
On 1/3/13 10:42 AM, Alexandre Rafalovitch wrote:
Hello,
I am trying to understand the core Solr terminology. I am looking for the
correct rather than the loose meaning, as I am trying to teach an example that
starts from an easy scenario and may scale to a multi-core, multi-machine
situation.
Here are the terms that all seem to be overlapping and/or crossing over in
my mind at the moment.
1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?
I tried looking through documentation, but either there is a terminology
drift or I am having trouble understanding the distinctions.
If anybody has a clear picture in their mind, I would appreciate a
clarification.
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)