Re: Managing Collections when you are scaling up and down the number of Nodes, including Scale to 1.

Gus Heck Thu, 12 Mar 2026 06:43:33 -0700

<rant offtopic="slightly">

Inconsistent terminology in search has irritated me for some time.
Terminology ought to be standardized (across search software, including
Elastic, OpenSearch, etc.).

If I were to choose names and define them (ignoring how docs and blogs use
them otherwise) I would go with:

*Node*: The instance of a running process that services search requests
*Server*: The hardware or VM containing the node software (possibly more
than one node on large servers)
*Collection*: The set of indexed, retrievable documents sharing a schema.
This also gets called an index, especially when talking about standalone
instances or Elastic/OpenSearch. Collection is better because it is clearly
distinct from the thing Lucene writes to disk. (If you're lucky it's called
a "logical index", which somewhat clarifies that one is not talking about
the physical bits on disk, but something at a higher level of abstraction.)
*Corpus*: The set of documents irrespective of whether or not they are
(yet) indexed.
*Shard*: The logical slice of a collection, completely detached from any
physical manifestation. (Unfortunately, it often seems to be used to mean
the first/leader replica.) In code this is sometimes also called a Slice.
*Replica*: A physical manifestation of a shard. When there is only one copy
of a shard we *should* be saying it has one replica. (not zero replicas,
because this obfuscates the interchangeability of replicas)
*Index*: The disk representation of the contents of a replica, written by
and interacted with via Lucene.
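
To make the containment relationships concrete, here is a rough sketch of the
definitions above as plain Python data classes. This is only an illustration
of the proposed vocabulary, not Solr's actual object model; every class,
field, and sample value below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Index:
    """Disk representation of one replica's contents, written by Lucene."""
    path: str  # hypothetical directory name, for illustration only


@dataclass
class Node:
    """A running process that services search requests."""
    name: str
    server: str  # the hardware or VM hosting this node


@dataclass
class Replica:
    """A physical manifestation of a shard, living on exactly one node."""
    node: Node
    index: Index


@dataclass
class Shard:
    """A logical slice of a collection; physical only through its replicas."""
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """The set of indexed, retrievable documents sharing a schema."""
    name: str
    shards: List[Shard] = field(default_factory=list)


def replica_count(shard: Shard) -> int:
    """Count every copy of the shard, the leader included."""
    return len(shard.replicas)


# A single-shard collection on a single node: one replica, not zero.
node = Node(name="node1", server="server-a")
shard = Shard("shard1", [Replica(node, Index("/data/books_shard1_replica1"))])
books = Collection("books", [shard])
```

Under these definitions, replica_count(shard) for the sample collection is 1,
which is the "one replica, not zero replicas" point made above.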

The word core sometimes seems to mean index, sometimes means replica (the
things held by "CoreContainer" for example) and occasionally seems to get
used for the entire collection when there is only one shard in a small
standalone instance. Core is the worst of our terminology because in the
English language it implies centrality and unity, but in our
implementation, it actually refers to the leaf node of which we have
many...  If I could, I would delete the word core from all docs and code.
Back in 2010 when I first encountered Solr, the biggest stumbling block I
had in understanding it was disassociating the English meaning of "core"
from the way Solr uses it.

The definition in our glossary for core is self-contradictory, which caused
me no end of confusion when I was new (and it's still there). It seems to
say that "core" is equivalent to "collection" above, but this doesn't match
anything in code or the rest of the documentation AFAICT.

Unfortunately people hate it when you change their words for things, so we
are unlikely to achieve clarity because many folks will likely oppose
change.

</rant>

On Thu, Mar 12, 2026 at 8:28 AM David Eric Pugh via dev <[email protected]>
wrote:

>  David --> I oddly struggled to write the email cause I was stumbling over
> my words...   I wasn't sure if I should have said "shard", and how to
> phrase a collection made up of a single core....   And then copies of that
> core...    After I read your response, I thought, I should check out the
> Solr Concepts page and see what it tells me and discovered that we have
> nothing under Solr Concepts specific to this topic.  I looked up in the
> glossary "SolrCloud" and it led me to
> https://solr.apache.org/guide/solr/latest/deployment-guide/cluster-types.html#solrcloud-mode.
> Only later when I scrolled up in the page did I see the section "Cluster
> Concepts".   Thoughts on maybe moving that section under the Solr Concepts
> hierarchy in Ref Guide?  It would have helped me use the right terms!
>
>
>
>
>     On Wednesday, March 11, 2026 at 06:38:17 PM EDT, David Smiley <
> [email protected]> wrote:
>
>  An aside:  Remember that the leader is a replica too, so your numbers are
> off & confusing.  "single shard with no replicas" -- I guessed what you
> mean but surprised to hear a misuse of SolrCloud lingo from you.  AFAICT,
> leadership isn't even pertinent to your inquiry either.
>
> On Wed, Mar 11, 2026 at 5:38 PM David Eric Pugh via dev <
> [email protected]>
> wrote:
>
> > Hey all, I wanted to get some feedback from y'all on a recent use case I
> > was asked about.  I suspect the answer will be "Use Solr Operator", but
> > here goes!
> > I have an environment where I have 5 or so single shard collections.
> > Much of the time I run just a single node and each collection is a
> > single shard with no replicas.  Sometimes, to support load, I'll add
> > another node or two.  Then I'll add replicas to cover the new nodes, 1
> > per node.  So with three nodes, I have one leader and two replicas.  Add
> > two more nodes, move to one leader and four replicas.
> > However, when I remove a node by shutting it down, then Zookeeper never
> > gets notified about this, and so the replica is listed as down, and the
> > node is listed as down in red in the UI.  When it isn't really red, it's
> > just we don't need it for now, and it's not coming back.
> > I'd like to just declare "For this collection, I want one replica per
> > node based on however many nodes are current".  I don't want to call the
> > various commands myself to add replicas and/or remove them as nodes are
> > added or removed.  And I don't want to call various APIs or other complex
> > things when I add or remove a node, I just want bin/solr stop and
> > bin/solr start to be run ;-).
> > I think this is what Replica Placement Plugins were for maybe?  Could I
> > have a Replica Placement strategy that when ZK sees a new node added,
> > then creates a new replica on it, and vice versa, when a node goes away,
> > it just removes that replica instead of treating it as "down"?
> > Thoughts?
> > Eric
> >
>

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
