Hello everyone,

I am working with a Solr collection that is several terabytes in size,
spanning several hundred million documents.  Each document is very rich,
and over the past few years we have consistently quadrupled the size of
our collection annually.  Unfortunately, this all sits on a single node
with only a few hundred megabytes of memory - so our performance is less
than ideal.

I am looking into implementing a SolrCloud cluster.  From reading a few
books (e.g., Solr in Action), various internet blogs, and the reference
guide, the consistent advice is to build a cluster with room to grow.  I
can probably provision enough hardware for a year's worth of growth from
today, but I would like to have a plan beyond that.  Shard splitting seems
pretty straightforward.  We are continuously adding documents and never
change existing ones.  Based on that, one individual recommended that I
implement custom hashing and route the latest documents to the shard with
the fewest documents; when that shard fills up, add a new shard and index
to the new shard, rinse and repeat.
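For what it's worth, here is a rough sketch of the routing logic as it was
described to me (purely illustrative - the shard names, the capacity
threshold, and the pick_shard helper are my own inventions, not anything
Solr provides):

```python
# Sketch of "send new docs to the emptiest shard" routing.
# Assumes an implicit (manual) router; the per-shard capacity
# and shard naming scheme below are hypothetical.

SHARD_CAPACITY = 100_000_000  # hypothetical per-shard document limit

def pick_shard(shard_counts):
    """Return the shard with the fewest documents, provisioning a
    new (empty) shard first if every existing one is at capacity."""
    if all(count >= SHARD_CAPACITY for count in shard_counts.values()):
        new_name = f"shard{len(shard_counts) + 1}"
        shard_counts[new_name] = 0  # stand-in for adding a real shard
    return min(shard_counts, key=shard_counts.get)

# Example: shard1 is full, shard2 still has room.
counts = {"shard1": 100_000_000, "shard2": 40_000_000}
print(pick_shard(counts))  # shard2 has the fewest docs
```

If I understand the implicit router correctly, the chosen shard name would
then be passed along with each update request (e.g., via the _route_
parameter or the collection's router.field), which is where my concern
about losing automatic distributed indexing comes from.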

The last approach makes sense.  However, my concern with it is that I lose
distributed indexing, and that the implementation and ongoing maintenance
add complexity.  My question for the community is: what are your thoughts
on this, and do you have any suggestions and/or recommendations for
planning for future growth?

Looking forward to your responses,
Patrick
