Hello everyone, I am working with a Solr collection that is several terabytes in size, spread over several hundred million documents. Each document is very rich, and over the past few years we have consistently quadrupled the size of our collection annually. Unfortunately, this all sits on a single node with only a few hundred megabytes of memory, so our performance is less than ideal.
I am looking into implementing a SolrCloud cluster. The books I have read (e.g. Solr in Action), various blogs, and the reference guide all say to build a cluster with room to grow. I can probably provision enough hardware for a year's worth of growth from today, but I would like to have a plan beyond that.

Shard splitting seems pretty straightforward. We continuously add documents and never change existing ones. Based on that, one person recommended that I implement custom hashing: route the latest documents to the shard with the fewest documents, and when that shard fills up, add a new shard and index into it, rinse and repeat. That last approach makes sense. However, my concerns with it are losing Solr's automatic distributed indexing, the implementation effort, and the long-term maintainability. (I have tried to sketch both options in the P.S. below.)

My question for the community is: what are your thoughts on this, and do you have any suggestions and/or recommendations for planning for future growth?

Look forward to your responses,
Patrick
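P.S. To make the two options concrete, here is a rough sketch of how I understand each one in SolrJ (assuming a reasonably recent SolrJ, 6+). This is illustrative only and I have not run it; the collection, config, shard, and host names are all made up.

Option 1, shard splitting with the default compositeId router: the Collections API splits an existing shard into two sub-shards covering the same hash range, so document routing stays fully automatic.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SplitShardSketch {
        public static void main(String[] args) throws Exception {
            // Any node in the cluster will do; the URL is hypothetical.
            try (SolrClient client =
                    new HttpSolrClient.Builder("http://solr1:8983/solr").build()) {
                // Split shard1 of "bigcollection" into two sub-shards.
                // In practice this is a long-running operation, so you would
                // probably submit it with an async id and poll for status.
                CollectionAdminRequest.splitShard("bigcollection")
                    .setShardName("shard1")
                    .process(client);
            }
        }
    }

Option 2, the manual-routing suggestion: create the collection with the implicit router so that we name the target shard ourselves on every update, then add a fresh shard whenever the current one fills up.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class ManualRoutingSketch {
        public static void main(String[] args) throws Exception {
            try (SolrClient client =
                    new HttpSolrClient.Builder("http://solr1:8983/solr").build()) {
                // One-time setup: implicit router, starting with one shard
                // ("myconfig" and the shard names are invented).
                CollectionAdminRequest.createCollectionWithImplicitRouter(
                        "bigcollection", "myconfig", "shard2014", 2)
                    .process(client);

                // Index new documents, naming the target shard explicitly
                // via the _route_ parameter (commit omitted for brevity).
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-123");
                UpdateRequest req = new UpdateRequest();
                req.add(doc);
                req.setParam("_route_", "shard2014");
                req.process(client, "bigcollection");

                // When shard2014 fills up, add a new shard and start
                // routing new documents there instead.
                CollectionAdminRequest.createShard("bigcollection", "shard2015")
                    .process(client);
            }
        }
    }

As I understand it, the two approaches are mutually exclusive per collection: SPLITSHARD is meant for hash-routed (compositeId) collections, while CREATESHARD only works with the implicit router, and having to manage that routing ourselves is exactly where my maintainability worry comes from. Corrections welcome if I have misunderstood either mechanism.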