On 10/27/2015 6:16 PM, Brian Scholl wrote:
> - a shard replica is larger than 50% of the available disk

This detail indicates a potential problem even without any of the other
details.  The bottom line here is that if you don't have enough disk
space to hold your index three times, you can have problems even without
index replication.

Because Solr cores are Lucene indexes, general rules for Lucene apply.
During normal operation, when indexing data, Lucene will perform segment
merges.  When Lucene merges segments, it requires enough disk space to
hold the segments it is merging as well as a new copy of that data.  It
is possible for a merge to cover the entire index, which is what happens
by definition when you optimize an index.  This means that at a minimum,
you must have enough disk space to hold your entire index twice.

There are some worst-case merging scenarios, mostly during a full
optimize but potentially during a regular merge, where a large merge
requires three times the original size of the segments being merged.
The recommendation is therefore to have enough disk space for three
copies of all your index data.
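
As a rough illustration of that rule of thumb, here is a minimal Python
sketch that compares the size of an index directory with the free space
on its filesystem.  The function names and the idea of passing in an
index directory path are my own assumptions for the example; none of
this is a Solr or Lucene API.  With copies=3, the index already occupies
one copy, so the free space needed is twice the index size.

```python
import os
import shutil


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.isfile(path):
                    total += os.path.getsize(path)
            except OSError:
                # A segment file can disappear mid-walk if a merge finishes.
                pass
    return total


def required_free_bytes(index_bytes, copies=3):
    """Free space needed so the disk can hold `copies` copies in total.

    The index itself already accounts for one copy.
    """
    return (copies - 1) * index_bytes


def has_merge_headroom(index_dir, copies=3):
    """True if the filesystem holding index_dir has enough free space."""
    free = shutil.disk_usage(index_dir).free
    return free >= required_free_bytes(index_size_bytes(index_dir), copies)
```

Calling has_merge_headroom() with the path to a core's index directory
and copies=2 checks the bare minimum described earlier; the default of 3
checks the recommended amount.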

Index replication is a higher-level Solr function, but the overall idea
is similar to merging -- make a new copy of the data by copying it from
the original, switch it with the current copy once the new one is in
place, then delete the old copy.  The index is always available
throughout the procedure.  When compared to replication, merging does a
little more -- it removes deleted documents and reduces segment count.
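
To show why that copy-then-switch pattern temporarily needs room for two
copies, here is a small Python sketch of the general idea using plain
directories.  It is not Solr's replication code; the function name and
the ".old" suffix are made up for the example.  Unlike Solr, which keeps
serving the old copy until the new one is ready, this toy version has a
brief gap between the two renames.

```python
import os
import shutil
import tempfile


def replace_by_copy(source_dir, live_dir):
    """Copy source_dir next to live_dir, swap it in, then delete the old copy."""
    live_dir = os.path.abspath(live_dir)
    parent = os.path.dirname(live_dir)
    # Build the new copy on the same filesystem so the renames stay cheap.
    new_dir = tempfile.mkdtemp(prefix="index.new.", dir=parent)
    old_dir = live_dir + ".old"

    # Step 1: full copy.  Old and new data both occupy disk from here on.
    shutil.copytree(source_dir, new_dir, dirs_exist_ok=True)

    # Step 2: switch the new copy into place.
    os.rename(live_dir, old_dir)
    os.rename(new_dir, live_dir)

    # Step 3: only now is the old copy's disk space released.
    shutil.rmtree(old_dir)
```

Between step 1 and step 3 the disk holds both the old and the new copy,
which is where the extra space requirement comes from.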

If your system meets the general disk space criteria for Lucene merges,
then index replication for SolrCloud will also have enough room.

Large-scale Solr is not an inexpensive undertaking.  Fortunately, in the
overall picture, disk space is cheap, and individual hard disks are
available up to 8TB.

Thanks,
Shawn
