On 4/2/2017 8:16 AM, Putul S wrote:
> I am migrating Solr 4 index to Solr  5. The upgrade tool/script works well. 
> But ran out disk space upgrading 4 GB index. The server had at least 8 GB 
> free then. On production, the index is about 200 GB.
>
> How much disk space is needed for indexing? Also, how long does it take to 
> upgrade large index? It took about a minute to upgrade less than half GB 
> index.

You've asked questions that have no generic answer.  Answering them
requires a lot of very specific information about your index and the
data it contains, and even if that information is provided, the answers
will only be guesses.  The only way to find out for sure is to try it.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Nobody can tell you how much disk space is needed for indexing.  That
will depend on how your schema is configured and how much data you
index.  Small changes can increase or decrease the disk space required.

Upgrading an index runs an operation that Lucene calls "forceMerge" on
the index.  Solr calls this procedure "optimize".  Exactly how fast the
optimize proceeds will depend on the precise contents of the index,
which will depend on the schema and exactly what data has been indexed.

I have some 50GB indexes that take about two hours to optimize (on
systems with very fast disks), which means that it would take about two
hours to upgrade.  Somebody else who has a 50GB index might take a very
different amount of time to optimize, because the contents of their
index are likely to be different than the contents of mine, and their
hardware probably has different capabilities.

An upgrade or an optimize should only require enough disk space to store
the full index again.  While the operation runs, the index may
temporarily double in size on disk, then shrink back down to about the
same size, unless there are deleted documents, in which case the new
index will be smaller than the original.

General recommendations for Lucene and Solr are to have FREE disk space
equivalent to *double* the size of all your index data, so that the disk
can hold three times the index size in total.  This is because in
certain situations, when reindexing the bulk of your data and then
optimizing the index, it can triple in size temporarily.  In most
situations the index will only grow to double its size, but the
recommendation is that you have the disk space to handle triple.
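
As a rough illustration of that rule of thumb, here is a minimal Python
sketch that compares the size of an index directory with the free space
on the same filesystem.  The function names and the default factor of
2.0 are assumptions made for this example and are not part of Solr or
Lucene.  It only checks the general guideline above and cannot predict
what a particular upgrade will actually need.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_headroom(index_bytes, free_bytes, factor=2.0):
    """True if free space is at least factor times the index size.

    factor=2.0 mirrors the guideline of keeping free space equal to
    double the index size; it is a rule of thumb, not a guarantee.
    """
    return free_bytes >= factor * index_bytes


def check_index_dir(index_dir, factor=2.0):
    """Return (index_bytes, free_bytes, ok) for a hypothetical index path."""
    index_bytes = dir_size_bytes(index_dir)
    free_bytes = shutil.disk_usage(index_dir).free
    return index_bytes, free_bytes, has_headroom(index_bytes, free_bytes, factor)
```

With these assumptions, a 200 GB index would pass the check only if the
filesystem holding it had at least 400 GB free.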

Thanks,
Shawn
