On 4/2/2017 8:16 AM, Putul S wrote:
> I am migrating Solr 4 index to Solr 5. The upgrade tool/script works well.
> But ran out disk space upgrading 4 GB index. The server had at least 8 GB
> free then. On production, the index is about 200 GB.
>
> How much disk space is needed for indexing? Also, how long does it take to
> upgrade large index? It took about a minute to upgrade less than half GB
> index.

You've asked questions that have no generic answer. Answering them requires a lot of very specific information about your index and the data it contains, and even if that information is provided, the answers will only be guesses. The only way to find out for sure is to try it.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Nobody can tell you how much disk space is needed for indexing. That will depend on how your schema is configured and how much data you index. Small changes can increase or decrease the disk space required.

Upgrading an index runs an operation that Lucene calls "forceMerge" on the index. Solr calls this procedure "optimize". Exactly how fast the optimize proceeds will depend on the precise contents of the index, which will depend on the schema and exactly what data has been indexed.

I have some 50GB indexes that take about two hours to optimize (on systems with very fast disks), which means that it would take about two hours to upgrade. Somebody else who has a 50GB index might take a very different amount of time to optimize, because the contents of their index are likely to be different from the contents of mine, and their hardware probably has different capabilities.

An upgrade or an optimize should only require enough disk space to store the full index again. It may double in size, then shrink back down to about the same size, unless there are deleted documents, in which case the new index will be smaller than the original.

General recommendations for Lucene and Solr are to have FREE disk space equivalent to *double* the size of all your index data. This is because in certain situations when reindexing the bulk of your data and optimizing the index, it can triple in size temporarily. In most situations, the increase will only be double, but the recommendation is that you have the disk space to handle triple.

Thanks,
Shawn
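
A footnote to the disk space guidance above: the recommendation (free space of about double the index size) can be checked before starting an upgrade or optimize. The following is only a minimal sketch of that arithmetic, not part of Solr or Lucene. The function names and the `headroom_factor` parameter are hypothetical, and the index directory path is whatever the caller supplies. It gives a rough pre-flight estimate, not a guarantee.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def recommended_free_bytes(index_bytes, headroom_factor=2):
    """Free space suggested by the rule of thumb above.

    headroom_factor=2 is an assumption taken from the "double the
    index size" recommendation; it is not a Solr setting.
    """
    return index_bytes * headroom_factor


def has_headroom(index_dir, headroom_factor=2):
    """Return (ok, needed_bytes, free_bytes) for the filesystem
    that holds index_dir."""
    needed = recommended_free_bytes(index_size_bytes(index_dir), headroom_factor)
    free = shutil.disk_usage(index_dir).free
    return free >= needed, needed, free
```

For the 200 GB production index in the question, `recommended_free_bytes(200 * GIB)` works out to 400 GiB of free space. Calling `has_headroom()` on the actual index directory compares that figure with what the filesystem currently reports as free.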