On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
Does anyone know how long merging in Solr usually takes?

I am currently merging about 3.5TB of data. It has been running for
more than 28 hours and has not completed yet. The merge is running on
an SSD.

The following applies if by "merging" you mean Solr's "optimize" feature.

In my experience, merging proceeds at about 20 to 30 megabytes per second, even if the disks are capable of far faster data transfer.  Merging is not just copying the data.  Lucene completely rebuilds very large data structures and does *not* include data from deleted documents as it does so.  That takes a lot of CPU power and time.

If we average the data rates I've seen to 25 megabytes per second, then that would indicate that an optimize on a 3.5TB index is going to take about 39 hours, and might take as long as 48 hours at the slower end of that range.  And if you're running SolrCloud with multiple replicas, multiply that by the number of copies of the 3.5TB index.  An optimize on a SolrCloud collection handles one shard replica at a time and works its way through the entire collection.
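
To make that arithmetic easy to redo for other index sizes, here is a rough sketch in Python.  It assumes decimal units (1TB = 10^12 bytes, 1 megabyte = 10^6 bytes), a constant rewrite rate, and that copies are processed strictly one after another.  The function name `optimize_hours` is made up for this illustration and is not part of Solr.

```python
def optimize_hours(index_bytes, rate_mb_per_s, copies=1):
    """Estimate hours to rewrite `copies` copies of an index, one after another.

    Assumes a constant rate in decimal megabytes per second.
    """
    seconds_per_copy = index_bytes / (rate_mb_per_s * 1_000_000)
    return copies * seconds_per_copy / 3600


INDEX_BYTES = 3.5e12  # 3.5TB, decimal units assumed

# Observed range from above: 20 to 30 megabytes per second.
for rate in (20, 25, 30):
    print(f"{rate} MB/s: {optimize_hours(INDEX_BYTES, rate):.1f} hours")
```

For a single copy this prints roughly 48.6, 38.9 and 32.4 hours.  Passing `copies=3`, for example, would model a collection holding three copies of the same 3.5TB index.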

If you are merging different indexes *together*, which a later message seems to state, then the actual Lucene operation is probably nearly identical, but I'm not really familiar with it, so I cannot say for sure.

Thanks,
Shawn
