On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
Does anyone know how long merging in Solr usually takes?
I am currently merging about 3.5TB of data, and it has been running for
more than 28 hours and it is not completed yet. The merging is running on
SSD disk.
The following will apply if you mean Solr's "optimize" feature when you
say "merging".
In my experience, merging proceeds at about 20 to 30 megabytes per
second -- even if the disks are capable of far faster data transfer.
Merging is not just copying the data. Lucene is completely rebuilding
very large data structures, and *not* including data from deleted
documents as it does so. It takes a lot of CPU power and time.
If we average the data rates I've seen to 25 megabytes per second, that
would indicate that an optimize on a 3.5TB index is going to take about 39
hours, and might
take as long as 48 hours. And if you're running SolrCloud with multiple
replicas, multiply that by the number of copies of the 3.5TB index. An
optimize on a SolrCloud collection handles one shard replica at a time
and works its way through the entire collection.
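For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate, not anything Solr provides: the function name is made up for illustration, and it assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and that each copy of the index is optimized one after another.

```python
def optimize_hours(index_bytes, rate_mb_per_s, copies=1):
    """Rough hours to optimize `copies` copies of an index, one at a time.

    Hypothetical helper for illustration only; assumes decimal megabytes
    and a constant merge rate, which real merges will not have.
    """
    seconds = index_bytes / (rate_mb_per_s * 1_000_000) * copies
    return seconds / 3600


INDEX_BYTES = 3.5e12  # 3.5TB, assuming decimal units

# The 20 to 30 MB/s range mentioned above, plus the 25 MB/s average.
for rate in (20, 25, 30):
    print(f"{rate} MB/s: {optimize_hours(INDEX_BYTES, rate):.1f} hours")
```

At 25 MB/s this works out to roughly 39 hours, and at 20 MB/s to a little over 48 hours. Passing copies=3 for a collection with three copies of the index triples the estimate.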
If you are merging different indexes *together*, which a later message
seems to state, then the actual Lucene operation is probably nearly
identical, but I'm not really familiar with it, so I cannot say for sure.
Thanks,
Shawn