On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
> I'm doing the merging on the SSD drive, the speed should be ok?

The speed of virtually all modern disks will have almost no influence on the speed of the merge.  The bottleneck isn't disk transfer speed, it's the operation of the merge code in Lucene.

As I said earlier in this thread, a merge is **NOT** just a copy. Lucene must completely rebuild the data structures of the index to incorporate all of the segments of the source indexes into a single segment in the target index, while simultaneously *excluding* information from documents that have been deleted.
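To illustrate why that is more work than a copy, here is a toy sketch in Python. It is NOT Lucene's code or its data structures: the dict-based "segments", the `merge_segments` name, and the sample data are all made up for illustration. It only shows the shape of the work: every live document gets a new ID, every posting list is rewritten, and deleted documents are dropped along the way.

```python
# Toy model only: real Lucene segments hold inverted indexes, stored fields,
# doc values, etc. Here a "segment" is a plain dict (hypothetical layout).

def merge_segments(segments):
    """Rebuild one term -> doc-id map from several segments, skipping deletes."""
    merged = {}   # term -> list of NEW doc ids in the merged segment
    next_id = 0   # doc ids are renumbered across the merged segment
    for seg in segments:
        # Map each surviving old doc id to its new id; deleted docs get none.
        id_map = {}
        for old_id in sorted(seg["docs"]):
            if old_id in seg["deleted"]:
                continue
            id_map[old_id] = next_id
            next_id += 1
        # Every posting list must be rewritten with the new ids.
        for term, postings in seg["postings"].items():
            for old_id in postings:
                if old_id in id_map:
                    merged.setdefault(term, []).append(id_map[old_id])
    return merged, next_id


seg_a = {"docs": {0, 1, 2}, "deleted": {1},
         "postings": {"solr": [0, 1], "merge": [2]}}
seg_b = {"docs": {0, 1}, "deleted": set(),
         "postings": {"solr": [1], "lucene": [0]}}

merged, live_docs = merge_segments([seg_a, seg_b])
print(merged)      # postings rebuilt with renumbered doc ids
print(live_docs)   # number of documents that survived the merge
```

Even in this toy version, nothing from the source segments can be copied across unchanged, which is why the work is CPU-bound rather than disk-bound.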

The best speed I have ever personally seen for a merge is 30 megabytes per second.  This is far below the sustained transfer rate of a typical modern SATA disk.  An SSD is capable of far faster data transfer, but it will NOT make merges go any faster.
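For a rough idea of what that rate means in practice, here is a back-of-the-envelope calculation in Python. The 30 MB/s figure is the one quoted above; the 100 GB index size and the `estimated_merge_seconds` helper are assumptions for illustration only, and the size is treated as the amount of data the merge has to process.

```python
def estimated_merge_seconds(index_size_gb, merge_rate_mb_per_s=30.0):
    """Rough merge time, assuming the merge proceeds at a steady rate."""
    return index_size_gb * 1024 / merge_rate_mb_per_s


# Hypothetical 100 GB of source index data at the best rate I have seen.
seconds = estimated_merge_seconds(100)
print(f"{seconds / 60:.0f} minutes")
```

Since that 30 MB/s ceiling comes from the merge code and not from the disk, swapping in faster storage does not change the estimate.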

> We need to merge because the data are indexed in two different collections,
> and we need them to be under the same collection, so that we can do things
> like faceting more accurately.
> Will sharding alone achieve this? Or do we have to merge first before we do
> the sharding?

If you want the final index to be sharded, it's typically best to index from scratch into a new empty collection that has the number of shards you want.  The merging tool you're using isn't aware of concepts like shards.  It combines everything into a single index.
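As a sketch of the first step of that approach, the snippet below builds a Collections API CREATE request for a new, empty, sharded collection. The host and port, the collection name `combined`, the shard count of 4, and the `create_collection_url` helper are all placeholders of mine; your install may also need other parameters (a configset name, for example). The code only builds the URL and does not send it.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=1):
    """Build a Solr Collections API CREATE URL (not sent anywhere here)."""
    params = {
        "action": "CREATE",
        "name": name,                            # hypothetical collection name
        "numShards": num_shards,                 # how many shards you want
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder values; substitute your own host, name, and shard count.
url = create_collection_url("http://localhost:8983/solr", "combined", 4)
print(url)
```

Once the empty collection exists, you would reindex the documents from both original sources into it, and SolrCloud routes each document to a shard.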

It's not entirely clear what you're asking with the question about sharding alone.  Making a guess:  I have never heard of facet accuracy being affected by whether or not the index is sharded.  If that *is* possible, then I would expect an index that is NOT sharded to have better accuracy.

Thanks,
Shawn
