Hi Shawn,

Thanks for the info. We will most likely shard when we migrate to Solr 7.1.0 and re-index the data.
But since Solr 7.1.0 cannot index EML files yet because of this JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make do with our current Solr 6.5.1 setup for now, which was created without sharding from the start.

Regards,
Edwin

On 23 November 2017 at 12:50, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>
>> I'm doing the merging on the SSD drive, the speed should be ok?
>
> The speed of virtually all modern disks will have almost no influence on
> the speed of the merge. The bottleneck isn't disk transfer speed, it's the
> operation of the merge code in Lucene.
>
> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
> must completely rebuild the data structures of the index to incorporate all
> of the segments of the source indexes into a single segment in the target
> index, while simultaneously *excluding* information from documents that
> have been deleted.
>
> The best speed I have ever personally seen for a merge is 30 megabytes per
> second. This is far below the sustained transfer rate of a typical modern
> SATA disk. SSD is capable of far faster data transfer ...but it will NOT
> make merges go any faster.
>
>> We need to merge because the data are indexed in two different collections,
>> and we need them to be under the same collection, so that we can do things
>> like faceting more accurately.
>> Will sharding alone achieve this? Or do we have to merge first before we
>> do the sharding?
>
> If you want the final index to be sharded, it's typically best to index
> from scratch into a new empty collection that has the number of shards you
> want. The merging tool you're using isn't aware of concepts like shards.
> It combines everything into a single index.
>
> It's not entirely clear what you're asking with the question about
> sharding alone. Making a guess: I have never heard of facet accuracy
> being affected by whether or not the index is sharded. If that *is*
> possible, then I would expect an index that is NOT sharded to have better
> accuracy.
>
> Thanks,
> Shawn
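
For reference, the "index from scratch into a new empty collection that has the number of shards you want" step could look roughly like the sketch below. It only builds the request URL for the Solr Collections API `CREATE` action and does not contact a server. The host, the collection name `emails`, and the shard count of 2 are placeholders I made up for illustration, not values from this thread.

```python
from urllib.parse import urlencode


def build_create_collection_url(base_url, name, num_shards,
                                replication_factor=1, config_name=None):
    """Build a Collections API CREATE URL for a new, empty sharded collection.

    All argument values are supplied by the caller; nothing here is taken
    from an existing Solr installation.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if config_name:
        # Optional: name of a configset already uploaded to ZooKeeper.
        params["collection.configName"] = config_name
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical values: a local SolrCloud node and a two-shard collection.
url = build_create_collection_url("http://localhost:8983/solr", "emails", 2)
print(url)
```

Once the empty collection exists, the documents from both source collections would be re-indexed into it, so no merge step is needed.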