Hi Shawn,

Thanks for the info. We will most likely be doing sharding when we migrate
to Solr 7.1.0, and re-index the data.

But Solr 7.1.0 is not yet able to index EML files because of this JIRA
issue, https://issues.apache.org/jira/browse/SOLR-11622, so we have to make
do with our current Solr 6.5.1 for now, which was set up without sharding
from the start.

Regards,
Edwin

On 23 November 2017 at 12:50, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>
>> I'm doing the merging on an SSD drive, so the speed should be OK?
>>
>
> The speed of virtually all modern disks will have almost no influence on
> the speed of the merge.  The bottleneck isn't disk transfer speed, it's the
> operation of the merge code in Lucene.
>
> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
> must completely rebuild the data structures of the index to incorporate all
> of the segments of the source indexes into a single segment in the target
> index, while simultaneously *excluding* information from documents that
> have been deleted.
>
> The best speed I have ever personally seen for a merge is 30 megabytes per
> second.  This is far below the sustained transfer rate of a typical modern
> SATA disk.  An SSD is capable of far faster data transfer, but it will NOT
> make merges go any faster.
>
>> We need to merge because the data are indexed in two different collections,
>> and we need them to be under the same collection, so that we can do things
>> like faceting more accurately.
>> Will sharding alone achieve this? Or do we have to merge first before we
>> do the sharding?
>>
>
> If you want the final index to be sharded, it's typically best to index
> from scratch into a new empty collection that has the number of shards you
> want.  The merging tool you're using isn't aware of concepts like shards.
> It combines everything into a single index.
>
> It's not entirely clear what you're asking with the question about
> sharding alone.  Making a guess:  I have never heard of facet accuracy
> being affected by whether or not the index is sharded.  If that *is*
> possible, then I would expect an index that is NOT sharded to have better
> accuracy.
>
> Thanks,
> Shawn
>
>
