[ 
https://issues.apache.org/jira/browse/SOLR-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-15109:
-----------------------------------

    Component/s: SolrCloud
       Assignee: David Smiley
        Summary: Optimize shard splitByPrefix logic to reduce number of splits 
required  (was: Optimize splitByPrefix logic to reduce number of splits 
required)

> Optimize shard splitByPrefix logic to reduce number of splits required
> ----------------------------------------------------------------------
>
>                 Key: SOLR-15109
>                 URL: https://issues.apache.org/jira/browse/SOLR-15109
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Megan Carey
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: Split 1 (1).png, Split 2 (1).png, Split 3 (1).png
>
>
> The goal of the splitByPrefix logic is to identify "buckets" within a shard 
> that contain documents that should be co-located (according to their document 
> prefix), and to split such that those buckets are preserved. One issue that we 
> have found with splitByPrefix in practice is that it often takes several 
> splits to isolate a particularly large bucket within the hash range. 
> [~dsmiley] came up with a simple optimization that reduces the number of 
> splits needed to isolate such a bucket: 
> {quote}Loop over all RangeCounts... does it intersect the middle third of the 
> input?  If not, move on.  If so, track the biggest.  When this loop finishes, 
> you will have the biggest that also intersects the middle third.  Then simply 
> choose the side of this biggest RangeCount that is closest to the middle of 
> the input range.{quote}
> This should be clearer with the following diagrams:
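
The quoted loop can also be read as code. What follows is a minimal sketch, not Solr's actual implementation: the class SplitPointSketch, its RangeCount type, and the chooseSplitPoint method are hypothetical names introduced for illustration. It assumes buckets are inclusive hash ranges lying inside the input range, and that a split point s divides the input into [inMin, s] and [s+1, inMax]. The issue does not say what to do when no bucket intersects the middle third, so the sketch simply returns an empty result in that case.

```java
import java.util.List;
import java.util.OptionalLong;

/** Illustrative sketch only; names and types are hypothetical, not Solr's classes. */
class SplitPointSketch {

    /** A bucket: an inclusive hash range [min, max] and its document count. */
    static final class RangeCount {
        final long min;
        final long max;
        final int count;

        RangeCount(long min, long max, int count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }
    }

    /**
     * Returns the last hash value of the left half of the split, or empty if
     * no bucket intersects the middle third or no valid boundary exists.
     */
    static OptionalLong chooseSplitPoint(long inMin, long inMax, List<RangeCount> counts) {
        // Bounds of the middle third of the input range (inclusive).
        long third = (inMax - inMin + 1) / 3;
        long midLo = inMin + third;
        long midHi = inMax - third;

        // Track the biggest bucket that intersects the middle third.
        RangeCount biggest = null;
        for (RangeCount rc : counts) {
            boolean intersectsMiddle = rc.min <= midHi && rc.max >= midLo;
            if (!intersectsMiddle) {
                continue;
            }
            if (biggest == null || rc.count > biggest.count) {
                biggest = rc;
            }
        }
        if (biggest == null) {
            return OptionalLong.empty();
        }

        // Candidate split points on either side of the biggest bucket.
        long middle = inMin + (inMax - inMin) / 2;
        long before = biggest.min - 1; // split just before the bucket
        long after = biggest.max;      // split just after the bucket

        // A split is usable only if both resulting halves are non-empty.
        boolean beforeOk = before >= inMin;
        boolean afterOk = after < inMax;
        if (!beforeOk && !afterOk) {
            return OptionalLong.empty();
        }
        if (!afterOk) {
            return OptionalLong.of(before);
        }
        if (!beforeOk) {
            return OptionalLong.of(after);
        }

        // Choose the side closest to the middle of the input range.
        boolean preferBefore = Math.abs(before - middle) <= Math.abs(after - middle);
        return OptionalLong.of(preferBefore ? before : after);
    }
}
```

For example, with an input range of [0, 299] and buckets [0, 49] (10 docs), [50, 179] (500 docs), and [180, 299] (50 docs), the biggest bucket touching the middle third is [50, 179]. Its end lies closer to the middle than its start, so the sketch splits after hash 179, leaving the large bucket at the edge of the left half where one more split can isolate it.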



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

