[ 
https://issues.apache.org/jira/browse/SOLR-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-13813:
--------------------------------
    Attachment: SOLR-13813.patch
        Status: Open  (was: Open)

Attaching updated patch with fixed test.  This updated test succeeds with NRT 
replicas, but often fails with SHARED replicas.  It simply kills the leader 
while the split is taking place and leaves it down.

NOTES:
 - Sometimes there is a failure because of bad replica placement: both replicas 
of a sub-shard are placed on the same node that is being split, so both 
replicas are down at the end of the test.  If this is the case, you get 
a "No live SolrServers available to handle this request" error message.
 - Interestingly, when there are missing documents (search for "MISSING" in 
the output), the final cluster state shows the original shard as 
inactive and the new shards as active!  Although the test doesn't enforce that 
the split is still in progress when the leader is brought down, it's 
unlikely that the split could always finish this fast.  It's more likely 
that the split should have failed.


> Shared storage online split support
> -----------------------------------
>
>                 Key: SOLR-13813
>                 URL: https://issues.apache.org/jira/browse/SOLR-13813
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>            Priority: Major
>         Attachments: SOLR-13813.patch, SOLR-13813.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The strategy for online shard splitting is the same as that for normal 
> (non-SHARED) shards.
> During a split, the leader forwards updates to the sub-shard leaders. Those 
> updates are buffered by the transaction log while the split is in 
> progress, and then the buffered updates are replayed.
> One change that was added was to push the local index to blob store after 
> buffered updates are applied (but before it is marked as ACTIVE):
> See 
> https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663
> This issue is about adding tests and ensuring that online shard splitting 
> (while updates are flowing) works reliably.
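
The flow described above (forward, buffer, replay, push, then mark active) can be sketched as a toy in-memory model. This is not Solr code: the class and method names (`SubShard`, `ParentLeader`, `start_split`, `finish_split`), the modulo routing, and the dict standing in for the blob store are all assumptions made for illustration, and the state labels are simplified.

```python
from dataclasses import dataclass, field


@dataclass
class SubShard:
    """Hypothetical stand-in for a sub-shard leader (not a Solr class)."""
    name: str
    state: str = "construction"                      # simplified state label
    index: dict = field(default_factory=dict)        # doc id -> doc
    tlog: list = field(default_factory=list)         # buffered updates
    blob_store: dict = field(default_factory=dict)   # stand-in for shared storage

    def receive(self, doc_id, doc):
        # While the split is in progress, updates are only buffered.
        if self.state == "active":
            self.index[doc_id] = doc
        else:
            self.tlog.append((doc_id, doc))

    def finish_split(self):
        # Replay buffered updates in arrival order.
        for doc_id, doc in self.tlog:
            self.index[doc_id] = doc
        self.tlog.clear()
        # Push the local index to the blob store BEFORE going active.
        self.blob_store = dict(self.index)
        self.state = "active"


class ParentLeader:
    """Hypothetical stand-in for the leader of the shard being split."""

    def __init__(self, sub_shards):
        self.index = {}
        self.sub_shards = sub_shards
        self.state = "active"
        self.splitting = False

    def route(self, doc_id):
        # Assumption: integer ids routed by modulo, purely for illustration.
        return self.sub_shards[doc_id % len(self.sub_shards)]

    def start_split(self):
        # Start forwarding first, then hand existing docs to the sub-shards.
        self.splitting = True
        for doc_id, doc in self.index.items():
            self.route(doc_id).index[doc_id] = doc

    def add(self, doc_id, doc):
        self.index[doc_id] = doc
        if self.splitting:
            self.route(doc_id).receive(doc_id, doc)

    def finish_split(self):
        for sub in self.sub_shards:
            sub.finish_split()
        self.splitting = False
        self.state = "inactive"


subs = [SubShard("shard1_0"), SubShard("shard1_1")]
leader = ParentLeader(subs)
leader.add(1, "a")
leader.start_split()
leader.add(2, "b")      # arrives mid-split, so it is buffered
leader.finish_split()   # replay, push to blob store, mark active
```

In this toy model, a leader that dies between `start_split()` and `finish_split()` leaves the sub-shards in "construction" with unreplayed updates, which is the window the test in this issue is meant to exercise.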



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

