[jira] [Commented] (SOLR-14778) Disabling UpdateLog leads to silently lost updates

David Smiley (Jira) Tue, 01 Sep 2020 12:45:50 -0700


    [ https://issues.apache.org/jira/browse/SOLR-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188780#comment-17188780 ]


David Smiley commented on SOLR-14778:
-------------------------------------

Yes; I'm imagining one "indexing" replica per shard. Something might work with
more than one, but I think that would complicate things too much for now. To
make this conversation more concrete, imagine a shared storage system, say
HDFS, that handles data redundancy, so we don't want Solr to store indexes
redundantly beyond locally cached SSDs for search. For that matter, imagine
that any or all Solr nodes could be blown away without hurting durability
(though it would hurt availability). In such an environment the updateLog might
provide no durability value: any machine can be blown away, and the updateLog
might not be worth putting on the durable storage tier, so we just index and
flush every time. Every indexing batch then needs to flush fully down to the
storage system so that we know it's durable. I recognize such an environment
has trade-offs and isn't for everyone. In particular, all indexing is rather
expensive (sync to the storage tier, always flush), and should an indexing
node fail, it might take tens of seconds either to promote a PULL replica and
sync it or, if there is no replica, to "hydrate" a new one from the storage
tier. In saying all this, I'm not trying to dramatically increase the scope of
this JIRA (to do all the things I describe above), only to provide some
context for a world, perhaps not far away, where the updateLog has no role to
play. I like to think that even without that fancy stuff, the SolrCloud we
have today would be better off with the changes this JIRA issue aims to make.

bq. the leader that needs to start rejecting updates because one or more 
replicas are recovering

I'm hoping we don't do that. If the remote replica is a PULL replica, it only
needs best-effort visibility of whatever the latest state was at the time it
started synchronizing.


> Disabling UpdateLog leads to silently lost updates
> --------------------------------------------------
>
>                 Key: SOLR-14778
>                 URL: https://issues.apache.org/jira/browse/SOLR-14778
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud, update
>    Affects Versions: 8.6.1
>            Reporter: Megan Carey
>            Priority: Minor
>
> Solr currently "supports" disabling the UpdateLog, though it is "required" 
> for NRT replicas (per the 
> [docs|https://lucene.apache.org/solr/guide/8_6/updatehandlers-in-solrconfig.html#transaction-log]).
>  However, when the update log is disabled and a replica is in BUFFERING state 
> (e.g. during MigrateCmd or SplitShardCmd), updates are *lost silently*. While 
> most users will likely never consider disabling the updateLog, it seems 
> worth offering better support for those who do.
> Options as discussed in [ASF 
> Slack|https://the-asf.slack.com/archives/CEKUCUNE9/p1598373062262300]:
>  # No longer support disabling the updateLog as it is considered an integral 
> feature in SolrCloud. This might be undesirable for use cases where some data 
> loss is acceptable and the updateLog takes up too much space.
>  # Improve Solr documentation to explicitly outline the risks of disabling 
> the updateLog.
>  # Add logging to indicate when an update is swallowed in this state.
>  # _My preferred option:_ Support disabling the updateLog by providing 
> additional replica states besides BUFFERING, so that there is no data loss 
> when updateLog is disabled and replica goes offline for an operation like 
> split. Some ideas:
>  ## REJECTING: Fail updates so that the client can retry again once the 
> operation is complete.
>  ## BLOCKING: Stall update until operation is complete, and then execute 
> update.
> Feedback is welcome; once we establish a path forward, I'd be happy to pick it 
> up. If others are interested, I can document my findings as well.
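To make the difference between the current behavior and the proposed REJECTING and BLOCKING states concrete, here is a minimal sketch of how a replica without an update log might gate updates that arrive while an operation such as a shard split is running. It is a toy model under stated assumptions, not Solr code: `ReplicaUpdateGate`, its `Policy` values, `RetryLaterException`, and the timeout parameter are all hypothetical names chosen for illustration. `SILENT_DROP` models the lost-update behavior reported in this issue; `REJECT` and `BLOCK` correspond to the REJECTING and BLOCKING ideas above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not a Solr API): how a replica with no update log could
 * treat updates that arrive while a split/migrate-style operation is running.
 */
class ReplicaUpdateGate {

    /** Hypothetical policies; SILENT_DROP models the behavior reported in this issue. */
    enum Policy { SILENT_DROP, REJECT, BLOCK }

    /** Signals the client that it should retry the update later. */
    static class RetryLaterException extends RuntimeException {
        RetryLaterException(String message) { super(message); }
    }

    private final Policy policy;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition operationDone = lock.newCondition();
    private final List<String> applied = new ArrayList<>();
    private boolean operationInProgress = false;

    ReplicaUpdateGate(Policy policy) { this.policy = policy; }

    /** Marks the start of an operation such as a shard split. */
    void startOperation() {
        lock.lock();
        try { operationInProgress = true; } finally { lock.unlock(); }
    }

    /** Marks the end of the operation and wakes any blocked updates. */
    void finishOperation() {
        lock.lock();
        try {
            operationInProgress = false;
            operationDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if the update was applied, false if it was dropped. */
    boolean update(String doc, long timeoutMillis) {
        lock.lock();
        try {
            if (operationInProgress) {
                switch (policy) {
                    case SILENT_DROP:
                        // No update log means nowhere to buffer the update. A real client
                        // would see success; returning false just makes the loss visible here.
                        return false;
                    case REJECT:
                        // REJECTING: fail fast so the client can retry after the operation.
                        throw new RetryLaterException("operation in progress, retry: " + doc);
                    case BLOCK: {
                        // BLOCKING: stall until the operation finishes, bounded by a timeout.
                        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                        while (operationInProgress) {
                            if (remainingNanos <= 0) {
                                throw new RetryLaterException("timed out waiting, retry: " + doc);
                            }
                            try {
                                remainingNanos = operationDone.awaitNanos(remainingNanos);
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                throw new RetryLaterException("interrupted, retry: " + doc);
                            }
                        }
                        break;
                    }
                }
            }
            applied.add(doc);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Snapshot of the updates that were actually applied. */
    List<String> appliedDocs() {
        lock.lock();
        try { return new ArrayList<>(applied); } finally { lock.unlock(); }
    }
}
```

The design trade-off mirrors the discussion: REJECT pushes retry responsibility to the client but never holds request threads, while BLOCK is transparent to the client but ties up threads for as long as the operation runs, which is why the sketch bounds the wait with a timeout.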



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
