[jira] [Updated] (SOLR-13101) Shared storage via a new SHARED replica type

David Smiley (Jira) Thu, 10 Dec 2020 19:20:35 -0800


     [ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley updated SOLR-13101:
--------------------------------
    Description: 
_This issue is closed as Won't-Fix because the particular approach here won't 
be contributed. Linked issues may appear approaching it differently._
----
Solr should have first-class support for shared storage (blob/object stores 
like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, etc).

The key component will likely be a new replica type for shared storage. It 
would have many of the benefits of the current "pull" replicas (not indexing on 
all replicas, all shards identical with no shards getting out-of-sync, etc), 
but would have additional benefits:
 - Any shard could become leader (the blob store always has the index)
 - Better elasticity scaling down
 - durability not linked to number of replcias.. a single replica could be 
common for write workloads
 - could drop to 0 replicas for a shard when not needed (blob store always has 
index)
 - Allow for higher performance write workloads by skipping the transaction log
 - don't pay for what you don't need
 - a commit will be necessary to flush to stable storage (blob store)
 - A lot of the complexity and failure modes go away

An additional component a Directory implementation that will work well with 
blob stores. We probably want one that treats local disk as a cache since the 
latency to remote storage is so large. I think there are still some "locking" 
issues to be solved here (ensuring that more than one writer to the same index 
won't corrupt it). This should probably be pulled out into a different JIRA 
issue.

  was:
Solr should have first-class support for shared storage (blob/object stores 
like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, etc).

The key component will likely be a new replica type for shared storage.  It 
would have many of the benefits of the current "pull" replicas (not indexing on 
all replicas, all shards identical with no shards getting out-of-sync, etc), 
but would have additional benefits:
 - Any shard could become leader (the blob store always has the index)
 - Better elasticity scaling down
   - durability not linked to number of replcias.. a single replica could be 
common for write workloads
   - could drop to 0 replicas for a shard when not needed (blob store always 
has index)
 - Allow for higher performance write workloads by skipping the transaction log
   - don't pay for what you don't need
   - a commit will be necessary to flush to stable storage (blob store)
 - A lot of the complexity and failure modes go away

An additional component a Directory implementation that will work well with 
blob stores.  We probably want one that treats local disk as a cache since the 
latency to remote storage is so large.  I think there are still some "locking" 
issues to be solved here (ensuring that more than one writer to the same index 
won't corrupt it).  This should probably be pulled out into a different JIRA 
issue.


        Summary: Shared storage via a new SHARED replica type  (was: Shared 
storage support in SolrCloud)

> Shared storage via a new SHARED replica type
> --------------------------------------------
>
>                 Key: SOLR-13101
>                 URL: https://issues.apache.org/jira/browse/SOLR-13101
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>          Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> _This issue is closed as Won't-Fix because the particular approach here won't 
> be contributed. Linked issues may appear approaching it differently._
> ----
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage. It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>  - durability not linked to number of replcias.. a single replica could be 
> common for write workloads
>  - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>  - don't pay for what you don't need
>  - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component a Directory implementation that will work well with 
> blob stores. We probably want one that treats local disk as a cache since the 
> latency to remote storage is so large. I think there are still some "locking" 
> issues to be solved here (ensuring that more than one writer to the same 
> index won't corrupt it). This should probably be pulled out into a different 
> JIRA issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SOLR-13101) Shared storage via a new SHARED replica type

Reply via email to