[ https://issues.apache.org/jira/browse/SOLR-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999475#comment-16999475 ]

Andy Vuong commented on SOLR-14044:
-----------------------------------

CollectionDeletion and ShardDeletion add new deletion flows we need to support.
A few functional requirements for collection deletion:
 * Deletion of all index files belonging to a collection located in the blob
store
 * Removal of any local in-memory metadata used by the collection, such as
entries in the SharedConcurrencyMetadataCache

By nature of using shared storage (S3, GCS), the first requirement may always
be “best effort”, because these are eventually consistent systems. In S3, list
commands are eventually consistent, so when we issue a Collection API delete
and list all the files belonging to a collection (files are always key-ed on
the collection name), we might not find everything. Fortunately the same isn’t
true in GCS. Our design calls for adding an “orphaned” file deleter in the
future; by orphan, we mean any index file not referenced by any core.metadata
file in the shared store. That isn’t covered in this JIRA, but it’s likely
where we’ll handle these instances of stale reads.
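
To make the best-effort pass concrete, here is a minimal sketch of
listing-and-deleting everything under a collection's key prefix with the AWS
SDK for Java (v1). The bucket layout, class name, and method names are
assumptions for illustration, not our actual blob store client:

{code:java}
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

/**
 * Best-effort deletion of every blob key-ed on a collection name.
 * Because S3 LIST is eventually consistent, a single pass may miss
 * recently written files; those become "orphans" for the future
 * orphaned-file deleter to reclaim.
 */
public class CollectionBlobDeleter {
  private final AmazonS3 s3;
  private final String bucket;

  public CollectionBlobDeleter(AmazonS3 s3, String bucket) {
    this.s3 = s3;
    this.bucket = bucket;
  }

  public void deleteCollectionFiles(String collectionName) {
    // Files are key-ed on the collection name, so a prefix listing
    // finds (most of) them.
    ListObjectsV2Request req = new ListObjectsV2Request()
        .withBucketName(bucket)
        .withPrefix(collectionName + "/");
    ListObjectsV2Result result;
    do {
      result = s3.listObjectsV2(req);
      List<DeleteObjectsRequest.KeyVersion> keys = new ArrayList<>();
      for (S3ObjectSummary summary : result.getObjectSummaries()) {
        keys.add(new DeleteObjectsRequest.KeyVersion(summary.getKey()));
      }
      if (!keys.isEmpty()) {
        s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
      }
      req.setContinuationToken(result.getNextContinuationToken());
    } while (result.isTruncated());
  }
}
{code}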

The second requirement refers to an implementation detail of our shard indexing
concurrency, but it is required if we want to support reusing shard/collection
names. We store metadata in the JVM cache that needs to be evicted. Achieving
this via distributed clean-up might be difficult, so we may want to do some
kind of clean-up on creation of replicas with the same name. The downside is
that if no such clean-up happens, those objects sit in memory until the node
restarts.
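
A minimal sketch of that eviction hook, assuming a simple map-backed cache
(the structure and method names here are illustrative, not the actual
SharedConcurrencyMetadataCache API):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Illustrative per-node cache of shard concurrency metadata.
 * Eviction runs on replica creation so that a reused
 * shard/collection name never observes stale state; without it,
 * entries sit in memory until the node restarts.
 */
public class ShardMetadataCache {
  // "collectionName/shardName" -> in-memory shard metadata
  private final ConcurrentMap<String, Object> shardMetadata = new ConcurrentHashMap<>();

  /** Called when a replica with the same name is (re)created locally. */
  public void evictShard(String collectionName, String shardName) {
    shardMetadata.remove(collectionName + "/" + shardName);
  }

  /** Best-effort local eviction on collection deletion. */
  public void evictCollection(String collectionName) {
    shardMetadata.keySet().removeIf(key -> key.startsWith(collectionName + "/"));
  }
}
{code}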

Design-wise, we may want the deletion processes to be flexible enough to extend
beyond these functional requirements if, down the line, we expand shared
collections to store objects other than index files in blob.

I'd prefer to refactor the BlobDeleteManager and extend its capability beyond
the async deletions it does now, but I'm unlikely to reuse the same queue we've
established: the single deletion process won't scale with more
collections/shards per Solr node, and something like a Collection:Delete API
call is likely a higher-priority task than the file deletes issued on the
indexing flow (and, unlike those, the API call is not async).
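
As a sketch of that direction, a delete manager backed by a priority queue
could let a Collection:Delete jump ahead of indexing-flow file deletes. All
names below are made up for illustration and are not the current
BlobDeleteManager API:

{code:java}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative delete manager whose work queue orders tasks by
 * priority, so user-issued collection/shard deletes run before the
 * lower-priority file deletes enqueued on the indexing flow.
 */
public class PrioritizedBlobDeleteManager {
  // Declaration order = priority: lower ordinal dequeues first.
  enum Priority { COLLECTION_DELETE, INDEXING_FLOW }

  static class DeleteTask implements Runnable, Comparable<DeleteTask> {
    final Priority priority;
    final Runnable work;

    DeleteTask(Priority priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }

    @Override public void run() { work.run(); }

    @Override public int compareTo(DeleteTask other) {
      return priority.compareTo(other.priority);
    }
  }

  // A fixed pool; queued tasks wait in priority order rather than FIFO.
  private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
      2, 2, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

  public void enqueue(Priority priority, Runnable deleteWork) {
    executor.execute(new DeleteTask(priority, deleteWork));
  }

  public void shutdown() {
    executor.shutdown();
  }
}
{code}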

> Support shard/collection deletion in shared storage
> ---------------------------------------------------
>
>                 Key: SOLR-14044
>                 URL: https://issues.apache.org/jira/browse/SOLR-14044
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Andy Vuong
>            Priority: Major
>
> The Solr Cloud deletion APIs for collections and shards are not currently 
> supported by shared storage, but they are essential functionality required 
> by the shared storage design. Deletion of objects from shared storage 
> currently happens only in the indexing path (on pushes), after the index 
> file listings between the local Solr process and the external store have 
> been resolved.
>  
> This task is to track supporting the delete shard/collection API commands; 
> its scope does not include cleaning up so-called “orphaned” index files from 
> blob (i.e. files that are no longer referenced by any core.metadata file on 
> the external store). That will be designed/covered in another subtask.


