[ https://issues.apache.org/jira/browse/SOLR-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999475#comment-16999475 ]
Andy Vuong commented on SOLR-14044:
-----------------------------------

CollectionDeletion and ShardDeletion add new deletion flows we need to support. A few functional requirements for collection deletion:

* Deletion of all index files belonging to the collection located in the blob store
* Removal of any local in-memory metadata used by the collection, such as the SharedConcurrencyMetadataCache

By the nature of shared storage (S3, GCS), the first requirement may always be "best effort", because these are eventually consistent systems. In S3, list commands are eventually consistent, so when a collection API delete tries to find all the files belonging to a collection (we're always keyed on collection name), it might not find everything. Fortunately the same isn't true in GCS. Our design calls for adding an "orphaned" file deleter in the future; by orphan, we mean any index file not referenced by any core.metadata file in the shared store. That isn't covered in this JIRA, but it's likely where we'd handle these instances of stale reads.

The second requirement refers to an implementation detail of our shard indexing concurrency, but it is required if we want to support reusing shard/collection names. The metadata we store in the JVM cache needs to be evicted. Achieving this via distributed clean-up might be difficult, so we may want to do some kind of clean-up when a similarly named replica is created. The downside is that if no such clean-up happens, those objects sit in memory until the node restarts.

Design-wise, we may want the deletion processes to be flexible enough to extend beyond these functional requirements, in case we later expand shared collections to store objects other than index files in blob.
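The two requirements above could be sketched roughly as follows. This is a hypothetical, self-contained stand-in (the class and field names are illustrative, not Solr's; the maps stand in for the real blob store and SharedConcurrencyMetadataCache, and real code would issue S3/GCS list and delete calls): list blob keys by collection-name prefix, delete them best-effort, then evict the matching in-JVM metadata.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch only: in-memory stand-ins for the blob store and the
// per-collection concurrency cache. Real code would call S3/GCS list+delete,
// where the listing itself may be eventually consistent (hence "best effort").
class CollectionDeleteSketch {
    // blob key -> file bytes; keys are prefixed with the collection name
    private final Map<String, byte[]> blobStore = new ConcurrentHashMap<>();
    // core key -> in-JVM metadata (stand-in for SharedConcurrencyMetadataCache)
    private final Map<String, Object> sharedMetadataCache = new ConcurrentHashMap<>();

    public void put(String blobKey, byte[] data) { blobStore.put(blobKey, data); }
    public void cache(String coreKey, Object meta) { sharedMetadataCache.put(coreKey, meta); }

    /**
     * Best-effort collection delete: a prefix listing on an eventually
     * consistent store may miss files, so an orphaned-file sweeper is still
     * needed later. Returns the number of blob files actually deleted.
     */
    public int deleteCollection(String collectionName) {
        String prefix = collectionName + "/";
        List<String> keys = blobStore.keySet().stream()
                .filter(k -> k.startsWith(prefix))
                .collect(Collectors.toList());
        keys.forEach(blobStore::remove);
        // Requirement 2: evict the collection's local in-memory metadata so
        // the shard/collection name can be safely reused.
        sharedMetadataCache.keySet().removeIf(k -> k.startsWith(prefix));
        return keys.size();
    }
}
```

The cache eviction on the delete path mirrors the "clean up on creation of similarly named replicas" idea: either hook works, but without one of them the entries linger until node restart.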
I'd prefer to refactor the BlobDeleteManager and extend its capability beyond the async deletions it does now, but I'm unlikely to reuse the same queue we've established: the single deletion process won't scale with more collections/shards per Solr node, and a Collection:Delete API call is likely a higher-priority task than the files being deleted on the indexing flow (which is also not async).

> Support shard/collection deletion in shared storage
> ---------------------------------------------------
>
>          Key: SOLR-14044
>          URL: https://issues.apache.org/jira/browse/SOLR-14044
>      Project: Solr
>   Issue Type: Sub-task
>   Components: SolrCloud
>     Reporter: Andy Vuong
>     Priority: Major
>
> The Solr Cloud deletion APIs for collections and shards are not currently supported by shared storage but are an essential functionality required by the shared storage design. Deletion of objects from shared storage currently only happens in the indexing path (on pushes) and after the index file listings between the local Solr process and the external store have been resolved.
>
> This task is to track supporting the delete shard/collection API commands; its scope does not include cleaning up so-called "orphaned" index files from blob (i.e. files that are no longer referenced by any core.metadata file on the external store). That will be designed/covered in another subtask.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
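The prioritized deleter described in the comment could be sketched with a priority queue in which collection-level deletes preempt the per-file deletes enqueued on the indexing flow. This is a hypothetical sketch under assumed names (PrioritizedBlobDeleter, DeleteTask, and the two priority levels are illustrative, not the actual BlobDeleteManager refactor):

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: a deletion queue where an explicit Collection:Delete
// API call outranks routine file clean-up from the indexing flow. Names are
// illustrative and do not reflect Solr's actual BlobDeleteManager.
class PrioritizedBlobDeleter {
    // Enum ordinal doubles as priority: lower ordinal is dequeued first.
    public enum Priority { COLLECTION_DELETE, INDEXING_CLEANUP }

    public static final class DeleteTask implements Comparable<DeleteTask> {
        final Priority priority;
        final String blobKey;

        public DeleteTask(Priority priority, String blobKey) {
            this.priority = priority;
            this.blobKey = blobKey;
        }

        @Override
        public int compareTo(DeleteTask other) {
            return priority.compareTo(other.priority);
        }
    }

    private final PriorityBlockingQueue<DeleteTask> queue = new PriorityBlockingQueue<>();

    public void enqueue(DeleteTask task) { queue.add(task); }

    /** Blocks until a task is available; highest-priority task first. */
    public DeleteTask next() throws InterruptedException { return queue.take(); }
}
```

A queue per priority class (or a pool of worker threads draining this one queue) would also address the scaling concern, since a single drain loop remains a bottleneck as collections/shards per node grow.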