mbwaheed commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r356424243
##########
File path: solr/core/src/java/org/apache/solr/store/blob/metadata/ServerSideMetadata.java
##########
@@ -132,35 +127,15 @@ public ServerSideMetadata(String coreName, CoreContainer container, boolean take
       generation = latestCommit.getGeneration();
       latestCommitFiles = latestCommitBuilder.build();
-      // Capture now the hash and verify again if we need to pull content from the Blob store into this directory,
-      // to make sure there are no local changes at the same time that might lead to a corruption in case of interaction
-      // with the download.
-      // TODO: revise with "design assumptions around pull pipeline" mentioned in allCommits TODO below
+      // Capture now the hash and verify again after files have been pulled and before the directory is updated (or before
+      // the index is switched to use a new directory) to make sure there are no local changes at the same time that might
+      // lead to a corruption in case of interaction with the download or might be a sign of other problems (it is not
+      // expected that indexing can happen on a local directory of a SHARED replica if that replica is not up to date with
+      // the Blob store version).
       directoryHash = getSolrDirectoryHash(coreDir);
-      allCommitsFiles = latestCommitFiles;
-      // TODO: allCommits was added to detect special cases where inactive file segments can potentially conflict
-      // with whats in shared store. But given the recent understanding of semantics around index directory locks
-      // we need to revise our design assumptions around pull pipeline, including this one.
-      // Disabling this for now so that unreliability around introspection of older commits
-      // might not get in the way of steady state indexing.
-//      // A note on listCommits says that it does not guarantee consistent results if a commit is in progress.
-//      // But in blob context we serialize commits and pulls by proper locking therefore we should be good here.
-//      List<IndexCommit> allCommits = DirectoryReader.listCommits(coreDir);
-//
-//      // we should always have a commit point as verified in the beginning of this method.
-//      assert (allCommits.size() > 1) || (allCommits.size() == 1 && allCommits.get(0).equals(latestCommit));
-//
-//      // optimization: normally we would only be dealing with one commit point. In that case just reuse latest commit files builder.
-//      ImmutableCollection.Builder<CoreFileData> allCommitsBuilder = latestCommitBuilder;
-//      if (allCommits.size() > 1) {
-//        allCommitsBuilder = new ImmutableSet.Builder<>();
-//        for (IndexCommit commit : allCommits) {
-//          // no snapshot for inactive segments files
-//          buildCommitFiles(coreDir, commit, allCommitsBuilder, /* snapshotDir */ null);
-//        }
-//      }
-//      allCommitsFiles = allCommitsBuilder.build();
+      // Need to inventory all local files in case files that need to be pulled from Blob conflict with them.
+      allFiles = ImmutableSet.copyOf(coreDir.listAll());

Review comment:
   coreDir.listAll() is called twice: once here and once above inside getSolrDirectoryHash(). This can be optimized.
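   A minimal sketch of the suggested optimization, assuming getSolrDirectoryHash can be refactored to work off a precomputed directory listing instead of calling coreDir.listAll() itself (the two-argument signature below is hypothetical, not the current method):

      // List the directory once and reuse the result for both the hash and the file inventory.
      String[] localFiles = coreDir.listAll();

      // Hypothetical variant of getSolrDirectoryHash that takes the precomputed listing so it
      // does not need to call coreDir.listAll() a second time internally.
      directoryHash = getSolrDirectoryHash(coreDir, localFiles);

      // Need to inventory all local files in case files that need to be pulled from Blob conflict with them.
      allFiles = ImmutableSet.copyOf(localFiles);

   Besides saving a directory listing, a single listAll() call also guarantees the hash and the inventory are computed from the same snapshot of the directory contents.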