[GitHub] [pinot] kishoreg commented on a change in pull request #7301: mark and sweep indices for V3 segment format

GitBox Sun, 15 Aug 2021 22:13:55 -0700


kishoreg commented on a change in pull request #7301:
URL: https://github.com/apache/pinot/pull/7301#discussion_r689242680




##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java
##########
@@ -327,51 +336,176 @@ private void persistIndexMap(IndexEntry entry)
       throws IOException {
     File mapFile = new File(_segmentDirectory, V1Constants.INDEX_MAP_FILE_NAME);
     try (PrintWriter writer = new PrintWriter(new BufferedWriter(new FileWriter(mapFile, true)))) {
-      String startKey = getKey(entry.key.name, entry.key.type.getIndexName(), true);
-
-      StringBuilder sb = new StringBuilder();
-      sb.append(startKey).append(" = ").append(entry.startOffset);
-      writer.println(sb.toString());
-
-      String endKey = getKey(entry.key.name, entry.key.type.getIndexName(), false);
-      sb = new StringBuilder();
-      sb.append(endKey).append(" = ").append(entry.size);
-      writer.println(sb.toString());
+      persistIndexMap(entry, writer);
     }
   }
 
-  private String getKey(String column, String indexName, boolean isStartOffset) {
-    return column + MAP_KEY_SEPARATOR + indexName + MAP_KEY_SEPARATOR + (isStartOffset ? "startOffset" : "size");
-  }
-
   private String allocationContext(IndexKey key) {
     return this.getClass().getSimpleName() + key.toString();
   }
 
+  /**
+   * This method sweeps the indices marked for removal. Exception is simply bubbled up w/o
+   * trying to recover disk states from failure. This method is expected to run during segment
+   * reloading, which has failure handling by creating a backup folder before conduct reloading.
+   */
+  private void cleanupRemovedIndices()
+      throws IOException {
+    if (!_shouldCleanupRemovedIndices) {
+      return;
+    }
+
+    // To keep track of indices to be retained and put them together
+    // compactly in the new index file.
+    long nextOffset = 0;
+    List<IndexEntry> retained = new ArrayList<>();
+    File tmpIdxFile = new File(_segmentDirectory, V1Constants.INDEX_FILE_NAME + ".tmp");
+
+    // With FileChannel, we can seek to the data flexibly.
+    try (FileChannel srcCh = new RandomAccessFile(_indexFile, "r").getChannel();

Review comment:
       I was thinking more of using the ColumnIndexType enum and iterating over
all the columns. In other words, this class should not refer to specific index
types such as DICTIONARY or FORWARD INDEX. I see that it was already referring
to LUCENE before your change, which already breaks the abstraction. cc
@siddharthteotia we should try to remove the LUCENE reference from this class.
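
To illustrate the suggestion, here is a minimal sketch of iterating over every column and every enum value instead of naming individual index types. It is not the code from this PR: the nested ColumnIndexType below is a stand-in with only a few illustrative values, and IndexSweepSketch, key(), retainedKeys() and the "<column>.<indexName>" key format are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexSweepSketch {
  // Stand-in for Pinot's ColumnIndexType; the values listed here are illustrative only.
  enum ColumnIndexType {
    DICTIONARY("dictionary"),
    FORWARD_INDEX("forward_index"),
    INVERTED_INDEX("inverted_index");

    private final String _indexName;

    ColumnIndexType(String indexName) {
      _indexName = indexName;
    }

    String getIndexName() {
      return _indexName;
    }
  }

  // Hypothetical key format for this sketch: "<column>.<indexName>".
  static String key(String column, ColumnIndexType type) {
    return column + "." + type.getIndexName();
  }

  // Collects the keys of indices that exist and are not marked for removal.
  // No specific index type is named: the loop covers whatever the enum defines.
  static List<String> retainedKeys(List<String> columns, Set<String> existing, Set<String> removed) {
    List<String> retained = new ArrayList<>();
    for (String column : columns) {
      for (ColumnIndexType type : ColumnIndexType.values()) {
        String k = key(column, type);
        if (existing.contains(k) && !removed.contains(k)) {
          retained.add(k);
        }
      }
    }
    return retained;
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("colA", "colB");
    Set<String> existing = new HashSet<>(Arrays.asList(
        "colA.dictionary", "colA.forward_index", "colA.inverted_index", "colB.forward_index"));
    Set<String> removed = new HashSet<>(Arrays.asList("colA.inverted_index"));
    // Prints [colA.dictionary, colA.forward_index, colB.forward_index]
    System.out.println(retainedKeys(columns, existing, removed));
  }
}
```

With this shape, adding a new index type to the enum would not require touching the sweep logic, which is the abstraction the comment above is asking to preserve.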




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


