klsince commented on a change in pull request #7301:
URL: https://github.com/apache/pinot/pull/7301#discussion_r689702332



##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java
##########
@@ -327,51 +336,176 @@ private void persistIndexMap(IndexEntry entry)
       throws IOException {
     File mapFile = new File(_segmentDirectory, 
V1Constants.INDEX_MAP_FILE_NAME);
     try (PrintWriter writer = new PrintWriter(new BufferedWriter(new 
FileWriter(mapFile, true)))) {
-      String startKey = getKey(entry.key.name, entry.key.type.getIndexName(), 
true);
-
-      StringBuilder sb = new StringBuilder();
-      sb.append(startKey).append(" = ").append(entry.startOffset);
-      writer.println(sb.toString());
-
-      String endKey = getKey(entry.key.name, entry.key.type.getIndexName(), 
false);
-      sb = new StringBuilder();
-      sb.append(endKey).append(" = ").append(entry.size);
-      writer.println(sb.toString());
+      persistIndexMap(entry, writer);
     }
   }
 
-  private String getKey(String column, String indexName, boolean 
isStartOffset) {
-    return column + MAP_KEY_SEPARATOR + indexName + MAP_KEY_SEPARATOR + 
(isStartOffset ? "startOffset" : "size");
-  }
-
   private String allocationContext(IndexKey key) {
     return this.getClass().getSimpleName() + key.toString();
   }
 
+  /**
+   * This method sweeps the indices marked for removal. Exception is simply 
bubbled up w/o
+   * trying to recover disk states from failure. This method is expected to 
run during segment
+   * reloading, which has failure handling by creating a backup folder before 
conduct reloading.
+   */
+  private void cleanupRemovedIndices()
+      throws IOException {
+    if (!_shouldCleanupRemovedIndices) {
+      return;
+    }
+
+    // To keep track of indices to be retained and put them together
+    // compactly in the new index file.
+    long nextOffset = 0;
+    List<IndexEntry> retained = new ArrayList<>();
+    File tmpIdxFile = new File(_segmentDirectory, V1Constants.INDEX_FILE_NAME 
+ ".tmp");
+
+    // With FileChannel, we can seek to the data flexibly.
+    try (FileChannel srcCh = new RandomAccessFile(_indexFile, 
"r").getChannel();

Review comment:
       One side effect to loop over all column names and all index types in 
ColumnIndexType enum is that the data layout of index file will change to: 
either key'ed by column name or key'ed by index type (depends on which is the 
outer loop). But today the data layout is all columns' dictionary+forward data 
followed by other index data, like below (which I think is determined by the 
logic in the 
[SegmentV1V2ToV3FormatConverter](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/converter/SegmentV1V2ToV3FormatConverter.java#L150),
 and it explicitly refers to index type for the layout)
   ```
   [col1.dictionary]
   [col1.forward index]
   [col2.dictionary]
   [col2.forward index]
   [col3.forward index] // e.g. non-dictionary encoded
   --------------------------
   [col1.inverted index]
   [col2.json index]
   [col1.bloom filter]
   ...
   ```
   
   I don't find there is logic assumes the data layout has to like the above, 
but was trying to keep it just in case. If there is no risk about changing data 
layout, I can remove the direct reference to specific index type.
   
   P.S. the code in 
[SegmentV1V2ToV3FormatConverter](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/converter/SegmentV1V2ToV3FormatConverter.java#L150)
 seems having a bug as it doesn't copy FST, H3 and BloomFilter index during 
conversion. This shouldn't affect correctness, as Pinot server would 
re-generate them upon reloading the segment, just that it incurs unnecessary 
overhead.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to