Copilot commented on code in PR #17998:
URL: https://github.com/apache/pinot/pull/17998#discussion_r3002605044


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -210,6 +227,9 @@ private void 
createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col
       // Clean up any existing text index files
       cleanupExistingTextIndexFiles(indexDir, columnName);
     }
+    TextIndexUtils.cleanupTextIndex(indexDir, columnName);
+    FileUtils.deleteQuietly(inProgress);
+    FileUtils.touch(inProgress);

Review Comment:
   Text index files are created under the versioned segment directory (e.g. 
`<indexDir>/v3`), but this cleanup is running against the top-level `indexDir`. 
For v3 segments this will not remove existing Lucene/native text index 
artifacts, and `LuceneTextIndexCreator` may append to an existing index 
(default OpenMode is CREATE_OR_APPEND), producing incorrect results. Clean up 
using the same `segmentDirectory` that is later passed to the index creator 
(and consider also cleaning top-level only for legacy files/markers if needed).



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -384,4 +394,15 @@ private void 
convertTextIndexToV3Format(SegmentDirectory.Writer segmentWriter, S
 
     LOGGER.info("Successfully converted text index to V3 combined format for 
column: {}", columnName);
   }
+
+  private Set<String> getColumnsWithLegacyNativeTextIndex() {
+    File indexDir = _segmentDirectory.getSegmentMetadata().getIndexDir();
+    Set<String> columns = new HashSet<>();
+    for (String column : 
_segmentDirectory.getSegmentMetadata().getAllColumns()) {
+      if (new File(indexDir, column + 
V1Constants.Indexes.NATIVE_TEXT_INDEX_FILE_EXTENSION).exists()) {
+        columns.add(column);
+      }
+    }
+    return columns;

Review Comment:
   `getColumnsWithLegacyNativeTextIndex()` only checks for 
`<indexDir>/<col>.nativetext.idx`. Older segments converted to v3 could have 
the legacy native text index file under `<indexDir>/v3/` (the previous v1/v2→v3 
converter explicitly copied native text index files there), so they would not 
be detected or cleaned up on reload. Consider checking both the top-level dir 
and the v3 subdirectory (or using a version-agnostic lookup similar to other 
SegmentDirectoryPaths helpers).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to