Re: [PR] Remove native text and FST index support [pinot]

via GitHub Fri, 27 Mar 2026 16:13:51 -0700


xiangfu0 commented on code in PR #17998:
URL: https://github.com/apache/pinot/pull/17998#discussion_r3003628001



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -210,6 +227,9 @@ private void 
createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col
       // Clean up any existing text index files
       cleanupExistingTextIndexFiles(indexDir, columnName);
     }
+    TextIndexUtils.cleanupTextIndex(indexDir, columnName);
+    FileUtils.deleteQuietly(inProgress);
+    FileUtils.touch(inProgress);

Review Comment:
   Updated `TextIndexHandler` to clean both the top-level segment root and the 
versioned segment directory before recreating v3 text indexes. I also added a 
v3 regression in `SegmentPreProcessorTest.testV3ReplaceLegacyNativeTextIndex` 
that drops a stale file into the Lucene text index dir and verifies reload 
removes it before rebuild.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/LuceneFSTIndexReader.java:
##########
@@ -77,4 +84,68 @@ public void close()
       throws IOException {
     // Do Nothing
   }
+
+  private static List<Long> regexMatch(String regexQuery, FST<Long> fst)
+      throws IOException {
+    Automaton automaton = new RegExp(regexQuery).toAutomaton();
+    if (automaton.getNumStates() == 0) {
+      return Collections.emptyList();
+    }
+
+    List<Path<Long>> queue = new ArrayList<>();
+    List<Long> matchingDictIds = new ArrayList<>();
+    queue.add(new Path<>(0, fst.getFirstArc(new FST.Arc<>()), 
fst.outputs.getNoOutput(), new IntsRefBuilder()));
+
+    FST.Arc<Long> scratchArc = new FST.Arc<>();
+    FST.BytesReader fstReader = fst.getBytesReader();
+    Transition transition = new Transition();
+    while (!queue.isEmpty()) {
+      Path<Long> path = queue.remove(queue.size() - 1);
+      if (automaton.isAccept(path._state) && path._fstNode.isFinal()) {
+        matchingDictIds.add(path._output);
+      }

Review Comment:
   Updated `LuceneFSTIndexReader` to include `nextFinalOutput()` when a final 
arc matches, matching the IFST reader. I also added exact-match coverage in 
`LuceneFSTIndexCreatorTest` so the final-output path is exercised directly.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -384,4 +394,15 @@ private void 
convertTextIndexToV3Format(SegmentDirectory.Writer segmentWriter, S
 
     LOGGER.info("Successfully converted text index to V3 combined format for 
column: {}", columnName);
   }
+
+  private Set<String> getColumnsWithLegacyNativeTextIndex() {
+    File indexDir = _segmentDirectory.getSegmentMetadata().getIndexDir();
+    Set<String> columns = new HashSet<>();
+    for (String column : 
_segmentDirectory.getSegmentMetadata().getAllColumns()) {
+      if (new File(indexDir, column + 
V1Constants.Indexes.NATIVE_TEXT_INDEX_FILE_EXTENSION).exists()) {
+        columns.add(column);
+      }
+    }
+    return columns;

Review Comment:
   Handled in the same patch. Legacy native text detection now checks both the 
segment root and the versioned v3 directory, and the new v3 regression test 
covers a legacy native text file under the versioned directory.



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1337,6 +1338,37 @@ public void testV1CleanupIndices()
     assertFalse(bfFile.exists());
   }
 
+  @Test
+  public void testV1ReplaceLegacyNativeFstIndex()
+      throws Exception {
+    buildV1Segment();
+
+    String strColumn = "column3";
+    _fieldConfigMap.put(strColumn, new FieldConfig(strColumn, 
FieldConfig.EncodingType.DICTIONARY,
+        List.of(FieldConfig.IndexType.FST), null, null));
+
+    File fstFile = new File(INDEX_DIR, strColumn + 
V1Constants.Indexes.LUCENE_V912_FST_INDEX_FILE_EXTENSION);
+    int legacyNativeFstMagic = ('\\' << 24) | ('f' << 16) | ('s' << 8) | 'a';
+

Review Comment:
   Reused the production value here by exposing 
`FstIndexUtils.LEGACY_NATIVE_FST_MAGIC` and switched the test to use that 
constant instead of duplicating the magic number inline.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Remove native text and FST index support [pinot]

Reply via email to