Copilot commented on code in PR #17998:
URL: https://github.com/apache/pinot/pull/17998#discussion_r3003347566


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1337,6 +1338,37 @@ public void testV1CleanupIndices()
     assertFalse(bfFile.exists());
   }
 
+  @Test
+  public void testV1ReplaceLegacyNativeFstIndex()
+      throws Exception {
+    buildV1Segment();
+
+    String strColumn = "column3";
+    _fieldConfigMap.put(strColumn, new FieldConfig(strColumn, 
FieldConfig.EncodingType.DICTIONARY,
+        List.of(FieldConfig.IndexType.FST), null, null));
+
+    File fstFile = new File(INDEX_DIR, strColumn + 
V1Constants.Indexes.LUCENE_V912_FST_INDEX_FILE_EXTENSION);
+    int legacyNativeFstMagic = ('\\' << 24) | ('f' << 16) | ('s' << 8) | 'a';
+

Review Comment:
   This test redefines the legacy native FST magic value inline. To keep the 
definition consistent with production logic (and avoid future drift), consider 
reusing the constant from FstIndexUtils (or exposing it for tests) instead of 
duplicating the bit-shift expression here.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/LuceneFSTIndexReader.java:
##########
@@ -77,4 +84,68 @@ public void close()
       throws IOException {
     // Do Nothing
   }
+
+  private static List<Long> regexMatch(String regexQuery, FST<Long> fst)
+      throws IOException {
+    Automaton automaton = new RegExp(regexQuery).toAutomaton();
+    if (automaton.getNumStates() == 0) {
+      return Collections.emptyList();
+    }
+
+    List<Path<Long>> queue = new ArrayList<>();
+    List<Long> matchingDictIds = new ArrayList<>();
+    queue.add(new Path<>(0, fst.getFirstArc(new FST.Arc<>()), 
fst.outputs.getNoOutput(), new IntsRefBuilder()));
+
+    FST.Arc<Long> scratchArc = new FST.Arc<>();
+    FST.BytesReader fstReader = fst.getBytesReader();
+    Transition transition = new Transition();
+    while (!queue.isEmpty()) {
+      Path<Long> path = queue.remove(queue.size() - 1);
+      if (automaton.isAccept(path._state) && path._fstNode.isFinal()) {
+        matchingDictIds.add(path._output);
+      }

Review Comment:
   In the new regexMatch implementation, when a path reaches a final FST arc 
you add `path._output` directly. Lucene FST outputs can also have a non-empty 
`nextFinalOutput()` on final arcs, so this can return incorrect dictIds for 
some compiled FSTs. Include the arc’s `nextFinalOutput()` (when present) in the 
accumulated output before adding to `matchingDictIds` (similar to how 
LuceneIFSTIndexReader handles final output).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to