Tony-X opened a new issue, #12906: URL: https://github.com/apache/lucene/issues/12906
### Description as part of https://github.com/apache/lucene/pull/12688, I'm trying to develop an algorithm that can intersect FST and FSA efficiently. For FSA this means we can leverage the sorted transitions of a given state and perform binary search in order to advance to a target. So the algorithm uses `TransitionAccessor` and there are three relevant APIs ``` int initTransition(int state, Transition t); /** Iterate to the next transition after the provided one */ void getNextTransition(Transition t); /** * Fill the provided {@link Transition} with the index'th transition leaving the specified state. */ void getTransition(int state, int index, Transition t); ``` One could call `initTransition` followed by many `getNextTransition` to iterate the transitions one by one. Or in my case, Binary search needs to use `getTransition` to randomly access the middle point. I debugged for hours and realized for a given test case my algo works as expected but the transitions I got were messed up. The original tests uses quite complicated automatons so I tried to find a small and simple enough to exhibit the behavior. Here is the test ``` public void testAutomaton() { RegExp regExp = new RegExp("+*.", RegExp.NONE); Automaton a = regExp.toAutomaton(); CompiledAutomaton compiledAutomaton = new CompiledAutomaton(a); System.out.println("isFinite: " + compiledAutomaton.finite); var byteRunnable = compiledAutomaton.getByteRunnable(); var transitionAccessor = compiledAutomaton.getTransitionAccessor(); // dfsAutomaton(byteRunnable, transitionAccessor, 0, ""); // dumpTransitionsViaNext(byteRunnable, transitionAccessor, 0, new HashSet<>()); dumpTransitionsViaRA(byteRunnable, transitionAccessor, 0, new HashSet<>()); } void dumpTransitionsViaNext(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) { if (seenStates.contains(currentState)) { return; } seenStates.add(currentState); var t = new Transition(); var numStates = transitionAccessor.initTransition(currentState, t); for (int i = 0; i < numStates; i++) { transitionAccessor.getNextTransition(t); System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] " + "dest: " + t.dest + " is dest accept: " + (a.isAccept(t.dest) ? "yes" : "no")); dumpTransitionsViaNext(a, transitionAccessor, t.dest, seenStates); } } void dumpTransitionsViaRA(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) { if (seenStates.contains(currentState)) { return; } seenStates.add(currentState); var t = new Transition(); var numStates = transitionAccessor.initTransition(currentState, t); // transitionAccessor.getTransition(currentState, numStates - 1, t); for (int i = 0; i < numStates; i++) { transitionAccessor.getTransition(currentState, i, t); System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] " + "dest: " + t.dest + " is dest accept: " + (a.isAccept(t.dest) ? "yes" : "no")); dumpTransitionsViaRA(a, transitionAccessor, t.dest, seenStates); } } ``` `dumpTransitionsViaNext` gives expected transitions but `dumpTransitionsViaRA` produces a mess. ``` Via next At: src: 0 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes At: src: 0 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes At: src: 2 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes At: src: 2 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes At: src: 2 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes At: src: 2 arcIdx: 3 [194, 223] dest: 7 is dest accept: no At: src: 7 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 7 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 7 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 7 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 2 arcIdx: 4 [224, 239] dest: 8 is dest accept: no At: src: 8 arcIdx: 0 [128, 142] dest: 11 is dest accept: no At: src: 11 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 11 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 11 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 11 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 8 arcIdx: 1 [143, 143] dest: 11 is dest accept: no At: src: 8 arcIdx: 2 [144, 190] dest: 11 is dest accept: no At: src: 8 arcIdx: 3 [191, 191] dest: 11 is dest accept: no At: src: 2 arcIdx: 5 [240, 243] dest: 9 is dest accept: no At: src: 9 arcIdx: 0 [128, 142] dest: 12 is dest accept: no At: src: 12 arcIdx: 0 [128, 142] dest: 13 is dest accept: no At: src: 13 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 13 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 13 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 13 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 12 arcIdx: 1 [143, 143] dest: 13 is dest accept: no At: src: 12 arcIdx: 2 [144, 190] dest: 13 is dest accept: no At: src: 12 arcIdx: 3 [191, 191] dest: 13 is dest accept: no At: src: 9 arcIdx: 1 [143, 143] dest: 12 is dest accept: no At: src: 9 arcIdx: 2 [144, 190] dest: 12 is dest accept: no At: src: 9 arcIdx: 3 [191, 191] dest: 12 is dest accept: no At: src: 2 arcIdx: 6 [244, 244] dest: 10 is dest accept: no At: src: 10 arcIdx: 0 [128, 142] dest: 14 is dest accept: no At: src: 14 arcIdx: 0 [128, 142] dest: 16 is dest accept: no At: src: 16 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 16 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 16 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 16 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 14 arcIdx: 1 [143, 143] dest: 16 is dest accept: no At: src: 14 arcIdx: 2 [144, 190] dest: 16 is dest accept: no At: src: 14 arcIdx: 3 [191, 191] dest: 16 is dest accept: no At: src: 10 arcIdx: 1 [143, 143] dest: 15 is dest accept: no At: src: 15 arcIdx: 0 [128, 142] dest: 17 is dest accept: no At: src: 17 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 17 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 17 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 17 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 15 arcIdx: 1 [143, 143] dest: 17 is dest accept: no At: src: 15 arcIdx: 2 [144, 190] dest: 17 is dest accept: no At: src: 15 arcIdx: 3 [191, 191] dest: 18 is dest accept: no At: src: 18 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 18 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 18 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 18 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 0 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes At: src: 0 arcIdx: 3 [194, 223] dest: 3 is dest accept: no At: src: 3 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 3 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 3 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 3 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 0 arcIdx: 4 [224, 239] dest: 4 is dest accept: no At: src: 4 arcIdx: 0 [128, 142] dest: 19 is dest accept: no At: src: 19 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 19 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 19 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 19 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 4 arcIdx: 1 [143, 143] dest: 19 is dest accept: no At: src: 4 arcIdx: 2 [144, 190] dest: 19 is dest accept: no At: src: 4 arcIdx: 3 [191, 191] dest: 19 is dest accept: no At: src: 0 arcIdx: 5 [240, 243] dest: 5 is dest accept: no At: src: 5 arcIdx: 0 [128, 142] dest: 20 is dest accept: no At: src: 20 arcIdx: 0 [128, 142] dest: 21 is dest accept: no At: src: 21 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 21 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 21 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 21 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 20 arcIdx: 1 [143, 143] dest: 21 is dest accept: no At: src: 20 arcIdx: 2 [144, 190] dest: 21 is dest accept: no At: src: 20 arcIdx: 3 [191, 191] dest: 21 is dest accept: no At: src: 5 arcIdx: 1 [143, 143] dest: 20 is dest accept: no At: src: 5 arcIdx: 2 [144, 190] dest: 20 is dest accept: no At: src: 5 arcIdx: 3 [191, 191] dest: 20 is dest accept: no At: src: 0 arcIdx: 6 [244, 244] dest: 6 is dest accept: no At: src: 6 arcIdx: 0 [128, 142] dest: 22 is dest accept: no At: src: 22 arcIdx: 0 [128, 142] dest: 24 is dest accept: no At: src: 24 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 24 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 24 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 24 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 22 arcIdx: 1 [143, 143] dest: 24 is dest accept: no At: src: 22 arcIdx: 2 [144, 190] dest: 24 is dest accept: no At: src: 22 arcIdx: 3 [191, 191] dest: 24 is dest accept: no At: src: 6 arcIdx: 1 [143, 143] dest: 23 is dest accept: no At: src: 23 arcIdx: 0 [128, 142] dest: 25 is dest accept: no At: src: 25 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 25 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 25 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 25 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes At: src: 23 arcIdx: 1 [143, 143] dest: 25 is dest accept: no At: src: 23 arcIdx: 2 [144, 190] dest: 25 is dest accept: no At: src: 23 arcIdx: 3 [191, 191] dest: 26 is dest accept: no At: src: 26 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes At: src: 26 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes At: src: 26 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes At: src: 26 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes Via RA isFinite: false At: src: 0 arcIdx: 0 [0, 42] dest: 0 is dest accept: no At: src: 0 arcIdx: 1 [43, 43] dest: 0 is dest accept: no At: src: 0 arcIdx: 2 [44, 127] dest: 0 is dest accept: no At: src: 0 arcIdx: 3 [194, 223] dest: 0 is dest accept: no At: src: 0 arcIdx: 4 [224, 239] dest: 0 is dest accept: no At: src: 0 arcIdx: 5 [240, 243] dest: 0 is dest accept: no At: src: 0 arcIdx: 6 [244, 244] dest: 0 is dest accept: no ``` I shared this with @zhaih and he seemed to have an idea of the fix. ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org