Tony-X opened a new issue, #12906:
URL: https://github.com/apache/lucene/issues/12906

   ### Description
   
   as part of https://github.com/apache/lucene/pull/12688,  
   I'm trying to develop an algorithm that can intersect FST and FSA 
efficiently. For FSA this means we can leverage the sorted transitions of a 
given state and perform binary search in order to advance to a target.
   
   So the algorithm uses `TransitionAccessor` and there are three relevant APIs
   
   ```
     int initTransition(int state, Transition t);
   
     /** Iterate to the next transition after the provided one */
     void getNextTransition(Transition t);
   
     /**
      * Fill the provided {@link Transition} with the index'th transition 
leaving the specified state.
      */
     void getTransition(int state, int index, Transition t);
   ```
   
   One could call `initTransition` followed by many `getNextTransition` to 
iterate the transitions one by one. Or in my case, Binary search needs to use 
`getTransition` to randomly access the middle point. 
   
   I debugged for hours and realized for a given test case my algo works as 
expected but the transitions I got were messed up. The original tests uses 
quite complicated automatons so I tried to find a small and simple enough to 
exhibit the behavior. Here is the test 
   
   ```
     public void testAutomaton() {
       RegExp regExp = new RegExp("+*.", RegExp.NONE);
       Automaton a = regExp.toAutomaton();
       CompiledAutomaton compiledAutomaton = new CompiledAutomaton(a);
       System.out.println("isFinite: " + compiledAutomaton.finite);
   
       var byteRunnable = compiledAutomaton.getByteRunnable();
       var transitionAccessor = compiledAutomaton.getTransitionAccessor();
       // dfsAutomaton(byteRunnable, transitionAccessor, 0, "");
       // dumpTransitionsViaNext(byteRunnable, transitionAccessor, 0, new 
HashSet<>());
       dumpTransitionsViaRA(byteRunnable, transitionAccessor, 0, new 
HashSet<>());
     }
   
     void dumpTransitionsViaNext(ByteRunnable a, TransitionAccessor 
transitionAccessor, int currentState, Set<Integer> seenStates) {
       if (seenStates.contains(currentState)) {
         return;
       }
   
       seenStates.add(currentState);
   
       var t = new Transition();
       var numStates = transitionAccessor.initTransition(currentState, t);
   
       for (int i = 0; i < numStates; i++) {
         transitionAccessor.getNextTransition(t);
         System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max 
+ "] "
                 + "dest: " + t.dest + " is dest accept: "
                 + (a.isAccept(t.dest) ? "yes" : "no"));
         dumpTransitionsViaNext(a, transitionAccessor, t.dest, seenStates);
       }
     }
   
     void dumpTransitionsViaRA(ByteRunnable a, TransitionAccessor 
transitionAccessor, int currentState, Set<Integer> seenStates) {
       if (seenStates.contains(currentState)) {
         return;
       }
   
       seenStates.add(currentState);
   
       var t = new Transition();
       var numStates = transitionAccessor.initTransition(currentState, t);
   
       // transitionAccessor.getTransition(currentState, numStates - 1, t);
       for (int i = 0; i < numStates; i++) {
         transitionAccessor.getTransition(currentState, i, t);
         System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max 
+ "] "
                 + "dest: " + t.dest + " is dest accept: "
                 + (a.isAccept(t.dest) ? "yes" : "no"));
         dumpTransitionsViaRA(a, transitionAccessor, t.dest, seenStates);
       }
     }
   ```
   
   `dumpTransitionsViaNext` gives expected transitions but 
`dumpTransitionsViaRA` produces a mess. 
   
   
   
   ```
   Via next
   At: src: 0 arcIdx: 0  [0, 42] dest: 1 is dest accept: yes
   At: src: 0 arcIdx: 1  [43, 43] dest: 2 is dest accept: yes
   At: src: 2 arcIdx: 0  [0, 42] dest: 1 is dest accept: yes
   At: src: 2 arcIdx: 1  [43, 43] dest: 2 is dest accept: yes
   At: src: 2 arcIdx: 2  [44, 127] dest: 1 is dest accept: yes
   At: src: 2 arcIdx: 3  [194, 223] dest: 7 is dest accept: no
   At: src: 7 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 7 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 7 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 7 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 2 arcIdx: 4  [224, 239] dest: 8 is dest accept: no
   At: src: 8 arcIdx: 0  [128, 142] dest: 11 is dest accept: no
   At: src: 11 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 11 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 11 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 11 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 8 arcIdx: 1  [143, 143] dest: 11 is dest accept: no
   At: src: 8 arcIdx: 2  [144, 190] dest: 11 is dest accept: no
   At: src: 8 arcIdx: 3  [191, 191] dest: 11 is dest accept: no
   At: src: 2 arcIdx: 5  [240, 243] dest: 9 is dest accept: no
   At: src: 9 arcIdx: 0  [128, 142] dest: 12 is dest accept: no
   At: src: 12 arcIdx: 0  [128, 142] dest: 13 is dest accept: no
   At: src: 13 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 13 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 13 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 13 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 12 arcIdx: 1  [143, 143] dest: 13 is dest accept: no
   At: src: 12 arcIdx: 2  [144, 190] dest: 13 is dest accept: no
   At: src: 12 arcIdx: 3  [191, 191] dest: 13 is dest accept: no
   At: src: 9 arcIdx: 1  [143, 143] dest: 12 is dest accept: no
   At: src: 9 arcIdx: 2  [144, 190] dest: 12 is dest accept: no
   At: src: 9 arcIdx: 3  [191, 191] dest: 12 is dest accept: no
   At: src: 2 arcIdx: 6  [244, 244] dest: 10 is dest accept: no
   At: src: 10 arcIdx: 0  [128, 142] dest: 14 is dest accept: no
   At: src: 14 arcIdx: 0  [128, 142] dest: 16 is dest accept: no
   At: src: 16 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 16 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 16 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 16 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 14 arcIdx: 1  [143, 143] dest: 16 is dest accept: no
   At: src: 14 arcIdx: 2  [144, 190] dest: 16 is dest accept: no
   At: src: 14 arcIdx: 3  [191, 191] dest: 16 is dest accept: no
   At: src: 10 arcIdx: 1  [143, 143] dest: 15 is dest accept: no
   At: src: 15 arcIdx: 0  [128, 142] dest: 17 is dest accept: no
   At: src: 17 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 17 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 17 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 17 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 15 arcIdx: 1  [143, 143] dest: 17 is dest accept: no
   At: src: 15 arcIdx: 2  [144, 190] dest: 17 is dest accept: no
   At: src: 15 arcIdx: 3  [191, 191] dest: 18 is dest accept: no
   At: src: 18 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 18 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 18 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 18 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 0 arcIdx: 2  [44, 127] dest: 1 is dest accept: yes
   At: src: 0 arcIdx: 3  [194, 223] dest: 3 is dest accept: no
   At: src: 3 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 3 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 3 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 3 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 0 arcIdx: 4  [224, 239] dest: 4 is dest accept: no
   At: src: 4 arcIdx: 0  [128, 142] dest: 19 is dest accept: no
   At: src: 19 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 19 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 19 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 19 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 4 arcIdx: 1  [143, 143] dest: 19 is dest accept: no
   At: src: 4 arcIdx: 2  [144, 190] dest: 19 is dest accept: no
   At: src: 4 arcIdx: 3  [191, 191] dest: 19 is dest accept: no
   At: src: 0 arcIdx: 5  [240, 243] dest: 5 is dest accept: no
   At: src: 5 arcIdx: 0  [128, 142] dest: 20 is dest accept: no
   At: src: 20 arcIdx: 0  [128, 142] dest: 21 is dest accept: no
   At: src: 21 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 21 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 21 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 21 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 20 arcIdx: 1  [143, 143] dest: 21 is dest accept: no
   At: src: 20 arcIdx: 2  [144, 190] dest: 21 is dest accept: no
   At: src: 20 arcIdx: 3  [191, 191] dest: 21 is dest accept: no
   At: src: 5 arcIdx: 1  [143, 143] dest: 20 is dest accept: no
   At: src: 5 arcIdx: 2  [144, 190] dest: 20 is dest accept: no
   At: src: 5 arcIdx: 3  [191, 191] dest: 20 is dest accept: no
   At: src: 0 arcIdx: 6  [244, 244] dest: 6 is dest accept: no
   At: src: 6 arcIdx: 0  [128, 142] dest: 22 is dest accept: no
   At: src: 22 arcIdx: 0  [128, 142] dest: 24 is dest accept: no
   At: src: 24 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 24 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 24 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 24 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 22 arcIdx: 1  [143, 143] dest: 24 is dest accept: no
   At: src: 22 arcIdx: 2  [144, 190] dest: 24 is dest accept: no
   At: src: 22 arcIdx: 3  [191, 191] dest: 24 is dest accept: no
   At: src: 6 arcIdx: 1  [143, 143] dest: 23 is dest accept: no
   At: src: 23 arcIdx: 0  [128, 142] dest: 25 is dest accept: no
   At: src: 25 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 25 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 25 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 25 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   At: src: 23 arcIdx: 1  [143, 143] dest: 25 is dest accept: no
   At: src: 23 arcIdx: 2  [144, 190] dest: 25 is dest accept: no
   At: src: 23 arcIdx: 3  [191, 191] dest: 26 is dest accept: no
   At: src: 26 arcIdx: 0  [128, 142] dest: 1 is dest accept: yes
   At: src: 26 arcIdx: 1  [143, 143] dest: 1 is dest accept: yes
   At: src: 26 arcIdx: 2  [144, 190] dest: 1 is dest accept: yes
   At: src: 26 arcIdx: 3  [191, 191] dest: 1 is dest accept: yes
   
   Via RA
   
   isFinite: false
   At: src: 0 arcIdx: 0  [0, 42] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 1  [43, 43] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 2  [44, 127] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 3  [194, 223] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 4  [224, 239] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 5  [240, 243] dest: 0 is dest accept: no
   At: src: 0 arcIdx: 6  [244, 244] dest: 0 is dest accept: no
   ```
   
   I shared this with @zhaih and he seemed to have an idea of the fix.
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to