dungba88 opened a new issue, #12697:
URL: https://github.com/apache/lucene/issues/12697

   ### Description
   
   After writing an FSTStore-backed FST with one DataOutput for the data and a different DataOutput for the metadata, reading it back through the public FST constructor fails with the following exception:
   
   ```
   java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
        at __randomizedtesting.SeedInfo.seed([CBCB30F6D2F8FEA1:821F24747AC56DDD]:0)
        at org.apache.lucene.store.ByteArrayDataInput.readVLong(ByteArrayDataInput.java:133)
        at org.apache.lucene.util.fst.FST.<init>(FST.java:494)
        at org.apache.lucene.util.fst.FST.<init>(FST.java:443)
   ```
   
   The cause is that, when the FST is backed by an FSTStore, numBytes is not written to the metadata output: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/FST.java#L555-L562
   
   Instead, the FSTStore writes numBytes to the main DataOutput: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java
   
   Thus, if metaOut and dataOut are the same DataOutput, numBytes ends up directly after the metadata and the FST reads back correctly. If they are different DataOutputs, however, metaOut lacks numBytes, so the reader runs past the end of the metadata and throws the index-out-of-bounds exception.
   
   To illustrate:
   
   When writing to the same DataOutput:
   
   ```
   [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ] [ MAIN ]
   ```
   
   When writing to different DataOutputs:
   
   ```
   metaOut: [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ]
   dataOut: [ NUM_BYTES ] [ MAIN ]
   ```
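
The two layouts can be reproduced with plain `java.io` streams. The sketch below is a simplified model, not Lucene code: the class `FstLayoutSketch`, its method names, and the field values are hypothetical, fixed-width ints and longs stand in for Lucene's actual encoding, and the optional EMPTY_OUTPUT field is omitted. It only illustrates why a reader that expects NUM_BYTES right after START_NODE succeeds with a shared output and runs off the end of the metadata with separate outputs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified model of the layouts above; it does not use any Lucene classes.
class FstLayoutSketch {

  // Stand-in for the metadata fields: header, empty-output flag, input type, start node.
  static void writeMeta(DataOutputStream metaOut) throws IOException {
    metaOut.writeInt(0xF57);  // hypothetical header value
    metaOut.writeByte(0);     // empty-output flag (0: no empty output follows)
    metaOut.writeByte(0);     // input type
    metaOut.writeLong(42L);   // start node
  }

  // Stand-in for the store-backed body: NUM_BYTES is written together with MAIN.
  static void writeBody(DataOutputStream dataOut, byte[] body) throws IOException {
    dataOut.writeLong(body.length); // NUM_BYTES
    dataOut.write(body);            // MAIN
  }

  // Stand-in for a reader that expects NUM_BYTES to follow START_NODE in the metadata.
  static long readNumBytesFromMeta(byte[] meta) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(meta));
    in.readInt();         // header
    in.readByte();        // empty-output flag
    in.readByte();        // input type
    in.readLong();        // start node
    return in.readLong(); // NUM_BYTES; throws EOFException if the metadata ends here
  }

  // Writes meta and body to the same or to separate outputs, then tries to read back.
  static boolean readable(boolean sameOutput) {
    try {
      ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
      ByteArrayOutputStream dataBytes = sameOutput ? metaBytes : new ByteArrayOutputStream();
      writeMeta(new DataOutputStream(metaBytes));
      writeBody(new DataOutputStream(dataBytes), new byte[] {1, 2, 3});
      return readNumBytesFromMeta(metaBytes.toByteArray()) == 3;
    } catch (EOFException e) {
      return false; // analogous to the ArrayIndexOutOfBoundsException in the report
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("same output readable: " + readable(true));
    System.out.println("separate outputs readable: " + readable(false));
  }
}
```

In this model the read succeeds only when both writers share one stream, which matches the behaviour described in the report.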
   
   I can put up a fix for this.
   
   ### Version and environment details
   
   _No response_


