mikemccand commented on issue #14429:
URL: https://github.com/apache/lucene/issues/14429#issuecomment-2773775763

   Phew, this is a spooky exception!
   
   I think it means that the same term was fed to the FST Builder twice in a row. 
 FST Builder in general can support this case: it means that a single 
input (term) can have multiple outputs, and the `Outputs` impl is supposed to be able 
to combine multiple outputs into a set (internally).  But you're right: in this 
context (BlockTree) the same term should never be added more than once, 
each term has a single output, and the `Outputs` impl used here does not support merging.  
It is indeed NOT supposed to happen!
   
   BlockTree is confusing in how it builds up its blocks.  It does it one 
sub-tree at a time, using intermediate FSTs to hold each sub-tree, and then 
regurgitating the terms from each subtree with `FSTTermsEnum`, adding them into 
a bigger FST Builder to combine multiple sub-trees into a single FST.  It keeps 
doing this up and up the terms trie until it gets to the empty string, and then that 
FST is the terms index.
   
   So .... somehow this regurgitation process added the same term twice in a 
row.  This means either a given `FSTTermsEnum` returned the same term twice in 
a row, or a term was somehow duplicated at the boundary (where one 
`FSTTermsEnum` from a sub-block ended and the next `FSTTermsEnum` began).
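
   To make the two cases concrete, here is a rough, self-contained sketch of 
the kind of check that would tell them apart.  It is NOT Lucene code: the class 
and method names are made up for illustration, and terms are plain `String`s 
in already-sorted lists standing in for what each sub-block's enum would return.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
final class DuplicateTermCheck {

    /** Describes the first term that was seen twice in a row. */
    static final class Duplicate {
        final String term;
        final int blockIndex;      // sub-block in which the repeat was observed
        final boolean atBoundary;  // true if the repeat is the first term of that sub-block

        Duplicate(String term, int blockIndex, boolean atBoundary) {
            this.term = term;
            this.blockIndex = blockIndex;
            this.atBoundary = atBoundary;
        }
    }

    /**
     * Replays the terms of each sub-block in order, as if adding them to one
     * combined builder, and returns the first consecutive repeat (or null).
     */
    static Duplicate firstConsecutiveDuplicate(List<List<String>> subBlocks) {
        String previous = null;
        for (int b = 0; b < subBlocks.size(); b++) {
            List<String> terms = subBlocks.get(b);
            for (int i = 0; i < terms.size(); i++) {
                String term = terms.get(i);
                if (term.equals(previous)) {
                    // i == 0: the previous term came from an earlier sub-block
                    return new Duplicate(term, b, i == 0);
                }
                previous = term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Duplicate d = firstConsecutiveDuplicate(
            List.of(List.of("a", "b"), List.of("b", "c")));
        System.out.println(d.term + " block=" + d.blockIndex
            + " atBoundary=" + d.atBoundary);
    }
}
```

   If `atBoundary` is true the repeat straddles two sub-blocks; otherwise a 
single enumeration produced the same term twice.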
   
   Do we know any fun details about the use case?  Maybe an exotic/old JVM?  
Massive numbers of terms...?  Or the terms are some crazy binary gene sequences 
or something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

