mikemccand commented on PR #12985:
URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934057526

   `BlockTree` is kinda crazy in how it builds up the final FST: each little 
backwards-recursive chunk of term space stores its subset of terms into a baby 
FST, and then, on grouping N such child blocks into a new parent block, it 
decodes those N baby FSTs and appends them into a new big-baby FST.  This 
backwards-recursive process continues for all chunks of term space, finally 
ending in the root block, where all the giant-baby FSTs are logically appended 
to one another to make the final FST.
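
   To make the repeated copying concrete, here is a toy sketch of that 
backwards-recursive merge.  It does not use any Lucene class: `BabyIndex`, 
`leaf` and `mergeChildren` are hypothetical names, and a plain list of terms 
stands in for a real FST.  It only illustrates that every level of grouping 
re-reads its children and appends their entries into a new structure.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockMergeSketch {
    /** Hypothetical stand-in for a per-block "baby FST": just the terms it indexes. */
    static final class BabyIndex {
        final List<String> entries = new ArrayList<>();
    }

    /** Counts how many entries were re-read and re-appended while merging. */
    static long entriesCopied = 0;

    /** Build the index for one leaf chunk of term space. */
    static BabyIndex leaf(List<String> terms) {
        BabyIndex idx = new BabyIndex();
        idx.entries.addAll(terms);
        return idx;
    }

    /** Group N child blocks into a parent: decode each child and append it to a new index. */
    static BabyIndex mergeChildren(List<BabyIndex> children) {
        BabyIndex parent = new BabyIndex();
        for (BabyIndex child : children) {
            for (String entry : child.entries) {
                parent.entries.add(entry);
                entriesCopied++;
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        BabyIndex a1 = leaf(List.of("aa", "ab"));
        BabyIndex a2 = leaf(List.of("ac", "ad"));
        BabyIndex b = leaf(List.of("b"));

        // The two a* leaves are grouped first, then the root groups a* with b.
        BabyIndex aParent = mergeChildren(List.of(a1, a2));
        BabyIndex root = mergeChildren(List.of(aParent, b));

        System.out.println("root entries: " + root.entries.size());
        System.out.println("entries copied: " + entriesCopied);
    }
}
```

   In this toy run the root ends up with 5 entries but 9 entries were copied 
in total, because the four `a*` terms were appended once into their parent and 
again into the root.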
   
   This change makes only that final FST construction run off-heap, which is a 
good baby step.
   
   But what if in my index all terms start with, say, `a`?  I think that will 
mean we do on-heap construction of basically the full-sized FST?  Hmm, or will 
the root block have the `a` prefix pointing to it, so we will in fact build the 
whole FST off-heap, from baby FSTs for `aa*`, `ab*`, `ac*`, `ad*`, etc.?
   
   We might instead just switch to off-heap building once the expected FST size 
crosses a threshold?  We can use `createTempOutput` to make temporary files as 
needed for the non-root FSTs that are "too big"?
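
   A minimal sketch of that threshold idea, under loud assumptions: it uses 
only `java.nio.file` and does not call Lucene's `createTempOutput`; the class 
`SpillingOutput` and its methods are hypothetical; and it switches on bytes 
actually written rather than on an expected FST size.  Output stays in a heap 
buffer until a write would cross the threshold, then everything moves to a 
temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical output that buffers on heap and spills to a temp file past a threshold. */
public class SpillingOutput implements AutoCloseable {
    private final long thresholdBytes;
    private ByteArrayOutputStream heap = new ByteArrayOutputStream();
    private OutputStream file; // non-null once we have spilled to disk
    private Path tempPath;
    private long size;

    public SpillingOutput(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    public void write(byte[] bytes) throws IOException {
        // Spill before the write that would push the on-heap buffer past the threshold.
        if (file == null && size + bytes.length > thresholdBytes) {
            spill();
        }
        if (file != null) {
            file.write(bytes);
        } else {
            heap.write(bytes, 0, bytes.length);
        }
        size += bytes.length;
    }

    /** Copy what was buffered so far into a new temp file and drop the heap buffer. */
    private void spill() throws IOException {
        tempPath = Files.createTempFile("fst-sketch", ".tmp");
        file = Files.newOutputStream(tempPath);
        heap.writeTo(file);
        heap = null;
    }

    public boolean isOffHeap() {
        return file != null;
    }

    public long size() {
        return size;
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
            Files.deleteIfExists(tempPath);
        }
    }
}
```

   With a threshold of 8 bytes, a first 5-byte write stays on heap and a 
second 5-byte write triggers the spill, so small non-root FSTs would never 
touch disk and only the "too big" ones pay for a temporary file.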


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

