mikemccand commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934057526
`BlockTree` is kinda crazy in how it builds up the final FST. Each little backwards-recursive chunk of term space stores its subset of terms into a baby FST. Then, when N such child blocks are grouped into a new parent block, it decodes those N baby FSTs and appends them into a new big-baby FST. This backwards-recursive process continues across all chunks of term space and finally ends at the root block, where all the giant-baby FSTs are logically appended to one another to make the final FST.

This change makes only that final FST construction run off-heap, which is a good baby step. But what if all terms in my index start with, say, `a`? I think that would mean we do on-heap construction of basically the full-sized FST? Hmm, or will the root block have the `a` prefix pointing to it, so that we in fact build the whole FST off-heap, from the baby FSTs for `aa*`, `ab*`, `ac*`, `ad*`, etc.?

We might instead just switch to off-heap building once the expected FST size crosses a threshold? We could use `createTempOutput` to make temporary files as needed for the non-root FSTs that are "too big"?
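To make the threshold idea concrete, here is a minimal, hypothetical sketch. It is not Lucene code. Each child block's serialized FST is modeled as an opaque byte blob, and a parent simply concatenates its children. The real `BlockTree` decodes the child FSTs and re-adds their entries to a new FST. The names `SpillingBlockBuilder`, `BlockOutput`, `appendChildren` and `OFF_HEAP_THRESHOLD` are all invented for illustration. A plain temp file from `java.nio.file.Files.createTempFile` stands in for Lucene's `Directory#createTempOutput`. The sketch only shows the policy of deciding heap vs. temp file per block from the expected (summed child) size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch (not Lucene code): build each block's "baby FST" bytes on-heap
// unless the expected size exceeds a threshold, in which case spill to a temp file.
class SpillingBlockBuilder {
  // Hypothetical threshold in bytes (1 MiB) for switching to off-heap building.
  static final long OFF_HEAP_THRESHOLD = 1L << 20;

  // Output of one block: either on-heap bytes or a temp file path, plus its size.
  record BlockOutput(byte[] onHeap, Path offHeap, long size) {
    boolean isOffHeap() {
      return offHeap != null;
    }
  }

  // Appends N child outputs into a parent output, choosing heap vs temp file up front
  // from the expected size (sum of the child sizes).
  static BlockOutput appendChildren(List<BlockOutput> children) throws IOException {
    long expected = 0;
    for (BlockOutput c : children) {
      expected += c.size();
    }
    if (expected <= OFF_HEAP_THRESHOLD) {
      ByteArrayOutputStream heap = new ByteArrayOutputStream();
      copyAll(children, heap);
      byte[] bytes = heap.toByteArray();
      return new BlockOutput(bytes, null, bytes.length);
    }
    // Too big: write the parent to a temp file (stand-in for createTempOutput).
    Path tmp = Files.createTempFile("block", ".fst");
    try (OutputStream out = Files.newOutputStream(tmp)) {
      copyAll(children, out);
    }
    return new BlockOutput(null, tmp, Files.size(tmp));
  }

  // Streams every child, whether on-heap or in a temp file, into the destination.
  private static void copyAll(List<BlockOutput> children, OutputStream out) throws IOException {
    for (BlockOutput c : children) {
      if (c.isOffHeap()) {
        Files.copy(c.offHeap(), out);
      } else {
        out.write(c.onHeap());
      }
    }
  }

  // Builds a fake leaf block of the given size (all zero bytes).
  static BlockOutput leaf(int size) {
    return new BlockOutput(new byte[size], null, size);
  }

  public static void main(String[] args) throws IOException {
    BlockOutput small = appendChildren(List.of(leaf(100), leaf(200)));
    System.out.println(small.isOffHeap() + " " + small.size()); // false 300
    BlockOutput big = appendChildren(List.of(leaf(700_000), leaf(700_000)));
    System.out.println(big.isOffHeap() + " " + big.size()); // true 1400000
    Files.deleteIfExists(big.offHeap());
  }
}
```

Because the decision is made per block rather than only at the root, the "all terms start with `a`" case would spill as soon as any intermediate block grows past the threshold. A real version would also need to delete child temp files once they have been appended into their parent.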