dungba88 commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934080177
> We might instead just switch to off-heap building once the expected FST size crosses a threshold? We can use createTempOutput to make temporary files as needed for the non-root FSTs that are "too big"?

I think this is a good idea. I'm wondering how we should choose a reasonable threshold. Maybe it could be a parameter? (I was afraid that introducing another parameter would also increase the configuration complexity of the system.)

One trade-off is that off-heap building could slow down indexing: apart from the root node, we need to traverse and iterate through the whole FST, and off-heap traversal might be slower than on-heap traversal (I think we saw a 17% increase in the Synonym off-heap reading in https://github.com/apache/lucene/pull/13054).

The root node doesn't need to be traversed, and we need to save it to IndexOutput anyway, so doing it off-heap actually saves time: there's no need to construct the on-heap FST.
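To make the threshold idea concrete, here is a rough sketch of the decision only. It is plain Java and does not touch any Lucene API: `FstBuildPolicy`, `Mode`, `choose`, and the parameter names are all hypothetical, and how `expectedSizeBytes` would be estimated is left open.

```java
/** Hypothetical policy helper, for illustration only; not part of Lucene. */
final class FstBuildPolicy {

  /** Where the bytes of an FST under construction would go. */
  enum Mode {
    /** Keep the FST on heap: it will be traversed later and is small enough. */
    ON_HEAP,
    /** Spill to a temporary file (e.g. one obtained via createTempOutput). */
    OFF_HEAP_TEMP_FILE,
    /** Write straight to the final index output; nothing is kept on heap. */
    DIRECT_TO_INDEX_OUTPUT
  }

  private FstBuildPolicy() {}

  /**
   * @param isRoot            true for the root FST, which is never traversed again
   * @param expectedSizeBytes assumed size estimate for the FST being built
   * @param thresholdBytes    assumed configurable cut-off for going off-heap
   */
  static Mode choose(boolean isRoot, long expectedSizeBytes, long thresholdBytes) {
    if (isRoot) {
      // The root has to be saved to the index output anyway, so building it
      // off-heap avoids constructing an on-heap copy first.
      return Mode.DIRECT_TO_INDEX_OUTPUT;
    }
    // Non-root FSTs are traversed afterwards, and off-heap traversal may be
    // slower, so only pay that cost when the FST is "too big".
    return expectedSizeBytes > thresholdBytes ? Mode.OFF_HEAP_TEMP_FILE : Mode.ON_HEAP;
  }
}
```

With a threshold of, say, 1 MB, `FstBuildPolicy.choose(false, 64 * 1024, 1 << 20)` stays on heap, while a 4 MB non-root FST would spill to a temporary file. Whether the threshold is exposed as a parameter or hard-coded is exactly the open question above.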