dungba88 commented on PR #12985:
URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934080177

   > We might instead just switch to off-heap building once the expected FST 
size crosses a threshold? We can use createTempOutput to make temporary files 
as needed for the non-root FSTs that are "too big"?
   
   I think this is a good idea. How should we choose a reasonable threshold? 
Maybe it could be a parameter, though I'm wary that introducing another 
parameter would also increase the configuration complexity of the system.
   
   One trade-off here is that this could slow down indexing: apart from the 
root node, we need to traverse and iterate through the whole FST, and off-heap 
traversal might be slower than on-heap traversal (I think we saw a 17% increase 
for off-heap reading with Synonym in 
https://github.com/apache/lucene/pull/13054). The root node doesn't need to be 
traversed, and we need to save it to an IndexOutput anyway, so building it 
off-heap actually saves time: there's no need to construct the on-heap FST.
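   
   To make the threshold idea concrete, here is a minimal sketch of the 
decision only. `FstSinkChooser` and `offHeapThresholdBytes` are hypothetical 
names, not Lucene APIs, and a plain temp file stands in for what 
createTempOutput would provide, so treat this as an illustration rather than a 
proposed patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Lucene API): picks where FST bytes go while building. */
final class FstSinkChooser {
  // Assumed tunable; the right value (or whether to expose it at all) is the open question.
  private final long offHeapThresholdBytes;

  FstSinkChooser(long offHeapThresholdBytes) {
    this.offHeapThresholdBytes = offHeapThresholdBytes;
  }

  /** True when the expected FST size is strictly above the threshold. */
  boolean useOffHeap(long expectedFstBytes) {
    return expectedFstBytes > offHeapThresholdBytes;
  }

  /** Small FSTs stay in a heap buffer; large ones go to a temp file (stand-in for createTempOutput). */
  OutputStream open(long expectedFstBytes) throws IOException {
    if (useOffHeap(expectedFstBytes)) {
      Path tmp = Files.createTempFile("fst-", ".tmp");
      tmp.toFile().deleteOnExit();
      return Files.newOutputStream(tmp);
    }
    // Cap the initial capacity so a large threshold cannot overflow the int cast.
    int initialCapacity = (int) Math.min(Math.max(expectedFstBytes, 16L), 1L << 20);
    return new ByteArrayOutputStream(initialCapacity);
  }
}
```

   With a single threshold like this, the small non-root FSTs that must be 
traversed would stay on-heap, and only the ones that are "too big" would pay 
the off-heap traversal cost.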


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

