msfroh opened a new pull request, #13054:
URL: https://github.com/apache/lucene/pull/13054

   ### Description
   This stores the synonym map's FST and word lookup off-heap in a separate, 
configurable directory. 
   
   The initial implementation is rough, but the unit tests pass with this 
change randomly enabled.
   
   Obvious things that need work are:
   1. I tried to do something codec-like for the synonym map files, but it is 
not really a codec. For a solution that can evolve over time, we should 
probably at least record in the metadata file which format was used.
   2. Right now it makes no effort to detect changes to the synonym files. I 
would suggest that SynonymGraphFilterFactory rebuild the directory if a 
checksum of the input files doesn't match a value recorded in the metadata file.
   3. I'm not happy with the random seeks in `OffHeapBytesRefHashLike`, but I 
don't see an alternative other than moving it on-heap. Since the original 
issue was only about moving the FST off-heap, maybe we can keep the word 
dictionary on-heap.
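
To make items 1 and 2 concrete, here is a minimal sketch of what the metadata check could look like. It is not code from this PR and it does not use Lucene's APIs: the class `SynonymDirCheck`, the file name `synonyms.meta`, and the format tag `SynonymMapOffHeap-v1` are all hypothetical, and a real implementation would more likely write the header through the target `Directory`. The idea is only that the metadata file records a format tag plus a checksum of the input files, and the factory rebuilds when either is missing or does not match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not part of this PR or of Lucene.
final class SynonymDirCheck {
  // Assumed format tag and metadata file name.
  static final String FORMAT = "SynonymMapOffHeap-v1";
  static final String META_FILE = "synonyms.meta";

  /** CRC32 over the bytes of all input files, read in the order given. */
  static long checksum(List<Path> inputs) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[8192];
    for (Path p : inputs) {
      try (InputStream in = Files.newInputStream(p)) {
        int n;
        while ((n = in.read(buf)) != -1) {
          crc.update(buf, 0, n);
        }
      }
    }
    return crc.getValue();
  }

  /** Records the format tag and the input checksum after a (re)build. */
  static void writeMeta(Path dir, List<Path> inputs) throws IOException {
    String body = FORMAT + "\n" + checksum(inputs) + "\n";
    Files.write(dir.resolve(META_FILE), body.getBytes(StandardCharsets.UTF_8));
  }

  /** True if the metadata is missing, has an unknown format, or is stale. */
  static boolean needsRebuild(Path dir, List<Path> inputs) throws IOException {
    Path meta = dir.resolve(META_FILE);
    if (!Files.exists(meta)) {
      return true;
    }
    List<String> lines = Files.readAllLines(meta, StandardCharsets.UTF_8);
    if (lines.size() < 2 || !FORMAT.equals(lines.get(0))) {
      return true;
    }
    return !lines.get(1).equals(Long.toString(checksum(inputs)));
  }
}
```

With something like this, the factory would call `needsRebuild(dir, inputs)` before opening the off-heap files and call `writeMeta(dir, inputs)` once a rebuild completes. CRC32 is used here only because it is in the JDK; it guards against accidental staleness, not deliberate tampering.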
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

