dungba88 commented on issue #12543:
URL: https://github.com/apache/lucene/issues/12543#issuecomment-1782469977
I pushed a new revision that supports writing the FST to either a DataOutput or a FileChannel.
When using DataOutput, if suffix sharing is enabled, one also needs to pass a
RandomAccessInput for reading; otherwise it can be left null. So one can pass an
IndexOutput and create the RandomAccessInput from an IndexInput.
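Suffix sharing needs the extra reader because an address-based hash only remembers *where* a node was written. To confirm that a candidate is an exact duplicate, it has to read those bytes back. The sketch below is a minimal, self-contained illustration of that idea. It is not Lucene's NodeHash or DataOutput API: `AddressBasedDedup`, `addNode` and `readBack` are hypothetical names, and a `ByteArrayOutputStream` stands in for the output.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Lucene's API): dedup of appended nodes that caches only addresses. */
class AddressBasedDedup {
    // Stands in for the DataOutput the FST body is appended to.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Hash of node bytes -> {address, length} of previously written nodes with that hash.
    private final Map<Integer, List<int[]>> table = new HashMap<>();

    /** Stands in for the RandomAccessInput: reads len bytes at an absolute address. */
    private byte[] readBack(int address, int len) {
        byte[] all = out.toByteArray(); // a real reader would not copy the whole output
        return Arrays.copyOfRange(all, address, address + len);
    }

    /** Returns the address of an identical, already-written node, or appends the node and returns its new address. */
    int addNode(byte[] node) {
        List<int[]> candidates = table.computeIfAbsent(Arrays.hashCode(node), k -> new ArrayList<>());
        for (int[] c : candidates) {
            // Only the address is cached, so equality must be checked against the written bytes.
            if (c[1] == node.length && Arrays.equals(readBack(c[0], c[1]), node)) {
                return c[0];
            }
        }
        int address = out.size();
        out.write(node, 0, node.length);
        candidates.add(new int[] {address, node.length});
        return address;
    }

    /** Number of bytes written so far. */
    int size() {
        return out.size();
    }
}
```

Adding `{1, 2, 3}`, then `{4, 5}`, then `{1, 2, 3}` again returns addresses 0, 3 and 0, and only 5 bytes end up in the output.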
When using FileChannel, one only needs to pass the FileChannel, as it
already allows reading and writing at the same time. This FileChannel
implementation is only meant to demonstrate feasibility.
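A single FileChannel opened with both READ and WRITE can append new bytes and read back earlier ones through positional `write(ByteBuffer, long)` and `read(ByteBuffer, long)`, so no second handle is needed. The sketch below uses only standard `java.nio` calls. The class and method names (`FileChannelReadWrite`, `append`, `readAt`, `demo`) are hypothetical and do not reflect how the revision is actually structured.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: one FileChannel used for both appending and random-access reads. */
class FileChannelReadWrite {
    /** Appends bytes at the current end of the channel and returns the address they start at. */
    static long append(FileChannel ch, byte[] bytes) throws IOException {
        long address = ch.size();
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long pos = address;
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos); // positional write does not move the channel position
        }
        return address;
    }

    /** Reads len bytes starting at an absolute address. */
    static byte[] readAt(FileChannel ch, long address, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = address;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) throw new IOException("unexpected EOF at " + pos);
            pos += n;
        }
        return buf.array();
    }

    /** Writes two chunks, then reads back part of the first one while still open for writing. */
    static byte[] demo() {
        try {
            Path tmp = Files.createTempFile("fst-sketch", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long first = append(ch, new byte[] {10, 20, 30});
                append(ch, new byte[] {40});
                return readAt(ch, first + 1, 2);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`demo()` returns `{20, 30}`: bytes written earlier are readable through the same channel that is still accepting writes.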
Some things I'd like to discuss:
- Should we write the rootNode + numBytes at the end of the FST instead of
the front? We only know them after the FST has been constructed, and we can't
prepend to a DataOutput (that would be costly). Otherwise we would need to save
the metadata separately from the main body, which is why I added a new method, `saveMetadata()`.
- Should we move to a value-based LRU cache? It has pros and cons:
  - Pros: NodeHash becomes completely independent of the FST. Suffix sharing
    would then work without a RandomAccessInput, and thus without IndexOutput
    and IndexInput having to be open at the same time. Also, accessing RAM is
    much faster than accessing disk.
  - Cons: it needs more RAM than the address-based cache. For a truly minimal
    FST, it would need as much RAM as the entire FST, or more.
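Putting the metadata at the end works because a reader can locate a fixed-size footer from the file length alone. The sketch below shows one possible layout, `[body][rootNode: 8 bytes][numBytes: 8 bytes]`. This is a hypothetical format for illustration, not what `saveMetadata()` in the revision actually writes. `TrailingMetadata`, `write` and `readMetadata` are made-up names, and a heap `ByteBuffer` stands in for the real output/input.

```java
import java.nio.ByteBuffer;

/** Hypothetical layout sketch: [FST body][rootNode: 8 bytes][numBytes: 8 bytes]. */
class TrailingMetadata {
    static final int FOOTER_BYTES = 2 * Long.BYTES;

    /** The body is written first; rootNode is only known afterwards, so the metadata is appended, not prepended. */
    static byte[] write(byte[] body, long rootNode) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + FOOTER_BYTES);
        buf.put(body);
        buf.putLong(rootNode);
        buf.putLong(body.length); // numBytes of the body
        return buf.array();
    }

    /** Reads the fixed-size footer from the end of the file; returns {rootNode, numBytes}. */
    static long[] readMetadata(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int footerStart = file.length - FOOTER_BYTES;
        long rootNode = buf.getLong(footerStart);
        long numBytes = buf.getLong(footerStart + Long.BYTES);
        return new long[] {rootNode, numBytes};
    }
}
```

The trade-off is that the reader must know the total length (or seek to the end) before it can find the root, which is usually cheap for a file or an IndexInput.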
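A value-based cache keys on the node's bytes themselves, so a hit can be confirmed without reading anything back from the FST output. The cost is that every cached entry holds a copy of the node. The sketch below is a bounded LRU built on `LinkedHashMap` in access order. It is an assumption about how such a cache could look, not the PR's NodeHash; `ValueBasedNodeCache`, `lookup` and `put` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value-based node cache: node bytes -> address, bounded with LRU eviction. */
class ValueBasedNodeCache {
    private final Map<ByteBuffer, Long> lru;

    ValueBasedNodeCache(int maxEntries) {
        // accessOrder=true: iteration runs from least- to most-recently accessed entry.
        this.lru = new LinkedHashMap<ByteBuffer, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ByteBuffer, Long> eldest) {
                return size() > maxEntries; // evict the least-recently-used node
            }
        };
    }

    /** Returns the address of an identical cached node, or null on a miss; no read-back is needed. */
    Long lookup(byte[] node) {
        // ByteBuffer equals/hashCode compare the remaining bytes, i.e. the node's content.
        return lru.get(ByteBuffer.wrap(node));
    }

    /** Caches a copy of the node bytes so later mutation of the caller's array cannot corrupt the key. */
    void put(byte[] node, long address) {
        lru.put(ByteBuffer.wrap(node.clone()), address);
    }

    int size() {
        return lru.size();
    }
}
```

With `maxEntries = 2`, putting nodes A, B, then looking up A and putting C evicts B (the least recently used), while A and C remain cached. RAM usage grows with the bytes of the cached nodes, which is why an unbounded cache for a truly minimal FST approaches the size of the FST itself.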
--
This is an automated message from the Apache Git Service.