dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1786608097
I think that makes sense. I attempted to implement the byte-copying approach (not optimized yet; there are still many non-optimal byte reads/writes). With the same FST as above, it uses a 513KB cache, while the address-based approach uses 150KB (roughly 3.4x), so it is in line with the 3x number reported by Mike.

There are some quirks I found while implementing:

- As the ByteBlockPool seems to be merely a very long byte array (divided into multiple byte arrays), we still need to record and map the FST's real address to the offset of the copied bytes (unless there's already a tracking mechanism that I'm unaware of). Maybe we can use an additional PagedBytesWriter?
- As FST operations act on the real, absolute address, I created a new layer of `ReverseBytesReader` that does the mapping automatically.
- The implementation first copies the node bytes from BytesStore into a new temporary byte[], and then writes this byte[] into the primary table's ByteBlockPool. We could copy directly from BytesStore into ByteBlockPool instead.

I could put up a draft and gradually improve it.
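
To make the address-mapping quirk concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene's actual `ByteBlockPool` or `ReverseBytesReader` APIs; `CopiedNodePool`, `Reader`, and the tiny block size are hypothetical names and values chosen for illustration. It also assumes that a node is identified by the absolute FST address of its last byte and is read backwards, which may not match the draft implementation exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: copied node bytes stored in fixed-size blocks, looked up by FST address. */
final class CopiedNodePool {
  // Tiny 16-byte blocks so that nodes span block boundaries in small examples (assumption).
  private static final int BLOCK_SHIFT = 4;
  private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<>();
  private long used = 0; // total number of bytes copied so far

  // Absolute FST address of a node's last byte -> {pool offset of its first byte, length}.
  private final Map<Long, long[]> index = new HashMap<>();

  /** Copies the node bytes into the pool and records where they landed. */
  void addNode(long fstAddress, byte[] nodeBytes) {
    index.put(fstAddress, new long[] {used, nodeBytes.length});
    for (byte b : nodeBytes) {
      int block = (int) (used >> BLOCK_SHIFT);
      if (block == blocks.size()) {
        blocks.add(new byte[BLOCK_SIZE]); // grow by one block when the last one is full
      }
      blocks.get(block)[(int) (used & BLOCK_MASK)] = b;
      used++;
    }
  }

  private byte byteAt(long poolOffset) {
    return blocks.get((int) (poolOffset >> BLOCK_SHIFT))[(int) (poolOffset & BLOCK_MASK)];
  }

  /** Returns a backward reader positioned at the node's last byte, or null if it was not copied. */
  Reader readerFor(long fstAddress) {
    long[] entry = index.get(fstAddress);
    if (entry == null) {
      return null;
    }
    // delta translates an absolute FST address into an offset inside the pool.
    long delta = entry[0] + entry[1] - 1 - fstAddress;
    return new Reader(fstAddress, delta);
  }

  /** Reads backwards while exposing positions as absolute FST addresses. */
  final class Reader {
    private long position; // absolute FST address, not a pool offset
    private final long delta;

    Reader(long position, long delta) {
      this.position = position;
      this.delta = delta;
    }

    byte readByte() {
      return byteAt(delta + position--); // translate, read, then step backwards
    }

    long getPosition() {
      return position;
    }

    void setPosition(long absoluteAddress) {
      this.position = absoluteAddress;
    }
  }
}
```

Callers in this sketch keep working with absolute addresses: `pool.readerFor(address)` hands back a reader whose `getPosition()` and `setPosition()` speak in FST addresses, and the translation to pool offsets stays hidden inside the reader. The per-node map is the extra bookkeeping mentioned in the first bullet, and it is one place where the memory overhead of the copy-based approach could come from.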