Re: [PR] Use value-based LRU cache in NodeHash [lucene]

via GitHub Wed, 01 Nov 2023 03:41:13 -0700


mikemccand commented on PR #12738:
URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788744551


   > But I ended up using a `List<byte[]>` where each item is a node instead of 
ByteBlockPool due to the following reasons:
   
   Hmm -- this is sizable added RAM overhead per entry.  Added array header (16 
or 20 bytes), added pointers to these arrays (4 or 8 bytes), tiny objects for 
GC to crawl, though maybe they mostly die young if the RAM budget is low 
enough.  Yes, we can share some of these `byte[]` with both hashes but I 
suspect that's not a net/net win.
   
   > We automatically get the length of the node without any traversing
   
   We are paying 4 bytes per entry for this :)  And it seems a shame since 
FST's node encoding is already self-delimiting, and, as a side effect of 
looking up that node we will already have computed that length.
   
   > With ByteBlockPool we have 2 unavoidable double byte-copies: (1) when 
write from BytesStore to the primary table and (2) when promote an entry from 
the fallback table to the primary table. In both situations we need to first 
write into a temporary byte[].
   
   I don't think this added cost is so much.  We freeze the node into the true 
FST appending byte store, then, we copy those "last N bytes" over to primary 
hash.  Eventually it's moved to fallback, and, maybe it never gets promoted 
back (single copy), or, maybe it does (+1 copy) but that's "worth it" since we 
achieve some minimization, i.e. the cost correlates nicely with the win (the 
whole point of this double LRU hash).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

Reply via email to