original-brownbear commented on PR #13906:
URL: https://github.com/apache/lucene/pull/13906#issuecomment-2411864517

   > One thing that is a bit awkward to me is that it makes clones cheaper than 
slices, so e.g. refactoring TermsEnum#postings to work on a slice that contains 
just the postings of the current term would be a regression vs. cloning the 
whole file like today. Not a blocker, it just gets me thinking.
   
   I had the same thought :D that's what got me here in part. Moving the offset 
handling into whatever object that uses an `IndexInput` now and replacing that 
with a stateless shared `RandomAccessInput` should allow for some performance 
gains here and there and more importantly, should allow for massive heap 
savings.
   It's still quite wild today for ES: I took a heap dump of an http_logs track 
benchmark node, and somehow tens of shards translate into MBs of 
`IndexInput` instances for us. I get that there are two sides to this, and ES 
should probably have fewer segments, but there's still no need to redundantly 
clone (stateful) `IndexInput` when we often also track file pointers in the 
users of those things anyway in Lucene?
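
A rough sketch of the idea above, under stated assumptions: `RandomAccessBytes`, `HeapBytes` and `Cursor` are hypothetical names made up for illustration, not Lucene's actual API. One immutable, position-addressed reader is shared, and each consumer keeps only its own offset instead of cloning a stateful input.

```java
final class SharedInputSketch {

  /** Hypothetical stand-in for a stateless, position-addressed reader. */
  interface RandomAccessBytes {
    byte readByte(long pos);

    long length();
  }

  /** Holds no read position, so a single instance can be shared by every consumer. */
  static final class HeapBytes implements RandomAccessBytes {
    private final byte[] data;

    HeapBytes(byte[] data) {
      this.data = data.clone();
    }

    @Override
    public byte readByte(long pos) {
      return data[Math.toIntExact(pos)];
    }

    @Override
    public long length() {
      return data.length;
    }
  }

  /** The consumer owns the file pointer; this small object is the only per-reader state. */
  static final class Cursor {
    private final RandomAccessBytes in;
    private final long end;
    private long pos;

    Cursor(RandomAccessBytes in, long start, long length) {
      if (start < 0 || length < 0 || start + length > in.length()) {
        throw new IllegalArgumentException("range out of bounds");
      }
      this.in = in;
      this.pos = start;
      this.end = start + length;
    }

    byte next() {
      if (pos >= end) {
        throw new IllegalStateException("read past end of range");
      }
      return in.readByte(pos++);
    }

    long remaining() {
      return end - pos;
    }
  }
}
```

In this sketch a "slice" and a "clone" cost the same: both are just a new `Cursor` (two longs and a reference) over the same shared bytes, which is the property the quoted concern about clones being cheaper than slices is asking for.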
   
   > I am not sure if this really helps much....
   
   See above. But also, even in the Lucene benchmarks it's about 1% of 
allocations in wikimedium saved for me. Not a massive win, but not entirely 
irrelevant either :)
   Also, this actually helps make heap dumps easier to interpret. It's always 
easier to find redundancies if object sharing isn't super far down the 
reference chains :)  
   
   On it, looking into a test :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

