original-brownbear commented on PR #13906: URL: https://github.com/apache/lucene/pull/13906#issuecomment-2411864517
> One thing that is a bit awkward to me is that it makes clones cheaper than slices, so e.g. refactoring TermsEnum#postings to work on a slice that contains just the postings of the current term would be a regression vs. cloning the whole file like today. Not a blocker, it just gets me thinking.

I had the same thought :D That's part of what got me here. Moving the offset handling into whatever object uses an `IndexInput` today, and replacing that `IndexInput` with a stateless shared `RandomAccessInput`, should allow for some performance gains here and there and, more importantly, for massive heap savings. The current situation is still quite wild for ES: I took a heap dump of an http_logs track benchmark node, and somehow tens of shards translate into MBs of `IndexInput` instances for us. I get that there are two sides to this, and ES should probably have fewer segments. Still, there's no need to redundantly clone a (stateful) `IndexInput` when the users of those inputs in Lucene often track file pointers themselves anyway.

> I am not sure if this really helps much....

See above. But also, even in the Lucene benchmarks this saves about 1% of allocations in wikimedium for me. Not a massive win, but not entirely irrelevant either :) It also makes heap dumps easier to interpret: redundancies are always easier to find when object sharing isn't buried far down the reference chains :)

On it, looking into a test :)
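
To illustrate the idea of moving offset handling into the consumer, here is a minimal sketch. It is not Lucene code and not what the PR implements: `PositionalInput`, `ByteArrayInput` and `Cursor` are hypothetical names invented for this example, with `PositionalInput` standing in for a stateless, position-addressed input in the spirit of `RandomAccessInput`. The point is only that many readers can share one input object when each reader keeps its own file pointer.

```java
public class SharedReaderSketch {

    /** Hypothetical stateless input: every read names its position explicitly. */
    interface PositionalInput {
        byte readByte(long pos);
        long length();
    }

    /** Hypothetical in-memory implementation, used only to keep the sketch runnable. */
    static final class ByteArrayInput implements PositionalInput {
        private final byte[] bytes;

        ByteArrayInput(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public byte readByte(long pos) {
            return bytes[(int) pos];
        }

        @Override
        public long length() {
            return bytes.length;
        }
    }

    /** Consumer-side cursor: it owns the position, the input stays shared and uncloned. */
    static final class Cursor {
        private final PositionalInput in;
        private final long end;
        private long pos;

        Cursor(PositionalInput in, long start, long length) {
            if (start < 0 || length < 0 || start + length > in.length()) {
                throw new IllegalArgumentException("range out of bounds");
            }
            this.in = in;
            this.pos = start;
            this.end = start + length;
        }

        boolean hasNext() {
            return pos < end;
        }

        byte next() {
            return in.readByte(pos++);
        }
    }

    public static void main(String[] args) {
        PositionalInput shared = new ByteArrayInput(new byte[] {10, 20, 30, 40, 50});
        // Two independent readers over different ranges of the same shared input.
        Cursor a = new Cursor(shared, 0, 2);
        Cursor b = new Cursor(shared, 3, 2);
        while (a.hasNext()) {
            System.out.println("a: " + a.next());
        }
        while (b.hasNext()) {
            System.out.println("b: " + b.next());
        }
    }
}
```

In this shape, each additional reader costs one small cursor object rather than a clone of the input, which is the kind of heap saving the comment is arguing for.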