[ https://issues.apache.org/jira/browse/LUCENE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298807#comment-17298807 ]
Adrien Grand commented on LUCENE-9387:
--------------------------------------

[~dweiss] Actually this index is not that small, it is a 1-segment index that has 97,931,850 documents and takes 13.1 GB on disk.

There is indeed a test at https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L287-L351 but it cheats a bit by:
 - Not testing absolute RAM usage but relative RAM usage as more documents get added to the index, in order to ignore constant factors to RAM usage (https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L323-L329). This was necessary so that we wouldn't need to create a huge index so that the test would pass.
 - Reproducing the same approximations that SegmentReader#ramBytesUsed does (https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L111-L164), ignoring threadlocals, index inputs, field infos, and more.

> Remove RAM accounting from LeafReader
> -------------------------------------
>
>                 Key: LUCENE-9387
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9387
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Blocker
>             Fix For: master (9.0)
>
>
> Context for this issue can be found at
> https://lists.apache.org/thread.html/r06b6a63d8689778bbc2736ec7e4e39bf89ae6973c19f2ec6247690fd%40%3Cdev.lucene.apache.org%3E.
> RAM accounting made sense when readers used lots of memory. E.g. when norms
> were on heap, we could return memory usage of the norms array and memory
> estimates would be very close to actual memory usage.
> However nowadays, readers consume very little memory, so RAM accounting has
> become less valuable. Furthermore providing good estimates has become
> incredibly complex as we can no longer focus on a couple main contributors to
> memory usage, but would need to start considering things that we historically
> ignored, such as field infos, segment infos, NIOFS buffers, etc.
> Let's remove RAM accounting from LeafReader?
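
To illustrate the first point in the comment above (comparing relative rather than absolute RAM usage), here is a minimal sketch. It does not use any Lucene API and is not the code of the linked test: the class name, the two byte-count models, the 20% tolerance, and all constants are hypothetical, chosen only to show how taking the difference between a small and a large index cancels a constant overhead that the accounting ignores.

```java
public class RelativeRamCheck {

    // Hypothetical model of what an accounting method reports:
    // it tracks per-document cost only and ignores constant overhead.
    static long reportedBytes(int docCount) {
        return 16L * docCount;
    }

    // Hypothetical model of what is really held in memory:
    // a fixed overhead (e.g. field infos, buffers) plus a per-document cost.
    static long measuredBytes(int docCount) {
        return 50_000L + 17L * docCount;
    }

    // True when the reported value is within `tolerance` (a fraction)
    // of the measured value.
    static boolean withinTolerance(long reported, long measured, double tolerance) {
        return Math.abs(measured - reported) <= tolerance * measured;
    }

    public static void main(String[] args) {
        int small = 1_000;
        int large = 100_000;
        double tolerance = 0.2; // assumed tolerance, for illustration only

        // Absolute comparison on the small index: the constant overhead dominates.
        boolean absoluteOk =
            withinTolerance(reportedBytes(small), measuredBytes(small), tolerance);

        // Relative comparison: subtracting the small index from the large one
        // cancels the constant overhead on the measured side.
        long reportedDelta = reportedBytes(large) - reportedBytes(small);
        long measuredDelta = measuredBytes(large) - measuredBytes(small);
        boolean relativeOk = withinTolerance(reportedDelta, measuredDelta, tolerance);

        System.out.println("absolute check passes: " + absoluteOk);
        System.out.println("relative check passes: " + relativeOk);
    }
}
```

With these made-up numbers the absolute check fails on the small index while the relative check passes, which is why a test of this shape can get away with a modest index, and also why it says nothing about the constant factors.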