[ 
https://issues.apache.org/jira/browse/LUCENE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298807#comment-17298807
 ] 

Adrien Grand commented on LUCENE-9387:
--------------------------------------

[~dweiss] Actually, this index is not that small: it is a 1-segment index that 
has 97,931,850 documents and takes 13.1 GB on disk.

There is indeed a test at 
https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L287-L351
 but it cheats a bit by:
 - Not testing absolute RAM usage but relative RAM usage as more documents get 
added to the index, in order to ignore constant factors in RAM usage 
(https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L323-L329).
 This was necessary so that the test could pass without having to create a huge 
index.
 - Reproducing the same approximations that SegmentReader#ramBytesUsed makes 
(https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/index/BaseIndexFileFormatTestCase.java#L111-L164),
 i.e. ignoring thread locals, index inputs, field infos, and more.
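
To illustrate the first point, here is a minimal, self-contained sketch of why a relative check can pass on a small index where an absolute check would not. It is a toy model and not the code from BaseIndexFileFormatTestCase: the class name, the constants and the 20% tolerance are all assumptions made up for the example.

```java
public class RelativeRamCheck {
    // Hypothetical constant overhead that the estimate ignores
    // (stands in for things like buffers, field infos, thread locals).
    static final long CONSTANT_OVERHEAD_BYTES = 64 * 1024;
    // Hypothetical per-document cost that the estimate does account for.
    static final long BYTES_PER_DOC = 16;

    /** "Actual" memory of the toy reader: constant overhead plus per-doc cost. */
    static long actualBytes(int numDocs) {
        return CONSTANT_OVERHEAD_BYTES + BYTES_PER_DOC * numDocs;
    }

    /** What the toy estimate reports: per-doc cost only. */
    static long reportedBytes(int numDocs) {
        return BYTES_PER_DOC * numDocs;
    }

    static double relativeError(long expected, long actual) {
        return Math.abs(expected - actual) / (double) expected;
    }

    /** Absolute check: compare reported and actual usage for one index size. */
    static boolean absoluteCheckPasses(int numDocs, double tolerance) {
        return relativeError(actualBytes(numDocs), reportedBytes(numDocs)) <= tolerance;
    }

    /** Relative check: compare the growth between two index sizes, so constants cancel out. */
    static boolean deltaCheckPasses(int smallDocs, int largeDocs, double tolerance) {
        long actualDelta = actualBytes(largeDocs) - actualBytes(smallDocs);
        long reportedDelta = reportedBytes(largeDocs) - reportedBytes(smallDocs);
        return relativeError(actualDelta, reportedDelta) <= tolerance;
    }

    public static void main(String[] args) {
        // Small index: the ignored constant dominates, so the absolute check fails.
        System.out.println("absolute, 1,000 docs:       " + absoluteCheckPasses(1_000, 0.2));
        // Same small sizes, but comparing growth: the constant cancels out.
        System.out.println("relative, 100 -> 1,000:     " + deltaCheckPasses(100, 1_000, 0.2));
        // Huge index: the constant becomes negligible and the absolute check passes.
        System.out.println("absolute, 100,000,000 docs: " + absoluteCheckPasses(100_000_000, 0.2));
    }
}
```

In this model, an absolute check only becomes meaningful once the index is large enough for the ignored constants to be negligible, which is the situation the test avoids by comparing growth instead.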

> Remove RAM accounting from LeafReader
> -------------------------------------
>
>                 Key: LUCENE-9387
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9387
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Blocker
>             Fix For: master (9.0)
>
>
> Context for this issue can be found at 
> https://lists.apache.org/thread.html/r06b6a63d8689778bbc2736ec7e4e39bf89ae6973c19f2ec6247690fd%40%3Cdev.lucene.apache.org%3E.
> RAM accounting made sense when readers used lots of memory. E.g. when norms 
> were on heap, we could return memory usage of the norms array and memory 
> estimates would be very close to actual memory usage.
> However, readers nowadays consume very little memory, so RAM accounting has 
> become less valuable. Furthermore, providing good estimates has become 
> incredibly complex, as we can no longer focus on a couple of main contributors 
> to memory usage but would need to start considering things that we 
> historically ignored, such as field infos, segment infos, NIOFS buffers, etc.
> Let's remove RAM accounting from LeafReader?



