smuching202 opened a new issue, #15026: URL: https://github.com/apache/lucene/issues/15026
# Context

While implementing `Accountable.ramBytesUsed()`, I noticed a discrepancy between the values returned by `RamUsageEstimator.sizeOf(Query, long)` and `RamUsageTester.ramUsed(obj)` in Lucene 10.2.2. For example, given the following test:

```
import static org.assertj.core.api.Assertions.assertThat; // assumes AssertJ
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.tests.util.RamUsageTester;
import org.apache.lucene.util.RamUsageEstimator;
import org.junit.jupiter.api.Test; // assumes JUnit 5
@Test
void testTermQueryRamUsage() {
  Query query = new TermQuery(new Term("field", "value"));
  long actual = RamUsageTester.ramUsed(query); // 152 bytes
  long expected = RamUsageEstimator.sizeOf(query, 0); // 176 bytes
  assertThat(actual).isEqualTo(expected);
}
```

- `RamUsageTester` reports: **152 bytes**
- `RamUsageEstimator` reports: **176 bytes** (using `RamUsageQueryVisitor` internally)

# Observed RamUsageEstimator Flow

1. Invoke `RamUsageEstimator.sizeOf(query, 0)` (see [method](https://github.com/apache/lucene/blob/279eb7aaafe985e5d0552b7f2a10b63185a3f893/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java#L367C22-L367C28)).
   a) Since `TermQuery` does not implement the `Accountable` interface, the estimator falls back to `RamUsageQueryVisitor`.
2. An instance of `RamUsageQueryVisitor` is created (see [constructor](https://github.com/apache/lucene/blob/279eb7aaafe985e5d0552b7f2a10b63185a3f893/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java#L308)).
   a) No default size was given (the second argument is `0`), so it invokes `RamUsageEstimator.shallowSizeOf(Query)`, which in my case returned **24 bytes**.
3. The query is traversed with `RamUsageQueryVisitor`:
   a) We reach `RamUsageQueryVisitor.consumeTerms` (see [method](https://github.com/apache/lucene/blob/279eb7aaafe985e5d0552b7f2a10b63185a3f893/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java#L319)).
   Since `query == root`, the query's own size is not added again, and the visitor directly invokes `sizeOf(terms)`.
   b) `sizeOf(terms)` (see [method](https://github.com/apache/lucene/blob/279eb7aaafe985e5d0552b7f2a10b63185a3f893/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java#L623)) initializes the size to `shallowSizeOf(accountables)`, which resulted in **24 bytes**, and then adds the result of `Accountable.ramBytesUsed()` for each non-null element.
   c) `Term.ramBytesUsed()` (see [method](https://github.com/apache/lucene/blob/279eb7aaafe985e5d0552b7f2a10b63185a3f893/lucene/core/src/java/org/apache/lucene/index/Term.java#L189)) returns a total of 48 + 56 + 24 = **128 bytes**:
      - `BASE_RAM_BYTES` = **48 bytes** (shallow size of `Term` plus shallow size of `BytesRef`)
      - `RamUsageEstimator.sizeOfObject(field)` = **56 bytes**
      - `RamUsageEstimator.alignObjectSize(bytes.bytes.length + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)` = aligned(5 + 16) = **24 bytes**
4. The total returned is **176 bytes** (24 + 128 + 24), which is **24 bytes more** than the actual usage of 152 bytes reported by `RamUsageTester`.

# Problem: Double Counting

In step 3b, the method initializes the size to the shallow size of the `Accountable[]` array:

```
public static long sizeOf(Accountable[] accountables) {
  long size = shallowSizeOf(accountables); // shallow size of the Accountable[] array itself (24 bytes here)
  for (Accountable accountable : accountables) {
    if (accountable != null) {
      size += accountable.ramBytesUsed();
    }
  }
  return size;
}
```

This means the `Term` is effectively counted twice: once through `shallowSizeOf(accountables)`, whose 24 bytes (the array header plus one reference for the single-element array wrapping the `Term`) happen to equal the `Term`'s own shallow size here, and again through `BASE_RAM_BYTES` inside each `Accountable.ramBytesUsed()`.
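The arithmetic from steps 2 to 4 can be reproduced with a small standalone model. This is a minimal sketch, not Lucene code: it assumes a 64-bit JVM with compressed oops (16-byte array header, 4-byte references, 8-byte alignment), takes the 24-byte query shallow size and the 48- and 56-byte figures from the walkthrough as inputs, and uses hypothetical names (`TermQueryRamEstimate`, `shallowSizeOfArray`, `termRamBytesUsed`) that only mirror the corresponding Lucene methods.

```java
/**
 * Hypothetical model of the accounting walked through above; not Lucene code.
 * Assumes a 64-bit JVM with compressed oops.
 */
final class TermQueryRamEstimate {
    static final long NUM_BYTES_ARRAY_HEADER = 16; // assumed: compressed oops
    static final long NUM_BYTES_OBJECT_REF = 4;    // assumed: compressed oops
    static final long OBJECT_ALIGNMENT = 8;        // assumed: default alignment

    // Rounds size up to the next multiple of OBJECT_ALIGNMENT.
    static long alignObjectSize(long size) {
        return (size + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
    }

    // Shallow size of an Object[] with `length` slots: header plus one reference per slot.
    static long shallowSizeOfArray(int length) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * length);
    }

    // Mirrors the breakdown of Term.ramBytesUsed() in step 3c; the first two inputs come from the issue.
    static long termRamBytesUsed(long baseRamBytes, long fieldBytes, int termBytesLength) {
        return baseRamBytes + fieldBytes + alignObjectSize(termBytesLength + NUM_BYTES_ARRAY_HEADER);
    }

    public static void main(String[] args) {
        long queryShallow = 24;                    // step 2: shallowSizeOf(query), as reported
        long term = termRamBytesUsed(48, 56, 5);   // step 3c: 48 + 56 + 24 = 128
        long arrayShallow = shallowSizeOfArray(1); // step 3b: single-element Accountable[] = 24
        long estimate = queryShallow + arrayShallow + term;
        long withoutArray = queryShallow + term;

        System.out.println("Term.ramBytesUsed() = " + term);         // 128
        System.out.println("estimate            = " + estimate);     // 176
        System.out.println("without array       = " + withoutArray); // 152
        System.out.println("difference          = " + (estimate - withoutArray)); // 24
    }
}
```

Running `main` prints 128, 176, 152 and 24. Subtracting the shallow size of the single-element `Accountable[]` from the estimate gives a total equal to the 152 bytes reported by `RamUsageTester`.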
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org