salvatorecampagna opened a new issue, #16113: URL: https://github.com/apache/lucene/issues/16113
### Description ## Problem `Accountable.ramBytesUsed()` is specified as "an estimate of the bytes used by this object" without saying whether bytes from referenced objects are included. In practice, implementations in the Lucene codebase vary: some count everything reachable from the object, including borrowed/shared state passed in via the constructor; some count only what they allocated; a few have explicit comments noting that a referenced object "should be accounted separately." When external code sums `ramBytesUsed()` across multiple `Accountable` instances that share state, the same memory can be counted more than once. `getChildResources()` is conventionally a diagnostic accessor for printing and inspection. It does not advertise itself as a deduplicated ownership tree, and consumers should not assume that summing over it yields a well-defined per-node total. ## Evidence in the codebase **`GlobalOrdinalsQuery`** counts a shared `OrdinalMap` received through its constructor: ``` // lucene/join/src/java/org/apache/lucene/search/join/GlobalOrdinalsQuery.java this.ramBytesUsed = BASE_RAM_BYTES + ... + RamUsageEstimator.sizeOfObject(this.globalOrds) + ... ``` The `OrdinalMap` is cached on the top reader context and shared across queries that target the same join field on the same index. Two queries that include it each report its full size; the same bytes are credited to multiple queries. **`GlobalOrdinalsWithScoreQuery`** in `lucene/join/src/java/org/apache/lucene/search/join/GlobalOrdinalsWithScoreQuery.java` has the same pattern at the equivalent site. **`Lucene99HnswVectorsWriter.FieldWriter`** includes `flatFieldVectorsWriter.ramBytesUsed()` in its own count. The outer writer's per-field loop comments that the field tracks the delegate's usage and the outer omits the delegate from its own count to compensate. The pair has cross-class coupling: the inner counts something it does not allocate, and the outer relies on that to avoid counting it itself. The current accounting therefore depends on the inner and outer staying in step. ## Downstream impact Embedders (Elasticsearch, Solr, OpenSearch, custom indexers) build memory inventories by summing `ramBytesUsed()` across the Accountables they hold open. When two Accountables that share state both report the shared size, the totals inflate, and downstream consumers cannot tell from the API which value reflects actual heap. Concrete consequences: - Capacity planning and circuit-breaker thresholds drift; breakers trip at the wrong actual heap. - Diagnostics become forensic: "why does this reader report N GB on an M GB heap?" is not answerable from the API alone. - Newer subsystems that build categorical inventories (per-shard, per-segment, per-tenant) cannot produce useful totals without a deduplication layer that papers over the ambiguity. ## Proposal This issue proposes three concrete actions: 1. **Clarify `Accountable`'s javadoc** to recommend an ownership-oriented convention for implementations, especially new and updated ones. Define "owned" as memory allocated for this object's lifetime, or released by its `close()` if applicable. Inputs received via constructors or factories may be borrowed, wrapped, sliced, or copied; implementations should report only the bytes they actually own, not the deep content of referenced storage they do not own. The reference slot for a borrowed input is part of the holder's own object layout; re-adding the referent's deep content can lead to double counting when consumers sum across multiple Accountables. State that `getChildResources()` is a diagnostic accessor and not a deduplicated ownership tree, and that summing over it is not sum-safe in general. 2. **Document the convention** alongside the javadoc clarification. Report bytes you allocate, including derived allocations and your own wrapper overhead; do not report deep content of storage you only reference; if you `close()` something, you report its bytes. Recommend the convention for new and updated implementations. 3. **Fix clear offenders incrementally** with targeted PRs. Anchor cases for this issue: the two `GlobalOrdinals*` queries and the HNSW writer pair above. Each PR addresses one or a few classes, includes the size of the doubly-counted object and the resulting change to the reported value, and documents the semantic shift in the PR description. ## Backward compatibility The proposal does not change the `Accountable` interface contract or signature. The javadoc clarification documents a convention already used in parts of the codebase. Targeted fixes to clear offenders will change the numbers those classes report. Downstream consumers that have tuned thresholds against the inflated values will see different numbers after each fix lands. Each such change is bounded by three constraints: - the existing behavior is clearly misleading (counts widely-shared state, like the `OrdinalMap` cases above), - the fix is narrow (one or a few classes per PR), - the PR description documents the semantic shift so downstream consumers can adjust expectations. `Accountable.getChildResources()` is unchanged. Existing patterns where a parent class already aggregates child `ramBytesUsed()` while also exposing those children via `getChildResources()` do not need to change. Examples in the codebase: - `DocumentsWriterPerThread` aggregates child `ramBytesUsed()` into its total while also returning children via `getChildResources()`. - `IndexingChain` does the same. - `TestLRUQueryCache` and similar tests assume aggregate semantics on existing implementations. These continue to work as today. ## Open questions - What is the right wording for the javadoc clarification? "Owned" needs to be defined unambiguously and in terms that fit Java/Lucene conventions. - Should the convention live in `Accountable`'s class javadoc, in a `package-info.java`, or in a dedicated developer-docs page? The first is most visible to implementers. - Should developer docs encourage authors of new Accountable implementations to call out which fields are owned vs borrowed in code comments at construction? Useful for review, with some friction. ## Related issues - [apache/lucene#15097](https://github.com/apache/lucene/issues/15097) (closed): "Incorrect size estimation of BooleanQuery in LRUQueryCache causing heap exhaustion". Operational evidence that query-size estimation affects cache budgets: `BooleanQuery` falling back to a default size in `LRUQueryCache` contributed to heap exhaustion in production. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
