Re: BitDocSet humongous objects

2024-08-20 Thread David Smiley
While orthogonal, I'd rather we not write code that exists purely to solve this problem. Segment sizes are capped configurably in the MergePolicy; you can lower the cap if the FixedBitSet would be too large for the largest segments. For single-segment indexes (and the status quo today), maybe the

Re: BitDocSet humongous objects

2024-08-20 Thread Michael Gibney
Interesting -- although certainly related, I think these are somewhat orthogonal questions. You could well have a merge strategy/heap and gc configuration/index size that would have the same "humongous object" problem even under a per-segment cache approach (certainly for cores optimized to a singl

Re: BitDocSet humongous objects

2024-08-19 Thread David Smiley
On Mon, Aug 19, 2024 at 2:32 PM Michael Gibney wrote:
> For a more robust solution than fussing with G1HeapRegionSize, I'm
> wondering if it might be appropriate to change the implementation of
> BitDocSet so that larger instances will be backed by an array of
> multiple smaller FixedBitSet instan
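
To make the quoted idea concrete, here is a minimal sketch of a bit set backed by several small arrays instead of one large one. It is not Solr or Lucene code: the class name ChunkedBitSet, the page size, and all method names are hypothetical, and plain long[] pages stand in for the "smaller FixedBitSet instances" so that the example is self-contained.

```java
/** Hypothetical sketch: a bit set split into fixed-size pages so that no single array is large. */
public final class ChunkedBitSet {
    // Assumed page size: 2^22 bits = 512 KiB of longs per page.
    private static final int PAGE_SHIFT = 22;
    private static final int PAGE_BITS = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = PAGE_BITS - 1;

    private final long[][] pages;
    private final int numBits;

    public ChunkedBitSet(int numBits) {
        if (numBits < 0) {
            throw new IllegalArgumentException("numBits must be >= 0");
        }
        this.numBits = numBits;
        // Use long arithmetic so the rounding cannot overflow near Integer.MAX_VALUE.
        int pageCount = (int) (((long) numBits + PAGE_MASK) >>> PAGE_SHIFT);
        pages = new long[pageCount][];
        for (int i = 0; i < pageCount; i++) {
            long remaining = (long) numBits - ((long) i << PAGE_SHIFT);
            int bitsInPage = (int) Math.min(PAGE_BITS, remaining);
            pages[i] = new long[(bitsInPage + 63) >>> 6]; // the last page may be shorter
        }
    }

    public void set(int index) {
        checkIndex(index);
        // A long shift uses only the low 6 bits of the shift count, i.e. index % 64.
        pages[index >>> PAGE_SHIFT][(index & PAGE_MASK) >>> 6] |= 1L << index;
    }

    public boolean get(int index) {
        checkIndex(index);
        return (pages[index >>> PAGE_SHIFT][(index & PAGE_MASK) >>> 6] & (1L << index)) != 0;
    }

    public int cardinality() {
        int count = 0;
        for (long[] page : pages) {
            for (long word : page) {
                count += Long.bitCount(word);
            }
        }
        return count;
    }

    public int pageCount() {
        return pages.length;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index + ", numBits=" + numBits);
        }
    }
}
```

With the assumed page size, a 100M-bit set is spread over 24 arrays of at most 512 KiB each, which stays below half of an 8M or 16M G1 region. The cost is an extra array dereference on every access, and bulk operations (and, or, cardinality) have to iterate page by page.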

BitDocSet humongous objects

2024-08-19 Thread Michael Gibney
We're encountering a relatively large amount of "humongous object" allocation with larger cores. For just under 32G of configured heap, G1HeapRegionSize seems to default to 8M or 16M depending on the JDK version. For a shard with, e.g., 100M docs, the long[] backing a FixedBitSet (BitDocSet) will be ~12M, c
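
The arithmetic behind the ~12M figure can be checked with a short sketch. It assumes one bit per document packed into 64-bit words, ignores the small array header, and treats an allocation as humongous once it reaches half a G1 region (the exact boundary case is approximated). The class and method names are hypothetical.

```java
/** Hypothetical helper: estimates bit set array size against the G1 humongous threshold. */
public final class HumongousCheck {
    private HumongousCheck() {}

    /** Bytes in a long[] holding one bit per doc (array header not included). */
    static long backingArrayBytes(long numDocs) {
        long words = (numDocs + 63) / 64; // round up to whole 64-bit words
        return words * Long.BYTES;
    }

    /** Approximation: G1 treats objects of about half a region or more as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long bytes = backingArrayBytes(100_000_000L);
        long mib = 1024L * 1024L;
        System.out.println("backing array bytes: " + bytes);
        System.out.println("humongous with 8M regions:  " + isHumongous(bytes, 8 * mib));
        System.out.println("humongous with 16M regions: " + isHumongous(bytes, 16 * mib));
        System.out.println("humongous with 32M regions: " + isHumongous(bytes, 32 * mib));
    }
}
```

For 100M docs the array is 12,500,000 bytes, which exceeds half of both an 8M and a 16M region, so it is allocated as a humongous object under either default. Under this approximation, only a 32M region (16M threshold) would let it be allocated normally.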