itschrispeck commented on code in PR #15685:
URL: https://github.com/apache/pinot/pull/15685#discussion_r2098860296
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ##########
@@ -117,9 +131,15 @@ private void addFlattenedRecords(List<Map<String, String>> records) {
       for (Map.Entry<String, String> entry : record.entrySet()) {
         // Put both key and key-value into the posting list. Key is useful for checking if a key exists in the json.
         String key = entry.getKey();
-        _postingListMap.computeIfAbsent(key, k -> new RoaringBitmap()).add(_nextFlattenedDocId);
+        _postingListMap.computeIfAbsent(key, k -> {
+          _bytesSize += Utf8.encodedLength(key);

Review Comment:
> I meant the size of the bitmaps since that is also maintained on heap.

Yeah, it doesn't track that (I don't know of an easy way to get the heap usage of the bitmaps). I mentioned this in the PR description:

> This is a slight undercount of actual usage, as we do not track the size of bitmaps - however the intention is to more safely handle high cardinality/blob/binary data in JSON and we expect bitmap size to be relatively small in this case.

You're right that it's not exact, but I felt this was a good-enough cheap estimate. It's been quite useful for us internally in identifying tables with a large increase in heap usage from the JSON index (e.g., a sudden cardinality increase).
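The estimate discussed above relies on `computeIfAbsent` invoking its mapping function only when the key is absent, so each distinct key's bytes are added exactly once and repeated keys cost nothing extra. Below is a minimal, self-contained Java sketch of that idea, not the actual `MutableJsonIndexImpl` code. The class `KeyBytesTrackingIndex`, its methods, and the `"\0"` key-value separator are hypothetical. A `List<Integer>` stands in for `RoaringBitmap`, and `String.getBytes(StandardCharsets.UTF_8).length` stands in for the `Utf8.encodedLength` call in the diff. As in the comment, posting-list memory is deliberately not counted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a mutable JSON index. It maps each flattened
 * key (and key-value term) to a posting list of doc ids and keeps a cheap running
 * estimate of the bytes held by the distinct terms.
 */
class KeyBytesTrackingIndex {
  // Hypothetical separator between key and value; the real index may use another one.
  private static final String KEY_VALUE_SEPARATOR = "\0";

  // Stand-in for Map<String, RoaringBitmap>, avoiding the external dependency.
  private final Map<String, List<Integer>> postingListMap = new HashMap<>();
  private long bytesSize = 0;

  /** Adds both the key and the key-value term for every entry of a flattened record. */
  void addRecord(Map<String, String> record, int docId) {
    for (Map.Entry<String, String> entry : record.entrySet()) {
      String key = entry.getKey();
      addPosting(key, docId);
      addPosting(key + KEY_VALUE_SEPARATOR + entry.getValue(), docId);
    }
  }

  private void addPosting(String term, int docId) {
    postingListMap.computeIfAbsent(term, k -> {
      // Runs only for a term not seen before, so each distinct term is counted once.
      // The posting lists themselves are NOT counted (the undercount noted above).
      bytesSize += k.getBytes(StandardCharsets.UTF_8).length;
      return new ArrayList<>();
    }).add(docId);
  }

  long estimatedBytes() {
    return bytesSize;
  }

  int numTerms() {
    return postingListMap.size();
  }

  int postingCount(String term) {
    List<Integer> postings = postingListMap.get(term);
    return postings == null ? 0 : postings.size();
  }
}
```

With this sketch, adding `{"name": "é"}` for two documents counts the 4-byte key `name` and the 7-byte term `name` + separator + `é` once each (11 bytes), even though both posting lists end up holding two doc ids. A burst of new distinct keys or values, such as blobs embedded in JSON, grows the estimate by the bytes of each new term, which is the kind of heap growth the comment describes catching; memory added to existing posting lists stays invisible to it.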