itschrispeck commented on code in PR #15685:
URL: https://github.com/apache/pinot/pull/15685#discussion_r2098860296


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -117,9 +131,15 @@ private void addFlattenedRecords(List<Map<String, String>> records) {
       for (Map.Entry<String, String> entry : record.entrySet()) {
         // Put both key and key-value into the posting list. Key is useful for checking if a key exists in the json.
         String key = entry.getKey();
-        _postingListMap.computeIfAbsent(key, k -> new RoaringBitmap()).add(_nextFlattenedDocId);
+        _postingListMap.computeIfAbsent(key, k -> {
+          _bytesSize += Utf8.encodedLength(key);

Review Comment:
   > I meant the size of the bitmaps since that is also maintained on heap.
   
   Yeah, it doesn't track that (I don't know of an easy way to get heap
usage for bitmaps). I mentioned this in the PR description:
   > This is a slight undercount of actual usage, as we do not track the size 
of bitmaps - however the intention is to more safely handle high 
cardinality/blob/binary data in JSON and we expect bitmap size to be relatively 
small in this case.
   
   You're right, it's not exact. I felt this was a good enough cheap estimate.
It's been quite useful for us internally in identifying tables with a large
increase in heap usage from the JSON index (e.g., a sudden cardinality increase).
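
   As a rough illustration of the accounting discussed here (a minimal sketch, not the PR's code: `PostingListSizeSketch` is a hypothetical class, `java.util.BitSet` stands in for `RoaringBitmap`, and `String.getBytes` stands in for Guava's `Utf8.encodedLength`), key bytes are counted once, when a key first enters the map, and the bitmap's own footprint is deliberately ignored:

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the mutable JSON index; illustration only.
class PostingListSizeSketch {
  // Assumption: BitSet replaces RoaringBitmap to keep the sketch stdlib-only.
  private final Map<String, BitSet> postingListMap = new HashMap<>();
  private long bytesSize;

  void add(String key, int docId) {
    postingListMap.computeIfAbsent(key, k -> {
      // Runs only when the key is new, so each key is counted exactly once.
      bytesSize += k.getBytes(StandardCharsets.UTF_8).length;
      return new BitSet();
    }).set(docId);
  }

  // Estimated heap used by keys only; posting-list size is not included.
  long getBytesSize() {
    return bytesSize;
  }
}
```

   Adding more doc IDs under an existing key leaves the counter unchanged, which is why the estimate undercounts when posting lists grow large.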
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org
