Re: [PR] handle overflow for `MutableOffHeapByteArrayStore` buffer starting size [pinot]

via GitHub Fri, 24 May 2024 16:36:36 -0700


itschrispeck commented on PR #13215:
URL: https://github.com/apache/pinot/pull/13215#issuecomment-2130517286

> Having 2GB as start size doesn't look correct. Can you check the high
> level logic and see if this is expected? Seems like we are trying to use one
> single buffer to hold everything?

Looks like we're hitting an edge case. The contributing factors are:
1. MV columns [will always use a mutable
dictionary](https://github.com/apache/pinot/blob/fed2d5f1b613371237b5a29348f0c043200671ad/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L450)
2. We have an extremely large MV raw column generated by
`SchemaConformingTransformerV2`
3. The column is text indexed, so we use the `noRawDataForTextIndex` config and
the final segment is not nearly as large

Together they can result in the estimated size based on
`RealtimeSegmentStatsHistory` being extremely large even though our target
segment size is ~1.2G.
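
To illustrate how an oversized estimate can overflow an `int` start size, here
is a minimal sketch. It is not Pinot's code: the class, method names, and the
cap are hypothetical, and it assumes the start size is derived as value count
times average value size.

```java
public class StartSizeSketch {
    // Hypothetical cap on a single buffer's start size; not a Pinot constant.
    static final int MAX_START_SIZE = Integer.MAX_VALUE;

    // Naive version: the int multiplication silently wraps around on overflow.
    static int naiveStartSize(int numValues, int avgValueBytes) {
        return numValues * avgValueBytes;
    }

    // Guarded version: widen to long before multiplying, then clamp to the cap.
    static int clampedStartSize(int numValues, int avgValueBytes) {
        long estimate = (long) numValues * avgValueBytes;
        return (int) Math.min(estimate, MAX_START_SIZE);
    }

    public static void main(String[] args) {
        // Assumed inputs: 30M values at 100 bytes each, i.e. ~3 GB estimated.
        int numValues = 30_000_000;
        int avgValueBytes = 100;
        System.out.println("naive:   " + naiveStartSize(numValues, avgValueBytes));
        System.out.println("clamped: " + clampedStartSize(numValues, avgValueBytes));
    }
}
```

With these assumed inputs the naive product exceeds `Integer.MAX_VALUE` and
wraps to a negative number, while the guarded version stays at the cap.
Clamping only avoids the overflow; it does not address why the estimate is so
large in the first place.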

I think the solution is to allow MV columns to be raw encoded even in the
mutable segment - but I'm not sure that should be in the scope of this PR. What
do you think?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
