Jackie-Jiang commented on code in PR #14479:
URL: https://github.com/apache/pinot/pull/14479#discussion_r1847162809


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -797,6 +801,18 @@ private void addNewRow(int docId, GenericRow row) {
             recordIndexingError(indexEntry.getKey(), e);
           }
         }
+
+        if (_thresholdForNumOfColValuesEnabled) {
+          int prevCount = indexContainer._valuesInfo.getNumValues();
+          long newCount = prevCount + 1L + values.length;

Review Comment:
   The total number of values alone is not enough. We should perform a per-index 
check (add an API to `MutableIndex` and let it return whether it can take more values).
   E.g. for the MV forward index, if we get 1B values but each value takes more 
than 2 bytes, we will run into the same exception.
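
   A minimal sketch of what such a per-index check could look like. The names 
`CapacityAwareIndex`, `canAddMore` and `ToyMvForwardIndex` are hypothetical and 
not part of the existing `MutableIndex` interface, and the `Integer.MAX_VALUE` 
byte cap is an assumed limit for illustration only:

```java
// Hypothetical capacity hook; canAddMore is an assumed name, not the existing MutableIndex API.
interface CapacityAwareIndex {
  // Returns true if the index can still store a row carrying numValues values.
  boolean canAddMore(int numValues);

  // Records a row with the given values.
  void add(int[] values);
}

// Toy MV forward index: fixed-width values in a buffer capped at maxBytes (assumed limit).
final class ToyMvForwardIndex implements CapacityAwareIndex {
  private final int _bytesPerValue;
  private final long _maxBytes;
  private long _usedBytes;

  ToyMvForwardIndex(int bytesPerValue, long maxBytes) {
    _bytesPerValue = bytesPerValue;
    _maxBytes = maxBytes;
  }

  @Override
  public boolean canAddMore(int numValues) {
    // Use long arithmetic so the size check itself cannot overflow.
    return _usedBytes + (long) numValues * _bytesPerValue <= _maxBytes;
  }

  @Override
  public void add(int[] values) {
    if (!canAddMore(values.length)) {
      throw new IllegalStateException("index is full");
    }
    _usedBytes += (long) values.length * _bytesPerValue;
  }
}

final class CapacityCheckDemo {
  public static void main(String[] args) {
    CapacityAwareIndex narrow = new ToyMvForwardIndex(2, Integer.MAX_VALUE);
    CapacityAwareIndex wide = new ToyMvForwardIndex(4, Integer.MAX_VALUE);
    // 1B values at 2 bytes each still fit under the assumed cap.
    System.out.println(narrow.canAddMore(1_000_000_000)); // true
    // 1B values at 4 bytes each do not, although the value count is identical.
    System.out.println(wide.canAddMore(1_000_000_000));   // false
  }
}
```

   With a hook like this, the segment could ask every index before each row 
and stop consumption as soon as any of them reports that it is full, instead of 
relying on a single value-count threshold.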



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -362,6 +363,13 @@ private boolean endCriteriaReached() {
               _numRowsConsumed, _numRowsIndexed);
           _stopReason = SegmentCompletionProtocol.REASON_FORCE_COMMIT_MESSAGE_RECEIVED;
           return true;
+        } else if (_thresholdForNumOfColValuesEnabled && _realtimeSegment.isNumOfColValuesAboveThreshold()) {

Review Comment:
   Nice, so it is fairly easy to stop consumption and commit



##########
pinot-common/src/main/java/org/apache/pinot/common/protocols/SegmentCompletionProtocol.java:
##########
@@ -149,6 +149,7 @@ public enum ControllerResponseStatus {
   public static final String REASON_END_OF_PARTITION_GROUP = "endOfPartitionGroup";
   // Stop reason sent by server as force commit message received
   public static final String REASON_FORCE_COMMIT_MESSAGE_RECEIVED = "forceCommitMessageReceived";
+  public static final String REASON_NUM_OF_COL_VALUES_ABOVE_THRESHOLD = "numColValuesAboveThreshold";

Review Comment:
   This check should always be on. We may introduce a config to turn it off if we are 
not confident about this new logic, but if it is not very complicated we can 
remove this config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

