jackluo923 commented on code in PR #14365: URL: https://github.com/apache/pinot/pull/14365#discussion_r1848916745
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java: ########## @@ -525,6 +526,9 @@ public boolean index(GenericRow row, @Nullable RowMetadata rowMetadata) } } + // NOTE: we must do this before we index a single column to avoid partially indexing the row Review Comment: @deemoliu After inspecting the code and writing a unit test, it seems the problem is not reproducible in a unit test because Java is running in `non-production` mode (i.e. `java -ea xxx`) with assertions enabled. This validation check is pretty much equivalent to the assertion found in `org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex#updateHeader`, except we now throw an exception at run-time.: ``` private int updateHeader(int row, int numValues) { assert (numValues <= _maxNumberOfMultiValuesPerRow); int newStartIndex = _prevRowStartIndex + _prevRowLength; if (newStartIndex + numValues > _currentCapacity) { addDataBuffer(_incrementalCapacity); _prevRowStartIndex = 0; _prevRowLength = 0; newStartIndex = 0; } writeIntoHeader(row, _dataWriters.size() - 1, newStartIndex, numValues); _prevRowStartIndex = newStartIndex; _prevRowLength = numValues; return newStartIndex; } ``` In production-mode, the assert is an no-op and if a value inserted into the MV forward index is greater than the maximum, we get inconsistencies later on in the code. In `validateLengthOfMVColumns`, we use a more relaxed check which validate against `_incrementalCapacity`. FixedByteMVMutableForwardIndex derives the maximum # of values per entry using `_maxNumberOfMultiValuesPerRow`, but `_incrementalCapacity` is the real maximum. See code in `org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex#FixedByteMVMutableForwardIndex` for more detail. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org