jackluo923 commented on code in PR #14365:
URL: https://github.com/apache/pinot/pull/14365#discussion_r1848916745


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -525,6 +526,9 @@ public boolean index(GenericRow row, @Nullable RowMetadata 
rowMetadata)
       }
     }
 
+    // NOTE: we must do this before we index a single column to avoid 
partially indexing the row

Review Comment:
   @deemoliu After inspecting the code and writing a unit test, it seems the 
problem is not reproducible in a unit test because Java is running in 
`non-production` mode (i.e. `java -ea xxx`) with assertions enabled. This 
validation check is pretty much equivalent to the assertion found in 
`org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex#updateHeader`,
 except we now throw an exception at run-time.:
   
   ```
     private int updateHeader(int row, int numValues) {
       assert (numValues <= _maxNumberOfMultiValuesPerRow);
       int newStartIndex = _prevRowStartIndex + _prevRowLength;
       if (newStartIndex + numValues > _currentCapacity) {
         addDataBuffer(_incrementalCapacity);
         _prevRowStartIndex = 0;
         _prevRowLength = 0;
         newStartIndex = 0;
       }
       writeIntoHeader(row, _dataWriters.size() - 1, newStartIndex, numValues);
       _prevRowStartIndex = newStartIndex;
       _prevRowLength = numValues;
       return newStartIndex;
     }
   ```
   
   In production-mode, the assert is an no-op and if a value inserted into the 
MV forward index is greater than the maximum, we get inconsistencies later on 
in the code. In `validateLengthOfMVColumns`, we use a more relaxed check which 
validate against `_incrementalCapacity`. FixedByteMVMutableForwardIndex derives 
the maximum # of values per entry using `_maxNumberOfMultiValuesPerRow`, but 
`_incrementalCapacity` is the real maximum. See code in 
`org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex#FixedByteMVMutableForwardIndex`
 for more detail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to