Jackie-Jiang opened a new pull request, #18494:
URL: https://github.com/apache/pinot/pull/18494

   ## Summary
   
   Adds disk persistence for `getMaxRowLengthInBytes()` on MV variable-length 
columns so the per-row max byte length collected during stats gathering 
survives the round-trip through segment metadata.
   
   Previously the value was reconstructed at load time as 
`maxNumberOfMultiValues * lengthOfLongestElement`, but only when the shortest 
and longest elements had the same length. Whenever `shortest != longest`, the 
common case for real-world var-length MV columns, it was `UNAVAILABLE`.
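
The old load-time behaviour described above can be sketched roughly as follows. This is an illustration only, not the code removed by this PR: the class name, the method name, and the `-1` sentinel standing in for `UNAVAILABLE` are assumptions. A plausible reading of the old guard is that the product is only an upper bound on the true per-row maximum unless every element has the same length.

```java
// Hypothetical sketch of the legacy reconstruction; names and the sentinel value are assumed.
final class LegacyMaxRowLength {
  // Stand-in for the UNAVAILABLE marker; the real constant's value is not stated in this PR.
  static final int UNAVAILABLE = -1;

  // Reconstructs the per-row max byte length for an MV variable-length column.
  static int reconstruct(int maxNumberOfMultiValues, int lengthOfShortestElement,
      int lengthOfLongestElement) {
    if (lengthOfShortestElement != lengthOfLongestElement) {
      // Element lengths differ, so the product is not known to be the true maximum.
      return UNAVAILABLE;
    }
    return maxNumberOfMultiValues * lengthOfLongestElement;
  }

  public static void main(String[] args) {
    System.out.println(reconstruct(3, 4, 4)); // all elements 4 bytes long
    System.out.println(reconstruct(3, 2, 9)); // mixed lengths
  }
}
```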
   
   ## Changes
   
   - New `MAX_ROW_LENGTH_IN_BYTES = "maxRowLengthInBytes"` property key in 
`V1Constants.MetadataKeys.Column`.
   - `BaseSegmentCreator.addColumnMetadataInfo()` writes 
`ColumnStatistics.getMaxRowLengthInBytes()` for MV variable-length columns.
   - `ColumnMetadataImpl.fromPropertiesConfiguration()` reads it via a new 
`Builder.setMaxRowLengthInBytes(int)`.
   - `Builder.build()` keeps the existing canonicalization for SV columns 
(uses `lengthOfLongestElement`) and fixed-width MV columns (uses 
`maxNumberOfMultiValues * storedType.size()`), and trusts the persisted value 
for MV variable-length columns. Pre-1.6.0 segments, which lack the key, 
surface the value as `UNAVAILABLE`.
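
The three cases handled in `Builder.build()` can be summarised with a small sketch. It is not the actual `ColumnMetadataImpl` code: the class and method names, the boolean flags, the nullable `persisted` argument, and the `-1` sentinel for `UNAVAILABLE` are all assumptions made for illustration.

```java
// Hypothetical sketch of the resolution order described in the Changes list.
final class MaxRowLengthResolution {
  // Stand-in for the UNAVAILABLE marker; the real constant's value is assumed.
  static final int UNAVAILABLE = -1;

  // persisted is null when the segment metadata has no maxRowLengthInBytes key.
  static int resolve(boolean singleValue, boolean fixedWidth, int storedTypeSize,
      int maxNumberOfMultiValues, int lengthOfLongestElement, Integer persisted) {
    if (singleValue) {
      // SV: a row holds exactly one element, so the longest element is the max row length.
      return lengthOfLongestElement;
    }
    if (fixedWidth) {
      // Fixed-width MV: every element has the stored type's size.
      return maxNumberOfMultiValues * storedTypeSize;
    }
    // MV variable-length: trust the persisted value, or report UNAVAILABLE if absent.
    return persisted != null ? persisted : UNAVAILABLE;
  }

  public static void main(String[] args) {
    System.out.println(resolve(true, false, 0, 1, 17, null));  // SV string column
    System.out.println(resolve(false, true, 8, 5, 8, null));   // MV column of 8-byte values
    System.out.println(resolve(false, false, 0, 5, 20, 57));   // MV var-length, key present
    System.out.println(resolve(false, false, 0, 5, 20, null)); // MV var-length, older segment
  }
}
```

In this sketch a persisted value on an SV or fixed-width MV column is simply ignored, which mirrors the statement that those columns keep their derived value.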
   
   Also trims now-redundant legacy-fallback comments around 
`FORWARD_INDEX_ENCODING` that were superseded by the tightened constant javadoc.
   
   ## Backward compatibility
   
   - Old segments missing the key fall through to `UNAVAILABLE` at load time. 
The previous "reconstruct as `maxMV * longest` when `shortest == longest`" 
fallback for var-length MV is dropped — that path only produced a correct value 
in the (rare) all-elements-same-length case.
   - New segments emit the key for MV variable-length columns only; SV and 
fixed-width MV values are fully derivable from other metadata and are not 
written to disk.
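
The compatibility rules above amount to a conditional write and a tolerant read. The sketch below uses `java.util.Properties` as a stand-in for the segment metadata file; the `column.<name>.<key>` layout, the class and method names, and the `-1` sentinel for `UNAVAILABLE` are assumptions, and only the `maxRowLengthInBytes` key name comes from this PR.

```java
import java.util.Properties;

// Hypothetical sketch of the write/read round trip; not Pinot's actual metadata code.
final class MetadataRoundTrip {
  static final String MAX_ROW_LENGTH_IN_BYTES = "maxRowLengthInBytes";
  // Stand-in for the UNAVAILABLE marker; the real constant's value is assumed.
  static final int UNAVAILABLE = -1;

  // Assumed key layout: column.<columnName>.<key>
  static String keyFor(String column) {
    return "column." + column + "." + MAX_ROW_LENGTH_IN_BYTES;
  }

  // Writes the key only for MV variable-length columns.
  static void write(Properties props, String column, boolean singleValue, boolean fixedWidth,
      int maxRowLengthInBytes) {
    if (!singleValue && !fixedWidth) {
      props.setProperty(keyFor(column), Integer.toString(maxRowLengthInBytes));
    }
  }

  // Reads the key, falling back to UNAVAILABLE when it is missing (older segments).
  static int read(Properties props, String column) {
    String value = props.getProperty(keyFor(column));
    return value != null ? Integer.parseInt(value) : UNAVAILABLE;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    write(props, "tags", false, false, 57); // MV var-length: key is written
    write(props, "name", true, false, 17);  // SV: nothing is written
    System.out.println(read(props, "tags"));
    System.out.println(read(props, "name"));
  }
}
```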


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

