Copilot commented on code in PR #18496:
URL: https://github.com/apache/pinot/pull/18496#discussion_r3243238220


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -144,6 +142,11 @@ public void updateIndices(SegmentDirectory.Writer 
segmentWriter)
       String column = entry.getKey();
       List<Operation> operations = entry.getValue();
 
+      // Backfill missing 1.6.0-era stats for this column before dispatching 
ops so the per-op handlers can read
+      // them off `ColumnMetadata` when they construct their forward-index 
creator. No-op when the column is
+      // fixed-width, already has the stats, or has no forward index on disk.
+      backfillMissingStats(column, segmentWriter);

Review Comment:
   This pre-pass also runs for columns whose only operation is 
`DISABLE_FORWARD_INDEX`, even though that path never constructs a new 
forward-index creator and immediately drops the forward index later. On large 
legacy var-length columns, a reload that is just disabling the forward index 
will now perform a full column scan and metadata rewrite unnecessarily. Gate 
the backfill to operations that actually need length stats for a 
rewrite/rebuild.
   



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java:
##########
@@ -116,32 +118,32 @@ public class 
InvertedIndexAndDictionaryBasedForwardIndexCreator implements AutoC
   private PinotDataBuffer _forwardIndexMaxSizeBuffer;
 
   public InvertedIndexAndDictionaryBasedForwardIndexCreator(SegmentDirectory 
segmentDirectory,
-      SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String 
columnName, boolean dictionaryPresent,
-      boolean dictionaryBasedForwardIndex, ForwardIndexConfig 
forwardIndexConfig, boolean isTemporaryForwardIndex)
+      SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String 
columnName,
+      FieldIndexConfigs fieldIndexConfigs, boolean isTemporaryForwardIndex)
       throws IOException {
     _segmentDirectory = segmentDirectory;
     _segmentWriter = segmentWriter;
     _tableConfig = tableConfig;
     _columnName = columnName;
-    _dictionaryPresent = dictionaryPresent;
-    _forwardIndexConfig = forwardIndexConfig;
+    _keepDictionary = 
fieldIndexConfigs.getConfig(StandardIndexes.dictionary()).isEnabled();
+    _forwardIndexConfig = 
fieldIndexConfigs.getConfig(StandardIndexes.forward());
     _isTemporaryForwardIndex = isTemporaryForwardIndex;
 
     _segmentMetadata = segmentDirectory.getSegmentMetadata();
     _columnMetadata = _segmentMetadata.getColumnMetadataFor(columnName);
     _singleValue = _columnMetadata.isSingleValue();
-    _cardinality = _columnMetadata.getCardinality();
     _numDocs = _columnMetadata.getTotalDocs();
+    _cardinality = _columnMetadata.getCardinality();
+    assert _numDocs > 0 && _cardinality > 0;
     _totalNumberOfEntries = _columnMetadata.getTotalNumberOfEntries();
     _maxNumberOfMultiValues = _columnMetadata.getMaxNumberOfMultiValues();
-    int numValues = _singleValue ? _numDocs : _totalNumberOfEntries;
-    _useMMapBuffer = numValues > NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
+    _useMMapBuffer = _totalNumberOfEntries > 
NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;

Review Comment:
   This now sizes SV rebuilds from `totalNumberOfEntries`, but that metadata 
key is optional/MV-only and is read as `ColumnMetadata.UNAVAILABLE` when 
absent. For legacy SV forward-index-disabled segments with the key missing, 
`_totalNumberOfEntries` is `-1`, so the temp buffer allocation fails instead of 
using `_numDocs` as before. Use `_numDocs` for SV columns and 
`totalNumberOfEntries` only for MV sizing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to