Copilot commented on code in PR #18496:
URL: https://github.com/apache/pinot/pull/18496#discussion_r3243238220
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -144,6 +142,11 @@ public void updateIndices(SegmentDirectory.Writer
segmentWriter)
String column = entry.getKey();
List<Operation> operations = entry.getValue();
+ // Backfill missing 1.6.0-era stats for this column before dispatching
ops so the per-op handlers can read
+ // them off `ColumnMetadata` when they construct their forward-index
creator. No-op when the column is
+ // fixed-width, already has the stats, or has no forward index on disk.
+ backfillMissingStats(column, segmentWriter);
Review Comment:
This pre-pass also runs for columns whose only operation is
`DISABLE_FORWARD_INDEX`, even though that path never constructs a new
forward-index creator and immediately drops the forward index later. On large
legacy var-length columns, a reload that is just disabling the forward index
will now perform a full column scan and metadata rewrite unnecessarily. Gate
the backfill to operations that actually need length stats for a
rewrite/rebuild.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java:
##########
@@ -116,32 +118,32 @@ public class
InvertedIndexAndDictionaryBasedForwardIndexCreator implements AutoC
private PinotDataBuffer _forwardIndexMaxSizeBuffer;
public InvertedIndexAndDictionaryBasedForwardIndexCreator(SegmentDirectory
segmentDirectory,
- SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String
columnName, boolean dictionaryPresent,
- boolean dictionaryBasedForwardIndex, ForwardIndexConfig
forwardIndexConfig, boolean isTemporaryForwardIndex)
+ SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String
columnName,
+ FieldIndexConfigs fieldIndexConfigs, boolean isTemporaryForwardIndex)
throws IOException {
_segmentDirectory = segmentDirectory;
_segmentWriter = segmentWriter;
_tableConfig = tableConfig;
_columnName = columnName;
- _dictionaryPresent = dictionaryPresent;
- _forwardIndexConfig = forwardIndexConfig;
+ _keepDictionary =
fieldIndexConfigs.getConfig(StandardIndexes.dictionary()).isEnabled();
+ _forwardIndexConfig =
fieldIndexConfigs.getConfig(StandardIndexes.forward());
_isTemporaryForwardIndex = isTemporaryForwardIndex;
_segmentMetadata = segmentDirectory.getSegmentMetadata();
_columnMetadata = _segmentMetadata.getColumnMetadataFor(columnName);
_singleValue = _columnMetadata.isSingleValue();
- _cardinality = _columnMetadata.getCardinality();
_numDocs = _columnMetadata.getTotalDocs();
+ _cardinality = _columnMetadata.getCardinality();
+ assert _numDocs > 0 && _cardinality > 0;
_totalNumberOfEntries = _columnMetadata.getTotalNumberOfEntries();
_maxNumberOfMultiValues = _columnMetadata.getMaxNumberOfMultiValues();
- int numValues = _singleValue ? _numDocs : _totalNumberOfEntries;
- _useMMapBuffer = numValues > NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
+ _useMMapBuffer = _totalNumberOfEntries >
NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
Review Comment:
This now sizes SV rebuilds from `totalNumberOfEntries`, but that metadata
key is optional/MV-only and is read as `ColumnMetadata.UNAVAILABLE` when
absent. For legacy SV forward-index-disabled segments with the key missing,
`_totalNumberOfEntries` is `-1`, so the temp buffer allocation fails instead of
using `_numDocs` as before. Use `_numDocs` for SV columns and
`totalNumberOfEntries` only for MV sizing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]