Jackie-Jiang commented on code in PR #18496:
URL: https://github.com/apache/pinot/pull/18496#discussion_r3243273664
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -144,6 +142,11 @@ public void updateIndices(SegmentDirectory.Writer
segmentWriter)
String column = entry.getKey();
List<Operation> operations = entry.getValue();
+ // Backfill missing 1.6.0-era stats for this column before dispatching
ops so the per-op handlers can read
+ // them off `ColumnMetadata` when they construct their forward-index
creator. No-op when the column is
+ // fixed-width, already has the stats, or has no forward index on disk.
+ backfillMissingStats(column, segmentWriter);
Review Comment:
Good catch. Added a `needsBackfill` gate in `updateIndices`: the pre-pass
only runs when the column has an op that actually constructs a new
forward-index creator reading length stats — `CHANGE_INDEX_COMPRESSION_TYPE`,
`DISABLE_DICTIONARY` on a dict-encoded forward index that's staying on, and
`ENABLE_RAW_FORWARD_INDEX` when the forward index already exists and is
dict-encoded. `DISABLE_FORWARD_INDEX` (alone or with `DISABLE_DICTIONARY`),
`ENABLE_DICT_FORWARD_INDEX` / `ENABLE_RAW_FORWARD_INDEX` rebuilds (which go
through the inv→fwd creator with its own inline backfill), and
`ENABLE_DICTIONARY` (which has its own stats collector) all skip the pre-pass
now.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java:
##########
@@ -116,32 +118,32 @@ public class
InvertedIndexAndDictionaryBasedForwardIndexCreator implements AutoC
private PinotDataBuffer _forwardIndexMaxSizeBuffer;
public InvertedIndexAndDictionaryBasedForwardIndexCreator(SegmentDirectory
segmentDirectory,
- SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String
columnName, boolean dictionaryPresent,
- boolean dictionaryBasedForwardIndex, ForwardIndexConfig
forwardIndexConfig, boolean isTemporaryForwardIndex)
+ SegmentDirectory.Writer segmentWriter, TableConfig tableConfig, String
columnName,
+ FieldIndexConfigs fieldIndexConfigs, boolean isTemporaryForwardIndex)
throws IOException {
_segmentDirectory = segmentDirectory;
_segmentWriter = segmentWriter;
_tableConfig = tableConfig;
_columnName = columnName;
- _dictionaryPresent = dictionaryPresent;
- _forwardIndexConfig = forwardIndexConfig;
+ _keepDictionary =
fieldIndexConfigs.getConfig(StandardIndexes.dictionary()).isEnabled();
+ _forwardIndexConfig =
fieldIndexConfigs.getConfig(StandardIndexes.forward());
_isTemporaryForwardIndex = isTemporaryForwardIndex;
_segmentMetadata = segmentDirectory.getSegmentMetadata();
_columnMetadata = _segmentMetadata.getColumnMetadataFor(columnName);
_singleValue = _columnMetadata.isSingleValue();
- _cardinality = _columnMetadata.getCardinality();
_numDocs = _columnMetadata.getTotalDocs();
+ _cardinality = _columnMetadata.getCardinality();
+ assert _numDocs > 0 && _cardinality > 0;
_totalNumberOfEntries = _columnMetadata.getTotalNumberOfEntries();
_maxNumberOfMultiValues = _columnMetadata.getMaxNumberOfMultiValues();
- int numValues = _singleValue ? _numDocs : _totalNumberOfEntries;
- _useMMapBuffer = numValues > NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
+ _useMMapBuffer = _totalNumberOfEntries >
NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER;
Review Comment:
No code change — the reported failure can't happen.
`ColumnMetadataImpl.Builder.build()` at lines 634-635 unconditionally sets
`_totalNumberOfEntries = _totalDocs` for SV columns, regardless of whether the
`totalNumberOfEntries` key was present in `metadata.properties`:
```java
if (_fieldSpec.isSingleValueField()) {
_totalNumberOfEntries = _totalDocs;
...
}
```
So for any SV column (legacy or modern),
`_columnMetadata.getTotalNumberOfEntries()` returns `_totalDocs > 0` (the
empty-segment case is gated out by `BaseIndexHandler`'s
`Preconditions.checkState`). For MV columns, `TOTAL_NUMBER_OF_ENTRIES` is
mandatory writer-side and is always persisted. The buffer sizing is safe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]