xiangfu0 commented on code in PR #18504:
URL: https://github.com/apache/pinot/pull/18504#discussion_r3246103166
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -98,6 +99,19 @@ public Dictionary getDictionary() {
return _dataSource.getDictionary();
}
+ /// A column with {@code EncodingType.RAW} + an explicit {@code
dictionaryIndex} has {@link #getDictionary()}
+ /// non-null but a RAW forward index that throws on {@link
ForwardIndexReader#readDictIds}. Callers selecting
+ /// between dict-id and value paths must gate on this method, not {@code
getDictionary() != null}.
+ @Override
+ public boolean isDictionaryEncoded() {
+ Dictionary dictionary = _dataSource.getDictionary();
+ if (dictionary == null) {
+ return false;
+ }
+ ForwardIndexReader<?> forwardIndex = _dataSource.getForwardIndex();
+ return forwardIndex == null || forwardIndex.isDictionaryEncoded();
Review Comment:
Fixed in bf6327d21e — `ProjectionBlockValSet.isDictionaryEncoded()` now
returns `false` when the forward index is null. Added explicit Javadoc
enumerating the two cases (RAW + dictionaryIndex, and disabled forward index)
where a non-null dictionary still means dict-id reads are not callable.
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/RawForwardIndexWithDictionaryTest.java:
##########
@@ -542,6 +541,146 @@ RAW_DICT_DIMENSION, RAW_DICT_INT_DIMENSION,
getTableName(),
"Multi-column GROUP BY rows must match between dictionary-only and
raw+dictionary columns");
}
+ /// Multi-column DISTINCT exercises {@link
org.apache.pinot.core.query.distinct.DistinctExecutorFactory}'s
+ /// multi-column path. The factory routes to {@code
DictionaryBasedMultiColumnDistinctExecutor} whenever every
+ /// column has a non-null dictionary, then that executor calls {@code
BlockValSet#getDictionaryIdsSV()}
+ /// — which throws on a RAW+dictionary column.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void
testMultiColumnDistinctWithRawDictColumnReturnsSameResults(boolean
useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ JsonNode dictRows = postQuery(
+ String.format("SELECT DISTINCT %s, %s FROM %s ORDER BY %s, %s",
+ DICT_DIMENSION, DICT_INT_DIMENSION, getTableName(),
DICT_DIMENSION, DICT_INT_DIMENSION))
+ .get("resultTable").get("rows");
+ JsonNode rawRows = postQuery(
+ String.format("SELECT DISTINCT %s, %s FROM %s ORDER BY %s, %s",
+ RAW_DICT_DIMENSION, RAW_DICT_INT_DIMENSION, getTableName(),
+ RAW_DICT_DIMENSION, RAW_DICT_INT_DIMENSION))
+ .get("resultTable").get("rows");
+ assertEquals(rawRows, dictRows,
+ "Multi-column DISTINCT rows must match between dictionary-only and
raw+dictionary columns");
+ }
+
+ /// {@code DISTINCTCOUNT} on a RAW+dictionary column drives
+ /// {@link
org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction#svAggregate}
+ /// down the dictionary-id path because {@code blockValSet.getDictionary()
!= null}. That path then calls
+ /// {@code blockValSet.getDictionaryIdsSV()} on the RAW forward index. A
{@code WHERE} predicate is included to
+ /// bypass {@link
org.apache.pinot.core.operator.query.NonScanBasedAggregationOperator}, which
would otherwise
+ /// serve the aggregation directly from the dictionary and hide the bug.
Review Comment:
Reworded in bf6327d21e — each affected docstring now describes the pre-fix
behavior in past tense ("previously crashed inside …") and references
`BlockValSet#isDictionaryEncoded()` / `ColumnContext#isDictionaryEncoded()` as
the correct gate.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]