Jackie-Jiang commented on code in PR #8601:
URL: https://github.com/apache/pinot/pull/8601#discussion_r875258083
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -134,6 +139,20 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
invertedIndexColumns.add(columnName);
}
+ Set<String> bloomFilterColumns = new HashSet<>();
+ for (String columnName : _config.getBloomFilterCreationColumns()) {
+ Preconditions.checkState(schema.hasColumn(columnName),
+ "Cannot create bloom filter for column: %s because it is not in schema", columnName);
+ bloomFilterColumns.add(columnName);
+ }
+
+ Set<String> rangeIndexColumns = new HashSet<>();
+ for (String columnName : _config.getRangeIndexCreationColumns()) {
+ Preconditions.checkState(schema.hasColumn(columnName),
+ "Cannot create bloom filter for column: %s because it is not in schema", columnName);
Review Comment:
Update the error message: this loop validates range index columns, but the message still says "bloom filter".
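The range index loop above reuses the bloom filter wording. One way to keep such messages in sync with the index being validated is a shared helper that takes the index type as a parameter. This is only a minimal sketch: `IndexColumnValidation` and `validateColumns` are hypothetical names, not part of Pinot, and a plain `IllegalStateException` stands in for Guava's `Preconditions.checkState`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexColumnValidation {
  // Hypothetical helper (not Pinot API): checks that every configured column exists
  // in the schema and names the index type in the failure message.
  static Set<String> validateColumns(Set<String> schemaColumns, List<String> configuredColumns,
      String indexType) {
    Set<String> validated = new HashSet<>();
    for (String columnName : configuredColumns) {
      if (!schemaColumns.contains(columnName)) {
        throw new IllegalStateException(String.format(
            "Cannot create %s for column: %s because it is not in schema", indexType, columnName));
      }
      validated.add(columnName);
    }
    return validated;
  }

  public static void main(String[] args) {
    Set<String> schemaColumns = new HashSet<>(Arrays.asList("a", "b"));
    System.out.println(validateColumns(schemaColumns, Arrays.asList("a"), "bloom filter"));
    try {
      validateColumns(schemaColumns, Arrays.asList("c"), "range index");
    } catch (IllegalStateException e) {
      // The message names the range index, not the bloom filter.
      System.out.println(e.getMessage());
    }
  }
}
```

With a helper like this, each call site passes its own index type, so a copied loop cannot silently keep the wrong message.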
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -89,6 +92,8 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
private SegmentGeneratorConfig _config;
private Map<String, ColumnIndexCreationInfo> _indexCreationInfoMap;
private final IndexCreatorProvider _indexCreatorProvider = IndexingOverrides.getIndexCreatorProvider();
+ private final Map<String, BloomFilterCreator> _bloomFilterCreatorMap = new HashMap<>();
+ private final Map<String, CombinedInvertedIndexCreator> _rangeIndexFilterCreatorMap = new HashMap<>();
Review Comment:
(minor) Let's reorder these fields for readability. Suggest putting them
between `invertedIndex` and `textIndex`, and keeping the same order in the
handling logic.
##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -120,6 +120,18 @@ public enum TimeColumnType {
private SegmentZKPropsConfig _segmentZKPropsConfig;
+ private final List<String> _bloomFilterCreationColumns = new ArrayList<>();
Review Comment:
(minor) Suggest moving these 2 lists between `_invertedIndexCreationColumns`
and `_textIndexCreationColumns`, and moving their getters next to the other getters.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -346,6 +384,37 @@ public void indexRow(GenericRow row)
//get dictionaryCreator, will be null if column is not dictionaryEncoded
SegmentDictionaryCreator dictionaryCreator = _dictionaryCreatorMap.get(columnName);
+ // bloom filter
+ BloomFilterCreator bloomFilterCreator = _bloomFilterCreatorMap.get(columnName);
+ if (bloomFilterCreator != null) {
+ bloomFilterCreator.add((String) columnValueToIndex);
Review Comment:
(MAJOR)
```suggestion
bloomFilterCreator.add(columnValueToIndex.toString());
```
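The reason the cast matters can be sketched in isolation. This assumes `columnValueToIndex` is an `Object` that may hold non-String values such as a boxed `Integer` or `Long`, which the diff does not show explicitly; `BloomFilterValueDemo` and `toBloomFilterValue` are hypothetical names used only for illustration.

```java
public class BloomFilterValueDemo {
  // Converts a column value to the String form that would be added to the bloom filter.
  // toString() works for any non-null Object; a (String) cast only succeeds when the
  // runtime type is already String.
  static String toBloomFilterValue(Object columnValueToIndex) {
    return columnValueToIndex.toString();
  }

  public static void main(String[] args) {
    Object intValue = 42; // assumed example: a value from a numeric column, boxed as Integer
    System.out.println(toBloomFilterValue(intValue));
    try {
      String viaCast = (String) intValue; // compiles, but fails at runtime for an Integer
      System.out.println(viaCast);
    } catch (ClassCastException e) {
      System.out.println("cast failed: " + e.getClass().getSimpleName());
    }
  }
}
```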
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -162,6 +181,9 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
}
// Initialize creators for dictionary, forward index and inverted index
+ IndexingConfig indexingConfig = _config.getTableConfig().getIndexingConfig();
+ int rangeIndexVersion =
Review Comment:
(minor) IndexingConfig can never be null
```suggestion
int rangeIndexVersion =
_config.getTableConfig().getIndexingConfig().getRangeIndexVersion();
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -215,6 +237,22 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
dictionaryCreator.getNumBytesPerEntry());
throw e;
}
+
+ if (bloomFilterColumns.contains(columnName)) {
Review Comment:
(MAJOR) A bloom filter can be applied to both dictionary-encoded and raw index
columns.
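A rough sketch of the intended placement, assuming the check in this hunk currently sits inside a dictionary-only branch (the enclosing code is not visible in the diff). All names here (`BloomFilterPlacementSketch`, `planCreators`, the plan labels) are hypothetical and only model the control flow, not Pinot's actual creators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BloomFilterPlacementSketch {
  // Hypothetical stand-in for per-column creator setup: returns the list of
  // structures that would be created for the column.
  static List<String> planCreators(String columnName, boolean dictionaryEncoded,
      Set<String> bloomFilterColumns) {
    List<String> plan = new ArrayList<>();
    if (dictionaryEncoded) {
      plan.add("dictionary");
    } else {
      plan.add("rawForwardIndex");
    }
    // The bloom filter check lives outside the encoding branch, so it applies to
    // dictionary-encoded and raw columns alike.
    if (bloomFilterColumns.contains(columnName)) {
      plan.add("bloomFilter");
    }
    return plan;
  }

  public static void main(String[] args) {
    Set<String> bloomFilterColumns = new HashSet<>(Arrays.asList("dictCol", "rawCol"));
    System.out.println(planCreators("dictCol", true, bloomFilterColumns));
    System.out.println(planCreators("rawCol", false, bloomFilterColumns));
  }
}
```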
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]