Jackie-Jiang commented on code in PR #8601:
URL: https://github.com/apache/pinot/pull/8601#discussion_r875258083

##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -134,6 +139,20 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
         invertedIndexColumns.add(columnName);
       }
+      Set<String> bloomFilterColumns = new HashSet<>();
+      for (String columnName : _config.getBloomFilterCreationColumns()) {
+        Preconditions.checkState(schema.hasColumn(columnName),
+            "Cannot create bloom filter for column: %s because it is not in schema", columnName);
+        bloomFilterColumns.add(columnName);
+      }
+
+      Set<String> rangeIndexColumns = new HashSet<>();
+      for (String columnName : _config.getRangeIndexCreationColumns()) {
+        Preconditions.checkState(schema.hasColumn(columnName),
+            "Cannot create bloom filter for column: %s because it is not in schema", columnName);

Review Comment:
   Update the error message (this check is for the range index columns, but the message still says "bloom filter")


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -89,6 +92,8 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
   private SegmentGeneratorConfig _config;
   private Map<String, ColumnIndexCreationInfo> _indexCreationInfoMap;
   private final IndexCreatorProvider _indexCreatorProvider = IndexingOverrides.getIndexCreatorProvider();
+  private final Map<String, BloomFilterCreator> _bloomFilterCreatorMap = new HashMap<>();
+  private final Map<String, CombinedInvertedIndexCreator> _rangeIndexFilterCreatorMap = new HashMap<>();

Review Comment:
   (minor) Let's re-order the variables a little bit for readability. Suggest putting them between `invertedIndex` and `textIndex`. Keep the same order in the handling logic.


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -120,6 +120,18 @@ public enum TimeColumnType {
   private SegmentZKPropsConfig _segmentZKPropsConfig;
+  private final List<String> _bloomFilterCreationColumns = new ArrayList<>();

Review Comment:
   (minor) Suggest moving these 2 lists between `_invertedIndexCreationColumns` and `_textIndexCreationColumns`. Also move the getters along with the other getters.


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -346,6 +384,37 @@ public void indexRow(GenericRow row)
       //get dictionaryCreator, will be null if column is not dictionaryEncoded
       SegmentDictionaryCreator dictionaryCreator = _dictionaryCreatorMap.get(columnName);
+      // bloom filter
+      BloomFilterCreator bloomFilterCreator = _bloomFilterCreatorMap.get(columnName);
+      if (bloomFilterCreator != null) {
+        bloomFilterCreator.add((String) columnValueToIndex);

Review Comment:
   (MAJOR)
   ```suggestion
           bloomFilterCreator.add(columnValueToIndex.toString());
   ```


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -162,6 +181,9 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
     }
     // Initialize creators for dictionary, forward index and inverted index
+    IndexingConfig indexingConfig = _config.getTableConfig().getIndexingConfig();
+    int rangeIndexVersion =

Review Comment:
   (minor) IndexingConfig can never be null
   ```suggestion
       int rangeIndexVersion = _config.getTableConfig().getIndexingConfig().getRangeIndexVersion();
   ```


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -215,6 +237,22 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
               dictionaryCreator.getNumBytesPerEntry());
           throw e;
         }
+
+        if (bloomFilterColumns.contains(columnName)) {

Review Comment:
   (MAJOR) Bloom filter can be applied to both dictionary encoded and raw index columns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
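
Illustration for the (MAJOR) comment on `bloomFilterCreator.add((String) columnValueToIndex)`: a minimal, standalone sketch of why the cast is risky when the column value is not a `String`, and why `toString()` avoids the problem. `StringCollector` is a hypothetical stand-in that accepts only `String` values; it is not Pinot's `BloomFilterCreator`, and the method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class BloomFilterValueDemo {

  // Hypothetical stand-in for a creator whose add() accepts only String values.
  static class StringCollector {
    final List<String> values = new ArrayList<>();

    void add(String value) {
      values.add(value);
    }
  }

  // Mirrors the original line: cast the value to String before adding it.
  // Fails at runtime for any value whose runtime type is not String.
  static String addWithCast(StringCollector collector, Object columnValueToIndex) {
    try {
      collector.add((String) columnValueToIndex);
      return "ok";
    } catch (ClassCastException e) {
      return "ClassCastException";
    }
  }

  // Mirrors the suggested line: convert the value with toString().
  // Works for any non-null value, whatever its runtime type.
  static String addWithToString(StringCollector collector, Object columnValueToIndex) {
    collector.add(columnValueToIndex.toString());
    return "ok";
  }

  public static void main(String[] args) {
    StringCollector collector = new StringCollector();
    System.out.println(addWithCast(collector, "abc"));   // String value: cast succeeds
    System.out.println(addWithCast(collector, 42));      // Integer value: cast fails
    System.out.println(addWithToString(collector, 42));  // Integer value: toString() succeeds
    System.out.println(collector.values);
  }
}
```

The sketch assumes the value handed to the creator can be a boxed number as well as a `String`; how Pinot actually types `columnValueToIndex` for each column is not shown in the diff above.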