Copilot commented on code in PR #17320:
URL: https://github.com/apache/pinot/pull/17320#discussion_r2641360862
##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/BloomFilterReader.java:
##########
@@ -26,6 +26,11 @@
*/
public interface BloomFilterReader extends IndexReader {
+ /**
+ * @return The False Positive Probability (FPP) of this bloom filter, or
-1.0 if not available (version 1 filters)
+ */
+ double getFPP();
+
/**
* Returns {@code true} if the given value might have been put in this bloom
filer, {@code false} otherwise.
Review Comment:
Corrected spelling of 'filer' to 'filter'.
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1559,6 +1559,40 @@ public void testH3IndexResolutionUpdate(SegmentVersion
segmentVersion)
runPreProcessor(_newColumnsSchemaWithH3Json);
}
+ @Test(dataProvider = "bothV1AndV3")
+ public void testBloomFilterFPPUpdate(SegmentVersion segmentVersion)
+ throws Exception {
+ buildSegment(segmentVersion);
+
+ // Create bloom filter with FPP 0.05. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+ runPreProcessor(_newColumnsSchemaWithH3Json);
+
+ verifyProcessNotNeeded();
+
+ // No change in bloom filter FPP. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+
+ // Verify that preprocessing is not needed
+ verifyProcessNotNeeded();
+
+ // Update bloom filter maxSizeInBytes to 1024, Calculated FPP is
0.9265516921226757
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 1024,
true));
+
+ // Verify that preprocessing is needed
+ verifyProcessNeeded();
+
+ // Update bloom filter FPP to 0.9, Calculated FPP is 0.99
Review Comment:
Same issue as the previous comment - the calculated FPP value of
0.9992553024063195 appears incorrect for a bloom filter with target FPP of
0.05. Please verify and update this comment.
```suggestion
// Create bloom filter with target FPP 0.05
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
runPreProcessor(_newColumnsSchemaWithH3Json);
verifyProcessNotNeeded();
// No change in bloom filter configuration
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
// Verify that preprocessing is not needed
verifyProcessNotNeeded();
// Update bloom filter maxSizeInBytes to 1024
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05,
1024, true));
// Verify that preprocessing is needed
verifyProcessNeeded();
// Update bloom filter target FPP to 0.99
```
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1559,6 +1559,40 @@ public void testH3IndexResolutionUpdate(SegmentVersion
segmentVersion)
runPreProcessor(_newColumnsSchemaWithH3Json);
}
+ @Test(dataProvider = "bothV1AndV3")
+ public void testBloomFilterFPPUpdate(SegmentVersion segmentVersion)
+ throws Exception {
+ buildSegment(segmentVersion);
+
+ // Create bloom filter with FPP 0.05. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+ runPreProcessor(_newColumnsSchemaWithH3Json);
+
+ verifyProcessNotNeeded();
+
+ // No change in bloom filter FPP. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+
+ // Verify that preprocessing is not needed
+ verifyProcessNotNeeded();
+
+ // Update bloom filter maxSizeInBytes to 1024, Calculated FPP is
0.9265516921226757
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 1024,
true));
+
+ // Verify that preprocessing is needed
+ verifyProcessNeeded();
+
+ // Update bloom filter FPP to 0.9, Calculated FPP is 0.99
Review Comment:
The comment states that creating a bloom filter with FPP 0.05 results in a
calculated FPP of 0.9992553024063195, which appears incorrect. An FPP near 1.0
would mean the bloom filter is essentially useless (almost all lookups return
false positives). This value seems to be inverted or miscalculated. Please
verify the expected calculated FPP value and update the comment accordingly.
```suggestion
// Create bloom filter with target FPP 0.05
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
runPreProcessor(_newColumnsSchemaWithH3Json);
verifyProcessNotNeeded();
// No change in bloom filter configuration
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
// Verify that preprocessing is not needed
verifyProcessNotNeeded();
// Update bloom filter maxSizeInBytes to 1024
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05,
1024, true));
// Verify that preprocessing is needed
verifyProcessNeeded();
// Update bloom filter FPP to 0.99
```
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1559,6 +1559,40 @@ public void testH3IndexResolutionUpdate(SegmentVersion
segmentVersion)
runPreProcessor(_newColumnsSchemaWithH3Json);
}
+ @Test(dataProvider = "bothV1AndV3")
+ public void testBloomFilterFPPUpdate(SegmentVersion segmentVersion)
+ throws Exception {
+ buildSegment(segmentVersion);
+
+ // Create bloom filter with FPP 0.05. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+ runPreProcessor(_newColumnsSchemaWithH3Json);
+
+ verifyProcessNotNeeded();
+
+ // No change in bloom filter FPP. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+
+ // Verify that preprocessing is not needed
+ verifyProcessNotNeeded();
+
+ // Update bloom filter maxSizeInBytes to 1024, Calculated FPP is
0.9265516921226757
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 1024,
true));
+
+ // Verify that preprocessing is needed
+ verifyProcessNeeded();
+
+ // Update bloom filter FPP to 0.9, Calculated FPP is 0.99
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.99, 1024,
true));
Review Comment:
The comment suggests that setting FPP to 0.9 results in a calculated FPP of
0.99. If this is accurate, it should be explained why the calculated value
differs from the configured value, as this seems counterintuitive.
```suggestion
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.9, 1024,
true));
```
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1559,6 +1559,40 @@ public void testH3IndexResolutionUpdate(SegmentVersion
segmentVersion)
runPreProcessor(_newColumnsSchemaWithH3Json);
}
+ @Test(dataProvider = "bothV1AndV3")
+ public void testBloomFilterFPPUpdate(SegmentVersion segmentVersion)
+ throws Exception {
+ buildSegment(segmentVersion);
+
+ // Create bloom filter with FPP 0.05. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+ runPreProcessor(_newColumnsSchemaWithH3Json);
+
+ verifyProcessNotNeeded();
+
+ // No change in bloom filter FPP. Calculated FPP is 0.9992553024063195
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
+
+ // Verify that preprocessing is not needed
+ verifyProcessNotNeeded();
+
+ // Update bloom filter maxSizeInBytes to 1024, Calculated FPP is
0.9265516921226757
+ _bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 1024,
true));
+
+ // Verify that preprocessing is needed
+ verifyProcessNeeded();
+
+ // Update bloom filter FPP to 0.9, Calculated FPP is 0.99
Review Comment:
The calculated FPP of 0.9265516921226757 also appears incorrect. This value
suggests that over 92% of lookups would be false positives, which would make
the bloom filter ineffective. Please verify the calculated FPP values in these
test comments.
```suggestion
// Create bloom filter with target FPP 0.05 and maxSizeInBytes 10
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
runPreProcessor(_newColumnsSchemaWithH3Json);
verifyProcessNotNeeded();
// No change in bloom filter configuration
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05, 10,
true));
// Verify that preprocessing is not needed
verifyProcessNotNeeded();
// Update bloom filter maxSizeInBytes to 1024
_bloomFilterConfigs = Map.of("column1", new BloomFilterConfig(0.05,
1024, true));
// Verify that preprocessing is needed
verifyProcessNeeded();
// Update bloom filter target FPP to 0.99
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/bloomfilter/BloomFilterHandler.java:
##########
@@ -72,6 +75,29 @@ public boolean needUpdateIndices(SegmentDirectory.Reader
segmentReader) {
if (!columnsToAddBF.remove(column)) {
LOGGER.info("Need to remove existing bloom filter from segment: {},
column: {}", segmentName, column);
return true;
+ } else {
+ // Index already exists, check for change in FPP config
+ BloomFilterConfig bloomFilterConfig = _bloomFilterConfigs.get(column);
+ ColumnMetadata columnMetadata =
_segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+ if (columnMetadata != null) {
+ double newFpp =
GuavaBloomFilterReaderUtils.computeFPP(bloomFilterConfig.getFpp(),
+ bloomFilterConfig.getMaxSizeInBytes(),
columnMetadata.getCardinality());
+ double oldFpp;
+ IndexReaderFactory<BloomFilterReader> readerFactory =
StandardIndexes.bloomFilter().getReaderFactory();
+ try (BloomFilterReader bloomFilterReader =
readerFactory.createIndexReader(
+ segmentReader, _fieldIndexConfigs.get(column), columnMetadata)) {
+ oldFpp = bloomFilterReader.getFPP();
+ } catch (Exception e) {
+ LOGGER.warn("Failed to read existing bloom filter for segment: {},
column: {}", segmentName, column, e);
+ continue;
+ }
+ if (oldFpp < 0 || Math.abs(newFpp - oldFpp) > FPP_EPSILON) {
+ LOGGER.info(
+ "Old version of bloom filter detected or FPP changed for
segment: {}, column: {}, "
+ + "old FPP: {}, new FPP: {}. Index needs to be
rebuilt.", segmentName, column, oldFpp, newFpp);
Review Comment:
This FPP comparison and logging logic is duplicated in both
needUpdateIndices() and updateIndices() methods. Consider extracting this into
a helper method to reduce code duplication and improve maintainability.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]