singhpk234 commented on code in PR #11799:
URL: https://github.com/apache/iceberg/pull/11799#discussion_r1893197627


##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -266,6 +272,43 @@ private WriteBuilder createContextFunc(
       return this;
     }
 
+    private <T> void setBloomFilterConfig(
+        Context context,
+        MessageType parquetSchema,
+        BiConsumer<String, Boolean> withBloomFilterEnabled,
+        BiConsumer<String, Double> withBloomFilterFPP) {
+
+      Map<Integer, String> fieldIdToParquetPath =
+          parquetSchema.getColumns().stream()
+              .collect(
+                  Collectors.toMap(
+                      col -> col.getPrimitiveType().getId().intValue(),
+                      col -> String.join(".", col.getPath())));
+
+      context
+          .columnBloomFilterEnabled()
+          .forEach(
+              (colPath, isEnabled) -> {
+                Types.NestedField fieldId = schema.findField(colPath);
+                if (fieldId == null) {
+                  LOG.warn("Skipping bloom filter config for missing field: 
{}", colPath);
+                  return;
+                }
+
+                String parquetColumnPath = 
fieldIdToParquetPath.get(fieldId.fieldId());
+                if (parquetColumnPath == null) {
+                  LOG.warn("Skipping bloom filter config for missing field: 
{}", fieldId);

Review Comment:
   should we update this message to say something like 
   ```suggestion
                     LOG.warn("Skipping bloom filter config for field: {} due 
to missing parquetColumnPath for fieldId: {}", colPath, fiedId);
   ```
   
   mostly coming from the above log lines are identical mostly though at one 
part we add columnPath and the other we do fielId



##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -266,6 +272,43 @@ private WriteBuilder createContextFunc(
       return this;
     }
 
+    private <T> void setBloomFilterConfig(
+        Context context,
+        MessageType parquetSchema,
+        BiConsumer<String, Boolean> withBloomFilterEnabled,
+        BiConsumer<String, Double> withBloomFilterFPP) {
+
+      Map<Integer, String> fieldIdToParquetPath =
+          parquetSchema.getColumns().stream()
+              .collect(
+                  Collectors.toMap(
+                      col -> col.getPrimitiveType().getId().intValue(),
+                      col -> String.join(".", col.getPath())));
+
+      context
+          .columnBloomFilterEnabled()
+          .forEach(
+              (colPath, isEnabled) -> {

Review Comment:
   [question] do we need to do anything for isEnabled as false ? or can parquet 
pro-actively decide if it should have a BF for a column and this isEnabled as 
false serves as explicit deny ?



##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -266,6 +272,43 @@ private WriteBuilder createContextFunc(
       return this;
     }
 
+    private <T> void setBloomFilterConfig(
+        Context context,
+        MessageType parquetSchema,
+        BiConsumer<String, Boolean> withBloomFilterEnabled,
+        BiConsumer<String, Double> withBloomFilterFPP) {
+
+      Map<Integer, String> fieldIdToParquetPath =
+          parquetSchema.getColumns().stream()
+              .collect(
+                  Collectors.toMap(
+                      col -> col.getPrimitiveType().getId().intValue(),
+                      col -> String.join(".", col.getPath())));
+
+      context
+          .columnBloomFilterEnabled()
+          .forEach(
+              (colPath, isEnabled) -> {
+                Types.NestedField fieldId = schema.findField(colPath);

Review Comment:
   can we call this as field instead ?
   ```suggestion
                   Types.NestedField field = schema.findField(colPath);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to