singhpk234 commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1893197627
########## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ########## @@ -266,6 +272,43 @@ private WriteBuilder createContextFunc( return this; } + private <T> void setBloomFilterConfig( + Context context, + MessageType parquetSchema, + BiConsumer<String, Boolean> withBloomFilterEnabled, + BiConsumer<String, Double> withBloomFilterFPP) { + + Map<Integer, String> fieldIdToParquetPath = + parquetSchema.getColumns().stream() + .collect( + Collectors.toMap( + col -> col.getPrimitiveType().getId().intValue(), + col -> String.join(".", col.getPath()))); + + context + .columnBloomFilterEnabled() + .forEach( + (colPath, isEnabled) -> { + Types.NestedField fieldId = schema.findField(colPath); + if (fieldId == null) { + LOG.warn("Skipping bloom filter config for missing field: {}", colPath); + return; + } + + String parquetColumnPath = fieldIdToParquetPath.get(fieldId.fieldId()); + if (parquetColumnPath == null) { + LOG.warn("Skipping bloom filter config for missing field: {}", fieldId); Review Comment: should we update this message to say something like ```suggestion LOG.warn("Skipping bloom filter config for field: {} due to missing parquetColumnPath for fieldId: {}", colPath, fiedId); ``` mostly coming from the above log lines are identical mostly though at one part we add columnPath and the other we do fielId ########## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ########## @@ -266,6 +272,43 @@ private WriteBuilder createContextFunc( return this; } + private <T> void setBloomFilterConfig( + Context context, + MessageType parquetSchema, + BiConsumer<String, Boolean> withBloomFilterEnabled, + BiConsumer<String, Double> withBloomFilterFPP) { + + Map<Integer, String> fieldIdToParquetPath = + parquetSchema.getColumns().stream() + .collect( + Collectors.toMap( + col -> col.getPrimitiveType().getId().intValue(), + col -> String.join(".", col.getPath()))); + + context + .columnBloomFilterEnabled() + .forEach( + (colPath, isEnabled) -> { Review Comment: [question] do we need to do anything for isEnabled as false ? or can parquet pro-actively decide if it should have a BF for a column and this isEnabled as false serves as explicit deny ? ########## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ########## @@ -266,6 +272,43 @@ private WriteBuilder createContextFunc( return this; } + private <T> void setBloomFilterConfig( + Context context, + MessageType parquetSchema, + BiConsumer<String, Boolean> withBloomFilterEnabled, + BiConsumer<String, Double> withBloomFilterFPP) { + + Map<Integer, String> fieldIdToParquetPath = + parquetSchema.getColumns().stream() + .collect( + Collectors.toMap( + col -> col.getPrimitiveType().getId().intValue(), + col -> String.join(".", col.getPath()))); + + context + .columnBloomFilterEnabled() + .forEach( + (colPath, isEnabled) -> { + Types.NestedField fieldId = schema.findField(colPath); Review Comment: can we call this as field instead ? ```suggestion Types.NestedField field = schema.findField(colPath); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org