Re: [PR] Add table property to disable/enable parquet column statistics #12770 [iceberg]

via GitHub Tue, 22 Apr 2025 23:42:25 -0700


szehon-ho commented on code in PR #12771:
URL: https://github.com/apache/iceberg/pull/12771#discussion_r2055306628



##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -306,33 +309,35 @@ private WriteBuilder createContextFunc(
       return this;
     }
 
+    // Utility method to get the column path
+    private String getParquetColumnPath(
+        Map<Integer, String> fieldIdToParquetPath, String colPath, String 
configStr) {
+      Types.NestedField fieldId = schema.findField(colPath);

Review Comment:
   nit: should it be called 'field' instead of a fieldId?



##########
docs/docs/configuration.md:
##########
@@ -52,6 +52,8 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.parquet.bloom-filter-enabled.column.col1       | (not set)             
      | Hint to parquet to write a bloom filter for the column: 'col1'          
                                                                                
                                          |
 | write.parquet.bloom-filter-max-bytes                 | 1048576 (1 MB)        
      | The maximum number of bytes for a bloom filter bitset                   
                                                                                
                                          |
 | write.parquet.bloom-filter-fpp.column.col1           | 0.01                  
      | The false positive probability for a bloom filter applied to 'col1' 
(must > 0.0 and < 1.0)                                                          
                                              |
+| write.parquet.stats-enabled.default                  | true                  
      | Controls whether to collect parquet column statistics when not 
specified on column level, can be configured to false with 
[parquet-java#3189](https://github.com/apache/parquet-java/issues/3188) |

Review Comment:
   nit: should it be consistent and use 3189 even for link?



##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -306,33 +309,35 @@ private WriteBuilder createContextFunc(
       return this;
     }
 
+    // Utility method to get the column path
+    private String getParquetColumnPath(

Review Comment:
   I was wondering, the field Id seems not used, couldnt we just make a map of 
column name to column path?
   
   
   ```    
    Map<String, String> colNameToPaths =
             parquetSchema.getColumns().stream()
                 .filter(col -> col.getPrimitiveType().getId() != null)
                 .collect(
                     Collectors.toMap(
                         col ->  {
                                int fieldId = 
col.getPrimitiveType().getId().intValue();
                                return schema.findColumnName(fieldId);
                           },
                         col -> String.join(".", col.getPath())));
   ```
   
   Worth to double check the logic, i think this would make the logic simpler 
if it is correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add table property to disable/enable parquet column statistics #12770 [iceberg]

Reply via email to