adriangb commented on code in PR #9628:
URL: https://github.com/apache/arrow-rs/pull/9628#discussion_r3035633245


##########
parquet/src/file/properties.rs:
##########
@@ -53,8 +53,14 @@ pub const DEFAULT_CREATED_BY: &str = concat!("parquet-rs 
version ", env!("CARGO_
 pub const DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH: Option<usize> = Some(64);
 /// Default value for [`BloomFilterProperties::fpp`]
 pub const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.05;
-/// Default value for [`BloomFilterProperties::ndv`]
-pub const DEFAULT_BLOOM_FILTER_NDV: u64 = 1_000_000_u64;
+/// Default value for [`BloomFilterProperties::ndv`].
+///
+/// Note: this is only the fallback default used when constructing 
[`BloomFilterProperties`]
+/// directly. When using [`WriterPropertiesBuilder`], columns with bloom 
filters enabled
+/// but without an explicit NDV will have their NDV resolved at build time to
+/// [`WriterProperties::max_row_group_row_count`], which may differ from this 
constant
+/// if the user configured a custom row group size.
+pub const DEFAULT_BLOOM_FILTER_NDV: u64 = DEFAULT_MAX_ROW_GROUP_ROW_COUNT as 
u64;

Review Comment:
   Note that this is also a slightly different number 
(`DEFAULT_MAX_ROW_GROUP_ROW_COUNT` is `1_048_576`, negligible and I think the 
old constant would have been rounded up to this anyway since the sizes need to 
be powers of 2).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to