adriangb commented on code in PR #9628:
URL: https://github.com/apache/arrow-rs/pull/9628#discussion_r3035633245
##########
parquet/src/file/properties.rs:
##########
@@ -53,8 +53,14 @@ pub const DEFAULT_CREATED_BY: &str = concat!("parquet-rs
version ", env!("CARGO_
pub const DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH: Option<usize> = Some(64);
/// Default value for [`BloomFilterProperties::fpp`]
pub const DEFAULT_BLOOM_FILTER_FPP: f64 = 0.05;
-/// Default value for [`BloomFilterProperties::ndv`]
-pub const DEFAULT_BLOOM_FILTER_NDV: u64 = 1_000_000_u64;
+/// Default value for [`BloomFilterProperties::ndv`].
+///
+/// Note: this is only the fallback default used when constructing
[`BloomFilterProperties`]
+/// directly. When using [`WriterPropertiesBuilder`], columns with bloom
filters enabled
+/// but without an explicit NDV will have their NDV resolved at build time to
+/// [`WriterProperties::max_row_group_row_count`], which may differ from this
constant
+/// if the user configured a custom row group size.
+pub const DEFAULT_BLOOM_FILTER_NDV: u64 = DEFAULT_MAX_ROW_GROUP_ROW_COUNT as
u64;
Review Comment:
Note that this is also a slightly different number
(`DEFAULT_MAX_ROW_GROUP_ROW_COUNT` is `1_048_576`, negligible and I think the
old constant would have been rounded up to this anyway since the sizes need to
be powers of 2).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]