alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3156498869
##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
return;
};
+ // Collect unique column names referenced by the predicate so we can
+ // check for NULLs. Rows with NULL predicate columns evaluate to NULL
+ // (not true), so a row group with NULLs cannot be "fully matched."
+ let predicate_columns =
+ datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+ let null_count_converters: Vec<StatisticsConverter> = predicate_columns
+ .iter()
+ .filter_map(|col| {
+ StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)
Review Comment:
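We should probably set the following option to `false` (it defaults to `true`)
to be super safe, so that a missing null count is not reported as zero and a
row group is not wrongly treated as "fully matched":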
```
pub fn with_missing_null_counts_as_zero(mut self, missing_null_counts_as_zero: bool) -> Self
```
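To illustrate why the default could be risky here, the following is a minimal, standard-library-only sketch of the reasoning. It does not use the real `StatisticsConverter`; the names `NullCount`, `resolve_null_count`, and `can_be_fully_matched` are hypothetical and exist only for this example. It assumes a missing null count is modeled as `None`.

```rust
/// Hypothetical stand-in for one predicate column's null count in a row group.
/// `None` models Parquet statistics that did not record a null count.
type NullCount = Option<u64>;

/// Resolve a possibly missing null count.
/// With `missing_as_zero = true`, a missing count is reported as `Some(0)`;
/// with `false`, it stays `None` (unknown).
fn resolve_null_count(raw: NullCount, missing_as_zero: bool) -> NullCount {
    match raw {
        Some(n) => Some(n),
        None if missing_as_zero => Some(0),
        None => None,
    }
}

/// A row group may be treated as "fully matched" only when every predicate
/// column is known to contain no NULLs (assumption of this sketch).
fn can_be_fully_matched(null_counts: &[NullCount], missing_as_zero: bool) -> bool {
    null_counts
        .iter()
        .all(|raw| resolve_null_count(*raw, missing_as_zero) == Some(0))
}
```

In this model, a row group with counts `[Some(0), None]` passes the check when missing counts are treated as zero and fails it when they are left unknown, which is the conservative behavior the comment asks for.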