xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165248469


##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
             return;
         };
 
+        // Collect unique column names referenced by the predicate so we can
+        // check for NULLs. Rows with NULL predicate columns evaluate to NULL
+        // (not true), so a row group with NULLs cannot be "fully matched."
+        let predicate_columns =
+            datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+        let null_count_converters: Vec<StatisticsConverter> = predicate_columns
+            .iter()
+            .filter_map(|col| {
+                StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)

Review Comment:
   PR https://github.com/apache/datafusion/pull/21907 takes a different
   approach: it adds IS NULL checks for the nullable columns referenced by the
   predicate before evaluating the inverted pruning predicate.
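
   For reference, the null-count rule described in the diff comment can be
   sketched in isolation. This is only a minimal illustration under stated
   assumptions: `ColumnStats` and `may_be_fully_matched` are hypothetical names
   that exist in neither PR, and the real code obtains null counts from Parquet
   statistics (the diff builds `StatisticsConverter`s for that) rather than
   from a hand-written struct.

```rust
/// Hypothetical per-column statistics for a single row group.
/// `null_count` is `None` when the statistic is missing from the file.
struct ColumnStats {
    name: &'static str,
    null_count: Option<u64>,
}

/// Returns true only if every column referenced by the predicate is known
/// to contain zero NULLs in this row group. A NULL input makes the predicate
/// evaluate to NULL (not true), so any NULL, or any unknown null count,
/// rules out treating the row group as "fully matched".
fn may_be_fully_matched(predicate_columns: &[&str], stats: &[ColumnStats]) -> bool {
    predicate_columns.iter().all(|col| {
        stats
            .iter()
            .find(|s| s.name == *col)
            .and_then(|s| s.null_count)
            == Some(0)
    })
}
```

   Missing statistics are treated conservatively here: if the null count is
   unknown, the row group is not considered fully matched.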



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

