mdashti opened a new pull request, #23173: URL: https://github.com/apache/datafusion/pull/23173
## Which issue does this close?

Closes #23126.

## Rationale for this change

`x NOT IN (subquery)` plans to a null-aware `LeftAnti` hash join (build = outer `x`, probe = subquery). Join dynamic filter pushdown pushes a bounds + membership filter, built from the build keys, onto the probe scan. That filter can prune every probe row. A null-aware `LeftAnti` reads an empty probe as a genuinely empty subquery, so it emits build-side NULL rows that should be dropped: `NULL NOT IN (non-empty)` is UNKNOWN, not TRUE.

The result depends on the scan, so it's a silent correctness bug. A `VALUES` scan ignores the pushed filter and stays correct; a parquet scan applies it and returns the wrong result.

`#23103` (the probe-side NULL drop) is orthogonal; this PR concerns the build-side NULL.

## What changes are included in this PR?

Skip join dynamic filter pushdown for a null-aware anti join when the build key can be NULL. Whether a build-side NULL row is emitted depends on whether the probe is truly empty, and the pushed filter can change that by pruning the probe down to nothing. A NOT NULL build key has no such NULL row, so it keeps the pushdown.

## Are these changes tested?

Yes. A `push_down_filter_parquet.slt` case reproduces it (build-side NULL, a non-matching parquet probe) and asserts the single correct row. Without the change it returns the extra NULL.

## Are there any user-facing changes?

`NOT IN` over a parquet (or otherwise prunable) scan with a nullable outer key now returns correct results. Such joins lose the dynamic filter pushdown.
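To make the failure mode in the rationale concrete, here is a minimal standalone sketch in plain Rust. It does not use any DataFusion API: `not_in`, `anti_join`, `prune_probe`, and `allow_dynamic_filter_pushdown` are hypothetical helpers written for illustration, the pushed filter is reduced to a min/max bounds check on the non-NULL build keys, and the sample values are assumptions rather than the data in the `.slt` case.

```rust
/// SQL three-valued `x NOT IN (subquery)` for one outer row.
/// `Some(true)` / `Some(false)` are TRUE / FALSE, `None` is UNKNOWN.
fn not_in(x: Option<i64>, subquery: &[Option<i64>]) -> Option<bool> {
    if subquery.is_empty() {
        // NOT IN over an empty set is TRUE, even when x is NULL.
        return Some(true);
    }
    // NULL NOT IN (non-empty) is UNKNOWN.
    let x = x?;
    if subquery.iter().any(|v| *v == Some(x)) {
        return Some(false);
    }
    if subquery.iter().any(|v| v.is_none()) {
        // No match, but a NULL in the subquery makes the result UNKNOWN.
        return None;
    }
    Some(true)
}

/// Rows kept by `WHERE x NOT IN (subquery)`: only rows where the predicate is TRUE.
fn anti_join(build: &[Option<i64>], probe: &[Option<i64>]) -> Vec<Option<i64>> {
    build
        .iter()
        .copied()
        .filter(|x| not_in(*x, probe) == Some(true))
        .collect()
}

/// Hypothetical stand-in for the pushed dynamic filter: keep only probe rows
/// whose key lies within [min, max] of the non-NULL build keys.
fn prune_probe(build: &[Option<i64>], probe: &[Option<i64>]) -> Vec<Option<i64>> {
    let keys: Vec<i64> = build.iter().flatten().copied().collect();
    let (min, max) = match (keys.iter().min(), keys.iter().max()) {
        (Some(lo), Some(hi)) => (*lo, *hi),
        _ => return Vec::new(), // no non-NULL build keys: nothing passes the bounds
    };
    probe
        .iter()
        .copied()
        .filter(|v| matches!(v, Some(k) if (min..=max).contains(k)))
        .collect()
}

/// Hypothetical guard expressing the rule described in this PR: skip the
/// pushdown only for a null-aware anti join whose build key can be NULL.
fn allow_dynamic_filter_pushdown(null_aware_anti: bool, build_key_nullable: bool) -> bool {
    !(null_aware_anti && build_key_nullable)
}
```

With a build side of `[Some(1), None]` and a probe side of `[Some(100)]`, `anti_join` on the unpruned probe keeps only `Some(1)`. Passing the probe through `prune_probe` first empties it, and the same `anti_join` call then also returns the NULL row, which is the extra row described above.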
