sgrebnov opened a new pull request, #21297:
URL: https://github.com/apache/datafusion/pull/21297
## Which issue does this PR close?
N/A
PR improves the unparser to handle plans where a `Filter` node is preserved
above a `TableScan` with pushed-down filters — either because
`supports_filters_pushdown` returns `Inexact`/`Unsupported`, or because the
plan was constructed manually.
## Rationale for this change
When `supports_filters_pushdown` returns `Inexact`, the optimizer preserves
a `Filter` node above the `TableScan`. The unparser's
`unparse_table_scan_pushdown` only handled `TableScan`, `SubqueryAlias`, and
`Projection` nodes — it did not handle this `SubqueryAlias(Filter(TableScan))`
shape, producing wrong column references and lost projections in the generated
SQL.
Additionally, `try_transform_to_simple_table_scan_with_filters` (used by the
JOIN handler) collected `Filter` node predicates without alias rewriting, while
`TableScan` filters were rewritten. This caused duplicate un-aliased predicates
in JOIN ON clauses (e.g. `nation.n_name` instead of `n1.n_name`), producing
invalid SQL for queries like TPC-H Q7.
Example 1:
**Before (incorrect):**
```sql
SELECT * FROM nation AS n1 WHERE (nation.n_name = 'FRANCE') AND
(nation.n_name = 'FRANCE')
```
**After (correct):**
```sql
SELECT n1.n_nationkey, n1.n_name FROM nation AS n1 WHERE (n1.n_name =
'FRANCE') AND (n1.n_name = 'FRANCE')
```
Example 2:
**Before (JOIN, incorrect — fails with "No field named nation.n_name"):**
```sql
... INNER JOIN nation AS n1 ON (supplier.s_nationkey = n1.n_nationkey)
AND ((n1.n_name = 'FRANCE') OR (n1.n_name = 'GERMANY'))
AND ((nation.n_name = 'FRANCE') OR (nation.n_name = 'GERMANY'))
```
**After (JOIN, correct — deduplicated):**
```sql
... INNER JOIN nation AS n1 ON (supplier.s_nationkey = n1.n_nationkey)
AND ((n1.n_name = 'FRANCE') OR (n1.n_name = 'GERMANY'))
```
Filters duplicated between the `Filter` node and `TableScan.filters` (common
with `Inexact` pushdown) appear redundantly in the generated SQL. No
optimization currently exists to detect and remove these duplicates in that
path. This does not affect query correctness and can be optimized separately.
## What changes are included in this PR?
1. Add a `LogicalPlan::Filter` arm to `unparse_table_scan_pushdown` —
recurse through the filter to the `TableScan`, then rewrite the filter
predicate's column references to use the alias via `TableAliasRewriter`. Skip
predicates containing subquery expressions
(`Exists`/`InSubquery`/`ScalarSubquery`) since `TableAliasRewriter` cannot
rewrite `OuterReferenceColumn` inside subquery `LogicalPlan`s — returning
`None` falls back to wrapping the plan as a derived table, preserving the
original table name for outer references.
2. In `try_transform_to_simple_table_scan_with_filters`, rewrite
already-collected `Filter` node predicates with `TableAliasRewriter` before
deduplicating against rewritten `TableScan` filters.
## Are these changes tested?
Yes. Added 4 regression tests and also verified end-to-end with all 22 TPC-H
queries against a `TableProvider` returning `Inexact` for
`supports_filters_pushdown`.
## Are there any user-facing changes?
No API changes. The unparser now generates correct SQL for
`SubqueryAlias(Filter(TableScan))` and `Filter(TableScan)` plans produced by
`Inexact` filter pushdown or constructed manually, including correct column
alias rewriting and filter deduplication in JOINs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]