sgrebnov opened a new pull request, #21297:
URL: https://github.com/apache/datafusion/pull/21297

   ## Which issue does this PR close?
   
   N/A
   
   PR improves the unparser to handle plans where a `Filter` node is preserved 
above a `TableScan` with pushed-down filters — either because 
`supports_filters_pushdown` returns `Inexact`/`Unsupported`, or because the 
plan was constructed manually.
   
   ## Rationale for this change
   
   When `supports_filters_pushdown` returns `Inexact`, the optimizer preserves 
a `Filter` node above the `TableScan`. The unparser's 
`unparse_table_scan_pushdown` only handled `TableScan`, `SubqueryAlias`, and 
`Projection` nodes — it did not handle this `SubqueryAlias(Filter(TableScan))` 
shape, producing wrong column references and lost projections in the generated 
SQL.
   
   Additionally, `try_transform_to_simple_table_scan_with_filters` (used by the 
JOIN handler) collected `Filter` node predicates without alias rewriting, while 
`TableScan` filters were rewritten. This caused duplicate un-aliased predicates 
in JOIN ON clauses (e.g. `nation.n_name` instead of `n1.n_name`), producing 
invalid SQL for queries like TPC-H Q7.
   
   Example 1:
   
   **Before (incorrect):**
   ```sql
   SELECT * FROM nation AS n1 WHERE (nation.n_name = 'FRANCE') AND 
(nation.n_name = 'FRANCE')
   ```
   
   **After (correct):**
   ```sql
   SELECT n1.n_nationkey, n1.n_name FROM nation AS n1 WHERE (n1.n_name = 
'FRANCE') AND (n1.n_name = 'FRANCE')
   ```
   
   Example 2:
   
   **Before (JOIN, incorrect — fails with "No field named nation.n_name"):**
   ```sql
   ... INNER JOIN nation AS n1 ON (supplier.s_nationkey = n1.n_nationkey)
     AND ((n1.n_name = 'FRANCE') OR (n1.n_name = 'GERMANY'))
     AND ((nation.n_name = 'FRANCE') OR (nation.n_name = 'GERMANY'))
   ```
   
   **After (JOIN, correct — deduplicated):**
   ```sql
   ... INNER JOIN nation AS n1 ON (supplier.s_nationkey = n1.n_nationkey)
     AND ((n1.n_name = 'FRANCE') OR (n1.n_name = 'GERMANY'))
   ```
   
   Filters duplicated between the `Filter` node and `TableScan.filters` (common 
with `Inexact` pushdown) appear redundantly in the generated SQL. No 
optimization currently exists to detect and remove these duplicates in that 
path. This does not affect query correctness and can be optimized separately.
   
   ## What changes are included in this PR?
   
   1. Add a `LogicalPlan::Filter` arm to `unparse_table_scan_pushdown` — 
recurse through the filter to the `TableScan`, then rewrite the filter 
predicate's column references to use the alias via `TableAliasRewriter`. Skip 
predicates containing subquery expressions 
(`Exists`/`InSubquery`/`ScalarSubquery`) since `TableAliasRewriter` cannot 
rewrite `OuterReferenceColumn` inside subquery `LogicalPlan`s — returning 
`None` falls back to wrapping the plan as a derived table, preserving the 
original table name for outer references.
   
   2. In `try_transform_to_simple_table_scan_with_filters`, rewrite 
already-collected `Filter` node predicates with `TableAliasRewriter` before 
deduplicating against rewritten `TableScan` filters.
   
   ## Are these changes tested?
   
   Yes. Added 4 regression tests and also verified end-to-end with all 22 TPC-H 
queries against a `TableProvider` returning `Inexact` for 
`supports_filters_pushdown`.
   
   ## Are there any user-facing changes?
   
   No API changes. The unparser now generates correct SQL for 
`SubqueryAlias(Filter(TableScan))` and `Filter(TableScan)` plans produced by 
`Inexact` filter pushdown or constructed manually, including correct column 
alias rewriting and filter deduplication in JOINs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to