SubhamSinghal opened a new issue, #21320:
URL: https://github.com/apache/datafusion/issues/21320
### Is your feature request related to a problem or challenge?
The `PropagateEmptyRelation` optimizer rule correctly handles inner joins,
semi joins, and anti joins when one or both sides
are an `EmptyRelation`. However, for outer joins (LEFT, RIGHT, FULL), when
the non-preserved side is empty, the rule leaves the
join untouched.
For example, a `LEFT JOIN` where the right side is an `EmptyRelation`:
Left Join (orders.id = returns.order_id)
├── TableScan: orders ← has rows
└── EmptyRelation ← 0 rows
The result is always all left rows with NULLs on the right — but the
engine still builds a hash table for the empty side,
probes every row, finds zero matches, and then pads NULLs. The join
operator is entirely wasted work.
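The equivalence claimed above can be checked with a small standalone sketch. It is a toy model, not DataFusion code: `Row`, `left_join`, and `pad_with_nulls` are hypothetical names, rows are plain vectors, and `None` stands in for SQL NULL. The `amount` values are made up for illustration.

```rust
// Toy model: a row is a vector of optional integers (None stands for SQL NULL).
type Row = Vec<Option<i64>>;

/// Naive nested-loop LEFT JOIN on left[lkey] = right[rkey].
/// `right_width` is the number of NULLs to pad when no right row matches.
fn left_join(
    left: &[Row],
    right: &[Row],
    lkey: usize,
    rkey: usize,
    right_width: usize,
) -> Vec<Row> {
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            // SQL equality: NULL never equals anything, so the key must be present.
            if l[lkey].is_some() && l[lkey] == r[rkey] {
                let mut row = l.clone();
                row.extend(r.iter().cloned());
                out.push(row);
                matched = true;
            }
        }
        if !matched {
            // Unmatched left row: keep it and pad the right columns with NULLs.
            let mut row = l.clone();
            row.extend(std::iter::repeat(None).take(right_width));
            out.push(row);
        }
    }
    out
}

/// The shortcut: skip the join entirely and pad every left row with NULLs.
fn pad_with_nulls(left: &[Row], right_width: usize) -> Vec<Row> {
    left.iter()
        .map(|l| {
            let mut row = l.clone();
            row.extend(std::iter::repeat(None).take(right_width));
            row
        })
        .collect()
}

fn main() {
    // orders(id, amount) has rows; returns(order_id, reason) is empty.
    let orders: Vec<Row> = vec![vec![Some(1), Some(100)], vec![Some(2), Some(250)]];
    let returns: Vec<Row> = Vec::new();

    let joined = left_join(&orders, &returns, 0, 0, 2);
    // With an empty right side, the join degenerates to NULL padding.
    assert_eq!(joined, pad_with_nulls(&orders, 2));
    println!("{:?}", joined);
}
```

With an empty right input the inner loop never runs, so every left row takes the padding branch. That is exactly what a projection over the left side would produce.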
There is an existing TODO in `propagate_empty_relation.rs` describing this
gap:
```rust
// TODO: For LeftOut/Full Join, if the right side is empty, the Join can be eliminated
// with a Projection with left side columns + right side columns replaced with null values.
// For RightOut/Full Join, if the left side is empty, the Join can be eliminated
// with a Projection with right side columns + left side columns replaced with null values.
```
### Describe the solution you'd like
Replace the outer join with a Projection that keeps the surviving side's
columns and substitutes CAST(NULL AS <type>) for each
column on the empty side:
Projection(orders.id, orders.amount, CAST(NULL AS Int64) AS order_id, CAST(NULL AS Utf8) AS reason)
└── TableScan: orders
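A rough sketch of the rewrite on a toy plan representation follows. None of these types are DataFusion's: `Plan`, `Expr`, `JoinType`, `null_pad`, and `propagate_empty_outer` are hypothetical stand-ins, data types are plain strings, and the `Float64` type for `orders.amount` is an assumption. The sketch covers only the case where the non-preserved side is empty and leaves every other join unchanged.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType { Inner, Left, Right, Full }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// Stands for CAST(NULL AS <data_type>) AS <alias>.
    NullCast { data_type: String, alias: String },
}

type Schema = Vec<(String, String)>; // (column name, type name)

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { name: String, schema: Schema },
    EmptyRelation { schema: Schema },
    Join { join_type: JoinType, left: Box<Plan>, right: Box<Plan> },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
}

impl Plan {
    fn schema(&self) -> Schema {
        match self {
            Plan::TableScan { schema, .. } | Plan::EmptyRelation { schema } => schema.clone(),
            Plan::Join { left, right, .. } => {
                let mut s = left.schema();
                s.extend(right.schema());
                s
            }
            Plan::Projection { exprs, input } => {
                let input_schema = input.schema();
                exprs.iter().map(|e| match e {
                    Expr::Column(name) => input_schema.iter()
                        .find(|(n, _)| n == name).cloned().expect("unknown column"),
                    Expr::NullCast { data_type, alias } => (alias.clone(), data_type.clone()),
                }).collect()
            }
        }
    }
}

/// Project the surviving side's columns plus typed NULLs for the empty side,
/// keeping the join's original column order (left columns, then right columns).
fn null_pad(survivor: Plan, empty_schema: &[(String, String)], nulls_last: bool) -> Plan {
    let keep: Vec<Expr> = survivor.schema().iter()
        .map(|(n, _)| Expr::Column(n.clone())).collect();
    let nulls: Vec<Expr> = empty_schema.iter()
        .map(|(n, t)| Expr::NullCast { data_type: t.clone(), alias: n.clone() }).collect();
    let exprs = if nulls_last { [keep, nulls].concat() } else { [nulls, keep].concat() };
    Plan::Projection { exprs, input: Box::new(survivor) }
}

/// Rewrite an outer join whose non-preserved side is an EmptyRelation.
fn propagate_empty_outer(plan: Plan) -> Plan {
    match plan {
        Plan::Join { join_type, left, right } => {
            let left_empty = matches!(*left, Plan::EmptyRelation { .. });
            let right_empty = matches!(*right, Plan::EmptyRelation { .. });
            match (join_type, left_empty, right_empty) {
                (JoinType::Left | JoinType::Full, false, true) =>
                    null_pad(*left, &right.schema(), true),
                (JoinType::Right | JoinType::Full, true, false) =>
                    null_pad(*right, &left.schema(), false),
                // Everything else is out of scope for this sketch.
                _ => Plan::Join { join_type, left, right },
            }
        }
        other => other,
    }
}

fn main() {
    let orders = Plan::TableScan {
        name: "orders".into(),
        schema: vec![("orders.id".into(), "Int64".into()),
                     ("orders.amount".into(), "Float64".into())],
    };
    let returns = Plan::EmptyRelation {
        schema: vec![("order_id".into(), "Int64".into()), ("reason".into(), "Utf8".into())],
    };
    let join = Plan::Join {
        join_type: JoinType::Left, left: Box::new(orders), right: Box::new(returns),
    };
    let before = join.schema();
    let rewritten = propagate_empty_outer(join);
    // The rewrite must not change the output schema of the plan.
    assert_eq!(rewritten.schema(), before);
    println!("{:#?}", rewritten);
}
```

The schema check in `main` reflects the main constraint on such a rewrite: the projection should expose the same column names, types, and order as the join it replaces, so parent plan nodes keep resolving their column references.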
### Describe alternatives you've considered
### Additional context
_No response_