Dandandan opened a new issue, #22184:
URL: https://github.com/apache/datafusion/issues/22184
### Describe the bug
The SQL frontend panics on any compound identifier of 6 or more parts that
doesn't match a registered column. Found by a libFuzzer target running over a
seed corpus extracted from `datafusion/sqllogictest/test_files/`.
### To Reproduce
```rust
use datafusion::prelude::SessionContext;
#[tokio::main]
async fn main() {
let ctx = SessionContext::new();
let _ = ctx.sql("SELECT zzz1.zzz2.zzz3.zzz4.zzz5.zzz6").await;
}
```
Panic:
```
thread 'main' panicked at datafusion/sql/src/expr/identifier.rs:219:65:
called `Result::unwrap()` on an `Err` value: Internal("Incorrect number of
identifiers: 6")
```
Triggers for **any** dotted identifier with ≥6 parts where no prefix matches
a registered column. 5-part identifiers are correctly handled via a
`not_impl_err!` early-return; 6+ parts fall through.
### Expected behavior
Return a `not_impl_err!` (same as the 5-part case) or any other proper
`Err`, not a panic. Public planner API should never panic on user-supplied SQL.
### Additional context
Root cause is in
[`datafusion/sql/src/expr/identifier.rs`](https://github.com/apache/datafusion/blob/main/datafusion/sql/src/expr/identifier.rs):
```rust
// line 186
if ids.len() == 5 { // ← only catches
len == 5
not_impl_err!("compound identifier: {ids:?}")
} else {
// ... loop over outer schemas, may not return ...
// line 217-219
// Safe unwrap as column name can never be empty or exceed the bounds
let (relation, column_name) =
form_identifier(&ids[0..ids.len()]).unwrap(); // ← panics for len
>= 5
...
}
```
`form_identifier` (line 282) returns `internal_err!("Incorrect number of
identifiers: {}", idents.len())` for any `len > 4`, so the `.unwrap()` is
unsound for `len >= 6` and the comment above it ("can never be empty or exceed
the bounds") is incorrect.
Minimal fixes (either works):
- Change line 186 from `if ids.len() == 5` to `if ids.len() >= 5`, **or**
- Change `.unwrap()` on line 219 to `?` and remove the misleading comment.
### Discovery
Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`)
seeded with SQL statements extracted from
`datafusion/sqllogictest/test_files/`. The fuzzer triggered it within the first
INITED pass once seeded — a `CopyPart` mutation chained nested-identifier seeds
together.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]