Dandandan opened a new issue, #22188:
URL: https://github.com/apache/datafusion/issues/22188
### Describe the bug
`generate_series` and `range` panic with `capacity overflow` when given an
integer range so large that the memory needed to hold every element exceeds
`isize::MAX` bytes. The panic comes from `Vec::reserve` inside the
integer-range implementation and is hit during planning (constant folding of
the `generate_series`/`range` call).
### To Reproduce
```rust
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    let _ = ctx
        .sql("SELECT generate_series(0, 9223372036854775807)")
        .await
        .unwrap()
        .create_physical_plan()
        .await;
}
```
Panic:
```
thread 'main' panicked at .../alloc/src/raw_vec/mod.rs:28:5:
capacity overflow
```
Also reproduces with:
- `SELECT range(0, 9223372036854775807)`
- `SELECT range(9223372036854775807)`
- `SELECT generate_series(-9223372036854775808, 9223372036854775807)`
Bounded ranges like `SELECT generate_series(1, 100)` are fine.
### Expected behavior
Return a planning/execution error along the lines of "range too large to
materialize" (or, ideally, a streaming implementation that does not need to
materialize the full sequence eagerly). The public SQL API should never panic
on user-supplied SQL.
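For the streaming alternative, here is a minimal sketch of a lazy generator. `SeriesIter` is a hypothetical name, not a DataFusion type. It yields values one at a time with an inclusive upper bound like `generate_series`, and uses `i64::checked_add` so that reaching `i64::MAX` ends the sequence instead of overflowing. A streaming operator could pull fixed-size batches from it (for example with `Iterator::take`) rather than allocating the whole range up front.

```rust
/// Hypothetical lazy integer sequence with an inclusive upper bound,
/// in the spirit of `generate_series`. Not part of DataFusion's API.
struct SeriesIter {
    current: Option<i64>, // next value to yield; None once exhausted
    stop: i64,            // inclusive bound
    step: i64,            // non-zero step
}

impl SeriesIter {
    fn new(start: i64, stop: i64, step: i64) -> Option<Self> {
        if step == 0 {
            return None; // a zero step would never reach `stop`
        }
        Some(SeriesIter { current: Some(start), stop, step })
    }
}

impl Iterator for SeriesIter {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        let cur = self.current?;
        let in_range = if self.step > 0 { cur <= self.stop } else { cur >= self.stop };
        if !in_range {
            self.current = None;
            return None;
        }
        // `checked_add` returns None on overflow, which ends the sequence
        // instead of wrapping or panicking.
        self.current = cur.checked_add(self.step);
        Some(cur)
    }
}
```

Constructing `SeriesIter::new(0, i64::MAX, 1)` costs nothing; only the values actually consumed are produced.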
### Root cause
[`datafusion/functions-nested/src/range.rs`](https://github.com/apache/datafusion/blob/main/datafusion/functions-nested/src/range.rs),
in `generate_range_values`:
```rust
// lines 563-565 (step > 0 branch)
let count =
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
values.reserve(count); // ← panics here
// lines 583-585 (step < 0 branch, identical pattern)
let count =
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
values.reserve(count);
```
For `generate_series(0, i64::MAX, 1)`, `count` is `i64::MAX + 1 = 2^63`
(after `saturating_add(1)`), which on a 64-bit target is a `usize` of
~`9.2 × 10^18`. `Vec::<i64>::reserve` needs `count * size_of::<i64>()` =
`count * 8` bytes, sees that this exceeds `isize::MAX`, and panics.
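The arithmetic can be checked with a small standalone sketch. It copies the positive-step count expression quoted from `range.rs` into a hypothetical helper `element_count`, kept in `u64` so no cast hides anything, and compares it with the largest element count a `Vec<i64>` can reserve. Both helper names are illustrative, not DataFusion's.

```rust
/// Hypothetical helper mirroring the quoted expression:
/// `(start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1)`,
/// left as `u64` (no `as usize` cast).
fn element_count(start: i64, limit: i64, step: i64) -> u64 {
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1)
}

/// Largest number of `i64` elements whose total byte size does not exceed
/// `isize::MAX`, the limit `Vec::reserve` enforces.
fn max_i64_elements() -> u64 {
    (isize::MAX as u64) / (std::mem::size_of::<i64>() as u64)
}
```

On a 64-bit target, `max_i64_elements()` is `2^60 - 1`, while `element_count(0, i64::MAX, 1)` is `2^63`, eight times larger. For `generate_series(-9223372036854775808, 9223372036854775807)` the division yields `u64::MAX` and `saturating_add(1)` leaves it there.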
### Suggested fix
Bound `count` at allocation time:
```rust
// Largest element count whose byte size still fits in `isize::MAX`.
const MAX_RANGE_ELEMENTS: usize = isize::MAX as usize / std::mem::size_of::<i64>();

if count > MAX_RANGE_ELEMENTS {
    return exec_err!(
        "range too large: would produce {count} elements (max {MAX_RANGE_ELEMENTS})"
    );
}
values.reserve(count);
```
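The order of operations matters here. In the quoted code, `count` is computed as a `u64` and cast with `as usize` before any check. On a 32-bit target that cast silently truncates, so an oversized range could slip past the comparison. The sketch below does the comparison before converting, and reserves with `Vec::try_reserve`, which reports allocation failure as an error instead of aborting. It is dependency-free: `checked_reserve_count` and `reserve_values` are hypothetical helpers, and `String` errors stand in for DataFusion's `exec_err!`.

```rust
use std::collections::TryReserveError;
use std::convert::TryFrom;

/// Hypothetical helper: validate a `u64` element count before it becomes a
/// `usize` capacity.
fn checked_reserve_count(count: u64, max_elements: usize) -> Result<usize, String> {
    // `usize::try_from` fails instead of truncating on 32-bit targets.
    let count = usize::try_from(count)
        .map_err(|_| format!("range too large: {count} elements do not fit in usize"))?;
    if count > max_elements {
        return Err(format!(
            "range too large: would produce {count} elements (max {max_elements})"
        ));
    }
    Ok(count)
}

/// Hypothetical helper: reserve space for `count` values, turning both
/// capacity overflow and allocator failure into an error.
fn reserve_values(values: &mut Vec<i64>, count: usize) -> Result<(), String> {
    values
        .try_reserve(count)
        .map_err(|e: TryReserveError| format!("cannot allocate range: {e}"))
}
```

A caller would pass `MAX_RANGE_ELEMENTS`, or a smaller configured cap, as `max_elements`.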
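A friendlier limit (say, 1 GiB / 8 B = 128 Mi elements, configurable) would
also stop this from being a memory-exhaustion DoS. The `isize::MAX` check only
removes the `capacity overflow` panic. A `count` just below
`MAX_RANGE_ELEMENTS` still asks the allocator for far more memory than is
available, and a failed allocation in `Vec::reserve` aborts the process rather
than returning an error.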
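Here is a minimal sketch of the configurable limit. It assumes a byte budget passed in as `max_bytes`, with a 1 GiB default in `DEFAULT_MAX_RANGE_BYTES`. Both names, and the `generate_series_capped` function, are hypothetical, and the function is an illustration rather than DataFusion's implementation. It follows `generate_series` semantics (inclusive upper bound) and rejects oversized ranges before allocating. Its fill loop never computes a value past `stop`, so it cannot overflow even at `i64::MAX`.

```rust
/// Hypothetical default budget: 1 GiB of `i64` values,
/// i.e. 2^30 bytes / 8 bytes = 2^27 = 134_217_728 elements.
const DEFAULT_MAX_RANGE_BYTES: usize = 1 << 30;

/// Hypothetical capped generator with an inclusive upper bound.
fn generate_series_capped(
    start: i64,
    stop: i64,
    step: i64,
    max_bytes: usize,
) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step cannot be zero".to_string());
    }
    // Empty result when the step points away from `stop`.
    if (step > 0 && start > stop) || (step < 0 && start < stop) {
        return Ok(Vec::new());
    }
    let max_elements = (max_bytes / std::mem::size_of::<i64>()) as u64;
    let count = (start.abs_diff(stop) / step.unsigned_abs()).saturating_add(1);
    if count > max_elements {
        return Err(format!(
            "range too large: would produce {count} elements (max {max_elements})"
        ));
    }
    // `count <= max_elements`, which was derived from a `usize`, so this
    // cast cannot truncate.
    let mut values = Vec::with_capacity(count as usize);
    let mut cur = start;
    for i in 0..count {
        values.push(cur);
        if i + 1 < count {
            // The next value is still between `start` and `stop`, so this
            // addition cannot overflow.
            cur += step;
        }
    }
    Ok(values)
}
```

With the default budget, `generate_series_capped(1, 100, 1, DEFAULT_MAX_RANGE_BYTES)` returns 100 values. `generate_series_capped(0, i64::MAX, 1, DEFAULT_MAX_RANGE_BYTES)` returns an error instead of panicking.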
### Additional context
Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`)
seeded with SQL extracted from `datafusion/sqllogictest/test_files/`. The
fuzzer triggered it from a mutated `generate_series` example by replacing a
small numeric literal with `9223372036854775807` (`i64::MAX`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]