MartinSahlen opened a new pull request, #22310:
URL: https://github.com/apache/datafusion/pull/22310
Which issue does this PR close?
- Closes #.
Rationale for this change
UNNEST as a table factor previously rejected both WITH ORDINALITY
(Postgres / SQL standard) and WITH OFFSET (BigQuery) with not_impl_err!. Both
spellings express the same semantic need: emitting a per-element position
alongside the unnested value. The SQL parser (sqlparser-rs) already
produces a single AST node that distinguishes them by keyword.
Supporting both is justified by:
- They're syntactic siblings, not alternatives. sqlparser-rs parses both
into the same TableFactor::UNNEST. The physical execution and logical-plan
shape are identical modulo a constant index base.
- No SQL dialect supports both at once; each dialect picks one. Accepting
both makes DataFusion a clean target for queries written against either
Postgres/Trino (WITH ORDINALITY, 1-indexed) or BigQuery (WITH OFFSET,
0-indexed), with no rewriting required.
- The two keywords carry their standard semantics, not a configurable
flag: WITH ORDINALITY is always 1-indexed (SQL:2003), WITH OFFSET is always
0-indexed (BigQuery). This mirrors BigQuery's own precedent for array
indexing (arr[OFFSET(0)] vs arr[ORDINAL(1)]): the keyword alone determines
the index base, so there are no surprises.
- The two are mutually exclusive in the same statement; the planner
rejects them combined.
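The keyword-to-index-base rule and the mutual-exclusion check described above can be sketched in a few lines of standalone Rust. This is an illustration only, not the code in this PR: the IndexBase enum here is a local stand-in, and resolve_base, positions, and the error text are hypothetical names chosen for the example.

```rust
/// Local stand-in for the index base selected by the keyword
/// (assumption: mirrors the Zero / One split described in the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexBase {
    Zero,
    One,
}

/// Hypothetical helper: maps the two parsed flags to an index base.
/// WITH ORDINALITY is 1-indexed, WITH OFFSET is 0-indexed, and
/// requesting both on the same UNNEST is an error.
fn resolve_base(with_ordinality: bool, with_offset: bool) -> Result<Option<IndexBase>, String> {
    match (with_ordinality, with_offset) {
        (true, true) => Err("WITH ORDINALITY and WITH OFFSET cannot be combined".to_string()),
        (true, false) => Ok(Some(IndexBase::One)),
        (false, true) => Ok(Some(IndexBase::Zero)),
        (false, false) => Ok(None),
    }
}

/// Hypothetical helper: positions emitted for one array of `len` elements.
fn positions(len: usize, base: IndexBase) -> Vec<i64> {
    let start: i64 = match base {
        IndexBase::Zero => 0,
        IndexBase::One => 1,
    };
    (0..len as i64).map(|i| start + i).collect()
}
```

Under these assumptions, a three-element array yields positions 1, 2, 3 for WITH ORDINALITY and 0, 1, 2 for WITH OFFSET, and resolve_base(true, true) returns an error instead of silently picking one base.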
What changes are included in this PR?
- datafusion-common: IndexBase enum (Zero / One), PositionColumn { name,
base }, UnnestOptions.position: Option<PositionColumn>, builder method
with_position.
- Logical plan: Unnest::try_new appends a nullable Int64 field to the
output schema when options.position is set.
- SQL planner (relation/mod.rs): handles WITH ORDINALITY and WITH OFFSET
[alias], defaults the position column name to "ordinality" / "offset", and
rejects the both-at-once combination. Postgres-style AS t(v, ord) column-list
aliasing works through the existing alias mechanism.
- try_process_unnest threads the position option through to
unnest_columns_with_options and projects the position column on the outer
SELECT.
- Physical plan (UnnestExec): create_position_indices materializes the
position column at the leaf unnest level, using the supplied IndexBase.
- Proto: new IndexBase enum and PositionColumn message;
UnnestOptions.position round-trips end-to-end through to_proto / from_proto.
- Not included: unparser support (kept best-effort; can be added later
based on feedback).
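For readers who want a concrete picture of the option plumbing and the position column, here is a simplified, standalone sketch. It is not the PR's implementation: the real UnnestOptions has other fields that are omitted here, the with_position signature is assumed, and default_position_name and position_indices are hypothetical helpers that work on plain list lengths rather than Arrow arrays and ignore null handling.

```rust
/// Local stand-ins for the types named in the change list (assumed shapes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexBase {
    Zero,
    One,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct PositionColumn {
    name: String,
    base: IndexBase,
}

/// Simplified options struct: only the new field is modeled.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct UnnestOptions {
    position: Option<PositionColumn>,
}

impl UnnestOptions {
    /// Builder-style setter (signature is an assumption).
    fn with_position(mut self, position: PositionColumn) -> Self {
        self.position = Some(position);
        self
    }
}

/// Hypothetical helper: default column name per keyword.
fn default_position_name(base: IndexBase) -> &'static str {
    match base {
        IndexBase::One => "ordinality",
        IndexBase::Zero => "offset",
    }
}

/// Hypothetical helper: one position per emitted element, restarting at the
/// base for every input row. `list_lengths[i]` is the length of row i's list.
fn position_indices(list_lengths: &[usize], base: IndexBase) -> Vec<i64> {
    let start: i64 = match base {
        IndexBase::Zero => 0,
        IndexBase::One => 1,
    };
    list_lengths
        .iter()
        .flat_map(|&len| (0..len as i64).map(move |i| start + i))
        .collect()
}
```

In this sketch, input rows with list lengths 3, 0, and 2 produce the position column 1, 2, 3, 1, 2 for IndexBase::One and 0, 1, 2, 0, 1 for IndexBase::Zero; the empty list contributes no rows. Leaving position as None by default is what keeps existing callers unaffected.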
Are these changes tested?
Yes:
- datafusion/sqllogictest/test_files/unnest.slt: execution result tests
for WITH OFFSET, WITH OFFSET <alias>, WITH ORDINALITY, WITH ORDINALITY AS t(v,
ord), plus both error cases.
- datafusion/physical-plan/src/unnest.rs: unit tests for
create_position_indices (0-indexed and 1-indexed).
- datafusion/proto/tests/cases/roundtrip_logical_plan.rs: logical-plan
proto round-trip tests for both spellings.
- The prior expected-failure cases at unnest.slt:425-430 are now positive
cases.
Are there any user-facing changes?
Yes — the SQL surface accepts UNNEST(...) WITH ORDINALITY [AS
t(value_alias, ord_alias)] and UNNEST(...) WITH OFFSET [alias] in the FROM
clause. The UnnestOptions public type gains a position field (additive, default
None); existing DataFrame callers are source-compatible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]