MartinSahlen opened a new pull request, #22310:
URL: https://github.com/apache/datafusion/pull/22310

    Which issue does this PR close?
   
     - Closes #.
   
     Rationale for this change
   
     UNNEST as a table factor previously rejected both WITH ORDINALITY
     (Postgres / SQL standard) and WITH OFFSET (BigQuery) with
     not_impl_err!. Both spellings express the same semantic need: emit a
     per-element position alongside the unnested value. The SQL parser
     (sqlparser-rs) already parses both into a single AST node that
     distinguishes them by keyword.
   
     Supporting both is justified by:
   
     - They're syntactic siblings rather than distinct features. sqlparser-rs
       parses both into the same TableFactor::UNNEST. The physical execution
       and logical-plan shape are identical modulo a constant index base.
     - No SQL dialect supports both at once; each dialect picks one. Accepting
       both makes DataFusion a clean target for queries written against either
       Postgres/Trino (WITH ORDINALITY, 1-indexed) or BigQuery (WITH OFFSET,
       0-indexed), with no rewriting required.
     - The two keywords carry their standard semantics rather than a
       configurable flag: WITH ORDINALITY is always 1-indexed (SQL:2003) and
       WITH OFFSET is always 0-indexed (BigQuery). This mirrors BigQuery's own
       precedent for array indexing (arr[OFFSET(0)] vs arr[ORDINAL(1)]): the
       keyword carries the semantics, so the index base is never a surprise.
     - The two are mutually exclusive; the planner rejects an UNNEST that
       specifies both.
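
     The "identical modulo a constant index base" point can be illustrated
     with a minimal, standalone sketch. It is not the code from this PR:
     position_indices is a hypothetical helper, IndexBase is redefined here
     only so the example compiles on its own, and null lists are ignored.

```rust
/// Index base for the position column. The variant names follow the PR
/// description; this standalone definition is an illustrative assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexBase {
    /// WITH OFFSET (BigQuery): positions start at 0.
    Zero,
    /// WITH ORDINALITY (Postgres / SQL standard): positions start at 1.
    One,
}

impl IndexBase {
    /// The value assigned to the first element of every list.
    fn start(self) -> i64 {
        match self {
            IndexBase::Zero => 0,
            IndexBase::One => 1,
        }
    }
}

/// Hypothetical helper: given the length of each input list, return the
/// position emitted for every unnested element. Positions restart at the
/// base for each list; an empty list contributes no rows.
fn position_indices(list_lengths: &[usize], base: IndexBase) -> Vec<i64> {
    let start = base.start();
    list_lengths
        .iter()
        .flat_map(|&len| (0..len as i64).map(move |i| i + start))
        .collect()
}
```

     For lists of lengths 3, 0 and 2, the two bases yield the same number of
     rows and differ only by the constant offset of one.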
   
     What changes are included in this PR?
   
     - datafusion-common: IndexBase enum (Zero / One), PositionColumn { name,
       base }, UnnestOptions.position: Option<PositionColumn>, builder method
       with_position.
     - Logical plan: Unnest::try_new appends a nullable Int64 field to the
       output schema when options.position is set.
     - SQL planner (relation/mod.rs): handles WITH ORDINALITY and WITH OFFSET
       [alias], defaults the position column name to "ordinality" or "offset"
       respectively, and rejects an UNNEST that specifies both. Postgres-style
       AS t(v, ord) column-list aliasing works through the existing alias
       mechanism.
     - try_process_unnest threads the position option through to
       unnest_columns_with_options and projects the position column on the
       outer SELECT.
     - Physical plan (UnnestExec): create_position_indices materializes the
       position column at the leaf unnest level, using the supplied IndexBase.
     - Proto: new IndexBase enum and PositionColumn message;
       UnnestOptions.position round-trips end-to-end through to_proto /
       from_proto.
     - Not included: unparser support (kept best-effort; can be added later
       based on feedback).
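
     The SQL planner rule described above (default column names, fixed index
     base per keyword, rejection of the combination) might look roughly like
     the sketch below. resolve_position and its boolean parameters are
     hypothetical stand-ins for the flags the parser reports; the type names
     follow this description, but the definitions and the error text are
     assumptions, not the code in relation/mod.rs.

```rust
/// Index base of the position column (illustrative redefinition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexBase {
    Zero,
    One,
}

/// Name and index base of the position column (illustrative redefinition).
#[derive(Debug, Clone, PartialEq, Eq)]
struct PositionColumn {
    name: String,
    base: IndexBase,
}

/// Hypothetical planner step: map the parsed UNNEST modifiers to an optional
/// position column, or an error when both keywords are present.
fn resolve_position(
    with_ordinality: bool,
    with_offset: bool,
    offset_alias: Option<&str>,
) -> Result<Option<PositionColumn>, String> {
    match (with_ordinality, with_offset) {
        // The two spellings are mutually exclusive.
        (true, true) => Err(
            "UNNEST cannot combine WITH ORDINALITY and WITH OFFSET".to_string(),
        ),
        // WITH ORDINALITY: always 1-indexed, default name "ordinality".
        (true, false) => Ok(Some(PositionColumn {
            name: "ordinality".to_string(),
            base: IndexBase::One,
        })),
        // WITH OFFSET [alias]: always 0-indexed, default name "offset".
        (false, true) => Ok(Some(PositionColumn {
            name: offset_alias.unwrap_or("offset").to_string(),
            base: IndexBase::Zero,
        })),
        // Plain UNNEST: no position column.
        (false, false) => Ok(None),
    }
}
```

     Renaming the ordinality column through AS t(v, ord) is left out of the
     sketch because, per the description, it goes through the existing alias
     mechanism rather than through this step.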
   
     Are these changes tested?
   
     Yes:
   
     - datafusion/sqllogictest/test_files/unnest.slt: execution result tests
       for WITH OFFSET, WITH OFFSET <alias>, WITH ORDINALITY, WITH ORDINALITY
       AS t(v, ord), plus both error cases.
     - datafusion/physical-plan/src/unnest.rs: unit tests for
       create_position_indices (0-indexed and 1-indexed).
     - datafusion/proto/tests/cases/roundtrip_logical_plan.rs: logical-plan
       proto round-trip tests for both spellings.
     - The prior expected-failure cases at unnest.slt:425-430 are now positive
       cases.
   
     Are there any user-facing changes?
   
     Yes. The SQL surface now accepts UNNEST(...) WITH ORDINALITY [AS
     t(value_alias, ord_alias)] and UNNEST(...) WITH OFFSET [alias] in the
     FROM clause. The public UnnestOptions type gains a position field
     (additive, default None); existing DataFrame callers are
     source-compatible.
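
     As a rough picture of the additive options change, here is a simplified
     stand-in for UnnestOptions. It keeps only a preserve_nulls flag next to
     the new field, and the with_position signature shown is an assumption;
     the real type in datafusion-common has more to it.

```rust
/// Index base of the position column (illustrative redefinition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexBase {
    Zero,
    One,
}

/// Name and index base of the position column (illustrative redefinition).
#[derive(Debug, Clone, PartialEq, Eq)]
struct PositionColumn {
    name: String,
    base: IndexBase,
}

/// Simplified stand-in for the options type; not the real definition.
#[derive(Debug, Clone, PartialEq, Eq)]
struct UnnestOptions {
    preserve_nulls: bool,
    /// New field: None means no position column is emitted.
    position: Option<PositionColumn>,
}

impl Default for UnnestOptions {
    fn default() -> Self {
        // Callers that never ask for a position column see no change.
        Self { preserve_nulls: true, position: None }
    }
}

impl UnnestOptions {
    /// Builder-style setter; the exact signature is assumed for illustration.
    fn with_position(mut self, name: impl Into<String>, base: IndexBase) -> Self {
        self.position = Some(PositionColumn { name: name.into(), base });
        self
    }
}
```

     In this sketch, code that builds the options through Default and builder
     methods keeps compiling and only opts in by calling with_position.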


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

