adriangb opened a new pull request, #21509:
URL: https://github.com/apache/datafusion/pull/21509

   ## Which issue does this PR close?
   
   - N/A (small additive enhancement)
   
   ## Rationale for this change
   
   DataFusion already exposes `arrow_metadata(expr[, key])` for **reading** 
Arrow field metadata, but has no way to **attach** metadata to a column from 
SQL or the `Expr` DSL. Arrow field metadata is useful for propagating 
annotations (units, semantic types, provenance, downstream hints) through a 
query plan without materializing an extra value column.
   
   This PR adds `with_metadata`, the symmetric counterpart to `arrow_metadata`.
   
   ## What changes are included in this PR?
   
   A new core scalar UDF `with_metadata(expr, 'k1', 'v1'[, 'k2', 'v2', ...])`:
   
   - **Value semantics:** pure pass-through of the first argument.
   - **Schema semantics:** returns a `FieldRef` whose metadata is the input 
field's metadata merged with the supplied key/value pairs; new keys overwrite 
on collision. Input field **name**, **data type**, and **nullability** are 
preserved, so `with_metadata(col, ...)` behaves as a transparent annotation.
   - **Syntax:** variadic key/value literal pairs, modelled after 
`named_struct`. Chosen over a list-of-pairs form because SQL lacks a tuple 
literal and programmatic callers can simply splat an alternating `Vec<Expr>` of 
literals.
   - **Validation:** performed at planning time in `return_field_from_args`. 
Requires an odd argument count of at least 3 (the expression plus one or more 
key/value pairs); each key must be a non-empty constant string; each value 
must be a constant string.
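
The validation and merge rules above can be sketched with the standard library alone. This is a simplified model and not the code in `with_metadata.rs`: the `Arg` enum and the `parse_pairs` and `merge_metadata` helpers are hypothetical names introduced for illustration, and a plain `HashMap` stands in for Arrow field metadata.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a planning-time argument: either a constant
/// string literal or anything else (a column, a non-string literal, ...).
#[derive(Debug, Clone)]
enum Arg {
    StrLiteral(String),
    Other,
}

/// Checks the argument list `[expr, k1, v1, k2, v2, ...]` and returns the
/// key/value pairs. `args[0]` is the value expression and is not inspected.
fn parse_pairs(args: &[Arg]) -> Result<Vec<(String, String)>, String> {
    // One expression plus at least one pair means an odd count of 3 or more.
    if args.len() < 3 || args.len() % 2 == 0 {
        return Err(format!(
            "expected an odd number of arguments (at least 3), got {}",
            args.len()
        ));
    }
    let mut pairs = Vec::new();
    // After the check above, args[1..] has even length, so every chunk is a full pair.
    for chunk in args[1..].chunks(2) {
        let key = match &chunk[0] {
            Arg::StrLiteral(k) if !k.is_empty() => k.clone(),
            Arg::StrLiteral(_) => return Err("metadata key must not be empty".to_string()),
            Arg::Other => return Err("metadata key must be a constant string".to_string()),
        };
        let value = match &chunk[1] {
            Arg::StrLiteral(v) => v.clone(),
            Arg::Other => return Err("metadata value must be a constant string".to_string()),
        };
        pairs.push((key, value));
    }
    Ok(pairs)
}

/// Merges new pairs into the existing metadata; new keys overwrite on collision.
fn merge_metadata(
    existing: &HashMap<String, String>,
    pairs: &[(String, String)],
) -> HashMap<String, String> {
    let mut merged = existing.clone();
    for (k, v) in pairs {
        merged.insert(k.clone(), v.clone());
    }
    merged
}
```

In this model, passing `[expr, 'unit', 'ms']` against existing metadata `{unit: s, source: sensor}` yields `{unit: ms, source: sensor}`, while an even argument count, a non-literal key, or an empty key returns an error before any merge happens.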
   
   Example usage:
   
   ```sql
   -- attach one key
   select arrow_metadata(with_metadata(id, 'unit', 'ms'), 'unit') from t;
   -- ms
   
   -- attach several and read the full map
   select arrow_metadata(with_metadata(id, 'unit', 'ms', 'source', 'sensor'))
   from t;
   -- {metadata_key: the id field, source: sensor, unit: ms}
   
   -- nesting composes; outer keys win on collision
   select arrow_metadata(with_metadata(with_metadata(id, 'a', '1'), 'b', '2'))
   from t;
   ```
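
To illustrate why nesting composes and why outer keys win, here is a toy model of the schema-side effect only. It assumes metadata behaves like a string-to-string map; `Metadata` and `with_metadata_model` are hypothetical names for this sketch, not part of DataFusion.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a field's metadata map (sorted, so printing is stable).
type Metadata = BTreeMap<String, String>;

/// Models one `with_metadata` call: copy the input field's metadata, then
/// apply the supplied pairs in order, overwriting any existing key.
fn with_metadata_model(input: &Metadata, pairs: &[(&str, &str)]) -> Metadata {
    let mut out = input.clone();
    for (k, v) in pairs {
        out.insert((*k).to_string(), (*v).to_string());
    }
    out
}
```

Under this model the outer call sees the inner call's output as its input field, so `with_metadata_model(&with_metadata_model(&m, &[("a", "1")]), &[("b", "2")])` carries both keys, and repeating key `a` in the outer call replaces the inner value.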
   
   Files touched:
   - `datafusion/functions/src/core/with_metadata.rs` (new) — UDF impl + unit 
tests
   - `datafusion/functions/src/core/mod.rs` — registration in `functions()`, 
`make_udf_function!`, and `expr_fn`
   - `datafusion/sqllogictest/test_files/metadata.slt` — SQL-level coverage 
(merge, overwrite, nesting, pass-through, error cases)
   - `docs/source/user-guide/sql/scalar_functions.md` — regenerated via 
`dev/update_function_docs.sh`
   
   ## Are these changes tested?
   
   Yes:
   - **Unit tests** (`datafusion/functions/src/core/with_metadata.rs`) covering 
single-key attach, merge-with-overwrite on collision, multi-pair attach, 
even-arity rejection, too-few-args rejection, and non-literal-key rejection.
   - **SQL logic tests** (`metadata.slt`) covering attach/read roundtrip, 
merging with pre-existing field metadata, collision overwrite, nested 
`with_metadata(with_metadata(...))`, value pass-through, and planning-time 
errors (even arity, missing args, non-literal key, empty key).
   - `cargo fmt --all` clean; `cargo clippy -p datafusion-functions 
--all-targets --all-features -- -D warnings` clean (the `mutable_key_type` 
error surfaced by `--all-targets --all-features` on the full workspace is 
pre-existing on `main` and unrelated to this PR).
   
   ## Are there any user-facing changes?
   
   Yes — a new built-in scalar function `with_metadata` is now available in SQL 
and via `datafusion_functions::expr_fn::with_metadata`. Generated docs are 
updated accordingly. No existing behavior changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

