rraulinio opened a new issue, #1360:
URL: https://github.com/apache/iceberg-go/issues/1360

   ### Feature Request / Improvement
   
   # Add SQL UDF metadata model and REST read support
   
   Spec: https://iceberg.apache.org/udf-spec/
   
   ## Background
   
   Apache Iceberg 1.11.0 introduced a **SQL UDF specification**.
   
   In plain terms, a catalog UDF is a named, reusable SQL routine stored in the 
catalog. A scalar UDF is a reusable expression, and a UDTF is closer to a 
parameterized view that returns rows. For example, a catalog can define 
`add_one(x int)` once, and engines such as Spark or Trino can discover it from 
the catalog and use the SQL representation written for their dialect.
   
   The metadata design deliberately mirrors Iceberg tables and views: each 
function is represented by one self-contained, immutable JSON metadata file. 
The catalog maps the function name to the current metadata file location and 
updates that mapping with an atomic swap. All overloads of a name live in a 
single metadata file, each overload has its own version history, and a 
`definition-log` records the selected definition-version mappings over time so 
the function can be rolled back without external state.
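A minimal Go sketch of that layout, to make the "one file, many overloads, per-overload version history" shape concrete. The JSON keys `function-uuid`, `format-version`, `definitions`, `function-type`, `versions`, `current-version-id`, and `timestamp-ms` are the ones named in this issue. The Go type names, the field types, and the `version-id` key are assumptions for illustration, not the spec's or iceberg-go's actual model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is a hypothetical, trimmed-down view of one function metadata file.
type Metadata struct {
	FunctionUUID  string       `json:"function-uuid"`
	FormatVersion int          `json:"format-version"`
	Definitions   []Definition `json:"definitions"` // one entry per overload
}

// Definition is one overload with its own version history.
type Definition struct {
	FunctionType     string    `json:"function-type"` // "udf" or "udtf"
	CurrentVersionID int       `json:"current-version-id"`
	Versions         []Version `json:"versions"`
}

// Version is one entry in an overload's history.
// The "version-id" key is an assumption; check the spec for the real name.
type Version struct {
	VersionID   int   `json:"version-id"`
	TimestampMS int64 `json:"timestamp-ms"`
}

// ParseMetadata decodes the JSON and checks that every overload's
// current-version-id refers to one of its own versions.
func ParseMetadata(raw []byte) (*Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, fmt.Errorf("decode udf metadata: %w", err)
	}
	for i, d := range m.Definitions {
		if !hasVersion(d, d.CurrentVersionID) {
			return nil, fmt.Errorf("definition %d: current-version-id %d matches no version",
				i, d.CurrentVersionID)
		}
	}
	return &m, nil
}

func hasVersion(d Definition, id int) bool {
	for _, v := range d.Versions {
		if v.VersionID == id {
			return true
		}
	}
	return false
}

// sample is made-up metadata for illustration only.
const sample = `{
  "function-uuid": "00000000-0000-0000-0000-000000000000",
  "format-version": 1,
  "definitions": [
    {
      "function-type": "udf",
      "current-version-id": 2,
      "versions": [
        {"version-id": 1, "timestamp-ms": 1},
        {"version-id": 2, "timestamp-ms": 2}
      ]
    }
  ]
}`

func main() {
	m, err := ParseMetadata([]byte(sample))
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("overloads:", len(m.Definitions))
}
```

Because the file is self-contained, a check like this needs nothing beyond the decoded document itself.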
   
   ## What this unlocks
   
   - Go clients can discover and read catalog-managed UDFs from REST catalogs 
that serve them.
   - Go-based catalog servers and tools get a shared, spec-validated model for 
UDF metadata instead of each reinventing JSON handling, similar to the role the 
existing `view` package plays for views.
   
   ## Proposed decomposition - two PRs
   
   The split keeps review focused: **PR 1 is reviewable purely against the UDF 
format spec** with no HTTP involved, and **PR 2 is reviewable purely against 
the REST OpenAPI**. PR 2 depends on PR 1.
   
   - [ ] **PR 1 - `udf` metadata model package.** Add a new package named 
`udf`, matching Java's `org.apache.iceberg.udf` package and the spec title. The 
package would mirror the structure of the existing `view` package and cover:
       - Metadata model: `function-uuid`, `format-version`, `definitions`, 
`definition-log`, `location`, `properties`, `secure`, `doc`.
       - Definitions: `parameters`, `return-type`, `function-type` (`udf` or 
`udtf`), `versions`, `current-version-id`, and the optional `return-nullable`, 
`doc`, and `specific-name`.
       - Definition versions: `representations`, `deterministic`, 
`on-null-input`, `timestamp-ms`.
       - SQL representations (one per dialect per version), plus an unknown 
representation fallback for forward-compatible round-tripping.
       - Parsing/serialization, validation of the spec invariants, and a 
metadata builder for writers.
   
       Important validation points include one definition per signature, 
`current-version-id` referencing an existing version, UDTF `return-type` being 
a struct, unique `specific-name` values when present, and canonical 
`definition-id` serialization of parameter types, for example: 
`int,list<int>,struct<id:int,name:string>`.
   
       `specific-name` is present in the current published UDF spec on Iceberg 
main/latest and was added after the 1.11.0 tag. Because it is optional and 
additive, the model can include it while older metadata simply omits it.
   
       One design note: UDF types are Iceberg types **without field IDs**. The 
spec says extra fields must be ignored, while iceberg-go's schema JSON parsing 
expects field IDs. The package should therefore carry a small UDF type 
representation (`primitive string | list | map | struct`). The spec's Appendix 
A and Appendix B examples should become golden test fixtures.
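A rough sketch of what a field-ID-free UDF type representation and the `definition-id` serializer could look like. All Go names here (`Type`, `Primitive`, `List`, `Map`, `Struct`, `Field`, `DefinitionID`) are hypothetical. The `list<...>` and `struct<name:type,...>` forms follow the example string above; the `map<key,value>` form is an assumption and should be checked against the spec.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a hypothetical UDF type without field IDs.
type Type interface{ String() string }

// Primitive is a primitive type name such as "int" or "string".
type Primitive string

func (p Primitive) String() string { return string(p) }

// List is a list of a single element type.
type List struct{ Element Type }

func (l List) String() string { return "list<" + l.Element.String() + ">" }

// Map is a key/value type. The "map<k,v>" rendering is an assumption.
type Map struct{ Key, Value Type }

func (m Map) String() string {
	return "map<" + m.Key.String() + "," + m.Value.String() + ">"
}

// Field is a named struct member, carrying no field ID.
type Field struct {
	Name string
	Type Type
}

// Struct is an ordered list of named fields.
type Struct struct{ Fields []Field }

func (s Struct) String() string {
	parts := make([]string, len(s.Fields))
	for i, f := range s.Fields {
		parts[i] = f.Name + ":" + f.Type.String()
	}
	return "struct<" + strings.Join(parts, ",") + ">"
}

// DefinitionID joins the rendered parameter types with commas,
// in declaration order.
func DefinitionID(params []Type) string {
	parts := make([]string, len(params))
	for i, p := range params {
		parts[i] = p.String()
	}
	return strings.Join(parts, ",")
}

func main() {
	params := []Type{
		Primitive("int"),
		List{Element: Primitive("int")},
		Struct{Fields: []Field{
			{Name: "id", Type: Primitive("int")},
			{Name: "name", Type: Primitive("string")},
		}},
	}
	fmt.Println(DefinitionID(params))
}
```

Keeping the serializer on the type itself means the "one definition per signature" check can reduce to comparing these strings, assuming the spec's canonical form is purely textual.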
   
   - [ ] **PR 2 - REST catalog read-side client.** Add the REST read-side 
function support from the current REST OpenAPI:
       - `GET /v1/{prefix}/namespaces/{namespace}/functions`
       - `GET /v1/{prefix}/namespaces/{namespace}/functions/{function}`
   
       These should be gated by existing capability discovery. Per the REST 
spec, function endpoints are **not** part of the assumed default endpoint set, 
so they should only be used when the server advertises them in 
`ConfigResponse.endpoints`.
   
       Add an optional `FunctionCatalog` capability interface, following the 
same type-assertion pattern used for optional catalog capabilities such as 
`TransactionalCatalog` and `PurgeableTable`. The initial read-side surface 
would include:
       - `ListFunctions`: paginated iterator, similar to `ListViews`.
       - `LoadFunction`: returns the identifier, parsed metadata, and optional 
metadata location.
       - `CheckFunctionExists`: implemented via `LoadFunction` / GET fallback, 
since the REST spec defines no HEAD endpoint for functions.
   
       Wire-shape notes for reviewers:
   
       - `listFunctions` returns `CatalogObjectIdentifier`, which is a flat 
string array of hierarchy levels such as `["accounting", "tax", "paid"]`, not 
the `{namespace, name}` object shape used by tables/views.
       - `loadFunction` returns all overloads in one metadata response.
       - Add a new sentinel `ErrNoSuchFunction`, mapped from the server's 
`NoSuchFunctionException` 404 response.
   
   ## Non-goals for now
   
   - **No create/replace/drop client methods.** There are no standardized REST 
write endpoints to call yet. Adding non-spec write calls would diverge from 
upstream. PR 1's metadata builder gives writers what they need; client CRUD can 
follow once the upstream REST write proposal lands.
   - **No UDF execution.** Resolving overloads and running the SQL body is the 
query engine's job. This issue is only about catalog metadata modeling and REST 
read-side plumbing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

