lukekim commented on issue #21301: URL: https://github.com/apache/datafusion/issues/21301#issuecomment-4170966028
+1 we already build these into Spice but I think the functions should be reviewed for consistency before making it official and having more people depend on them. It aligns most closely to the PostgreSQL syntax but not exactly/completely. LLM analysis: **Exact Function-by-Function Diff: DataFusion JSON Functions vs PostgreSQL** (PostgreSQL 17/18 – official `json`/`jsonb` functions and operators from https://www.postgresql.org/docs/current/functions-json.html) | PostgreSQL Exact Equivalent | DF Function / Operator | Key Differences | |-----------------------------|------------------------|----------------------------| | `col::jsonb ? 'key'` (top-level only) <br>or `col::jsonb @? '$."key1"."key2"[2]'` (full path via SQL/JSON path) | `json_contains(json: str, *keys: str \| int) → bool` <br>(alias: `?` operator) | DF supports full variadic path (keys + array indices as int). Postgres `?` is **only top-level key/element**. For nested path use `@?` (jsonpath) or `jsonb_path_exists(col::jsonb, '$."a"."b"[2]')`. Returns `bool` in both. | | `col::jsonb #> ARRAY['a','b','2']` <br>or `jsonb_extract_path(col::jsonb, 'a', 'b', '2')` <br>or chained: `(col::jsonb -> 'a' -> 'b') -> 2` | `json_get(json: str, *keys: str \| int) → JsonUnion` <br>(alias: `->` operator) | **Closest is `jsonb_extract_path` (variadic text[])** or `#>` operator. DF accepts **int directly** for array indices; Postgres requires **text strings** for indices (`'2'` not `2`). Returns `jsonb` (typed union). Input must be cast from `text`/`str`. | | `(col::jsonb #> ARRAY['a','b','2'])::text` <br>or `jsonb_extract_path_text(col::jsonb, 'a', 'b', '2')` | `json_get_str(json: str, *keys: str \| int) → str` | No dedicated typed getter. Use `#>` then cast, or `jsonb_extract_path_text` (variadic). Exact same as `json_get` + `::text`. | | `(col::jsonb #>> ARRAY['a','b','2'])::int` <br>or `(jsonb_extract_path_text(col::jsonb, 'a', 'b', '2'))::int` | `json_get_int(json: str, *keys: str \| int) → int` | No `json_get_int`. Extract as text (`#>>` or `_text`) then cast. Fails at runtime if not integer. | | `(col::jsonb #>> ARRAY['a','b','2'])::float` <br>or `(jsonb_extract_path_text(...))::double precision` | `json_get_float(json: str, *keys: str \| int) → float` | No dedicated function. Text extract + cast. Postgres uses `double precision` for float. | | `(col::jsonb #>> ARRAY['a','b','2'])::bool` | `json_get_bool(json: str, *keys: str \| int) → bool` | No dedicated function. Text extract + cast. | | `col::jsonb #> ARRAY['a','b','2']` <br>or `jsonb_extract_path(col::jsonb, 'a', 'b', '2')` (then `::text` if raw string needed) | `json_get_json(json: str, *keys: str \| int) → str` | Returns `jsonb` object (not raw string). Cast to `text` if you want the exact JSON string representation. Matches DF’s “nested raw JSON string”. | | `col::jsonb #> ARRAY['a','b']` (returns jsonb array) <br>then `jsonb_array_elements(...)` or cast to `text[]` / use in `FROM` | `json_get_array(json: str, *keys: str \| int) → array` | Returns `jsonb` (not native Arrow array). To expand: `SELECT * FROM jsonb_array_elements(col::jsonb #> '{path}')`. No direct Arrow return. | | `col::jsonb #>> ARRAY['a','b','2']` <br>or `jsonb_extract_path_text(col::jsonb, 'a', 'b', '2')` <br>or `col::jsonb ->> 'key'` (top-level) | `json_as_text(json: str, *keys: str \| int) → str` <br>(alias: `->>` operator) | **Exact functional match** to `->>` operator and `jsonb_extract_path_text`. Returns scalar as `text`. | | `jsonb_array_length(col::jsonb #> ARRAY['a','b'])` (arrays only) <br>or `cardinality(jsonb_object_keys(col::jsonb #> ARRAY['a','b']))` (objects) | `json_length(json: str, *keys: str \| int) → int` | **No single function** for both array and object. `jsonb_array_length` only works on top-level arrays. For objects use `jsonb_object_keys` + `count` / `cardinality`. DF works on both array length and object key count. | **Operator Aliases (exact mapping)** | Postgres Operator | DF Operator | Notes | |-------------------|-------------|-------| | `->` / `#>` | `->` | `->` is top-level only; `#>` for full path (text array). | | `->>` / `#>>` | `->>` | `->>` top-level; `#>>` for full path. | | `?` / `@?` | `?` | `?` top-level only; `@?` (jsonpath) for nested. | **Summary of Major Exact Diffs** - DF works on raw `text`/`str` JSON; Postgres requires `::json` or (preferred) `::jsonb`. - DF path syntax: variadic `*keys` (mix of `str` + `int` indices) → Postgres: `text[]` array or chained operators (indices as text strings `'0'`, `'1'`). - DF has **typed getters** (`_int`, `_float`, etc.); Postgres has **none** — always text extract + SQL cast. - DF `json_get_array` returns native Arrow array; Postgres returns `jsonb` (must expand manually). - DF `json_length` is unified; Postgres splits array vs object. - Postgres `jsonb` is binary & indexable (GIN); DF is string-based (no native indexing mentioned). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
