lukekim commented on issue #21301:
URL: https://github.com/apache/datafusion/issues/21301#issuecomment-4170966028

   +1 we already build these into Spice but I think the functions should be 
reviewed for consistency before making it official and having more people 
depend on them.
   
   It aligns most closely to the PostgreSQL syntax but not exactly/completely. 
LLM analysis:
   
   **Exact Function-by-Function Diff: DataFusion JSON Functions vs PostgreSQL** 
 
   (PostgreSQL 17/18 – official `json`/`jsonb` functions and operators from 
https://www.postgresql.org/docs/current/functions-json.html)
   
   | PostgreSQL Exact Equivalent | DF Function / Operator | Key Differences |
   
|-----------------------------|------------------------|----------------------------|
   | `col::jsonb ? 'key'` (top-level only) <br>or `col::jsonb @? 
'$."key1"."key2"[2]'` (full path via SQL/JSON path) | `json_contains(json: str, 
*keys: str \| int) → bool` <br>(alias: `?` operator) | DF supports full 
variadic path (keys + array indices as int). Postgres `?` is **only top-level 
key/element**. For nested path use `@?` (jsonpath) or 
`jsonb_path_exists(col::jsonb, '$."a"."b"[2]')`. Returns `bool` in both. |
   | `col::jsonb #> ARRAY['a','b','2']` <br>or `jsonb_extract_path(col::jsonb, 
'a', 'b', '2')` <br>or chained: `(col::jsonb -> 'a' -> 'b') -> 2` | 
`json_get(json: str, *keys: str \| int) → JsonUnion` <br>(alias: `->` operator) 
| **Closest is `jsonb_extract_path` (variadic text[])** or `#>` operator. DF 
accepts **int directly** for array indices; Postgres requires **text strings** 
for indices (`'2'` not `2`). Returns `jsonb` (typed union). Input must be cast 
from `text`/`str`. |
   | `(col::jsonb #> ARRAY['a','b','2'])::text` <br>or 
`jsonb_extract_path_text(col::jsonb, 'a', 'b', '2')` | `json_get_str(json: str, 
*keys: str \| int) → str` | No dedicated typed getter. Use `#>` then cast, or 
`jsonb_extract_path_text` (variadic). Exact same as `json_get` + `::text`. |
   | `(col::jsonb #>> ARRAY['a','b','2'])::int` <br>or 
`(jsonb_extract_path_text(col::jsonb, 'a', 'b', '2'))::int` | 
`json_get_int(json: str, *keys: str \| int) → int` | No `json_get_int`. Extract 
as text (`#>>` or `_text`) then cast. Fails at runtime if not integer. |
   | `(col::jsonb #>> ARRAY['a','b','2'])::float` <br>or 
`(jsonb_extract_path_text(...))::double precision` | `json_get_float(json: str, 
*keys: str \| int) → float` | No dedicated function. Text extract + cast. 
Postgres uses `double precision` for float. |
   | `(col::jsonb #>> ARRAY['a','b','2'])::bool` | `json_get_bool(json: str, 
*keys: str \| int) → bool` | No dedicated function. Text extract + cast. |
   | `col::jsonb #> ARRAY['a','b','2']` <br>or `jsonb_extract_path(col::jsonb, 
'a', 'b', '2')` (then `::text` if raw string needed) | `json_get_json(json: 
str, *keys: str \| int) → str` | Returns `jsonb` object (not raw string). Cast 
to `text` if you want the exact JSON string representation. Matches DF’s 
“nested raw JSON string”. |
   | `col::jsonb #> ARRAY['a','b']` (returns jsonb array) <br>then 
`jsonb_array_elements(...)` or cast to `text[]` / use in `FROM` | 
`json_get_array(json: str, *keys: str \| int) → array` | Returns `jsonb` (not 
native Arrow array). To expand: `SELECT * FROM jsonb_array_elements(col::jsonb 
#> '{path}')`. No direct Arrow return. |
   | `col::jsonb #>> ARRAY['a','b','2']` <br>or 
`jsonb_extract_path_text(col::jsonb, 'a', 'b', '2')` <br>or `col::jsonb ->> 
'key'` (top-level) | `json_as_text(json: str, *keys: str \| int) → str` 
<br>(alias: `->>` operator) | **Exact functional match** to `->>` operator and 
`jsonb_extract_path_text`. Returns scalar as `text`. |
   | `jsonb_array_length(col::jsonb #> ARRAY['a','b'])` (arrays only) <br>or 
`cardinality(jsonb_object_keys(col::jsonb #> ARRAY['a','b']))` (objects) | 
`json_length(json: str, *keys: str \| int) → int` | **No single function** for 
both array and object. `jsonb_array_length` only works on top-level arrays. For 
objects use `jsonb_object_keys` + `count` / `cardinality`. DF works on both 
array length and object key count. |
   
   **Operator Aliases (exact mapping)**
   
   | Postgres Operator | DF Operator | Notes |
   |-------------------|-------------|-------|
   | `->` / `#>` | `->` | `->` is top-level only; `#>` for full path (text 
array). |
   | `->>` / `#>>` | `->>` | `->>` top-level; `#>>` for full path. |
   | `?` / `@?` | `?` | `?` top-level only; `@?` (jsonpath) for nested. |
   
   **Summary of Major Exact Diffs**
   - DF works on raw `text`/`str` JSON; Postgres requires `::json` or 
(preferred) `::jsonb`.
   - DF path syntax: variadic `*keys` (mix of `str` + `int` indices) → 
Postgres: `text[]` array or chained operators (indices as text strings `'0'`, 
`'1'`).
   - DF has **typed getters** (`_int`, `_float`, etc.); Postgres has **none** — 
always text extract + SQL cast.
   - DF `json_get_array` returns native Arrow array; Postgres returns `jsonb` 
(must expand manually).
   - DF `json_length` is unified; Postgres splits array vs object.
   - Postgres `jsonb` is binary & indexable (GIN); DF is string-based (no 
native indexing mentioned).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to