paleolimbot opened a new issue, #44714:
URL: https://github.com/apache/arrow/issues/44714
### Describe the bug, including details regarding any error messages, version, and platform.
When Map types are received via the C Data interface, field metadata
(including extension metadata) is dropped. This seems unintentional given that
we maintain that metadata for a list of structs:
```python
import duckdb
duckdb_cursor = duckdb.connect()
duckdb_cursor.execute("SET arrow_lossless_conversion = true")
arrow_table = duckdb_cursor.execute(
    "select map {uuid(): 1::uhugeint, uuid(): 2::uhugeint} as li"
).arrow()
res = duckdb_cursor.execute("select typeof(li) FROM arrow_table").fetchall()
print ("map type")
print (arrow_table.schema)
print (res)
# map type
# li: map<fixed_size_binary[16], fixed_size_binary[16]>
# child 0, entries: struct<key: fixed_size_binary[16] not null, value: fixed_size_binary[16]> not null
# child 0, key: fixed_size_binary[16] not null
# child 1, value: fixed_size_binary[16]
# [('MAP(BLOB, BLOB)',)]
arrow_table = duckdb_cursor.execute(
    "select [{'keys': uuid(), 'values': uuid()}] as li"
).arrow()
res = duckdb_cursor.execute("select typeof(li) FROM arrow_table").fetchall()
print ("fixed size list type")
print (arrow_table.schema)
print (res)
# fixed size list type
# li: list<l: struct<keys: fixed_size_binary[16], values: fixed_size_binary[16]>>
# child 0, l: struct<keys: fixed_size_binary[16], values: fixed_size_binary[16]>
# child 0, keys: fixed_size_binary[16]
# -- field metadata --
# ARROW:extension:metadata: ''
# ARROW:extension:name: 'arrow.uuid'
# child 1, values: fixed_size_binary[16]
# -- field metadata --
# ARROW:extension:metadata: ''
# ARROW:extension:name: 'arrow.uuid'
# [('STRUCT(keys UUID, "values" UUID)[]',)]
```
This occurs because we reconstruct the fields to canonicalize the field
names:
https://github.com/apache/arrow/blob/d7bc3788ea2773399b7ef489438c725999bfa83d/cpp/src/arrow/c/bridge.cc#L1298-L1321
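
To illustrate the failure mode, here is a minimal pure-Python sketch. It is a toy model, not the actual C++ in `bridge.cc`: the `Field` dataclass and both helper functions are hypothetical names made up for this example. It only shows why rebuilding a field from its type loses metadata, while renaming a copy of the original field would keep it.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Field:
    """Toy stand-in for an Arrow field (hypothetical, not the pyarrow class)."""
    name: str
    type: str
    nullable: bool = True
    metadata: Dict[str, str] = field(default_factory=dict)


def canonicalize_rebuilding(key: Field, value: Field) -> Tuple[Field, Field]:
    # Builds brand-new fields from the types alone, so any metadata is lost.
    return (
        Field("key", key.type, nullable=False),
        Field("value", value.type, nullable=value.nullable),
    )


def canonicalize_renaming(key: Field, value: Field) -> Tuple[Field, Field]:
    # Copies each field and changes only the name, so metadata survives.
    return (
        replace(key, name="key", nullable=False),
        replace(value, name="value"),
    )


uuid_meta = {"ARROW:extension:name": "arrow.uuid", "ARROW:extension:metadata": ""}
key_in = Field("k", "fixed_size_binary[16]", nullable=False, metadata=uuid_meta)
value_in = Field("v", "fixed_size_binary[16]", metadata=uuid_meta)

dropped = canonicalize_rebuilding(key_in, value_in)
kept = canonicalize_renaming(key_in, value_in)

print(dropped[0].metadata)  # metadata after rebuilding from the type
print(kept[0].metadata)     # metadata after renaming a copy
```

Both helpers produce the canonical `key`/`value` names; only the second carries the extension metadata through, which is the behavior I would expect from the import path.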
I think that we don't have that problem in the IPC type conversion:
https://github.com/apache/arrow/blob/d7bc3788ea2773399b7ef489438c725999bfa83d/cpp/src/arrow/ipc/metadata_internal.cc#L393-L395
### Component(s)
C++