Deuxis opened a new issue, #47380:
URL: https://github.com/apache/arrow/issues/47380

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I have a `RecordBatch` `rb` with the following schema:
   
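   ```
   import pyarrow as pa

   # col1 is a top-level map; col2.f2 is a map whose values contain a list of
   # structs, each of which holds another map ("component").
   schema = pa.schema([
       ("col1", pa.map_("string", "string")),
       ("col2", pa.struct([
           ("f1", pa.int32()),
           ("f2", pa.map_("string", pa.struct([
               ("item", pa.list_(pa.struct([
                   ("raw", "string"),
                   ("component", pa.map_("string", "string"))
               ])))
           ])))
       ]))
   ])
   ```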
   
   When using `rb.to_pylist(maps_as_pydicts="strict")`, the map in `col1` 
exports correctly as a `{key: val}` dict, as does the map at `$['col2']['f2']`, 
while the map at `$['col2']['f2'][*]['item'][*]['component']` still exports as 
a `[(key, val)]` list of tuples:
   
   ```
   [{'col1': {'A': 'B'},
     'col2': {'f1': 1,
      'f2': {'C': {'item': [{'raw': 'D', 'component': [('E', 'F')]}]}}}}]
   ```
   
   Verified on pyarrow 21.0.0 on x86_64 Linux and Windows.
   
   Use case and issue:
   
   I need to adapt a pipeline to serialize a Parquet file to JSON and 
deserialize it back into a Parquet file. Tuples serialize to JSON arrays, and 
sadly, even with the schema provided, `RecordBatch.from_pylist` keels over and 
dies upon encountering a list of lists where it expected a map (`ArrowTypeError: 
Could not convert 'E' with type str: was expecting tuple of (key, value) pair`). 
Transforming the JSON into JSONL and loading it via `pyarrow.json.read_json` 
gives even worse results, as it straight-up doesn't support loading maps and 
gives up when I hand it the schema.
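
To make the failure mode concrete, here is a minimal standard-library sketch of what happens to the nested map entry on a JSON round trip. The `component` value is copied from the output above; the wrapping `{"component": ...}` dict is only there for illustration.

```python
import json

# The nested map entry as to_pylist() currently returns it (from the output above).
component = [("E", "F")]

# JSON has no tuple type, so the (key, value) tuple is written as an array...
encoded = json.dumps({"component": component})

# ...and comes back as a plain list, i.e. a list of lists instead of a list of tuples.
decoded = json.loads(encoded)

print(encoded)
print(type(decoded["component"][0]).__name__)
```

After decoding, nothing distinguishes the former map entry from an ordinary two-element list, which is why the schema alone does not help on import.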
   
   Maps that are correctly exported as dicts (and therefore as JSON objects) 
import with no problem, so fixing this bug will solve my issue.
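
Until that is fixed, a possible stopgap is to post-process the `to_pylist` output before serializing it. The sketch below is not part of pyarrow: `maps_to_dicts` is a hypothetical helper, and it assumes that 2-tuples appear in the output only as map entries (structs come out as dicts and lists as lists). It cannot tell an empty map from an empty list, so `[]` is left untouched, and unlike `"strict"` it silently keeps the last value for duplicate keys.

```python
def maps_to_dicts(value):
    """Recursively turn non-empty lists of (key, value) tuples into dicts.

    Hypothetical helper, not a pyarrow API. Assumes tuples only ever
    represent map entries in the to_pylist() output.
    """
    if isinstance(value, dict):
        return {k: maps_to_dicts(v) for k, v in value.items()}
    if isinstance(value, list):
        # A non-empty list made up solely of 2-tuples is treated as a map.
        if value and all(isinstance(e, tuple) and len(e) == 2 for e in value):
            return {k: maps_to_dicts(v) for k, v in value}
        return [maps_to_dicts(e) for e in value]
    return value


# The rows exactly as shown in the output above.
rows = [{'col1': {'A': 'B'},
         'col2': {'f1': 1,
                  'f2': {'C': {'item': [{'raw': 'D',
                                         'component': [('E', 'F')]}]}}}}]

fixed = maps_to_dicts(rows)
print(fixed[0]['col2']['f2']['C']['item'][0]['component'])
```

The conversion has to run before `json.dumps`, since after the round trip the tuples have already become lists and can no longer be recognized.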
   
   ### Component(s)
   
   Python


