kyrre opened a new issue, #45574: URL: https://github.com/apache/arrow/issues/45574
### Describe the usage question you have. Please include as many useful details as possible.

We want to use PyArrow for ETL jobs where JSON files are periodically read from Azure Blob Storage and inserted into Delta Lake tables. While the schemas are available, some of the columns have a "dynamic type", e.g., we could have two rows in which the `ActivityObjects` column has these values:

```
ActivityObjects -> [{"TargetUser": 1, "OperationType": "NetworkShareCreation"}, ...]
ActivityObjects -> [{"MachineId": "05-10-15"}, ...]
```

The way we have dealt with this in Spark is just to treat `ActivityObjects` as `array<string>` (or `string`) and do any additional parsing at query time. However, if we try to do the same with PyArrow:

```python
parse_options = pj.ParseOptions(explicit_schema=schema)
events = (
    ibis.memtable(
        pj.read_json(
            jsonl_stream, parse_options=parse_options
        )
    )
)
```

it throws an exception complaining that it encountered a list instead of a string. Is there a way to force this behaviour? As I understand it, this will eventually be solved by the introduction of VariantType.

### Component(s)

Python
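For anyone wanting to reproduce this, here is a minimal, self-contained sketch of the failure described above. The sample rows, the one-column schema, and the `ArrowInvalid` catch are illustrative assumptions, not taken from the reporter's pipeline (which also involves ibis and Azure Blob Storage):

```python
# Minimal sketch (illustrative, not the reporter's actual code): declaring a
# JSON array column as string in an explicit schema makes read_json raise.
import io

import pyarrow as pa
import pyarrow.json as pj

# Two JSON Lines rows whose ActivityObjects field holds arrays of
# heterogeneous objects, as described in the issue.
jsonl_stream = io.BytesIO(
    b'{"ActivityObjects": [{"TargetUser": 1, "OperationType": "NetworkShareCreation"}]}\n'
    b'{"ActivityObjects": [{"MachineId": "05-10-15"}]}\n'
)

# Spark-style approach: declare the column as a plain string and parse later.
schema = pa.schema([("ActivityObjects", pa.string())])
parse_options = pj.ParseOptions(explicit_schema=schema)

try:
    table = pj.read_json(jsonl_stream, parse_options=parse_options)
except pa.ArrowInvalid as exc:
    # The reader cannot coerce a JSON array to a string column, so it fails
    # with an error along the lines of "changed from string to array".
    print(exc)
```

PyArrow's JSON reader converts values directly into Arrow types rather than retaining the raw JSON text of a field, which appears to be why the Spark-style `string` declaration fails here.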