maxfirman commented on issue #1128: URL: https://github.com/apache/iceberg-python/issues/1128#issuecomment-2328309497
Thanks @kevinjqliu. I can confirm that the workaround resolves the problem on the latest main branch, but not on v0.7.0 or v0.7.1.

Setting `PYARROW_USE_LARGE_TYPES_ON_READ=False` causes the test to fail the other way around, i.e. a pyarrow table with a `large_string` column is read back as `string`. I'm guessing this is a fundamental limitation, since Iceberg has only one string type.

I would be tempted to change the default value of `PYARROW_USE_LARGE_TYPES_ON_READ` to `False`, as I consider pyarrow `string` to be more commonly used than `large_string`. This would also give backwards compatibility with `pyiceberg <0.7.0`.

A further improvement would be to write some kind of type hint into the Iceberg metadata that tells pyiceberg whether a string column should be read as a pyarrow `large_string`.
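
To make the limitation concrete, here is a minimal toy model of the round trip in plain Python. It does not call pyiceberg or pyarrow: `write_schema`, `read_schema`, and the `hints` argument are hypothetical names invented for illustration, and type names are plain strings. It only sketches why a single global flag cannot restore a schema that mixes `string` and `large_string`, and how a per-column hint could.

```python
# Toy model only: these names are illustrative, not pyiceberg or pyarrow APIs.

# Both Arrow string flavours collapse to Iceberg's single string type on write.
ARROW_TO_ICEBERG = {"string": "string", "large_string": "string"}


def write_schema(arrow_schema):
    """Map Arrow column types to Iceberg types, as a writer would."""
    return {name: ARROW_TO_ICEBERG[t] for name, t in arrow_schema.items()}


def read_schema(iceberg_schema, use_large_types_on_read, hints=None):
    """Map Iceberg types back to Arrow types.

    Without hints, one global flag decides the type of every string column.
    With hints (hypothetical per-column metadata), the recorded type wins.
    """
    hints = hints or {}
    default = "large_string" if use_large_types_on_read else "string"
    result = {}
    for name, iceberg_type in iceberg_schema.items():
        if iceberg_type == "string":
            result[name] = hints.get(name, default)
        else:
            result[name] = iceberg_type
    return result


# A table that mixes both string flavours.
original = {"a": "string", "b": "large_string"}
stored = write_schema(original)  # the distinction is lost at this point

read_large = read_schema(stored, use_large_types_on_read=True)
read_small = read_schema(stored, use_large_types_on_read=False)
read_hinted = read_schema(stored, use_large_types_on_read=True, hints=original)

print(read_large == original)   # False: column "a" comes back as large_string
print(read_small == original)   # False: column "b" comes back as string
print(read_hinted == original)  # True: the hint restores each column's type
```

In this model, either setting of the flag breaks one of the two columns, which matches the "fails the other way around" behaviour described above. Only the per-column hint round-trips both.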