kevinjqliu commented on issue #520: URL: https://github.com/apache/iceberg-python/issues/520#issuecomment-1996377106
I can think of two options:

1. Add Arrow `LargeString` as an Iceberg data type, mapping 1:1 to the Arrow type. The physical representation would still be backed by string.
2. Arrow `LargeString` is already converted to the Iceberg `String` type in `create_table` by `_convert_schema_if_needed` (see #382). So when writing an Arrow table (in `overwrite`/`append`), cast the given Arrow table to the table's schema after checking that the two schemas are compatible. For example, at https://github.com/apache/iceberg-python/blob/36a505f7741f814c00b8babf6f26e89efde5b688/pyiceberg/table/__init__.py#L1138:

```
from pyiceberg.io.pyarrow import schema_to_pyarrow

_check_schema(self.schema(), other_schema=df.schema)
# the schemas are compatible, so the cast is safe
pyarrow_schema = schema_to_pyarrow(self.schema())
df = df.cast(pyarrow_schema)
```

@Fokko @HonahX @syun64 would love your opinions on this
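For reference, here is a minimal standalone PyArrow sketch of what the cast in option 2 amounts to; the column name and data are invented for illustration, and the hard-coded target schema stands in for what `schema_to_pyarrow(self.schema())` would produce:

```
import pyarrow as pa

# An incoming Arrow table with a large_string column (hypothetical data).
df = pa.table({"city": pa.array(["Amsterdam", "Drachten"], type=pa.large_string())})

# The string-backed schema the Iceberg table expects.
target_schema = pa.schema([pa.field("city", pa.string())])

# large_string -> string is a safe cast once the schemas are known to be compatible.
df = df.cast(target_schema)
assert df.schema.field("city").type == pa.string()
```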