kevinjqliu commented on issue #520: URL: https://github.com/apache/iceberg-python/issues/520#issuecomment-1996377106
I can think of two options:

1. Add Arrow `LargeString` as an Iceberg data type, mapping 1:1 to the Arrow type. The physical representation would still be backed by string.
2. Arrow `LargeString` is already converted to the Iceberg `String` type in `create_table` by `_convert_schema_if_needed` (see #382). So when writing an Arrow table (in `overwrite`/`append`), cast the given Arrow table to the table's schema after checking that the two schemas are compatible. For example, at https://github.com/apache/iceberg-python/blob/36a505f7741f814c00b8babf6f26e89efde5b688/pyiceberg/table/__init__.py#L1138:

```
from pyiceberg.io.pyarrow import schema_to_pyarrow

_check_schema(self.schema(), other_schema=df.schema)
# the schemas are compatible, so the cast is safe
pyarrow_schema = schema_to_pyarrow(self.schema())
df = df.cast(pyarrow_schema)
```

@Fokko @HonahX @syun64 would love your opinions on this
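For reference, here is a minimal standalone PyArrow sketch of what the cast in option 2 amounts to; the column name and data are invented for illustration, and the hard-coded target schema stands in for what `schema_to_pyarrow(self.schema())` would produce:

```
import pyarrow as pa

# An incoming Arrow table with a large_string column (hypothetical data).
df = pa.table({"city": pa.array(["Amsterdam", "Drachten"], type=pa.large_string())})

# The string-backed schema the Iceberg table expects.
target_schema = pa.schema([pa.field("city", pa.string())])

# large_string -> string is a safe cast once the schemas are known to be compatible.
df = df.cast(target_schema)
assert df.schema.field("city").type == pa.string()
```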