Re: [I] Regression in 0.7.0 due to type coercion from "string" to "large_string" [iceberg-python]

via GitHub Wed, 04 Sep 2024 02:07:39 -0700


maxfirman commented on issue #1128:
URL: 
https://github.com/apache/iceberg-python/issues/1128#issuecomment-2328309497


   Thanks @kevinjqliu. I can confirm that the workaround resolves the problem 
when using the latest main branch, but not with v0.7.0 or v0.7.1.
   
   Setting `PYARROW_USE_LARGE_TYPES_ON_READ=False` will cause the test to fail 
the other way around, i.e. a pyarrow table with a `large_string` will be read 
back with a `string`. I'm guessing this is just a fundamental limitation in that 
Iceberg only has one string type.
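
   To make the limitation concrete, here is a toy model of the round trip. It 
is only a sketch: the mapping and function names are hypothetical and are not 
the pyiceberg API. It assumes that both Arrow string types collapse to the 
single Iceberg `string` on write, and that one global flag picks the Arrow 
type on read.

```python
# Toy model only: these names are hypothetical, not part of pyiceberg.

# On write, both Arrow string types collapse to Iceberg's single string type.
ARROW_TO_ICEBERG = {"string": "string", "large_string": "string"}


def read_back_type(iceberg_type: str, use_large_types_on_read: bool) -> str:
    """Choose the Arrow type for an Iceberg string using one global flag."""
    if iceberg_type != "string":
        raise ValueError(f"this sketch only models strings, got {iceberg_type!r}")
    return "large_string" if use_large_types_on_read else "string"


def round_trip(arrow_type: str, use_large_types_on_read: bool) -> str:
    """Write an Arrow type to Iceberg, then read it back as an Arrow type."""
    iceberg_type = ARROW_TO_ICEBERG[arrow_type]
    return read_back_type(iceberg_type, use_large_types_on_read)


# Whichever value the flag has, one of the two input types is not preserved.
for flag in (True, False):
    for arrow_type in ("string", "large_string"):
        result = round_trip(arrow_type, flag)
        status = "ok" if result == arrow_type else "changed"
        print(f"flag={flag!s:5} {arrow_type:12} -> {result:12} ({status})")
```

   Because the original Arrow type is dropped on write, no single value of the 
flag can round-trip both types.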
   
   I would be tempted to change the default value of 
`PYARROW_USE_LARGE_TYPES_ON_READ` to `False`, as I would consider pyarrow 
`string` to be the more commonly used type compared to `large_string`. This 
would also give backwards compatibility with `pyiceberg <0.7.0`.
   
   A further improvement would be to write some kind of type hint into the 
iceberg metadata that would tell pyiceberg whether the string column was 
supposed to be interpreted as a pyarrow `large_string`.
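
   A rough sketch of what such a hint could look like, modelled with plain 
dictionaries. Everything here is an assumption made for illustration: the 
`arrow.type-hint.` property prefix and the helper names are hypothetical, and 
nothing like this exists in pyiceberg or the Iceberg spec as far as I know.

```python
# Hypothetical sketch: the property prefix and helpers below are invented for
# illustration and do not correspond to any pyiceberg or Iceberg feature.
HINT_PREFIX = "arrow.type-hint."


def write_hint(properties: dict, column: str, arrow_type: str) -> None:
    """Record the original Arrow type of a string column in the properties."""
    if arrow_type not in ("string", "large_string"):
        raise ValueError(f"unsupported Arrow string type: {arrow_type!r}")
    properties[HINT_PREFIX + column] = arrow_type


def read_type(properties: dict, column: str, use_large_types_on_read: bool) -> str:
    """Prefer a per-column hint; otherwise fall back to the global flag."""
    hint = properties.get(HINT_PREFIX + column)
    if hint is not None:
        return hint
    return "large_string" if use_large_types_on_read else "string"


props = {}
write_hint(props, "name", "large_string")

# A hinted column keeps its original type regardless of the global flag.
print(read_type(props, "name", use_large_types_on_read=False))
# A column without a hint still follows the global flag.
print(read_type(props, "city", use_large_types_on_read=False))
```

   With a per-column hint the global flag would only matter for tables 
written without one, for example by older versions or other engines.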
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

