[I] Regression in 0.7.0 due to type coercion from "string" to "large_string" [iceberg-python]

via GitHub Tue, 03 Sep 2024 10:07:50 -0700


maxfirman opened a new issue, #1128:
URL: https://github.com/apache/iceberg-python/issues/1128


   ### Apache Iceberg version
   
   0.7.0
   
   ### Please describe the bug 🐞
   
   There is a regression introduced in version 0.7.0 where Arrow tables 
written with a "string" data type are cast to "large_string" when read back 
from Iceberg. 
   
   The code below reproduces the bug. The assertion succeeds in v0.6.1 but 
fails in 0.7.0 because the column type in the schema is changed from "string" 
to "large_string".
   
   
   ```python
   from tempfile import TemporaryDirectory
   
   import pyarrow
   from pyiceberg.catalog.sql import SqlCatalog
   
   
   def main():
       with TemporaryDirectory() as warehouse_path:
           catalog = SqlCatalog(
               "default",
               **{
                   "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
                   "warehouse": f"file://{warehouse_path}",
               },
           )
   
           catalog.create_namespace("default")
   
           schema = pyarrow.schema(
               [
                   pyarrow.field("foo", pyarrow.string(), nullable=True),
               ]
           )
   
           df = pyarrow.table(data={"foo": ["bar"]}, schema=schema)
   
           table = catalog.create_table(
               "default.test_table",
               schema=df.schema,
           )
   
           table.append(df)
   
           # read the arrow table back from iceberg
           df2 = table.scan().to_arrow()
   
           # this assert succeeds with 0.6.1, but fails with 0.7.0 because the
           # column type has changed from "string" to "large_string"
           assert df.equals(df2)
   
   
   if __name__ == "__main__":
       main()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


