maxfirman opened a new issue, #1128:
URL: https://github.com/apache/iceberg-python/issues/1128

   ### Apache Iceberg version
   
   0.7.0
   
   ### Please describe the bug 🐞
   
   There is a regression introduced in version 0.7.0 where Arrow tables 
written with a "string" data type are cast to "large_string" when read back 
from Iceberg. 
   
   The code below reproduces the bug. The assertion succeeds in v0.6.1 but 
fails in 0.7.0 because the column type in the schema changes from "string" to 
"large_string".
   
   
   ```python
   from tempfile import TemporaryDirectory
   
   import pyarrow
   from pyiceberg.catalog.sql import SqlCatalog
   
   
   def main():
       with TemporaryDirectory() as warehouse_path:
           catalog = SqlCatalog(
               "default",
               **{
                   "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
                   "warehouse": f"file://{warehouse_path}",
               },
           )
   
           catalog.create_namespace("default")
   
           schema = pyarrow.schema(
               [
                   pyarrow.field("foo", pyarrow.string(), nullable=True),
               ]
           )
   
           df = pyarrow.table(data={"foo": ["bar"]}, schema=schema)
   
           table = catalog.create_table(
               "default.test_table",
               schema=df.schema,
           )
   
           table.append(df)
   
           # read the Arrow table back from Iceberg
           df2 = table.scan().to_arrow()
   
           # this assert succeeds with 0.6.1, but fails with 0.7.0 because the
           # column type has changed from "string" to "large_string"
           assert df.equals(df2)
   
   
   if __name__ == "__main__":
       main()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

