HonahX commented on PR #986: URL: https://github.com/apache/iceberg-python/pull/986#issuecomment-2268417308
@sungwy Thanks for working on this! It seems we also need to update `schema_to_pyarrow`/`_cast_if_needed` to honor the new property. Otherwise, in the code at https://github.com/apache/iceberg-python/blob/846713b18c4f0adbb75fda14d9b863464e91d4bc/pyiceberg/io/pyarrow.py#L1471-L1478, a type promotion from `string` to `binary` causes `schema_to_pyarrow` to convert `BinaryType()` to `pa.large_binary()`. This happens even when `PYARROW_USE_LARGE_TYPES_ON_READ` is set to `"False"`.

Example to reproduce:

```python
import pyarrow as pa
import pytest

from pyiceberg.catalog import Catalog
from pyiceberg.exceptions import NoSuchTableError
from pyiceberg.io import PYARROW_USE_LARGE_TYPES_ON_READ  # property added in this PR; import path assumed
from pyiceberg.types import BinaryType


@pytest.mark.integration
@pytest.mark.parametrize("catalog", [pytest.lazy_fixture("session_catalog_hive")])
def test_table_scan_override_with_small_types(catalog: Catalog) -> None:
    identifier = "default.test_table_scan_override_with_small_types"
    arrow_table = pa.Table.from_arrays(
        [
            pa.array(["a", "b", "c"]),
            pa.array([b"a", b"b", b"c"]),
            pa.array([["a", "b"], ["c", "d"], ["e", "f"]]),
        ],
        names=["string", "binary", "list"],
    )

    # Start from a clean table
    try:
        catalog.drop_table(identifier)
    except NoSuchTableError:
        pass

    tbl = catalog.create_table(
        identifier,
        schema=arrow_table.schema,
    )
    tbl.append(arrow_table)

    # Promote the `string` column to binary
    with tbl.update_schema() as update_schema:
        update_schema.update_column("string", BinaryType())

    # Ask the scan to return small (non-large) Arrow types
    tbl.io.properties[PYARROW_USE_LARGE_TYPES_ON_READ] = "False"
    result_table = tbl.scan().to_arrow()

    expected_schema = pa.schema([
        pa.field("string", pa.large_binary()),  # should be pa.binary()
        pa.field("binary", pa.binary()),
        pa.field("list", pa.list_(pa.string())),
    ])
    assert result_table.schema.equals(expected_schema)
```