kevinjqliu commented on code in PR #2167: URL: https://github.com/apache/iceberg-python/pull/2167#discussion_r2206128117
##########
tests/integration/test_writes/test_partitioned_writes.py:
##########
@@ -547,14 +552,14 @@ def test_summaries_with_null(spark: SparkSession, session_catalog: Catalog, arro
         "total-records": "6",
     }
     assert summaries[5] == {
-        "removed-files-size": "16174",
+        "removed-files-size": "15774" if under_20_arrow else "16174",

Review Comment:
   Let's just do this instead, since we're not really testing for the file size:
   ```suggestion
        "removed-files-size": summaries[5]["removed-files-size"],
   ```



##########
tests/integration/test_writes/test_partitioned_writes.py:
##########
@@ -451,6 +451,11 @@ def test_dynamic_partition_overwrite_unpartitioned_evolve_to_identity_transform(
 @pytest.mark.integration
 def test_summaries_with_null(spark: SparkSession, session_catalog: Catalog, arrow_table_with_null: pa.Table) -> None:
+    import pyarrow
+    from packaging import version
+
+    under_20_arrow = version.parse(pyarrow.__version__) < version.parse("20.0.0")
+

Review Comment:
   > Any ideas? Maybe use a range of "safe" values instead of a single file size value? I'd be happy to open another PR if there is more work for this.
   
   I think we can just parameterize the file size. We're not really testing anything related to the size of the file.
   
   > It'd be great if PyIceberg wouldn't set an upper version for Arrow if possible.
   
   Yeah, agreed. Let's see if we can remove the upper bound.
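For illustration, here is a minimal sketch of the "don't pin the exact byte count" idea from the comments above (not code from the PR). It assumes `summaries[5]` is the snapshot summary dict the test already builds, and `EXPECTED_NON_SIZE_KEYS` is a hypothetical placeholder for the exact key/value pairs the test still wants to assert; only the size-related keys are relaxed, so the assertion no longer depends on the PyArrow version:

```python
# Sketch only: compare every key except the size-dependent ones exactly, and
# require the size keys to merely look like positive integer byte counts.
SIZE_KEYS = {"removed-files-size", "added-files-size", "total-files-size"}

summary = summaries[5]  # snapshot summary dict from the existing test
# EXPECTED_NON_SIZE_KEYS is a hypothetical dict of the non-size expectations.
assert {k: v for k, v in summary.items() if k not in SIZE_KEYS} == EXPECTED_NON_SIZE_KEYS
assert all(summary[k].isdigit() and int(summary[k]) > 0 for k in SIZE_KEYS & summary.keys())
```

Compared with the self-referential `summaries[5]["removed-files-size"]` suggestion, a check along these lines still catches a missing or malformed size entry while staying independent of the Arrow version.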