Fokko commented on code in PR #2540:
URL: https://github.com/apache/iceberg-python/pull/2540#discussion_r2392242974
##########
dev/provision.py:
##########
@@ -23,35 +22,27 @@
from pyiceberg.schema import Schema
from pyiceberg.types import FixedType, NestedField, UUIDType
-# The configuration is important, otherwise we get many small
-# parquet files with a single row. When a positional delete
-# hits the Parquet file with one row, the parquet file gets
-# dropped instead of having a merge-on-read delete file.
-spark = (
-    SparkSession
-    .builder
-    .config("spark.sql.shuffle.partitions", "1")
-    .config("spark.default.parallelism", "1")
Review Comment:
The tests are passing, but we're no longer exercising the positional deletes:
Spark now throws away the whole data file instead of writing positional
delete files.
The following test illustrates the problem:
```
diff --git a/tests/integration/test_reads.py b/tests/integration/test_reads.py
index 375eb35b2..ed6e805e3 100644
--- a/tests/integration/test_reads.py
+++ b/tests/integration/test_reads.py
@@ -432,6 +432,11 @@ def test_pyarrow_deletes(catalog: Catalog, format_version: int) -> None:
     # (11, 'k'),
     # (12, 'l')
     test_positional_mor_deletes = catalog.load_table(f"default.test_positional_mor_deletes_v{format_version}")
+
+    if format_version == 2:
+        files = test_positional_mor_deletes.scan().plan_files()
+        assert all([len(file.delete_files) > 0 for file in files])
+
     arrow_table = test_positional_mor_deletes.scan().to_arrow()
     assert arrow_table["number"].to_pylist() == [1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]
```
This one passes on main but fails on this branch.
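For context, a minimal sketch of why the removed configuration matters (catalog and Iceberg runtime settings omitted for brevity; the `default.mor_demo` table below is hypothetical, not part of the provisioning script): with a single shuffle partition all rows land in one Parquet file, so a row-level DELETE on a merge-on-read table has to write a positional delete file, whereas with one row per file Spark can simply drop the file.
```
from pyspark.sql import SparkSession

# Sketch only: the single-partition settings keep all rows in one Parquet file.
spark = (
    SparkSession.builder
    .config("spark.sql.shuffle.partitions", "1")
    .config("spark.default.parallelism", "1")
    .getOrCreate()
)

# Hypothetical v2 table that opts into merge-on-read deletes.
spark.sql(
    """
    CREATE TABLE default.mor_demo (number int)
    USING iceberg
    TBLPROPERTIES (
        'format-version' = '2',
        'write.delete.mode' = 'merge-on-read'
    )
    """
)
spark.sql("INSERT INTO default.mor_demo VALUES (1), (2), (3), (4), (5)")

# With a multi-row data file, this DELETE writes a positional delete file.
# If the file contained only the deleted row, Spark would drop the whole
# file instead, and there would be nothing left for the reader to test.
spark.sql("DELETE FROM default.mor_demo WHERE number = 3")
```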