ndrluis commented on PR #14027:
URL: https://github.com/apache/iceberg/pull/14027#issuecomment-3289655448
Quick update on this issue - I'm going to focus on solving this problem on
the Java side first. Once Iceberg Java has the correct behavior, I'll come back
to PyIceberg and make the necessary adjustments. Here's the minimal test
I'm running with PySpark (since I'm more familiar with it than with the
Java environment).
**Tested with the following Iceberg runtimes:**
- `org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.9.0`
- `org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.10.0`
**Test Case**
```python
import pytest
from pyspark.sql import SparkSession

from pyiceberg.catalog import Catalog, load_catalog
from pyiceberg.exceptions import NoSuchTableError
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, UUIDType

# _create_table is the existing helper from the PyIceberg integration test suite


@pytest.mark.integration
def test_uuid_write_read_with_pyspark(session_catalog: Catalog, spark: SparkSession) -> None:
    identifier = "default.test_uuid_write_and_read_with_pyspark"
    catalog = load_catalog("default", type="in-memory")
    catalog.create_namespace("ns")
    schema = Schema(NestedField(field_id=1, name="uuid_col", field_type=UUIDType(), required=False))

    try:
        session_catalog.drop_table(identifier=identifier)
    except NoSuchTableError:
        pass

    table = _create_table(session_catalog, identifier, {"format-version": "2"}, schema=schema)

    spark.sql(
        f"""
        INSERT INTO {identifier} VALUES
        ("22222222-2222-2222-2222-222222222222")
        """
    )

    df = spark.table(identifier)
    assert df.count() == 1

    result = df.where("uuid_col = '22222222-2222-2222-2222-222222222222'")
    assert result.count() == 1
```
**Error**
The test passes for `df.count()` but fails when applying the `WHERE` condition,
with the following error:
```
25/09/14 12:45:49 ERROR BaseReader: Error reading file(s): s3://warehouse/default/test_uuid_write_and_read_with_pyspark/data/00000-0-c8b11c46-5ef7-426e-a1d5-de8aa720af6d-0-00001.parquet
java.lang.ClassCastException: class java.util.UUID cannot be cast to class java.nio.ByteBuffer (java.util.UUID and java.nio.ByteBuffer are in module java.base of loader 'bootstrap')
    at java.base/java.nio.ByteBuffer.compareTo(ByteBuffer.java:267)
    at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:52)
    at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
    at org.apache.iceberg.types.Comparators$NullSafeChainedComparator.compare(Comparators.java:253)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eq(ParquetMetricsRowGroupFilter.java:352)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eq(ParquetMetricsRowGroupFilter.java:79)
    at org.apache.iceberg.expressions.ExpressionVisitors$BoundExpressionVisitor.predicate(ExpressionVisitors.java:162)
    at org.apache.iceberg.expressions.ExpressionVisitors.visitEvaluator(ExpressionVisitors.java:390)
    at org.apache.iceberg.expressions.ExpressionVisitors.visitEvaluator(ExpressionVisitors.java:409)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eval(ParquetMetricsRowGroupFilter.java:103)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter.shouldRead(ParquetMetricsRowGroupFilter.java:73)
    at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:108)
    at org.apache.iceberg.parquet.VectorizedParquetReader.init(VectorizedParquetReader.java:90)
    at org.apache.iceberg.parquet.VectorizedParquetReader.iterator(VectorizedParquetReader.java:99)
    at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:116)
    at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:43)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:134)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
    [... rest of stack trace ...]
```
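If I'm reading the trace right, the row-group filter appears to be comparing the column bound read back from the Parquet statistics (a `ByteBuffer`) against the bound predicate's literal (a `java.util.UUID`) with a natural-order comparator. Here's a minimal, JDK-only sketch of that type mismatch - the class name, helper method, and byte encoding below are just for illustration, not the actual Iceberg code path:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBoundMismatchSketch {

  // What a natural-order comparison does under the hood: cast to Comparable and delegate
  @SuppressWarnings("unchecked")
  static int compareNaturalOrder(Object left, Object right) {
    return ((Comparable<Object>) left).compareTo(right);
  }

  public static void main(String[] args) {
    // The value used in the test above
    UUID literal = UUID.fromString("22222222-2222-2222-2222-222222222222");

    // The column bound as raw bytes, like a bound read back from a fixed[16] Parquet column
    ByteBuffer bound = ByteBuffer.allocate(16);
    bound.putLong(literal.getMostSignificantBits());
    bound.putLong(literal.getLeastSignificantBits());
    bound.flip();

    // ByteBuffer.compareTo expects another ByteBuffer but receives a java.util.UUID,
    // which throws the same ClassCastException as in the trace above
    compareNaturalOrder(bound, literal);
  }
}
```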