kamijin-fanta opened a new issue, #8269: URL: https://github.com/apache/iceberg/issues/8269
### Apache Iceberg version

1.3.1 (latest release)

### Query engine

None

### Please describe the bug 🐞

## Overview

- Data written by UnpartitionedWriter may not be included in table scan results.
- Files are missing from the planFiles results when the filter condition is on a partitioned field.
- This problem was discovered when querying with Trino: https://github.com/trinodb/trino/issues/18580

## Steps to Reproduce

Code: https://github.com/kamijin-fanta/iceberg-sandbox/blob/master/src/main/kotlin/Main.kt

### 1. Define the table and write data

I used the Java client's UnpartitionedWriter to write rows to the table.

```kotlin
val schema = Schema(
    Types.NestedField.of(1, false, "ts1", TimestampType.withZone()),
    Types.NestedField.of(2, false, "ts2", TimestampType.withZone()),
    Types.NestedField.of(3, false, "text", StringType()),
)
// The table is partitioned by year(ts1).
val partitionSpec = PartitionSpec.builderFor(schema).year("ts1").build()

// some code omitted

val writer = UnpartitionedWriter(
    table.spec(),
    FileFormat.PARQUET,
    appenderFactory,
    fileFactory,
    table.io(),
    Long.MAX_VALUE
)
```

### 2. Scan the table with DataTableScan

Scan the table. A query that filters on ts2, which is not a partition field, succeeds and returns the data file.

```kotlin
val filterTs2Files = table.newScan()
    .useSnapshot(table.currentSnapshot().snapshotId())
    .filter(Expressions.greaterThan("ts2", 1690848000000000L)) // ts2 > 2023-08-01 00:00:00.000 UTC
    .planFiles()
    .toList()
println("found files filtered by ts2: %d".format(filterTs2Files.size))
filterTs2Files.forEach { println(it) }
/*
Outputs:
found files filtered by ts2: 1
BaseFileScanTask{file=s3://warehouse/default/test1/data/00000-0-d1f1bc76-155f-4637-b041-615fcd61fcb7-00001.parquet, partition_data=PartitionData{ts1_year=null}, residual=ref(name="ts2") > 1690848000000000}
*/
```

However, when the condition is on ts1, the partitioned field, an empty list is returned.
```kotlin
val filterTs1Files = table.newScan()
    .useSnapshot(table.currentSnapshot().snapshotId())
    .filter(Expressions.greaterThan("ts1", 1690848000000000L)) // ts1 > 2023-08-01 00:00:00.000 UTC
    .planFiles()
    .toList()
println("found files filtered by ts1: %d".format(filterTs1Files.size))
filterTs1Files.forEach { println(it) }
/*
Outputs:
found files filtered by ts1: 0
*/
```
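The two scan results above differ only in whether the filter touches the partition field, and the ts2 output shows the file recorded with `ts1_year=null`. The following is a minimal, self-contained sketch of one plausible explanation, not Iceberg's actual planning code: it assumes a filter on `ts1` is first checked against each file's `ts1_year` partition value, and that a null partition value is treated as "cannot match". `FileEntry`, `yearOrdinal`, and `mightMatch` are hypothetical names introduced only for this illustration. It also confirms that the literal `1690848000000000L` (microseconds since the epoch) is 2023-08-01 00:00:00 UTC.

```kotlin
import java.time.Instant
import java.time.ZoneOffset
import java.time.temporal.ChronoUnit

// Hypothetical stand-in for a data file and its partition tuple; not an Iceberg class.
data class FileEntry(val path: String, val ts1Year: Int?)

// Convert microseconds since the Unix epoch to an Instant.
fun microsToInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

// Year partition value modeled as whole years since 1970 (UTC).
fun yearOrdinal(micros: Long): Int =
    microsToInstant(micros).atOffset(ZoneOffset.UTC).year - 1970

// Assumed partition-level check for the row filter `ts1 > bound`:
// a file can contain matching rows only if its year is >= the year of the bound.
// A null partition value is treated as "cannot match", so the file is pruned.
fun mightMatch(file: FileEntry, boundMicros: Long): Boolean {
    val year = file.ts1Year ?: return false
    return year >= yearOrdinal(boundMicros)
}

fun main() {
    val bound = 1690848000000000L
    println(microsToInstant(bound)) // 2023-08-01T00:00:00Z
    println(yearOrdinal(bound))     // 53

    val files = listOf(
        FileEntry("written-without-partition-value.parquet", null),
        FileEntry("written-with-partition-value.parquet", 53),
    )
    // Only the file that carries a partition value survives the check.
    println(files.filter { mightMatch(it, bound) }.map { it.path })
}
```

Under these assumptions, a file recorded with a null partition value is dropped by any filter on `ts1` but kept by a filter on `ts2`, which matches the behaviour reported here. Whether this is the real cause would need to be confirmed against the Iceberg scan planning code.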
