kamijin-fanta opened a new issue, #8269:
URL: https://github.com/apache/iceberg/issues/8269

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   ## Overview
   
   - Data written by UnpartitionedWriter may not be included in the table scan results.
   - Files are not included in the planFiles results when a partitioned field is specified as a filter condition.
   - This problem was discovered when querying with Trino: https://github.com/trinodb/trino/issues/18580
   
   ## Steps to Reproduce
   
   Code: https://github.com/kamijin-fanta/iceberg-sandbox/blob/master/src/main/kotlin/Main.kt
   
   ### 1. define table and write data
   
   I used the Java client UnpartitionedWriter to write rows to the table.
   
   ```kotlin
   val schema = Schema(
       Types.NestedField.of(1, false, "ts1", TimestampType.withZone()),
       Types.NestedField.of(2, false, "ts2", TimestampType.withZone()),
       Types.NestedField.of(3, false, "text", StringType.get()),
   )
   val partitionSpec = PartitionSpec.builderFor(schema).year("ts1").build()
   
   // ... (setup code omitted)
   
   val writer = UnpartitionedWriter(
     table.spec(),
     FileFormat.PARQUET,
     appenderFactory,
     fileFactory,
     table.io(),
     Long.MAX_VALUE
   )
   ```
   
   
   ### 2. scan a table with DataTableScan
   
   Perform a scan on the table. A query that filters on ts2, which is not a partition field, succeeds and returns the data file.
   
   ```kotlin
   val filterTs2Files = table.newScan()
     .useSnapshot(table.currentSnapshot().snapshotId())
     .filter(Expressions.greaterThan("ts2", 1690848000000000L)) // ts2 > 2023-08-01 00:00:00.000 UTC
     .planFiles()
     .toList()
   println("found files filtered by ts2: %d".format(filterTs2Files.size))
   filterTs2Files.forEach{ println(it) }
   
   /*
   Outputs:
   
   found files filtered by ts2: 1
   
   BaseFileScanTask{file=s3://warehouse/default/test1/data/00000-0-d1f1bc76-155f-4637-b041-615fcd61fcb7-00001.parquet, partition_data=PartitionData{ts1_year=null}, residual=ref(name="ts2") > 1690848000000000}
   */
   ```
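
The filter literal is expressed in microseconds since the Unix epoch, which is how Iceberg represents timestamp values. As a quick sanity check, the following sketch converts the literal to an instant and back using only `java.time`. The helper names `microsToInstant` and `instantToMicros` are hypothetical and not part of the Iceberg API.

```kotlin
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper: convert a microseconds-since-epoch value to an Instant.
fun microsToInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

// Hypothetical helper: convert an Instant back to microseconds since the epoch.
fun instantToMicros(instant: Instant): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)

fun main() {
    // The literal used in the scan filters of this report.
    println(microsToInstant(1690848000000000L))                      // 2023-08-01T00:00:00Z
    println(instantToMicros(Instant.parse("2023-08-01T00:00:00Z")))  // 1690848000000000
}
```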
   
   However, if the filter condition is on ts1, the partitioned field, an empty list is returned.
   
   
   ```kotlin
   val filterTs1Files = table.newScan()
     .useSnapshot(table.currentSnapshot().snapshotId())
     .filter(Expressions.greaterThan("ts1", 1690848000000000L)) // ts1 > 2023-08-01 00:00:00.000 UTC
     .planFiles()
     .toList()
   println("found files filtered by ts1: %d".format(filterTs1Files.size))
   filterTs1Files.forEach{ println(it) }
   
   /*
   Outputs:
   
   found files filtered by ts1: 0
   */
   ```
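
One plausible reading of this result, given that the ts2 scan reports `partition_data=PartitionData{ts1_year=null}` for the file: the filter on ts1 is projected onto the `ts1_year` partition field, and a null partition value cannot satisfy the projected comparison, so the file is pruned during planning. The sketch below is a simplified stand-alone model of that reasoning, not Iceberg's actual pruning code. `yearOrdinal` and `mayContainRows` are hypothetical names, and the projection rule (`ts1 > x` relaxed to roughly `ts1_year >= year(x)`) is an assumption.

```kotlin
import java.time.Instant
import java.time.ZoneOffset
import java.time.temporal.ChronoUnit

// Simplified model of a year partition transform: whole years since 1970 (UTC).
fun yearOrdinal(micros: Long): Int {
    val ts = Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atOffset(ZoneOffset.UTC)
    return ts.year - 1970
}

// Hypothetical stand-in for partition pruning (assumption, not Iceberg code):
// the row filter `ts1 > literal` is relaxed to `ts1_year >= year(literal)`,
// and a null partition value never satisfies the comparison.
fun mayContainRows(partitionYear: Int?, filterMicros: Long): Boolean =
    partitionYear != null && partitionYear >= yearOrdinal(filterMicros)

fun main() {
    val filterMicros = 1690848000000000L // 2023-08-01T00:00:00Z in microseconds
    println(yearOrdinal(filterMicros))          // 53
    println(mayContainRows(null, filterMicros)) // false: file with ts1_year=null is skipped
    println(mayContainRows(53, filterMicros))   // true: file with ts1_year=53 is kept
}
```

Under this model, a file whose partition data had been recorded as `ts1_year=53` would survive the same filter, which matches the expectation that the row written in step 1 should be found.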

