GabrielM98 opened a new issue, #404:
URL: https://github.com/apache/iceberg-go/issues/404

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   There appears to be an issue with the way in which partition filtering is 
applied to manifest entries when partitioning by a struct field. In my case, I 
have a table with the following partition spec...
   
   ```json
   {
         "spec-id": 3,
         "fields": [
           {
             "name": "event_metadata.timing.created_at_year",
             "transform": "year",
             "source-id": 19,
             "field-id": 1000
           },
           {
             "name": "event_metadata.timing.created_at_month",
             "transform": "month",
             "source-id": 19,
             "field-id": 1001
           },
           {
             "name": "user_uuid_bucket_256",
             "transform": "bucket[256]",
             "source-id": 5,
             "field-id": 1002
           }
         ]
   }
   ```
   
   When I then attempt to query said table, the call to 
`(*table.Scan).PlanFiles` returns an empty `[]table.FileScanTask`. With Delve & 
GoLand, I believe I've managed to narrow down the issue to the 
`getPartitionRecord` function 
[here](https://github.com/apache/iceberg-go/blob/091352672b4191a4bb11b603c1fb9bd2ab6c2aaf/table/scanner.go#L116)...
   
   ```go
   func getPartitionRecord(dataFile iceberg.DataFile, partitionType *iceberg.StructType) partitionRecord {
       partitionData := dataFile.Partition()
   
       out := make(partitionRecord, len(partitionType.FieldList))
       for i, f := range partitionType.FieldList {
           out[i] = partitionData[f.Name]
       }
   
       return out
   }
   ```
   
   It's returning a `partitionRecord` like so...
   
   <img width="315" alt="Image" src="https://github.com/user-attachments/assets/1485f26e-2f56-4090-a279-e4ed105d0bb6" />
   
   However, the first and second elements in the slice should be `55` and `663`, respectively.
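   For reference, the Iceberg spec defines the `year` and `month` transforms as the number of whole years and whole months since 1970-01-01. That means `55` decodes to 2025 and `663` to April 2025 (55 × 12 + 3 = 663). The minimal sketch below reproduces that arithmetic. The helper names `yearTransform` and `monthTransform` are hypothetical and not part of the iceberg-go API, and the sketch assumes timestamps are interpreted in UTC.

   ```go
   package main

   import (
       "fmt"
       "time"
   )

   // yearTransform is a hypothetical helper mirroring the Iceberg `year`
   // transform: whole years elapsed since 1970 (negative before 1970).
   func yearTransform(t time.Time) int {
       return t.UTC().Year() - 1970
   }

   // monthTransform is a hypothetical helper mirroring the Iceberg `month`
   // transform: whole months elapsed since 1970-01 (negative before 1970).
   func monthTransform(t time.Time) int {
       u := t.UTC()
       return (u.Year()-1970)*12 + int(u.Month()) - 1
   }

   func main() {
       // Any timestamp in April 2025 yields the values expected above.
       ts := time.Date(2025, time.April, 15, 12, 0, 0, 0, time.UTC)
       fmt.Println(yearTransform(ts), monthTransform(ts)) // 55 663
   }
   ```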
   
   If we look at the `Name` field values for the first two elements of 
`partitionType.FieldList`, they're given as 
`event_metadata.timing.created_at_year` and 
`event_metadata.timing.created_at_month`, whereas in the `partitionData` map, 
the keys that correspond to these fields are given as 
`event_metadata_x2Etiming_x2Ecreated_at_year` and 
`event_metadata_x2Etiming_x2Ecreated_at_month`...
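   The `_x2E` sequences look like Avro name sanitization. Avro names may only contain letters, digits and `_`, and must not start with a digit. The Java Iceberg implementation rewrites any other character as `_x` followed by its uppercase hex code (`.` is `0x2E`) and prefixes a leading digit with `_`. The Go sketch below reproduces that convention on the assumption that the writer applied it. `sanitizeAvroName` is a hypothetical helper and not part of iceberg-go.

   ```go
   package main

   import (
       "fmt"
       "strings"
       "unicode"
   )

   // sanitizeAvroName is a hypothetical helper sketching the escaping
   // convention assumed above: a leading digit gets a "_" prefix, and any
   // character that is not a letter, digit or '_' becomes "_x" followed by
   // its code point in uppercase hex.
   func sanitizeAvroName(name string) string {
       var sb strings.Builder
       for i, r := range name {
           switch {
           case r == '_' || unicode.IsLetter(r):
               sb.WriteRune(r)
           case unicode.IsDigit(r):
               if i == 0 {
                   sb.WriteByte('_') // Avro names cannot start with a digit.
               }
               sb.WriteRune(r)
           default:
               fmt.Fprintf(&sb, "_x%X", r) // e.g. '.' -> "_x2E"
           }
       }
       return sb.String()
   }

   func main() {
       for _, n := range []string{
           "event_metadata.timing.created_at_year",
           "event_metadata.timing.created_at_month",
           "user_uuid_bucket_256",
       } {
           fmt.Println(n, "->", sanitizeAvroName(n))
       }
   }
   ```

   Under this assumption, the dotted partition field names map exactly onto the keys seen in `partitionData`, while `user_uuid_bucket_256` is left unchanged. That would explain why only the first two slots come back empty.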
   
   
![Image](https://github.com/user-attachments/assets/49dc2ad7-c856-4e21-a789-23af2be893eb)
   
   Consequently, looking up these fields in the `partitionData` map by name returns a `nil` empty interface. The partition filter is then evaluated against `nil` instead of the real partition values, so the manifest entry is filtered out and `PlanFiles` returns an empty slice of `table.FileScanTask`.
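   The lookup failure described above can be reproduced in isolation. In this sketch the map is keyed by the sanitized names seen in the decoded manifest, while the lookup uses the dotted partition field names, so both slots come back `nil`. `partitionRecord` is redefined as `[]any`, and the values are stored as `int32`. Both are assumptions about iceberg-go internals. `yearAtLeast` is a hypothetical `created_at_year >= N` predicate that treats `nil` as non-matching. It stands in for iceberg-go's real evaluator and does not reproduce it.

   ```go
   package main

   import "fmt"

   // partitionRecord is redefined here as []any (an assumption about
   // iceberg-go's internal type) so that this sketch is self-contained.
   type partitionRecord []any

   // buildRecord mimics getPartitionRecord: one map lookup per field name.
   func buildRecord(partitionData map[string]any, fieldNames []string) partitionRecord {
       out := make(partitionRecord, len(fieldNames))
       for i, name := range fieldNames {
           out[i] = partitionData[name] // a missing key yields a nil interface
       }
       return out
   }

   // yearAtLeast is a hypothetical stand-in for a `created_at_year >= minYear`
   // partition predicate; a nil (or non-int32) value never matches.
   func yearAtLeast(rec partitionRecord, minYear int32) bool {
       year, ok := rec[0].(int32)
       return ok && year >= minYear
   }

   func main() {
       // Keys as seen in the decoded Avro manifest entry (int32 assumed).
       partitionData := map[string]any{
           "event_metadata_x2Etiming_x2Ecreated_at_year":  int32(55),
           "event_metadata_x2Etiming_x2Ecreated_at_month": int32(663),
       }
       // Names as they appear in the partition type's FieldList.
       fieldNames := []string{
           "event_metadata.timing.created_at_year",
           "event_metadata.timing.created_at_month",
       }

       rec := buildRecord(partitionData, fieldNames)
       fmt.Println(rec)                                 // [<nil> <nil>]
       fmt.Println("entry kept:", yearAtLeast(rec, 55)) // entry kept: false
   }
   ```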
   
   From what I can tell, the `partitionData` map comes from the `iceberg.DataFile` that is instantiated when the Avro manifest entry is decoded. With a breakpoint placed [here](https://github.com/apache/iceberg-go/blob/091352672b4191a4bb11b603c1fb9bd2ab6c2aaf/manifest.go#L489) in the `fetchManifestEntries` function, I can see the following as the result of decoding the Avro...
   
   <img width="1848" alt="Image" src="https://github.com/user-attachments/assets/6dbaee1c-c5ee-4e9e-813d-88047d45e339" />
   
   Is this maybe an Avro decoding quirk that hasn't been accounted for? I don't believe there are any issues with how the data is being written, as I've reproduced this both with data written via Spark and with data written via the Iceberg sink connector.
   
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.