fallintoplace opened a new issue, #1082:
URL: https://github.com/apache/iceberg-go/issues/1082

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug
   
   `positionDeletePartitionedFanoutWriter` appears to derive partition paths 
using the positional-delete schema rather than the table schema.
   
   In `positionDeletePartitionedFanoutWriter.partitionPath`, the partition type 
and path are currently built from `p.schema`:
   
   ```go
   data := newPartitionRecord(partitionContext.partitionData, spec.PartitionType(p.schema))
   return spec.PartitionToPath(data, p.schema), nil
   ```
   
   The writer initializes `p.schema` to `iceberg.PositionalDeleteSchema`. For 
partition specs based on table data columns, those source fields are not 
present in the positional-delete schema. `PartitionSpec.PartitionType` skips 
missing source fields, so the derived partition type can be empty and 
`PartitionToPath` can return an empty path.
   
   That means different target data-file partitions can collapse to the same 
rolling writer key. Since `writerFactory.getOrCreateRollingDataWriter` reuses 
writers by partition path, the delete file can be opened with the first 
partition's metadata and then receive position-delete rows for later target 
files from different partitions.
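   
   The reuse-by-key effect can be sketched with a toy model. This is not the 
   iceberg-go code: `toyWriter`, `toyFactory`, and `getOrCreate` are 
   hypothetical stand-ins that only assume writers are cached by partition 
   path, as described above.
   
   ```go
   package main
   
   import "fmt"
   
   // toyWriter is a hypothetical stand-in for a rolling writer. It remembers
   // the partition it was opened with and the delete rows it receives.
   type toyWriter struct {
   	openedFor string
   	rows      []string
   }
   
   // toyFactory caches writers by partition path (assumed reuse-by-key behavior).
   type toyFactory struct {
   	writers map[string]*toyWriter
   }
   
   // getOrCreate returns the cached writer for path, or opens a new one that
   // is tagged with the given partition metadata.
   func (f *toyFactory) getOrCreate(path, partition string) *toyWriter {
   	if w, ok := f.writers[path]; ok {
   		return w
   	}
   	w := &toyWriter{openedFor: partition}
   	f.writers[path] = w
   	return w
   }
   
   func main() {
   	f := &toyFactory{writers: map[string]*toyWriter{}}
   
   	// Both partitions arrive with the same (empty) derived path.
   	first := f.getOrCreate("", "partition-1")
   	second := f.getOrCreate("", "partition-2")
   	second.rows = append(second.rows, "delete row targeting partition-2")
   
   	fmt.Println("same writer:", first == second)
   	fmt.Println("opened for:", second.openedFor)
   }
   ```
   
   In this model the second lookup never opens a writer, so rows for the 
   second partition land in a file tagged with the first partition's metadata.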
   
   For example, for a table partitioned by `day`:
   
   | Target data file | Real partition |
   | --- | --- |
   | `file-a.parquet` | `day=2026-05-01` |
   | `file-b.parquet` | `day=2026-05-02` |
   
   If the position-delete writer derives the partition path from 
`iceberg.PositionalDeleteSchema`, both can produce the same empty partition 
path. The rolling writer can then reuse the writer opened for `day=2026-05-01` 
while writing deletes that target `day=2026-05-02`.
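   
   A minimal sketch of why the two paths collapse, using toy types rather 
   than the real iceberg-go API. `field`, `schema`, `partitionField`, and this 
   `partitionPath` are hypothetical; the only behavior assumed from the 
   description above is that partition fields whose source column is missing 
   from the schema are skipped. The table schema column IDs are made up, and 
   an identity transform is assumed for `day`.
   
   ```go
   package main
   
   import (
   	"fmt"
   	"strings"
   )
   
   // field and schema are toy stand-ins, not the iceberg-go types.
   type field struct {
   	id   int
   	name string
   }
   
   type schema []field
   
   // partitionField ties a partition name to the ID of its source column.
   type partitionField struct {
   	sourceID int
   	name     string
   }
   
   // partitionPath builds "name=value" segments, skipping partition fields
   // whose source column is not present in the given schema.
   func partitionPath(spec []partitionField, sc schema, values map[string]string) string {
   	known := make(map[int]bool, len(sc))
   	for _, f := range sc {
   		known[f.id] = true
   	}
   	var parts []string
   	for _, pf := range spec {
   		if !known[pf.sourceID] {
   			continue // source column missing: the field silently drops out
   		}
   		parts = append(parts, pf.name+"="+values[pf.name])
   	}
   	return strings.Join(parts, "/")
   }
   
   var (
   	// Hypothetical table schema with a "day" column (IDs are made up).
   	tableSchema = schema{{1, "id"}, {2, "day"}}
   	// Positional-delete columns; IDs are the reserved ones from the Iceberg spec.
   	deleteSchema = schema{{2147483546, "file_path"}, {2147483545, "pos"}}
   	daySpec      = []partitionField{{sourceID: 2, name: "day"}}
   )
   
   func main() {
   	for _, day := range []string{"2026-05-01", "2026-05-02"} {
   		values := map[string]string{"day": day}
   		fmt.Printf("delete schema: %q, table schema: %q\n",
   			partitionPath(daySpec, deleteSchema, values),
   			partitionPath(daySpec, tableSchema, values))
   	}
   }
   ```
   
   With the delete schema, both days map to the same empty string in this 
   model; with the table schema, each day keeps its own path.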
   
   The expected behavior is that the positional-delete file schema is still 
used for the delete file contents, but the table schema is used when deriving 
the table partition path from the target data file's partition data.
   
   This looks related to, but distinct from, #767, which fixed nondeterministic 
partition path ordering. This issue is about using the delete schema instead of 
the table schema when deriving the path.
   
   I have contributed to iceberg-python before, but this would be my first 
iceberg-go PR. I would like to work on a fix if this diagnosis sounds right.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
