nastra commented on code in PR #6569:
URL: https://github.com/apache/iceberg/pull/6569#discussion_r1068373480
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java:
##########
@@ -335,6 +335,7 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) {
       OutputFileFactory dataFileFactory =
           OutputFileFactory.builderFor(table, partitionId, taskId)
               .format(context.dataFileFormat())
+              .operationId(context.queryId())
Review Comment:
https://github.com/apache/iceberg/blob/8c6adf6e5e17603025d23b2012aa576c071ff269/core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java#L90
shows how the file name is determined. In the cases where the data file was
overwritten, `partitionId`, `taskId`, and `operationId` were all the same
(since we manually set the same `operationId` as for data files; previously
the `operationId` for data and delete files was randomly generated), so the
generated file names collided.
Maybe we could add a different suffix to the generated name to indicate
whether it's a data or a delete file (although I'm not sure whether this has
any implications).
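
To illustrate the collision, here is a minimal, self-contained sketch. It is
not the Iceberg code: `FileNameSketch`, the `suffix` parameter, the
`.parquet` extension, and the `%05d-%d-%s-%05d` name pattern are all
assumptions made for illustration. It only models the idea that two factories
sharing `partitionId`, `taskId`, and `operationId` generate identical names
unless something else distinguishes them.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a file-name generator; NOT the Iceberg OutputFileFactory.
final class FileNameSketch {
    private final int partitionId;
    private final long taskId;
    private final String operationId;
    private final String suffix; // hypothetical: "" for none, else e.g. "data" or "deletes"
    private final AtomicInteger fileCount = new AtomicInteger(0);

    FileNameSketch(int partitionId, long taskId, String operationId, String suffix) {
        this.partitionId = partitionId;
        this.taskId = taskId;
        this.operationId = operationId;
        this.suffix = suffix;
    }

    // Assumed name pattern: partition, task, operation id, per-factory file counter.
    String next() {
        String base = String.format(
            "%05d-%d-%s-%05d", partitionId, taskId, operationId, fileCount.incrementAndGet());
        String tagged = suffix.isEmpty() ? base : base + "-" + suffix;
        return tagged + ".parquet"; // extension is an assumption
    }

    public static void main(String[] args) {
        String queryId = "query-1"; // hypothetical shared operation id

        // Same ids and no suffix: both factories start counting at 1, so names collide.
        FileNameSketch data = new FileNameSketch(0, 1L, queryId, "");
        FileNameSketch deletes = new FileNameSketch(0, 1L, queryId, "");
        System.out.println(data.next().equals(deletes.next()));

        // Same ids with a distinguishing suffix: names differ.
        FileNameSketch taggedData = new FileNameSketch(0, 1L, queryId, "data");
        FileNameSketch taggedDeletes = new FileNameSketch(0, 1L, queryId, "deletes");
        System.out.println(taggedData.next().equals(taggedDeletes.next()));
    }
}
```

In this sketch the suffix is appended after the counter, so names generated
without a suffix keep their shape; whether a real change could rely on that is
exactly the open question about implications raised above.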
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]