tanmayrauth commented on code in PR #1021:
URL: https://github.com/apache/iceberg-go/pull/1021#discussion_r3204693327


##########
table/snapshot_producers.go:
##########
@@ -733,7 +733,20 @@ func (sp *snapshotProducer) manifestProducer(content 
iceberg.ManifestContent, fi
                }
                defer internal.CheckedClose(wr, &err)
 
+               // For v3 data manifests, assign first_row_id to each data file.
+               // Each file claims a contiguous range of row IDs starting from 
NextRowID.
+               assignRowIDs := sp.txn.meta.formatVersion >= 3 && content == 
iceberg.ManifestContentData
+               nextRowID := sp.txn.meta.NextRowID()

Review Comment:
    Yeah, this was a real correctness issue. The manifest-list rebuild on retry 
intentionally treats data manifests as immutable, and baking in per-file IDs at 
producer-build time breaks that invariant. On contention, stale IDs would get 
rejected by validateAndUpdateRowLineage deterministically,  making retries 
useless.                             



##########
table/snapshot_producers.go:
##########
@@ -733,7 +733,20 @@ func (sp *snapshotProducer) manifestProducer(content 
iceberg.ManifestContent, fi
                }
                defer internal.CheckedClose(wr, &err)
 
+               // For v3 data manifests, assign first_row_id to each data file.
+               // Each file claims a contiguous range of row IDs starting from 
NextRowID.
+               assignRowIDs := sp.txn.meta.formatVersion >= 3 && content == 
iceberg.ManifestContentData

Review Comment:
   You're correct, and I should have read the spec more carefully here. I went 
back and re-read the "First Row ID Inheritance" section - ADDED files carry 
null, manifest-list writer assigns the manifest-level value, readers derive 
per-file from record_count sums. We already implement both halves of that  
(list writer at manifest.go:1508, reader at manifest.go:785). So this was just 
duplicating existing machinery while also creating a second source of truth 
that would break Java replay.    



##########
manifest.go:
##########
@@ -2187,6 +2187,19 @@ func (b *DataFileBuilder) FirstRowID(id int64) 
*DataFileBuilder {
        return b
 }
 
+// SetDataFileFirstRowID sets the first_row_id on an existing DataFile.
+// This is used by the snapshot producer to assign row IDs at write time for 
v3 tables.
+// Returns false if the DataFile implementation does not support mutation.
+func SetDataFileFirstRowID(df DataFile, id int64) bool {

Review Comment:
   Yeah, you're right - this was a hack. Type-asserting to the unexported 
concrete type and silently returning false is not a contract anyone should have 
to reason about. I've removed it entirely in the latest push since the per-file 
assignment approach is gone now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to