laskoviymishka commented on code in PR #1184:
URL: https://github.com/apache/iceberg-go/pull/1184#discussion_r3395349624
##########
table/internal/utils.go:
##########
@@ -330,7 +337,11 @@ func (d *DataFileStatistics) ToDataFile(opts DataFileOpts)
iceberg.DataFile {
bldr.EqualityFieldIDs(d.EqualityFieldIDs)
}
- bldr.SortOrderID(opts.SortOrderID)
+ // Position deletes are ordered by (file_path, pos), never by a table
sort
+ // order; the spec requires their sort order id to stay null.
+ if opts.SortOrderID != unsortedSortOrderID && opts.Content !=
iceberg.EntryContentPosDeletes {
Review Comment:
This one `&&` is folding two invariants that want different handling. The
`!= unsortedSortOrderID` arm is "caller made no claim" — fine to skip silently.
But `Content == EntryContentPosDeletes && SortOrderID != 0` is a spec
violation, and right now it gets silently swallowed at the lowest layer rather
than surfaced.
I'd split it into two sequential guards: first, if the content is
pos-delete, skip stamping (and ideally assert `SortOrderID == 0`, since a
non-zero claim here means a bug upstream); then, separately, if the order is
non-unsorted, stamp it. That way the prohibited-claim case is detected instead
of masked. wdyt?
##########
table/rolling_data_writer_test.go:
##########
@@ -313,8 +313,8 @@ func (s *RollingDataWriterTestSuite)
TestWriterFactoryPropagatesSortOrderID() {
}
s.Require().Len(dataFiles, 1)
- s.Require().NotNil(dataFiles[0].SortOrderID(), "SortOrderID must be set
on the emitted DataFile")
- s.Equal(expectedSortOrderID, *dataFiles[0].SortOrderID())
+ s.Nil(dataFiles[0].SortOrderID(),
Review Comment:
These inversions correctly assert the defaults don't stamp, but nothing
exercises the explicit-claim path the other direction. Nothing constructs a
`WriteTask` with a non-zero `SortOrderID`, runs it through
`defaultDataFileWriter.writeFile`, and asserts `DataFile.SortOrderID()` equals
the input.
That's the one path where the field is actually load-bearing, and it's
currently untested — if the `writeFile` wiring broke, no test would catch it.
I'd add that integration test before merge.
##########
table/writer.go:
##########
@@ -38,6 +38,9 @@ type WriteTask struct {
FileCount int // FileCount is a sequential counter for files written
by this task.
Schema *iceberg.Schema
Batches []arrow.RecordBatch
+ // SortOrderID, when non-zero, is recorded on the resulting file as its
Review Comment:
This comment overpromises. It reads as unconditional, but it's not: the
field is honored on the `defaultDataFileWriter.writeFile` path and completely
ignored on the `writerFactory`/`RollingDataWriter` path (`openFileWriter`
builds its own `WriteFileInfo` and never reads it), and it's also suppressed
for pos-delete content by the `ToDataFile` guard.
A caller going through `WriteRecords` who sets this field gets a silent
no-op. I'd qualify the doc to say it only takes effect on the batch/`writeFile`
path and that pos-delete content drops it per spec, so the next person doesn't
wire it up expecting it to work everywhere. Fix the doc and this is good to
land.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]