laskoviymishka opened a new issue, #1050:
URL: https://github.com/apache/iceberg-go/issues/1050

   Parent: #589.
   
   `PlanFiles` builds `dvIndex` by iterating DV manifest entries and grouping 
by `ReferencedDataFile()`. Two gaps relative to Java's `DeleteFileIndex`:
   
   The first is a missing sequence-number guard. The spec says a DV applies to 
a data file only when the data file's `data_sequence_number` is less than or 
equal to the DV's `data_sequence_number`. Java's `DeleteFileIndex.findDV` 
enforces this with a `ValidationException`. The Go `dvIndex` build skips the 
check, so a stale DV from a prior epoch paired with a newer data file is 
silently applied. This over-deletion path only triggers on malformed 
manifests, but the guard is cheap.
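
A minimal sketch of what such a guard could look like, assuming the two sequence numbers are available as plain `int64` values when the index is built. `checkDVSequence` and `ErrStaleDV` are hypothetical names for illustration, not existing iceberg-go identifiers:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleDV is a hypothetical sentinel error, not part of iceberg-go.
var ErrStaleDV = errors.New("deletion vector is older than its data file")

// checkDVSequence (hypothetical helper) returns an error when the data
// file's data sequence number is greater than the DV's, meaning the DV
// must not be applied to that file.
func checkDVSequence(dataSeq, dvSeq int64, dataPath string) error {
	if dataSeq > dvSeq {
		return fmt.Errorf("%w: %s (data seq %d > DV seq %d)",
			ErrStaleDV, dataPath, dataSeq, dvSeq)
	}
	return nil
}

func main() {
	// A DV written at or after the data file's sequence number applies.
	fmt.Println(checkDVSequence(5, 7, "data/a.parquet"))
	// A DV from an earlier sequence number is rejected.
	fmt.Println(checkDVSequence(8, 7, "data/a.parquet"))
}
```

Wrapping a sentinel with `%w` lets callers test for the condition with `errors.Is` while still seeing the offending path in the message.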
   
   The second is that multiple DVs per data file are silently unioned. Java's 
`Builder.add` errors with `ValidationException("Can't index multiple DVs for 
%s")`. The Go `dvIndex` is `map[string][]iceberg.DataFile`, so any number of 
entries are accepted per path. The scanner's `readAllDeletionVectors` 
defensively rejects this at read time as of #996, but the canonical home for 
the check is planning, so that callers inspecting 
`FileScanTask.DeletionVectorFiles` directly see only validated state.
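
One way the duplicate check could be expressed at index-build time is sketched below. It assumes a simplified `dvRef` struct in place of `iceberg.DataFile` and a single-valued map in place of the current `map[string][]iceberg.DataFile`; both are illustrative choices, not the actual iceberg-go code:

```go
package main

import "fmt"

// dvRef is a hypothetical stand-in for a DV manifest entry.
type dvRef struct {
	DVPath         string // location of the deletion vector file
	ReferencedFile string // data file the DV applies to
}

// buildDVIndex (hypothetical) maps each referenced data file to exactly
// one DV and fails as soon as a second DV targets the same data file.
func buildDVIndex(dvs []dvRef) (map[string]dvRef, error) {
	index := make(map[string]dvRef, len(dvs))
	for _, dv := range dvs {
		if prev, ok := index[dv.ReferencedFile]; ok {
			return nil, fmt.Errorf("can't index multiple DVs for %s: %s and %s",
				dv.ReferencedFile, prev.DVPath, dv.DVPath)
		}
		index[dv.ReferencedFile] = dv
	}
	return index, nil
}

func main() {
	_, err := buildDVIndex([]dvRef{
		{DVPath: "dv-1.puffin", ReferencedFile: "data/a.parquet"},
		{DVPath: "dv-2.puffin", ReferencedFile: "data/a.parquet"},
	})
	fmt.Println(err)
}
```

Failing during planning means a task never carries more than one DV per data file, so the read-time check becomes a second line of defense rather than the only one.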
   
   Both checks fit naturally in one PR, since they live in the same `dvIndex` 
construction loop. The loop was last edited by #996, so this work will conflict 
textually with that PR's pos-delete suppression block; rebase if both are in 
flight together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

