jeremybarner opened a new pull request, #1183:
URL: https://github.com/apache/iceberg-go/pull/1183

   ## Problem
   
   `processPositionalDeletes` applies positional (merge-on-read) deletes one 
Arrow record batch at a time, but `combinePositionalDeletes` builds the 
surviving-row indices in **global, file-relative** coordinates `[start, end)` 
and passes them straight into `compute.Take` against the **current batch**, 
whose valid index range is `[0, NumRows)`.
   
   For the first batch of a data file, `start == 0`, so global and batch-local 
indices coincide and the call succeeds. For the second and later batches, 
`start > 0`, so the indices reach or exceed `NumRows` and the scan fails:
   
   ```
   index error: <N> out of bounds
   ```
   
   where `<N>` is a multiple of the parquet read batch size 
(`read.parquet.batch-size`, default `1<<17 == 131072`). So any data file larger 
than one batch that also has positional delete files fails to scan at its 
second batch.
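   
   The failure mode can be reproduced without Arrow. The sketch below is an 
illustration only: `take` is a hypothetical stand-in for `compute.Take` over a 
plain slice, `globalIndices` is a hypothetical helper that mimics the pre-fix 
index building, and the 4-row batches are made up for the example.
   
   ```go
   package main
   
   import "fmt"
   
   // take is a hypothetical stand-in for compute.Take on a plain slice.
   // It fails on any index outside [0, len(batch)).
   func take(batch []string, indices []int64) ([]string, error) {
   	out := make([]string, 0, len(indices))
   	for _, idx := range indices {
   		if idx < 0 || idx >= int64(len(batch)) {
   			return nil, fmt.Errorf("index error: %d out of bounds", idx)
   		}
   		out = append(out, batch[idx])
   	}
   	return out, nil
   }
   
   // globalIndices mimics the pre-fix behaviour: surviving rows are
   // emitted in file-relative coordinates [start, end).
   func globalIndices(start, end int64, deletes map[int64]struct{}) []int64 {
   	var out []int64
   	for i := start; i < end; i++ {
   		if _, ok := deletes[i]; !ok {
   			out = append(out, i)
   		}
   	}
   	return out
   }
   
   func main() {
   	// Two consecutive batches of one data file; global position 5 is deleted.
   	batches := [][]string{{"a", "b", "c", "d"}, {"e", "f", "g", "h"}}
   	deletes := map[int64]struct{}{5: {}}
   
   	var start int64
   	for n, batch := range batches {
   		end := start + int64(len(batch))
   		rows, err := take(batch, globalIndices(start, end, deletes))
   		fmt.Printf("batch %d: rows=%v err=%v\n", n, rows, err)
   		start = end
   	}
   }
   ```
   
   In this sketch the first batch goes through unchanged, while the second 
reports index 4 as out of bounds: position 4 is valid in the file but not in a 
4-row batch.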
   
   ## Fix
   
   Rebase the Take indices to batch-local coordinates (`i - start`). The 
`deletes` set stays in global coordinates because it is matched against the 
global position `i`.
   
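   ```go
   // i walks global (file-relative) positions; deletes is keyed the same way.
   for i := start; i < end; i++ {
       if _, ok := deletes[i]; !ok {
           // Emit a batch-local index so Take stays within [0, NumRows).
           bldr.Append(i - start)
       }
   }
   ```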
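   
   To make the rebasing concrete, here is a minimal standalone sketch of the 
same loop that collects into a plain `[]int64` instead of an Arrow builder. 
The name `localIndices` and the 4-row batch boundaries are assumptions for 
illustration, not code from this PR.
   
   ```go
   package main
   
   import "fmt"
   
   // localIndices returns the batch-local indices of the rows that survive
   // the deletes. start and end are global (file-relative) bounds of the
   // batch, and deletes is keyed by global position.
   func localIndices(start, end int64, deletes map[int64]struct{}) []int64 {
   	out := make([]int64, 0, end-start)
   	for i := start; i < end; i++ {
   		if _, ok := deletes[i]; !ok {
   			out = append(out, i-start) // rebase to [0, end-start)
   		}
   	}
   	return out
   }
   
   func main() {
   	// Global position 5 is deleted; it falls in the second 4-row batch.
   	deletes := map[int64]struct{}{5: {}}
   
   	fmt.Println(localIndices(0, 4, deletes)) // [0 1 2 3]
   	fmt.Println(localIndices(4, 8, deletes)) // [0 2 3]
   }
   ```
   
   Global position 5 maps to local index 1 of the second batch, so that batch 
keeps local rows 0, 2 and 3, all of which lie inside its own index range.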
   
   ## Test
   
   Adds `TestProcessPositionalDeletesAcrossBatches`, which feeds two 
consecutive batches with a delete located in the **second** batch, the case 
the previous code got wrong. The test fails with `index error: 4 out of bounds` 
before the fix and passes after.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

