gerinsp opened a new issue, #1207:
URL: https://github.com/apache/iceberg-go/issues/1207
### Feature Request / Improvement
iceberg-go already has low-level support for equality deletes via
Transaction.WriteEqualityDeletes, and RowDelta can commit those delete files.
For CDC/upsert workloads, callers currently need to compose the two steps
manually:
```go
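// Step 1: write the equality delete files for the given key records.
deleteFiles, err := tx.WriteEqualityDeletes(ctx, equalityFieldIDs, records)
if err != nil {
	return err
}

// Step 2: commit the delete files through a RowDelta.
rd := tx.NewRowDelta(snapshotProps)
rd.AddDeletes(deleteFiles...)
err = rd.Commit(ctx)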
```
Would maintainers be open to a small convenience API that wraps this flow,
for example:
```go
func (t *Transaction) DeleteEquality(
	ctx context.Context,
	equalityFieldIDs []int,
	records iter.Seq2[arrow.RecordBatch, error],
	snapshotProps iceberg.Properties,
) error
```
This would not change `Delete(filter)` semantics; it would only expose an
explicit delete-by-equality-record path for CDC-style deletes where the caller
already has the equality key values.
I have a downstream CDC/streaming engine using the same Iceberg
equality-delete pattern, and have validated the generated delete files with
Spark/Trino readers.
If this direction sounds reasonable, I'd be happy to contribute it with
focused tests for:
- format v2 requirement
- empty equality field IDs
- unpartitioned equality deletes
- partitioned equality deletes
- scan applying the committed equality deletes
- snapshot summary/delete counts