blackmwk commented on PR #2590: URL: https://github.com/apache/iceberg-rust/pull/2590#issuecomment-4659480454
> > I think the key point is the design of `MergingSnapshotProducer`, which contains a lot of indices to speed up the conflict check.
>
> I think you meant [`DeleteFileIndex`](https://github.com/apache/iceberg/blob/1cea23eda51c9b9ddcfb88dd499b1fd14f3bf3b3/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L624-L625) in the Java implementation and the `conflictDetectionFilter`. In Rust's [`DeleteFileIndex`](https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/delete_file_index.rs#L56), we don't allow a filter to be pushed down at this point, so I didn't include that change.
>
> In the future, we could store the filter in snapshot operations like `RowDeltaAction` and pass the filter to `SnapshotValidator::validate_no_new_deletes` easily.
>
> The current implementation won't block that change; we will only need to change the API in `SnapshotValidator` after adding conflict_detecting_filter support to `DeleteFileIndex`.

No, I mean `ManifestMergeManager` and `ManifestFilterManager`, which are the critical data structures that enable efficient concurrency.

Also, I don't understand what a crate-private `SnapshotValidator` is for. If you look at the Java API, each snapshot transaction action interface includes methods that allow standalone checks. For example, [RowDelta](https://github.com/apache/iceberg/blob/2e94af5d1f7d71848518348c81d858b2209c3751/api/src/main/java/org/apache/iceberg/RowDelta.java#L32) has methods like `validateDeletedFiles`, `validateNoConflictingDataFiles`, and `validateNoConflictingDeleteFiles`, but they only mark what to check at commit time. This design is quite flexible, since it allows different checks at different isolation levels.
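To make the "marks" idea concrete, here is a minimal Rust sketch. It is not the iceberg-rust API or the Java implementation: `RowDeltaSketch`, `ConcurrentChanges`, and every method name below are hypothetical, and the conflict check is reduced to "did a concurrent commit add any file of this kind". The point is only that the `validate_*` calls record what to check, and the checks themselves run at commit.

```rust
/// Hypothetical summary of what concurrent commits added since the base snapshot.
#[derive(Debug, Default)]
struct ConcurrentChanges {
    added_data_files: Vec<String>,
    added_delete_files: Vec<String>,
}

/// Hypothetical row-delta action. The validate_* methods do no work themselves;
/// they only mark which checks should run when the action is committed.
#[derive(Debug, Default)]
struct RowDeltaSketch {
    check_data_files: bool,
    check_delete_files: bool,
}

impl RowDeltaSketch {
    fn new() -> Self {
        Self::default()
    }

    /// Mark: fail the commit if concurrent commits added data files.
    fn validate_no_conflicting_data_files(mut self) -> Self {
        self.check_data_files = true;
        self
    }

    /// Mark: fail the commit if concurrent commits added delete files.
    fn validate_no_conflicting_delete_files(mut self) -> Self {
        self.check_delete_files = true;
        self
    }

    /// Run only the checks that were marked (simplified: any added file is a conflict).
    fn commit(&self, changes: &ConcurrentChanges) -> Result<(), String> {
        if self.check_data_files && !changes.added_data_files.is_empty() {
            return Err(format!(
                "conflicting data files: {:?}",
                changes.added_data_files
            ));
        }
        if self.check_delete_files && !changes.added_delete_files.is_empty() {
            return Err(format!(
                "conflicting delete files: {:?}",
                changes.added_delete_files
            ));
        }
        Ok(())
    }
}

fn main() {
    // A concurrent commit added one data file and no delete files.
    let changes = ConcurrentChanges {
        added_data_files: vec!["data-1.parquet".to_string()],
        added_delete_files: vec![],
    };

    // A weaker level marks only the delete-file check, so the commit passes.
    let weaker = RowDeltaSketch::new().validate_no_conflicting_delete_files();
    assert!(weaker.commit(&changes).is_ok());

    // A stricter level marks both checks, so the same changes are rejected.
    let stricter = RowDeltaSketch::new()
        .validate_no_conflicting_data_files()
        .validate_no_conflicting_delete_files();
    assert!(stricter.commit(&changes).is_err());
}
```

In this sketch the caller, not the action, decides which checks apply, which is what lets one action type serve several isolation levels.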
