viirya commented on issue #329: URL: https://github.com/apache/iceberg-rust/issues/329#issuecomment-2041597213
> - calling the writer to write the DataFile
> - create an instance of MergingSnapshotProducer -> responsible for writing the manifest, manifest_list, snapshot_update
> - commit -> update_table() on the Catalog with TableUpdate & TableRequirements

If an error occurs while generating the related metadata, such as the manifests, the writer has already written the DataFiles. Should we delete those written DataFiles in that case?

> I think your understanding is correct, and I agree that if the writer API already does the conversion from RecordBatch to DataFile, the Transaction shouldn't be concerned with this issue, since it is a higher-level API. However, the Transaction calls the writer that writes the actual DataFile, which seems reasonable.

I think this is also what the Python implementation does: `Transaction.append` calls `_dataframe_to_data_files` to generate DataFiles from the `pa.Table`.

> we create a Transaction that basically does two things:
> 2.1. It creates a `_MergingSnapshotProducer`, which is (at a high level) responsible for writing a new ManifestList and creating a new Snapshot (returned as `AddSnapshotUpdate`)

Yeah, specifically, it is a `FastAppendFiles` for appending files, although the manifest commit logic is actually implemented in `_MergingSnapshotProducer`.
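To make the cleanup question concrete, here is a minimal sketch of the append flow under stated assumptions: data files are written first, then the snapshot-producer step writes the metadata, and if that metadata step fails, the data files written by the same call are deleted. `DataFile`, `FakeFileIO`, `MetadataError`, `write_metadata` and `append` are all hypothetical stand-ins, not iceberg-rust or pyiceberg APIs; the "file store" is just an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for a written data file; only its path matters here."""
    path: str


@dataclass
class FakeFileIO:
    """Hypothetical in-memory object store standing in for the table's FileIO."""
    files: set = field(default_factory=set)

    def write(self, path: str) -> DataFile:
        self.files.add(path)
        return DataFile(path)

    def delete(self, path: str) -> None:
        self.files.discard(path)


class MetadataError(Exception):
    """Hypothetical error raised when manifest / manifest-list / snapshot generation fails."""


def write_metadata(data_files, fail=False):
    """Hypothetical snapshot-producer step; `fail` simulates a metadata write error."""
    if fail:
        raise MetadataError("failed to write manifest")
    return {"added-data-files": len(data_files)}


def append(file_io, paths, fail_metadata=False):
    """Write the data files first, then the metadata. If metadata generation
    fails, delete the data files written by this call so they are not
    left behind as orphans, then re-raise the error."""
    written = [file_io.write(p) for p in paths]
    try:
        return write_metadata(written, fail=fail_metadata)
    except MetadataError:
        for f in written:
            file_io.delete(f.path)
        raise


# Usage: one successful append, then one whose metadata step fails.
file_io = FakeFileIO()
append(file_io, ["data/a.parquet", "data/b.parquet"])
try:
    append(file_io, ["data/c.parquet"], fail_metadata=True)
except MetadataError:
    pass
print(sorted(file_io.files))  # ['data/a.parquet', 'data/b.parquet']
```

This only covers the failure asked about above (metadata generation). A failed catalog commit is a different case: if the outcome of `update_table()` is unknown, deleting the data files could break a snapshot that was in fact committed.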