viirya commented on issue #329:
URL: https://github.com/apache/iceberg-rust/issues/329#issuecomment-2041597213

   > calling the writer to write the DataFile
   create an instance of MergingSnapshotProducer -> responsible for writing the 
manifest, manifest_list, snapshot_update
   commit -> update_table() on the Catalog with TableUpdate & TableRequirements
   
   If an error occurs while generating the related metadata (manifest, 
manifest list, etc.) after the writer has already written the DataFiles, 
should we delete those DataFiles?
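
One possible answer to the question above is to treat the already-written data files as orphans and delete them when the metadata step fails. The following is only a minimal sketch of that idea, not iceberg-rust code: `FileStore` and `append_with_cleanup` are hypothetical names, and an in-memory set stands in for object storage.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for object storage; not an iceberg-rust type.
#[derive(Default)]
struct FileStore {
    files: BTreeSet<String>,
}

impl FileStore {
    fn write(&mut self, path: &str) {
        self.files.insert(path.to_string());
    }

    fn delete(&mut self, path: &str) {
        self.files.remove(path);
    }
}

/// Writes the data files, then runs the metadata step (manifest, manifest
/// list, snapshot update). If that step fails, the data files written by this
/// call are deleted before the error is returned.
fn append_with_cleanup<F>(
    store: &mut FileStore,
    data_paths: &[&str],
    write_metadata: F,
) -> Result<(), String>
where
    F: FnOnce(&mut FileStore) -> Result<(), String>,
{
    // Step 1: the writer produces the data files.
    for path in data_paths {
        store.write(path);
    }

    // Step 2: generate metadata; on failure, remove the orphaned data files.
    if let Err(err) = write_metadata(store) {
        for path in data_paths {
            store.delete(path);
        }
        return Err(err);
    }

    Ok(())
}
```

The sketch only removes the data files. It assumes that any partially written metadata files are cleaned up separately, and that deletion itself cannot fail, which would not hold against a real object store.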
   
   > I think your understanding is correct - and I agree if the writer API 
already does the conversion from RecordBatch to DataFile, the Transaction 
shouldn't be concerned with this issue, since it is a higher-level API. 
However, the Transaction calls the writer that writes the actual DataFile, 
which seems reasonable.
   
   I think this is also what the Python implementation does. In 
`Transaction.append`, it calls `_dataframe_to_data_files` to generate DataFiles 
from the `pa.Table`.
   
   > we create a Transaction that basically does two things:
   2.1. It creates a _MergingSnapshotProducer which is (on a high-level) 
responsible for writing a new ManifestList, creating a new Snapshot (returned 
as AddSnapshotUpdate)
   
   Yes, specifically, it is a `FastAppendFiles` for appending files, although 
the manifest commit logic is actually implemented in `_MergingSnapshotProducer`.
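
As a rough illustration of that split, the shared commit logic could live in a trait's default method while each operation supplies only its own manifest selection. This is a sketch under assumptions, not the pyiceberg or iceberg-rust design: the `SnapshotProducer` trait, the `Snapshot` struct, and the `FastAppend` type are hypothetical, and manifests are represented as plain path strings.

```rust
/// Hypothetical result of a commit; a real snapshot carries far more fields.
#[derive(Debug, PartialEq)]
struct Snapshot {
    snapshot_id: u64,
    operation: &'static str,
    manifest_list: Vec<String>,
}

/// Hypothetical trait: shared commit logic is a default method, and each
/// operation decides which manifests end up in the new manifest list.
trait SnapshotProducer {
    /// Operation name recorded for the snapshot.
    fn operation(&self) -> &'static str;

    /// Manifests for the new snapshot, given those of the parent snapshot.
    fn manifests(&self, existing: &[String]) -> Vec<String>;

    /// Shared commit logic: assemble the manifest list and build the snapshot.
    fn commit(&self, snapshot_id: u64, existing: &[String]) -> Snapshot {
        Snapshot {
            snapshot_id,
            operation: self.operation(),
            manifest_list: self.manifests(existing),
        }
    }
}

/// Fast append: adds one new manifest and leaves existing manifests untouched.
struct FastAppend {
    new_manifest: String,
}

impl SnapshotProducer for FastAppend {
    fn operation(&self) -> &'static str {
        "append"
    }

    fn manifests(&self, existing: &[String]) -> Vec<String> {
        // No merging or rewriting: the new manifest is listed first, followed
        // by the parent's manifests as-is.
        let mut all = vec![self.new_manifest.clone()];
        all.extend_from_slice(existing);
        all
    }
}
```

With this shape, another operation would implement only `operation` and `manifests` and reuse `commit` unchanged.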
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

