ZENOTME commented on PR #960:
URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2646905246

   > > @liurenjie1024 I believe FastAppend is for directly appending 
`DataFiles` to the snapshot, whereas this pr takes existing parquet file paths, 
parses the Parquet metadata and converts it into `DataFiles` which are then 
fast appended. @ZENOTME Would you like to verify this?
   > 
   > I have concerns to put this into transaction api. It seems that what's 
necessary is to build a `DataFile` from existing parquet file, then user could 
call fast append to do it. But this is typically a dangerous operation because 
the schema in parquet is not verified. Is it possible to use `FileWriter` to 
write data to parquet in your case?
   
   This API is different from FastAppend:
   - FastAppend appends `DataFile`s that the caller has already built.
   - This API builds `DataFile`s from existing Parquet files (by reading their 
metadata) and then appends them.
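
To make the distinction concrete, here is a minimal toy sketch. It is not the iceberg-rust or iceberg-python API: `DataFile`, `Snapshot`, `fast_append`, `add_files`, and the `footer_row_counts` lookup are all hypothetical names used only for illustration, and the "metadata read" is faked with a dictionary rather than a real Parquet footer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an Iceberg DataFile entry."""
    file_path: str
    record_count: int


@dataclass
class Snapshot:
    """Hypothetical snapshot that only tracks its data files."""
    data_files: List[DataFile] = field(default_factory=list)


def fast_append(snapshot: Snapshot, data_files: List[DataFile]) -> Snapshot:
    """Append DataFile entries the caller already built; no file is opened."""
    return Snapshot(snapshot.data_files + list(data_files))


def add_files(
    snapshot: Snapshot,
    paths: List[str],
    read_record_count: Callable[[str], int],
) -> Snapshot:
    """Build one DataFile per existing file from its metadata, then fast append."""
    built = [DataFile(path, read_record_count(path)) for path in paths]
    return fast_append(snapshot, built)


# Hypothetical lookup standing in for reading each Parquet footer.
footer_row_counts = {"s3://bucket/a.parquet": 10, "s3://bucket/b.parquet": 25}
snapshot = add_files(
    Snapshot(), list(footer_row_counts), footer_row_counts.__getitem__
)
```

In this sketch `add_files` is just a metadata-extraction step in front of `fast_append`, which is the relationship described in the two bullets above.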
   
   In iceberg-python, it's an API on the transaction: 
https://github.com/apache/iceberg-python/blob/dd175aadfdf03df707bed37008f217258a916369/pyiceberg/table/__init__.py#L671.
   Interestingly, I can't find an equivalent in iceberg-java. cc @Fokko 
   
   > But this is typically a dangerous operation because the schema in parquet 
is not verified. Is it possible to use `FileWriter` to write data to parquet in 
your case?
   
   Yes, we should verify the schema. iceberg-python does this as well: 
https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L2431.
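
As a rough illustration of what such a check could look like, here is a deliberately strict sketch that compares column names and types before a file is accepted. It assumes schemas are plain name-to-type dictionaries; `schema_mismatches` and `check_schema_compatible` are hypothetical helpers, not the iceberg-python function linked above, and a real check may well be more lenient (for example around optional columns).

```python
from typing import Dict, List


def schema_mismatches(
    table_schema: Dict[str, str], file_schema: Dict[str, str]
) -> List[str]:
    """Return one readable problem per column that does not line up."""
    problems = []
    for name, expected in table_schema.items():
        actual = file_schema.get(name)
        if actual is None:
            problems.append(f"missing column: {name}")
        elif actual != expected:
            problems.append(
                f"type mismatch for {name}: expected {expected}, found {actual}"
            )
    for name in file_schema:
        if name not in table_schema:
            problems.append(f"unexpected column: {name}")
    return problems


def check_schema_compatible(
    table_schema: Dict[str, str], file_schema: Dict[str, str]
) -> None:
    """Raise ValueError listing every mismatch, so the file is never appended."""
    problems = schema_mismatches(table_schema, file_schema)
    if problems:
        raise ValueError("; ".join(problems))


# Hypothetical schemas: one matching file and one that should be rejected.
table_schema = {"id": "long", "name": "string"}
good_file = {"id": "long", "name": "string"}
bad_file = {"id": "int", "extra": "double"}
```

Running the check before the append step means a mismatched file fails loudly instead of ending up in a snapshot.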
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
