ZENOTME commented on PR #960: URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2646905246
> > @liurenjie1024 I believe FastAppend is for directly appending `DataFiles` to the snapshot, whereas this PR takes existing Parquet file paths, parses the Parquet metadata, and converts it into `DataFiles`, which are then fast appended. @ZENOTME Would you like to verify this?
>
> I have concerns about putting this into the transaction API. It seems that what's necessary is to build a `DataFile` from an existing Parquet file; the user could then call fast append to do it. But this is typically a dangerous operation because the schema in the Parquet file is not verified. Is it possible to use `FileWriter` to write data to Parquet in your case?

This API is different from FastAppend:

- FastAppend is used to append a `DataFile`.
- This API is used to extract a `DataFile` from an existing file and then append that `DataFile`.

In iceberg-python, it's an API in the transaction: https://github.com/apache/iceberg-python/blob/dd175aadfdf03df707bed37008f217258a916369/pyiceberg/table/__init__.py#L671. Interestingly, though, I can't find it in iceberg-java. cc @Fokko

> But this is typically a dangerous operation because the schema in the Parquet file is not verified. Is it possible to use `FileWriter` to write data to Parquet in your case?

Yes, we should verify the schema. That is also done in iceberg-python: https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L2431.
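
As a rough sketch of the flow discussed above (verify the file's schema first, then build the `DataFile` that would be fast appended), something like the following could work. All types and function names here (`Field`, `ParquetFileInfo`, `verify_schema`, `data_file_from_parquet`) are simplified, hypothetical stand-ins and not the iceberg-rust or iceberg-python API. Matching columns by name is also an assumption made only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; NOT the real iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveType {
    Int,
    Long,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    ty: PrimitiveType,
    required: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
    record_count: u64,
}

/// What we assume was already parsed from an existing Parquet file's footer.
struct ParquetFileInfo {
    path: String,
    record_count: u64,
    fields: Vec<Field>,
}

/// Reject the file if a table field is present with a different type,
/// or if a required table field is missing from the file.
fn verify_schema(table: &[Field], file: &[Field]) -> Result<(), String> {
    let by_name: HashMap<&str, &Field> =
        file.iter().map(|f| (f.name.as_str(), f)).collect();
    for t in table {
        match by_name.get(t.name.as_str()) {
            Some(f) if f.ty == t.ty => {}
            Some(f) => {
                return Err(format!(
                    "field `{}`: table has {:?}, file has {:?}",
                    t.name, t.ty, f.ty
                ))
            }
            None if t.required => {
                return Err(format!("required field `{}` missing from file", t.name))
            }
            None => {}
        }
    }
    Ok(())
}

/// Build a `DataFile` only after the schema check passes; the result is
/// what would then be handed to a fast append.
fn data_file_from_parquet(
    table_schema: &[Field],
    info: &ParquetFileInfo,
) -> Result<DataFile, String> {
    verify_schema(table_schema, &info.fields)?;
    Ok(DataFile {
        path: info.path.clone(),
        record_count: info.record_count,
    })
}
```

The point of the split is that the append step never sees a `DataFile` whose source file failed verification; a mismatching file surfaces as an `Err` before anything is committed.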