zeroshade commented on PR #330: URL: https://github.com/apache/iceberg-go/pull/330#issuecomment-2718373140
> There is quite a bit code here, but if I understand it correctly. We first write Parquet files, and then add them to the table using `AddFiles`. I think this is wrong, since `AddFiles` is meant as a migration feature. This method will fetch the footer, check the schema etc. Also, the writers that come with Iceberg-go, should always produce the Field-IDs. This *only* implements the migration feature of `AddFiles`. This does not yet implement any writing of parquet files by `iceberg-go` outside of the unit tests writing parquet files without field ids for testing the `AddFiles` function. > Instead, when we write Parquet, we directly want to convert that into a DataFile, and then we can add them to the table using add_file Yup, i plan on doing that as future work. It was an easier code path to put together the `AddFiles` migration functions than to implement all of the work for splitting up records and tables for determining when and how to write out the parquet files. When I changed the `Append` to just return a NotImplemented error a whole bunch of code for the manifest merging was now unused, so I've removed that from this PR to simplify it more. Is that sufficient to reduce this to make it easier to review? Otherwise I can look into splitting it more along what you've suggested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org