zeroshade commented on PR #330:
URL: https://github.com/apache/iceberg-go/pull/330#issuecomment-2718373140

   > There is quite a bit code here, but if I understand it correctly. We first 
write Parquet files, and then add them to the table using `AddFiles`. I think 
this is wrong, since `AddFiles` is meant as a migration feature. This method 
will fetch the footer, check the schema etc. Also, the writers that come with 
Iceberg-go, should always produce the Field-IDs.
   
   This *only* implements the migration feature of `AddFiles`. This does not 
yet implement any writing of parquet files by `iceberg-go` outside of the unit 
tests writing parquet files without field ids for testing the `AddFiles` 
function.
   
   > Instead, when we write Parquet, we directly want to convert that into a 
DataFile, and then we can add them to the table using add_file
   
   Yup, i plan on doing that as future work. It was an easier code path to put 
together the `AddFiles` migration functions than to implement all of the work 
for splitting up records and tables for determining when and how to write out 
the parquet files. 
   
   When I changed the `Append` to just return a NotImplemented error a whole 
bunch of code for the manifest merging was now unused, so I've removed that 
from this PR to simplify it more. Is that sufficient to reduce this to make it 
easier to review? Otherwise I can look into splitting it more along what you've 
suggested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to