zeroshade commented on issue #348:
URL: https://github.com/apache/iceberg-go/issues/348#issuecomment-2738339818

   > Would you rather have a DeleteFiles method, or rather a MergeFiles method 
that does both add and delete?
   Looking at the current code, I'm under the impression that going the 
DeleteFiles way would force traversing all the manifests twice (double the 
I/O).
   
   A `MergeFiles` method would likely be better, along with direct functions to 
simply add a new equality or positional delete.
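
   To make the single-pass idea concrete, here is a minimal sketch of a combined add-and-delete step. Everything in it is hypothetical: the `manifest` type is a toy list of file paths and `mergeFiles` is an illustrative helper, not an iceberg-go API. It only shows that deletions and additions can be applied while walking the existing manifests once.

```go
package main

import "fmt"

// manifest is a simplified stand-in for an Iceberg manifest: just a list of
// data file paths. Real manifests carry far more metadata.
type manifest struct {
	files []string
}

// mergeFiles is a hypothetical helper, not part of iceberg-go. It rewrites the
// manifests in one traversal: entries listed in toDelete are dropped, and the
// new files are appended as one extra manifest at the end.
func mergeFiles(manifests []manifest, toAdd, toDelete []string) []manifest {
	drop := make(map[string]struct{}, len(toDelete))
	for _, f := range toDelete {
		drop[f] = struct{}{}
	}

	out := make([]manifest, 0, len(manifests)+1)
	for _, m := range manifests { // the only pass over the existing manifests
		kept := make([]string, 0, len(m.files))
		for _, f := range m.files {
			if _, gone := drop[f]; !gone {
				kept = append(kept, f)
			}
		}
		if len(kept) > 0 { // manifests left empty are omitted entirely
			out = append(out, manifest{files: kept})
		}
	}
	if len(toAdd) > 0 {
		out = append(out, manifest{files: append([]string(nil), toAdd...)})
	}
	return out
}

func main() {
	current := []manifest{
		{files: []string{"a.parquet", "b.parquet"}},
		{files: []string{"c.parquet"}},
	}
	// Compaction-style call: replace three small files with one larger file.
	merged := mergeFiles(current,
		[]string{"abc.parquet"},
		[]string{"a.parquet", "b.parquet", "c.parquet"})
	fmt.Println(len(merged), merged[0].files)
}
```

   A separate delete call followed by a separate add call would each need its own look at the table state, which is the extra I/O the question above is concerned about.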
   
   > One thing that would be useful, at least to me, would be to make the 
function that gets the Iceberg schema from a Parquet file public.
   That would allow me to get rid of this code: 
https://github.com/agnosticeng/icepq/blob/main/internal/parquet/schema.go.
   
   My suggestion there would be to simply use the `pqarrow` package after you 
read the Parquet file to get the Arrow schema representation (this is the way I 
currently do it inside the package). From there you can generate the Iceberg 
schema. In general, I've been trying to isolate the file format specifics from 
the rest of the logic, both to make it easier to add new file types in the 
future and to potentially allow users to inject their own file handling as long 
as it meets an interface.
   
   > In my current use case, I create the table (and its schema) by 
introspecting a bunch of Parquet files.
   
   That said, when I add the ability to update the schema, it could make sense 
to allow `AddFiles` to introspect the files and determine the schema itself. 
I'll keep this in mind.
   
   > I see a deletedFiles field in the snapshot producer, but it's not being 
filled anywhere.
   What's the idea behind it?
   
   Right now it's just a placeholder. Nothing is populating it yet, but the 
plumbing is hooked up. So once deleting files is implemented (for compaction, 
deletions, etc.), it should be easy to wire it into the snapshot producers 
and the generation of the manifests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

