zeroshade commented on issue #348: URL: https://github.com/apache/iceberg-go/issues/348#issuecomment-2738339818
> Would you rather have a DeleteFiles method, or rather a MergeFiles method that does both add and delete. Looking at the current code, I'm under the impression that going the DeleteFiles way would force to traverse the whole manifests twice (double the io).

A `MergeFiles` method would likely be better, along with direct functions to simply add a new equality or positional delete.

> One thing that would be useful, at least to me, would be to expose the function to get Iceberg schema from a Parquet file public. That would allow to get rid of this code: https://github.com/agnosticeng/icepq/blob/main/internal/parquet/schema.go.

My suggestion there would be to simply use the `pqarrow` package after you read the Parquet file to get the Arrow schema representation (this is the way I currently do it inside the package). From there you can generate the Iceberg schema. In general I've been trying to isolate the file format specifics from the rest of the logic, to make it easier to add new file types in the future and potentially to allow users to inject their own file handling as long as it meets an interface.

> In my current use case, I create the table (and its schema) by introspecting a bunch of Parquet files.

That said, when I add the ability to update the schema, it could make sense to allow `AddFiles` to introspect and determine the schema or otherwise. I'll keep this in mind.

> But I see a deletedFiles field in snapshot producer, but it's not filled anywhere. What's the idea behind it?

Right now it's just a placeholder. Nothing is populating it yet, but the plumbing is hooked up. So once deleting files is implemented (for compaction, deletions, etc.) it should be easy to hook it up into the snapshot producers and the generation of the manifests.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
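To illustrate the single-pass argument for `MergeFiles` made in the comment above, here is a minimal sketch in Go. It is not the iceberg-go API: the `DataFile` type and the `mergeFiles` function are hypothetical stand-ins, and real manifests carry far more than a path. The point is only that additions and deletions can be applied in one traversal of the existing entries rather than two.

```go
package main

import "fmt"

// DataFile is a hypothetical stand-in for a manifest entry.
// It is NOT the iceberg-go type; only the path matters for this sketch.
type DataFile struct {
	Path string
}

// mergeFiles (hypothetical) walks the existing entries exactly once.
// Entries whose path is listed in toDelete are collected as deleted,
// all others are kept, and the new files are appended at the end.
func mergeFiles(existing, toAdd []DataFile, toDelete []string) (kept, deleted []DataFile) {
	// Build a set so each membership check during the pass is O(1).
	del := make(map[string]struct{}, len(toDelete))
	for _, p := range toDelete {
		del[p] = struct{}{}
	}

	kept = make([]DataFile, 0, len(existing)+len(toAdd))
	for _, f := range existing {
		if _, ok := del[f.Path]; ok {
			deleted = append(deleted, f)
			continue
		}
		kept = append(kept, f)
	}

	// Additions need no second traversal of the existing entries.
	kept = append(kept, toAdd...)
	return kept, deleted
}

func main() {
	existing := []DataFile{{Path: "a.parquet"}, {Path: "b.parquet"}}
	toAdd := []DataFile{{Path: "c.parquet"}}

	kept, deleted := mergeFiles(existing, toAdd, []string{"a.parquet"})
	fmt.Println("kept:", kept)
	fmt.Println("deleted:", deleted)
}
```

A separate `DeleteFiles` followed by `AddFiles` would repeat the loop over `existing`, which is the doubled I/O the quoted question is concerned about.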
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org