ForeverAngry commented on issue #2130: URL: https://github.com/apache/iceberg-python/issues/2130#issuecomment-3006410326
So @MrDerecho, are you saying that you are looking for a function that can remove a `DataFile` entry and then create a new snapshot with an updated `ManifestFile`? Does this illustration capture the problem?

```mermaid
graph TD
    subgraph Iceberg Table Metadata
        manifest1["ManifestFile"]
        snapshot1["Snapshot"]
        dataFile1["DataFile A"]
        dataFile2["DataFile B"]
        parquetFile["Parquet File (s3://bucket/path/to/data.parquet)"]
    end
    snapshot1 --> manifest1
    manifest1 --> dataFile1
    manifest1 --> dataFile2
    dataFile1 --> parquetFile
    dataFile2 --> parquetFile
    note1["Note: Both DataFile A and B point to the same Parquet file"]
    note1 --- parquetFile
```

I think this could include situations like the following as well:

```mermaid
graph TD
    subgraph Iceberg Table Metadata
        snapshot1["Snapshot"]
        manifest1["ManifestFile A"]
        manifest2["ManifestFile B"]
        dataFile1["DataFile A (in Manifest A)"]
        dataFile2["DataFile B (in Manifest B)"]
        parquetFile["Parquet File (s3://bucket/path/to/data.parquet)"]
    end
    snapshot1 --> manifest1
    snapshot1 --> manifest2
    manifest1 --> dataFile1
    manifest2 --> dataFile2
    dataFile1 --> parquetFile
    dataFile2 --> parquetFile
    note1["Note: Both Manifest Files refer to DataFiles that share the same physical Parquet file"]
    note1 --- parquetFile
```
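To make the two cases concrete, here is a minimal sketch of how such duplicates might be detected. It works on plain `(manifest_path, data_file_path)` pairs rather than on real PyIceberg objects, so the function name `find_duplicate_data_files` and the sample paths `manifest-a.avro`, `manifest-b.avro`, and `other.parquet` are all hypothetical, and nothing here is an existing PyIceberg API.

```python
from collections import defaultdict


def find_duplicate_data_files(entries):
    """Return data file paths that are referenced by more than one entry.

    `entries` is an iterable of (manifest_path, data_file_path) pairs.
    This is an assumed, simplified stand-in for manifest entries, not a
    PyIceberg type.
    """
    manifests_by_file = defaultdict(list)
    for manifest_path, file_path in entries:
        manifests_by_file[file_path].append(manifest_path)
    # Keep only the paths that appear in two or more entries. The entries may
    # live in the same manifest (first diagram) or in different manifests
    # (second diagram).
    return {
        path: manifests
        for path, manifests in manifests_by_file.items()
        if len(manifests) > 1
    }


# Hypothetical sample data mirroring the second diagram.
entries = [
    ("manifest-a.avro", "s3://bucket/path/to/data.parquet"),
    ("manifest-b.avro", "s3://bucket/path/to/data.parquet"),
    ("manifest-b.avro", "s3://bucket/path/to/other.parquet"),
]
duplicates = find_duplicate_data_files(entries)
```

A fix would then presumably keep one entry per duplicated path and commit a new snapshot without the others, which is the function I am asking about above.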