ForeverAngry commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-3006410326
So @MrDerecho, are you saying that you are looking for a function that can
remove a `DataFile` entry, and then create a new snapshot with an updated
`ManifestFile`?
MrDerecho commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2997900165
@kevinjqliu, for context, I am referring to the fact that Trino (Athena) tables
can deal with duplicate files referenced in the metadata; other upstream
consumers, i.e. Snowflake external
jayceslesar commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2994344644
I think also the more files you can add in a single call for the
`file_paths` argument, the more performant it will be as we have to re-compute
the known data files for t
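Editor's sketch of the batching point above, not code from the thread: it de-duplicates the input paths and then hands them to an `add_files`-style callable in as few calls as possible. The helper names `unique_paths` and `add_files_in_batches` and the `batch_size` parameter are hypothetical. The sketch only removes repeats within the input list; it does not compare against files already referenced by the table.

```python
from typing import Callable, Iterable, List


def unique_paths(paths: Iterable[str]) -> List[str]:
    """Drop repeated paths while preserving first-seen order."""
    return list(dict.fromkeys(paths))


def add_files_in_batches(
    add_files: Callable[[List[str]], None],  # hypothetical hook, e.g. a wrapper around table.add_files
    paths: Iterable[str],
    batch_size: int = 1000,  # assumed default; tune for your workload
) -> int:
    """Call `add_files` once per batch of unique paths and return the call count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    deduped = unique_paths(paths)
    calls = 0
    for start in range(0, len(deduped), batch_size):
        # Larger batches mean fewer calls, so the known data files
        # are re-computed fewer times overall.
        add_files(deduped[start:start + batch_size])
        calls += 1
    return calls
```

With PyIceberg, the callable could be something like `lambda batch: table.add_files(file_paths=batch)`, assuming `table` is an already loaded table object.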
kevinjqliu commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2994346349
@MrDerecho @ForeverAngry can you help me understand the use case and
expected behavior?
> on occasion, there will be a duplicate file, I load so many files that I
kevinjqliu commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2994346640
I think a pseudocode snippet of how you're using `add_files` would be really
helpful here!
--
This is an automated message from the Apache Git Service.
To respond to the me
jayceslesar commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2993695822
Looks like the performance hit comes from
https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L850
ForeverAngry commented on issue #2130:
URL:
https://github.com/apache/iceberg-python/issues/2130#issuecomment-2993011282
Thanks for raising this @MrDerecho, this is something my team members have
to deal with frequently, due to how we approach the use of `add_files`. Nice
to know that it
MrDerecho opened a new issue, #2130:
URL: https://github.com/apache/iceberg-python/issues/2130
### Feature Request / Improvement
I use PyIceberg `add_files` to perform enterprise-grade ETL loading and
backfilling of Iceberg tables. On occasion, there will be a duplicate file; I
load so