munendrasn commented on PR #13975: URL: https://github.com/apache/iceberg/pull/13975#issuecomment-3285370340
@gaborkaszab @amogh-jahagirdar Thanks for the review.

> Do you have any measurements how much extra runtime this puts to the metadata cleanup process?

Not at the moment, but it would be a function of the number of manifest files plus the number of data files.

> in the proposed implementation it looks like we're always additionally reading all the manifests just to obtain any sort orders that are referenced

The sort order used for a write is only available at the content-file level, as per the spec. Except in the case of equality deletes, both Spark and Flink seem not to add the `sortOrderId` to the DataFile. Once the DataFile is written, a sort order that is not the latest might no longer be useful. Can it be expired? Please let me know if I'm missing any other references or usages. Also, any pointers on improving the current implementation are appreciated.

> at the very very least this traversal should only happen if the flag for cleaning up additional metadata is set

Having a flag would be helpful to consumers: sort-order expiry could be made configurable so that it runs only when required. Should we have this behavior for all other metadata (schemas, specs) too?

> Beyond this being a spec change

I assume you are referring to the REST OpenAPI spec. Please correct me if that's not the case.
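To make the cost argument concrete, here is a minimal, self-contained sketch of the traversal being discussed. It is not the PR's code and does not use the Iceberg API: `ContentFile`, `referenced_sort_order_ids`, `expirable_sort_order_ids`, and the `cleanup_enabled` flag are hypothetical names chosen for illustration. It assumes a sort order can be expired only when no content file references it and it is not the table's current default.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ContentFile:
    """Hypothetical stand-in for a data or delete file entry in a manifest."""
    path: str
    sort_order_id: Optional[int] = None  # None when the writer recorded no sort order


def referenced_sort_order_ids(manifests: Iterable[Iterable[ContentFile]]) -> Set[int]:
    """Collect every sort order ID referenced by any content file.

    Visits each manifest and each content file once, so the work grows with
    (number of manifests + number of content files).
    """
    return {
        f.sort_order_id
        for manifest in manifests
        for f in manifest
        if f.sort_order_id is not None
    }


def expirable_sort_order_ids(
    all_ids: Iterable[int],
    default_id: int,
    manifests: Iterable[Iterable[ContentFile]],
    cleanup_enabled: bool,
) -> Set[int]:
    """Return sort order IDs that are neither referenced nor the current default.

    The manifest traversal is skipped entirely unless the (hypothetical)
    cleanup flag is set.
    """
    if not cleanup_enabled:
        return set()
    keep = referenced_sort_order_ids(manifests) | {default_id}
    return set(all_ids) - keep


# Usage: two manifests, only sort order 1 is referenced, 3 is the current default.
manifests = [
    [ContentFile("a.parquet", 1), ContentFile("b.parquet")],
    [ContentFile("c.parquet", 1)],
]
print(expirable_sort_order_ids([0, 1, 2, 3], 3, manifests, cleanup_enabled=True))
```

With the flag off, the function returns an empty set without reading any manifest, which is the behavior suggested in the review.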
