munendrasn commented on PR #13975:
URL: https://github.com/apache/iceberg/pull/13975#issuecomment-3285370340

   @gaborkaszab @amogh-jahagirdar 
   Thanks for the review. 
   
   >Do you have any measurements how much extra runtime this puts to the 
metadata cleanup process?
   
   Not at the moment, but it would be a function of the number of manifest 
files plus data files.
   
   > in the proposed implementation it looks like we're always additionally 
reading all the manifests just to obtain any sort orders that are referenced
   
   The sortOrder used for a write is only available at the contentFile level, 
as per the spec. Except in the case of equality deletes, both Spark and Flink 
seem not to add the sortOrderId to the DataFile.
   Once the DataFile is written, a sortOrder that is not the latest might no 
longer be useful. Can it be expired? Please let me know if I'm missing any 
other references or usage.
   Also, any pointers on improving the current implementation are appreciated.
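   
   For reference, a rough sketch of the kind of traversal under discussion, 
not the PR's actual code. `ManifestEntry` and `ContentFileEntry` are 
hypothetical stand-ins rather than Iceberg API types, and the rule that a sort 
order is expirable only when no content file references it and it is not the 
default is an assumption for illustration. It shows why the cost scales with 
manifests plus content files.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SortOrderExpirySketch {
  // Hypothetical stand-ins for illustration; these are NOT Iceberg API types.
  record ContentFileEntry(String path, Integer sortOrderId) {}

  record ManifestEntry(String path, List<ContentFileEntry> files) {}

  // Visits every content file of every manifest, so the work grows with
  // (number of manifests + number of content files).
  static Set<Integer> referencedSortOrderIds(List<ManifestEntry> manifests) {
    Set<Integer> referenced = new HashSet<>();
    for (ManifestEntry manifest : manifests) {
      for (ContentFileEntry file : manifest.files()) {
        if (file.sortOrderId() != null) { // writers may leave the ID unset
          referenced.add(file.sortOrderId());
        }
      }
    }
    return referenced;
  }

  // Assumed rule: a sort order can be expired if no content file references it
  // and it is not the table's current default sort order.
  static Set<Integer> expirableSortOrderIds(
      Set<Integer> allSortOrderIds, int defaultSortOrderId, List<ManifestEntry> manifests) {
    Set<Integer> expirable = new TreeSet<>(allSortOrderIds);
    expirable.removeAll(referencedSortOrderIds(manifests));
    expirable.remove(defaultSortOrderId);
    return expirable;
  }

  public static void main(String[] args) {
    List<ManifestEntry> manifests = List.of(
        new ManifestEntry("m1.avro", List.of(
            new ContentFileEntry("a.parquet", 1),
            new ContentFileEntry("b.parquet", null))),
        new ManifestEntry("m2.avro", List.of(
            new ContentFileEntry("c.parquet", 2))));

    // Sort orders 0..3 exist, 3 is the default, 1 and 2 are referenced.
    System.out.println(expirableSortOrderIds(Set.of(0, 1, 2, 3), 3, manifests)); // prints [0]
  }
}
```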
   
   >at the very very least this traversal should only happen if the flag for 
cleaning up additional metadata is set
   
   Having a flag would be helpful to consumers: sort-order expiry could be made 
configurable so that it runs only when required. Should we have the same 
behavior for the other metadata too (schema, spec)?
   
   >Beyond this being a spec change
   
   I assume you are referring to the REST OpenAPI spec. Please correct me if 
that's not the case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

