gabeiglio opened a new pull request, #3011:
URL: https://github.com/apache/iceberg-python/pull/3011

   # Rationale for this change
   
   While running performance tests for partition overwrites, we noticed that 
PyIceberg took twice as long as the Java-based implementation. The cause is 
that `_existing_manifests` does not take advantage of manifest pruning before 
reading all manifest entries.
   
   In this PR I:
   - Moved methods from `_DeleteFiles` to the `_SnapshotProducer` parent class 
so they can be shared with other classes (`_OverwriteFiles`)
   - Implemented manifest pruning over the partitions of all deleted files, so 
manifests that do not match those partitions are not read
   - Refactored the method to iterate over all files only once (instead of 
multiple times)
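
The pruning idea can be sketched roughly as follows. This is not the code in this PR and does not use PyIceberg's classes: `FieldSummary`, `Manifest`, `may_contain`, and `prune_manifests` are hypothetical names, and the sketch assumes each manifest carries a lower and upper bound per partition field and that partition values are non-null and comparable.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


# Hypothetical stand-ins for illustration only; these are not PyIceberg classes.
@dataclass(frozen=True)
class FieldSummary:
    lower: Any  # smallest value of one partition field within the manifest
    upper: Any  # largest value of that partition field within the manifest


@dataclass(frozen=True)
class Manifest:
    path: str
    partitions: Tuple[FieldSummary, ...]  # one summary per partition field


def may_contain(manifest: Manifest, partition: Tuple[Any, ...]) -> bool:
    """Return True if every partition value lies within the manifest's bounds."""
    return all(
        summary.lower <= value <= summary.upper
        for summary, value in zip(manifest.partitions, partition)
    )


def prune_manifests(
    manifests: Iterable[Manifest],
    deleted_partitions: Iterable[Tuple[Any, ...]],
) -> List[Manifest]:
    """Keep only manifests that could hold a file from a deleted partition."""
    wanted = set(deleted_partitions)  # deduplicate partitions of deleted files
    return [m for m in manifests if any(may_contain(m, p) for p in wanted)]


manifests = [
    Manifest("m1.avro", (FieldSummary(1, 10),)),
    Manifest("m2.avro", (FieldSummary(11, 20),)),
    Manifest("m3.avro", (FieldSummary(21, 30),)),
]
kept = prune_manifests(manifests, [(5,), (25,)])
print([m.path for m in kept])  # ['m1.avro', 'm3.avro']
```

Only the manifests that survive this check would need their entries read. The bounds check is conservative: a kept manifest may still contain no matching file, but a skipped manifest cannot contain one.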
   
   ## Are these changes tested?
   
   I believe the current tests in `tests/integration/test_writes.py` cover all cases.
   
   ## Are there any user-facing changes?
   
   Nope
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

