RussellSpitzer opened a new pull request, #6680:
URL: https://github.com/apache/iceberg/pull/6680

   Fixes #6670 
   
   When we determine the ResidualEvaluators for the entries in a manifest file, 
we use a computeIfAbsent call keyed by the file's PartitionData. The underlying 
issue is that ManifestReader reuses the same PartitionData object for every 
entry it reads. Once that object is placed in the Map as a key, its contents 
change every time a new entry is read. The map still holds the hash code that 
was computed when the key was inserted, so nothing goes wrong until two 
partitions' hash codes collide. When they do, the lookup for the second 
partition lands on the stored entry for the first partition, and the stored key 
compares as equal because, thanks to ManifestReader's container reuse, it is the 
very same object and now holds the second partition's values. The second 
partition therefore gets the first partition's evaluator: if the first 
partition's residual was "always true" but the second's should be "false", the 
second lookup returns true and we delete a file we should not.
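   The failure mode can be reproduced outside Iceberg with any mutable HashMap 
key. The sketch below is a hypothetical illustration, not Iceberg code: 
`MutablePartition` stands in for the reused PartitionData, the strings "Aa" and 
"BB" are used only because their `String.hashCode()` values collide (both are 
2112), and the "evaluator" is a placeholder string.

```java
import java.util.HashMap;
import java.util.Map;

final class ReusedKeyBug {
  // Hypothetical stand-in for PartitionData: a mutable container whose
  // hashCode/equals depend on its current contents.
  static final class MutablePartition {
    String value;

    MutablePartition(String value) {
      this.value = value;
    }

    @Override
    public int hashCode() {
      return value.hashCode();
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof MutablePartition
          && ((MutablePartition) other).value.equals(value);
    }
  }

  // Mirrors the buggy pattern: one container is overwritten for each entry
  // (as a reader with container reuse would do) and passed straight to
  // computeIfAbsent, so the map stores the reused object itself as a key.
  static String[] lookupWithReusedKey(String[] partitions) {
    Map<MutablePartition, String> evaluators = new HashMap<>();
    MutablePartition reused = new MutablePartition(partitions[0]);
    String[] results = new String[partitions.length];
    for (int i = 0; i < partitions.length; i++) {
      reused.value = partitions[i];
      String current = partitions[i];
      results[i] = evaluators.computeIfAbsent(reused, k -> "evaluator-for-" + current);
    }
    return results;
  }

  public static void main(String[] args) {
    // "Aa" and "BB" share String.hashCode() == 2112, so they collide.
    String[] results = lookupWithReusedKey(new String[] {"Aa", "BB"});
    System.out.println(results[1]); // prints "evaluator-for-Aa": the wrong evaluator
  }
}
```

   On the second lookup HashMap finds a node with the same hash whose key is the 
identical object, so it returns the cached value without ever calling the 
mapping function for "BB".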
   
   To fix this, we place a copy of the PartitionData in the map instead of the 
reused PartitionData object itself. We can't keep using computeIfAbsent for 
this: the key passed to computeIfAbsent is the key that gets stored, so we would 
have to copy the PartitionData for every entry rather than only on a cache miss.
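   A sketch of the fix in the same hypothetical setup: look up with `get()` using 
the reused container, and only on a miss insert a copy as the key. This costs one 
copy per distinct partition instead of one per entry. The `copy()` method here 
belongs to the hypothetical `MutablePartition` class and is not a statement 
about PartitionData's actual API.

```java
import java.util.HashMap;
import java.util.Map;

final class CopiedKeyFix {
  // Same hypothetical mutable container as in the bug sketch, plus copy().
  static final class MutablePartition {
    String value;

    MutablePartition(String value) {
      this.value = value;
    }

    MutablePartition copy() {
      return new MutablePartition(value);
    }

    @Override
    public int hashCode() {
      return value.hashCode();
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof MutablePartition
          && ((MutablePartition) other).value.equals(value);
    }
  }

  static String[] lookupWithCopiedKey(String[] partitions) {
    Map<MutablePartition, String> evaluators = new HashMap<>();
    MutablePartition reused = new MutablePartition(partitions[0]);
    String[] results = new String[partitions.length];
    for (int i = 0; i < partitions.length; i++) {
      reused.value = partitions[i];
      // Looking up with the reused container is safe: get() never stores it.
      String evaluator = evaluators.get(reused);
      if (evaluator == null) {
        evaluator = "evaluator-for-" + partitions[i];
        // Only on a miss: store an independent snapshot as the key, so later
        // reuse of the container cannot change what the map holds.
        evaluators.put(reused.copy(), evaluator);
      }
      results[i] = evaluator;
    }
    return results;
  }
}
```

   With copied keys, a hash collision between "Aa" and "BB" falls through to 
`equals()`, which now compares two different snapshots and correctly reports a 
miss.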
   
   
   Note that this has rarely been hit before because all of the following 
conditions must hold:
   
   * The StructLikeWrapper.wrap(PartitionData) hash codes of two partitions 
must collide
   * The two files with the colliding partition hash codes MUST be in the same 
manifest
   * The two partitions must behave differently under the delete condition (e.g. 
if both colliding values should be deleted, or both should be kept, the wrong 
evaluator still produces the right answer)
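   The three conditions can be expressed as a small check. This is a 
hypothetical sketch: partitions are modeled as strings with `String.hashCode()` 
standing in for the StructLikeWrapper hash, the list represents the partitions 
of a single manifest, and `shouldDelete` is a placeholder for the delete 
condition. It compares each later value against the first value seen for its 
hash, because with the reused key that first value's evaluator is the one every 
colliding lookup returns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class CollisionConditions {
  // True when, within one manifest (condition 2), two distinct partition values
  // share a hash code (condition 1) and the delete condition disagrees on them
  // (condition 3).
  static boolean canMisclassify(List<String> manifestPartitions, Predicate<String> shouldDelete) {
    Map<Integer, String> firstByHash = new HashMap<>();
    for (String partition : manifestPartitions) {
      String first = firstByHash.putIfAbsent(partition.hashCode(), partition);
      if (first != null
          && !first.equals(partition)
          && shouldDelete.test(first) != shouldDelete.test(partition)) {
        return true;
      }
    }
    return false;
  }
}
```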


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
