RussellSpitzer opened a new pull request, #6680: URL: https://github.com/apache/iceberg/pull/6680
Fixes #6670

When we determine the ResidualEvaluators for the manifest entries in a file, we call computeIfAbsent with the file object's PartitionData as the key. The underlying issue is that ManifestReader reuses the PartitionData object while reading. Once that object is placed in the Map as a key, its value changes every time a new entry is read.

Because the hashcode recorded at insertion time is correct, this isn't a problem until two values collide. Once they do, the lookup for the second key retrieves the value stored for the first key: the hashcodes match, and the stored key also compares equal because it is the same reused ManifestReader container. If the first key evaluated to "always true" but the second key should be "false", the second key will return true and delete a file it should not.

To fix this, we place only a copy of the PartitionData inside the map instead of the PartitionData object itself. We can't use computeIfAbsent unless we want to make a brand new PartitionData object for every entry.

Note that this has not been hit frequently before because all of the following conditions must hold:

* There needs to be a conflict of structlikewrapper.wrap(PartitionData) hashcodes
* The two files with the conflicting partition hashcodes MUST be in the same manifest
* They must have different behaviors for the delete condition (e.g. if both colliding values should be deleted, or both should not be deleted, then it isn't a problem)
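The failure mode described above can be reproduced outside Iceberg with a small, self-contained sketch. This is not the code from the PR: `ReusedKeyDemo`, `MutableKey`, and `evaluate` are hypothetical stand-ins for the manifest-reading loop, the reused PartitionData container, and the residual evaluation. The strings "Aa" and "BB" are used as partition values only because they share the same Java `hashCode` (2112), which forces the collision.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReusedKeyDemo {

  // Hypothetical stand-in for a reused, mutable partition container.
  static final class MutableKey {
    private String partition;

    MutableKey set(String newPartition) {
      this.partition = newPartition;
      return this;
    }

    String get() {
      return partition;
    }

    MutableKey copy() {
      return new MutableKey().set(partition);
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof MutableKey && partition.equals(((MutableKey) other).partition);
    }

    @Override
    public int hashCode() {
      return partition.hashCode();
    }
  }

  // Hypothetical delete condition: only partition "Aa" should be deleted.
  static boolean evaluate(String partition) {
    return "Aa".equals(partition);
  }

  // Buggy variant: the reused container itself becomes the map key.
  static List<Boolean> buggy(List<String> partitions) {
    Map<MutableKey, Boolean> cache = new HashMap<>();
    MutableKey reused = new MutableKey();
    List<Boolean> results = new ArrayList<>();
    for (String partition : partitions) {
      reused.set(partition); // mutates any key already stored in the map
      results.add(cache.computeIfAbsent(reused, key -> evaluate(key.get())));
    }
    return results;
  }

  // Fixed variant: look up with the reused container, but store a copy.
  static List<Boolean> fixed(List<String> partitions) {
    Map<MutableKey, Boolean> cache = new HashMap<>();
    MutableKey reused = new MutableKey();
    List<Boolean> results = new ArrayList<>();
    for (String partition : partitions) {
      reused.set(partition);
      Boolean cached = cache.get(reused);
      if (cached == null) {
        cached = evaluate(reused.get());
        cache.put(reused.copy(), cached); // the stored key can no longer change
      }
      results.add(cached);
    }
    return results;
  }

  public static void main(String[] args) {
    List<String> partitions = Arrays.asList("Aa", "BB"); // equal hashCodes
    System.out.println("buggy: " + buggy(partitions)); // prints "buggy: [true, true]"
    System.out.println("fixed: " + fixed(partitions)); // prints "fixed: [true, false]"
  }
}
```

In the buggy variant the second lookup matches the stored entry because the hashes are equal and the stored key is the very object being looked up, so "BB" inherits the result cached for "Aa". Storing a copy means a new key object is allocated only on a cache miss, not for every entry.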