RussellSpitzer commented on issue #6670: URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1406510853
OK, so TL;DR from my point of view:

- Reused containers cause the `file.partition()` object to be the same instance on every invocation of `computeIfAbsent`.
- When the object is put into the map, its hash code is used internally to determine which bucket the struct goes in.
- The partition object is placed in that bucket.
- The partition object is then mutated by the Avro reader for the next manifest entry.
- The previously placed object is still in the bucket chosen for its original value, but it now holds a different value than it had when it was inserted.
- When we look up a value whose hash lands in a bucket that has already been filled, we pull out the very same object we are checking for, so the comparison always matches.

```
1. Put PartitionData@1(X, Y) into Map
     BucketForXY[PartitionData@1(X, Y)]
2. Read new value (A, B) and set it in PartitionData@1
3. Check for PartitionData@1(A, B) in Map
     if HashCode(A, B) == HashCode(X, Y)
       return BucketForXY[PartitionData@1(A, B)]  // Same in-memory PartitionData object, so it already holds the newly read value
     else
       return not found
```
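To make the steps above concrete, here is a minimal, self-contained sketch using a plain `java.util.HashMap`. `MutableKey` is a hypothetical stand-in for the reused partition container, not Iceberg's actual `PartitionData`, and the values (1, 32) and (2, 1) are chosen only because `Objects.hash` gives both the hash code 1024. The last method shows one possible way to avoid the problem (inserting a copy of the key); it is an illustration, not necessarily the fix adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ReusedKeyDemo {

    /** Hypothetical stand-in for a reused, mutable partition container. */
    static final class MutableKey {
        private int a;
        private int b;

        MutableKey(int a, int b) {
            set(a, b);
        }

        /** Overwrites the contents in place, like a reader reusing its container. */
        void set(int a, int b) {
            this.a = a;
            this.b = b;
        }

        MutableKey copy() {
            return new MutableKey(a, b);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof MutableKey)) {
                return false;
            }
            MutableKey other = (MutableKey) o;
            return a == other.a && b == other.b;
        }

        @Override
        public int hashCode() {
            return Objects.hash(a, b);
        }
    }

    /** Steps 1-3 with colliding hashes: the map claims (2, 1) is present. */
    static boolean reusedKeyWithSameHashIsFound() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey reused = new MutableKey(1, 32);
        map.computeIfAbsent(reused, k -> "entry for (1, 32)");
        reused.set(2, 1); // same hash code as (1, 32); also mutates the stored key
        return map.containsKey(reused);
    }

    /** Steps 1-3 with different hashes: the lookup goes elsewhere and misses. */
    static boolean reusedKeyWithDifferentHashIsFound() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey reused = new MutableKey(1, 32);
        map.computeIfAbsent(reused, k -> "entry for (1, 32)");
        reused.set(5, 5); // different hash code than (1, 32)
        return map.containsKey(reused);
    }

    /** Inserting a copy keeps the stored key stable, so (2, 1) is not found. */
    static boolean copiedKeyWithSameHashIsFound() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey reused = new MutableKey(1, 32);
        map.computeIfAbsent(reused.copy(), k -> "entry for (1, 32)");
        reused.set(2, 1); // stored copy still holds (1, 32)
        return map.containsKey(reused);
    }

    public static void main(String[] args) {
        System.out.println("reused key, same hash:      " + reusedKeyWithSameHashIsFound());
        System.out.println("reused key, different hash: " + reusedKeyWithDifferentHashIsFound());
        System.out.println("copied key, same hash:      " + copiedKeyWithSameHashIsFound());
    }
}
```

In the first case the lookup succeeds only because the stored key and the probe are the same object, which matches the "same in-memory object" branch in the pseudocode above.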