RussellSpitzer commented on issue #6670:
URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1406510853

   OK, so the TL;DR from my point of view:
   
   Reused containers cause the file.partition() object to be the same on every invocation of computeIfAbsent.
   When putting the object, the hash code is used internally to determine the bucket the struct goes in.
   The partition object is placed in the bucket.
   The partition object is then changed by the Avro reader for the next manifest entry.
   The previously placed object is still in its original bucket, but now has a different value than it originally had.
   When we then look up a key that lands in an already-filled bucket, we pull out the same object we are checking for, so the lookup matches.
   
   ```
   1. Put PartitionData@1(X, Y) into Map
   
   BucketForXY[PartitionData@1(X, Y)]
   
   2. Read new value (A, B) and set in PartitionData@1
   3. Check for PartitionData@1(A, B) in Map
   
   if HashCode(A, B) == HashCode(X, Y)
      return BucketForXY[PartitionData@1(A, B)] // Same in-memory PartitionData object, so it has the newly read value
   else
      return not found
   ```
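
The sequence above can be reproduced with a minimal Java sketch. It assumes a plain java.util.HashMap and a hypothetical MutableKey class standing in for the reused PartitionData container; neither is taken from the Iceberg code. To force the colliding case it relies on "Aa" and "BB" having the same String.hashCode().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ReusedKeyDemo {
  // Hypothetical stand-in for a reused, mutable partition container.
  static final class MutableKey {
    private String first;
    private String second;

    MutableKey(String first, String second) {
      set(first, second);
    }

    // Mimics a reader overwriting the container with the next entry's values.
    void set(String first, String second) {
      this.first = first;
      this.second = second;
    }

    MutableKey copy() {
      return new MutableKey(first, second);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof MutableKey)) return false;
      MutableKey other = (MutableKey) o;
      return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
      return Objects.hash(first, second);
    }
  }

  // Puts (x, y), overwrites the reused container with (a, b), then reports
  // whether the map claims to contain (a, b), which was never put.
  static boolean foundAfterMutation(String x, String y, String a, String b, boolean copyKey) {
    Map<MutableKey, String> map = new HashMap<>();
    MutableKey reused = new MutableKey(x, y);
    map.computeIfAbsent(copyKey ? reused.copy() : reused, k -> "value for first entry");
    reused.set(a, b);
    return map.containsKey(reused);
  }

  public static void main(String[] args) {
    // Different hash codes: the mutated key is not found.
    System.out.println(foundAfterMutation("X", "Y", "A", "B", false));   // false
    // Equal hash codes, same object: the stored key matches itself.
    System.out.println(foundAfterMutation("Aa", "x", "BB", "x", false)); // true
    // Equal hash codes, but a copy was stored: equals() tells them apart.
    System.out.println(foundAfterMutation("Aa", "x", "BB", "x", true));  // false
  }
}
```

In this sketch, storing a copy of the key is what keeps the map entry independent of later writes to the reused container. Whether copying is the right fix for the actual code path is a separate question.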


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

