bryanck opened a new pull request, #8834: URL: https://github.com/apache/iceberg/pull/8834
This PR fixes a critical bug in reading split offsets from manifests. A change in https://github.com/apache/iceberg/pull/8336 added caching of the split offsets collection in `BaseFile` to avoid reallocation. However, the Avro reader reuses the same `BaseFile` object across manifest entries, so the cached collection is built only from the first entry's offsets and is then returned for every subsequent entry.

This can result in corrupted metadata, because a manifest rewrite will read in the invalid offsets and then write them back.

cc @aokolnychyi @RussellSpitzer @rdblue @danielcweeks
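The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Iceberg code: `SplitOffsetsReuseDemo`, `Entry`, `setRawOffsets`, and `readAll` are hypothetical names, and `Entry` only imitates a reusable object that lazily caches a list view of its raw offsets. The `resetCacheOnSet` flag shows one plausible remedy (dropping the cache whenever the underlying offsets are replaced); it is an assumption for illustration and not necessarily the exact change made in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOffsetsReuseDemo {

  // Hypothetical stand-in for a reusable file entry; this is NOT Iceberg's BaseFile.
  static class Entry {
    private final boolean resetCacheOnSet;
    private long[] rawOffsets;
    private List<Long> cachedOffsets; // lazily built list view of rawOffsets

    Entry(boolean resetCacheOnSet) {
      this.resetCacheOnSet = resetCacheOnSet;
    }

    // Called once per record by the reader, always on the same instance.
    void setRawOffsets(long[] offsets) {
      this.rawOffsets = offsets;
      if (resetCacheOnSet) {
        // Assumed remedy: invalidate the cache when the source data changes.
        this.cachedOffsets = null;
      }
    }

    List<Long> splitOffsets() {
      if (cachedOffsets == null) {
        List<Long> list = new ArrayList<>();
        for (long offset : rawOffsets) {
          list.add(offset);
        }
        cachedOffsets = list;
      }
      return cachedOffsets;
    }
  }

  // Imitates a reader that reuses one Entry for every record it decodes.
  public static List<List<Long>> readAll(long[][] records, boolean resetCacheOnSet) {
    Entry reused = new Entry(resetCacheOnSet);
    List<List<Long>> seen = new ArrayList<>();
    for (long[] record : records) {
      reused.setRawOffsets(record);
      // Copy the result so later records cannot alter what was observed here.
      seen.add(new ArrayList<>(reused.splitOffsets()));
    }
    return seen;
  }

  public static void main(String[] args) {
    long[][] records = {{4L, 100L}, {4L, 250L, 900L}};
    // prints "stale cache: [[4, 100], [4, 100]]"
    System.out.println("stale cache: " + readAll(records, false));
    // prints "reset cache: [[4, 100], [4, 250, 900]]"
    System.out.println("reset cache: " + readAll(records, true));
  }
}
```

With the stale cache, the second record reports the first record's offsets, which is the kind of invalid value that would then be written back during a manifest rewrite.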