nandorKollar commented on PR #13976: URL: https://github.com/apache/iceberg/pull/13976#issuecomment-3271748967
> @nandorKollar I think you're right that the row id and last updated sequence number reader have a leak because it looks like the overall batch reader is written in a way where the `reuse` vector _must_ be implemented in order to preserve that reference for subsequent cleanup. Otherwise it looks like we end up losing it and only end up cleaning the last batch we end up reading. > > We just may have not been exercising schemas which include this additional metadata columns in the recent memory leak fix; we should be able to construct a schema with the 2 additional metadata columns for row id and sequence number and read those, and reproduce the leak/validate the fix works. Actually when we close the allocators, which is implemented in this PR, now we do have failing tests in Spark (though they're a bit hard to debug why), when reuse vector fix in last update sequence, and rowId readers are not there. So the original recommendation made by Russel to close the allocators looks important. Nevertheless, I can investigate how to add an extra test case just for these two vectors, could you please help which test class would be the best place for such a test case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
