nandorKollar commented on PR #13976:
URL: https://github.com/apache/iceberg/pull/13976#issuecomment-3271748967

   > @nandorKollar I think you're right that the row id and last updated 
sequence number reader have a leak because it looks like the overall batch 
reader is written in a way where the `reuse` vector _must_ be implemented in 
order to preserve that reference for subsequent cleanup. Otherwise it looks 
like we end up losing it and only end up cleaning the last batch we end up 
reading.
   > 
   > We just may have not been exercising schemas which include this additional 
metadata columns in the recent memory leak fix; we should be able to construct 
a schema with the 2 additional metadata columns for row id and sequence number 
and read those, and reproduce the leak/validate the fix works.
   
   Actually when we close the allocators, which is implemented in this PR, now 
we do have failing tests in Spark (though they're a bit hard to debug why), 
when reuse vector fix in last update sequence, and rowId readers are not there. 
So the original recommendation made by Russel to close the allocators looks 
important. Nevertheless, I can investigate how to add an extra test case just 
for these two vectors, could you please help which test class would be the best 
place for such a test case?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to