nandorKollar commented on PR #13976: URL: https://github.com/apache/iceberg/pull/13976#issuecomment-3285675426
Reading is done here: https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java#L280 Which you already wrapped in `assertNoLeak`. This method is called from each `writeAndValidate`. There are two observations though: in your commit, `assertNoLeak` always uses a child allocator, which is fine, because it is created for each test case, however the leak here happens in root allocator, as the two problematic readers allocate memory via the root allocator. If I change assertNoLeak to verify that there's no memory leak in root allocator, then that's going to error out, because unfortunately the allocator is a static, and `assertNoLeak` closes it. What we can do though is we simply don't close the root allocator! How does it sound? Something like this: ``` protected void assertNoLeak(Consumer<BufferAllocator> testFunction) { BufferAllocator allocator = ArrowAllocation.rootAllocator(); testFunction.accept(allocator); assertThat(allocator.getAllocatedMemory()) .as( "Should have released all memory prior to closing. Expected to find 0 bytes of memory in use.") .isEqualTo(0L); } ``` Here we check that root allocator has no unallocated memory left after each test case. I'm not sure that this actually covers child allocators too, I suppose it does (though it is easy to add a verification for child allocators here, if it doesn't). What do you thing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
