RussellSpitzer opened a new pull request, #13935: URL: https://github.com/apache/iceberg/pull/13935
For a very long time, we have leaked direct memory when reading a file whose page encoding changes from dictionary to non-dictionary. Dictionary pages are encoded as IntVector, while non-dictionary encoded pages could be any of several different representations based on the column type. When the vector type changed, we would silently drop the previous vector without clearing or releasing it.

I discovered this while working on #13880, where we were hitting direct memory OOMs in some Spark tests. After a lot of searching and debugging, I narrowed the leak down to read tasks which read multiple pages with different encodings.

Funnily enough, we already have a test which checks files with mixed page encodings, and that test *would* have failed if it had actually checked for memory leaks.

In this PR I fully instrument our tests for Parquet vectorized reads so that they check for memory leaks and fail if any are found. Without the accompanying patch, the dictionaryMixedPages test fails.

Another thing we may want to consider in the future is whether or not we want to close our allocators. Currently the code base has a heap memory leak in ArrowAllocation.rootAllocator(). Every VectorizedReaderBuilder creates a new child allocator which is used to allocate vectors for that particular reader, but we never close these allocators even if we close all of the vectors we allocate from them. If we did close them, we would have seen these memory issues much earlier, since every application would end with a string of "MemoryLeak Detected" messages. I'll put this into a followup issue.
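To illustrate the leak described above, here is a minimal, stdlib-only sketch. It is not the Iceberg or Arrow code: `TrackingAllocator`, `TrackedVector`, `ColumnHolder`, and the byte sizes are all hypothetical stand-ins that only model the idea of an allocator counting outstanding bytes. The sketch shows a holder that swaps its vector when the page encoding changes, and a leak check that passes only if the old vector is released before it is replaced.

```java
import java.util.concurrent.atomic.AtomicLong;

public class VectorSwapSketch {

    /** Hypothetical allocator stand-in that only counts outstanding bytes. */
    static final class TrackingAllocator {
        private final AtomicLong outstanding = new AtomicLong();

        TrackedVector allocate(String kind, long bytes) {
            outstanding.addAndGet(bytes);
            return new TrackedVector(this, kind, bytes);
        }

        void release(long bytes) {
            outstanding.addAndGet(-bytes);
        }

        long outstandingBytes() {
            return outstanding.get();
        }
    }

    /** Hypothetical vector stand-in; closing it returns its bytes to the allocator. */
    static final class TrackedVector implements AutoCloseable {
        private final TrackingAllocator owner;
        final String kind;
        private final long bytes;
        private boolean closed;

        TrackedVector(TrackingAllocator owner, String kind, long bytes) {
            this.owner = owner;
            this.kind = kind;
            this.bytes = bytes;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                owner.release(bytes);
            }
        }
    }

    /** Holds the current vector for one column and swaps it when the kind changes. */
    static final class ColumnHolder implements AutoCloseable {
        private final TrackingAllocator allocator;
        private final boolean releaseOnSwap;
        private TrackedVector current;

        ColumnHolder(TrackingAllocator allocator, boolean releaseOnSwap) {
            this.allocator = allocator;
            this.releaseOnSwap = releaseOnSwap;
        }

        void ensureVector(String kind, long bytes) {
            if (current != null && current.kind.equals(kind)) {
                return; // same representation: reuse the existing vector
            }
            if (current != null && releaseOnSwap) {
                current.close(); // the fix: release the old vector before dropping it
            }
            current = allocator.allocate(kind, bytes);
        }

        @Override
        public void close() {
            if (current != null) {
                current.close();
                current = null;
            }
        }
    }

    /** Reads two dictionary pages and then a plain page; returns the bytes still outstanding. */
    static long readMixedPages(boolean releaseOnSwap) {
        TrackingAllocator allocator = new TrackingAllocator();
        try (ColumnHolder holder = new ColumnHolder(allocator, releaseOnSwap)) {
            holder.ensureVector("dictionary-ids", 4096);
            holder.ensureVector("dictionary-ids", 4096);
            holder.ensureVector("plain-values", 8192);
        }
        return allocator.outstandingBytes();
    }

    public static void main(String[] args) {
        System.out.println("without release on swap: " + readMixedPages(false) + " bytes outstanding");
        System.out.println("with release on swap:    " + readMixedPages(true) + " bytes outstanding");
    }
}
```

In this model, a test that only compares the values read would pass in both modes. Only a check on the allocator's outstanding bytes after the reader is closed tells the two apart, which is the kind of check the test instrumentation is meant to add.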
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
