RussellSpitzer commented on PR #13880: URL: https://github.com/apache/iceberg/pull/13880#issuecomment-3215795421
I couldn't figure out exactly what the memory leak in our test suite is, but it seems to be related to task statuses never getting cleared from the Spark context during the TestRewriteDataFilesAction test suite. Because the suite now runs with an additional config, the number of tasks increased dramatically, and I believe this was the root cause of the OOM. I tried disabling the UI, but that didn't help; the statuses still stuck around.

So I decided to take a different tack and just optimize the test suite instead. The main thing I did was go through all of the "Spark Sorts" and switch them to normal Java collection sorts. This has two outcomes. First, the test suite runs much faster: we had adaptive shuffle disabled for this suite, so each sort had to run 200 tasks, and a local sort is much faster than going through the Spark mechanism. Second, the number of tasks is reduced dramatically, which decreases the number of "Task Status" objects that hang around.

If this still ends up being an issue in the future, we can either track down the status issue or move some of these tests into a different test suite.
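For illustration only, here is a minimal sketch of what swapping a Spark-side sort for a plain Java collection sort can look like in a test helper. The `Row` class, its `c1`/`c2` fields, and the `sortLocally` helper are hypothetical names made up for this example; they are not taken from the PR. The idea is that rows already collected to the driver are ordered in memory, so the sort launches no Spark tasks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LocalSortSketch {
  // Hypothetical stand-in for a row that a test has already collected to the driver.
  static final class Row {
    private final int c1;
    private final String c2;

    Row(int c1, String c2) {
      this.c1 = c1;
      this.c2 = c2;
    }

    int c1() {
      return c1;
    }

    String c2() {
      return c2;
    }
  }

  // Sort a copy of the collected rows in memory, ordering by c1 and then by c2.
  // No Spark job is involved, so no shuffle tasks are scheduled for the sort.
  static List<Row> sortLocally(List<Row> rows) {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparingInt(Row::c1).thenComparing(Row::c2));
    return sorted;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(new Row(2, "b"), new Row(1, "z"), new Row(1, "a"));
    for (Row row : sortLocally(rows)) {
      System.out.println(row.c1() + " " + row.c2());
    }
  }
}
```

A test would then compare the locally sorted expected and actual lists, rather than asking Spark to order each side before collecting.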
