[PR] [flink][test] Fix OOM when startTaskManager in FlinkMetricsITCase [fluss]

via GitHub Fri, 13 Mar 2026 03:33:50 -0700


Prajwal-banakar opened a new pull request, #2864:
URL: https://github.com/apache/fluss/pull/2864


   
   ### Purpose
   
   <!-- Linking this pull request to the issue -->
   Linked issue: close #2744
   
   <!-- What is the purpose of the change -->
   Fixes `OutOfMemoryError: Could not allocate enough memory segments for 
NetworkBufferPool`, which occurred when `TaskManagerRunner.startTaskManager` 
was called in `FlinkMetricsITCase` (and its Flink-version subclasses 
`Flink119MetricsITCase`, `Flink120MetricsITCase`, etc.) while IT cases ran 
sequentially in the same JVM fork.
   
   ### Brief change log
   
   <!-- Please describe the changes made in this pull request and explain how 
they address the issue -->
   The root cause is that `MiniClusterWithClientResource` allocates JVM direct 
memory via `NetworkBufferPool` during `before()`, and this memory was not 
reliably released between test classes, exhausting the JVM direct memory budget 
for subsequent classes.
   
   Three changes were made to `FlinkMetricsITCase`:
   
     - `beforeAll`: Wrap `MINI_CLUSTER_EXTENSION.before()` in a try/catch that 
explicitly calls `MINI_CLUSTER_EXTENSION.after()` on failure. JUnit 5 does not 
invoke `@AfterAll` when `@BeforeAll` throws, so without this, any direct memory 
partially allocated before the failure would never be freed.
   
     - `afterAll`: Wrap resource cleanup in a try/finally block so that 
`MINI_CLUSTER_EXTENSION.after()` is always called even if `admin.close()` or 
`conn.close()` throws.
   
     - `buildTestConfig`: Reduce the `NetworkBufferPool` size from the default 
64MB to 32MB via `taskmanager.memory.network.min`/`max`. These tests do not 
exercise high-throughput network paths, so the smaller size is sufficient and 
reduces direct memory pressure when multiple IT cases run in the same JVM fork.
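
The two lifecycle guards described above can be sketched in plain Java. This is a minimal illustration, not the code from the pull request: `FakeCluster` is a hypothetical stand-in for the mini cluster resource that records calls instead of allocating direct memory, the `admin` and `conn` parameters are modelled as plain `AutoCloseable` values, and the JUnit annotations are omitted so the sketch runs without any test framework.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the setup/teardown guards, using stand-ins for the Flink and Fluss types. */
public class LifecycleCleanupSketch {

    /** Hypothetical stand-in for the mini cluster resource; it only records calls. */
    static class FakeCluster {
        final List<String> calls = new ArrayList<>();
        final boolean failOnStart;

        FakeCluster(boolean failOnStart) {
            this.failOnStart = failOnStart;
        }

        void before() {
            calls.add("before");
            if (failOnStart) {
                throw new IllegalStateException("simulated startup failure");
            }
        }

        void after() {
            calls.add("after");
        }
    }

    /** Setup guard: if startup fails, release what was allocated, then rethrow. */
    static void beforeAll(FakeCluster cluster) throws Exception {
        try {
            cluster.before();
        } catch (Exception e) {
            try {
                cluster.after();
            } catch (Exception cleanupFailure) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }

    /** Teardown guard: the cluster is shut down even if closing a client throws. */
    static void afterAll(FakeCluster cluster, AutoCloseable admin, AutoCloseable conn)
            throws Exception {
        try {
            if (admin != null) {
                admin.close();
            }
            if (conn != null) {
                conn.close();
            }
        } finally {
            cluster.after();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeCluster failing = new FakeCluster(true);
        try {
            beforeAll(failing);
        } catch (IllegalStateException expected) {
            // Startup failed, but after() has already been called by the guard.
        }
        System.out.println(failing.calls); // prints [before, after]

        FakeCluster healthy = new FakeCluster(false);
        beforeAll(healthy);
        try {
            afterAll(healthy, () -> { throw new IOException("simulated close failure"); }, null);
        } catch (IOException expected) {
            // The close failure still propagates; the finally block ran first.
        }
        System.out.println(healthy.calls); // prints [before, after]
    }
}
```

In this sketch a failure in `admin.close()` skips `conn.close()`; only the cluster shutdown is guaranteed by the `finally` block. Whether the real test class needs each client closed independently is not stated in the description above.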
   
   ### Tests
   
   <!-- List UT and IT cases to verify this change -->
     - Flink118MetricsITCase: passes
     - Flink119MetricsITCase: passes
     - Flink120MetricsITCase: passes
     - Flink22MetricsITCase: passes
     - Full fluss-flink-1.20 module (`mvn verify -pl fluss-flink/fluss-flink-1.20 
-am`): BUILD SUCCESS (225 IT tests, 0 failures), confirming that this change 
introduces no regressions
   
   ### API and Format
   
   <!-- Does this change affect API or storage format -->
   No API or storage format changes.
   
   ### Documentation
   
   <!-- Does this change introduce a new feature -->
   No new feature introduced. No documentation changes required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

