Prajwal-banakar opened a new pull request, #2864:
URL: https://github.com/apache/fluss/pull/2864
### Purpose
Linked issue: close #2744
Fixes `OutOfMemoryError: Could not allocate enough memory segments for
NetworkBufferPool`, which occurred when `TaskManagerRunner.startTaskManager`
was called in `FlinkMetricsITCase` (and its Flink-version subclasses
`Flink119MetricsITCase`, `Flink120MetricsITCase`, etc.) while IT cases ran
sequentially in the same JVM fork.
### Brief change log
The root cause is that `MiniClusterWithClientResource` allocates JVM direct
memory via `NetworkBufferPool` during `before()`. This memory was not reliably
released between test classes, which exhausted the JVM direct memory budget
for subsequent classes.
Three changes were made to `FlinkMetricsITCase`:
- `beforeAll`: Wrap `MINI_CLUSTER_EXTENSION.before()` in a try/catch that
explicitly calls `MINI_CLUSTER_EXTENSION.after()` on failure. JUnit 5 does not
invoke `@AfterAll` when `@BeforeAll` throws, so without this, any direct memory
partially allocated before the failure would never be freed.
- `afterAll`: Wrap resource cleanup in a try/finally block so that
`MINI_CLUSTER_EXTENSION.after()` is always called even if `admin.close()` or
`conn.close()` throws.
- `buildTestConfig`: Reduce the `NetworkBufferPool` size from the default 64MB
to 32MB via `taskmanager.memory.network.min`/`max`. These tests do not exercise
high-throughput network paths, so the smaller size is sufficient and reduces
direct memory pressure when multiple IT cases run in the same JVM fork.
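For reviewers who want the shape of the three changes at a glance, here is a minimal, self-contained sketch. It is not the code from this pull request: `MetricsITCaseCleanupSketch` and `FakeCluster` are hypothetical names, `FakeCluster` only records calls instead of starting a Flink mini cluster, and a plain `Map` stands in for Flink's `Configuration`. The `32mb` value format is also an assumption about how the sizes are written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricsITCaseCleanupSketch {

    /** Hypothetical stand-in for MINI_CLUSTER_EXTENSION: records calls, allocates nothing. */
    static class FakeCluster {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnStart;

        FakeCluster(boolean failOnStart) {
            this.failOnStart = failOnStart;
        }

        void before() {
            calls.add("before");
            if (failOnStart) {
                throw new IllegalStateException("simulated startup failure");
            }
        }

        void after() {
            calls.add("after");
        }
    }

    /** beforeAll pattern: if startup fails, tear down whatever was set up, then rethrow. */
    static void beforeAll(FakeCluster cluster) {
        try {
            cluster.before();
        } catch (RuntimeException e) {
            cluster.after();
            throw e;
        }
    }

    /** afterAll pattern: the cluster is torn down even if closing a client resource throws. */
    static void afterAll(FakeCluster cluster, AutoCloseable admin, AutoCloseable conn)
            throws Exception {
        try {
            admin.close();
            conn.close();
        } finally {
            cluster.after();
        }
    }

    /** buildTestConfig pattern: pin both network memory bounds to 32 MB. */
    static Map<String, String> buildTestConfig() {
        // A plain map is used here instead of Flink's Configuration (assumption for the sketch).
        Map<String, String> config = new LinkedHashMap<>();
        config.put("taskmanager.memory.network.min", "32mb");
        config.put("taskmanager.memory.network.max", "32mb");
        return config;
    }

    public static void main(String[] args) throws Exception {
        FakeCluster failing = new FakeCluster(true);
        try {
            beforeAll(failing);
        } catch (IllegalStateException expected) {
            System.out.println("startup failed, calls: " + failing.calls);
        }

        FakeCluster healthy = new FakeCluster(false);
        beforeAll(healthy);
        AutoCloseable brokenAdmin = () -> {
            throw new IllegalStateException("simulated close failure");
        };
        try {
            afterAll(healthy, brokenAdmin, () -> { });
        } catch (IllegalStateException expected) {
            System.out.println("close failed, calls: " + healthy.calls);
        }
        System.out.println(buildTestConfig());
    }
}
```

In both failure paths the recorded calls end with `after`, which is the property the change relies on: teardown runs whether startup or client cleanup fails. Setting `min` and `max` to the same value fixes the network memory size rather than leaving it to be derived.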
### Tests
- `Flink118MetricsITCase`: passes
- `Flink119MetricsITCase`: passes
- `Flink120MetricsITCase`: passes
- `Flink22MetricsITCase`: passes
- Full `fluss-flink-1.20` module (`mvn verify -pl fluss-flink/fluss-flink-1.20
-am`): BUILD SUCCESS (225 IT tests, 0 failures), confirming that this change
introduces no regressions
### API and Format
No API or storage format changes
### Documentation
No new feature introduced. No documentation changes required.