mattkduran opened a new pull request, #7852:
URL: https://github.com/apache/hadoop/pull/7852
### Description of PR
The ABFS driver's auto-throttling feature
(`fs.azure.enable.autothrottling=true`) creates Timer threads in
`AbfsClientThrottlingAnalyzer` that are never cancelled. The leaked threads
accumulate and eventually cause an OutOfMemoryError in long-running
applications such as Hive Metastore.
#### Impact
- Thread count grows indefinitely (observed >100,000 timer threads)
- Affects any long-running service that creates multiple ABFS filesystem
instances
#### Root Cause
AbfsClientThrottlingAnalyzer creates Timer objects in its constructor but
provides no mechanism to cancel them. When AbfsClient instances are closed, the
associated timer threads continue running indefinitely.
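The leak pattern described above can be reproduced in isolation. The following is a minimal sketch, not the ABFS code: `LeakyAnalyzer`, `TimerLeakDemo`, and the thread-name prefix are hypothetical names, and the timers are created as daemon threads only so that the demo JVM can exit. It shows that a `Timer` with a repeating task keeps its thread alive after the owning object is dropped and a GC is requested.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical stand-in: starts a Timer in its constructor and never cancels it. */
final class LeakyAnalyzer {
    private final Timer timer;

    LeakyAnalyzer(String threadName) {
        // Assumption: daemon timer, so this demo does not block JVM shutdown.
        this.timer = new Timer(threadName, true);
        this.timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Placeholder for periodic analysis work.
            }
        }, 10_000L, 10_000L);
    }
}

final class TimerLeakDemo {
    /** Counts live threads whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    /** Creates and immediately drops analyzers, then reports how many threads remain. */
    static long leakAfter(int instances, String prefix) {
        long before = countThreads(prefix);
        for (int i = 0; i < instances; i++) {
            new LeakyAnalyzer(prefix + i); // reference is discarded right away
        }
        System.gc(); // the repeating task keeps each timer thread alive regardless
        return countThreads(prefix) - before;
    }

    public static void main(String[] args) {
        System.out.println("leaked timer threads: " + leakAfter(5, "demo-leaky-timer-"));
    }
}
```

Because the repeating task stays in the timer's queue, the thread has no reason to terminate until `cancel()` is called, which is why each discarded instance leaves one thread behind.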
#### Solution
Implement proper resource cleanup by making the throttling components
implement Closeable and ensuring timers are cancelled when ABFS clients are
closed.
#### Changes Made
1. `AbfsClientThrottlingAnalyzer.java`
   - Added: `implements Closeable`
   - Added: `close()` method that calls `timer.cancel()` and `timer.purge()`
   - Purpose: Ensures timer threads are properly terminated when the analyzer
     is no longer needed
2. `AbfsThrottlingIntercept.java` (Interface)
   - Added: `extends Closeable`
   - Added: `close()` method signature
   - Purpose: Establishes a cleanup contract for all throttling intercept
     implementations
3. `AbfsClientThrottlingIntercept.java`
   - Added: `close()` method that closes both `readThrottler` and
     `writeThrottler`
   - Purpose: Coordinates cleanup of both read and write throttling analyzers
4. `AbfsNoOpThrottlingIntercept.java`
   - Added: No-op `close()` method
   - Purpose: Satisfies the interface contract for the no-op implementation
5. `AbfsClient.java`
   - Added: `IOUtils.cleanupWithLogger(LOG, intercept)` in the existing
     `close()` method
   - Purpose: Integrates throttling cleanup into existing client resource
     management
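The shape of the changes listed above can be sketched as follows. This is an illustration under stated assumptions rather than the patch itself: `AnalyzerSketch`, `InterceptSketch`, and `CloseableCleanupDemo` are hypothetical stand-ins for the real classes, the 10-second period is arbitrary, and only `java.util.Timer` and `java.io.Closeable` are used.

```java
import java.io.Closeable;
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical analyzer: owns one Timer and cancels it in close(). */
final class AnalyzerSketch implements Closeable {
    private final Timer timer;

    AnalyzerSketch(String threadName, long periodMs) {
        this.timer = new Timer(threadName, true);
        this.timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Placeholder for the periodic throttling analysis.
            }
        }, periodMs, periodMs);
    }

    @Override
    public void close() {
        timer.cancel(); // discards scheduled tasks; the timer thread then exits
        timer.purge();  // removes any cancelled tasks still held in the queue
    }
}

/** Hypothetical intercept: closes both analyzers it owns. */
final class InterceptSketch implements Closeable {
    private final AnalyzerSketch readThrottler;
    private final AnalyzerSketch writeThrottler;

    InterceptSketch(String prefix) {
        this.readThrottler = new AnalyzerSketch(prefix + "read", 10_000L);
        this.writeThrottler = new AnalyzerSketch(prefix + "write", 10_000L);
    }

    @Override
    public void close() {
        readThrottler.close();
        writeThrottler.close();
    }
}

final class CloseableCleanupDemo {
    /** Counts live threads whose name starts with the given prefix. */
    static long liveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    /** Polls until no matching thread is alive, or the timeout elapses. */
    static boolean awaitNoThreads(String prefix, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (liveThreads(prefix) > 0) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String prefix = "demo-throttling-";
        try (InterceptSketch intercept = new InterceptSketch(prefix)) {
            System.out.println("timer threads while open: " + liveThreads(prefix));
        }
        System.out.println("all timer threads gone: " + awaitNoThreads(prefix, 2_000L));
    }
}
```

In the sketch the owner closes the intercept through try-with-resources; the PR instead hooks the intercept into the existing `AbfsClient.close()` path, so callers that already close the client get the cleanup without code changes.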
https://github.com/mattkduran/ABFSleaktest
https://www.mail-archive.com/[email protected]/msg43483.html
### How was this patch tested?
#### Standalone Validation Tool
This fix was validated using a standalone reproduction and testing tool that
directly exercises the ABFS auto-throttling components outside of a full Hadoop
deployment.
Repository: [ABFSLeakTest](https://github.com/mattkduran/ABFSleaktest)
#### Testing Scope
- Problem reproduction confirmed - demonstrates the timer thread leak
- Fix validation confirmed - proves close() method resolves the leak
- Resource cleanup verified - shows proper timer cancellation
- Limited integration testing - standalone tool, not full Hadoop test suite
#### Test Results
Leak Reproduction Evidence
```
# Without fix: Timer threads accumulate over filesystem creation cycles
Cycle   Total Threads   ABFS Timer Threads   Status
1       50->52          0->2                 LEAK DETECTED
50      150->152        98->100              LEAK GROWING
200     250->252        398->400             LEAK CONFIRMED
Final Analysis: 400 leaked timer threads named "abfs-timer-client-throttling-analyzer-*"
Memory Impact: ~90MB additional heap usage
# Direct analyzer testing:
🔴 Without close(): +3 timer threads (LEAKED)
✅ With close(): +0 timer threads (NO LEAK)
```
#### Test Environment
- Java Version: OpenJDK 11.0.x
- Hadoop Version: 3.3.6/3.4.1 (both affected)
- Test Duration: 200 filesystem creation/destruction cycles
- Thread Monitoring: JMX ThreadMXBean
Fix effectiveness: 100%. No timer threads leaked when close() was called.
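For readers who want to repeat the measurement, thread monitoring through `ThreadMXBean` can be sketched roughly as below. This is an assumed approach, not the code of the linked test tool: `TimerThreadCounter` is a hypothetical helper, and the only detail taken from the results above is the thread-name prefix `abfs-timer-client-throttling-analyzer-`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class TimerThreadCounter {
    /** Counts live threads whose name starts with the given prefix, via JMX. */
    static int countByPrefix(String prefix) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        int count = 0;
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        System.out.println("total live threads: " + bean.getThreadCount());
        System.out.println("ABFS timer threads: "
                + countByPrefix("abfs-timer-client-throttling-analyzer-"));
    }
}
```

Sampling this count before and after each filesystem create/close cycle gives the per-cycle delta that the results table reports.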
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue
id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?