Jinwoo Hwang created GEODE-10520:
------------------------------------
Summary: Security: Eliminate DirectBuffer Access to sun.nio.ch
Internal Package
Key: GEODE-10520
URL: https://issues.apache.org/jira/browse/GEODE-10520
Project: Geode
Issue Type: Improvement
Reporter: Jinwoo Hwang
Assignee: Jinwoo Hwang
Fix For: 2.1.0
h2. Summary
UnsafeThreadLocal currently uses reflection to access private internals of
{{java.lang.ThreadLocal}} and {{{}java.lang.ThreadLocalMap{}}}, requiring the
JVM flag {{{}--add-opens=java.base/java.lang=ALL-UNNAMED{}}}. This creates
maintenance burden, security concerns, and compatibility risks with future Java
versions.
h2. Description
h3. Current Implementation
The {{UnsafeThreadLocal}} class (located in
{{{}geode-core/src/main/java/org/apache/geode/distributed/internal/deadlock/UnsafeThreadLocal.java{}}})
extends {{ThreadLocal<T>}} and provides a {{get(Thread)}} method that allows
reading ThreadLocal values from arbitrary threads - a capability needed for
deadlock detection.
The current implementation uses reflection to:
# Access the private {{getMap(Thread)}} method of ThreadLocal
# Access the private {{getEntry(ThreadLocal)}} method of ThreadLocalMap
# Access the private {{value}} field of ThreadLocalMap.Entry
*Code excerpt showing reflection usage:*
{code:java}
private static Object get(ThreadLocal threadLocal, Thread thread) {
try {
Object threadLocalMap = invokePrivate(threadLocal, "getMap",
new Class[] {Thread.class}, new Object[] {thread});
if (threadLocalMap != null) {
Object entry = invokePrivate(threadLocalMap, "getEntry",
new Class[] {ThreadLocal.class}, new Object[] {threadLocal});
if (entry != null) {
return getPrivate(entry, "value");
}
}
return null;
} catch (Exception e) {
throw new RuntimeException("Unable to extract thread local", e);
}
}
{code}
h3. Required JVM Flags
To enable this reflection access, Geode requires the following JVM flag
(defined in {{{}MemberJvmOptions.java{}}}):
{code:java}
private static final String JAVA_LANG_OPEN =
"--add-opens=java.base/java.lang=ALL-UNNAMED";
{code}
This flag must be passed to every Geode member JVM, adding operational
complexity.
h3. Usage Context
UnsafeThreadLocal is used in three key areas for deadlock detection:
# *DLockService* ({{{}geode-core/.../locks/DLockService.java{}}})
** Tracks what distributed lock a thread is blocked on
** Used by: {{private final UnsafeThreadLocal<Object> blockedOn}}
# *MessageDependencyMonitor*
({{{}geode-core/.../deadlock/MessageDependencyMonitor.java{}}})
** Tracks ReplyProcessor21 instances threads are waiting for
** Tracks MessageWithReply instances being processed
** Used by: {{private final UnsafeThreadLocal<ReplyProcessor21>
waitingProcessors}}
** Used by: {{private final UnsafeThreadLocal<MessageWithReply>
processingMessages}}
# *DLockDependencyMonitor*
({{{}geode-core/.../deadlock/DLockDependencyMonitor.java{}}})
** Queries blockedOn status from DLockService for dependency graph construction
h2. Problems with Current Approach
h3. 1. Java Module System Violations
*Severity: High*
The reflection-based approach directly violates the Java Platform Module System
(JPMS) encapsulation:
* Accesses internal implementation details of {{java.lang}} package
* Bypasses the module system's access controls
* Requires explicit {{--add-opens}} directive to override module boundaries
{*}Risk{*}: Future Java versions may further restrict or remove the ability to
open internal packages, even with command-line flags.
h3. 2. Maintenance and Fragility
*Severity: High*
The implementation is brittle and depends on:
* Internal class names remaining unchanged (ThreadLocalMap, Entry)
* Internal method signatures staying stable (getMap, getEntry)
* Internal field names not changing (value)
* Class hierarchy not being refactored
Any changes to ThreadLocal internals in future JDK releases could break Geode
without warning, as these are non-public APIs with no stability guarantees.
h3. 3. Security Concerns
*Severity: Medium*
Using {{{}--add-opens=java.base/java.lang=ALL-UNNAMED{}}}:
* Opens the entire {{java.lang}} package to all unnamed modules
* Grants broader access than necessary (only need ThreadLocal access)
* Weakens Java's security model and encapsulation guarantees
* May trigger security audit flags in enterprise environments
h3. 4. Operational Complexity
*Severity: Medium*
Current deployment requirements:
* JVM flags must be correctly configured on all Geode servers
* Different flags needed for Java 8 vs. Java 11+ deployments
* Documentation and deployment scripts must maintain flag lists
* Missing flags cause runtime failures that are difficult to diagnose
>From {{{}MemberJvmOptions.java{}}}:
{code:java}
static final List<String> JAVA_11_OPTIONS = Arrays.asList(
COM_SUN_JMX_REMOTE_SECURITY_EXPORT,
SUN_NIO_CH_EXPORT,
COM_SUN_MANAGEMENT_INTERNAL_OPEN,
JAVA_LANG_OPEN, // ← Required only for UnsafeThreadLocal
JAVA_NIO_OPEN);
{code}
h3. 5. Performance Overhead
*Severity: Low*
Reflection calls have performance costs:
* Method lookup via reflection on each invocation
* Access permission checks
* Potential JIT optimization barriers
* Exception handling overhead
While not critical for deadlock detection (which is infrequent), it's still
unnecessary overhead.
h3. 6. Testing and CI/CD Complexity
*Severity: Low*
Test infrastructure must:
* Configure JVM flags correctly for all test executions
* Maintain separate flag lists for different Java versions
* Handle flag propagation in parallel test runs
* Document requirements for external contributors
h2. Business Impact
h3. Reduced Operational Risk
Eliminating the need for {{--add-opens}} flags:
* Simplifies deployment procedures
* Reduces configuration errors in production
* Removes a potential point of failure during Java upgrades
* Improves compatibility with container orchestration platforms
h3. Better Java Version Compatibility
A non-reflection implementation:
* Works seamlessly across all Java versions (8, 11, 17, 21+)
* Doesn't depend on JDK internal implementation details
* Reduces risk during major Java version upgrades
* Aligns with Java's "code to interfaces, not implementations" principle
h3. Enhanced Security Posture
Removing reflection-based access:
* Eliminates the need to weaken module system protections
* Reduces attack surface by not opening internal packages
* Simplifies security audits and compliance reviews
* Follows principle of least privilege
h3. Improved Developer Experience
A cleaner implementation:
* Easier to understand and maintain
* No special JVM configuration needed for development
* Simpler IDE setup and debugging
* Reduces onboarding friction for new contributors
h3. Long-Term Maintainability
Moving away from internal APIs:
* Protects against breaking changes in future JDK releases
* Reduces technical debt
* Makes the codebase more sustainable
* Demonstrates modern Java best practices
h2. Affected Components
h3. Files Modified in This Refactoring
*
{{geode-core/src/main/java/org/apache/geode/distributed/internal/deadlock/UnsafeThreadLocal.java}}
*
{{geode-gfsh/src/main/java/org/apache/geode/management/internal/cli/commands/MemberJvmOptions.java}}
h3. Files Using UnsafeThreadLocal (Not Modified)
These files use {{UnsafeThreadLocal}} but require no changes as the public API
remains unchanged:
*
{{geode-core/src/main/java/org/apache/geode/distributed/internal/locks/DLockService.java}}
- Uses {{UnsafeThreadLocal<Object> blockedOn}} for deadlock detection
*
{{geode-core/src/main/java/org/apache/geode/distributed/internal/deadlock/MessageDependencyMonitor.java}}
- Uses two UnsafeThreadLocal instances for tracking message dependencies
*
{{geode-core/src/main/java/org/apache/geode/distributed/internal/deadlock/DLockDependencyMonitor.java}}
- Queries UnsafeThreadLocal values for dependency graph construction
h3. Test Files (Executed, Not Modified)
*
{{geode-core/src/test/java/org/apache/geode/distributed/internal/deadlock/UnsafeThreadLocalJUnitTest.java}}
- All tests pass with new implementation
h2. Scope
* Refactoring UnsafeThreadLocal implementation to eliminate reflection
* Updating MemberJvmOptions to remove JAVA_LANG_OPEN flag
* Ensuring all existing tests pass with new implementation
* Maintaining exact same public API and behavior
* Supporting all currently supported Java versions (17, 21)
h2. Testing Requirements
h3. Functional Testing
* All existing UnsafeThreadLocal unit tests must pass
* Deadlock detection integration tests must pass
* DLock-related distributed tests must pass
* Message processing tests must pass
h3. Cross-Version Testing
* Verify functionality on Java 17, and 21
* Ensure no regression in deadlock detection accuracy
* Validate thread safety under concurrent access
h3. Performance Testing
* Verify no performance regression in deadlock detection
* Measure overhead reduction from removing reflection
* Test under high concurrency scenarios
h3. Negative Testing
* Verify proper cleanup when threads terminate
* Test behavior with many ThreadLocal instances
* Validate memory leak prevention
h2. Compatibility
h3. Backward Compatibility
* Public API of UnsafeThreadLocal must remain unchanged
* Behavior must be functionally equivalent to current implementation
* No changes to deadlock detection output or timing
h3. Forward Compatibility
* Implementation should work with future Java versions
* Should not depend on version-specific features
* Should gracefully handle JVM updates
h2. Priority Assessment
*Recommended Priority: Medium-High*
h3. Factors Supporting High Priority
* Eliminates security concerns with module system violations
* Improves long-term maintainability significantly
* Reduces operational complexity
* Removes technical debt that will only grow over time
h3. Factors Supporting Medium Priority
* Current implementation works (with JVM flags)
* Not causing immediate production issues
* Other reflection usage in Geode also needs addressing
* Requires careful testing of deadlock detection
h2. Acceptance Criteria
# UnsafeThreadLocal no longer uses reflection to access ThreadLocal internals
# No {{--add-opens=java.base/java.lang=ALL-UNNAMED}} flag required
# All existing tests pass without modification
# Deadlock detection functionality unchanged
# Works on Java 11, 17, and 21 without special configuration
# Performance is equal or better than current implementation
# No memory leaks when threads are terminated
# Code is well-documented explaining the new approach
h2. Related Issues
* Related to any other tickets addressing reflection usage in Geode
* May be part of larger "Java Module System Compliance" initiative
--
This message was sent by Atlassian Jira
(v8.20.10#820010)