pan3793 opened a new pull request, #8472:
URL: https://github.com/apache/hadoop/pull/8472

   ### Description of PR
   
   1. `BindException` was not caught. This is the main cause of the failure shown in the log below.
     Netty's `ChannelFuture#sync()` uses `PlatformDependent.throwException` to "sneaky-throw" the original cause. When the OS reports "Address already in use", the cause is `java.net.BindException` (a subclass of `IOException`), not `ChannelException`. The original `catch (InterruptedException | ChannelException e)` missed it entirely, so the exception propagated out of the retry loop and failed the test on the first attempt, exactly matching the stack trace in the failure log.
   
   ```
   Error:  Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.224 s <<< FAILURE! -- in org.apache.hadoop.oncrpc.TestFrameDecoder
   Error:  org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames -- Time elapsed: 0.013 s <<< ERROR!
   java.net.BindException: Address already in use
        at java.base/sun.nio.ch.Net.bind0(Native Method)
        at java.base/sun.nio.ch.Net.bind(Net.java:567)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:337)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:294)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:561)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1281)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:600)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:579)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:922)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:259)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:384)
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:840)
        Suppressed: java.lang.RuntimeException: Rethrowing promise failure cause
                at io.netty.util.concurrent.DefaultPromise.rethrowIfFailed(DefaultPromise.java:686)
                at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:420)
                at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:119)
                at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:30)
                at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:88)
                at org.apache.hadoop.oncrpc.TestFrameDecoder.startRpcServer(TestFrameDecoder.java:237)
                at org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames(TestFrameDecoder.java:177)
                at java.base/java.lang.reflect.Method.invoke(Method.java:569)
                at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
                at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
   ```
   
   2. The port increment could be zero.
      `rand.nextInt(20)` returns a value in `[0, 20)`, so `serverPort += rand.nextInt(20)` could add zero and retry the same busy port. Changed to `1 + rand.nextInt(20)` so the port is always bumped.
   
   3. `InterruptedException` was lumped with "port in use".
      An external thread interrupt should not trigger a port-bump retry. Split 
into its own handler that restores the interrupt flag and propagates.
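
The three points above can be sketched as a small retry loop. This is a minimal illustration, not the actual patch: `PortRetrySketch`, `BindAttempt`, and `bindWithRetry` are hypothetical names, and the real bind call (Netty's bind followed by `sync()`) is replaced by a pluggable callback so the example runs with the JDK alone. Rethrowing the `InterruptedException` as-is is one assumed way to "propagate"; the `ChannelException` branch of the original catch is omitted for brevity.

```java
import java.net.BindException;
import java.util.Random;

final class PortRetrySketch {

  /** Hypothetical stand-in for the real bind call (for example, a Netty bind plus sync()). */
  interface BindAttempt {
    void bind(int port) throws Exception;
  }

  /** Tries to bind, moving to a strictly higher port whenever the current one is busy. */
  static int bindWithRetry(int startPort, int maxAttempts, Random rand, BindAttempt attempt)
      throws Exception {
    int port = startPort;
    BindException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        attempt.bind(port);
        return port;
      } catch (InterruptedException e) {
        // An interrupt is not a "port in use" signal: restore the flag and stop retrying.
        Thread.currentThread().interrupt();
        throw e;
      } catch (BindException e) {
        // "Address already in use": remember the failure and pick a different port.
        last = e;
        port += 1 + rand.nextInt(20); // increment is in [1, 20], never zero
      }
    }
    throw last != null ? last : new BindException("no bind attempts were made");
  }

  public static void main(String[] args) throws Exception {
    int busyPort = 5000; // simulated busy port, no real socket is opened
    int chosen = bindWithRetry(busyPort, 10, new Random(), p -> {
      if (p == busyPort) {
        throw new BindException("Address already in use");
      }
    });
    System.out.println("bound to port " + chosen);
  }
}
```

Because the increment is at least 1, a retry never lands on the port that just failed, and an interrupt ends the loop immediately instead of consuming a retry.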
   
   Contains content generated by: Claude Opus 4.7
   
   ### How was this patch tested?
   
   Ran the following for dozens of rounds:
   ```
   ./mvnw test -pl hadoop-common-project/hadoop-common -am -Dtest=TestFrameDecoder
   ...
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.oncrpc.TestFrameDecoder
   OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.036 s -- in org.apache.hadoop.oncrpc.TestFrameDecoder
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
   ```
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id (HADOOP-19881)?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   ### AI Tooling
   
   If an AI tool was used:
   
   - [x] The PR includes the phrase "Contains content generated by <tool>"
         where <tool> is the name of the AI tool used.
   - [x] My use of AI contributions follows the ASF legal policy
         https://www.apache.org/legal/generative-tooling.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

