[ https://issues.apache.org/jira/browse/MAPREDUCE-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073599#comment-18073599 ]

ASF GitHub Bot commented on MAPREDUCE-7536:
-------------------------------------------

konstantinb opened a new pull request, #8427:
URL: https://github.com/apache/hadoop/pull/8427

   …
   
   
   ### Description of PR
   MAPREDUCE-7536: reduce log ERROR noise for ignorable ShuffleChannelHandler errors
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   ### AI Tooling
   
   If an AI tool was used:
   
   - [ ] The PR includes the phrase "Contains content generated by <tool>"
         where <tool> is the name of the AI tool used.
   - [ ] My use of AI contributions follows the ASF legal policy
         https://www.apache.org/legal/generative-tooling.html




> ShuffleChannelHandler logs ERROR for client disconnects during shuffle
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7536
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7536
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Konstantin Bereznyakov
>            Priority: Major
>
> ShuffleChannelHandler.operationComplete() logs at ERROR level when a 
> ChannelFuture completes unsuccessfully, even for expected conditions like 
> client disconnections. This creates unnecessary noise in shuffle service logs.
>   *Current Behavior*
>   When a shuffle client disconnects during data transfer, the handler logs:
>   ERROR Future is unsuccessful. channel='...' Cause: <ClosedChannelException or connection reset>
>   These are expected during normal operation when reducers time out, get killed, or disconnect early.
>   [Expected Behavior|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleChannelHandler.java#L679]
>   Expected disconnection scenarios should be logged at DEBUG level:
>   - ClosedChannelException
>   - Connection reset by peer
>   - Other ignorable I/O errors matching the existing IGNORABLE_ERROR_MESSAGE pattern
>   Only unexpected failures should be logged at ERROR level.
>   *Impact*
>   - Log noise in production NodeManager shuffle service logs
>   - Difficult to identify real errors among expected disconnections
>   - Unnecessary alerting in monitoring systems that track ERROR log volume
>   [Affected Component|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleChannelHandler.java#L679]
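
The expected behavior in the issue amounts to classifying the failure cause before choosing a log level. Below is a minimal sketch of that classification in plain Java, with no Netty or SLF4J dependency. Assumptions: the class `ShuffleErrorClassifier`, the `LogLevel` enum and the helpers `isIgnorable`/`levelFor` are hypothetical names for illustration only, and the regex is modeled on (not copied from) the existing IGNORABLE_ERROR_MESSAGE pattern, whose exact text in Hadoop may differ. In the real handler, the returned level would decide whether `operationComplete()` logs the ChannelFuture's cause at DEBUG or at ERROR.

```java
import java.io.IOException;
import java.net.SocketException;
import java.nio.channels.ClosedChannelException;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: decides whether a failed shuffle write should be
 * logged at DEBUG (expected client disconnect) or ERROR (real failure).
 */
final class ShuffleErrorClassifier {

  /** Simplified stand-in for real logger levels (assumption). */
  enum LogLevel { DEBUG, ERROR }

  // Assumed regex, modeled on the existing IGNORABLE_ERROR_MESSAGE pattern;
  // the exact expression in Hadoop may differ.
  static final Pattern IGNORABLE_ERROR_MESSAGE = Pattern.compile(
      "^.*(?:connection.*reset|connection.*closed|broken.*pipe).*$",
      Pattern.CASE_INSENSITIVE);

  private ShuffleErrorClassifier() {
  }

  /** True if the cause looks like the client going away mid-transfer. */
  static boolean isIgnorable(Throwable cause) {
    // ClosedChannelException is constructed without a message, so match on type.
    if (cause instanceof ClosedChannelException) {
      return true;
    }
    // Otherwise only I/O errors whose message matches the pattern qualify.
    if (cause instanceof IOException) {
      String message = cause.getMessage();
      return message != null
          && IGNORABLE_ERROR_MESSAGE.matcher(message).matches();
    }
    // Anything else (including a null cause) is treated as unexpected.
    return false;
  }

  /** Expected disconnects go to DEBUG; everything else stays at ERROR. */
  static LogLevel levelFor(Throwable cause) {
    return isIgnorable(cause) ? LogLevel.DEBUG : LogLevel.ERROR;
  }

  public static void main(String[] args) {
    Throwable[] samples = {
        new ClosedChannelException(),
        new SocketException("Connection reset by peer"),
        new IOException("Broken pipe"),
        new IOException("No space left on device"),
        new IllegalStateException("connection reset"),
    };
    for (Throwable t : samples) {
      System.out.println(levelFor(t) + " <- " + t);
    }
  }
}
```

Checking the exception type first avoids depending on `getMessage()` for ClosedChannelException. Restricting the message match to `IOException` (which `SocketException` extends) means a non-I/O failure that merely mentions "connection reset" still surfaces at ERROR, which keeps the change aligned with "only unexpected failures should be logged at ERROR level".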



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
