This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new eaadb39c2950 [SPARK-49300][CORE][3.5] Fix Hadoop delegation token leak when tokenRenewalInterval is not set
eaadb39c2950 is described below

commit eaadb39c29509881d0778432e226ab0357e688c8
Author: zhangshuyan <[email protected]>
AuthorDate: Thu Aug 22 13:29:46 2024 -0700

    [SPARK-49300][CORE][3.5] Fix Hadoop delegation token leak when tokenRenewalInterval is not set
    
    Backport from master.
    
    ### What changes were proposed in this pull request?
    
    Cancel delegation tokens once they have been used in `getTokenRenewalInterval`.
    
    ### Why are the changes needed?
    
    When `tokenRenewalInterval` is not set, `HadoopFSDelegationTokenProvider#getTokenRenewalInterval` fetches delegation tokens and renews them to obtain an interval value:
    
    https://github.com/apache/spark/blob/dd259b0b27841e6dd7c07f8ca3cc05d275863dd5/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L60-L64
    
    `cancel()` is never called on these tokens, so a large number of them remain on HDFS and are not cleared in a timely manner, which puts additional pressure on the HDFS server.
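    
    The leak can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop API: `FakeTokenServer`, `FakeToken`, and `probeInterval` are hypothetical names invented for illustration, and the server only counts tokens that have been issued but not yet cancelled. The `probeInterval` helper follows the shape of the patched code (renew once, compute the interval, then cancel), under the assumption that cancelling removes the token from the server's live set.
    
    ```scala
    import scala.collection.mutable
    import scala.util.Try
    
    // Hypothetical stand-in for the token service (e.g. an HDFS NameNode); not a Hadoop class.
    final class FakeTokenServer(renewIntervalMs: Long) {
      private val live = mutable.Set.empty[Int]
      private var nextId = 0
    
      // Issue a new token and remember it as live until it is cancelled.
      def issue(nowMs: Long): FakeToken = {
        nextId += 1
        live += nextId
        new FakeToken(nextId, nowMs, this)
      }
    
      // Renewing returns the new expiration time; a cancelled token cannot be renewed.
      def renew(token: FakeToken, nowMs: Long): Long = {
        require(live.contains(token.id), "token already cancelled")
        nowMs + renewIntervalMs
      }
    
      def cancel(token: FakeToken): Unit = live -= token.id
    
      def liveTokenCount: Int = live.size
    }
    
    // Hypothetical token type; only models the renew/cancel calls relevant here.
    final class FakeToken(val id: Int, val issueDateMs: Long, server: FakeTokenServer) {
      def renew(nowMs: Long): Long = server.renew(this, nowMs)
      def cancel(): Unit = server.cancel(this)
    }
    
    object RenewalIntervalSketch {
      // Fetch a probe token, renew it once to learn the interval, and optionally cancel it.
      def probeInterval(
          server: FakeTokenServer,
          nowMs: Long,
          cancelAfterUse: Boolean): Option[Long] =
        Try {
          val token = server.issue(nowMs)
          val newExpiration = token.renew(nowMs)
          val interval = newExpiration - token.issueDateMs
          // The probe token is only needed for the interval, so release it when asked to.
          if (cancelAfterUse) token.cancel()
          interval
        }.toOption
    
      def main(args: Array[String]): Unit = {
        val server = new FakeTokenServer(renewIntervalMs = 86400000L)
        // Old behaviour: every probe leaves one live token behind.
        (1 to 3).foreach(_ => probeInterval(server, 0L, cancelAfterUse = false))
        println(s"live tokens without cancel: ${server.liveTokenCount}")
        // Patched behaviour: probes are cancelled, so the count does not grow further.
        (1 to 3).foreach(_ => probeInterval(server, 0L, cancelAfterUse = true))
        println(s"live tokens after cancelling probes: ${server.liveTokenCount}")
      }
    }
    ```
    
    In this model, the first loop leaves one live token per probe, while the second loop adds none, which is the effect the added `token.cancel(hadoopConf)` call is meant to have on the real server.
    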
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    manual test
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    no
    
    Closes #47823 from zhangshuyan0/branch-3.5-zsy.
    
    Authored-by: zhangshuyan <[email protected]>
    Signed-off-by: Chao Sun <[email protected]>
---
 .../apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
index 6ec281f5b440..c3f931f356ea 100644
--- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
@@ -146,6 +146,9 @@ private[deploy] class HadoopFSDelegationTokenProvider
         val tokenKind = token.getKind.toString
         val interval = newExpiration - getIssueDate(tokenKind, identifier)
         logInfo(s"Renewal interval is $interval for token $tokenKind")
+        // The token here is only used to obtain renewal intervals. We should cancel it in
+        // a timely manner to avoid causing additional pressure on the server.
+        token.cancel(hadoopConf)
         interval
       }.toOption
     }


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
