This is an automated email from the ASF dual-hosted git repository.
sunchao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new eaadb39c2950 [SPARK-49300][CORE][3.5] Fix Hadoop delegation token leak when tokenRenewalInterval is not set
eaadb39c2950 is described below
commit eaadb39c29509881d0778432e226ab0357e688c8
Author: zhangshuyan <[email protected]>
AuthorDate: Thu Aug 22 13:29:46 2024 -0700
[SPARK-49300][CORE][3.5] Fix Hadoop delegation token leak when tokenRenewalInterval is not set
Backport from master.
### What changes were proposed in this pull request?
Cancel delegation tokens once they have been used in `getTokenRenewalInterval`.
### Why are the changes needed?
When `tokenRenewalInterval` is not set,
`HadoopFSDelegationTokenProvider#getTokenRenewalInterval` fetches some tokens
and renews them to obtain an interval value.
https://github.com/apache/spark/blob/dd259b0b27841e6dd7c07f8ca3cc05d275863dd5/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L60-L64
`cancel()` is never called on these tokens, so a large number of them linger
on HDFS instead of being cleared promptly, which puts additional pressure on
the HDFS server.
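The leak can be illustrated with a small, self-contained sketch. It does not use the Hadoop API: `FakeToken`, `FakeTokenServer`, `RenewalIntervalSketch`, and the `cancelAfterUse` flag are hypothetical names introduced only for this illustration, and the server is an in-memory stand-in for a token service such as the HDFS NameNode.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical token handle: just an id and the time it was issued.
final case class FakeToken(id: Int, issueDateMs: Long)

// Hypothetical in-memory stand-in for a token service. It remembers every
// token that has been issued and not yet cancelled.
final class FakeTokenServer(renewIntervalMs: Long) {
  private val live = mutable.Set.empty[Int]
  private var nextId = 0

  def issue(nowMs: Long): FakeToken = {
    nextId += 1
    live += nextId
    FakeToken(nextId, nowMs)
  }

  // Returns the new expiration time for a live token.
  def renew(token: FakeToken, nowMs: Long): Long = {
    require(live.contains(token.id), s"token ${token.id} is not live")
    nowMs + renewIntervalMs
  }

  def cancel(token: FakeToken): Unit = live -= token.id

  def liveTokenCount: Int = live.size
}

object RenewalIntervalSketch {
  // Fetch a throwaway token, renew it once to learn the interval, and
  // optionally cancel it afterwards (the step this patch adds).
  def probeInterval(
      server: FakeTokenServer,
      nowMs: Long,
      cancelAfterUse: Boolean): Option[Long] =
    Try {
      val token = server.issue(nowMs)
      val newExpiration = server.renew(token, nowMs)
      val interval = newExpiration - token.issueDateMs
      if (cancelAfterUse) server.cancel(token)
      interval
    }.toOption
}
```

In this model, every call to `probeInterval` with `cancelAfterUse = false` leaves one more live token on the server, while calling it with `cancelAfterUse = true` returns the same interval and leaves `liveTokenCount` unchanged.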
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
manual test
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #47823 from zhangshuyan0/branch-3.5-zsy.
Authored-by: zhangshuyan <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
---
.../apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala | 3 +++
1 file changed, 3 insertions(+)
diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
index 6ec281f5b440..c3f931f356ea 100644
--- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
@@ -146,6 +146,9 @@ private[deploy] class HadoopFSDelegationTokenProvider
val tokenKind = token.getKind.toString
val interval = newExpiration - getIssueDate(tokenKind, identifier)
logInfo(s"Renewal interval is $interval for token $tokenKind")
+ // The token here is only used to obtain renewal intervals. We should cancel it in
+ // a timely manner to avoid causing additional pressure on the server.
+ token.cancel(hadoopConf)
interval
}.toOption
}