[ 
https://issues.apache.org/jira/browse/HADOOP-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032541#comment-18032541
 ] 

ASF GitHub Bot commented on HADOOP-17377:
-----------------------------------------

anmolanmol1234 commented on code in PR #5273:
URL: https://github.com/apache/hadoop/pull/5273#discussion_r2456039983


##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsMsiTokenProvider.java:
##########
@@ -90,4 +99,55 @@ private String getTrimmedPasswordString(AbfsConfiguration 
conf, String key,
     return value.trim();
   }
 
+  /**
+   * Test to verify that token fetch is retried for throttling errors (too 
many requests 429).
+   */
+  @Test
+  public void testRetryForThrottling() throws Exception {
+    AbfsConfiguration conf = getConfiguration();
+
+    // Exception to be thrown with throttling error code 429.
+    AzureADAuthenticator.HttpException httpException
+        = new AzureADAuthenticator.HttpException(HTTP_TOO_MANY_REQUESTS,
+        "abc", "abc", "abc", "abc", "abc");
+
+    String tenantGuid = "abcd";
+    String clientId = "abcd";
+    String authEndpoint = getTrimmedPasswordString(conf,
+        FS_AZURE_ACCOUNT_OAUTH_MSI_ENDPOINT,
+        DEFAULT_FS_AZURE_ACCOUNT_OAUTH_MSI_ENDPOINT);
+    String authority = getTrimmedPasswordString(conf,
+        FS_AZURE_ACCOUNT_OAUTH_MSI_AUTHORITY,
+        DEFAULT_FS_AZURE_ACCOUNT_OAUTH_MSI_AUTHORITY);
+
+    // Mock the getTokenSingleCall to throw exception so the retry logic comes 
into place.
+    try (MockedStatic<AzureADAuthenticator> adAuthenticator = 
Mockito.mockStatic(
+        AzureADAuthenticator.class, Mockito.CALLS_REAL_METHODS)) {
+      adAuthenticator.when(
+          () -> AzureADAuthenticator.getTokenSingleCall(Mockito.anyString(),
+              Mockito.anyString(), Mockito.any(), Mockito.anyString(),
+              Mockito.anyBoolean())).thenThrow(httpException);
+
+      // Mock the tokenFetchRetryPolicy to verify retries.
+      ExponentialRetryPolicy exponentialRetryPolicy = Mockito.spy(
+          conf.getOauthTokenFetchRetryPolicy());
+      Field tokenFetchRetryPolicy = 
AzureADAuthenticator.class.getDeclaredField(

Review Comment:
   taken





> ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata 
> Service
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-17377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17377
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.2.1
>            Reporter: Brandon
>            Priority: Major
>              Labels: pull-request-available
>
> *Summary*
>  The instance metadata service has its own guidance for error handling and 
> retry which are different from the Blob store. 
> [https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#error-handling]
> In particular, it responds with HTTP 429 if request rate is too high. Whereas 
> Blob store will respond with HTTP 503. The retry policy used only accounts 
> for the latter as it will retry any status >=500. This can result in job 
> instability when running multiple processes on the same host.
> *Environment*
>  * Spark talking to an ABFS store
>  * Hadoop 3.2.1
>  * Running on an Azure VM with user-assigned identity, ABFS configured to use 
> MsiTokenProvider
>  * 6 executor processes on each VM
> *Example*
>  Here's an example error message and stack trace. It's always the same stack 
> trace. This appears in logs a few hundred to low thousands of times a day. 
> It's luckily skating by since the download operation is wrapped in 3 retries.
> {noformat}
> AADToken: HTTP connection failed for getting token from AzureAD. Http 
> response: 429 null
> Content-Type: application/json; charset=utf-8 Content-Length: 90 Request ID:  
> Proxies: none
> First 1K of Body: {"error":"invalid_request","error_description":"Temporarily 
> throttled, too many requests"}
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:190)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:125)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:506)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:489)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:208)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:473)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:437)
>       at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1717)
>       at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
>       at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:724)
>       at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
>       at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:812)
>       at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:803)
>       at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
>       at 
> scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>       at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>       at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>       at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
>       at 
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:803)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748){noformat}
>  CC [~mackrorysd], [[email protected]]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to