[
https://issues.apache.org/jira/browse/HADOOP-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032510#comment-18032510
]
ASF GitHub Bot commented on HADOOP-17377:
-----------------------------------------
steveloughran commented on PR #5273:
URL: https://github.com/apache/hadoop/pull/5273#issuecomment-3437618030
Fun test run today against S3 London. Most of the multipart upload/commit
tests were failing with "missing part" errors, whether run from the CLI or
the IDE. Testing against S3 Express was happy.
(`-Dparallel-tests -DtestsThreadCount=8 -Panalytics -Dscale`)
```
[ERROR] ITestS3AHugeMagicCommits.test_030_postCreationAssertions:192 »
AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/ITestS3AHugeMagicCommits/commit/commit.bin:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: JAEYPCZ4P3JYGMTD, Extended Request ID:
O/135mw9Xd2aEuFUh0ICWYc8DLXSpBUWaVGkEgEFGf0xO8o+XlZXY0hI+mvennOGt+C/UI7mNrQ=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: JAEYPCZ4P3JYGMTD, Extended Request ID:
O/135mw9Xd2aEuFUh0ICWYc8DLXSpBUWaVGkEgEFGf0xO8o+XlZXY0hI+mvennOGt+C/UI7mNrQ=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeMagicCommits>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/ITestS3AHugeMagicCommits/commit/commit.bin
in
s3a://stevel-london/job-00/test/tests3ascale/ITestS3AHugeMagicCommits/commit
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_010_CreateHugeFile:276
» AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/array/src/hugefile:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: 1NNBCSX4NCDN7G9X, Extended Request ID:
8vMmeyt1GfjGrf3UL9AN8vlwWSn9860f1gdeIBC3drmcjeQwC6wOPinMD8MSO6ggGw9ywwdcXroGTdVSFLYq0S0VdM/5bYfanDXJ43Eb4QU=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: 1NNBCSX4NCDN7G9X, Extended Request ID:
8vMmeyt1GfjGrf3UL9AN8vlwWSn9860f1gdeIBC3drmcjeQwC6wOPinMD8MSO6ggGw9ywwdcXroGTdVSFLYq0S0VdM/5bYfanDXJ43Eb4QU=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_030_postCreationAssertions:433
» FileNotFound Huge file: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_040_PositionedReadHugeFile:478->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_050_readHugeFile:624->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesArrayBlocks>AbstractSTestS3AHugeFiles.test_100_renameHugeFile:679->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_010_CreateHugeFile:276
» AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/bytebuffer/src/hugefile:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: K0K75V8AH7SVBHS3, Extended Request ID:
kDosbp+Z2PLZn9tVtRF9QfOqh1MgLbIKYaYFn2JeIptXlBV4v1a/wFukoXnaF7fCp6zx3vR8feE0fScUJEw+WhNW9lzu9dBxssOA62UA2kg=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: K0K75V8AH7SVBHS3, Extended Request ID:
kDosbp+Z2PLZn9tVtRF9QfOqh1MgLbIKYaYFn2JeIptXlBV4v1a/wFukoXnaF7fCp6zx3vR8feE0fScUJEw+WhNW9lzu9dBxssOA62UA2kg=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_030_postCreationAssertions:433
» FileNotFound Huge file: not found
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_040_PositionedReadHugeFile:478->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_050_readHugeFile:624->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src
[ERROR]
ITestS3AHugeFilesByteBufferBlocks>AbstractSTestS3AHugeFiles.test_100_renameHugeFile:679->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/bytebuffer/src
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_010_CreateHugeFile:276
» AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/disk/src/hugefile:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: 73T4YAYRWE63WAW5, Extended Request ID:
6ucEY2heh2NsxE8dBrlZp9AE4Tb+hbvnyxea1/yp5H85BEvkQdYsfNlRH5XZM1g4hHPDSoGMVtM=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: 73T4YAYRWE63WAW5, Extended Request ID:
6ucEY2heh2NsxE8dBrlZp9AE4Tb+hbvnyxea1/yp5H85BEvkQdYsfNlRH5XZM1g4hHPDSoGMVtM=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_030_postCreationAssertions:433
» FileNotFound Huge file: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_040_PositionedReadHugeFile:478->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_050_readHugeFile:624->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesDiskBlocks>AbstractSTestS3AHugeFiles.test_100_renameHugeFile:679->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_010_CreateHugeFile:276
» AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/disk/src/hugefile:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: ZSY181YB49GQFR83, Extended Request ID:
FrPEfsXO3Gbhxi3m4ZmyYSiyfscQ1QSm/1lKjRPLHEbLWH5vtGked+fHvZl281Dm6u013/5VP6pj42h4XISftk7p9uEIDGw31E7Ymcoviq4=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: ZSY181YB49GQFR83, Extended Request ID:
FrPEfsXO3Gbhxi3m4ZmyYSiyfscQ1QSm/1lKjRPLHEbLWH5vtGked+fHvZl281Dm6u013/5VP6pj42h4XISftk7p9uEIDGw31E7Ymcoviq4=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_030_postCreationAssertions:433
» FileNotFound Huge file: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_040_PositionedReadHugeFile:478->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_050_readHugeFile:624->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesSSECDiskBlocks>AbstractSTestS3AHugeFiles.test_100_renameHugeFile:679->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/disk/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/disk/src
[ERROR]
ITestS3AHugeFilesStorageClass.test_010_CreateHugeFile:74->AbstractSTestS3AHugeFiles.test_010_CreateHugeFile:276
» AWSBadRequest Completing multipart upload on
job-00/test/tests3ascale/array/src/hugefile:
software.amazon.awssdk.services.s3.model.S3Exception: One or more of the
specified parts could not be found. The part may not have been uploaded, or
the specified entity tag may not match the part's entity tag. (Service: S3,
Status Code: 400, Request ID: APYCQNP1GY02DGDE, Extended Request ID:
lE0hQJ67sSwCYSMmO7tDEAvEIOCcpwIbLdfqqrNTpWT0bHIaacaIEzZusajj79rnFQlWudxsMHBIUXdS9ELiKR0T923lcULZy4Essx1LoTs=)
(SDK Attempt Count: 1):InvalidPart: One or more of the specified parts could
not be found. The part may not have been uploaded, or the specified entity tag
may not match the part's entity tag. (Service: S3, Status Code: 400, Request
ID: APYCQNP1GY02DGDE, Extended Request ID:
lE0hQJ67sSwCYSMmO7tDEAvEIOCcpwIbLdfqqrNTpWT0bHIaacaIEzZusajj79rnFQlWudxsMHBIUXdS9ELiKR0T923lcULZy4Essx1LoTs=)
(SDK Attempt Count: 1)
[ERROR]
ITestS3AHugeFilesStorageClass.test_030_postCreationAssertions:81->AbstractSTestS3AHugeFiles.test_030_postCreationAssertions:433
» FileNotFound Huge file: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesStorageClass>AbstractSTestS3AHugeFiles.test_045_vectoredIOHugeFile:538->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[ERROR]
ITestS3AHugeFilesStorageClass.test_100_renameHugeFile:108->AbstractSTestS3AHugeFiles.assumeHugeFileExists:404->AbstractSTestS3AHugeFiles.assumeFileExists:414
» FileNotFound huge file not created: not found
s3a://stevel-london/job-00/test/tests3ascale/array/src/hugefile in
s3a://stevel-london/job-00/test/tests3ascale/array/src
[INFO]
[ERROR] Tests run: 124, Failures: 1, Errors: 30, Skipped: 13
[INFO]
```
This has to be some transient issue with my S3 London bucket, as if
in-progress upload parts were not being retained. I've never seen this
before; the expiry time is set to 24h.
> ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata
> Service
> --------------------------------------------------------------------------------
>
> Key: HADOOP-17377
> URL: https://issues.apache.org/jira/browse/HADOOP-17377
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 3.2.1
> Reporter: Brandon
> Priority: Major
> Labels: pull-request-available
>
> *Summary*
> The instance metadata service has its own guidance for error handling and
> retry, which differs from that of the Blob store.
> [https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#error-handling]
> In particular, it responds with HTTP 429 if the request rate is too high,
> whereas the Blob store responds with HTTP 503. The retry policy used only
> accounts for the latter, as it retries any status >=500. This can result in
> job instability when running multiple processes on the same host.
> *Environment*
> * Spark talking to an ABFS store
> * Hadoop 3.2.1
> * Running on an Azure VM with user-assigned identity, ABFS configured to use
> MsiTokenProvider
> * 6 executor processes on each VM
> *Example*
> Here's an example error message and stack trace. It's always the same stack
> trace. This appears in logs a few hundred to low thousands of times a day.
> It's luckily skating by since the download operation is wrapped in 3 retries.
> {noformat}
> AADToken: HTTP connection failed for getting token from AzureAD. Http
> response: 429 null
> Content-Type: application/json; charset=utf-8 Content-Length: 90 Request ID:
> Proxies: none
> First 1K of Body: {"error":"invalid_request","error_description":"Temporarily
> throttled, too many requests"}
> at
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:190)
> at
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:125)
> at
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:506)
> at
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:489)
> at
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:208)
> at
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:473)
> at
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:437)
> at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1717)
> at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:724)
> at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
> at
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:812)
> at
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:803)
> at
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
> at
> scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
> at
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:803)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){noformat}
> CC [~mackrorysd], [[email protected]]
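
To illustrate the gap described in the issue summary (a policy that retries
any status >=500 never retries a 429), here is a minimal sketch of a retry
check that also treats HTTP 429 as retryable, paired with a capped
exponential backoff. The class `ImdsRetrySketch`, its method names, and the
delay values are hypothetical and chosen for illustration only; this is not
the ABFS retry policy or the fix adopted for HADOOP-17377.

```java
/** Hypothetical sketch only; not the actual ABFS retry policy. */
final class ImdsRetrySketch {
  /** HTTP status returned when the request rate is too high. */
  static final int HTTP_TOO_MANY_REQUESTS = 429;

  private ImdsRetrySketch() {
  }

  /**
   * Decide whether a failed token request should be retried.
   * Retries throttling (429) as well as server errors (>= 500),
   * until the attempt budget is used up.
   */
  static boolean shouldRetry(int httpStatus, int attempt, int maxAttempts) {
    if (attempt >= maxAttempts) {
      return false;
    }
    return httpStatus == HTTP_TOO_MANY_REQUESTS || httpStatus >= 500;
  }

  /**
   * Exponential backoff: baseMillis * 2^attempt, capped at maxMillis.
   * The exponent is clamped so the shift cannot overflow for small bases.
   */
  static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
    int exponent = Math.max(0, Math.min(attempt, 20));
    return Math.min(baseMillis << exponent, maxMillis);
  }

  public static void main(String[] args) {
    // Assumed values for illustration: 3 attempts, 500 ms base, 60 s cap.
    int[] statuses = {429, 503, 404};
    for (int status : statuses) {
      System.out.println(status + " retry on first failure? "
          + shouldRetry(status, 0, 3));
    }
    for (int attempt = 0; attempt < 3; attempt++) {
      System.out.println("attempt " + attempt + " delay ms: "
          + backoffMillis(attempt, 500L, 60_000L));
    }
  }
}
```

With six executor processes per VM all requesting tokens, adding random
jitter to the delay would probably help avoid synchronized retries, but that
is left out of the sketch to keep it short.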