[ https://issues.apache.org/jira/browse/HADOOP-19877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080684#comment-18080684 ]
ASF GitHub Bot commented on HADOOP-19877:
-----------------------------------------
ajfabbri commented on code in PR #8467:
URL: https://github.com/apache/hadoop/pull/8467#discussion_r3227703041
##########
.github/workflows/cloud_aws.yml:
##########
@@ -0,0 +1,45 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Cloud-AWS"
+
+on:
+ pull_request:
+ paths:
+ - 'hadoop-tools/hadoop-aws/**'
+ - '.github/workflows/*cloud_aws.yml'
+ - '.github/actions/build_image**'
+ - '.github/gha-tests/hadoop-aws*excludes.txt'
+
+jobs:
+ run-aws-integration:
+ # Security: write privileges are needed to update PR status and upload test results.
+ # Package write is for building toolchain container images on demand, but ghcr.io access is
+ # scoped to the repository the actions run on.
+ permissions:
+ packages: write
+ checks: write
Review Comment:
Can this be removed here?
##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
Review Comment:
From test discussion w/ @steveloughran: we may be able to disable these via the
s3a test config (see testing.md, under 3rd party).
##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+ workflow_call:
+ inputs:
+ java:
+ required: false
+ type: string
+ default: 17
+ toolchain_branch:
+ required: false
+ type: string
+ description: Branch to use for toolchain image build
+ default: trunk
+ os:
+ required: false
+ type: string
+ description: OS for container to run the build in
+ default: ubuntu_24
+ runner_os:
+ required: false
+ type: string
+ description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+ default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+ group: >-
+ cloud-aws
+ ${{ github.workflow }}
+ ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+ ${{ inputs.java }}
+ ${{ inputs.toolchain_branch }}
+ ${{ inputs.os }}
+ cancel-in-progress: true
+
+env:
+ BUCKET_NAME: hadoop-ci
+
+jobs:
+ precondition:
+ runs-on: ${{ inputs.runner_os }}
+ outputs:
+ build_image_url: ${{ steps.img.outputs.build_image_url }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ # Full fetch so build image URL can be computed for any branch
+ fetch-depth: 0
+ - uses: ./.github/actions/build_image_url
+ id: img
+ with:
+ os: ${{ inputs.os }}
+ branch: ${{ inputs.toolchain_branch }}
+ - name: debug base_image_url
+ run: |
+ echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+ build-image:
+ name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+ runs-on: ${{ inputs.runner_os }}
+ needs: [ precondition ]
+ permissions:
+ packages: write
+ outputs:
+ uid: ${{ steps.build_img.outputs.uid }}
+ steps:
+ - name: debug build url
+ run: |
+ echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+ - uses: actions/checkout@v6
+ - uses: ./.github/actions/build_image
+ id: build_img
+ with:
+ branch: ${{ inputs.toolchain_branch }}
+ os: ${{ inputs.os }}
+ build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+ test:
+ name: S3A Integration Tests (Java ${{ inputs.java }})
+ needs: [ precondition, build-image ]
+ runs-on: ${{ inputs.runner_os }}
+ permissions:
+ # Security: Minimal permissions for the test runner. Reporting happens in
+ # report_cloud_aws.yml.
+ contents: read
+ services:
+ localstack:
+ image: localstack/localstack:latest
Review Comment:
It would be nice to pin this to a specific version a bit behind latest, to mitigate supply chain issues.
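A minimal sketch of how a CI lint step might catch floating tags such as `:latest`. The helper name `is_pinned_image` is hypothetical and not part of this PR; it assumes that a digest reference, or any explicit tag other than `latest`, counts as pinned:

```shell
#!/usr/bin/env bash
# Hypothetical lint helper (an assumption, not code from the PR).
# Returns 0 if the image reference is pinned, 1 if it floats.
is_pinned_image() {
  local ref="$1"
  # A digest pin (name@sha256:...) is the strongest form of pinning.
  case "$ref" in
    *@sha256:*) return 0 ;;
  esac
  # Look only at the last path component so that a registry port
  # (e.g. localhost:5000/name) is not mistaken for a tag.
  local last="${ref##*/}"
  case "$last" in
    *:latest) return 1 ;;  # explicit floating tag
    *:?*)     return 0 ;;  # explicit tag other than "latest"
    *)        return 1 ;;  # no tag at all means an implicit "latest"
  esac
}

# Example: report on the reference currently used by the workflow.
if is_pinned_image "localstack/localstack:latest"; then
  echo "image is pinned"
else
  echo "image is not pinned"
fi
```

Pinning to a digest gives the strongest guarantee; a version tag is easier to read and bump, but can still be re-pushed upstream.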
##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+ workflow_call:
+ inputs:
+ java:
+ required: false
+ type: string
+ default: 17
+ toolchain_branch:
+ required: false
+ type: string
+ description: Branch to use for toolchain image build
+ default: trunk
+ os:
+ required: false
+ type: string
+ description: OS for container to run the build in
+ default: ubuntu_24
+ runner_os:
+ required: false
+ type: string
+ description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+ default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+ group: >-
+ cloud-aws
+ ${{ github.workflow }}
+ ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+ ${{ inputs.java }}
+ ${{ inputs.toolchain_branch }}
+ ${{ inputs.os }}
+ cancel-in-progress: true
+
+env:
+ BUCKET_NAME: hadoop-ci
+
+jobs:
+ precondition:
+ runs-on: ${{ inputs.runner_os }}
+ outputs:
+ build_image_url: ${{ steps.img.outputs.build_image_url }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ # Full fetch so build image URL can be computed for any branch
+ fetch-depth: 0
+ - uses: ./.github/actions/build_image_url
+ id: img
+ with:
+ os: ${{ inputs.os }}
+ branch: ${{ inputs.toolchain_branch }}
+ - name: debug base_image_url
+ run: |
+ echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+ build-image:
+ name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+ runs-on: ${{ inputs.runner_os }}
+ needs: [ precondition ]
+ permissions:
+ packages: write
+ outputs:
+ uid: ${{ steps.build_img.outputs.uid }}
+ steps:
+ - name: debug build url
+ run: |
+ echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+ - uses: actions/checkout@v6
+ - uses: ./.github/actions/build_image
+ id: build_img
+ with:
+ branch: ${{ inputs.toolchain_branch }}
+ os: ${{ inputs.os }}
+ build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+ test:
+ name: S3A Integration Tests (Java ${{ inputs.java }})
+ needs: [ precondition, build-image ]
+ runs-on: ${{ inputs.runner_os }}
+ permissions:
+ # Security: Minimal permissions for the test runner. Reporting happens in
+ # report_cloud_aws.yml.
+ contents: read
+ services:
+ localstack:
+ image: localstack/localstack:latest
+ # Despite examples showing a `ports:` section, "You don't need to
+ # configure any ports for service containers. By default, all
+ # containers that are part of the same Docker network expose all ports
+ # to each other, and no ports are exposed outside of the Docker
+ # network." See:
+ # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+ env:
+ SERVICES: s3,kms
+ AWS_DEFAULT_REGION: us-west-2
+ AWS_ACCESS_KEY_ID: test
+ AWS_SECRET_ACCESS_KEY: test
+ LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+ LOCALSTACK_HOST: s3.localstack
+
+ # Performance: Disable image's health check (localstack readiness): it typically takes less
+ # than a minute, and the Maven build that runs first takes longer than that.
+ # Also need to specify a dummy health-cmd or the github runner fails.
+ options: >-
+ --health-cmd "exit 0"
+ --health-interval 1s
+ --health-retries 1
+ --network-alias s3.localstack
+
+ container:
+ image: ${{ needs.precondition.outputs.build_image_url }}
+ options: >-
+ --user ${{ needs.build-image.outputs.uid }}
+ env:
+ # mvn verify doesn't return failure exit code due to HADOOP-18040
+ # (which seems incorrect, but let's just override this for now)
+ MAVEN_OPTS: >-
+ -Dmaven.test.failure.ignore=false
+ -Dmaven.repo.local=.m2/repository
+ -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+ steps:
+ - uses: actions/checkout@v6
+ # Performance: Caching TODO: We need to create a centralized maven build cache that is
+ # built on trunk. This will always miss on a new PR: Caches can't be
+ # shared between PR branches. PR branches *can* access caches from their
+ # base branch, though. See:
+ # https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#restrictions-for-accessing-a-cache
+ # As-is, first run on a PR always misses. Subsequent cached builds see >100% speedup.
+ - name: Restore build cache
+ id: restore_cache
+ uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+ with:
+ path: |
+ .m2/repository/
+ key: ${{ inputs.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ inputs.toolchain_branch }}
+ # falls back to less-specific caches if exact key not found
+ restore-keys: |
+ ${{ inputs.os }}-maven-${{ hashFiles('**/pom.xml') }}-
+ ${{ inputs.os }}-maven-
+ - name: Setup JDK ${{ inputs.java }}
+ uses: actions/setup-java@v5
+ with:
+ distribution: zulu
+ java-version: ${{ inputs.java }}
+
+ - name: Maven Build
+ shell: bash
+ run: |
+ mkdir -p .m2/repository
+ ./mvnw -B -T 1C --no-transfer-progress -DskipTests -am -pl hadoop-tools/hadoop-aws \
+ -Dmaven.javadoc.skip -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip install
+
+ - name: Check local maven repository
+ shell: bash
+ run: |
+ if [ -d ".m2/repository" ]; then
+ echo "Maven repository at $(pwd)/.m2/repository"
+ else
+ echo ".m2/repository not found in $(pwd)"
+ exit 1
+ fi
+
+ - name: Save build cache
+ uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+ if: ${{ !cancelled() && steps.restore_cache.outputs.cache-hit != 'true' }}
+ with:
+ path: |
+ .m2/repository/
+ key: ${{ steps.restore_cache.outputs.cache-primary-key }}
+
+ - name: Create localstack s3 bucket
+ shell: bash
+ run: |
+ # Use curl to interact with localstack if awslocal is not in the container
+ curl -X PUT "http://localstack:4566/${{ env.BUCKET_NAME }}"
+
+ - name: Register bucket hostname
+ shell: bash
+ run: |
+ # Resolve s3.localstack IP and add entry for virtual-hosted bucket FQDN
+ LSIP=$(getent hosts s3.localstack | awk '{print $1}')
+ echo "Registering ${{ env.BUCKET_NAME }}.s3.localstack as ${LSIP}"
+ echo "${LSIP} ${{ env.BUCKET_NAME }}.s3.localstack" | sudo tee -a /etc/hosts
+
+ - name: S3A Integration Tests
+ shell: bash
+ env:
+ AWS_ACCESS_KEY_ID: test
+ AWS_SECRET_ACCESS_KEY: test
+ AWS_REGION: us-west-2
+ AWS_DEFAULT_REGION: us-west-2
+ run: |
+ cp .github/workflows/templates/auth-keys.xml.tmpl \
+ hadoop-tools/hadoop-aws/src/test/resources/auth-keys.xml
+ ./mvnw verify -B -pl hadoop-tools/hadoop-aws \
+ -Dparallel-tests \
+ -DskipITs=false \
+ -Dtest=none \
+ -Dscale=true \
Review Comment:
Future: we should also increase the fs.s3a.scale.test.huge.filesize parameter in
our config (auth-keys).
##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+ workflow_call:
+ inputs:
+ java:
+ required: false
+ type: string
+ default: 17
+ toolchain_branch:
+ required: false
+ type: string
+ description: Branch to use for toolchain image build
+ default: trunk
+ os:
+ required: false
+ type: string
+ description: OS for container to run the build in
+ default: ubuntu_24
+ runner_os:
+ required: false
+ type: string
+ description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+ default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+ group: >-
+ cloud-aws
+ ${{ github.workflow }}
+ ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+ ${{ inputs.java }}
+ ${{ inputs.toolchain_branch }}
+ ${{ inputs.os }}
+ cancel-in-progress: true
+
+env:
+ BUCKET_NAME: hadoop-ci
+
+jobs:
+ precondition:
+ runs-on: ${{ inputs.runner_os }}
+ outputs:
+ build_image_url: ${{ steps.img.outputs.build_image_url }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ # Full fetch so build image URL can be computed for any branch
+ fetch-depth: 0
+ - uses: ./.github/actions/build_image_url
+ id: img
+ with:
+ os: ${{ inputs.os }}
+ branch: ${{ inputs.toolchain_branch }}
+ - name: debug base_image_url
+ run: |
+ echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+ build-image:
+ name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+ runs-on: ${{ inputs.runner_os }}
+ needs: [ precondition ]
+ permissions:
+ packages: write
+ outputs:
+ uid: ${{ steps.build_img.outputs.uid }}
+ steps:
+ - name: debug build url
+ run: |
+ echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+ - uses: actions/checkout@v6
+ - uses: ./.github/actions/build_image
+ id: build_img
+ with:
+ branch: ${{ inputs.toolchain_branch }}
+ os: ${{ inputs.os }}
+ build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+ test:
+ name: S3A Integration Tests (Java ${{ inputs.java }})
+ needs: [ precondition, build-image ]
+ runs-on: ${{ inputs.runner_os }}
+ permissions:
+ # Security: Minimal permissions for the test runner. Reporting happens in
+ # report_cloud_aws.yml.
+ contents: read
+ services:
+ localstack:
+ image: localstack/localstack:latest
+ # Despite examples showing a `ports:` section, "You don't need to
+ # configure any ports for service containers. By default, all
+ # containers that are part of the same Docker network expose all ports
+ # to each other, and no ports are exposed outside of the Docker
+ # network." See:
+ # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+ env:
+ SERVICES: s3,kms
+ AWS_DEFAULT_REGION: us-west-2
+ AWS_ACCESS_KEY_ID: test
+ AWS_SECRET_ACCESS_KEY: test
+ LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+ LOCALSTACK_HOST: s3.localstack
+
+ # Performance: Disable image's health check (localstack readiness): it typically takes less
+ # than a minute, and the Maven build that runs first takes longer than that.
+ # Also need to specify a dummy health-cmd or the github runner fails.
+ options: >-
+ --health-cmd "exit 0"
+ --health-interval 1s
+ --health-retries 1
+ --network-alias s3.localstack
+
+ container:
+ image: ${{ needs.precondition.outputs.build_image_url }}
+ options: >-
+ --user ${{ needs.build-image.outputs.uid }}
+ env:
+ # mvn verify doesn't return failure exit code due to HADOOP-18040
+ # (which seems incorrect, but let's just override this for now)
+ MAVEN_OPTS: >-
+ -Dmaven.test.failure.ignore=false
+ -Dmaven.repo.local=.m2/repository
+ -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+ steps:
+ - uses: actions/checkout@v6
+ # Performance: Caching TODO: We need to create a centralized maven build cache that is
+ # built on trunk. This will always miss on a new PR: Caches can't be
+ # shared between PR branches. PR branches *can* access caches from their
+ # base branch, though. See:
+ # https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#restrictions-for-accessing-a-cache
+ # As-is, first run on a PR always misses. Subsequent cached builds see >100% speedup.
Review Comment:
Discussion: even a shared cache (built regularly on trunk) that *excludes*
any hadoop artifacts would get us most of the speedup, and still ensure we have
"pure" builds WRT hadoop code.
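A rough sketch of the pruning step such a shared cache might run before saving. The helper name `prune_hadoop_artifacts` is hypothetical and not part of this PR, and it assumes every locally built Hadoop module installs under the `org.apache.hadoop` groupId prefix:

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not code from the PR): drop locally
# built Hadoop artifacts from a Maven repository so that only third-party
# dependencies end up in the shared cache.
prune_hadoop_artifacts() {
  # Default matches the -Dmaven.repo.local value used in the workflow.
  local repo="${1:-.m2/repository}"
  # Assumption: everything under org/apache/hadoop was produced by this
  # build. Separately released artifacts that share the prefix are removed
  # too and would simply be downloaded again on the next run.
  rm -rf "${repo}/org/apache/hadoop"
}

# Example: prune a throwaway repository layout.
demo_repo="$(mktemp -d)"
mkdir -p "${demo_repo}/org/apache/hadoop/hadoop-aws" \
         "${demo_repo}/org/apache/commons/commons-lang3"
prune_hadoop_artifacts "${demo_repo}"
ls "${demo_repo}/org/apache"
rm -rf "${demo_repo}"
```

Running this between the Maven build and the cache save step would keep third-party jars cached while forcing every Hadoop module to be rebuilt from the PR's own source.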
##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+# expected: <true> but was: <false>
Review Comment:
Test discussion: Hint: see if this is related to timestamps.
##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+# expected: <true> but was: <false>
+# 2. testUpdateDeepDirectoryStructureToRemote():
+# AssertionFailedError: Files Copied value 2 above maximum 1 ==> expected: <true> but was: <false>
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
+
+# A number of failures with vectored read tests
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
+
+# Access key errors:
+# (test case)->AbstractS3ATestBase.setup:111->AbstractFSContractTestBase.setup:197->AbstractFSContractTestBase.mkdirs:355
+# » AccessDenied s3a://hadoop-ci/job-00/test: getFileStatus on
+# s3a://hadoop-ci/job-00/test:
+# software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you
+# provided does not exist in our records. (Service: S3, Status Code: 403
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
+**/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
+
+# Localstack issue (guessing lack of persistence of upload parts across sessions)
+**/org/apache/hadoop/fs/s3a/commit/ITestUploadRecovery.java
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMultipartUploader.java
+
+# Error: ITestConnectionTimeouts.testObjectUploadTimeouts:258 Expected a
+# java.lang.Exception to be thrown, but got the result: : "01234567890123456789..."
+**/org/apache/hadoop/fs/s3a/impl/ITestConnectionTimeouts.java
+
+# ITestS3AAWSCredentialsProvider.testBadCredentials:183->lambda$testBadCredentials$0:184
+# ->createFailingFS:160 Expected exception - got S3AFileSystem{
+**/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
+
+# testSeeksWithLruEviction java.util.concurrent.TimeoutException: timed out
+# after 180 seconds
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java
Review Comment:
Test Discussion: Note: this prefetch implementation is problematic, and we
may end up dropping it. Vectored IO addresses some of this, and the Amazon
analytics input stream does the rest better.
##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+# expected: <true> but was: <false>
+# 2. testUpdateDeepDirectoryStructureToRemote():
+# AssertionFailedError: Files Copied value 2 above maximum 1 ==> expected: <true> but was: <false>
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
+
+# A number of failures with vectored read tests
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
+
+# Access key errors:
+# (test case)->AbstractS3ATestBase.setup:111->AbstractFSContractTestBase.setup:197->AbstractFSContractTestBase.mkdirs:355
+# » AccessDenied s3a://hadoop-ci/job-00/test: getFileStatus on
+# s3a://hadoop-ci/job-00/test:
+# software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you
+# provided does not exist in our records. (Service: S3, Status Code: 403
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
+**/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
+
+# Localstack issue (guessing lack of persistence of upload parts across sessions)
+**/org/apache/hadoop/fs/s3a/commit/ITestUploadRecovery.java
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMultipartUploader.java
+
+# Error: ITestConnectionTimeouts.testObjectUploadTimeouts:258 Expected a
+# java.lang.Exception to be thrown, but got the result: : "01234567890123456789..."
+**/org/apache/hadoop/fs/s3a/impl/ITestConnectionTimeouts.java
Review Comment:
Likely due to disabling audit. Should probably skip test when it is disabled
in config.
##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+ workflow_call:
+ inputs:
+ java:
+ required: false
+ type: string
+ default: 17
+ toolchain_branch:
+ required: false
+ type: string
+ description: Branch to use for toolchain image build
+ default: trunk
+ os:
+ required: false
+ type: string
+ description: OS for container to run the build in
+ default: ubuntu_24
+ runner_os:
+ required: false
+ type: string
+ description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+ default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+ group: >-
+ cloud-aws
+ ${{ github.workflow }}
+ ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+ ${{ inputs.java }}
+ ${{ inputs.toolchain_branch }}
+ ${{ inputs.os }}
+ cancel-in-progress: true
+
+env:
+ BUCKET_NAME: hadoop-ci
+
+jobs:
+ precondition:
+ runs-on: ${{ inputs.runner_os }}
+ outputs:
+ build_image_url: ${{ steps.img.outputs.build_image_url }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ # Full fetch so build image URL can be computed for any branch
+ fetch-depth: 0
+ - uses: ./.github/actions/build_image_url
+ id: img
+ with:
+ os: ${{ inputs.os }}
+ branch: ${{ inputs.toolchain_branch }}
+ - name: debug base_image_url
+ run: |
+ echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+ build-image:
+ name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+ runs-on: ${{ inputs.runner_os }}
+ needs: [ precondition ]
+ permissions:
+ packages: write
+ outputs:
+ uid: ${{ steps.build_img.outputs.uid }}
+ steps:
+ - name: debug build url
+ run: |
+ echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+ - uses: actions/checkout@v6
+ - uses: ./.github/actions/build_image
+ id: build_img
+ with:
+ branch: ${{ inputs.toolchain_branch }}
+ os: ${{ inputs.os }}
+ build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+ test:
+ name: S3A Integration Tests (Java ${{ inputs.java }})
+ needs: [ precondition, build-image ]
+ runs-on: ${{ inputs.runner_os }}
+ permissions:
+ # Security: Minimal permissions for the test runner. Reporting happens in
+ # report_cloud_aws.yml.
+ contents: read
+ services:
+ localstack:
+ image: localstack/localstack:latest
+ # Despite examples showing a `ports:` section, "You don't need to
+ # configure any ports for service containers. By default, all
+ # containers that are part of the same Docker network expose all ports
+ # to each other, and no ports are exposed outside of the Docker
+ # network." See:
+ # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+ env:
+ SERVICES: s3,kms
+ AWS_DEFAULT_REGION: us-west-2
+ AWS_ACCESS_KEY_ID: test
+ AWS_SECRET_ACCESS_KEY: test
+ LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+ LOCALSTACK_HOST: s3.localstack
+
+ # Performance: Disable image's health check (localstack readiness): it typically takes less
+ # than a minute, and the Maven build that runs first takes longer than that.
+ # Also need to specify a dummy health-cmd or the github runner fails.
+ options: >-
+ --health-cmd "exit 0"
+ --health-interval 1s
+ --health-retries 1
+ --network-alias s3.localstack
+
+ container:
+ image: ${{ needs.precondition.outputs.build_image_url }}
+ options: >-
+ --user ${{ needs.build-image.outputs.uid }}
+ env:
+ # mvn verify doesn't return failure exit code due to HADOOP-18040
+ # (which seems incorrect, but let's just override this for now)
+ MAVEN_OPTS: >-
+ -Dmaven.test.failure.ignore=false
+ -Dmaven.repo.local=.m2/repository
+ -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+ steps:
+ - uses: actions/checkout@v6
+ # Performance: Caching TODO: We need to create a centralized maven build cache that is
+ # built on trunk. This will always miss on a new PR: Caches can't be
Review Comment:
Future work discussion: we still want official builds to be 100% clean (no
existing maven cache)
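A minimal sketch of what that could look like, assuming the stock `actions/cache` action and a hypothetical cache key (neither is part of this PR): the cache step only runs for pull requests, so any other trigger starts with an empty `.m2/repository`.

```yaml
# Sketch only: the step name and key layout are assumptions, not taken from the PR.
- name: Maven repository cache (PR builds only)
  # In a reusable workflow, github.event_name is the caller's triggering event.
  if: github.event_name == 'pull_request'
  uses: actions/cache@v4
  with:
    # Matches -Dmaven.repo.local in MAVEN_OPTS above.
    path: .m2/repository
    key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      maven-${{ runner.os }}-
```

Gating on the event keeps the clean-build guarantee explicit in the workflow instead of relying on a cache miss.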
##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+ workflow_call:
+ inputs:
+ java:
+ required: false
+ type: string
+ default: 17
+ toolchain_branch:
+ required: false
+ type: string
+ description: Branch to use for toolchain image build
+ default: trunk
+ os:
+ required: false
+ type: string
+ description: OS for container to run the build in
+ default: ubuntu_24
+ runner_os:
+ required: false
+ type: string
+ description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+ default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+ group: >-
+ cloud-aws
+ ${{ github.workflow }}
+ ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+ ${{ inputs.java }}
+ ${{ inputs.toolchain_branch }}
+ ${{ inputs.os }}
+ cancel-in-progress: true
+
+env:
+ BUCKET_NAME: hadoop-ci
+
+jobs:
+ precondition:
+ runs-on: ${{ inputs.runner_os }}
+ outputs:
+ build_image_url: ${{ steps.img.outputs.build_image_url }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ # Full fetch so build image URL can be computed for any branch
+ fetch-depth: 0
+ - uses: ./.github/actions/build_image_url
+ id: img
+ with:
+ os: ${{ inputs.os }}
+ branch: ${{ inputs.toolchain_branch }}
+ - name: debug base_image_url
+ run: |
+ echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+ build-image:
+ name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+ runs-on: ${{ inputs.runner_os }}
+ needs: [ precondition ]
+ permissions:
+ packages: write
+ outputs:
+ uid: ${{ steps.build_img.outputs.uid }}
+ steps:
+ - name: debug build url
+ run: |
+ echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+ - uses: actions/checkout@v6
+ - uses: ./.github/actions/build_image
+ id: build_img
+ with:
+ branch: ${{ inputs.toolchain_branch }}
+ os: ${{ inputs.os }}
+ build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+ test:
+ name: S3A Integration Tests (Java ${{ inputs.java }})
+ needs: [ precondition, build-image ]
+ runs-on: ${{ inputs.runner_os }}
+ permissions:
+ # Security: Minimal permissions for the test runner. Reporting happens in
+ # report_cloud_aws.yml.
+ contents: read
+ services:
+ localstack:
+ image: localstack/localstack:latest
+ # Despite examples showing a `ports:` section, "You don't need to
+ # configure any ports for service containers. By default, all
+ # containers that are part of the same Docker network expose all ports
+ # to each other, and no ports are exposed outside of the Docker
+ # network." See:
+ # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+ env:
+ SERVICES: s3,kms
+ AWS_DEFAULT_REGION: us-west-2
+ AWS_ACCESS_KEY_ID: test
+ AWS_SECRET_ACCESS_KEY: test
+ LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+ LOCALSTACK_HOST: s3.localstack
+
+ # Performance: Disable image's health check (localstack readiness): it typically takes less
+ # than a minute, and the Maven build that runs first takes longer than that.
+ # Also need to specify a dummy health-cmd or the github runner fails.
+ options: >-
+ --health-cmd "exit 0"
+ --health-interval 1s
+ --health-retries 1
+ --network-alias s3.localstack
+
+ container:
+ image: ${{ needs.precondition.outputs.build_image_url }}
+ options: >-
+ --user ${{ needs.build-image.outputs.uid }}
+ env:
+ # mvn verify doesn't return failure exit code due to HADOOP-18040
+ # (which seems incorrect, but let's just override this for now)
+ MAVEN_OPTS: >-
+ -Dmaven.test.failure.ignore=false
+ -Dmaven.repo.local=.m2/repository
+ -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+ steps:
+ - uses: actions/checkout@v6
Review Comment:
TODO: double-check that this is an immutable tag. Official actions referenced
via tag are allowed by ASF policy, but it is good to be cautious.
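For reference, a hedged sketch of the stricter alternative: a tag can be moved after the fact, whereas a full-length commit SHA cannot, so pinning by SHA removes the question entirely. The SHA below is deliberately a placeholder; it has to be looked up rather than guessed.

```yaml
steps:
  # Placeholder pin: replace <full-commit-sha> with the 40-character commit SHA
  # that the v6 tag resolves to at the time of pinning (it can be looked up with
  # `git ls-remote https://github.com/actions/checkout`).
  # The trailing comment records which release the SHA corresponds to.
  - uses: actions/checkout@<full-commit-sha>  # v6
```

The trade-off is that version bumps then need an explicit SHA update, which is usually left to a dependency-update bot.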
##########
.github/workflows/templates/auth-keys.xml.tmpl:
##########
@@ -0,0 +1,63 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+
+<!
> run s3a integration tests in CI
> -------------------------------
>
> Key: HADOOP-19877
> URL: https://issues.apache.org/jira/browse/HADOOP-19877
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Priority: Major
> Labels: pull-request-available
>
> * Get a decent portion of hadoop-aws (s3a) integration tests running in CI.
> * Use localstack (OSS license) or other S3 emulator as a target.
> * Update docs as needed.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)