[ 
https://issues.apache.org/jira/browse/HADOOP-19877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080684#comment-18080684
 ] 

ASF GitHub Bot commented on HADOOP-19877:
-----------------------------------------

ajfabbri commented on code in PR #8467:
URL: https://github.com/apache/hadoop/pull/8467#discussion_r3227703041


##########
.github/workflows/cloud_aws.yml:
##########
@@ -0,0 +1,45 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Cloud-AWS"
+
+on:
+  pull_request:
+    paths:
+      - 'hadoop-tools/hadoop-aws/**'
+      - '.github/workflows/*cloud_aws.yml'
+      - '.github/actions/build_image**'
+      - '.github/gha-tests/hadoop-aws*excludes.txt'
+
+jobs:
+  run-aws-integration:
+    # Security: write privileges are needed to update PR status and upload test results.
+    # Package write is for building toolchain container images on demand, but ghcr.io access is
+    # scoped to the repository the actions run on.
+    permissions:
+      packages: write
+      checks: write

Review Comment:
   Can `checks: write` be removed here?



##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java

Review Comment:
   From the test discussion with @steveloughran: we may be able to disable these
   through the S3A test configuration (see the 3rd-party section of testing.md).



##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+  workflow_call:
+    inputs:
+      java:
+        required: false
+        type: string
+        default: 17
+      toolchain_branch:
+        required: false
+        type: string
+        description: Branch to use for toolchain image build
+        default: trunk
+      os:
+        required: false
+        type: string
+        description: OS for container to run the build in
+        default: ubuntu_24
+      runner_os:
+        required: false
+        type: string
+        description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+        default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+  group: >-
+    cloud-aws
+    ${{ github.workflow }}
+    ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+    ${{ inputs.java }}
+    ${{ inputs.toolchain_branch }}
+    ${{ inputs.os }}
+  cancel-in-progress: true
+
+env:
+  BUCKET_NAME: hadoop-ci
+
+jobs:
+  precondition:
+    runs-on: ${{ inputs.runner_os }}
+    outputs:
+      build_image_url: ${{ steps.img.outputs.build_image_url }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          # Full fetch so build image URL can be computed for any branch
+          fetch-depth: 0
+      - uses: ./.github/actions/build_image_url
+        id: img
+        with:
+          os: ${{ inputs.os }}
+          branch: ${{ inputs.toolchain_branch }}
+      - name: debug base_image_url
+        run: |
+          echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+  build-image:
+    name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+    runs-on: ${{ inputs.runner_os }}
+    needs: [ precondition ]
+    permissions:
+      packages: write
+    outputs:
+      uid: ${{ steps.build_img.outputs.uid }}
+    steps:
+      - name: debug build url
+        run: |
+          echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/build_image
+        id: build_img
+        with:
+          branch: ${{ inputs.toolchain_branch }}
+          os: ${{ inputs.os }}
+          build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+  test:
+    name: S3A Integration Tests (Java ${{ inputs.java }})
+    needs: [ precondition, build-image ]
+    runs-on: ${{ inputs.runner_os }}
+    permissions:
+      # Security: Minimal permissions for the test runner. Reporting happens in
+      # report_cloud_aws.yml.
+      contents: read
+    services:
+      localstack:
+        image: localstack/localstack:latest

Review Comment:
   It would be nice to pin this to a release slightly behind `latest` to mitigate supply-chain risk.
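
   A minimal sketch of such a pin, assuming a vetted release exists;
   `<vetted-tag>` and `<digest>` are placeholders to fill in, not real values:

   ```yaml
   services:
     localstack:
       # Placeholder tag (assumption): substitute a release that has been
       # out long enough to be vetted, instead of the moving `latest` tag.
       image: localstack/localstack:<vetted-tag>
       # Stricter alternative: pin the immutable digest of that release.
       # image: localstack/localstack@sha256:<digest>
   ```

   A tag can still be re-pushed upstream, so the digest form gives the stronger
   guarantee at the cost of manual updates.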



##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+  workflow_call:
+    inputs:
+      java:
+        required: false
+        type: string
+        default: 17
+      toolchain_branch:
+        required: false
+        type: string
+        description: Branch to use for toolchain image build
+        default: trunk
+      os:
+        required: false
+        type: string
+        description: OS for container to run the build in
+        default: ubuntu_24
+      runner_os:
+        required: false
+        type: string
+        description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+        default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+  group: >-
+    cloud-aws
+    ${{ github.workflow }}
+    ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+    ${{ inputs.java }}
+    ${{ inputs.toolchain_branch }}
+    ${{ inputs.os }}
+  cancel-in-progress: true
+
+env:
+  BUCKET_NAME: hadoop-ci
+
+jobs:
+  precondition:
+    runs-on: ${{ inputs.runner_os }}
+    outputs:
+      build_image_url: ${{ steps.img.outputs.build_image_url }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          # Full fetch so build image URL can be computed for any branch
+          fetch-depth: 0
+      - uses: ./.github/actions/build_image_url
+        id: img
+        with:
+          os: ${{ inputs.os }}
+          branch: ${{ inputs.toolchain_branch }}
+      - name: debug base_image_url
+        run: |
+          echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+  build-image:
+    name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+    runs-on: ${{ inputs.runner_os }}
+    needs: [ precondition ]
+    permissions:
+      packages: write
+    outputs:
+      uid: ${{ steps.build_img.outputs.uid }}
+    steps:
+      - name: debug build url
+        run: |
+          echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/build_image
+        id: build_img
+        with:
+          branch: ${{ inputs.toolchain_branch }}
+          os: ${{ inputs.os }}
+          build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+  test:
+    name: S3A Integration Tests (Java ${{ inputs.java }})
+    needs: [ precondition, build-image ]
+    runs-on: ${{ inputs.runner_os }}
+    permissions:
+      # Security: Minimal permissions for the test runner. Reporting happens in
+      # report_cloud_aws.yml.
+      contents: read
+    services:
+      localstack:
+        image: localstack/localstack:latest
+        # Despite examples showing a `ports:` section, "You don't need to
+        # configure any ports for service containers. By default, all
+        # containers that are part of the same Docker network expose all ports
+        # to each other, and no ports are exposed outside of the Docker
+        # network." See:
+        # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+        env:
+          SERVICES: s3,kms
+          AWS_DEFAULT_REGION: us-west-2
+          AWS_ACCESS_KEY_ID: test
+          AWS_SECRET_ACCESS_KEY: test
+          LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+          LOCALSTACK_HOST: s3.localstack
+
+        # Performance: Disable image's health check (localstack readiness): it typically takes less
+        # than a minute, and the Maven build that runs first takes longer than that.
+        # Also need to specify a dummy health-cmd or the github runner fails.
+        options: >-
+          --health-cmd "exit 0"
+          --health-interval 1s
+          --health-retries 1
+          --network-alias s3.localstack
+
+    container:
+      image: ${{ needs.precondition.outputs.build_image_url }}
+      options: >-
+        --user ${{ needs.build-image.outputs.uid }}
+    env:
+      # mvn verify doesn't return failure exit code due to HADOOP-18040
+      # (which seems incorrect, but let's just override this for now)
+      MAVEN_OPTS: >-
+        -Dmaven.test.failure.ignore=false
+        -Dmaven.repo.local=.m2/repository
+        -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+    steps:
+      - uses: actions/checkout@v6
+      # Performance: Caching TODO: We need to create a centralized maven build cache that is
+      # built on trunk. This will always miss on a new PR: Caches can't be
+      # shared between PR branches. PR branches *can* access caches from their
+      # base branch, though. See:
+      # https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#restrictions-for-accessing-a-cache
+      # As-is, first run on a PR always misses. Subsequent cached builds see >100% speedup.
+      - name: Restore build cache
+        id: restore_cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            .m2/repository/
+          key: ${{ inputs.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ inputs.toolchain_branch }}
+          # falls back to less-specific caches if exact key not found
+          restore-keys: |
+            ${{ inputs.os }}-maven-${{ hashFiles('**/pom.xml') }}-
+            ${{ inputs.os }}-maven-
+      - name: Setup JDK ${{ inputs.java }}
+        uses: actions/setup-java@v5
+        with:
+          distribution: zulu
+          java-version: ${{ inputs.java }}
+
+      - name: Maven Build
+        shell: bash
+        run: |
+          mkdir -p .m2/repository
+          ./mvnw -B -T 1C --no-transfer-progress -DskipTests -am -pl hadoop-tools/hadoop-aws \
+            -Dmaven.javadoc.skip -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip install
+
+      - name: Check local maven repository
+        shell: bash
+        run: |
+          if [ -d ".m2/repository" ]; then
+            echo "👍 Maven repository at $(pwd)/.m2/repository"
+          else
+            echo "👎 .m2/repository not found in $(pwd)"
+            exit 1
+          fi
+
+      - name: Save build cache
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        if: ${{ !cancelled() && steps.restore_cache.outputs.cache-hit != 'true' }}
+        with:
+          path: |
+            .m2/repository/
+          key: ${{ steps.restore_cache.outputs.cache-primary-key }}
+
+      - name: Create localstack s3 bucket
+        shell: bash
+        run: |
+          # Use curl to interact with localstack if awslocal is not in the container
+          curl -X PUT "http://localstack:4566/${{ env.BUCKET_NAME }}"
+
+      - name: Register bucket hostname
+        shell: bash
+        run: |
+          # Resolve s3.localstack IP and add entry for virtual-hosted bucket FQDN
+          LSIP=$(getent hosts s3.localstack | awk '{print $1}')
+          echo "Registering ${{ env.BUCKET_NAME }}.s3.localstack as ${LSIP}"
+          echo "${LSIP} ${{ env.BUCKET_NAME }}.s3.localstack" | sudo tee -a /etc/hosts
+
+      - name: S3A Integration Tests
+        shell: bash
+        env:
+          AWS_ACCESS_KEY_ID: test
+          AWS_SECRET_ACCESS_KEY: test
+          AWS_REGION: us-west-2
+          AWS_DEFAULT_REGION: us-west-2
+        run: |
+          cp .github/workflows/templates/auth-keys.xml.tmpl \
+            hadoop-tools/hadoop-aws/src/test/resources/auth-keys.xml
+          ./mvnw verify -B -pl hadoop-tools/hadoop-aws \
+            -Dparallel-tests \
+            -DskipITs=false \
+            -Dtest=none \
+            -Dscale=true \

Review Comment:
   Future: we should also increase the `fs.s3a.scale.test.huge.filesize` parameter
   in our config (auth-keys.xml).



##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+  workflow_call:
+    inputs:
+      java:
+        required: false
+        type: string
+        default: 17
+      toolchain_branch:
+        required: false
+        type: string
+        description: Branch to use for toolchain image build
+        default: trunk
+      os:
+        required: false
+        type: string
+        description: OS for container to run the build in
+        default: ubuntu_24
+      runner_os:
+        required: false
+        type: string
+        description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+        default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+  group: >-
+    cloud-aws
+    ${{ github.workflow }}
+    ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+    ${{ inputs.java }}
+    ${{ inputs.toolchain_branch }}
+    ${{ inputs.os }}
+  cancel-in-progress: true
+
+env:
+  BUCKET_NAME: hadoop-ci
+
+jobs:
+  precondition:
+    runs-on: ${{ inputs.runner_os }}
+    outputs:
+      build_image_url: ${{ steps.img.outputs.build_image_url }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          # Full fetch so build image URL can be computed for any branch
+          fetch-depth: 0
+      - uses: ./.github/actions/build_image_url
+        id: img
+        with:
+          os: ${{ inputs.os }}
+          branch: ${{ inputs.toolchain_branch }}
+      - name: debug base_image_url
+        run: |
+          echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+  build-image:
+    name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+    runs-on: ${{ inputs.runner_os }}
+    needs: [ precondition ]
+    permissions:
+      packages: write
+    outputs:
+      uid: ${{ steps.build_img.outputs.uid }}
+    steps:
+      - name: debug build url
+        run: |
+          echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/build_image
+        id: build_img
+        with:
+          branch: ${{ inputs.toolchain_branch }}
+          os: ${{ inputs.os }}
+          build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+  test:
+    name: S3A Integration Tests (Java ${{ inputs.java }})
+    needs: [ precondition, build-image ]
+    runs-on: ${{ inputs.runner_os }}
+    permissions:
+      # Security: Minimal permissions for the test runner. Reporting happens in
+      # report_cloud_aws.yml.
+      contents: read
+    services:
+      localstack:
+        image: localstack/localstack:latest
+        # Despite examples showing a `ports:` section, "You don't need to
+        # configure any ports for service containers. By default, all
+        # containers that are part of the same Docker network expose all ports
+        # to each other, and no ports are exposed outside of the Docker
+        # network." See:
+        # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+        env:
+          SERVICES: s3,kms
+          AWS_DEFAULT_REGION: us-west-2
+          AWS_ACCESS_KEY_ID: test
+          AWS_SECRET_ACCESS_KEY: test
+          LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+          LOCALSTACK_HOST: s3.localstack
+
+        # Performance: Disable image's health check (localstack readiness): it typically takes less
+        # than a minute, and the Maven build that runs first takes longer than that.
+        # Also need to specify a dummy health-cmd or the github runner fails.
+        options: >-
+          --health-cmd "exit 0"
+          --health-interval 1s
+          --health-retries 1
+          --network-alias s3.localstack
+
+    container:
+      image: ${{ needs.precondition.outputs.build_image_url }}
+      options: >-
+        --user ${{ needs.build-image.outputs.uid }}
+    env:
+      # mvn verify doesn't return failure exit code due to HADOOP-18040
+      # (which seems incorrect, but let's just override this for now)
+      MAVEN_OPTS: >-
+        -Dmaven.test.failure.ignore=false
+        -Dmaven.repo.local=.m2/repository
+        -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+    steps:
+      - uses: actions/checkout@v6
+      # Performance: Caching TODO: We need to create a centralized maven build cache that is
+      # built on trunk. This will always miss on a new PR: Caches can't be
+      # shared between PR branches. PR branches *can* access caches from their
+      # base branch, though. See:
+      # https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#restrictions-for-accessing-a-cache
+      # As-is, first run on a PR always misses. Subsequent cached builds see >100% speedup.

Review Comment:
   Discussion: even a shared cache (built regularly on trunk) that *excludes*
   all Hadoop artifacts would give us most of the speedup while still ensuring
   "pure" builds with respect to Hadoop code.
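
   One way this could look, as a sketch for a hypothetical scheduled job on
   trunk (not part of this PR). It assumes Hadoop modules install under the
   `org.apache.hadoop` groupId, and the cache key simply mirrors the template's
   default inputs (`ubuntu_24`, `trunk`):

   ```yaml
   # Hypothetical steps, to run after a dependency-resolving Maven build on trunk.
   - name: Drop locally built Hadoop artifacts before caching
     shell: bash
     run: |
       # Assumption: Hadoop artifacts live under this groupId directory.
       # This also removes released org.apache.hadoop.thirdparty jars,
       # which Maven would then download again on the next build.
       rm -rf .m2/repository/org/apache/hadoop
   - name: Save third-party dependency cache
     uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
     with:
       path: |
         .m2/repository/
       # Assumed key: same shape as the restore key used by the PR workflow.
       key: ubuntu_24-maven-${{ hashFiles('**/pom.xml') }}-trunk
   ```

   PR runs would then restore only third-party dependencies from the base
   branch cache and always build the Hadoop modules from the PR's own source.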



##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+#   expected: <true> but was: <false>

Review Comment:
   Test discussion hint: check whether this is related to timestamps.



##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+#   expected: <true> but was: <false>
+# 2. testUpdateDeepDirectoryStructureToRemote():
+# AssertionFailedError: Files Copied value 2 above maximum 1 ==> expected: <true> but was: <false>
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
+
+# A number of failures with vectored read tests
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
+
+# Access key errors:
+# (test case)->AbstractS3ATestBase.setup:111->AbstractFSContractTestBase.setup:197->AbstractFSContractTestBase.mkdirs:355
+# » AccessDenied s3a://hadoop-ci/job-00/test: getFileStatus on
+# s3a://hadoop-ci/job-00/test:
+# software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you
+# provided does not exist in our records. (Service: S3, Status Code: 403
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
+**/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
+
+# Localstack issue (guessing lack of persistence of upload parts across sessions)
+**/org/apache/hadoop/fs/s3a/commit/ITestUploadRecovery.java
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMultipartUploader.java
+
+# Error:    ITestConnectionTimeouts.testObjectUploadTimeouts:258 Expected a
+# java.lang.Exception to be thrown, but got the result: : "01234567890123456789..."
+**/org/apache/hadoop/fs/s3a/impl/ITestConnectionTimeouts.java
+
+# ITestS3AAWSCredentialsProvider.testBadCredentials:183->lambda$testBadCredentials$0:184
+# ->createFailingFS:160 Expected exception - got S3AFileSystem{
+**/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
+
+# testSeeksWithLruEviction java.util.concurrent.TimeoutException: timed out
+# after 180 seconds
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java

Review Comment:
   Test discussion note: this prefetch implementation is problematic, and we
   may end up dropping it. Vectored IO addresses some of the same needs, and the
   Amazon analytics input stream handles the rest better.



##########
.github/gha-tests/hadoop-aws-localstack-excludes.txt:
##########
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Initial stab at excluding tests that won't run in our
+# CI container environment with Localstack mocks of AWS services.
+
+# TODO see if we can enable any of these...
+
+# tests that depend on public S3 buckets
+**/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+**/org/apache/hadoop/fs/s3a/ITestS3ARequesterPays.java
+**/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+**/org/apache/hadoop/fs/s3a/tools/ITestMarkerTool.java
+**/ITestS3AAnalyticsAcceleratorStreamReading.java
+**/ITestS3AEndpointRegion.java
+
+
+# Tests requiring IAM roles / STS
+# We should be able to re-enable some of these. See:
+# https://docs.localstack.cloud/aws/services/sts/
+**/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+**/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationTokens.java
+**/org/apache/hadoop/fs/s3a/ITestS3ATemporaryCredentials.java
+
+
+# failures that need to be investigated
+
+# Two methods fail: 1. testUpdateDeepDirectoryStructureNoChange():
+# AssertionFailedError: Files Skipped value 0 too below minimum 1 ==>
+#   expected: <true> but was: <false>
+# 2. testUpdateDeepDirectoryStructureToRemote():
+# AssertionFailedError: Files Copied value 2 above maximum 1 ==> expected: <true> but was: <false>
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
+
+# A number of failures with vectored read tests
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
+
+# Access key errors:
+# (test case)->AbstractS3ATestBase.setup:111->AbstractFSContractTestBase.setup:197->AbstractFSContractTestBase.mkdirs:355
+# » AccessDenied s3a://hadoop-ci/job-00/test: getFileStatus on
+# s3a://hadoop-ci/job-00/test:
+# software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you
+# provided does not exist in our records. (Service: S3, Status Code: 403
+**/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
+**/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
+
+# Localstack issue (guessing lack of persistence of upload parts across sessions)
+**/org/apache/hadoop/fs/s3a/commit/ITestUploadRecovery.java
+**/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMultipartUploader.java
+
+# Error:    ITestConnectionTimeouts.testObjectUploadTimeouts:258 Expected a
+# java.lang.Exception to be thrown, but got the result: : "01234567890123456789..."
+**/org/apache/hadoop/fs/s3a/impl/ITestConnectionTimeouts.java

Review Comment:
   Likely caused by auditing being disabled. We should probably skip the test
   when auditing is disabled in the config.



##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+  workflow_call:
+    inputs:
+      java:
+        required: false
+        type: string
+        default: 17
+      toolchain_branch:
+        required: false
+        type: string
+        description: Branch to use for toolchain image build
+        default: trunk
+      os:
+        required: false
+        type: string
+        description: OS for container to run the build in
+        default: ubuntu_24
+      runner_os:
+        required: false
+        type: string
+        description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+        default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+  group: >-
+    cloud-aws
+    ${{ github.workflow }}
+    ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+    ${{ inputs.java }}
+    ${{ inputs.toolchain_branch }}
+    ${{ inputs.os }}
+  cancel-in-progress: true
+
+env:
+  BUCKET_NAME: hadoop-ci
+
+jobs:
+  precondition:
+    runs-on: ${{ inputs.runner_os }}
+    outputs:
+      build_image_url: ${{ steps.img.outputs.build_image_url }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          # Full fetch so build image URL can be computed for any branch
+          fetch-depth: 0
+      - uses: ./.github/actions/build_image_url
+        id: img
+        with:
+          os: ${{ inputs.os }}
+          branch: ${{ inputs.toolchain_branch }}
+      - name: debug base_image_url
+        run: |
+          echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+  build-image:
+    name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+    runs-on: ${{ inputs.runner_os }}
+    needs: [ precondition ]
+    permissions:
+      packages: write
+    outputs:
+      uid: ${{ steps.build_img.outputs.uid }}
+    steps:
+      - name: debug build url
+        run: |
+          echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/build_image
+        id: build_img
+        with:
+          branch: ${{ inputs.toolchain_branch }}
+          os: ${{ inputs.os }}
+          build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+  test:
+    name: S3A Integration Tests (Java ${{ inputs.java }})
+    needs: [ precondition, build-image ]
+    runs-on: ${{ inputs.runner_os }}
+    permissions:
+      # Security: Minimal permissions for the test runner. Reporting happens in
+      # report_cloud_aws.yml.
+      contents: read
+    services:
+      localstack:
+        image: localstack/localstack:latest
+        # Despite examples showing a `ports:` section, "You don't need to
+        # configure any ports for service containers. By default, all
+        # containers that are part of the same Docker network expose all ports
+        # to each other, and no ports are exposed outside of the Docker
+        # network." See:
+        # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+        env:
+          SERVICES: s3,kms
+          AWS_DEFAULT_REGION: us-west-2
+          AWS_ACCESS_KEY_ID: test
+          AWS_SECRET_ACCESS_KEY: test
+          LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+          LOCALSTACK_HOST: s3.localstack
+
+        # Performance: Disable image's health check (localstack readiness): it typically takes less
+        # than a minute, and the Maven build that runs first takes longer than that.
+        # Also need to specify a dummy health-cmd or the github runner fails.
+        options: >-
+          --health-cmd "exit 0"
+          --health-interval 1s
+          --health-retries 1
+          --network-alias s3.localstack
+
+    container:
+      image: ${{ needs.precondition.outputs.build_image_url }}
+      options: >-
+        --user ${{ needs.build-image.outputs.uid }}
+    env:
+      # mvn verify doesn't return failure exit code due to HADOOP-18040
+      # (which seems incorrect, but let's just override this for now)
+      MAVEN_OPTS: >-
+        -Dmaven.test.failure.ignore=false
+        -Dmaven.repo.local=.m2/repository
+        -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+    steps:
+      - uses: actions/checkout@v6
+      # Performance: Caching TODO: We need to create a centralized maven build cache that is
+      # built on trunk. This will always miss on a new PR: Caches can't be

Review Comment:
   Future work discussion: we still want official builds to be 100% clean (no pre-existing Maven cache).
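
   A minimal sketch of one way to keep both goals, assuming `actions/cache@v4` and a cache key derived from the `pom.xml` files (both are assumptions, not part of this PR): restore the Maven repository cache only on pull request runs, so any other build still starts from an empty local repository. The path matches the `-Dmaven.repo.local=.m2/repository` setting in `MAVEN_OPTS`.

   ```yaml
   steps:
     - uses: actions/checkout@v6
     # Assumption: only PR runs may reuse a cache; other events skip this step
     # entirely and therefore build against an empty .m2/repository.
     - name: Restore Maven cache (PR builds only)
       if: github.event_name == 'pull_request'
       uses: actions/cache@v4
       with:
         path: .m2/repository
         # Hypothetical key layout; any key that changes with the POMs would do.
         key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
           maven-${{ runner.os }}-
   ```

   Gating on the event name keeps the decision in one place, rather than maintaining separate workflow templates for cached and clean builds.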



##########
.github/workflows/tmpl_cloud_aws.yml:
##########
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: s3a integration
+on:
+  workflow_call:
+    inputs:
+      java:
+        required: false
+        type: string
+        default: 17
+      toolchain_branch:
+        required: false
+        type: string
+        description: Branch to use for toolchain image build
+        default: trunk
+      os:
+        required: false
+        type: string
+        description: OS for container to run the build in
+        default: ubuntu_24
+      runner_os:
+        required: false
+        type: string
+        description: OS tag for runner (e.g., Linux, ubuntu-24.04)
+        default: ubuntu_24.04
+
+# Security: Minimal defaults for workflow.
+permissions: {}
+
+concurrency:
+  group: >-
+    cloud-aws
+    ${{ github.workflow }}
+    ${{ github.repository == 'apache/hadoop' && github.run_id || github.ref }}
+    ${{ inputs.java }}
+    ${{ inputs.toolchain_branch }}
+    ${{ inputs.os }}
+  cancel-in-progress: true
+
+env:
+  BUCKET_NAME: hadoop-ci
+
+jobs:
+  precondition:
+    runs-on: ${{ inputs.runner_os }}
+    outputs:
+      build_image_url: ${{ steps.img.outputs.build_image_url }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          # Full fetch so build image URL can be computed for any branch
+          fetch-depth: 0
+      - uses: ./.github/actions/build_image_url
+        id: img
+        with:
+          os: ${{ inputs.os }}
+          branch: ${{ inputs.toolchain_branch }}
+      - name: debug base_image_url
+        run: |
+          echo "precondition url: ${{ steps.img.outputs.build_image_url }}"
+
+  build-image:
+    name: Toolchain image (JDK${{ inputs.java }}, ${{ inputs.os }}-${{ inputs.toolchain_branch }})
+    runs-on: ${{ inputs.runner_os }}
+    needs: [ precondition ]
+    permissions:
+      packages: write
+    outputs:
+      uid: ${{ steps.build_img.outputs.uid }}
+    steps:
+      - name: debug build url
+        run: |
+          echo "Build image URL: ${{ needs.precondition.outputs.build_image_url }}"
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/build_image
+        id: build_img
+        with:
+          branch: ${{ inputs.toolchain_branch }}
+          os: ${{ inputs.os }}
+          build_image_url: ${{ needs.precondition.outputs.build_image_url }}
+
+  test:
+    name: S3A Integration Tests (Java ${{ inputs.java }})
+    needs: [ precondition, build-image ]
+    runs-on: ${{ inputs.runner_os }}
+    permissions:
+      # Security: Minimal permissions for the test runner. Reporting happens in
+      # report_cloud_aws.yml.
+      contents: read
+    services:
+      localstack:
+        image: localstack/localstack:latest
+        # Despite examples showing a `ports:` section, "You don't need to
+        # configure any ports for service containers. By default, all
+        # containers that are part of the same Docker network expose all ports
+        # to each other, and no ports are exposed outside of the Docker
+        # network." See:
+        # https://docs.github.com/en/actions/tutorials/use-containerized-services/use-docker-service-containers#running-jobs-in-a-container
+        env:
+          SERVICES: s3,kms
+          AWS_DEFAULT_REGION: us-west-2
+          AWS_ACCESS_KEY_ID: test
+          AWS_SECRET_ACCESS_KEY: test
+          LOCALSTACK_AUTH_TOKEN: ${{ secrets.LOCALSTACK_CI_KEY }}
+          LOCALSTACK_HOST: s3.localstack
+
+        # Performance: Disable image's health check (localstack readiness): it typically takes less
+        # than a minute, and the Maven build that runs first takes longer than that.
+        # Also need to specify a dummy health-cmd or the github runner fails.
+        options: >-
+          --health-cmd "exit 0"
+          --health-interval 1s
+          --health-retries 1
+          --network-alias s3.localstack
+
+    container:
+      image: ${{ needs.precondition.outputs.build_image_url }}
+      options: >-
+        --user ${{ needs.build-image.outputs.uid }}
+    env:
+      # mvn verify doesn't return failure exit code due to HADOOP-18040
+      # (which seems incorrect, but let's just override this for now)
+      MAVEN_OPTS: >-
+        -Dmaven.test.failure.ignore=false
+        -Dmaven.repo.local=.m2/repository
+        -Dcheckstyle.skip -Dspotbugs.skip -Denforcer.skip -Drat.skip
+    steps:
+      - uses: actions/checkout@v6

Review Comment:
   TODO: double-check that this is an immutable tag. Official actions referenced by tag are allowed by ASF policy, but it is good to be cautious.
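
   If the tag turns out to be mutable, the usual cautious alternative is to pin the action to a full commit SHA. A sketch, where `<full-commit-sha>` is a placeholder and not a real value; it would need to be replaced with the 40-character commit that the `v6` tag currently points to:

   ```yaml
   steps:
     # Placeholder SHA (hypothetical): look up the commit behind the v6 tag
     # in the actions/checkout repository before using this.
     - uses: actions/checkout@<full-commit-sha>  # v6
   ```

   Keeping the tag name in a trailing comment makes it clear which release the pinned commit corresponds to.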



##########
.github/workflows/templates/auth-keys.xml.tmpl:
##########
@@ -0,0 +1,63 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+
+<!

> run s3a integration tests in CI
> -------------------------------
>
>                 Key: HADOOP-19877
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19877
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Major
>              Labels: pull-request-available
>
> * Get a decent portion of hadoop-aws (s3a) integration tests running in CI.
>  * Use localstack (OSS license) or other S3 emulator as a target.
>  * Update docs as needed.
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
