This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c03ca2aa2d3b [SPARK-56864][INFRA][PYTHON] Consolidate python-ps-minimum image into python-minimum
c03ca2aa2d3b is described below

commit c03ca2aa2d3b598cb9524680421109de92c90d71
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon May 18 10:27:11 2026 +0800

    [SPARK-56864][INFRA][PYTHON] Consolidate python-ps-minimum image into python-minimum
    
    ### What changes were proposed in this pull request?
    
    This PR consolidates the `python-ps-minimum` Docker image and its CI
    workflow into the existing `python-minimum` image, eliminating a
    near-duplicate.
    
    Specifically:
    - Updates the label on `dev/spark-test-image/python-minimum/Dockerfile` to
      cover both PySpark and Pandas API on Spark.
    - Deletes `dev/spark-test-image/python-ps-minimum/Dockerfile`.
    - Deletes `.github/workflows/build_python_ps_minimum.yml`.
    - Adds `"pyspark-pandas": "true"` to
      `.github/workflows/build_python_minimum.yml` so Pandas API on Spark
      minimum-deps coverage is preserved.
    - Drops the `python-ps-minimum` entries from
      `.github/workflows/build_infra_images_cache.yml` (the `paths` trigger
      and the build/push step).
    - Removes the `build_python_ps_minimum.yml` badge from `README.md`.
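    
A cleanup like the one listed above can be double-checked by searching a checkout for leftover references to the removed image. The following is a minimal sketch and not part of this commit: the helper name `find_references` is hypothetical, and it only assumes that the files of interest are UTF-8 text.

```python
import os
from typing import List


def find_references(root: str, needle: str) -> List[str]:
    """Return sorted relative paths of text files under root that contain needle.

    Hypothetical helper for illustration; it is not part of the Spark repository.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as handle:
                    if needle in handle.read():
                        hits.append(os.path.relpath(path, root))
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable file: skip it.
                continue
    return sorted(hits)
```

Because the image name uses hyphens (`python-ps-minimum`) while the workflow file uses underscores (`build_python_ps_minimum.yml`), both spellings would need to be searched, for example `find_references(".", "python-ps-minimum")` and `find_references(".", "python_ps_minimum")`.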
    
    ### Why are the changes needed?
    
    To save CI resources. The two Dockerfiles were nearly identical. The only
    functional differences were in `BASIC_PIP_PKGS`:
    
    | Package | python-minimum | python-ps-minimum |
    |---|---|---|
    | `numpy` | pinned `==1.22.4` | unpinned |
    | `scikit-learn` | included | omitted |
    
    Everything else (base image, apt packages, Python version, venv setup,
    `CONNECT_PIP_PKGS`) was the same. Maintaining both images doubles the image
    build/cache cost and runs a duplicate scheduled workflow without
    commensurate test value. Reusing `python-minimum` (which has the stricter
    pin and a superset of packages) for the Pandas API on Spark minimum-deps
    job keeps coverage while halving the image footprint and the associated CI
    runtime.
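    
The comparison in the table above can be reproduced with a short script. This is a sketch under stated assumptions: `PS_MINIMUM` is copied from the `BASIC_PIP_PKGS` line of the deleted `python-ps-minimum` Dockerfile in this commit, while `MINIMUM` is reconstructed by applying the two differences from the table and is not copied from the actual `python-minimum` Dockerfile (in particular, whether `scikit-learn` is pinned there is not stated). The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_pins(pkgs: str) -> Dict[str, Optional[str]]:
    """Map each package name to its pinned version, or None if unpinned."""
    pins = {}
    for spec in pkgs.split():
        name, _, version = spec.partition("==")
        pins[name] = version or None
    return pins


def diff_pins(left: str, right: str) -> Dict[str, Tuple[str, str]]:
    """Return {package: (left_state, right_state)} for packages that differ."""
    a, b = parse_pins(left), parse_pins(right)

    def state(pins: Dict[str, Optional[str]], name: str) -> str:
        if name not in pins:
            return "omitted"
        return f"=={pins[name]}" if pins[name] else "unpinned"

    return {
        name: (state(a, name), state(b, name))
        for name in sorted(a.keys() | b.keys())
        if state(a, name) != state(b, name)
    }


# Copied from the deleted python-ps-minimum Dockerfile shown in this commit.
PS_MINIMUM = (
    "pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy scipy "
    "coverage unittest-xml-reporting psutil"
)
# ASSUMPTION: reconstructed from the table (numpy pinned, scikit-learn added);
# the real python-minimum package list may differ in other details.
MINIMUM = (
    "pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy==1.22.4 scipy "
    "scikit-learn coverage unittest-xml-reporting psutil"
)

if __name__ == "__main__":
    for name, (minimum, ps_minimum) in diff_pins(MINIMUM, PS_MINIMUM).items():
        print(f"{name}: python-minimum {minimum}, python-ps-minimum {ps_minimum}")
```

Comparing parsed pins rather than raw strings keeps the result independent of package order in the `ARG` line.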
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI-only change.
    
    ### How was this patch tested?
    
    Existing CI. The merged `build_python_minimum.yml` now runs both `pyspark`
    and `pyspark-pandas` jobs against the `python-minimum` image.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (model: claude-opus-4-7)
    
    Closes #55872 from zhengruifeng/remove-python-ps-minimum.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_infra_images_cache.yml    | 14 -----
 .github/workflows/build_python_minimum.yml        |  3 +-
 .github/workflows/build_python_ps_minimum.yml     | 47 ---------------
 README.md                                         |  1 -
 dev/spark-test-image/python-ps-minimum/Dockerfile | 70 -----------------------
 5 files changed, 2 insertions(+), 133 deletions(-)

diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml
index 78fb1cffaf1b..ac8d1aaa82e3 100644
--- a/.github/workflows/build_infra_images_cache.yml
+++ b/.github/workflows/build_infra_images_cache.yml
@@ -31,7 +31,6 @@ on:
     - 'dev/spark-test-image/lint/Dockerfile'
     - 'dev/spark-test-image/sparkr/Dockerfile'
     - 'dev/spark-test-image/python-minimum/Dockerfile'
-    - 'dev/spark-test-image/python-ps-minimum/Dockerfile'
     - 'dev/spark-test-image/python-311/Dockerfile'
     - 'dev/spark-test-image/python-312/Dockerfile'
     - 'dev/spark-test-image/python-312-classic-only/Dockerfile'
@@ -125,19 +124,6 @@ jobs:
       - name: Image digest (PySpark with old dependencies)
         if: hashFiles('dev/spark-test-image/python-minimum/Dockerfile') != ''
         run: echo ${{ steps.docker_build_pyspark_python_minimum.outputs.digest }}
-      - name: Build and push (PySpark PS with old dependencies)
-        if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
-        id: docker_build_pyspark_python_ps_minimum
-        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f
-        with:
-          context: ./dev/spark-test-image/python-ps-minimum/
-          push: true
-          tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}-static
-          cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}
-          cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }},mode=max
-      - name: Image digest (PySpark PS with old dependencies)
-        if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
-        run: echo ${{ steps.docker_build_pyspark_python_ps_minimum.outputs.digest }}
       - name: Build and push (PySpark with Python 3.11)
         if: hashFiles('dev/spark-test-image/python-311/Dockerfile') != ''
         id: docker_build_pyspark_python_311
diff --git a/.github/workflows/build_python_minimum.yml b/.github/workflows/build_python_minimum.yml
index 36bf7f6d7ba0..78f9ff4967c0 100644
--- a/.github/workflows/build_python_minimum.yml
+++ b/.github/workflows/build_python_minimum.yml
@@ -42,5 +42,6 @@ jobs:
         }
       jobs: >-
         {
-          "pyspark": "true"
+          "pyspark": "true",
+          "pyspark-pandas": "true"
         }
diff --git a/.github/workflows/build_python_ps_minimum.yml b/.github/workflows/build_python_ps_minimum.yml
deleted file mode 100644
index f29b3e1bedd5..000000000000
--- a/.github/workflows/build_python_ps_minimum.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-name: "Build / Python-only (master, Minimum dependencies of Pandas API on Spark)"
-
-on:
-  schedule:
-    - cron: '0 10 */2 * *'
-  workflow_dispatch:
-
-jobs:
-  run-build:
-    permissions:
-      packages: write
-    name: Run
-    uses: ./.github/workflows/build_and_test.yml
-    if: github.repository == 'apache/spark'
-    with:
-      java: 17
-      branch: master
-      hadoop: hadoop3
-      envs: >-
-        {
-          "PYSPARK_IMAGE_TO_TEST": "python-ps-minimum",
-          "PYTHON_TO_TEST": "python3.11"
-        }
-      jobs: >-
-        {
-          "pyspark": "true",
-          "pyspark-pandas": "true"
-        }
diff --git a/README.md b/README.md
index 8395b60c2a96..4f6f9dd2eff0 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,6 @@ This README file only contains basic setup instructions.
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml)                           |
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.14_nogil.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.14_nogil.yml)               |
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_minimum.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_minimum.yml)                     |
-|            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_ps_minimum.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_ps_minimum.yml)               |
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_connect40.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_connect40.yml)                 |
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_connect.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_connect.yml)                     |
 |            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml)                       |
diff --git a/dev/spark-test-image/python-ps-minimum/Dockerfile b/dev/spark-test-image/python-ps-minimum/Dockerfile
deleted file mode 100644
index afbbe5a0d282..000000000000
--- a/dev/spark-test-image/python-ps-minimum/Dockerfile
+++ /dev/null
@@ -1,70 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# Image for building and testing Spark branches. Based on Ubuntu 24.04.
-# See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:noble
-LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
-LABEL org.opencontainers.image.licenses="Apache-2.0"
-LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For Pandas API on Spark with old dependencies"
-# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
-LABEL org.opencontainers.image.version=""
-
-ENV FULL_REFRESH_DATE=20260210
-
-ENV DEBIAN_FRONTEND=noninteractive
-ENV DEBCONF_NONINTERACTIVE_SEEN=true
-
-RUN printf 'Types: deb\nURIs: https://mirrors.edge.kernel.org/ubuntu\nSuites: noble noble-updates noble-security\nComponents: main restricted universe multiverse\nSigned-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg\n' > /etc/apt/sources.list.d/mirror.sources
-
-# Should keep the installation consistent with https://apache.github.io/spark/api/python/getting_started/install.html
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    ca-certificates \
-    curl \
-    gfortran \
-    git \
-    gnupg \
-    libgit2-dev \
-    liblapack-dev \
-    libopenblas-dev \
-    libssl-dev \
-    openjdk-17-jdk-headless \
-    pkg-config \
-    tzdata \
-    software-properties-common \
-    zlib1g-dev
-
-# Install Python 3.11
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-    python3.11 \
-    python3.11-venv \
-    && apt-get autoremove --purge -y \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
-
-# Setup virtual environment
-ENV VIRTUAL_ENV=/opt/spark-venv
-RUN python3.11 -m venv $VIRTUAL_ENV
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
-
-ARG BASIC_PIP_PKGS="pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy scipy coverage unittest-xml-reporting psutil"
-ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf==6.33.5"
-
-RUN python3.11 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
-    python3.11 -m pip cache purge

