This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git
The following commit(s) were added to refs/heads/master by this push:
new 0f76cd1 [SPARK-52542] Use `/nonexistent` instead of nonexistent
`/opt/spark` (#87)
0f76cd1 is described below
commit 0f76cd1f98e924a07fb6a5551807015b634a92a2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Jun 23 09:16:26 2025 -0700
[SPARK-52542] Use `/nonexistent` instead of nonexistent `/opt/spark` (#87)
### What changes were proposed in this pull request?
This PR sets the home directory of the `spark` user to `/nonexistent`
explicitly, instead of the nonexistent `/home/spark`, because the current
state is misleading.
Please note that SPARK-40528 introduced `useradd --system`, which has created
the `spark` user with a nonexistent `/home/spark` home directory since the
beginning of this repository, `spark-docker`.
- #12
https://github.com/apache/spark-docker/blob/c264d48dc510018095700ed33e700ccc34268bf2/Dockerfile.template#L21-L22
**Rejected Alternatives**
- We could set `HOME` to `/opt/spark`, matching Apache Spark's behavior.
However, that would still differ from `WORKDIR` (`/opt/spark/work-dir`).
- We could create `/home/spark`, but that could be more vulnerable than the
current state. For `system` accounts, `/nonexistent` is commonly used as a
security practice to prevent any side effects of a `HOME` directory.
```
$ docker run -it --rm apache/spark:4.0.0 cat /etc/passwd | grep /nonexistent
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
```
### Why are the changes needed?
**Apache Spark 3.3.3**
```
$ docker run -it --rm apache/spark:3.3.3 /opt/spark/bin/spark-sql
...
25/06/20 20:15:41 WARN SparkSQLCLIDriver: WARNING: Directory for Hive
history file: /home/spark does not exist. History will not be available
during this session.
```
```
$ docker run -it --rm -uroot apache/spark:3.3.3 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.3.3 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```
**Apache Spark 3.4.4**
```
$ docker run -it --rm -uroot apache/spark:3.4.4 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.4.4 ls -al /home/spark
ls: cannot access '/home/spark': No such file or directory
```
**Apache Spark 3.5.6**
```
$ docker run -it --rm -uroot apache/spark:3.5.6 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:3.5.6 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```
**Apache Spark 4.0.0**
```
$ docker run -it --rm -uroot apache/spark:4.0.0 tail -1 /etc/passwd
spark:x:185:185::/home/spark:/bin/sh
$ docker run -it --rm -uroot apache/spark:4.0.0 ls /home/spark
ls: cannot access '/home/spark': No such file or directory
```
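The per-version checks above could be scripted roughly as follows. This is only a sketch: the helper names `spark_home` and `spark_homes` are hypothetical, it assumes a working `docker` CLI with access to the `apache/spark` images, and it uses `grep '^spark:'` rather than `tail -1` so that it does not depend on `spark` being the last entry in `/etc/passwd`.

```shell
#!/bin/sh
# Hypothetical helper: print the home directory that an apache/spark image
# records for the `spark` user (6th field of its /etc/passwd entry).
spark_home() {
  image_tag=$1
  docker run --rm -uroot "apache/spark:$image_tag" grep '^spark:' /etc/passwd |
    cut -d: -f6
}

# Hypothetical helper: print "<tag><TAB><home directory>" for each tag given.
spark_homes() {
  for t in "$@"; do
    printf '%s\t%s\n' "$t" "$(spark_home "$t")"
  done
}
```

For example, `spark_homes 3.3.3 3.4.4 3.5.6 4.0.0` would be expected to report `/home/spark` for each of the images listed above, since all of them were built before this change.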
### Does this PR introduce _any_ user-facing change?
No behavior change, because `/home/spark` does not exist in the images already.
### How was this patch tested?
Manual review.
---
4.0.0/scala2.13-java17-ubuntu/Dockerfile | 2 +-
4.0.0/scala2.13-java21-ubuntu/Dockerfile | 2 +-
Dockerfile.template | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/4.0.0/scala2.13-java17-ubuntu/Dockerfile b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
index 031fc3e..0c84167 100644
--- a/4.0.0/scala2.13-java17-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java17-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:17-jammy
ARG spark_uid=185
RUN groupadd --system --gid=${spark_uid} spark && \
- useradd --system --uid=${spark_uid} --gid=spark spark
+ useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
RUN set -ex; \
apt-get update; \
diff --git a/4.0.0/scala2.13-java21-ubuntu/Dockerfile b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
index 15bd36b..b34f6e0 100644
--- a/4.0.0/scala2.13-java21-ubuntu/Dockerfile
+++ b/4.0.0/scala2.13-java21-ubuntu/Dockerfile
@@ -19,7 +19,7 @@ FROM eclipse-temurin:21-jammy
ARG spark_uid=185
RUN groupadd --system --gid=${spark_uid} spark && \
- useradd --system --uid=${spark_uid} --gid=spark spark
+ useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
RUN set -ex; \
apt-get update; \
diff --git a/Dockerfile.template b/Dockerfile.template
index a410e06..ed07c88 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -19,7 +19,7 @@ FROM {{ BASE_IMAGE }}
ARG spark_uid=185
RUN groupadd --system --gid=${spark_uid} spark && \
- useradd --system --uid=${spark_uid} --gid=spark spark
+ useradd --system --uid=${spark_uid} --gid=spark -d /nonexistent spark
RUN set -ex; \
apt-get update; \