This is an automated email from the ASF dual-hosted git repository.
tallison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tika-docker.git
The following commit(s) were added to refs/heads/main by this push:
new 3d55f7e feat: Allow setting languages in image build (#30)
3d55f7e is described below
commit 3d55f7e69845f31312b9c0d045f6b03198cb07e7
Author: Peter Fačko <[email protected]>
AuthorDate: Mon Apr 13 12:36:24 2026 +0200
feat: Allow setting languages in image build (#30)
---
README.md | 4 ++--
full/Dockerfile | 8 ++------
2 files changed, 4 insertions(+), 8 deletions(-)
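
Note on the new build argument: the Dockerfile change below turns the `LANGUAGES` value into apt package names with `printf`, which repeats its format string once per word. The following is a minimal sketch of that expansion outside Docker. The function name `expand_packages` is hypothetical and not part of the commit, and the `docker build` line in the comment assumes the repository root as build context; other build arguments the Dockerfile may require are not shown.

```shell
#!/bin/sh
# Hypothetical helper mirroring the expression used in full/Dockerfile:
#   $(printf 'tesseract-ocr-%s ' $LANGUAGES)
expand_packages() {
    # $1 is left unquoted on purpose so the shell splits it into one
    # argument per language code; printf reuses the format for each one.
    printf 'tesseract-ocr-%s ' $1
}

# Default value from the commit.
expand_packages 'eng ita fra spa deu jpn'
echo

# A custom selection, as it might be passed at build time (assumed invocation):
#   docker build --build-arg LANGUAGES='eng deu' -f full/Dockerfile .
expand_packages 'eng deu'
echo
```

One consequence of this pattern is that an empty `LANGUAGES` value still expands to the bare word `tesseract-ocr-`, because `printf` prints its format once even when it receives no arguments, so the value probably should not be left empty.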
diff --git a/README.md b/README.md
index a3359a3..913bd36 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This repo is used to create convenience Docker images for Apache Tika Server pub
The images create a functional Apache Tika Server instance that contains the latest Ubuntu running the appropriate version's server on Port 9998 using Java 8 (until version 1.20), Java 11 (1.21 and 1.24.1), Java 14 (until 1.27/2.0.0), Java 16 (for 2.1.0), and Java 17 LTS for newer versions.
-There is a minimal version, which contains only Apache Tika and it's core dependencies, and a full version, which also includes dependencies for the GDAL and Tesseract OCR parsers. To balance showing functionality versus the size of the full image, this file currently installs the language packs for the following languages:
+There is a minimal version, which contains only Apache Tika and it's core dependencies, and a full version, which also includes dependencies for the GDAL and Tesseract OCR parsers. To balance showing functionality versus the size of the full image, this file by default installs the language packs for the following languages:
* English
* French
* German
@@ -12,7 +12,7 @@ There is a minimal version, which contains only Apache Tika and it's core depend
* Spanish
* Japanese
-To install more languages simply update the apt-get command to include the package containing the language you required, or include your own custom packs using an ADD command.
+To install more languages, set the build argument `LANGUAGES` or include your own custom packs using an ADD command.
## Available Tags
diff --git a/full/Dockerfile b/full/Dockerfile
index 1b91839..e9cf43c 100644
--- a/full/Dockerfile
+++ b/full/Dockerfile
@@ -45,6 +45,7 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install gnupg2 w
FROM base AS runtime
ARG UID_GID
ARG JRE='openjdk-21-jre-headless'
+ARG LANGUAGES='eng ita fra spa deu jpn'
RUN set -eux \
&& apt-get update \
&& apt-get install --yes --no-install-recommends gnupg2 software-properties-common \
@@ -53,12 +54,7 @@ RUN set -eux \
gdal-bin \
imagemagick \
tesseract-ocr \
- tesseract-ocr-eng \
- tesseract-ocr-ita \
- tesseract-ocr-fra \
- tesseract-ocr-spa \
- tesseract-ocr-deu \
- tesseract-ocr-jpn \
+ $(printf 'tesseract-ocr-%s ' $LANGUAGES) \
&& echo ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true | debconf-set-selections \
&& DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \
xfonts-utils \