This is an automated email from the ASF dual-hosted git repository.

tallison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tika-docker.git


The following commit(s) were added to refs/heads/main by this push:
     new 3d55f7e  feat: Allow setting languages in image build (#30)
3d55f7e is described below

commit 3d55f7e69845f31312b9c0d045f6b03198cb07e7
Author: Peter Fačko <[email protected]>
AuthorDate: Mon Apr 13 12:36:24 2026 +0200

    feat: Allow setting languages in image build (#30)
---
 README.md       | 4 ++--
 full/Dockerfile | 8 ++------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index a3359a3..913bd36 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This repo is used to create convenience Docker images for Apache Tika Server pub
 
 The images create a functional Apache Tika Server instance that contains the latest Ubuntu running the appropriate version's server on Port 9998 using Java 8 (until version 1.20), Java 11 (1.21 and 1.24.1), Java 14 (until 1.27/2.0.0), Java 16 (for 2.1.0), and Java 17 LTS for newer versions.
 
-There is a minimal version, which contains only Apache Tika and it's core dependencies, and a full version, which also includes dependencies for the GDAL and Tesseract OCR parsers. To balance showing functionality versus the size of the full image, this file currently installs the language packs for the following languages:
+There is a minimal version, which contains only Apache Tika and it's core dependencies, and a full version, which also includes dependencies for the GDAL and Tesseract OCR parsers. To balance showing functionality versus the size of the full image, this file by default installs the language packs for the following languages:
 * English
 * French
 * German
@@ -12,7 +12,7 @@ There is a minimal version, which contains only Apache Tika and it's core depend
 * Spanish
 * Japanese
 
-To install more languages simply update the apt-get command to include the package containing the language you required, or include your own custom packs using an ADD command.
+To install more languages, set the build argument `LANGUAGES` or include your own custom packs using an ADD command.
 
 ## Available Tags
 
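For illustration only: a minimal shell sketch of how the new `LANGUAGES` build argument described in the README might be passed when building the full image. The helper name `build_tika_full`, the `DRY_RUN` switch, the image tag, and the use of `full/` as the build context are assumptions, not part of tika-docker; if the Dockerfile needs further build arguments (for example a Tika version), they would be passed through as extra arguments. Each word of the value is appended to `tesseract-ocr-`, so it has to match a package suffix available in the image's Ubuntu release.

```shell
#!/bin/sh
# Hypothetical helper, not part of tika-docker: build the full image with a
# caller-chosen set of Tesseract language packs.
#   $1      space-separated package suffixes, e.g. 'eng por'
#   $2      tag for the resulting image
#   $3...   extra arguments passed straight through to `docker build`
# With DRY_RUN=1 the command is printed (one argument per line) instead of run.
build_tika_full() {
  langs=$1
  tag=$2
  shift 2
  # Rebuild the positional parameters as the full docker build command line;
  # "$@" still holds the caller's extra arguments when this is evaluated.
  set -- docker build \
    --build-arg "LANGUAGES=${langs}" \
    -t "${tag}" \
    -f full/Dockerfile \
    "$@" \
    full
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 build_tika_full 'eng por' tika-full-custom` prints the assembled arguments one per line, which makes it easy to confirm that the space-separated language list reaches `docker build` as a single `--build-arg` value.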
diff --git a/full/Dockerfile b/full/Dockerfile
index 1b91839..e9cf43c 100644
--- a/full/Dockerfile
+++ b/full/Dockerfile
@@ -45,6 +45,7 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install gnupg2 w
 FROM base AS runtime
 ARG UID_GID
 ARG JRE='openjdk-21-jre-headless'
+ARG LANGUAGES='eng ita fra spa deu jpn'
 RUN set -eux \
     && apt-get update \
     && apt-get install --yes --no-install-recommends gnupg2 software-properties-common \
@@ -53,12 +54,7 @@ RUN set -eux \
         gdal-bin \
         imagemagick \
         tesseract-ocr \
-        tesseract-ocr-eng \
-        tesseract-ocr-ita \
-        tesseract-ocr-fra \
-        tesseract-ocr-spa \
-        tesseract-ocr-deu \
-        tesseract-ocr-jpn \
+        $(printf 'tesseract-ocr-%s ' $LANGUAGES) \
     && echo ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true | debconf-set-selections \
     && DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \
         xfonts-utils \
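The new package list in full/Dockerfile relies on two shell behaviors: the unquoted `$LANGUAGES` is split on whitespace, and `printf` reuses its format string once for every remaining argument. The sketch below reproduces that expansion outside Docker; the function name `tesseract_packages` is made up for illustration. One consequence worth knowing is that `printf` applies its format at least once, so an empty `LANGUAGES` still produces a bare `tesseract-ocr-` word rather than no package at all.

```shell
#!/bin/sh
# Hypothetical helper reproducing the expansion used in full/Dockerfile:
#   $(printf 'tesseract-ocr-%s ' $LANGUAGES)
tesseract_packages() {
  langs=$1
  # Intentionally unquoted: word splitting turns 'eng ita' into two arguments,
  # and printf repeats its format once per argument.
  # shellcheck disable=SC2086
  printf 'tesseract-ocr-%s ' $langs
}

# Default value from the new ARG line in full/Dockerfile; prints
# "tesseract-ocr-eng tesseract-ocr-ita tesseract-ocr-fra tesseract-ocr-spa tesseract-ocr-deu tesseract-ocr-jpn "
# (note the trailing space), then echo ends the line.
tesseract_packages 'eng ita fra spa deu jpn'
echo
```

Because the result is substituted unquoted into the `apt-get install` line, each generated word becomes a separate package argument and the trailing space is harmless.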
