gortiz commented on issue #13461:
URL: https://github.com/apache/pinot/issues/13461#issuecomment-2428507902

   You know I'm strongly against using slim images based on alpine and, to be 
honest, I think it is a debate we should open in the context of vulnerabilities. 
My reasons for not using alpine are:
   1. It is a cheap, uninformative solution. Docker layers are designed so that 
a base image is paid for only once. For example, one pinot image deriving from 
the microsoft jdk 21 image pays less than 400 MBs for the base image. One 
thousand pinot images deriving from the same microsoft jdk 21 image pay the 
exact same 400 MBs (all together).
   2. In contrast, each pinot image adds:
      - A layer of around 200 MBs when installing extra software with apt-get 
(this could be shared if we actually reuse the pinot base image, which I'm not 
sure we are doing).
      - 959 MBs of Pinot code. This layer is always new. Of which:
          - 120 MBs belong to examples (these are the same on each version, a 
new layer for them would reduce the actual size by 120 MBs!!!)
          - 304 MBs belong to our shaded jar. By using one layer for libraries 
(which don't change that often) and another for our code, and especially by not 
using shading, we could reduce this significantly.
          - 463 MBs belong to plugins. Probably without shading we can reduce 
this by half. Of which:
             - 25 MBs belong to confluent-avro
             - 78 MBs belong to orc
             - 114 MBs belong to parquet
             - 50 MBs to gcs
             - 46 MBs to kafka 2.0
             - 45 MBs to pulsar
             - 36 MBs to hadoop
             - 36 MBs to spark
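
   As a quick sanity check of the breakdown above, here is a minimal Python 
sketch that tallies the itemized sizes. The numbers are copied from the list; 
the variable and function names are made up for illustration and are not part 
of any Pinot tooling.

```python
# Sizes in MB, copied from the breakdown above. Names are illustrative only.
PINOT_LAYER_MB = 959
COMPONENTS_MB = {"examples": 120, "shaded jar": 304, "plugins": 463}
PLUGINS_MB = {
    "confluent-avro": 25, "orc": 78, "parquet": 114, "gcs": 50,
    "kafka 2.0": 46, "pulsar": 45, "hadoop": 36, "spark": 36,
}


def unlisted(total_mb, parts_mb):
    """Return the part of total_mb not covered by the itemized parts."""
    return total_mb - sum(parts_mb.values())


listed_components = sum(COMPONENTS_MB.values())               # 887
other_pinot = unlisted(PINOT_LAYER_MB, COMPONENTS_MB)         # 72
listed_plugins = sum(PLUGINS_MB.values())                     # 430
other_plugins = unlisted(COMPONENTS_MB["plugins"], PLUGINS_MB)  # 33

for name, size in COMPONENTS_MB.items():
    print(f"{name}: {size} MB ({size / PINOT_LAYER_MB:.0%} of the Pinot layer)")
print(f"not itemized in the Pinot layer: {other_pinot} MB")
print(f"not itemized among plugins: {other_plugins} MB")
```

   The three itemized components cover 887 of the 959 MBs, and the eight 
listed plugins cover 430 of the 463 MBs, so the list accounts for most of the 
layer.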
   
   If we actually want to reduce docker space, what we need to do is split 
our Pinot layer into more than one layer, not include large examples (or 
dedicate a layer to them) and especially remove shading. We should also make 
sure we are reusing the base image when we create pinot images (instead of 
running apt-get for each one). By applying these changes we could reduce the 
_actual_ image size from around 1.5GBs to something like 500MBs.
   
   By using alpine we just reduce the size of the base image by at most a 
third. And that change won't actually matter in environments where we already 
have pulled another image sharing the same base (which will be common when 
upgrading pinot or storing images in a local repository).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

