gortiz commented on issue #13461: URL: https://github.com/apache/pinot/issues/13461#issuecomment-2428507902
You know I'm strongly against using slim images based on alpine and, to be honest, I think it is a debate we should open in the context of vulnerabilities. My reasons for not using alpine are:

1. It is a cheap, uninformative solution. Docker layers are designed so that a base image is paid for only once. For example, one pinot image deriving from the microsoft jdk 21 image pays less than 400 MB for the base image. One thousand pinot images deriving from that same microsoft jdk 21 image pay the exact same 400 MB (all together).
2. On the contrary, each pinot image adds:
   - A layer of around 200 MB when installing extra software with apt-get (this could be reused if we actually reuse the pinot base image, which I'm not sure we are doing).
   - 959 MB of Pinot code. This layer is always new. Of which:
     - 120 MB belong to examples (these are the same in each version, so a separate layer for them would reduce the actual size by 120 MB!!!)
     - 304 MB belong to our shaded jar. By using one layer for libraries (which don't change that often) and another for our code, and especially by not using shading, we could reduce this significantly.
     - 463 MB belong to plugins. Probably without shading we can reduce this by half. Of which:
       - 25 MB belong to confluent-avro
       - 78 MB belong to orc
       - 114 MB belong to parquet
       - 50 MB to gcs
       - 46 MB to kafka 2.0
       - 45 MB to pulsar
       - 36 MB to hadoop
       - 36 MB to spark

If we actually want to reduce docker space, what we need to do is split our Pinot layer into more than one layer, not include large examples (or dedicate a layer to them) and especially remove shading. We should also make sure we are reusing the base image when we create pinot images (instead of running apt-get for each one).

By applying these changes we could reduce the _actual_ image size from around 1.5 GB to something like 500 MB. By using alpine we just reduce the size of the base image by at most a third.
And that change won't actually matter in environments where we already have pulled another image sharing the same base (which will be common when upgrading pinot or storing images in a local repository).
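
To make the layer arithmetic above concrete, here is a rough sketch of how disk usage grows with the number of pinot images kept on one host. The sizes are the approximate figures quoted in the comment. The helper names (`disk_usage_mb`, `current_layout`, `split_layout`) are hypothetical, and the model rests on two assumptions: that the apt-get layer is not reused between images today, and that the examples would become a layer shared across versions.

```python
# Layer sizes in MB, as quoted in the comment above (approximate).
BASE_JDK = 400       # microsoft jdk 21 base image
APT_LAYER = 200      # extra software installed with apt-get
PINOT_LAYER = 959    # Pinot code, new on every release
EXAMPLES = 120       # part of PINOT_LAYER, identical across versions


def disk_usage_mb(num_images, shared_layers, per_image_layers):
    """Hypothetical helper: shared layers are stored once, the rest once per image."""
    return sum(shared_layers) + num_images * sum(per_image_layers)


def current_layout(num_images, base=BASE_JDK):
    # Assumption: the apt-get layer is NOT reused between images today.
    return disk_usage_mb(num_images, [base], [APT_LAYER, PINOT_LAYER])


def split_layout(num_images, base=BASE_JDK):
    # Assumption: the apt-get layer and the examples become shared layers.
    return disk_usage_mb(
        num_images,
        [base, APT_LAYER, EXAMPLES],
        [PINOT_LAYER - EXAMPLES],
    )


if __name__ == "__main__":
    # "At most a third" smaller base image, as a best case for alpine.
    alpine_base = BASE_JDK - BASE_JDK // 3
    for n in (1, 10, 100):
        print(n, current_layout(n), current_layout(n, alpine_base), split_layout(n))
```

Under these assumptions the smaller base saves the same 133 MB no matter how many images are stored, while sharing the apt-get and examples layers saves about 320 MB for every image after the first. Removing shading is not modelled here, since the comment gives only rough estimates for it.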