gortiz commented on issue #8718:
URL: https://github.com/apache/pinot/issues/8718#issuecomment-1172448787

   I've just analyzed the Docker image.
   
   The three largest layers are:
   - 344 MB of base image (jdk-slim), which is highly reusable: two images 
built on the same base share that layer, so its storage and download cost is 
paid only once.
   - 617 MB of apt-update and apt-install, which is not reusable at all. We 
can improve this by creating a specific base image and reusing it.
   - 716 MB of apache-pinot, copied in a single layer, of which:
      - 100 MB are examples, which are very static (they almost never change).
      - 454 MB are plugins, which are mostly shaded dependencies. These would 
benefit greatly from Docker layers, but we cannot layer them because they are 
shaded.
      - 150 MB are Pinot itself and its dependencies. I guess most of it is 
the dependencies, which again could be layered, but they are shaded.
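   
   As a quick check on the figures above, here is a small Python sketch that 
adds them up. The sizes are the ones listed; the dictionary labels are only 
illustrative names, not real layer identifiers.

```python
# Layer sizes in MB, taken from the list above.
# The keys are illustrative labels, not real Docker layer IDs.
layers_mb = {
    "base image (jdk-slim)": 344,
    "apt-update + apt-install": 617,
    "apache-pinot": 716,
}

# Breakdown of the apache-pinot layer, also from the list above.
pinot_breakdown_mb = {
    "examples": 100,
    "plugins": 454,
    "pinot + dependencies": 150,
}

total_mb = sum(layers_mb.values())
breakdown_mb = sum(pinot_breakdown_mb.values())
unaccounted_mb = layers_mb["apache-pinot"] - breakdown_mb

for name, size in layers_mb.items():
    print(f"{name}: {size} MB ({size / total_mb:.0%} of the three layers)")
print(f"total of the three layers: {total_mb} MB")
print(f"apache-pinot sub-items: {breakdown_mb} MB "
      f"({unaccounted_mb} MB not covered by the breakdown)")
```

   The sub-items add up to 704 MB, so about 12 MB of the apache-pinot layer 
is not covered by the breakdown.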
   
   This means that each time we change a single character and build a new 
Docker image, we store and download to our pods (617 + 716) MB, about 1.3 GB, 
of data. I think that almost 1 GB of that is static content we could simply 
reuse if we used Docker layers correctly.
   
   What @xiangfu0 suggested, having different images with more or fewer 
plugins, or images that can download plugins at start time, can be a solution, 
but I think it would have the side effect of making things considerably harder 
for customers to understand. If we instead just use the layers correctly, the 
first time a user downloads an image they will need to download 1.3 GB of 
data, but when they later download a second version, most of the layers will 
very probably be the same, so they will only need to download around 150 MB. 
The same applies to our own pods, which would only need to download those 
150 MB instead of 1.3 GB on most upgrades.
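   
   To make the upgrade comparison concrete, here is a rough Python sketch. It 
is a simplified model, not how Docker is implemented: a layer is reused only 
if its identifier is already cached, and the identifiers, the `@build` suffix 
and the proposed split into apt/examples/plugins/pinot layers are all 
assumptions made for illustration. It also assumes the plugins layer stays 
unchanged between builds.

```python
def download_mb(image_layers, cached_layers):
    """Sum the sizes (MB) of the layers that are not already cached locally."""
    return sum(size for layer_id, size in image_layers.items()
               if layer_id not in cached_layers)


def current_image(build):
    # Current layout: the apt and apache-pinot layers differ on every build,
    # modelled here by putting the build number in their identifier.
    return {"base": 344, f"apt@{build}": 617, f"pinot@{build}": 716}


def layered_image(build):
    # Hypothetical layout: stable content first, frequently changing content last.
    # Only the "pinot itself" layer is assumed to change between builds.
    return {"base": 344, "apt": 617, "examples": 100, "plugins": 454,
            f"pinot@{build}": 150}


# A pod that already has build 1 upgrades to build 2.
current_upgrade = download_mb(current_image(2), set(current_image(1)))
layered_upgrade = download_mb(layered_image(2), set(layered_image(1)))

# First pull of the layered image on a machine that only has the base cached.
layered_first_pull = download_mb(layered_image(1), {"base"})

print(f"upgrade, current layout: {current_upgrade} MB")
print(f"upgrade, layered layout: {layered_upgrade} MB")
print(f"first pull, layered layout (base cached): {layered_first_pull} MB")
```

   Under these assumptions an upgrade costs 1333 MB with the current layout 
and 150 MB with the layered one, while the first pull stays at roughly 1.3 GB.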


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

