Currently, download.file() opens and closes a new TLS connection for each download. This adds considerable overhead for both client and server, in particular from the TLS handshake. Modern internet clients (including browsers) re-use a connection for many HTTP requests.
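To illustrate what connection re-use looks like from R code, here is a minimal sketch using the curl package rather than download.file() itself. It is not the proposed change to base R; it assumes the curl package is installed, and persistent_pool() and download_all() are hypothetical helper names made up for this example. The idea is simply to create one pool (a libcurl multi handle) and keep it alive across calls, so that libcurl can re-use open connections to the same host.

```r
# Sketch only: uses the 'curl' package, not base R's download.file().
library(curl)

# Private environment holding one long-lived pool, so that every call
# below shares the same libcurl connection cache.
.state <- new.env(parent = emptyenv())

# Hypothetical helper: create the pool on first use, then always return it.
persistent_pool <- function() {
  if (is.null(.state$pool)) {
    .state$pool <- new_pool()
  }
  .state$pool
}

# Hypothetical helper: fetch several URLs through the shared pool.
# Returns a logical vector, TRUE where the file was written.
download_all <- function(urls, destfiles) {
  stopifnot(length(urls) == length(destfiles))
  pool <- persistent_pool()
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the loop index for the callbacks
      curl_fetch_multi(
        urls[idx],
        done = function(res) {
          # Treat HTTP error statuses as failures; write the body otherwise.
          if (res$status_code < 400) {
            writeBin(res$content, destfiles[idx])
            ok[idx] <<- TRUE
          }
        },
        fail = function(msg) message("failed: ", urls[idx], ": ", msg),
        pool = pool
      )
    })
  }
  # Perform all queued requests; the pool itself is NOT destroyed afterwards.
  multi_run(pool = pool)
  ok
}
```

Calling download_all() repeatedly with URLs on the same server goes through the same pool each time, which is what gives libcurl the chance to re-use existing connections instead of performing a new TLS handshake per file. The sketch buffers each response in memory before writing it out, which keeps the example short but would not be ideal for very large files.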
R can re-use connections as well, by keeping a persistent libcurl "multi-handle". The R libcurl implementation already uses a multi-handle, but it destroys the handle after each download, which defeats the purpose: the point of a multi-handle is to keep it alive so that libcurl can maintain a persistent connection pool. This is particularly relevant for install.packages(), which needs to download many files from one and the same server.

Here is a bare-minimum proof-of-concept patch that re-uses one and the same multi-handle for all requests in R:

https://github.com/r-devel/r-svn/pull/155/files

Some quick benchmarking shows that this can lead to big speedups for download.packages() on high-bandwidth servers (such as CI). A quick test downloading 100 packages from CRAN showed a more than 10x speedup for me:

https://github.com/r-devel/r-svn/pull/155

Moreover, I think this may make install.packages() more robust. In CI build logs that download many packages, I often see one or two downloads fail at random with a TLS-connect error. I am hopeful this problem will disappear when a single connection to the CRAN server is used to download all the packages.