On 12/10/24 00:35, Lluís Revilla wrote:
Dear R-devel,
I read with interest the recent blog post on how R will have parallel
downloads, on blog.r-project.org
(https://blog.r-project.org/2024/12/02/faster-downloads/index.html).
Thanks Tomas!
The blog post mentions that one of the areas where this will be
noticeable is package installation (which I did notice!). However, I
found that packages might be downloaded multiple times, for example
if one interrupts install.packages() (via Ctrl+C), if it fails because
a system dependency is missing and I fix that in a different terminal
session, or if the internet connection is cut and I try again.
Yes, and this has been the case before - it's not new for simultaneous
downloads.
One possible way to make installations/downloads faster and also
reduce the bandwidth used on repositories (and their mirrors) would be
to check whether packages need to be downloaded (again).
The PACKAGES file in <repo>/src/contrib includes an MD5sum field that
could be used to check packages in the local folder (but it might be
faster to first check if any file exists there for the same package).
In short, I propose:
1) Checking, before downloading packages, whether they already exist
in the destdir directory used by install.packages().
2) I suppose the most common scenario is to use install.packages() with
the default destdir parameter (NULL). If 1) is implemented, it might be
useful to keep one common temporary directory for a single R session.
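
A minimal sketch of the check proposed in 1), for illustration only.
needs_download() is a hypothetical helper, not an existing R function.
It assumes source tarballs named <Package>_<Version>.tar.gz and an
index matrix with "Package", "Version" and "MD5sum" columns, like the
one returned by available.packages():

```r
# Hypothetical helper: which of `pkgs` still need to be downloaded?
# `available` is a character matrix with (at least) the columns
# "Package", "Version" and "MD5sum", as from available.packages().
needs_download <- function(pkgs, available, destdir) {
  vapply(pkgs, function(pkg) {
    row <- available[available[, "Package"] == pkg, , drop = FALSE]
    if (nrow(row) == 0L) return(NA)           # not listed in the index
    # Assumption: source tarballs are named <Package>_<Version>.tar.gz
    tarball <- file.path(destdir,
                         paste0(pkg, "_", row[1L, "Version"], ".tar.gz"))
    if (!file.exists(tarball)) return(TRUE)   # cheap existence check first
    expected <- row[1L, "MD5sum"]
    if (is.na(expected)) return(TRUE)         # cannot verify, download again
    # tools::md5sum() returns a named character vector of hex digests
    !identical(unname(tools::md5sum(tarball)), unname(expected))
  }, logical(1))
}
```

With network access it could be called, for instance, as
needs_download("jsonlite", available.packages(), file.path(tempdir(),
"downloaded_packages")); TRUE means the file is missing or its checksum
does not match, NA means the package is not in the index.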
When destdir is NULL (the default), non-local packages are downloaded to
a subdirectory of the temporary session directory (see
?install.packages), so the downloaded files would be readily available
to further installation attempts done by the same R session.
I think we could at some point extend download.file() to support
re-use of already downloaded files, so that it can continue an
interrupted download of a single file or re-use the whole file. This
shouldn't be the default, because files in general may change between
downloads and may even come from different URLs, but it could be used
by install.packages(), where this shouldn't happen, at least when
destdir is NULL. I think an extra round of checking checksums
shouldn't be needed in install.packages().
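
A rough sketch of the whole-file re-use case only, for illustration.
download_if_missing() is a hypothetical wrapper, not part of R and not
how download.file() currently behaves. It does not resume interrupted
downloads; it downloads to a ".part" file (an assumed naming
convention) so that a partial file is never mistaken for a complete
one:

```r
# Hypothetical wrapper around utils::download.file(): re-use destfile
# if it already exists, otherwise download it.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(invisible(0L))          # 0 is download.file()'s success code
  }
  # Download under a temporary name and rename only on success, so an
  # interrupted download leaves no file at `destfile`.
  part <- paste0(destfile, ".part")
  status <- utils::download.file(url, part, ...)
  if (status == 0L) file.rename(part, destfile)
  invisible(status)
}
```

A second call with the same destfile then returns immediately without
contacting the server.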
Best
Tomas
I would appreciate feedback on these ideas.
Best,
Lluís Revilla
PS: New users encountering download & installation issues often keep
seeing the progress bar (and in the future "trying URL 'https://...")
of the same packages. There are some ways to avoid repeated downloads,
such as using the system library dependency resolver or having local
mirrors. But these are not easy/available for new useRs, and sometimes
the causes are difficult to avoid (like not having a reliable internet
connection).
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel